Can AI Exist in Medicine Without Human Oversight?

Eric J. Topol, MD; Abraham Verghese, MD; Melanie Mitchell, PhD

December 23, 2020

This transcript has been edited for clarity.

Eric J. Topol, MD: Hello. Welcome to this episode of Medicine and the Machine. This is Eric Topol, here from Medscape with my partner and great friend, Abraham Verghese. Today we have the delight of speaking with Melanie Mitchell, a professor at Portland State University in Oregon and at the Santa Fe Institute in New Mexico, and one of the go-to experts on artificial intelligence (AI). She may be the most clear-eyed person out there, and she's here today to give us the real skinny on what we know and what we don't know about AI. It's going to be a fun discussion. Melanie, welcome.

Melanie Mitchell, PhD: Thanks for having me here.

Topol: I read your book, Artificial Intelligence: A Guide for Thinking Humans, about a year ago, and now it's out in paperback. It provides a kind of grounding force on how AI can help or potentially hurt. Before we get into the medical and healthcare applications of AI, where are we in the field in general?

Mitchell: There has been a dramatic rise in progress in AI over the past decade or so, largely due to what are called deep neural networks, deep learning, where "deep" in this sense means the number of layers in a neural network. Neural networks are roughly modeled on the visual cortex in the brain, which is arrayed in a series of layers, and "deep" means the number of layers of simulated neurons. This, combined with very fast parallel computation and lots and lots of big data for training, has produced machines that can accomplish many tasks very well, such as speech recognition, object and image recognition, facial recognition, even powering autonomous vehicles, language translation tasks, and robotics.
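
To make the idea of "depth" a little more concrete, here is a minimal sketch in Python of a small feed-forward network: a stack of layers of simulated neurons, each layer a matrix of weighted connections. The layer sizes and the input are arbitrary placeholders; real systems learn the weights from data rather than using random values.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0, x)

# "Deep" simply refers to stacking several layers of simulated neurons.
# Layer sizes here are arbitrary, chosen only for illustration.
layer_sizes = [64, 32, 16, 2]   # input features -> two hidden layers -> 2 output classes

# Randomly initialized weights and biases stand in for values that
# training (gradient descent on labeled data) would normally learn.
weights = [rng.normal(scale=0.1, size=(m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

def forward(x):
    """Pass an input vector through each layer in turn."""
    activation = x
    for W, b in zip(weights[:-1], biases[:-1]):
        activation = relu(activation @ W + b)       # hidden layers
    logits = activation @ weights[-1] + biases[-1]  # output layer
    return logits

x = rng.normal(size=64)   # a fake 64-feature input (e.g., pixel intensities)
print(forward(x))         # two raw scores, one per class
```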

The improvement in the field has been dramatic, but AI is still quite narrow: Each machine can do its own narrow task, but no machine has the kind of general intelligence that can take knowledge in one domain and transfer it to another domain, which is the hallmark of human intelligence. So I would say that AI is doing quite well on the task front, but in terms of general intelligence, we're still quite far away.

Topol: One of the sentences in your book that grabbed me was "Deep learning is due less to new breakthroughs in AI than to the availability of huge amounts of data (thank you, Internet) and very fast parallel computer hardware." We have a deficiency of large, annotated data sets in healthcare, as opposed to facial recognition and most of the other current uses for AI. Do you think that is a real bottleneck? Is that what's going to hold us back?

Mitchell: That's one of the bottlenecks, I would say. Deep neural networks — the thing that's given AI its oomph in the past decade — rely on large, labeled training sets. Many areas in medicine and other fields don't have that, so they can't take advantage of AI.

But there are some areas of medicine in which, as you know, AI is showing a lot of progress. I read an article today about diabetic retinopathy and the deep neural networks that are able to spot it better than human physicians. I don't know exactly what "better than" means, but some pockets of AI applications are really promising in medicine. But you're right — many areas of medicine don't yet have the large, labeled datasets needed for what's called supervised learning, where the machine is given tens of thousands of examples that have been labeled by a human. Without that, it's hard to apply these new deep learning techniques.
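
For readers who want to see what supervised learning looks like in code, here is a minimal sketch. The "dataset" is purely synthetic (random numbers standing in for labeled scans), and the model is a simple scikit-learn classifier rather than a deep network, but the workflow is the same: fit on labeled examples, then measure how well the model generalizes to examples it has never seen.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for a labeled medical dataset: 1,000 "scans,"
# each reduced to 20 numeric features, with a human-assigned 0/1 label.
n = 1000
X = rng.normal(size=(n, 20))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)

# Supervised learning: fit a model to the labeled examples,
# then check how well it does on held-out examples.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```

The scarcity Mitchell describes is exactly the `y` column: without tens of thousands of reliable human labels, this recipe has nothing to learn from.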

Topol: For our listeners, a device and an algorithm have been approved for diagnosing diabetic retinopathy in grocery stores, where an untrained person can scan a person's retina and — with a cloud algorithm and deep learning, tested in what was the first prospective trial of AI in medicine — get a very accurate diagnosis of diabetic retinopathy. With half of people with diabetes never getting screened for diabetic retinopathy, that's definitely an advance.

Abraham Verghese, MD: I really enjoyed your book. It was a wonderful introduction to AI for someone who's not very computer literate. But your journey was the most fascinating part of it, the way that you came to AI yourself, having not been a computer scientist. Maybe before we go much further, you can talk about your own personal journey through this field.

Mitchell: My journey is different from that of most people in computer science, who start out being obsessed with computers and video games and whatnot. I was never interested in computers at all. I was a mathematics major in college. I didn't know what I wanted to do, but I read Douglas Hofstadter's book Gödel, Escher, Bach: An Eternal Golden Braid, which was published in 1979. It was an amazing work of nonfiction. It's about the emergence of consciousness, self-awareness, and understanding from a substrate of neurons, where no individual neuron has any consciousness or self-awareness of its own. But somehow, put together, you get the emergence of this fantastic phenomenon we call intelligence. That book made me decide that I really wanted to work in AI with Hofstadter. I persistently tried to convince him to let me join his research group, which he finally did, and I ended up doing a PhD with him. Without having taken a computer science course before I got to graduate school — I don't think you can do that anymore — I managed to get a PhD in computer science.

Verghese: I think the fact that you were not a computer scientist when you started lends something to the book. You have a wonderful way of telling the story without assuming too much of your listeners and yet challenging us.

I reviewed a paper recently that was about trying to set up an ethical framework for every future AI application in medicine. The point of the paper was that too often we come to the ethical implications after the fact, rather than prospectively considering them at every level, from the nature of the researchers to the nature of the database, the dataset, and so on. Maybe we can dig into that, because I think every time people discuss machine learning in medicine, one of the big concerns is the lack of transparency for the clinician who is presented with this diagnosis of diabetic retinopathy and has no ability to question it in the way we might question a CT scan or a biopsy report or something like that.

Mitchell: The whole area of ethics and AI is exploding, with people thinking about the complexities of it. It's not a simple matter of making sure the data don't have some kind of bias. For example, with facial recognition, which is a very prominent area of AI application, we're seeing that these machines, which learn from human-labeled data that's often been downloaded from the Web, can have bias against non-White, non-male faces. It's not just a matter of fixing the data; that kind of bias goes way down into the depths of how the data are collected, even into the technology of cameras, which favors lighter-skinned people in how images are processed, and into how the people who are building the algorithms actually go about dealing with the data. I think that's true in the medical areas as well. How to make sure these machines aren't picking up all of the bias that goes into human society is a complex dilemma. I don't think anyone has the answer.

One of the possible solutions I've seen is to train the machines to have our kind of ethics, our moral values. But that brings up the question of what our moral values are. How do we even agree on what our own ethics are, and how do we train a machine to have them when it's so hard to train a machine to have any broad human concept in any domain? How do we train machines to have moral concepts? How do we make these machines into moral philosophers? It's quite difficult. Right now we have to have humans in the loop. Machines just don't have the ability to make these decisions autonomously without humans being able to make sure in some way that they're not biased or making some kind of very un-humanlike mistake. That also makes it difficult to talk about how to certify a machine as unbiased or trustworthy. I think that's the biggest obstacle to letting the machines be autonomous, even in the case of diagnosing diabetic retinopathy. Machines are not ready yet. They're not smart enough to be allowed to be autonomous, and we tend to trust them more than we should.

Topol: That's a critical point. Our podcast is called Medicine "and" the Machine, not "vs" the machine. Ever since the Garry Kasparov vs IBM Deep Blue chess match, it's been man vs machine, when in fact, in medicine, as you aptly pointed out, Melanie, you still need to have oversight. In fact, you need to have training to understand the nuances. Reading your book might be a good start for lots of physicians, but certainly in the curriculum of medical schools, it would seem important to understand why oversight is so crucial before you sign off that this treatment is indicated or this diagnosis is correct, that kind of thing.

The New York Times recently published an article, "Can We Make Our Robots Less Biased Than We Are?" One of the things that drives me a little cuckoo about computer scientists and AI is that for every AI problem, they have an AI solution. So if it's privacy, check, AI; security, check, AI. You name it: bias, ethics, all to be solved with AI. You kind of touched on this, but I wonder if you could go a little deeper. How is a network supposed to detect bias when that bias is almost entirely embedded in our human culture? I don't see exactly how that's going to work. Can you help explain that a little more? Because it seems like asking too much to deconstruct a network that way.

Mitchell: One of the things about deep neural networks is that they're very opaque. Somebody described them as a pile of linear algebra, because they're a bunch of numbers, or vectors if you like that term, that are being processed, and you can have millions, even billions, of parameters — weighted connections in these networks — that are really hard to examine in order to understand what's going on. It would be as if I sliced open the top of your head, looked at your neurons firing, and tried to understand how you're thinking and how you made a decision.
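
A quick way to appreciate the "pile of linear algebra" point is simply to count the weights in a fully connected network. The layer sizes below are arbitrary illustrative values, not any particular published model, but even this modest stack has hundreds of millions of parameters, none of which is individually meaningful to inspect.

```python
# Counting the learnable parameters (weights + biases) in a fully
# connected network; the layer sizes here are arbitrary examples.
layer_sizes = [224 * 224 * 3, 4096, 4096, 1000]  # image in, two hidden layers, 1000 classes

total = 0
for m, n in zip(layer_sizes[:-1], layer_sizes[1:]):
    total += m * n + n   # weight matrix plus bias vector

print(f"{total:,} parameters")  # hundreds of millions, none individually interpretable
```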

Obviously that's not the right level at which to understand such a thing. No one has been able to figure out exactly how to get these networks to either explain themselves or be more transparent about how they're making decisions. That's one of the problems. And, as you say, when you have a hammer, everything looks like a nail. For people in computer science, their hammer is technology and so they think, okay, we just have to apply more technology to solve this. Let's get AI systems that look at neural networks and figure out how to explain them, or let's just look at AI systems to determine what the bias is. But I think people are recognizing more and more that it's not just a technical problem, it's a problem of society. It's a humanities problem. It's a problem of understanding human society, human nature, as much as it is of technology. There are all kinds of new nonprofit institutes cropping up, trying to bring together people from technology, and also from the humanities and the social sciences, to really think about these problems.

I think that, as you said, in the future, physicians, medical students, anyone who's using these kinds of technological systems for human problems is going to have to learn how to be a critical thinker about this technology and also about its interaction with society.

Topol: No question. One of my many favorite quotes in your book was by Mitch Kapor, the Lotus 1-2-3 guy: "Human intelligence is a marvelous, subtle, and poorly understood phenomenon. There is no danger of duplicating it anytime soon." And the fact is, what we're talking about here is that we need human intelligence to screen these algorithms. The classic example of medical bias is the Optum study, published in Science in October 2019. The algorithm discriminated against Black patients because they weren't using as many medical resources as White patients. [Black persons were getting low risk scores for many chronic conditions not because their risk was lower but because the Optum algorithm was basing its findings, in large part, on bills and insurance claims.] It is a lesson on how vital it is for human intelligence to kick in from the outset and not rely on some algorithm that could be so widely disseminated and could do so much harm.
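
The mechanism behind that kind of bias can be shown with a toy simulation. The numbers below are entirely synthetic and are not the Optum data; they only illustrate what happens when healthcare spending is used as a proxy label for healthcare need while one group, for reasons of access, generates lower bills for the same level of illness.

```python
import numpy as np

rng = np.random.default_rng(0)

# Entirely synthetic illustration, not the Optum data.
# Two groups with the SAME underlying illness burden, but group B
# generates lower billing costs for the same need (less access, fewer claims).
n = 10_000
group_b = rng.random(n) < 0.5
need = rng.gamma(shape=2.0, scale=1.0, size=n)    # true health need
cost = need * np.where(group_b, 0.6, 1.0)         # proxy label: observed spending

# A risk score built on COST will flag group B less often as "high risk,"
# even though the true need of the two groups is identical.
threshold = np.quantile(cost, 0.8)                # "high risk" = top 20% by cost
flagged = cost > threshold
print("true mean need, A vs B:",
      need[~group_b].mean().round(2), need[group_b].mean().round(2))
print("fraction flagged high risk, A vs B:",
      flagged[~group_b].mean().round(3), flagged[group_b].mean().round(3))
```

The point is that no amount of clever modeling fixes a label that already encodes the disparity; that is exactly where human oversight of the problem formulation has to kick in.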

Mitchell: Common sense is an ill-defined term, but we kind of know what it means. It has now become a buzzword in AI — what machines lack is common sense, and people are talking about how we can define this. Machines can make statistical correlations, but they're lacking this underlying notion of how the world works, and your example is a great one for that. Anyone with common sense would not make those kinds of errors. How do we give machines the kind of knowledge that even a young child has? That's the paradox of AI. We have machines that can beat the best "Go" player in the world, but it's still a challenge to give a machine the common sense of an 18-month-old baby. In fact, that precise project is being carried out by DARPA (Defense Advanced Research Projects Agency), which funds a lot of AI. They call it Foundations of Human Common Sense, and they're trying to get machines to mimic the developmental trajectory of babies. That is a great example of the paradox we all face.

Verghese: Just to turn that on its head, it strikes me that we're living at a time when at the highest levels, we're seeing a dearth of intelligence in terms of our responses to COVID. So on the one hand, we have tremendous scientific breakthroughs and clarity of understanding of a disease, and then we have a social disorder at the very highest levels. Do you see a role for AI to be a sort of broker for society so that opinions don't become facts? Can a machine help us sift through all of this and get to what the facts really are, as opposed to the opinions that are causing deaths every day because they're unfounded and they're incorrect much of the time?

Mitchell: That's a great question. Humans have common sense, but there's a dark mirror of that, which is human cognitive biases, and those can get us into trouble. We see that all the time in our society, especially now with so much distrust of science, which is at the root of a lot of the problems we're seeing. Can AI fight that or help solve that? I'm dubious. There's been a lot of talk, again, of using the hammer of technology for all these problems. "Let's use AI to combat misinformation." Facebook and Twitter are all trying to do that. But I think it's a problem that transcends technology and is beyond what AI can do now. I believe we need a more humane solution to that problem, but I don't know what it is.

Topol: Can you tell the story of Clever Hans?

Mitchell: Clever Hans was a horse in Germany, sometime in the early 1900s, who was supposedly able to do arithmetic. The trainer would give Clever Hans an arithmetic problem and Clever Hans would pound out the solution with his hooves. It turned out that Clever Hans was actually picking up on subtle body signals from the trainer; Clever Hans couldn't do arithmetic, but he could pick up on the body language of the trainer, who was unconsciously, somehow, giving clues to the solution. People in AI have now taken this metaphor of Clever Hans and said that sometimes what the AI systems are doing is learning subtle cues that don't solve the problem we want them to solve but give the impression that they are solving it.

There have been some examples in the medical field. For instance, you get a machine learning system to try to distinguish x-rays that have some pathology from x-rays that don't. But it turns out that the x-rays with the pathology have some weird statistical anomaly or clue, such as coming from a different x-ray machine or being taken with a different setting, that is not apparent to humans but that the machine is able to pick up on to make its predictions.
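
Here is a toy version of that x-ray scenario, with all data simulated. The genuine "pathology" signal is weak, but in the training data the diseased cases happen to carry a telltale scanner artifact that tracks the label. A simple classifier learns the shortcut and looks impressive until it meets data from a site where the shortcut is absent.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy illustration of "Clever Hans" learning. The real signal is weak,
# but in the TRAINING data the diseased cases come from a different
# scanner, which leaves a telltale intensity offset.
def make_data(n, leak):
    y = rng.integers(0, 2, size=n)
    signal = 0.3 * y + rng.normal(scale=1.0, size=n)     # weak genuine pathology signal
    if leak:
        scanner = y + rng.normal(scale=0.1, size=n)       # artifact tracks the label
    else:
        scanner = rng.normal(scale=0.1, size=n)            # artifact unrelated to the label
    X = np.column_stack([signal, scanner])
    return X, y

X_train, y_train = make_data(2000, leak=True)
model = LogisticRegression().fit(X_train, y_train)

X_same, y_same = make_data(2000, leak=True)    # test data with the same shortcut
X_new, y_new = make_data(2000, leak=False)     # data from a site without the shortcut
print("accuracy with shortcut present:", model.score(X_same, y_same))
print("accuracy when shortcut is gone:", model.score(X_new, y_new))
```

The first number is near perfect and the second is barely better than chance, which is the Clever Hans story in miniature.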

Topol: I'm familiar with an example of that. It might not quite fit into the Clever Hans model, but they scanned skin lesions that were potentially cancerous. The AI system had remarkable accuracy, but it turned out that was because only the cancerous ones had a ruler in the image. The cancerous lesions were picked up because the ruler was there.

Verghese: On the other hand, I think AI has done a good job of predicting mortality, sometimes better than physicians, who tend to overestimate the potential success of a treatment or underestimate 30-day mortality. That's another example where the machine is unemotional in a sense and doesn't mix our intrinsic hope and wish for the patient with an inflated sense of how well they will do on this therapy or how much life they have left. Those kinds of applications are interesting to me because they counter our bias as opposed to making a clever diagnosis. They also are sobering because this is otherwise a very human interaction, patient and physician. I believe we need more objectivity in some of those gray areas, such as predicting mortality and so on.

Mitchell: That's a great point. Machines have biases. Humans have biases. They're not exactly the same. So if they work together, they can actually do better than either one working alone. But the human has to have a good understanding of what the machine's biases are, if that's possible, as well as self-awareness about his or her own. This area of human-computer interaction is key for understanding, more theoretically, how we humans and machines can complement each other.

Topol: Abraham's example is notable because, on the one hand, you can override some of the biases, but on the other, the physician might pick up on the fact that a patient has the will to live. No algorithm is going to pick up things like that. So it goes both ways. This synergy between what machines can bring to medicine and what the human factor can add truly makes the whole greater than either part alone; but that synergy still isn't accepted. We have to keep working on it.

Mitchell: One of the problems is that machines have been hyped a bit too much; they've been oversold. This leads people to trust them too much, to think they can be autonomous and don't need that human factor, but then the machines end up disappointing people, who say, "Well, machines can't be trusted at all." I see that as a big problem for adopting machines in a field like medicine. "How can I trust this machine when it's made these errors? I was sold on the idea that it was going to do much better than it did." I think IBM Watson was one example of that. It raised expectations too high. I think we have to figure out exactly how we can represent what the machines can actually do, what their limitations are, and what their role should be. That's on us, the AI people, to make much clearer. I don't think we've done a great job of that.

Verghese: I believe the burden is on our side to deal with whatever AI presents to us as a recommendation, just as a serum sodium level is just a number that informs us. It's not a manifest instruction to do something. Ultimately that responsibility should remain with us. And as you say, the hype around AI and machine learning is such that, too often, what is spit out is taken as gospel when it really is just another factor that you need to weigh in your decision-making.

Topol: Speaking of hype, in your book, I connected with your discussion of Ray Kurzweil and The Singularity Is Near. Perhaps no one has hyped AI, and the idea that it will transform life in the future through the singularity, more than Ray. Can you add a little color to that?

Mitchell: Ray Kurzweil has been promoting AI for decades. In one of his first books, published at least 30 years ago, he predicted that AI would reach human-level intelligence by a certain date — I think it's 2029. But he goes beyond that and says it will be billions of times smarter than humans by 2045. He calls that point the singularity, when machine and human intelligence merge. This idea goes back to science fiction, where machines are able to improve themselves, and that goes into a cycle where they get smarter and smarter and eventually they become billions of times smarter than we are. Kurzweil has written a lot of books that have been popular. No serious researcher in AI believes this, but it has infiltrated the popular imagination: the idea that we're on the cusp of having machines that are smarter than we are, and that this is going to be an existential threat to us, to our culture, our society, our civilization. Any number of books have riffed on that idea, but I don't believe it's a very serious concern. Andrew Ng, who's a well-known AI pioneer, said that worrying about the singularity now is like worrying about overpopulation on Mars. It's just not the biggest problem with AI right now.

Natural Language, Gameshows, and Medicine

Verghese: Your book helped me understand natural language processing. Here we are in 2020, and with most clinicians I know, for every hour we spend with the patient, we spend another 2 hours on the electronic medical record, not to mention another hour at night with our inbox. It seems to me that we have all these brilliant breakthroughs with AI and retinopathy, but the area of greatest need is to have someone seamlessly capture what we do so that we can get on with the job. I know both you and Eric are aware of many different applications and advances that may be coming soon, but would you talk about the advances in natural language processing that may benefit the practice of medicine?

Mitchell: Natural language means human language, and there have been some advances that already benefit medicine, such as speech to text, where you can transcribe your medical records by dictating them. My brother used to be a medical transcriber. He used to listen to doctors who dictated a recording of their notes and he would transcribe them, but he's been put out of a job by natural language processing systems that do the same thing. That's been a big advance and surprisingly has worked very well using brute-force statistical methods. These methods don't actually understand speech, but they transcribe it. The same is true with translation between languages. There are a lot of great translation systems that work pretty well and they've been developed with brute-force statistics.

But I think what you're asking for are applications that could do something more, that could actually understand the language. Sort of what IBM Watson was supposed to do: read the medical literature and answer your questions about it, because you don't have time to read all the papers that are published. That turns out to be harder than translation or speech-to-text, because it requires some kind of understanding, and that's lacking in these other systems now. It is hard to pin down exactly what that understanding is. Being able to make sense of an academic paper, to summarize it or answer your questions about it, would be incredibly useful, but it's something that AI is still fairly far away from.

Verghese: MD Anderson had an experience with IBM Watson that was far from successful, but it seems to me they might have anticipated the problems, or at least gotten further along before they tried to implement it on that scale, don't you think?

Mitchell: Watson was promoted so widely. It was the system that won Jeopardy! but it turned out that the ability to do well at that game depended a lot on the fact that the questions were so well posed and also that so much training was available from previous games. Compare that to the real world; in medicine or law or any of these fields, questions and answers are so much more open-ended. The questions are not as well posed and there's not a clear answer all the time. So it's much, much harder than people expected it to be. That sentence is the history of AI: It's much harder than people expected it to be, for everything you want to talk about.

A Replication Crisis

Topol: Speaking of harder, replication has become more of an issue in medicine. Earlier this year, researchers from China published a mortality model that claimed that three biomarkers — C-reactive protein, the lymphocyte count, and lactate dehydrogenase — could predict COVID mortality with an accuracy over 90%. But in the past week, in Nature Machine Intelligence, three reports from the United States, the Netherlands, and France said that's a bunch of hooey; we can't even get 50% accuracy with those markers in our patients. So that's a replication issue, one that's really intriguing. Researchers from MIT just published [research showing that] with forced cough recordings in people without symptoms, you could detect from the cough, with 98% accuracy, whether a person had COVID or not. The study involved over 4000 people but has not been replicated. What do you make of all that, Melanie? Replication has been a big issue in medical research and in social science. Now we're starting to see it crop up in AI in medicine and health.

Mitchell: People are starting to talk about the replication crisis in AI. We've heard about replication crises in other fields, but it's now an issue in AI because it's so hard to replicate a lot of the work that's been done. It's like any other experimental field: You don't think about doing something on a computer as being an experimental process, but it is. There are so many different parameters in these systems, and in the data and the way the data are processed, that often researchers make some assumptions that they don't put in the paper and that they don't even realize are there. This makes it hard to replicate.

Nowadays, in order to publish in certain journals, you have to share your code. You have to give all the details of your data and how you processed them, because of this problem of replication. It's a big issue. One problem is that in computer science, we're never trained in experimental science. We're not trained in statistics or in how to do replicable science.
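
One small, practical piece of making computational work replicable is simply pinning down every source of randomness and recording the exact configuration alongside the results. The sketch below is a minimal illustration of that habit; the specific fields and file name are just examples, not any journal's required format.

```python
import json
import random
import numpy as np

# Record the "hidden" choices that can silently change results:
# random seeds, data splits, preprocessing choices, library versions.
config = {
    "seed": 42,
    "test_fraction": 0.2,
    "normalization": "z-score",
    "numpy_version": np.__version__,
}

# Fix the random seeds so reruns produce the same splits and initializations.
random.seed(config["seed"])
np.random.seed(config["seed"])

# Saving the configuration with the results lets someone else, or your
# future self, rerun the experiment under identical conditions.
with open("experiment_config.json", "w") as f:
    json.dump(config, f, indent=2)
```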

Topol: That's a really important point.

Verghese: I have a question for both of you. It seems like many things that were presented as barriers to AI have been overcome. What do you think is coming down the pike? Where will we be 10 years from now with AI and medicine?

Mitchell: One of the most important areas right now is this idea of learning in a more humanlike way. We talked about supervised learning, where you have to have lots of human-labeled examples. Clearly, that's not how humans learn. In AI people talk about this idea of transfer learning; that's kind of a buzzword that means being able to take what you learned in one domain and apply it in a different domain that you haven't learned about. In "humanspeak," that's what we call learning. That's what learning is. If I learn about something in a narrow domain and I can't apply it anywhere else, I haven't really learned the abstract concepts, the fundamental concepts. This idea of getting away from supervised learning and being able to transfer what I've learned to another domain, that's really key. That's something I hope we will make a lot of progress on in the next 10 years. It is a big focus of research right now. There's also an area called meta-learning, which has to do with learning how to learn and is also promising. These things are just at the beginning. Another is being able to understand and to quantify how trustworthy a system is, what its strengths and vulnerabilities are, and being able to communicate that to non–computer scientists. That's going to be key and is something we don't yet know how to do.
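
As a rough sketch of what transfer learning looks like mechanically, the code below reuses a "pretrained" feature extractor for a new task and trains only a small new classifier on top. Everything here is synthetic, and the frozen features are random placeholders, so this shows only the structure of the idea (freeze what was learned elsewhere, fit a small new head on limited data), not the benefit a genuinely pretrained model would bring.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Pretend this projection matrix is a feature extractor already learned
# on a large "source" task; for the new task we keep it frozen and
# train only a small classifier on top of its outputs.
pretrained_features = rng.normal(size=(50, 8))   # placeholder for learned lower layers

def extract(X):
    return np.tanh(X @ pretrained_features)      # frozen, reused representation

# A new task with only a handful of labeled examples.
X_new = rng.normal(size=(40, 50))
y_new = (X_new[:, 0] > 0).astype(int)

head = LogisticRegression().fit(extract(X_new), y_new)   # train only the new "head"
print("fit on the 40 new-task examples:", head.score(extract(X_new), y_new))
```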

Verghese: What do you predict, Eric?

Topol: I'm always the optimist and always have to reset on reality. But I'm hoping that imminently we can eliminate keyboards. That would be the singular big advance in the next couple of years that will change our time with patients and allow us to truly be present, as you've written about so eloquently, Abraham. Longer term, the great ability of AI could be that through data and algorithms that are validated to the hilt, replicated, and under continuous surveillance, hospitals will not be like they are today, where people are sitting in hospital rooms when they could just as well be at home. These advances should hit in the next several years. I'm excited about this, but it's going to be a long struggle to prove these advances.

We've relied too heavily on in silico demonstrations and not enough on real-world settings. Go back to diabetic retinopathy, which is one of the leading advances. When that was studied in silico, by four different groups, the accuracy was as high as 98%-99%. When they did a prospective trial in the real world — the first, by the way, in medicine — the accuracy was good, but it was closer to 90%. That's what we have to be cognizant of: there's going to be a drop-off from these pristine datasets. And that's why I turn back to Melanie. If we can only do supervised learning, we're going to be held back. Do you think we can transfer to these new ways of learning? Can we make this jump imminently?

Mitchell: I think within the next decade, we'll see a lot of progress. I don't think we're going to get to what we might call human-level intelligence, but I do think we'll see some progress on making machines more trustworthy, more able to do this kind of transfer, more able to learn in a not-so-supervised way, however that pans out.

I'm optimistic like you. I think there are some amazing potential applications for AI. A lot of people ask, why even do this? AI could be so harmful and we've seen so many bad things about it. Why do this research? I think that's true, but there are also some amazing possible benefits. Medicine is one of the main areas in which we'll see that in the near term. There are so many problems: people in areas that don't have the kind of healthcare we have in a big city, for example; or people who would be much better off at home than in a hospital; or elderly people who can't access care the way younger, healthy people can. There's a lot of potential for AI systems to assist doctors, not to replace them in any sense. I think we'll get a lot more AI as assistance, to help physicians broaden their ability to care for people, and I'm quite optimistic about that.

Topol: One other point deserves mention. When we talk about medicine and the machine, we don't give enough respect to patients and the machine. We always think of it as the doctors and nurses and other clinicians, and not the ability of people to generate their own data, whether through sensors, access to their electronic records, or all the other ways of getting data on themselves. The support of algorithms could be a big change in the future, which I think Abraham was getting to.

This has been a fun discussion. You've pointed out these areas of concern but also areas of optimism for deep learning and deep neural networks as we go forward in the rest of our world, and especially where it can take us in medicine. Thanks so much for being with us, Melanie.

Verghese: I just want to recommend your book to our listeners. It's a beautiful introduction and a deep dive into AI for people who are not computer scientists. Great to have you on this podcast.

Eric J. Topol, MD, is one of the top 10 most cited researchers in medicine and frequently writes about technology in healthcare, including in his latest book, Deep Medicine: How Artificial Intelligence Can Make Healthcare Human Again.

Abraham Verghese, MD, is a critically acclaimed best-selling author and a physician with an international reputation for his focus on healing in an era when technology often overwhelms the human side of medicine.

Melanie Mitchell, PhD, is an award-winning author and a professor at the Santa Fe Institute and Portland State University. She helped launch the Santa Fe Institute's Complexity Explorer platform, which offers online courses, tutorials, and other resources related to the field of complex systems. More than 25,000 students have taken her online course "Introduction to Complexity."
