Artificial intelligence is slowly being integrated into almost every aspect of our lives, and that includes medicine. In the last few years it’s become widely used in the field, particularly in administrative support and in diagnostics and imaging. To understand more, GBH Morning Edition host Mark Herz spoke with Marzyeh Ghassemi, an MIT computer science professor, about the promise and potential problems as AI becomes more important in medicine. What follows is a lightly edited transcript.
Mark Herz: So let’s start with perhaps some disambiguation. Are we talking about large language models (LLMs) here or not at all?
Marzyeh Ghassemi: I think LLMs are one kind of machine learning model that’s become very popular. And the reason that they’ve become really popular is there is a user-facing publicly available front end. So, if you wanted to interact with any other kind of AI model, you might have to have a little bit more knowledge. Having this publicly available front end I think has really brought LLMs into the forefront, but we’re talking about every different kind of deep model, which includes large language models.
Herz: You’ve done a lot of research looking at how the data that’s fed into medical-based AI might be introducing problems. So, what are those problems and how do they arise?
Marzyeh Ghassemi: The problems and how they arise are very deeply linked. We have a thing in machine learning called spurious correlation, and these are basically just shortcuts. Like, maybe you can’t see an animal so well on the horizon. You can’t tell if it’s a fox or a dog, but because you’re in a public park and it’s an urban area, you think it must be a dog. It can’t be a fox, right? In that case, you’re using background knowledge, not the actual features of the animal, to make a decision. And models do exactly the same thing. The problem is the data that we’re feeding into these models has shortcuts that we are not aware of.
So for example, we know that African-Americans in the United States often access health care a lot later than other Americans. So what that means is on average, African-Americans might have pneumonia that’s more advanced than other Americans once they finally get a chest X-ray. Models don’t know that that association — worse pneumonia only in Black patients — is an undesirable thing. What they learn is that a Black patient has to be much sicker in order for the model to predict that this is somebody with pneumonia. And this is just one example. This happens in all spaces. It happens in dermatology, with diagnosing skin cancer, or any space you can think of. The data we’ve fed in has these biases about the processes that were used to generate it.
Herz: So it sounds a little bit like a garbage in, garbage out situation. What’s the fix? Better data, more data, different kinds of data?
Ghassemi: I think there are different places we have to do these fixes. So the first is if we had perfect data that showed us perfect behavior, everything would be fine, right? We could just say, here’s a perfect doctor in a perfect healthcare system, do what they do. We don’t have that and we’re unlikely to ever get that. So instead of saying, “We just have to have perfect data,” which we’re unlikely to get, I think we need to pivot a little bit into thinking about how we train these models. Do you really want to optimize models (which is what we do right now) to mimic the behaviors that they’re seeing in humans? Maybe not. Maybe we need to step away from this model of training on large amounts of unsupervised data and think a little more about where we want to inject human expertise and human supervision.
Herz: So what does that look like? What’s the change you’re talking about there?
Ghassemi: There are a couple of ways we could do the change. So first, we can try to gather data sets that are gold standards for behavior. This is reasonably labor intensive because it would require experts, like doctors poring over medical records, lawyers poring over legal cases, teachers poring over educational records. In each case, you’d have to label when a human expert doing a specific behavior was correct and that’s a desirable thing you want a model to learn. If we had something like that, we already have techniques in the machine learning toolbox with reinforcement learning to try to reward the model for being closer to ideal behavior, not just observed behavior.
However, there’s also some thought about deploying models in spaces where they’re generating content for humans to approve or they’re verifying human behavior. And that’s more of a policy question. As a doctor, as an expert in your field, how do you want AI to interact with you? Do you want it to create something that you then need to go back and edit, the way that we’re seeing clinical notes now being generated with these speech-to-text models? Or do you want it to suggest ways for you to be more efficient in your own note taking, or highlight that certain things might be incorrect? I think one of the issues with health care more broadly and the deployments we’re seeing right now is that decisions about what kinds of models should be deployed are primarily focused on efficiency and money saving, not on making sure that doctors have more room for creativity and connecting with patients, or that they have improved resources or tools. I’d like to see a shift both in the way that we gather data and train these models in expert areas, but also a shift in how we choose what kinds of areas these models are going to be applied to.
Herz: Well, what are the best uses of medical AI right now, or that you see coming down the pike that really would help doctors and would give patients a better experience in the end?
Ghassemi: With some of the current solutions that look to automate really annoying or tedious parts of clinicians’ jobs, the intention is very good. It’s tedious to take a lot of notes. However, the actual deployments that have measured improvements to clinical behavior or patient outcomes haven’t shown any benefit. So I don’t know that that’s where we want to actually focus a lot of our attention, even though that’s where maybe the most cost savings could happen.
Where I really see a lot of fantastic opportunity is identifying spaces where humans don’t have a fundamental capacity, like early breast cancer detection where it’s a sub-clinical presentation. These are spaces where humans cannot do, or have been proven not to be good at, a very specific clinical task. And there, AI can really help close the gap. There are a lot of examples in women’s health where we know that, for example, endometriosis has a very long diagnostic time. We know that there’s a lot of variation in chronic health conditions, for example, in diabetic care, where you sometimes have to stack different medications as patients have diabetes for longer amounts of time. In those cases, you could imagine that AI systems can help you find ideal treatments for patients much sooner.