Researchers from IBM and Pfizer have published details on a new AI model that analyzes writing samples and, they claim, can predict whether a person will develop Alzheimer’s seven years before symptoms appear.
The idea is attractive for its simplicity: The model’s only input is a written sample from the “cookie-theft picture description task,” a common cognitive test that asks participants to describe what’s happening in a drawing (three guesses what the drawing is of). Researchers trained the AI to pore over participants’ responses, picking up on hints of cognitive decline like repetition, misspellings, two-word sentences, and limited vocabulary.
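To make those signals concrete, here is a minimal Python sketch of how such surface cues could be counted in a written description. It is not the IBM and Pfizer model, whose internals are not described here. The function name `describe_sample`, the `known_words` list, and the exact definitions of each cue are assumptions made purely for illustration.

```python
import re

def describe_sample(text, known_words=None):
    """Count simple surface cues in a written picture description.

    Hypothetical illustration only: these definitions are assumptions,
    not the features used by the published model.
    """
    # Split into sentences on ., ! or ? and drop empty fragments.
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    # Lowercased word tokens (letters and apostrophes only).
    words = re.findall(r"[a-z']+", text.lower())

    # Repetition: the same word written twice in a row.
    repeats = sum(1 for a, b in zip(words, words[1:]) if a == b)

    # Sentences made up of exactly two words.
    two_word = sum(
        1 for s in sentences if len(re.findall(r"[a-z']+", s.lower())) == 2
    )

    # Vocabulary richness: distinct words divided by total words.
    ttr = len(set(words)) / len(words) if words else 0.0

    # Possible misspellings: words missing from a caller-supplied word list.
    # Left as None when no list is given, since no dictionary is assumed.
    unknown = None
    if known_words is not None:
        unknown = sum(1 for w in words if w not in known_words)

    return {
        "repeated_words": repeats,
        "two_word_sentences": two_word,
        "type_token_ratio": ttr,
        "unknown_words": unknown,
    }

sample = "The boy boy takes cookies. Mom washes. The sink overflows."
features = describe_sample(sample)
```

A real system would learn which cues matter from labeled data rather than rely on hand-picked counts like these.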
Now, hold your applause: The model is in its early days, and it isn’t any better than current cognitive assessments. The initial study, based on data gathered from just 270 Americans over the course of four decades, showed the AI could predict a future Alzheimer’s diagnosis 70% of the time. The commonly used mini-mental state exam, by comparison, catches between six and seven out of 10 cases of mild cognitive impairment, a precursor to dementia.
But the AI model could still be useful as a smoother, faster way for physicians to monitor their older patients’ cognition over the course of years—especially if it’s validated in a more diverse population.
The biggest hurdle in treating Alzheimer’s and other forms of dementia is catching them early. By the time a person has developed full-blown dementia, their brain has suffered so much damage that it can’t be undone. And, in a catch-22, dementia remains untreatable because almost every dementia drug in the pipeline has failed in the last two decades, in part because the drugs were tested on people who were already too sick. (The one exception is Biogen’s drug aducanumab, which is expected to gain approval from the US Food and Drug Administration any day now.)
Catching dementia earlier requires identifying subtle changes in a person’s cognitive function over time. The problem is that these changes are so slight that they’re hard to catch in a quick doctor’s visit. If IBM’s early work holds up, language monitoring could be a much easier way to flag cognitive decline early, which would get more people into clinical trials that could lead to successful therapies.
But that’s a big if. The findings are based on data from just a few hundred racially and geographically homogeneous participants. In order to find enough long-term data to train their model, the IBM and Pfizer researchers turned to the Framingham Heart Study, which has been tracking the health of residents from the eponymous small town in Massachusetts since 1948. Because the researchers needed data from people close to their 80s, the model draws on early cohorts of Framingham participants, who are mostly white people of European descent.
The way these participants speak and write surely differs from that of other groups of English speakers, so it’s hard to say whether the model will work as well across the general population. “Unless the research is repeated on at least one more culturally and linguistically distinct group, it is difficult to make any large claims, even if the cognitive deficiencies of Alzheimer’s disease are universally common,” said Monojit Choudhury, a principal researcher at Microsoft Research Lab India who specializes in language AI.
The researchers acknowledge the limitations of their data. “Our hope is that newly accessible datasets become available that expand on the geographical, socioeconomical and racial diversity of data on which we can continue to train our algorithms,” they wrote in a press release. An IBM spokesperson said the team is actively looking for more diverse datasets.
“We had to work with the data that was available. Framingham as a town has its language and ethnic distribution in its population, and we can’t change that after the fact,” said Ajay Royyuru, who heads IBM’s healthcare research.
But, he argued, the AI model’s simple design makes it easier to find more data. All the researchers need is the text from participants’ descriptions of the cookie-theft picture. They don’t need data from any other tests or participants’ medical histories.
“The data is actually going to be more easily accessible if you’re doing it through this means of analysis because all you need is written text,” Royyuru said. “That can be collected from a much more diverse population.”
Even so, the question is not whether the data would be easy to collect from a large, diverse population, but whether it already has been. There aren’t many other longitudinal studies on aging, and those that do exist have their own limitations.
The Einstein Aging Study, for example, follows 2,600 residents of the Bronx in New York City—a sample that’s likely to be more racially diverse, but still geographically limited. The Framingham study has made an effort to increase racial diversity by recruiting new cohorts in 1994 and 2003, but it will take years for long-term data to roll in from those groups. Most of the new participants are in their 30s, 40s, and 50s. If Alzheimer’s diagnosis wants a boost from language AI models, it’ll need a lot more data from a wider array of sources.