At its core, Facebook is a giant repository of information about the people who use it, from their age and phone number to their music preferences, their shopping habits, and everything else they’ve added to the platform over the years. And from Facebook status updates alone, it may also be possible to gauge whether users have certain diseases, a new study has found.
The study, published June 17 in the journal PLOS One by researchers from the University of Pennsylvania, found that adding Facebook statuses to demographic information improved how accurately disease could be predicted in 18 disease categories, especially diabetes and various mental health conditions. In other words, if a patient’s age and gender suggested they were at risk of a disease, Facebook statuses with certain words in them made that prediction more accurate.
“Because such content is constantly being created outside the context of health care systems and clinical studies, it can reveal disease markers in patients’ daily lives that are otherwise invisible to clinicians and medical researchers,” the researchers wrote about the potential applications of their work.
The Facebook statuses were particularly adept—and better than demographic information on its own—at predicting diabetes, pregnancy, anxiety, psychoses, chronic pulmonary disease, sexually transmitted disease, and drug abuse, researchers discovered. They also helped predict alcohol abuse and depression, but were terrible at predicting obesity.
For some of the diseases, the words users posted in their statuses were quite obvious: alcohol abuse was indicated by words like “drink,” “drunk,” and “bottle.” For depression, it was words like “pain,” “crying,” and “tears,” but also “stomach,” “head,” and “hurt” (indicating “somatization,” or the physical expression of psychological conditions). For diabetes, however, the words were less clear: the disease was predicted by religious language (“god,” “family,” “pray”), even when controlling for demographics. Patients in the top 25% for mentioning those topics were 15 times more likely to be diagnosed with diabetes than those in the bottom 25%. “This association may be specific to our patient cohort and suggests the potential to explore the role of religion in diabetes management or control,” the researchers wrote.
There is a big caveat to the study, which the researchers acknowledge. There were only 999 participants, and they skewed heavily female (76%) and African American (71%), so the sample was far from representative of the general population. Facebook currently has more than 2.38 billion monthly active users, from just about every country and background in the parts of the world connected to the internet. Participants in the study also had to be fairly active Facebook users. For models like the one the researchers built to have strong predictive power, they would need far more data across many different demographics, and probably from other social networks as well (different groups use different social networks, and in different ways).
There are also concerns about how models like these could be used, and Facebook’s own research shows what could happen. On the one hand, they could be used as a force for good: Facebook already uses data from its platform to determine whether a user is having suicidal thoughts and to direct that user to help (although public health experts have questioned this approach as well, particularly on the question of user consent). But the company has also landed in hot water in the past, after it was discovered to have shown advertisers that it could determine when teenagers were feeling anxious or insecure.
Facebook also raised some eyebrows last year with the news that it had been planning a partnership with hospitals to pair patients’ medical data with Facebook’s own. While that partnership could have helped in some ways (for example, by identifying patients who had no nearby friends to help care for them), patient privacy is highly regulated in many countries, and Facebook, which has faced several privacy scandals recently, does not have the best record when it comes to keeping user information secure.
The researchers also noted that mining the “social mediome,” which they compare to the human genome, could make it harder to communicate risk to patients, as is already the case with genetic testing, where results are sometimes misinterpreted.