How to tell whether a Twitter user is pro-choice or pro-life without reading any of their tweets

Your network speaks louder than your words.
Image: Reuters/Jim Young

While the US government narrowly avoided a shutdown over continued funding of Planned Parenthood last week, a group of activists coalesced online after Seattle-based writer Lindy West tweeted about the hashtag #ShoutYourAbortion.

In response, women began sharing their abortion stories publicly, generating a stream of backlash as pro-life advocates responded by creating their own hashtag—#ShoutYourAdoption.

By analyzing more than 100,000 tweets published throughout this heated debate, we’ve identified how the movement spread, initially through the feminist and conservative communities and then to a dense cluster of users labeling themselves as #Gamergate. Additionally, we show how we can accurately infer a user’s views on abortion using their connections to other users—without ever reading their tweets or profile.

Below is an hour-by-hour visualization displaying the spread of #ShoutYourAbortion and #ShoutYourAdoption. Each point represents a user. Points are connected by lines where one user has chosen to follow the other. Color represents communities: groups of densely interconnected users.

Image: Brian Clifton, Emma Pierson, Gilad Lotan

Spreading through communities

When @thelindywest first tweeted about #ShoutYourAbortion on the afternoon of Sept. 19, it was shared mostly amongst her followers on Twitter. It wasn’t until the following day that the hashtag began to spread further, aided by two prominent accounts, @bitchmedia and @fakedansavage (the light blue feminist community), as well as another distinct community of feminists (dark blue) huddled around @lexi4prez. We label these communities “feminists” because their members are disproportionately likely to identify as feminists in their profiles.

Image: Brian Clifton, Emma Pierson, Gilad Lotan

We identified several other densely interconnected groups. There are four “conservative” communities of users who are more likely to describe themselves using words like conservative, Christian or Catholic, and a Gamergate community which contains members who are more likely to use the word Gamergate (a controversy about sexism in the video game industry) in their profiles.
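As a rough illustration of how a community might earn a label like “feminist” or “conservative,” the sketch below ranks profile words by how much more often they appear inside one community than across all profiles. It is only a minimal sketch: the function name, the `min_count` cutoff and the toy profiles are invented for this example, and the article does not say exactly how over-representation was measured.

```python
from collections import Counter


def overrepresented_words(profiles_by_community, community, min_count=2):
    """Rank words by (share of profiles in the community) / (share of all profiles)."""

    def doc_freq(profiles):
        # Count each word at most once per profile.
        counts = Counter()
        for text in profiles:
            counts.update(set(text.lower().split()))
        return counts

    inside = profiles_by_community[community]
    everyone = [p for ps in profiles_by_community.values() for p in ps]
    in_counts, all_counts = doc_freq(inside), doc_freq(everyone)

    scores = {}
    for word, n in in_counts.items():
        if n < min_count:  # skip words too rare to say anything about
            continue
        share_inside = n / len(inside)
        share_overall = all_counts[word] / len(everyone)
        scores[word] = share_inside / share_overall
    # Highest ratio first; ties broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy profiles, made up purely for illustration.
toy_profiles = {
    "A": ["feminist writer", "feminist and mom", "writer"],
    "B": ["conservative christian", "christian dad", "writer and conservative"],
}
top_words_a = overrepresented_words(toy_profiles, "A")
```

A ratio above 1 means the word is more common inside the community than in the data set as a whole, which is the sense in which members are “more likely” to use it.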

People in these communities have dramatically different views on the abortion debate: 90% of users with strong views in the Gamergate and conservative communities are pro-life; 86% of users with strong views in the feminist communities are pro-choice. We identify pro-choice and pro-life tweeters using a set of keywords: for example, tweeters who refer to abortion as “murder” are reliably pro-life. We focused on users who tweet multiple times, whom we refer to as “users with strong views,” so we can be confident in our categorization.
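The keyword approach described above can be sketched in a few lines. This is an illustration under stated assumptions, not the actual classifier: only “murder” comes from our keyword set as described here, the other keywords are placeholders, and the two-tweet minimum is an assumed reading of “tweet multiple times.”

```python
from collections import Counter

# Only "murder" is taken from the article; the remaining keywords are
# hypothetical placeholders chosen for illustration.
PRO_LIFE_KEYWORDS = {"murder", "unborn"}
PRO_CHOICE_KEYWORDS = {"reproductive rights", "my body"}


def label_user(tweets, min_tweets=2):
    """Return 'pro-life', 'pro-choice', or None when the signal is too weak."""
    if len(tweets) < min_tweets:
        return None  # not a "user with strong views" under this assumption
    votes = Counter()
    for text in tweets:
        lowered = text.lower()
        if any(k in lowered for k in PRO_LIFE_KEYWORDS):
            votes["pro-life"] += 1
        if any(k in lowered for k in PRO_CHOICE_KEYWORDS):
            votes["pro-choice"] += 1
    if votes["pro-life"] > votes["pro-choice"]:
        return "pro-life"
    if votes["pro-choice"] > votes["pro-life"]:
        return "pro-choice"
    return None  # tied or no keyword matched
```

Returning `None` for ties and for users with a single tweet trades coverage for confidence, which mirrors the choice to restrict the analysis to users with strong views.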

But the communities also reveal the complexity of the abortion debate. For example, there are clear differences between the two feminist clusters: the hashtag originates in Feminists 1, but users in Feminists 2 are more likely to be pro-life. Conservative clusters also differ: Conservatives 1 has many Catholics, while Conservatives 2 and 3 are more likely to describe themselves as Christian and conservative. We also observed gender skews in the anti-abortion groups: while the Gamergate community is 75% male, Conservatives 4 is 71% female. Prominent tweeters in the latter community are often young women (one profile read, “Political activist, but like, still a basic white girl. I can’t explain why I don’t need feminism, I’m too busy succeeding without it”). It’s tempting to reduce the abortion debate to two extremes, but as Twitter data shows, people support or oppose abortion for many reasons.

Data from Scale Model
Image: Brian Clifton, Emma Pierson, Gilad Lotan

A little over a day after the hashtag first appeared, an Australian, @TheAntiskeptic, posted a conflicting hashtag: #ShoutYourAdoption. @TheAntiskeptic identifies as a member of the Gamergate community and is also “#christian, #conservative, #environmentalscientist, #prolife, #introvert, and #intelligentdesign.” This raises a mystery: how did an Australian conservative find out about a hashtag started by American feminists?

On Twitter you see only tweets from people you choose to follow, and none of the people @TheAntiskeptic was following tweeted about the hashtag before he did. The answer is probably that when a topic becomes sufficiently popular (“trends”) in a geographic area, it is displayed to all tweeters in the area: #ShoutYourAbortion became a Twitter trend in Australia shortly before @TheAntiskeptic posted about it. Twitter’s Trending Topics are one of the few features powerful enough to break the walls of its echo chambers. This speaks to their importance. Twitter remains coy about how they are determined.

Shortly after trending in Australia, the hashtag started trending in Seattle, Pittsburgh, and Philadelphia, and 16 hours later it began trending in Canada, where it caught the attention of conservative communities. Within four hours, a torrent of tweets pushed the hashtag to trend throughout the United States and across the United Kingdom. When the hashtag peaked at over 10,000 tweets per hour, @BuzzFeed tweeted about it, noting how women were using it to challenge “the stigma of abortion.” (Many tweets are not represented in the graph below because their authors do not clearly belong to a community.)

Data from Scale Model
Image: Brian Clifton, Emma Pierson, Gilad Lotan

Predicting someone’s views

Can you predict whether users with strong views will be pro-life or pro-choice using their profiles?

Yes, it turns out. If you simply guess that every user with a male name is pro-life and every user with a female name is pro-choice, you’ll be right 68% of the time for male tweeters and 56% of the time for female tweeters. You can improve your accuracy by combining gender with other features, like whether someone describes themselves as Christian or progressive. Indeed, of the tens of thousands of words that appear in user profiles, the ones that most strongly predict that someone will be pro-life are “conservative,” “Christian,” “God,” “Jesus” and “Christ”; the ones that most strongly predict that someone will be pro-choice are “feminist,” “blacklivesmatter,” “progressive,” “rights,” and “writer.” (Some of the funnier words: “cats” and “f***” predict being pro-choice; “football” predicts being pro-life.)

Top Profile Words:

*Numbers represent size of effect (sparse logistic regression)
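For readers curious what a sparse logistic regression on profile words looks like, here is a minimal sketch. It uses an L1 penalty fitted by proximal gradient descent, which is one common way to obtain sparse weights; it is an assumption that this resembles the model behind the figures above, and the function name, hyperparameters and toy profiles are invented for illustration. Labels are 1 for pro-life and 0 for pro-choice, and the intercept is omitted for brevity.

```python
import math


def fit_sparse_logistic(profiles, labels, lam=0.05, lr=0.5, steps=500):
    """L1-penalised logistic regression on bag-of-words profiles.

    Returns a dict mapping each word to its weight. Positive weights
    push toward label 1, negative toward label 0, and the L1 penalty
    sets weights of uninformative words to exactly zero.
    """
    docs = [set(p.lower().split()) for p in profiles]
    vocab = sorted(set().union(*docs))
    weights = {w: 0.0 for w in vocab}
    n = len(docs)
    shrink = lr * lam  # soft-threshold amount applied after each gradient step

    for _ in range(steps):
        # Gradient of the average logistic loss.
        grad = {w: 0.0 for w in vocab}
        for words, y in zip(docs, labels):
            z = sum(weights[w] for w in words)
            p = 1.0 / (1.0 + math.exp(-z))
            for w in words:
                grad[w] += (p - y) / n
        # Gradient step followed by soft-thresholding (the L1 proximal step).
        for w in vocab:
            stepped = weights[w] - lr * grad[w]
            if stepped > shrink:
                weights[w] = stepped - shrink
            elif stepped < -shrink:
                weights[w] = stepped + shrink
            else:
                weights[w] = 0.0
    return weights


# Toy profiles, made up for illustration: "fan" appears on both sides.
toy_profiles = ["conservative fan", "conservative dad", "feminist fan", "feminist mom"]
toy_labels = [1, 1, 0, 0]
toy_weights = fit_sparse_logistic(toy_profiles, toy_labels)
```

In this toy data, “fan” is used equally by both sides, so the penalty leaves its weight at zero, while “conservative” and “feminist” receive weights of opposite sign. That is the sense in which the numbers in such a chart represent the size of each word’s effect.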

But you can do a much better job predicting someone’s views on abortion by looking not at their profile but at their social network. By identifying the community in which a user is embedded, you can predict whether they’re pro-choice or pro-life with over 85% accuracy. This accuracy climbs even higher if you zoom in on the social network and predict using the views of someone’s immediate social connections, especially if those connections are homogeneous. If you look only at users linked to at least three people, all of whom are on one side, there’s a 95% chance the user will be on the same side.
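The “at least three connections, all on one side” rule can be written down directly. The sketch below is an illustration only: it assumes a link counts in either direction (following or being followed), and the function name, the data layout and the toy users are hypothetical.

```python
def predict_from_neighbors(user, following, stances, min_links=3):
    """Predict a user's stance only when enough labeled connections all agree.

    following: dict mapping each user to the set of users they follow.
    stances:   dict mapping users with known views to 'pro-life' or 'pro-choice'.
    Returns the shared stance, or None when the rule does not apply.
    """
    # Assumption: a "link" is a follow in either direction.
    linked = set(following.get(user, ()))
    linked |= {u for u, followed in following.items() if user in followed}
    linked.discard(user)

    known = [stances[v] for v in linked if v in stances]
    if len(known) >= min_links and len(set(known)) == 1:
        return known[0]
    return None  # too few labeled connections, or they disagree


# Toy network, made up for illustration.
toy_following = {
    "ana": {"bo", "cy", "di"},
    "eve": {"bo", "fay"},
    "gus": {"ana"},
}
toy_stances = {"bo": "pro-choice", "cy": "pro-choice", "di": "pro-choice", "fay": "pro-life"}
ana_guess = predict_from_neighbors("ana", toy_following, toy_stances)
```

Note that the function never looks at what the user wrote: the prediction for “ana” rests entirely on the three labeled accounts she follows, and “eve” gets no prediction because her connections are too few and disagree.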

Remarkably, these social network data are so powerful that in some cases, someone’s profile provides no further information. For example, if you know the stances of two of the people a tweeter is following, knowing their gender and whether they describe themselves as Catholic does not improve your prediction of whether they will be pro-life or pro-choice. Put another way, so polarized is the social network structure that even very basic, obvious characteristics stop mattering if we know who your friends are.

One broader implication of this is that no one should take the NSA seriously when they say they are only collecting “metadata” on whom someone contacts, rather than the content of the communication. Social network metadata is incredibly powerful.

It is worth noting that we are studying a group with unusually strong views; people tweeting about the abortion debate are not a representative population. They are arguably, however, an especially influential population—the ones who speak up online, who vote, who join special-interest groups and picket abortion clinics. It is a problem if the people who are most vocal about this debate are also so polarized.

Our analysis reveals a digital landscape both intimately connected and deeply disconnected. It’s a world where a hashtag can spread like wildfire over national borders, where one woman’s plea can compel thousands of others to reveal their secrets. A world where given the right match between content and community, groups of closely connected users serve as fuel, spreading information first within their community and later across the wider network. But it is also a world so polarized that by observing someone’s friends, you can often gauge with high confidence what their political views are—without ever bothering to read what they actually say.

Brian Clifton is a Software Engineer and Analyst at Scale Model, and a Data Researcher. He recently built an automated international policy prediction system, @WorldLeaderTips.

Gilad Lotan is the Chief Data Scientist at Betaworks and an Adjunct Professor at NYU. He frequently writes about social network analysis and algorithmic ranking in online media spaces. His work has been covered by a wide range of publishers and academic journals.

Emma Pierson is a Rhodes Scholar and PhD student in computer science at Stanford who writes about statistics at Obsession with Regression.