After firing 26 contractors tasked with curating the world’s news for Facebook users, the social media company is now relying on an algorithm to do their jobs. Algorithms run most of Facebook already: They individually populate each user’s News Feed, assist with search, and automatically tag photos.
But news poses a more complex challenge, and the difference is easy to see: Starting Aug. 26, summaries and short headlines in the Trending bar were replaced by a combination of proper nouns and the number of people sharing the topic. A fake news topic even made its way into the section. Machines appear to be no match for humans when it comes to the news.
Summarizing news is one of the toughest problems in computer science. The text of news stories is unstructured data, filled with context and already written to minimize space. And even if the algorithm can shrink the word count, the summarized text needs to be true. Humans do this every day when we read the news and talk about it with friends, family or coworkers.
Earlier this year, Facebook published research on news summarization in which strings of text were boiled down to about half their length. The work is called “abstractive” summarization, similar to a human paraphrasing rather than cherry-picking important words from a text. The algorithm would take a long string like:
brazilian defender pepe is out for the rest of the season with a knee injury , his porto coach jesualdo ferreira said saturday .
and reduce it to:
brazilian defender pepe out for rest of season with knee injury
This is the best example from Facebook’s research, and it only removes words without otherwise changing the base text. The paper cites other news stories, like a 2006 New York Times story on political maneuvering within the White House. The original sentence, “Colin L. Powell said nothing–a silence that spoke volumes to many in the White House on Thursday morning,” uses symbolic speech to represent an idea. That idea is lost when Facebook’s algorithm reduces it to “powell speaks volumes on the white house.” The summary is untrue, and it loses the nuance of the Times story.
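To see how little the Pepe summary changes its source, here is a minimal sketch that reproduces it by deletion alone. It is a hand-written rule, not the model from Facebook's research: the function name `compress_by_deletion` and the list of dropped words are assumptions picked to fit this one sentence.

```python
# Hypothetical rule-based sketch; NOT the model described in Facebook's research.
# The dropped-word list is an assumption chosen to fit this single example.
DROPPED_WORDS = {"is", "the", "a"}


def compress_by_deletion(sentence: str) -> str:
    """Keep the clause before the first comma and drop a few function words."""
    main_clause = sentence.split(",", 1)[0]
    kept = [word for word in main_clause.split() if word not in DROPPED_WORDS]
    return " ".join(kept)


source = (
    "brazilian defender pepe is out for the rest of the season with a "
    "knee injury , his porto coach jesualdo ferreira said saturday ."
)
print(compress_by_deletion(source))
# prints: brazilian defender pepe out for rest of season with knee injury
```

A rule like this can only remove words. It cannot paraphrase, and it has no way to notice that a figurative sentence like the Powell example means the opposite of what its words literally say.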
Text summarization is a widespread area of interest. In an Aug. 24 blog post announcing its open-source text tool, Google points out that summarization serves as a “reading comprehension test for machines.” The better the summarization, the better the computer understands the text. IBM Watson also recently published a paper on summarization, which focused on summarizing data from CNN and the Daily Mail.
News itself is governed by context. A story can be important because of the events that preceded it, or because of a person’s occupation. These ideas would fall outside of the scope of the Trending algorithm, according to Facebook patent applications. News stories are ranked by popularity, measured in shares, likes and clicks on the link, but can also fall in line with a user’s interests. Facebook can tell what things are popular, but not why.
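A popularity-based ranking of this kind can be sketched in a few lines. This is an illustration only, not Facebook's algorithm: the equal weighting of shares, likes and clicks, the `INTEREST_BOOST` multiplier, the `Topic` class and the sample numbers are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Topic:
    name: str
    shares: int
    likes: int
    clicks: int


# Hypothetical multiplier for topics that match a user's interests.
INTEREST_BOOST = 1.5


def popularity(topic: Topic) -> int:
    """Assumed scoring: an unweighted sum of the three engagement counts."""
    return topic.shares + topic.likes + topic.clicks


def rank_topics(topics: list[Topic], user_interests: set[str]) -> list[Topic]:
    """Sort topics by popularity, boosting those the user is interested in."""
    def score(topic: Topic) -> float:
        base = popularity(topic)
        return base * INTEREST_BOOST if topic.name in user_interests else base

    return sorted(topics, key=score, reverse=True)


# Made-up engagement numbers, for illustration only.
topics = [
    Topic("Election", shares=900, likes=400, clicks=700),
    Topic("Olympics", shares=800, likes=500, clicks=300),
    Topic("Hoax story", shares=1200, likes=900, clicks=1500),
]
ranked = rank_topics(topics, user_interests={"Olympics"})
print([topic.name for topic in ranked])
# prints: ['Hoax story', 'Olympics', 'Election']
```

Nothing in the score reflects whether a story is true or why it matters, so in this toy data the hoax ranks first simply because it drew the most engagement.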
That’s the failing of using modern algorithms to judge news: Their understanding is shallow. The contractors Facebook fired were journalists, trained to understand what’s important and why. And if the stripped Trending bar is any indication, even the best minds at Facebook can’t replace that with machinery yet.