Elon Musk’s OpenAI is using Reddit to teach AI to speak like humans

What do you want to talk about?
Image: Reuters/Robert Galbraith

OpenAI wants to build the technology that will finally let a computer converse in a way that is indistinguishable from humans.

The nonprofit, backed by Tesla CEO Elon Musk and his PayPal co-founder Peter Thiel, brought on NVIDIA’s supercomputer DGX-1, which has 170 teraflops of computing power, to help hone machine learning systems to create algorithms that can comprehend language and teach robots to respond appropriately.

That should solve one of the biggest hindrances to making AI systems that can learn complex interactions: the slowness of current computers. “The speed of our computers is in some sense the lifeblood of deep learning,” OpenAI research director Ilya Sutskever said in an NVIDIA video. The goal of this project is to allow a robot to become smart enough not only to recognize speech, but also to use the data it gathers to formulate appropriate responses on its own. To do that, computers need to digest data more quickly than they currently can.

The DGX-1, which is optimized for an arm of machine learning called deep learning, can feed copious amounts of natural language data into OpenAI’s network much more quickly than before. The supercomputer needs just 10 hours to carry out computations that would take 250 hours on a conventional computer, according to MIT Technology Review. An AI system then uses the data to create its own “speech.”

That’s all well and good. But to improve, the system needs immense amounts of data spanning many years, and data of that kind is scarce. That’s where Reddit comes in.

Researchers at Musk’s San Francisco lab are using the news aggregation and discussion site to increase the bank of content available for the AI to study. In a press release, OpenAI research scientist Andrej Karpathy said they are “[training] on entire years of conversations of people talking to each other on all of Reddit.”

Anyone who’s scoured the web knows that Reddit houses ample amounts of crude material and offensive language. But scientists still love the site for the access it provides to large numbers of colloquial conversations. Its nearly 900,000 subreddit communities comprise a range of topics both broad and niche, from r/science to r/DogsStandingUp.

The real challenge is how the researchers will protect the AI from venturing over to the darker side of the freewheeling forum. Over the past year, Reddit has been tightening the reins on untoward content. For example, the site now prohibits posting naked photos and sex videos without the permission of those depicted. It also has policies barring threats, harassment, and bullying. But trolls keep surfacing alongside intelligent conversations.

OpenAI doesn’t want to become the second coming of Microsoft’s chatbot Tay, which turned racist and genocidal after interacting with people on Twitter. With Reddit as its teacher, the latest AI experiment could be in danger of realizing Musk’s stated fear of AI agents becoming more harmful to the human race than even nuclear weapons. But if the researchers can find a way to control the influence of trolls, the experiment could end up training the most conversational bot of our times.