Can we take Elon Musk's new chatbot seriously?

Haters be warned—don't use the "Grok" AI chatbot if you can't take a joke

Sam Rockwell starred in the movie adaptation of The Hitchhiker’s Guide to the Galaxy, which inspired the Grok chatbot.
Photo: MJ Kim (Getty Images)

A wisecracking alter ego of billionaire entrepreneur Elon Musk is here: a new chatbot that promises to answer “almost anything” and “even suggest what questions to ask.” But proceed with caution. In a blog post by Musk’s newly formed company xAI, which developed the AI model, the team quips that anyone who “hates humor” shouldn’t use it.

The chatbot, named “Grok” (pronounced graw-k), has “real-time access” to information pulled from the social media platform X, which Musk bought a year ago, when it was still called Twitter.

The large language model (LLM) aims to contend with OpenAI’s ChatGPT, Google’s Bard, and Anthropic’s Claude, with xAI boasting that its capabilities already rival its competitors’ after just four months of development and two months of data training.

Still, some experts are skeptical about Musk’s intentions. “I don’t have high hopes for what Elon Musk is doing; he’s not even concerned about the rates of hallucinations,” said Amin Ahmad, co-founder and CTO of Vectara, a software company specializing in semantic search.

“He only wants higher clicks,” added Ahmad, a former AI researcher at Google.

Why AI chatbots “hallucinate”

Hallucination refers to when AI makes up its own facts. That happens more often than we might think, according to a recent study by Vectara.

In tests designed to catch AI inventing untruths, OpenAI’s GPT-4 strayed from its source material the least, making up information 3 percent of the time. By contrast, Google’s LLM, PaLM-chat, invented answers 27 percent of the time.
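
The arithmetic behind a figure like “3 percent” is straightforward: score each model output as either consistent or inconsistent with its source document, then take the fraction flagged as fabricated. Here is a minimal sketch of that calculation (the `hallucination_rate` helper and the sample data are illustrative, not Vectara’s actual code or methodology):

```python
def hallucination_rate(judgments):
    """Fraction of outputs judged inconsistent with their source.

    judgments: list of booleans, True when an output was factually
    consistent with the document it was asked to summarize.
    """
    if not judgments:
        raise ValueError("no judgments to score")
    inconsistent = sum(1 for ok in judgments if not ok)
    return inconsistent / len(judgments)

# Toy example: 3 fabricated summaries out of 100 -> a 3% rate.
sample = [False] * 3 + [True] * 97
print(f"{hallucination_rate(sample):.0%}")
```

The hard part in practice is producing those consistency judgments in the first place, which is typically done by another model or by human annotators.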

Ahmad’s best guess about PaLM-chat’s high hallucination rate is that its creators wanted it to be talkative and witty, which impacts what kind of answers the chatbot generates.

Vectara co-founder and CEO Amr Awadallah, also a Google alumnus, explained that an LLM is fed petabytes of data that must be compressed into megabytes and then decompressed to derive answers. Information gets lost in this process, prompting the AI to fill in the blanks with details that weren’t in the original content.
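
Awadallah’s point about lossy compression can be made concrete with a toy analogy (this illustrates information loss in general, not how an LLM actually stores its training data): compress a sentence down to word counts, and the original word order becomes unrecoverable, so any “decompression” has to guess at details the compressed form no longer contains.

```python
from collections import Counter

original = "the cat chased the dog and the dog chased the cat"

# Lossy "compression": keep only how often each word appears.
compressed = Counter(original.split())

# "Decompression" can emit the right words with the right counts,
# but the original ordering is gone, so the result is a guess.
reconstructed = " ".join(w for w, n in compressed.items() for _ in range(n))

print(sorted(reconstructed.split()) == sorted(original.split()))  # True
print(reconstructed == original)  # False: the ordering was lost
```

The gap between those two checks is the analogue of a hallucination: the words are all plausible, but the reconstructed detail was never in the compressed representation.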

Awadallah compared it to going to school: After a while, we don’t remember all we learned and have a tendency to make up information if we can’t recall something.

Similarly, when a chatbot summarizes a news article incorrectly, it isn’t repeating false information from other parts of the internet; it just got its summary wrong when filling in the gaps.

Awadallah added that ChatGPT has the lowest hallucination rate because it’s been on the market longer than other LLMs, getting millions of free sessions with users around the world that let its creators fine-tune the model.

Elon Musk’s company xAI has been developing the “Grok” chatbot for four months.
Illustration: Dado Ruvic (Reuters)

The business purpose of chatbots

“Humor hides the deficiency of the model,” Awadallah said when asked about Grok.

In line with Ahmad’s theory on why Google’s chatbot hallucinates so much, making a sarcastic, witty bot could lead to the same issues. But it may well serve its purpose.

“Elon’s model has been created for consumers—50% for entertainment,” Ahmad said.

Just look at the example Musk shared on X: an ostensibly “humorous” recipe for making cocaine, with a disclaimer that it’s for “educational purposes.”

Because Grok can’t seem to answer a question without baking in a joke, Ahmad said it’s fine for the chatbot to hallucinate away. But that clashes with Musk’s own claims about Grok’s reliability: he posted that because Grok pulls from X in real time, its information is more current than rival ChatGPT’s.

That said, given that X is the biggest source of fake news and disinformation, according to the European Union, it stands to reason that many of Grok’s answers will be unreliable. User feedback will improve the chatbot’s responses over time, but fact-checking and curbing widespread misinformation aren’t exactly Musk’s forte.

Hallucination rates matter most when a chatbot is used in regulated industries—for example, to help lawyers prepare briefs or accountants write reports—Ahmad said.

That’s what Vectara aims to address by providing metrics on chatbots’ accuracy, much like those mandated by US president Joe Biden’s executive order on AI regulation.

Both Ahmad and Awadallah are optimistic that chatbots could be trusted within a year or two, with hallucination rates approaching zero and the accuracy of the content they produce becoming easier to measure.

What does Grok’s name mean?

The chatbot Grok is supposedly modeled after The Hitchhiker’s Guide to the Galaxy, a comic science fiction franchise created by Douglas Adams. But its name didn’t come from Adams’ work; it was coined by sci-fi author Robert A. Heinlein, in his 1961 novel Stranger in a Strange Land.

To grok means to understand something profoundly or intuitively, according to The Merriam-Webster Dictionary.