Microsoft is fixing a simple reason why voice assistants are so bad

As more of us ask the virtual personal assistants on our phones what the weather will be tomorrow or how tall the Chrysler Building is (1,046 feet), it’s clear that voice-activated software is currently useful mainly for answering simple fact-based questions.

By Dave Gershgorn | 3 min read | Updated July 20, 2022

Microsoft, whose virtual assistant Cortana operates on Windows and as a cross-platform mobile app, is trying to smarten up our dumb bots by making a new dataset available to the public, letting future AI analyze how humans would do the same tasks that virtual assistants handle every day. The dataset (pdf) consists of 22 pairs of humans talking to each other: one person without internet access asking for information, and another taking those questions and trying to come up with a good response.

The Microsoft dataset focuses specifically on information retrieval: How does a human know when they’ve been given a good answer? What does a natural human exchange sound like? How important is context? It also differs from other conversational datasets (like one from the makers of Siri, which catalogued transcripts between travel agents and prospective vacationers), because it includes data about the participants’ levels of stress, emotion, engagement, and satisfaction. It contains transcripts, audio, and video alongside those measurements. Microsoft did not mention expanding the dataset over time.

Here’s an example of one of the questions:

Imagine that you recently began suffering from migraines. You heard about two possible treatments for migraine headaches, beta-blockers and/or calcium channel blockers, and you decided to do some research about them. At the same time, you want to explore whether there are other options for treating migraines without taking medicines, such as diet and exercise.

The researchers classified this as a low-difficulty, high-complexity problem: one where the information might be easily available, but there are many factors to consider. The two humans then talk the problem through, gathering more information until the information seeker is satisfied with the result.

The trick now for AI researchers is to design systems that can make use of this complex data. Today, virtual assistants like Alexa analyze what a person is saying, but not how they say it or whether the command has multiple parts. The result is a glorified Google search, a consequence of the simple datasets these systems have learned from. Simple data teaches machines simplistic ideas.

But now, future algorithms can be given a much better idea of how two humans would accomplish an information-driven goal together, since they will have data showing what kind of information makes a human feel satisfied with an answer, and what kind doesn’t. So next time you get frustrated with Siri or Alexa, know that soon they might be able to tell.