Science fiction tells us that in the future we’ll be able to hold complex conversations with our starship’s computer as easily as we would with our robot crew-mates or even other humans. But before we reach that point, we’ll have to make do by chatting with fridges and vacuum cleaners.
This week, several companies announced developer programs to make it easier to create software that can understand what people are saying, and respond accordingly. Microsoft, Nuance—which supposedly provides one of the tools behind Apple’s Siri—and SoundHound, the company behind the eponymous music recognition app, unveiled ways to make every app and smart device easier to interact with.
Everything these days seems to be connected to the internet (though probably not everything should be), and one of the most difficult parts of using these devices is interaction. Tapping away on tiny screens or third-party mobile apps is a lot harder than just being able to talk to something. But developing software that can understand people, let alone derive any meaning from their words and respond with something useful, is not simple. Siri seems to mishear just about every other phrase, and other virtual assistants are not much better.
The more developers are able to incorporate voice recognition into their software, the more data those systems will be able to use to improve their understanding of language. And so it might not be too long before you’re able to have an intelligent conversation with your toaster about how browned you want your bread tomorrow morning.
Listen and learn
Nuance’s new developer platform, called Mix, launched today (Dec. 15). It’s designed to be a simple toolkit that developers can use to set up a voice recognition and understanding app in minutes. Nuance showed Quartz a program it created to talk to a virtual robot that’s been tasked with finding a cat. You can ask it to look under the couch, and follow up with, “What about behind the curtains?” Nuance’s language understanding API allows the program to derive context for the follow-up question without needing to be told again about the missing cat.
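To make the follow-up idea concrete, here is a minimal sketch of how a program might carry context from one question to the next. It is not Nuance’s API, whose details were not shown; the `Context` class, the `interpret` function, and the `LOCATION` pattern are all hypothetical names invented for illustration, and a real system would rely on far more than keyword matching.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern for a location phrase such as "under the couch".
LOCATION = re.compile(r"\b(under|behind|inside|on|in)\s+the\s+(\w+)", re.IGNORECASE)


@dataclass
class Context:
    """What the program remembers between turns (illustrative only)."""
    intent: Optional[str] = None   # e.g. "search"
    target: Optional[str] = None   # e.g. "cat"


def interpret(utterance: str, ctx: Context) -> dict:
    """Turn an utterance into a command, filling gaps from earlier turns."""
    text = utterance.lower()
    if "look" in text or "search" in text:
        ctx.intent = "search"      # an explicit request sets the intent
    # A follow-up like "What about ...?" names no intent, so ctx.intent is reused.
    match = LOCATION.search(text)
    location = f"{match.group(1)} the {match.group(2)}" if match else None
    return {"intent": ctx.intent, "target": ctx.target, "location": location}


ctx = Context(target="cat")        # the robot has already been told about the cat
first = interpret("Look under the couch", ctx)
follow_up = interpret("What about behind the curtains?", ctx)
print(first)
print(follow_up)
```

The second question never mentions looking or the cat, yet the command it produces still carries both, because they are remembered from the earlier turn rather than restated.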
The idea is to allow developers to create something that more closely reflects how people talk to one another, and apply that to smart devices, like a Nest thermostat, or even emotional household robots. Nuance wouldn’t reveal pricing details, but said there would be a free tier for developers to test it out.
SoundHound’s API platform, which it’s calling Houndify, has a similar goal. “The demand for a fast and accurate, natural, voice-enabled, and conversational interface to the exploding number of connected devices is upon us,” Keyvan Mohajer, SoundHound’s CEO, said in a release. The company is opening up access to a set of APIs designed to let developers add voice recognition, understanding, and follow-up questions to apps.
The company is also partnering with others to provide additional data that developers can use. They’ll be able to pull in hotel and flight booking information from Expedia, weather data from AccuWeather, sports scores, exchange rates, stock prices, and other things people are generally interested in. Soon, you might be able to ask every device in your home the sorts of questions you might have for Amazon Echo.
SoundHound recently released its own impressive virtual assistant app, Hound, that uses many of the APIs it’s making available for developers. You can ask it involved, multipart questions about mortgage pricing, or find exactly the right hotel you’re after for your next trip. Instead of having to fiddle through online menus or in-app buttons, you just talk to Hound as you would a person.
Microsoft also released new tools as part of Project Oxford, its artificial intelligence research division. Much as IBM has done with Watson, Microsoft has built out a series of APIs to help developers incorporate machine learning and voice recognition into apps. Yesterday, it announced new APIs that could overcome one of the biggest issues with voice recognition apps: that they never seem to understand us. Its two new recognition APIs aim to determine who is talking, by learning the characteristics of a person’s voice, and to identify voices even in noisy environments.
2015 was not the best year for robot intelligence, but it did point the way to far more intelligent cars, robots, and virtual assistants in the near future. When more developers are able to get their hands on APIs like these, and people are able to feed more data back to them, the systems’ ability to understand us will improve. IBM, for example, just integrated its Watson AI system into the Japanese robot Pepper, with the aim of using it as a salesperson once it’s been trained to understand the questions it will likely hear.
And in the future, let’s hope that when we ask our robots to open the pod bay doors on our spaceships, they cooperate.