In the early days of virtual personal assistants, the goal was to create a multipurpose digital buddy—always there, ready to take on any task. Now, tech companies are realizing that doing it all is too much, and instead doubling down on what they know best.
For Google, that means allowing Google Assistant to take over things you might ask a real personal assistant to do if you were too busy with work. At its I/O developer conference this week, the company outlined plans to build up Google Assistant’s ability to do the bulk of the work of renting a car, and last year demonstrated having it make automated calls on users’ behalf. Meanwhile, at its Build conference in Seattle this week, Microsoft made clear that it’s approaching the assistant role from another angle. Since the company has a deep understanding of how organizations work, Microsoft is focusing on managing your workday with voice, rearranging meetings and turning the dials on the behemoth of bureaucracy in concert with your phone.
“The thing that excites me is to take a step back and think about what is the promise of natural-language systems,” says Dan Klein, a technical fellow at Microsoft who co-founded Semantic Machines, a natural-language processing company Microsoft acquired last year. “It’s not being able to push a button with your voice. That’s cool, but the true promise of a natural-language system is to be able to do a wide range of things with a uniform interface that’s natural to you, that’s quicker than the alternative.”
If Microsoft or Google can live up to that promise, their virtual assistants won’t just be trendy add-ons for users who want to set alarms or move calendar invites by talking out loud. Voice is the next major platform, and being first to it is an opportunity to make the category as popular as Apple made touchscreens. To dominate even one aspect of voice technology is to tap into the next iteration of how humans use computers.
Just as the smartphone made touch a popular—if not the most popular—way to interact with software, big tech companies see voice as a similar revolution. It has the potential to be faster and more intuitive, and is also a convenient alternative to spending our lives looking at screens. With minimal setup, you can talk to your phone or laptop as you would a person, and blissfully ignore that you’re replacing one computer with another.
But a true do-it-all virtual assistant is difficult because AI today only functions in narrow domains. You might be able to teach it to answer questions that relate to coffee by gathering data on coffee and training an algorithm to pull answers out of that data, but to do that for everything you’d have to compile data on every known subject, verify that all of it is true, and update that data with every new piece of knowledge. And that’s just for obtaining information, not including the computer science effort it takes to understand context or parse meaning within human conversation.
Because of those challenges, virtual assistants today are focusing on smaller tasks that tend to skew personal (ordering an Uber or making a restaurant reservation) or professional (“tell me what’s on my calendar”).
With Cortana, Microsoft is leaning hard into the latter, a mission made possible by its 2018 acquisition of Semantic Machines. During a Cortana demonstration for Quartz, Semantic co-founder Klein described the experience of using a virtual personal assistant today as a series of isolated sessions. You start a session by asking a question or making a command, and then that session ends. There are a few situations where you might be able to follow up with another question, but those interactions are “fragile,” he says, meaning secondary questions are typically limited. For instance, if a virtual assistant follows up with, “Did that answer your question?” and you say “No,” it just starts the session over again.
The upcoming Cortana tries to break the standard of short, isolated sessions. In the demo, Klein asks what his day looks like tomorrow, which Cortana answers by pulling up his calendar. He then asks where a lunch event is located, and Cortana pulls the information from an event invite and displays it. He asks what the weather is “there,” and Cortana pulls the weather forecast for the location of the event at the specific time of the event. He asks whether there’s outdoor seating, and Cortana looks online and determines there is not. In the middle of his line of questioning, Klein asks Cortana to make some time for him to run an errand after his last appointment. Then he asks Cortana to make an event after lunch, and invite “Andy” and Andy’s manager. Cortana figures out which Andy he means, finds Andy’s manager, and invites them both to the meeting.
Of course, this was a premeditated demonstration using a fake calendar—but it was real code. A Microsoft representative told Quartz the questions were unscripted, based on what Klein knew the system could do.
“I think that we can foundationally help people get time back to do what they want to do,” says Andrew Shuman, corporate vice president of Cortana engineering. “Such an enormous amount of their time is being spent in front of Microsoft services and products that we owe it to our customers to give them back time.”
Google is also working from its own trove of data, in its case emphasizing the “personal” aspect of the virtual personal assistant.
The company has made particular breakthroughs in its technology for voice, branded as Duplex. Last year it demonstrated the ability to call local businesses on a user’s behalf to find out information like store hours, and it can also book appointments and reservations. Earlier this week, the company announced new features for Google Assistant that make even more use of Google’s huge database of user information. Starting later this year, for example, Assistant will be able to reference the data it has from Gmail to automatically fill in the information required to book a car on a rental website.
It’s not hard to imagine the vast universe of other personal data that Google Assistant could tap into, since many people plan leisure activities and manage their whole lives on Google services.
This isn’t an AI breakthrough so much as it is a super-powered autofill, made possible by Google’s ability to understand its users’ personal lives in an increasingly intimate way. Google may have ambitions to be the do-it-all assistant, but those ambitions are stifled by both AI limitations and market realities. Google has a massive trove of personal connections, but its enterprise and business division is dwarfed by Microsoft’s.
Every voice competitor has struggled to gain traction building a one-stop assistant. Amazon, which created the smart-speaker business with its Alexa line of devices, has expanded the number of devices Alexa inhabits, bringing the virtual personal assistant to wall clocks and microwaves. But it hasn’t meaningfully changed the kinds of interactions users have with those devices, at least not beyond the natural differences between wall clocks and microwaves. Apple’s Siri, the original mass-market virtual assistant, can call an Uber or order food on Caviar, but only because the company gave developers the ability to hook their software into Siri. The company hasn’t done much else to develop Siri’s proprietary technology in the past five years.
For now, these companies seem resigned to their inability to create a dominant assistant that people will actually use for work and play. Even Microsoft started a partnership with Alexa so that one assistant could summon another for Cortana users’ e-commerce needs. But a piece of the voice pie is better than no pie at all, and tech giants remain hopeful that the blurring of work and life will make any virtual assistant valuable in both realms. “It’s important to recognize that these kind of work problems are universal problems,” Shuman says. “It’s not like I go home and I don’t have to collaborate or schedule things or manage tasks and to-do lists.”