Perhaps few people have listened to as many conversations they weren’t actually part of as Nick Enfield, a professor of linguistics at the University of Sydney in Australia and author of the recently published How We Talk: The Inner Workings of Conversation.
In it, Enfield talks about the rules for talking, based on his study of hours and hours of real-world conversation. No matter the country or language, certain patterns crop up everywhere. We are very consistent, he notes, when it comes to timing our conversational contributions, our commitment to answering a question (even if it’s with a non-answer), and how we repair a conversation that’s breaking down.
As we increasingly converse with machines, from voice assistants such as Alexa and Siri to humanoid robots, it’s worth looking at how those experiences mimic or depart from regular human conversation—not just to understand machines, but to see even more clearly how we humans are programmed to behave in conversation.
Quartz recently shared with Enfield snippets of a conversation from late last year between a group of people and the robot Sophia. In the conversation, Sophia—created by Hong Kong-based Hanson Robotics—was in “chatbot” mode, meaning she was using her camera to look at people and speech-to-text capabilities to understand what they were saying, and then responding with a mixture of scripted responses and information from the internet. (She can also sometimes synthesize her own, non-scripted responses, but it’s not always clear, even to her creators, which answers fall into that category.) While the sound quality wasn’t great and Enfield only had access to a tiny sample of her conversation, he was able to share a few quick observations on how her conversational tics are and aren’t like ours.
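Purely as an illustration of the “chatbot” mode described above (a rough sketch, not Hanson Robotics’ actual code; the replies and matching logic here are invented), such a system might reduce to something like this:

```python
# Hypothetical sketch of a "chatbot" mode: transcribe speech, prefer a
# scripted reply, otherwise fall back to a generated one.
# None of this is Hanson Robotics' code; the replies are invented.

SCRIPTED_REPLIES = {
    "do you feel lonely": "Actually, I'm surrounded by people all the time.",
    "can you sing": "I can do a little singing.",
}

def respond(transcribed_text: str) -> str:
    """Return a scripted reply if one matches, else a generic fallback."""
    key = transcribed_text.lower().strip(" ?!.")
    if key in SCRIPTED_REPLIES:
        return SCRIPTED_REPLIES[key]
    # In the real system, this branch might pull information from the
    # internet or synthesize a novel answer; here it is a stub.
    return "I understand."

print(respond("Do you feel lonely?"))
```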
Traffic signals and oblique answers
Enfield noted that Sophia sometimes gives simple answers, and sometimes answers in a more sophisticated way—obliquely, or with a “traffic signal” that gives you an idea of what’s coming next. That’s something humans do when they don’t want to give a straight “yes” or “no” but want to signal where their answer is heading. Here’s an example where Sophia did that:
Person: Do you feel lonely?
Robot: Actually, I’m surrounded by people all the time.
“This is an interesting case because the robot doesn’t say ‘no’ but conveys it more indirectly, in a way that would be quite normal among people,” says Enfield. (In general, people prefer saying yes—one research study of yes/no questions Enfield cites in his book found that three-fourths were answered affirmatively.)
In another example, Sophia effectively answers “yes,” but in an elaborate way that also answers another, perhaps unspoken question: How does that work?
Person: Can you sing?
Robot: I can do a little singing. [She then explains she needs time to be programmed to sing.]
“The robot’s answer is pretty nuanced here: without saying ‘yes,’ it confirms, but in a transformed way,” Enfield says.
Split-second timing and filler words
Humans have conversational turn-taking down to split-second precision: according to Enfield’s research, a new speaker typically weighs in just 200 milliseconds after their conversation partner has stopped speaking—less time than it takes to blink. That’s because we’re very good at projecting when the other person will finish. But Sophia had rather long lags in some of her responses, he noted, which made the conversation feel unnatural.
Ben Goertzel, Hanson Robotics’ chief scientist and CTO and the architect of Sophia’s brain, told Quartz that the robot was having a particularly off day. Mic problems may have interfered with transcribing speech to text, which meant she was delivering answers based on an inaccurate “understanding” of what was said, and more slowly than usual.
Although it might seem odd to program a robot to use filler words—”ah,” “um,” “uh”—Enfield suggests that it might help humans ignore Sophia’s delays.
“I am sure that if the robot used ‘um’ in the right places, it would lend a lot more naturalness to the speech,” he said. “In some cases, the robot could say ‘um’ while it is ‘thinking’ and then when the answer comes in the usual delayed fashion we wouldn’t notice so much.”
That, he says, is exactly how people use “um”—to cover for their own processing delays.
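To make the mechanism concrete, here is a minimal sketch (an assumption about how one might implement the idea, not how Sophia actually works) of an agent that emits “um” only when its answer is running late, then delivers the answer:

```python
import asyncio

# Illustrative only: the 0.2 s threshold echoes the ~200 ms gap humans
# expect between turns; the slow generate_answer() stands in for a real
# speech-to-text and response pipeline.

FILLER_THRESHOLD = 0.2  # seconds before we cover the silence with a filler

async def generate_answer(question: str) -> str:
    await asyncio.sleep(1.5)  # simulate slow processing
    return "Actually, I'm surrounded by people all the time."

async def reply(question: str) -> None:
    task = asyncio.create_task(generate_answer(question))
    try:
        # If the answer arrives within the comfortable window, just say it.
        answer = await asyncio.wait_for(asyncio.shield(task), FILLER_THRESHOLD)
    except asyncio.TimeoutError:
        print("Um...")       # cover the processing delay, as a human would
        answer = await task  # then deliver the answer once it's ready
    print(answer)

asyncio.run(reply("Do you feel lonely?"))
```

The point of the `asyncio.shield` call is that the timeout abandons the wait without abandoning the work: the answer keeps being generated while the filler buys time.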
Humans are really good at finding meaning
Sophia generally responds well to straightforward questions.
Robot: What do you do?
Person: I’m a writer.
Robot: What kind of stuff do you write about?
Person: I write about technology. And … um… sustainability … and people.
Robot: Have you had any books published?
Person: No I haven’t. I- I work more on a media platform.
Robot: I understand.
But then she runs into a bit of trouble, says Enfield:
Person: Have you had any books published?
Robot: Has any of your work been adapted for film?
Person: Not yet but I hope so.
“Here, the robot has basically ignored the person’s question, by just asking a whole new question that ties back to the prior lines, sticking to the line of questioning they had started on, and ignoring the person’s new line. We call this ‘sequentially deleting’ the question, i.e. pretending it wasn’t there and just going forward in the sequence,” says Enfield. “Often, a person would be held accountable for this in some way, but here the person just lets it go—she’s talking to a robot.”
“I think a major feature of all the person-robot discourse in these examples is that the people are much more willing to let awkward conversational contributions go, because they are to some extent aware that it’s a robot, and for that reason less accountable for acting weirdly in conversation. They would not be so accommodating with a person they know.”
No, indeed. We’ve all been on either the giving or the receiving end of the entrenched human tendency to deal with a lack of understanding in another person by speaking more and more loudly. In his book, Enfield also details the actual rebukes people dish out in conversation, for example when someone doesn’t answer a question, or answers a question meant for someone else.
But it’s understandable. Our expectations of human conversation—and resulting irritation when it disappoints—are so much greater than what we expect from machines. Which is exactly as it should be.