How Amazon’s new health tracker will use AI to monitor your tone of voice

Humans have been wrestling with how to define happiness for eons. Can Amazon’s AI crack that nut?

Amazon may not have unlocked the secret to happiness. But with the announcement of a new voice monitoring tool called Tone, the company promises that it knows what happiness sounds like. And that—with a new gadget and a little tracking—you, too, can sound happy.

Tone will be a feature on Amazon’s new wearable health tracker, dubbed Halo. Users can opt in to let it sample snippets of their speech throughout the day, or turn it on for up to 30 minutes at a time to get a detailed report on how they sounded in a particular conversation. Powered by AI algorithms designed to detect the “positivity” and “energy” in human voices, the tool purports to offer users feedback on their tone so they can improve their communication skills and relationships.

Of course, it’s hard to define fuzzy traits like positivity, and it’s an even more Herculean task to train an AI model to objectively quantify and measure them. In a blog post, Amazon simply says that “positivity” measures how happy or sad a voice sounds. But humanity (and the field of positive psychology) has been wrestling with how to define happiness for eons.

“It’s hard for me to imagine that there could be a single objective measure,” said Jim Allen, an associate professor of psychology at the State University of New York at Geneseo who writes and teaches about the psychology of happiness. Our perception of what a happy voice sounds like, he notes, varies depending on culture, gender, ethnicity, and other personal factors.

An Amazon spokesperson said that the developers had accounted for these differences by drawing on vocal samples from tens of thousands of voices from across US regions and demographic groups. A team of Amazon employees then listened to the recordings and rated the voices as happy or sad to determine “positivity” and tired or excited to measure “energy.” The model associated those emotional ratings with vocal qualities like pitch, intensity, tempo, and rhythm, which the AI uses to label users’ speech.

Training sets, however, are highly susceptible to bias from the humans who build them, as researchers have extensively documented in fields like facial recognition. That makes vetting the data, and the people who label it, very important. Amazon declined to offer any detail about the demographic breakdown of its vocal samples, or the team whose perceptions of positivity and energy form the basis for the model. “Throughout product development, we’ve focused on ensuring the data we use to train and evaluate our models accounts for all demographic groups,” a spokesperson said in an email.

In particular contexts, Allen said some version of a tool like Tone could work well. “In the hands of a skilled counselor giving feedback to a client about how they come across to other people, it could be really helpful,” he said. But, he noted, constantly monitoring yourself for signs of happiness—or worse, projecting a positivity you do not feel—has been shown to make people less happy.

Pattie Maes, an MIT professor who studies wearable technology designed to enhance people’s lives, pointed out that the AI would be more likely to return meaningful results if it didn’t try to treat happiness as a universal truth. “People have different speaking styles,” she said in an email. “I believe a personalized AI model trained on an individual’s own data would perform better.” (While Tone learns to pick out a user’s voice in a conversation, it does not calibrate its ratings to that user’s emotional baseline.)

But these approaches to boosting the model’s validity are not compatible with mass consumer tech. In its announcement blog post, Amazon medical officer Maulik Majmudar describes a gadget that comes out of a box ready to coax users into better communication. He writes about the ease with which his colleagues can turn on Tone and rehearse for a big presentation at work. Majmudar says he switches the system on before talking to his children, to make sure he’s not taking work stress out on his family.

It’s an intriguing vision for an AI-enabled future. But it might not be the one we live in right now.