Body language and facial expressions have always been thought of as the prime channels for conveying emotion. But over the years we've learned that there are countless ways to manipulate them and mask what we really feel. Everyone has pulled a smile when they didn't feel like it, or acted like they were having fun when they actually just wanted to go home. It's almost instinctive.
But what about voice?
It turns out it's much tougher to disguise your true emotions when you speak. Not only do you have to weigh the words you're going to say, but you also have to control your facial expressions, body language, and several vocal cues, like pitch, cadence, speed, and volume. That's a lot to juggle at the same time, which makes your voice the most accurate medium for listeners to perceive your emotional state.
On an episode of Inside VOICE, we spoke with Rana Gujral, the CEO of a company that has been researching the implications of voice interactions for over 20 years. His company, Behavioral Signals, has made a name for itself by delivering award-winning tech that enables emotionally intelligent conversations with AI—a direct result of their thorough understanding of voice as a powerful channel for conveying emotional cues.
In his episode, he shared some of his insights on the role of emotion in voice and why creators of voice-enabled experiences need to start taking it seriously. Here are the main takeaways.
Perceiving emotion through voice
Not long ago, Michael Kraus—a psychologist at the Yale School of Management—decided to investigate how accurately people can gauge emotion solely from someone's voice. In his study, Kraus rounded up three groups of people and showed each group a different version of the same video.
The video showed a group of friends teasing each other over a nickname. The first group watched and listened to the video, the second group only heard the audio, and the third group only saw the video without sound. The participants were then asked to rate, on a scale of 0 to 8, the emotions they picked up from the friends.
The group that only heard the video ended up giving the most accurate estimates.
Next, researchers presented the participants with a digital voice reciting the friends’ interactions from the video. The theory was that if people were deducing emotions from the words being used, then they'd perform just as well at gleaning emotion from this digital voice. In reality, the participants did terribly.
“It’s really how you speak—not just what you say—that matters for conveying emotion.” —Michael Kraus.
But what does this mean for voice tech?
With this interesting tidbit in mind, we asked Rana whether it's necessary for voice-enabled tech—like the kind in our phones, laptops, cars, and those sitting on our kitchen counters—to be emotionally intelligent, too.
To this, Rana replied that there are two parts to that question: whether machines can truly be emotionally intelligent, and whether they even should be.
For the first part, he notes that machines can certainly be smart without being emotionally savvy. AI has consistently proven itself to be better than humans at many things, like processing vast amounts of data in mere seconds. But as we interact with machines on a more regular basis, particularly voice-enabled ones that are meant to elicit natural conversations, these superhuman data processors are showing that they're painfully limited to clean-cut transactions.
"We need to give these machines the ability to be as good as humans when processing affect," Rana said, "so they can be more relatable and provide more engaging experiences with the fellow human."
As for whether it's even ethical for machines to be emotionally aware, Rana poses an interesting question: would you want to interact with a human who doesn't understand emotion? Technically, he suggests, that person would be classified as a psychopath. So do we want machines to be equally detached and emotionless? Rana argues that giving machines the ability to process emotion and give the impression of empathy would make user interactions much fairer.
The future of emotion in voice
Computers are becoming smarter by the minute, and Rana believes emotional intelligence is the key to going that extra mile.
"We're talking to machines, but it's a very one-sided interaction where we're giving commands," Rana explained. "We're not really having a dialogue, we're not really having a conversation. And that was the promise of these virtual assistants."
Rana adds that we haven't reached that point because of one missing piece: the ability of these assistants to accurately process our emotional state. If we could get responses from Alexa, Siri, or Cortana that change depending on how happy we sound, whether we're crying, which word we emphasize, or even how sarcastic we are, our interactions would feel a lot more human.
For Rana, this kind of emotional understanding is the next frontier of voice technology.
Join our virtual events to learn more
Designing voice with human emotion in mind is no easy feat, but there are always generous entrepreneurs like Rana who are more than happy to share their expertise with those hoping to take their voice tech up a notch. At VOICE, we have two exciting opportunities to surround yourself with the people who know best:
VOICE Talks—a free monthly livestream presented by Google Assistant that features different industry experts in each episode to give you the inside scoop on current trends and opportunities.
VOICE Global—a free virtual voice tech conference featuring pioneering thinkers, startups, and everyone in between for a full 24 hours on June 9th. Jump online at any time to learn from your favorite companies and hop into private chats with your industry heroes.
Get involved today and prepare your notebooks for countless free insights that your business can use to create and market exceptional voice-enabled experiences.