Humans are emotionally complex. Not just in how we feel, but in how we convey those feelings. What you say and what you do communicate only a fraction of your emotions. Thousands of other cues in body language, facial expressions, and especially in the tone of your voice can paint a vivid picture of how you feel at any particular moment.
AI systems are now being built to recognize human emotion and respond in kind. Humans often try to "hide" or skew their emotions. In his book What Every Body Is Saying, Joe Navarro explains that the face is the least truthful part of our body when it comes to emotional honesty. We now know that voice is much more difficult to fake than facial expressions, and that both are difficult to fake for long periods of time.
It takes a conscious, concerted effort to change your voice and facial expressions, so they don’t match how you actually feel. When it comes to voice, many of these nuances are noticeable in conversation, however subtle they might be.
The human ability to capture subtle nuances in speech
Humans are capable of capturing these small nuances in voice and, in many cases, of intentionally encoding them into their own speech. The research around this is constantly developing. Until recently, facial expressions and body language were identified as the most important elements of non-linguistic communication—the subvocal cues that could define intent in a conversation.
But a recent study by Michael Kraus (among others) identifies our sense of hearing as being more acute at detecting emotion in a conversation. His study showed a higher degree of accuracy in identifying emotions, not just when hearing a voice vs. seeing facial expressions, but also when compared to both hearing and seeing facial expressions. When isolated, a voice is loaded with information that the human brain is particularly good at deciphering.
But it goes further than dissecting and understanding someone's base emotional state from their speech. As discussed in a recent article from the Berkeley Greater Good Science Center, research by Emiliana Simon-Thomas and Dacher Keltner shows that humans can capture small nuances in speech, distinguishing between sadness, anger, repulsion, and exhaustion, for example. Many of these cues are language-independent: people can determine emotional state even when they are not fluent in the language being spoken.
Empathy in an increasingly digital world
Technology now allows us to largely abandon face-to-face conversations in our daily lives. People spend an average of more than two hours per day on social media, use email far more than phone calls, and have shifted even quick in-office conversations to communal platforms like Slack.
When so much of how we interact is tied up in our connection through speech, what impact does this digital transition have on our ability to feel empathy and truly connect with one another? Research on this is still developing, but it is clear that most emotional expressiveness is lost in text (and is only partially recovered through emoticons in texting or emailing). Without vocal cues, the emotional and human connection is poorer: a lot gets lost in text messages and emails.
Our own emotions are driven not only by what is being said but by how it is being said, more so than we previously realized. A recent study by Jean-Julien Aucouturier at CNRS in France asked people to read and record a short, innocuous story. Their voices were then altered, and when the recordings were played back, many participants felt different based on what they heard: if their voice was sped up and the pitch raised, they felt more excited; if it was lowered with pauses added, they felt a bit unsure of themselves.
It’s an interesting experiment that highlights the deep emotional impact different speaking styles and different voices have on us. So, when we don’t communicate via voice, it raises several questions about how effectively we are connecting with others.
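To make the "sped up and pitch raised" manipulation concrete, here is a minimal sketch of the simplest version of that effect: naive resampling, which shortens a clip and raises its pitch by the same factor, like playing a tape faster. (The actual study used more sophisticated real-time processing; this toy uses a synthetic 220 Hz tone as a stand-in for a voiced sound.)

```python
import numpy as np

SR = 22050  # sample rate in Hz

def speed_up(signal: np.ndarray, rate: float) -> np.ndarray:
    """Naive resampling: stepping through the samples `rate` times faster
    shortens the clip and raises its pitch by the same factor."""
    idx = np.arange(0, len(signal), rate)
    return np.interp(idx, np.arange(len(signal)), signal)

def dominant_freq(signal: np.ndarray, sr: int = SR) -> float:
    """Rough pitch estimate from the zero-crossing count."""
    crossings = np.sum(np.diff(np.signbit(signal)))
    return crossings * sr / (2 * len(signal))

t = np.arange(SR) / SR
voice = np.sin(2 * np.pi * 220 * t)   # stand-in for a 220 Hz voiced sound
faster = speed_up(voice, 1.25)        # 25% faster playback

print(round(dominant_freq(voice)))    # ~220 Hz before
print(round(dominant_freq(faster)))   # ~275 Hz after: pitch rises with speed
```

Raising pitch independently of speed (or vice versa), as real voice-transformation tools do, requires a phase vocoder or similar technique; the coupling shown here is the basic tape-speed effect.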
What this means for voice assistants and AI
The words in a conversation are only the tip of the iceberg—a small fraction of the conversation we are really having. That’s where Emotion AI can play an important role, bridging the gap between what is expressed overtly and what is hiding just under the surface in a conversation.
Today’s Emotion AI can automate the analysis of a host of subjective signals, detecting not just that someone is frustrated but how frustrated they are, on a spectrum drawn from millions of data points in conversations it has analyzed. By combining content analysis with emotional analysis, it can also detect the source, or root cause, of that frustration.
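As a toy illustration of combining an acoustic frustration score with content analysis, here is a minimal sketch. Everything in it—the function names, the 0–1 score, the thresholds, and the keyword lists—is hypothetical, standing in for the far richer models a real Emotion AI system would use.

```python
# Hypothetical keyword lists standing in for real content analysis.
CAUSE_KEYWORDS = {
    "billing": ["charge", "invoice", "refund"],
    "outage": ["down", "offline", "not working"],
    "wait time": ["hold", "waiting", "queue"],
}

def frustration_level(score: float) -> str:
    """Map a hypothetical 0-1 acoustic score onto a coarse spectrum."""
    if score < 0.3:
        return "calm"
    if score < 0.7:
        return "annoyed"
    return "frustrated"

def root_cause(transcript: str) -> str:
    """Naive keyword match standing in for real content analysis."""
    text = transcript.lower()
    for cause, words in CAUSE_KEYWORDS.items():
        if any(w in text for w in words):
            return cause
    return "unknown"

call = "I've been on hold forever and you charged me twice!"
print(frustration_level(0.85), "about", root_cause(call))
```

The point of the sketch is the shape of the pipeline—an acoustic channel rated on a spectrum, a content channel pointing at a cause—not the naive scoring itself.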
While humans detect these cues automatically, we are not always accurate in decoding their meaning, and we are notoriously bad at recognizing such signals in our own voices. With Emotion AI, we can improve the performance of customer service and sales teams, build more responsive personal voice assistants, and leverage unstructured data in new and exciting ways across industries—all made possible by recent advances in voice and conversational AI.
Alex Potamianos will be sharing his impressive expertise during his upcoming talk, "Virtual Assistants are both marvelous and god awful," at VOICE this July 23rd. Don't miss the chance to connect with Alex and discover the fascinating potential of Emotion AI in your industry!