Sentiment Analysis Moves into Voice Interactions
Sentiment analysis technology has been around for decades, with the earliest iterations centered on opinion polarity, gauging whether someone had a positive, negative, or neutral opinion about something.
Today, sentiment analysis is one of the most heavily researched and fastest-growing areas in computer science. The technology has moved from simple polarity detection to more nuanced emotion recognition, distinguishing, for example, between anger and grief rather than flagging both as merely negative.
The most significant advances, though, have moved the technology from what was primarily a text analysis tool to one that can uncover insights from voice interactions.
Voice sentiment analysis emerged in just the past few years with advances in speech recognition and machine learning. It can now analyze the emotional tone of spoken language through features like pitch, pace, and intonation, and even pick up on complex contextual nuances.
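To make those features concrete, here is a minimal sketch of how pitch, pace, and intonation cues might be pulled from a call recording. The library choice (the open-source librosa package) and the feature set are assumptions for illustration; this shows the kind of signal-level inputs such systems work from, not any particular vendor's pipeline.

```python
# Sketch: coarse prosodic features (pitch, pace, loudness) from one recording.
# Library choice (librosa) and feature set are illustrative assumptions.
import numpy as np
import librosa

def prosodic_features(wav_path: str) -> dict:
    y, sr = librosa.load(wav_path, sr=16000, mono=True)

    # Pitch: fundamental frequency via probabilistic YIN; unvoiced frames are NaN.
    f0, voiced, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr
    )

    # Loudness: frame-level root-mean-square energy.
    rms = librosa.feature.rms(y=y)[0]

    return {
        "pitch_mean_hz": float(np.nanmean(f0)),  # average pitch
        "pitch_std_hz": float(np.nanstd(f0)),    # intonation variability
        "voiced_ratio": float(np.mean(voiced)),  # rough proxy for pace
        "rms_mean": float(np.mean(rms)),         # average loudness
    }
```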
Voice sentiment analysis tools can also automatically extract the topics, subtopics, intents, sense of urgency, and trends from interactions between agents and customers. And with artificial intelligence, voice sentiment analysis tools can be more accurate and granular; cut through the subjectivity of human opinion; and perform in real time.
Voice sentiment analysis is now widely used in various applications, like customer service analysis, market research, and human-computer interaction, enabling systems to understand the emotional tone of customer conversations and respond accordingly.
“Sentiment is an important element of measuring the customer experience,” says Daniel Ziv, vice president of go-to-market strategy for artificial intelligence and analytics at Verint. “If you serve customers and they’re unhappy, sooner or later that’s going to affect you.”
“Taking action on it and improving customer sentiment is important,” Ziv says. “But first you have to measure it.”
That’s been the goal for a long time, but it’s only recently become a reality.
“Contact centers have wanted to understand customers and their sentiment for decades, but it was not until 10 or 15 years ago that we actually had the natural language processing models in place to properly transcribe calls and to understand fully not only what is being said but how it portrays the customer’s concerns or feelings,” says Robert Wakefield-Carl, senior director of innovation architects at TTEC Digital.
And the newer voice sentiment analysis tools have other benefits, according to Natalie Bidnick Andreas, a digital strategist and assistant professor in the department of communication studies at the University of Texas at Austin. “The latest strides in sentiment analysis with voice are opening up exciting new ways to understand emotions and attitudes. Traditionally, sentiment analysis has been rooted in text, but written words often miss the emotional nuance that spoken language carries so effortlessly. The way we say something—our tone, pitch, rhythm, and even the pauses we take—adds layers of meaning that text just can’t convey.”
Prior to the advent of AI, voice sentiment analytics was driven by relatively rigid manual data definitions and query definitions, says Brett Forman, head of the Enlighten AI line of business at NICE. If different customers used widely different terminology for the same sentiment or if a customer had difficulty articulating positive or negative sentiment, the sentiment scores could vary widely.
“Now we’re bringing in technologies that are enabling companies to build [sentiment models] using vast amounts of data,” Forman says. “It’s more system-driven and programmatic.”
More modern voice sentiment analysis tools also allow for greater customization and provide more granular insights than they could previously, Forman adds.
Post-call surveys capture only a small element of a customer’s sentiment, Ziv agrees. “There tends to be bias. People who are very happy might respond, and people who are angry might respond, but in the middle, the majority might not respond. They just don’t have time. Using voice is more objective.”
Voice analysis, especially of extensive conversations, enables companies to develop truer sentiment scores, and at a lower cost, because it avoids the additional effort involved in post-call surveys, Ziv says.
Recent advances in natural language processing and machine learning are also making it possible for sentiment analysis systems to pick up on vocal cues in real time, Andreas says. “This is a huge leap forward, especially for industries like customer service, healthcare, and marketing.”
“In customer service, for example, voice sentiment analysis can help detect when a customer’s frustration is building up, prompting an agent to step in before things escalate,” Andreas says. “In healthcare, it could be used to monitor a patient’s emotional state, providing early signs of distress or depression that might otherwise go unnoticed.”
Adoption Is Just Starting
Though the latest sentiment technology is far more advanced than technologies available just a few years ago, it is still in the early adopter stages, according to Ziv. “There’s extensive room for growth.”
Forman concurs. “While some organizations are already experimenting with real-time sentiment deployments, widespread adoption is still in its early stages. This lag is largely due to ongoing organizational concerns around AI governance and compliance in customer-facing scenarios, as well as the challenge of translating sentiment insights into clear, actionable strategies. Many businesses are still figuring out how to respond to sentiment data in a way that enhances, rather than complicates, the customer experience.”
To increase adoption, vendors of voice sentiment analysis tools have focused recently on quantifying the potential value of improving sentiment and experience, Forman says. “For a long time, we’ve had the ability to create sentiment metrics that can aggregate and measure the potential likelihood of a positive or negative post-contact survey.”
Vendors are also offering increased granularity in their measurements and analysis, according to Forman.
Until recently, customer sentiment was scored on a scale of -100 to +100, determined by the scores given for specific phrases that the natural language engine had predetermined to be good or bad, explains Wakefield-Carl. With a phrase like “I want to talk to your supervisor,” however, what comes before or after it can determine whether the sentiment is positive or negative.
This same principle is used to determine agent empathy—how the agent treats the customer. The combination of acoustic analysis and phrase matching helps by identifying changes in voice volume, pace, pitch, and other factors that would indicate a rise in conversation temperature from the agent or customer, Wakefield-Carl adds.
However, using these factors without showing trends over the call is unfair to individuals with strong accents or to nonnative speakers, Wakefield-Carl cautions. “Traditional sentiment/empathy analysis combined with acoustic markers will help to create an overall sentiment analysis that will be more in line with human evaluations of communication between agent and customer and help to train agents in the correct way to guide the calls to deliver a better outcome.”
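As a rough illustration of the phrase-plus-context scoring described above, the toy sketch below scores call segments on the -100 to +100 scale, lets nearby context shift a phrase's polarity, and reports a per-segment trend rather than a single number. Every phrase, weight, and threshold is invented for illustration; acoustic cues like those sketched earlier would be blended in alongside.

```python
# Toy sketch of phrase-based scoring with context adjustment, on the
# -100..+100 scale described above. All phrases and weights are invented.
PHRASE_SCORES = {
    "thank you so much": 40,
    "this is ridiculous": -60,
    "talk to your supervisor": -30,
}
CONTEXT_MODIFIERS = {"resolved": 30, "again": -20}  # nearby words shift polarity

def score_segment(text: str) -> int:
    text = text.lower()
    score = 0
    for phrase, base in PHRASE_SCORES.items():
        if phrase in text:
            # What comes before or after the phrase can soften or flip it.
            score += base + sum(
                mod for word, mod in CONTEXT_MODIFIERS.items() if word in text
            )
    return max(-100, min(100, score))

def call_trend(segments: list[str]) -> list[int]:
    """Score each segment so a rising or falling trend is visible, rather
    than judging the whole call (or the speaker's accent) by one number."""
    return [score_segment(s) for s in segments]
```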
It’s important to distinguish between sentiment analysis and emotional analysis, says Jean-Louis Quéguiner, CEO and cofounder of Gladia. Both can be text- or speech-based, but sentiment analysis only classifies information as positive, negative, or neutral, often failing to understand the emotional nuances of voice communication. For speech applications used in customer service, sales, quality monitoring, or medical settings, context is critical.
“This is where emotional analysis comes in, assessing more complex emotions and undertones by looking at factors such as cadence, speech rhythm, and vocal tone,” Quéguiner adds. “Both are required to maintain high levels of accuracy and quality. As a result, multimodal systems that integrate both sentiment and emotional analysis will be the driving force behind applications that need to extract rich insights from audio data in real time.”
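A minimal sketch of that multimodal idea might blend a text-based sentiment score with an acoustic emotion cue through simple late fusion. The inputs, ranges, and the 60/40 weighting below are assumptions for illustration, not a published method.

```python
# Sketch: late fusion of a text sentiment score with an acoustic emotion cue.
# Both inputs are assumed to be in [-1, 1]; the default weighting is invented.
def fuse(text_sentiment: float, acoustic_score: float,
         text_weight: float = 0.6) -> float:
    return text_weight * text_sentiment + (1 - text_weight) * acoustic_score

# Example: mildly negative words delivered with agitated, distressed prosody.
print(fuse(-0.2, -0.8))  # -0.44, more negative than the text alone suggests
```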
Other important recent advancements in the technology are multilingual and multichannel capabilities, according to Forman. “In the past, we’ve been forced to create separate metrics for different channels and for different languages because the ways that things are articulated are unique to those spaces.”
Artificial intelligence has enabled enterprises to consistently measure sentiment analysis across channels and languages, according to Forman. “That’s a big advancement,” he says. “Having multilingual capabilities makes it much easier to manage a business.”
And following other recent systems upgrades, voice sentiment analysis is now sophisticated enough to detect changes in sentiment during a single conversation, according to Forman and Ziv.
This in-call sentiment analysis enables companies to send agents different prompts as sentiment changes during the course of the call.
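In outline, such in-call tracking can be as simple as scoring each utterance as it is transcribed, keeping a short rolling window, and prompting the agent when the windowed average drops. The window size, threshold, and prompt wording in this sketch are illustrative assumptions.

```python
# Sketch: rolling-window sentiment tracking during a live call. Window size,
# threshold, and prompt wording are illustrative assumptions.
from collections import deque

class InCallSentimentMonitor:
    def __init__(self, window: int = 5, drop_threshold: float = -0.4):
        self.scores = deque(maxlen=window)  # most recent utterance scores
        self.drop_threshold = drop_threshold

    def update(self, utterance_score: float) -> str | None:
        """Feed one utterance-level score in [-1, 1] as it is transcribed;
        return a prompt for the agent if the rolling average falls too low."""
        self.scores.append(utterance_score)
        if sum(self.scores) / len(self.scores) < self.drop_threshold:
            return "Frustration rising: acknowledge the issue and offer options."
        return None
```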
Vendors have incorporated these advances into some of their most recent product introductions.
Amazon Web Services, for example, began offering businesses real-time conversational analytics for voice through Amazon Connect Contact Lens in August 2023. This enables contact center operators to receive transcription, sentiment analysis, supervisor alerts, and additional functionality while handling live customer calls. Most companies that have deployed Contact Lens’s conversational analytics use it to identify primary contact drivers, redact sensitive information, and natively classify their contacts within Amazon Connect. These functions help organizations enhance the consumer, agent, and supervisor experiences within their Amazon Connect contact centers.
Slightly less than a year ago, NICE launched a cloud-based speech analytics tool with advanced predictive analytics capabilities. The platform’s real-time insights are helping companies reduce call center response times and enhance customer experience.
In August, Verint introduced its latest AI-powered speech analytics platform with enhanced features, such as real-time customer sentiment analysis and automated call transcription.
And in October, Mitel announced the global availability of Mitel Interaction Recording (MIR) Insights AI, which provides recording summaries and sentiment detection with voice recording transcripts in more than 100 languages. It leverages a built-in generative AI engine backed by Microsoft Azure AI Services and OpenAI foundation models. Built directly into MIR’s wave bar, AI conversation summaries and action items let agents get to the point of a conversation in seconds and even ask case-specific follow-up questions of a genAI-powered chatbot.
Challenges in Sentiment Analysis Persist
Sentiment analysis with voice is still an evolving science. Understanding the subtle ways in which different people express emotions based on factors like cultural differences or regional speech patterns can be tricky, Andreas says.
“One of the challenges in calculating sentiment is perception,” Forman says. “If you listen to an interaction and somebody changes the way that they’re speaking, then it’s intuitive that there is a change in sentiment.”
The challenge is not so much in looking at a single interaction but in measuring, aggregating, and creating a metric for sentiment across hundreds or even thousands of interactions, Forman adds. People can speak differently in different parts of the country, but differences in other parts of the world can be even more pronounced. For example, sarcasm in the United States is expressed very differently than it is in the United Kingdom.
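One hypothetical way to keep regional expression styles from skewing an aggregate metric is to normalize each call's score against its own region's baseline before averaging across the fleet, as in this sketch. The scheme is an assumption for illustration, not any vendor's method.

```python
# Sketch: normalize each call's score against its own region's baseline so
# regional expression styles don't skew a fleet-wide average. Hypothetical.
from statistics import mean

def normalize_by_region(calls: list[dict]) -> list[dict]:
    """calls: [{"region": "US", "score": 12.0}, ...] on the -100..+100 scale."""
    by_region: dict[str, list[float]] = {}
    for call in calls:
        by_region.setdefault(call["region"], []).append(call["score"])
    baselines = {region: mean(scores) for region, scores in by_region.items()}
    # Express each call as a deviation from its regional norm.
    return [
        {**call, "adjusted": call["score"] - baselines[call["region"]]}
        for call in calls
    ]
```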
Audio fidelity is another challenge. Recording via a simple microphone typically yields a different sound quality than a recording using a more complex microphone or an array of several different microphones. Similarly, an incoming call from a smartphone will have a different sound quality than a call coming in from a landline or a computer. Those differences can affect the sentiment metrics.
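A common mitigation, sketched here again with librosa, is to normalize recordings to a single sample rate and loudness before feature extraction so that microphone and channel differences contribute less to the score. The target levels are illustrative assumptions.

```python
# Sketch: resample to one rate and scale to one loudness level before
# feature extraction, so microphone and channel differences matter less.
import numpy as np
import librosa

def normalize_audio(wav_path: str, target_sr: int = 16000,
                    target_rms: float = 0.05) -> np.ndarray:
    y, _ = librosa.load(wav_path, sr=target_sr, mono=True)  # fixed sample rate
    rms = float(np.sqrt(np.mean(y ** 2)))
    if rms > 0:
        y = y * (target_rms / rms)  # scale to a common loudness level
    return np.clip(y, -1.0, 1.0)   # guard against clipping after scaling
```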
“We take care to work with our clients to make sure they use metrics as accurately as possible,” Forman says. “It’s important for organizations to understand how sentiment is generated so that they don’t have misperceptions about what the data measures.”
Privacy concerns are another big issue, according to Andreas. “With companies now analyzing the emotional tone in our voices, there’s a fine line between improving services and infringing on personal space,” she says.
To highlight that, outdoor apparel retailer Patagonia is being sued for allegedly breaking California privacy law through its partnership with an artificial intelligence and data intelligence company; plaintiffs say the partnership led to their communications being intercepted, recorded, and analyzed by a third party without their permission.
The above challenges are why NICE champions purpose-built AI, Forman says. “This is AI that is specifically tailored to distinct industries, geographies, and domains, whether it’s financial services, U.K. regulatory requirements, or customer service operations, with governance frameworks in mind. This approach ensures the transparency, accountability, and adaptability needed for compliance. It also ensures responsible deployment, delivering reliable, effective solutions that maximize business value,” he says.
Experts expect that business value to continue to improve as voice sentiment analysis technology evolves and achieves more widespread adoption. In fact, recent Market Research Intellect data valued the global voice sentiment analytics market at around $3.6 billion in 2023, and the firm expects it to grow to approximately $10.7 billion by 2032, a compound annual growth rate of 12.6 percent.
However, concerns about accuracy, bias, and user privacy remain and are likely to stay with the industry for some time to come.
Phillip Britt is a freelance writer based in the Chicago area. He can be reached at spenterprises1@comcast.net.
5 Companies That Matter
- CallMiner, a software company that develops speech analytics and interaction analytics software.
- NICE, a provider of artificial intelligence-powered self-service and agent-assisted software for the contact center.
- Observe.AI, a provider of an artificial intelligence-powered conversational intelligence platform for contact centers with real-time agent guidance, coaching, post-interaction summaries, quality assurance, and advanced business analytics.
- Verint, a customer service automation company.
- Xdroid, provider of a platform that delivers artificial intelligence-driven insights, such as Net Promoter Score prediction, sentiment analysis, and agent action summaries.