Speech Analytics Expands Beyond Voice
For decades, contact centers have used speech analytics to identify spoken words or phrases in audio. The technology began as a post-processing step on recorded speech, but recent advances have added real-time capabilities that give contact center leaders insights into live conversations.
Speech analytics is a rapidly growing market. Verified Market Research projects that worldwide revenue, which surpasses $2.3 billion today, will reach $11.9 billion by 2031, a compound annual growth rate of 25.5 percent. That growth has also been fueled by other advances: the ability to classify calls, spot trends, find the root causes of common problems, gauge the emotional state of conversation participants, and even detect subtle changes in tone, pitch, and other voice qualities that might indicate the onset of certain medical conditions, like dementia, depression, or concussions.
Now, as digital channels have taken over much of the customer interaction that once ran through voice, speech analytics is undergoing perhaps its most radical change ever. Artificial intelligence is now a central component, enabling faster insights from a much larger pool of information that includes not just voice but also text, images, video, and more.
“Voice is still very important and in many cases is still the most-used channel for items other than self-service,” says Daniel Ziv, vice president of AI and analytics go-to-market strategy at Verint. “Once you start talking about assisted service, the use of voice has dropped off, as email, chat, and social channels have grown. We’re a market leader in speech analytics, but we’ve definitely expanded into what we call interaction analytics, which analyzes not only the voice but also any contextual source.”
“Contact centers can no longer rely on phone conversations alone to serve customers,” agrees Dave Hoekstra, a product evangelist at Calabrio. “They must leverage data from multiple digital channels, such as chatbots, voicebots, and more, to ensure a smooth customer experience.”
These expanded analytics can also identify sentiment, channel categories, and the drivers of both customer sentiment (positive and negative) and of interaction success, according to Ziv. “You can start mapping out which channels are most effective; the agents who are the most effective in each channel; and the agents who are the best with multichannel support. This allows you to adjust and manage the different channels and the different insights much more effectively.”
“Speech and related analytics in call centers have advanced significantly, enhancing operational efficiency and customer experience,” agrees Bobby Hakimi, cofounder and chief product officer of Convoso, a provider of cloud contact center software. “With growing use cases, modern systems now perform real-time sentiment analysis, identifying customer emotions to tailor responses dynamically. For instance, sentiment analysis tools can detect frustration in a caller’s voice, prompting the agent to adopt a more empathetic approach, thus improving customer satisfaction.”
GenAI Generates Buzz
The introduction of generative artificial intelligence nearly two years ago had a dramatic impact on virtually all business systems, and its influence on contact center analytics has been nothing short of transformative.
With the incorporation of generative AI, expanded speech analytics now enables more thorough insights into customer sentiment, dialect variations, environmental conditions, and other elements of customer interactions, according to Claudio Rodrigues, chief product officer of Omilia, a conversational intelligence technology provider. “With this breadth of outputs, large language models are exceptionally well-equipped to offer profound insights into various aspects of the data.”
Generative AI from companies like Google and Meta uses transcriptions to give contact centers the full context of conversations, providing concise summaries, wrap-up suggestions, reasons for the call, the outcome of the call, and even the reasons behind sentiment or empathy scores, Rodrigues says. “We are also seeing an uptick in agent guidance through real-time topic spotting that can provide procedures, suggestions, and input to the agent while they are speaking or chatting with the customer to help steer the conversation to a more positive outcome.”
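In its simplest form, real-time topic spotting can be pictured as checking each new fragment of a live transcript against a list of topics, each paired with guidance for the agent. The sketch below is illustrative only: the topic names, trigger phrases, and guidance strings are hypothetical, and plain substring matching stands in for the more holistic, model-based analysis vendors describe.

```python
from collections.abc import Iterable, Iterator

# Hypothetical topics: each maps trigger phrases to guidance shown to the agent.
TOPICS = {
    "cancellation": (("cancel", "close my account"), "Review retention offers before processing."),
    "billing_dispute": (("overcharged", "wrong amount"), "Open the customer's recent invoices."),
}

def spot_topics(utterances: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (topic, guidance) the first time each topic is heard in a live stream."""
    seen = set()
    for text in utterances:  # e.g. transcript fragments arriving from speech recognition
        lowered = text.lower()
        for topic, (phrases, guidance) in TOPICS.items():
            if topic not in seen and any(p in lowered for p in phrases):
                seen.add(topic)
                yield topic, guidance
```

Because `spot_topics` is a generator, guidance can surface as soon as a fragment arrives rather than after the call ends, which is the point of doing this in real time.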
Generative AI also shortens analysis time significantly, according to Ziv, who credits retrieval-augmented generation, a technique that improves the output of large language models by having them consult authoritative knowledge bases outside their training data before generating responses.
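The retrieve-then-augment pattern behind retrieval-augmented generation can be sketched in a few lines. This is a minimal illustration under stated assumptions: the knowledge base entries are hypothetical, simple word-overlap scoring stands in for the embedding search a production system would typically use, and the final call to a large language model is left out; the function only builds the augmented prompt that would be sent.

```python
import re
from collections import Counter

# Hypothetical knowledge base: article ID -> text.
KNOWLEDGE_BASE = {
    "kb-101": "Policy renewals can be completed online or by phone up to 30 days before expiration.",
    "kb-202": "Refunds for cancelled policies are issued within 10 business days.",
    "kb-303": "Agents may waive late fees once per policy year.",
}

def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep only alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def retrieve(query: str, kb: dict[str, str], k: int = 2) -> list[str]:
    """Return IDs of up to k documents sharing the most words with the query."""
    q = Counter(tokenize(query))
    scores = {
        doc_id: sum(min(q[w], c) for w, c in Counter(tokenize(text)).items())
        for doc_id, text in kb.items()
    }
    ranked = sorted(scores, key=lambda d: scores[d], reverse=True)
    return [d for d in ranked[:k] if scores[d] > 0]

def build_prompt(question: str, kb: dict[str, str], k: int = 2) -> str:
    """Augment the question with retrieved passages before it goes to an LLM."""
    context = "\n".join(f"[{d}] {kb[d]}" for d in retrieve(question, kb, k))
    return (
        "Answer using only the sources below and cite their IDs.\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
```

Grounding the model in retrieved passages, rather than relying only on what it absorbed in training, is what lets it answer from a company's own, current documentation.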
“I differentiate LLM-powered speech analytics from regular speech analytics because we’re actually using large language models to figure out what’s going on in a conversation more holistically,” says Evan Macmillan, cofounder and CEO of Gridspace, a voice automation company. “So instead of finding a single word, you’re actually looking at the complete conversation and figuring out what’s going on in a more general sense.”
Interconnected agent assistant features enable knowledge surfacing, real-time topic spotting, sentiment analysis, and even agent empathy scoring. Together, they let the AI determine the main topics of a conversation, how the customer feels about it, and how the agent treats the customer (whether empathetically or unhelpfully). Experts agree that all of this is made possible by advances in speech recognition, natural language understanding, and speech pattern recognition.
But getting there is not always as easy as it might sound. Voice interactions might offer the toughest analytics challenge, according to Ziv, who notes that most callers today go to the voice channel only after failing to get resolution via one or more other channels. “It’s very important to be able to see the full picture and build a strategy across multiple channels and track journeys.”
Many end-to-end interactions start in one channel and end in another; others go back and forth between channels, Ziv points out. “You want to see that whole view.”
Interaction analytics must also account for the differences between voice and text, Ziv adds. Companies might try to get by with analyzing digital and voice interactions the same way, using transcripts rather than the actual speech. That approach, however, misses factors such as the nuances of silence in a conversation, cross talk, or customers raising their voices in anger, he says.
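Two of those voice-only signals, silence and cross talk, come from timing rather than words, which is why a bare transcript loses them. The sketch below assumes a hypothetical input: a two-party call already split into timestamped segments per speaker (as diarized speech recognition output often is). Raised voices would additionally require loudness measurements from the audio itself, which this sketch does not attempt.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    speaker: str  # e.g. "agent" or "customer"
    start: float  # seconds from the start of the call
    end: float

def silence_and_crosstalk(segments: list[Segment]) -> tuple[float, float]:
    """Return (total silence, total cross talk) in seconds for a two-party call.

    Silence is any gap between the latest end of speech so far and the next segment.
    Cross talk is time when segments from different speakers overlap.
    """
    if not segments:
        return 0.0, 0.0
    segs = sorted(segments, key=lambda s: s.start)
    silence = crosstalk = 0.0
    speech_until = segs[0].end  # latest end time of any segment seen so far
    for i, seg in enumerate(segs):
        if i > 0 and seg.start > speech_until:
            silence += seg.start - speech_until
        for prev in segs[:i]:
            if prev.speaker != seg.speaker:
                overlap = min(prev.end, seg.end) - max(prev.start, seg.start)
                crosstalk += max(overlap, 0.0)
        speech_until = max(speech_until, seg.end)
    return silence, crosstalk
```

For example, an agent speaking from 0 to 4 seconds, the customer from 5 to 9, the agent again from 8 to 12, and the customer from 15 to 18 yields 4 seconds of silence and 1 second of cross talk.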
“Text has unique characteristics that don’t exist in voice, like spelling errors, emoticons, bold letters, or capitalization, that can reflect emotion,” Ziv says. “Our recommendation is to have an engine for voice analytics and an engine for text analytics, and then the ability to see the combined insights together, which is our approach to what we call interaction analytics.”
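A text engine can pick up the kind of surface cues Ziv mentions that never appear in a transcript of speech. The sketch below is a hypothetical feature extractor, not Verint's engine: it counts fully capitalized words, exclamation marks, and a small, assumed set of emoticons. Spelling errors and bold formatting, also mentioned above, would need a dictionary and access to the message markup, respectively, and are left out.

```python
import re

# Hypothetical emoticon patterns; ">:(" is tried first so it is not also counted as ":(".
NEGATIVE_EMOTICONS = re.compile(r">:\(|:'\(|:-?\(")
POSITIVE_EMOTICONS = re.compile(r":-?\)|:D")

def text_emotion_cues(message: str) -> dict[str, int]:
    """Count surface features of a chat or email message that can signal emotion."""
    words = re.findall(r"[A-Za-z]+", message)
    return {
        # Fully capitalized words of two or more letters often read as shouting.
        "all_caps_words": sum(1 for w in words if len(w) >= 2 and w.isupper()),
        "exclamation_marks": message.count("!"),
        "negative_emoticons": len(NEGATIVE_EMOTICONS.findall(message)),
        "positive_emoticons": len(POSITIVE_EMOTICONS.findall(message)),
    }
```

Counts like these would be inputs to a text sentiment model rather than a verdict on their own.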
Combining analytics for the different channels can also be challenging, according to Ziv, whose company, Verint, offers separate sentiment analysis engines for text and speech.
“The important part is that I can see a unified view, but I can also slice and dice it by channel. You want to be able to look at it in one place to see everything. But you also want to listen to calls and do analysis of them. Sometimes you want to dive into chats. You want to be able to have analytics that are built for individual channels, but then you also want to have a unified view.”
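The "unified view that can be sliced by channel" can be pictured as the same metric computed over all interactions and then again per channel. In this minimal sketch the records are hypothetical, and it assumes each interaction has already been scored by the engine built for its channel on a shared scale from -1 to 1; reconciling scores from separate voice and text engines is the harder problem in practice.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records, each already scored by its channel's own engine.
interactions = [
    {"id": 1, "channel": "voice", "sentiment": -0.4},
    {"id": 2, "channel": "chat", "sentiment": 0.6},
    {"id": 3, "channel": "email", "sentiment": 0.2},
    {"id": 4, "channel": "voice", "sentiment": 0.0},
    {"id": 5, "channel": "chat", "sentiment": 0.2},
]

def unified_view(records):
    """Return the overall average sentiment plus the same metric per channel."""
    by_channel = defaultdict(list)
    for r in records:
        by_channel[r["channel"]].append(r["sentiment"])
    return {
        "overall": mean(r["sentiment"] for r in records),
        "by_channel": {ch: mean(scores) for ch, scores in by_channel.items()},
    }
```

Keeping the per-channel breakdown next to the overall figure is what lets an analyst see, for instance, that voice sentiment lags chat even when the blended number looks healthy.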
Analytics in Practice
Further highlighting the difference, Ziv also notes that some agents might be really good on voice but not necessarily on text. “Some people are just naturally better speakers and listeners and others are naturally better writers and readers. Your best agents aren’t always going to be the best across all channels. With interaction analytics, you can optimize the agent’s performance and behavior or apply the relevant coaching.”
As evidence, he points to a Verint customer in the insurance industry that used this type of interaction analytics to determine which channels and strategies were and weren’t working well when customers called about policy renewals.
By identifying interaction inefficiencies, the insurer reduced interaction handling time by about three minutes per call while also increasing its Net Promoter Score by 95 percent, according to Ziv.
“This is the kind of impact you can see,” Ziv says, adding that such analysis and the resulting changes can be made in just a few weeks rather than months, by which time the impact would be much smaller. “We look for both improvement of customer experience while reducing cost or increasing revenue. We don’t want just the improvement of customer training; we want to get both benefits, to improve CX without increasing costs, and sometimes reducing costs or increasing revenue.”
Generative AI-powered speech analytics is also being used to understand how virtual agents perform, according to Gridspace’s Macmillan. “In the past, we looked at speech analytics to analyze live agent conversations. Now you can use large language models not only to drive virtual agent conversation but to actually investigate how that virtual agent is performing in the field.”
For example, Gridspace client Memorial Hermann Hospital in Houston has used virtual assistants to contact patients after discharge and check on their health. If a patient responds negatively, the call transfers to a live agent. Large language models analyze whether the virtual agents are successfully performing their tasks, according to Macmillan.
LLMs also help with post-call summarization and related analytics, quickly determining the important elements of calls, which is particularly helpful with the lengthy calls that are common with insurance claims, according to Macmillan.
Similarly, LLMs can quickly identify common keywords and issues across customer interactions so companies can act more quickly to correct them, Macmillan says. “We can get much more granular data out of these calls using large language models.”
Best Practices
To optimize the benefits of advanced analytics, companies need to be very focused, according to Ziv. “Start with the end in mind. What is really important? Do I want to improve sales? Improve NPS or handle time? Reduce costs?”
For most companies, starting small works best, though occasionally a large opportunity that is relatively easy to implement can produce big results.
After determining the goals, the next step is to identify the insights needed to reach them and then to use analytics to uncover root causes and the changes that can improve performance.
“If you start with the end in mind, you go back and look at the insights and then automate or quickly take action,” Ziv says.
However, AI and other technologies have evolved to the point that some companies can get buried under the mountain of insights that are now available, Ziv cautions. “A lot of the insights are not necessarily actionable or quantifiable. Start with one, prove your success, and celebrate it. That fuels more focus.”
Success breeds success, Ziv adds, noting that once action resulting from expanded analytics proves successful in one area, other departments will want to use expanded analytics to drive similar improvements.
To take full advantage of the analytic capabilities of generative AI, companies need to have customer interactions in a cloud environment where all data and metadata is easily accessible, Macmillan adds. “The simplest way is by adopting a cloud contact center platform.”
Beyond the platform itself, companies need analytics that are connected to their workflows and flexible enough to accommodate new analyses and outputs, according to Macmillan.
However, for all of the speed and other benefits that advanced technologies bring to analytics, they don’t make humans obsolete, Ziv says. “Though we’re getting there, [automated] solutions are not totally automated. A human in the loop is still required. Some people like the idea of AI taking over, but it’s more about AI supporting humans in taking the right actions. Buying the software doesn’t necessarily drive outcomes.
“It’s not where I can just push a button and then suddenly I will see that everything is improved,” Ziv continues.
And industry experts expect these trends to continue.
“As the year progresses and AI becomes more conversational, I predict 2025 will have more virtual assistants, collaborative widgets for agents, and much more automation for supervisors and management to take all this powerful speech analytics data and use it to train agents, provide faster and more targeted feedback, and personalize customer service on the fly,” says Robert Wakefield-Carl, senior director of innovation architects at TTEC Digital.
AI analytic capabilities will also be more deeply entrenched in contact centers next year, Convoso’s Hakimi predicts. “The affordability and accessibility of these technologies are expected to drive widespread adoption. Companies will be able to dig deeper into call data, identifying trends and issues that impact call quality.”
Phillip Britt is a freelance writer based in the Chicago area. He can be reached at spenterprises1@comcast.net.