The Outlook for Deep Neural Networks and Speech Technology
Indeed, [24]7 says it increased speech recognition accuracy in IVR systems to better than 95 percent last year when it integrated Microsoft’s Deep Neural Networks into its Customer Engagement Platform.
In August 2015, [24]7 became the first company to bring DNN technology to enterprise IVRs, using Microsoft's Deep Neural Networks; the company had earlier acquired Microsoft's enterprise IVR business.
The [24]7 Customer Engagement Platform with Microsoft's Deep Neural Networks was trained on more than 10 billion speech utterances from Bing search, Xbox, and other sources; it applies that training to enterprise self-service IVR interactions, where speech must often be interpreted despite background noise, accents, and dialects. Gartner predicts that by the end of 2017, two-thirds of all customer service interactions will no longer require the support of a human intermediary.
Providers of these technologies say DNNs solve such problems in ways that previous methods couldn't, allowing the systems to handle challenges such as ambient noise and accents with increasing accuracy.
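To make that concrete, the sketch below shows the basic shape of such a DNN acoustic model: a stack of layers mapping a window of audio features to probabilities over phonetic states, trained on labeled utterances. It is written in PyTorch, and every size in it is an illustrative assumption, not [24]7's or Microsoft's actual configuration.

```python
# A minimal sketch (PyTorch) of the kind of feed-forward acoustic model at the
# heart of a DNN-based recognizer: it maps a window of audio features to a
# probability distribution over phonetic states. All sizes are illustrative
# assumptions, not any vendor's actual configuration.
import torch
import torch.nn as nn

N_FEATURES = 440  # e.g., 40 filterbank coefficients x an 11-frame context window
N_STATES = 3000   # tied triphone states (a typical order of magnitude)

acoustic_model = nn.Sequential(
    nn.Linear(N_FEATURES, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, N_STATES),  # logits; softmax is folded into the loss
)

# One training step; in production this runs over labeled utterances
# (the article cites more than 10 billion of them as training data).
optimizer = torch.optim.SGD(acoustic_model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

features = torch.randn(32, N_FEATURES)      # stand-in minibatch of feature windows
labels = torch.randint(0, N_STATES, (32,))  # stand-in frame-level state labels

optimizer.zero_grad()
loss = loss_fn(acoustic_model(features), labels)
loss.backward()
optimizer.step()
```

The accuracy gains the vendors describe come largely from scaling this kind of training to ever-larger utterance collections.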
These capabilities are why companies such as Avis Budget Group have installed speech-enabled IVR systems using DNNs. Company officials have said they hope the newer system can handle the cacophony of airports and other busy places in ways that previous speech-based IVR systems couldn't.
Other companies are using, or soon will use, the technologies of [24]7 and other providers, but their names have yet to be made public.
However, to realize the full capabilities of DNNs within speech-enabled IVR systems, companies need huge memory capacity to store all of the elements of reasoning, dialogue flow, and other data, Suleman says.
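Purely as an illustration of what those elements might look like in memory, here is a hypothetical sketch of per-call state; the field names are invented for this article, not any vendor's schema.

```python
# Hypothetical sketch only: one way to picture the per-call state an IVR holds
# in memory, the "elements of reasoning, dialogue flow, and other data"
# Suleman mentions. The field names are invented, not any vendor's schema.
from dataclasses import dataclass, field

@dataclass
class CallState:
    caller_id: str
    dialogue_node: str                               # current position in the call flow
    slots: dict = field(default_factory=dict)        # values collected so far (cities, dates, ...)
    hypotheses: list = field(default_factory=list)   # recent recognizer n-best lists, kept for re-scoring
    confidence_history: list = field(default_factory=list)

state = CallState(caller_id="+1-555-0100", dialogue_node="confirm_pickup_city")
state.slots["pickup_city"] = "Denver"
state.hypotheses.append([("Denver", 0.91), ("Denton", 0.06)])
```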
Humans Still Involved
Suleman and Thomson agree that, to work best, the newer speech-enabled IVR systems still need human backups who listen in on calls and continually train the systems to further improve accuracy. But the systems usually work well enough to make human intervention rare, Thomson points out; often only those in the industry know that human listeners are involved at all.
"There are no perfect [speech] recognizers," Thomson cautions. But when a company combines a top-level speech recognizer with extensive neural network training on ever-larger data sets and backs the system up with human intervention, he says, the result is the kind of good, personal customer interaction that the older, robotic-sounding speech-enabled IVR systems some companies still rely on today could never provide.
“What they can do is amazing, but it isn’t magical,” Thomson says.
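A rough sketch of that human-backup pattern follows; the confidence threshold and function names are assumptions for illustration, not any vendor's implementation.

```python
# A sketch of the human-backup pattern Suleman and Thomson describe: the
# recognizer handles most turns on its own, and only low-confidence results
# are routed to a human listener, whose correction is logged as new training
# data. The threshold and function names are assumptions for illustration.
CONFIDENCE_THRESHOLD = 0.80
training_queue = []  # (audio, corrected_text) pairs for continual retraining

def handle_turn(audio, recognize, ask_human):
    text, confidence = recognize(audio)
    if confidence >= CONFIDENCE_THRESHOLD:
        return text                            # the common case: no human involved
    corrected = ask_human(audio, text)         # the rare case: a human listens in
    training_queue.append((audio, corrected))  # feeds further training
    return corrected

# Toy stand-ins, just to show the flow end to end.
result = handle_turn(
    audio=b"...",
    recognize=lambda a: ("cancel my reservation", 0.65),
    ask_human=lambda a, guess: "cancel my reservation",
)
```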
Interactions' Curo Speech, built on the AT&T Watson platform that Interactions acquired in 2014, is an automatic speech recognition (ASR) and text-to-speech (TTS) solution that incorporates neural networks and deep learning to modernize customer care interactions.
The solution lets enterprises add U.S. English and North American Spanish speech to their voice-based customer service channels, providing a direct path to multichannel and enhanced care offerings that support customer interactions. Other languages are slated to be added as early as 2017.
Partnerships Aid DNN Development
Curo Speech supports the Speech Recognition Grammar Specification (SRGS), Speech Synthesis Markup Language (SSML), and VoiceXML, and it integrates with monitoring technology from Genesys, Avaya, and other Interactions partners.
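For readers unfamiliar with those standards, here are minimal generic examples of an SRGS grammar and an SSML prompt, shown as Python strings; they illustrate the W3C formats themselves, not Curo Speech's actual grammars or prompts.

```python
# Generic, minimal examples of the W3C standards named above, shown as Python
# strings. They illustrate the formats, not Curo Speech's configuration.

# SRGS: a grammar that constrains what the recognizer listens for.
srgs_grammar = """<?xml version="1.0"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar" root="yesno" xml:lang="en-US">
  <rule id="yesno">
    <one-of>
      <item>yes</item>
      <item>no</item>
    </one-of>
  </rule>
</grammar>"""

# SSML: markup that controls how the text-to-speech side renders a prompt.
ssml_prompt = """<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  Your car is confirmed.
  <break time="300ms"/>
  <prosody rate="slow">Confirmation number 4 7 2.</prosody>
</speak>"""
```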
Such partnerships, and the tool kits behind them, account for much of the recent growth in DNN usage, according to Aravind Ganapathiraju, director of speech technology for Interactive Intelligence. "This sudden surge in DNN usage can, to a large extent, be attributed to tool kits developed by tech giants such as Google, Microsoft, IBM, Nvidia, and several other open-source contributors. These vendors offer simple interfaces and recipes that can get companies started quickly."
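As an illustration of that ease, the sketch below uses the Keras API shipped with Google's TensorFlow, one of the tool kits from the vendors Ganapathiraju names; the layer sizes and data are arbitrary stand-ins, not a real speech model.

```python
# The kind of quick start Ganapathiraju describes: with a modern tool kit
# (here the Keras API in Google's TensorFlow), defining and training a small
# network takes a handful of lines. Sizes and data are arbitrary stand-ins.
import numpy as np
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(40,)),                      # e.g., 40 audio features per frame
    keras.layers.Dense(256, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),  # e.g., 10 spoken-command classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(np.random.rand(100, 40), np.random.randint(0, 10, 100),
          epochs=1, verbose=0)
```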
In 2011, for example, Microsoft made algorithms from its Audio Video Indexing Service available to outside developers, marking the first time a company had released a DNN-based speech recognition algorithm in a commercial product.