What Speech Technology Buyers Really Want: How to Meet the Needs of Enterprise Customers
Another challenge is that each market develops its own jargon, terms known only in that sector. Speech engines flounder when they move from common terms to utterances from specific markets. “Speech systems like Alexa or Google aim for the broadest swath of users: consumers,” explained Chakravarthy. “In the enterprise market, a term may have one meaning in healthcare and a different one in retail. Speech engines need to be trained to recognize the different meanings in each vertical industry.”
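The vertical-vocabulary problem can be made concrete with a toy example. The sketch below is illustrative only; the terms, verticals, and the `resolve` helper are hypothetical and not drawn from any vendor's API. It shows why the same recognized term needs a different interpretation per industry:

```python
# Toy illustration: the same spoken term resolves to different meanings
# depending on the vertical the speech engine was trained for.
# All terms and mappings here are hypothetical examples.

VERTICAL_LEXICON = {
    ("script", "healthcare"): "prescription",
    ("script", "media"): "screenplay",
    ("order", "healthcare"): "physician's instruction",
    ("order", "retail"): "purchase transaction",
}

def resolve(term: str, vertical: str) -> str:
    """Return the vertical-specific meaning of a recognized term,
    falling back to the raw term when no mapping exists."""
    return VERTICAL_LEXICON.get((term.lower(), vertical), term)

if __name__ == "__main__":
    print(resolve("script", "healthcare"))  # prescription
    print(resolve("script", "media"))       # screenplay
```

A consumer assistant can get away with one default meaning per word; an enterprise engine effectively needs a lexicon like this per vertical, learned from domain training data rather than hand-coded.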
A Potpourri of Choices
In response, a hodgepodge of possible solutions is arising. Currently, each NLP vendor defines how to set up such records in its own way. This approach exacerbates application development challenges because it creates portability problems. If a third party builds a connection to exchange information with Alexa, it has to repeat the work, largely from scratch, for Siri. Businesses want to spend more time adding user functionality and less on NLP software plumbing.
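One common way to contain that duplicated plumbing is a thin vendor-agnostic layer inside the application: business logic codes against a single interface, and per-vendor adapters absorb the differences. A minimal sketch follows; the `NluAdapter` interface and the stub adapters are hypothetical, not real Alexa or Siri client code:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field

@dataclass
class Intent:
    """Vendor-neutral result the business logic consumes."""
    name: str
    slots: dict = field(default_factory=dict)

class NluAdapter(ABC):
    """One adapter per vendor hides that vendor's request/response shape."""
    @abstractmethod
    def parse(self, utterance: str) -> Intent: ...

class AlexaAdapter(NluAdapter):
    def parse(self, utterance: str) -> Intent:
        # Real code would call Alexa's service here; this stub
        # just tags where the result came from.
        return Intent(name="demo", slots={"source": "alexa", "text": utterance})

class SiriAdapter(NluAdapter):
    def parse(self, utterance: str) -> Intent:
        # Real code would go through Apple's tooling; same neutral shape.
        return Intent(name="demo", slots={"source": "siri", "text": utterance})

def handle(adapter: NluAdapter, utterance: str) -> str:
    # Business logic sees only Intent, never vendor payloads.
    intent = adapter.parse(utterance)
    return f"{intent.name} via {intent.slots['source']}"
```

Each new assistant still requires a new adapter, which is exactly the repeated work a shared standard would eliminate; the layer only keeps the duplication out of the business logic.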
Standards would help ease the burden. Rather than coalescing, vendors are moving away from one another. “The speech application development industry now is not trending in the right way: The vendors are all building out their own black box solutions,” explained AgVoice’s Rasa.
The World Wide Web Consortium (W3C), which developed the VoiceXML standard, has dabbled in this area, but its work has not gained much traction. “It’s a shame that vendors do not pick up on the W3C’s work; they needlessly reinvent the wheel,” says Larson. This tunnel-vision approach makes it more difficult for customers to build speech-enabled applications.
A Different Design Foundation
A lack of technical expertise compounds the development problems. Creating a strong speech user interface is challenging. “In many cases, companies take their existing computer interface and use it for a voice application, which is the worst way to design a speech system,” says Ashish Goyal, CEO at Srijan.
This approach does not translate well because individuals interact with speech systems differently than with traditional software. With screen-based search, a user scans a handful of options and picks the best one. With speech, the first option offered is usually the one selected.
Because speech applications are new, few developers are adept at building them. “The speech industry needs more emphasis by vendors on speech application development, education, and training,” says Larson.
Training programs are starting to arise. The VoiceXML Forum has a speech developer certification program, and vendors are also taking on the work. Amazon has teamed with training specialists such as CareerFoundry, Codecademy, and A Cloud Guru to run classes like “Learn Alexa Series” and “Voice User Interface Design in collaboration with Amazon Alexa.” But more work needs to be done in higher education. “Few universities offer much help in building speech applications,” Larson notes.
The Hurdles Mount for Speech Technologies
The net result of these development limitations is that adding voice features to enterprise applications is typically a custom development project. The process is fraught with challenges, starting with making the business case for speech.
Corporations often have unrealistic expectations about the level of commitment needed to build enterprise speech applications. The market is generating hyperbole along with revenue. The global chatbot market was valued at $858.1 million in 2017 and is projected to reach $4.02 billion by 2025, a CAGR of 23.16%, according to Market Research. “Managers see how easy it is to build a consumer chatbot and think adding speech to an enterprise application can be done in 10 to 20 minutes,” explains Larson. But consumer chatbots perform rudimentary functions. Enterprise projects are more complicated and expensive. A small proof of concept costs five figures to get up and running, and a typical project runs into six figures and beyond.
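For readers sanity-checking projections like the one above, compound annual growth rate is simply (end ÷ start)^(1/years) − 1. A minimal sketch of the formula (the numbers in the example are neutral placeholders, not the report's underlying assumptions, which depend on its exact base year):

```python
def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    if start <= 0 or years <= 0:
        raise ValueError("start value and years must be positive")
    return (end / start) ** (1.0 / years) - 1.0

if __name__ == "__main__":
    # A value that doubles over 7 years grows about 10.4% per year.
    print(round(cagr(100.0, 200.0, 7) * 100, 2))
```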