Top Trends in Speech Technology for 2018
At a time when the evolution and adoption of speech technology seem to be in hyperdrive, a few key trends point to where the industry will move in the coming months. “All of a sudden, trends that we predicted two, three, or five years ago are finally happening, and they’re happening at the same time,” says Raúl Castañón-Martínez, senior analyst at 451 Research, an information technology research and advisory company that focuses on emerging technologies.
From the miraculous to the mundane, developments in the market are being driven by broader applications of speech technology and higher expectations for what those applications can achieve. Conversations with industry watchers show that ubiquity, democratization, and AI-powered solutions that do more for users, in both the consumer and enterprise settings, are pushing speech technology into the limelight—and raising serious privacy concerns in the process.
Ubiquity and Commoditization
After so many years of speech technology vendors working in the trenches to create reliable, replicable, and accurate solutions for voice-enabled products and services, industry experts believe we’ve reached a critical adoption point. “It’s gratifying to see the expectation, among several generations, that you can talk to [a voice-enabled] object and get positive results,” says Dan Miller, founder of advisory and analysis firm Opus Research. “We’re moving toward the commoditization of the natural language user interface, this idea that you can use your own words to accomplish what you want, and all these objects like phones and virtual assistants will understand.”
The growing expectation that natural language processing will be both accurate and embedded will, according to Castañón-Martínez, lead to a curious result: the seeming disappearance of the speech tech layer. “Voice will be embedded in more and more apps,” he predicts. “So rather than opening Siri or Google Assistant to ask them to do something for you, you’ll simply be talking to your car, or to a Google Map. We’ll see a more seamless workflow, and that should drive efficiency and benefits of voice input.”
Castañón-Martínez credits the advances in natural language processing for making that increased pervasiveness possible. “It’s paved the way for automated tasks and improved handling of speech-enabled commands,” he says. As Miller puts it, “Now that we can convert speech to text with 95% accuracy, ideas of how to use it are sparking.”
For Moshe Yudkowsky, president of Disaggregate Corporation, which provides consulting for both speech technology and web-enabled telephony, the relatively new ability to get standardized, modularized components of speech solutions is also encouraging broader use. “The ability to break technology into smaller pieces and reassemble them has always driven tech evolution,” he says. “With cloud computing and the ability to use technology components from Amazon or Microsoft, it’s huge for programmers to realize, ‘I don’t have to build it all from scratch!’”
Easier and more affordable access to speech enablement technologies means that the design of voice solutions is no longer the sole province of Ph.D. holders and computational linguists. Product designers and user interface designers will bring different perspectives. Says Miller, “Non-technical individuals can be the subject matter experts, so you could have sales clerks and customer service reps providing the raw material for the conversations. That means the bots can become more human.”
“The democratization of development brings new creativity to the design of conversation interfaces,” he adds.
One area in which that is likely to occur will be virtual reality (VR). “Smartphones and other connected devices are trending toward more immersive experiences,” Castañón-Martínez says. “Over time speech is becoming a more practical user interface and that’s even more relevant with VR; if you’re using a VR headset, you’re not using a touchscreen or keyboard for input.” With VR and voice input making such a natural fit, VR’s continued move into the mainstream and into nontraditional arenas like education, real estate, and the healthcare/medical device market can only portend good things for speech technology.
Enterprise Applications Driven by Consumer Adoption
Just as early consumer adoption of technologies like mobile phones and e-commerce applications fueled their adoption in the workplace, many industry experts expect impatience with enterprise applications that lack speech capabilities to encourage innovation. Walter Rolandi of the Voice User Interface Company, a private consultancy specializing in the design and empirical assessment of voice user interfaces, says, “There’s a growing chasm between the accuracy of personal assistants like Alexa and Siri, and enterprise applications. Consumers are less patient than ever when the accuracy of their interaction with a speech-recognition system at their bank or insurance company is so poor.”
Google made waves in the speech tech industry in early May 2018 with the demonstration of its new Google Duplex technology, which lets an automated assistant carry on a natural-sounding phone conversation with a business on a user’s behalf, for real-world tasks like making restaurant reservations or hair appointments. While there is some skepticism about how ready Google’s solution is for broad rollout (as Rolandi puts it, “How many miracle demos does one have to witness in one lifetime?”), Duplex surely points the way to how speech interfaces will work for businesses in the future.
Castañón-Martínez notes that when Siri made its debut with consumers, the big question was when voice-enabled applications would be available for businesses. “Now there are lots of announcements, from small to large organizations, about voice-enabled solutions for the enterprise.” He cites Nuance as one company delivering solutions for enterprise clients. It offers conversational AI solutions such as the one powering the Daimler AG “Mercedes-Benz User Experience” multimedia system, as well as Dragon Medical One, its cloud-based speech recognition platform for the medical market, which integrates voice input with electronic health records.
Dave Michels of TalkingPointz, which provides insight and services in enterprise communications and enterprise IoT, highlights another potential enterprise application, showcased at a 2017 Google presentation, “The Future of the Meeting Room”: the ability to record meetings and store them in Google Drive. “You could then do a search for any instances where someone in your company mentioned a specific keyword or customer in a meeting,” he says, making it possible to search the corpus of your firm’s verbal and written communication and connect with subject matter experts efficiently.
Still, challenges remain with enterprise adoption. Castañón-Martínez says, “With consumer devices like Alexa and Siri, you exercise a higher level of control over background noise. It’s a big challenge for software manufacturers designing for the enterprise to identify the nuances of whether input is a voice or from background noise. They need to handle a higher level of complexity than a consumer device does.” That’s why he feels that, in contrast to the consumer market, where adoption tracks closely with the evolution of hardware, the enterprise adoption rate will be driven much more by improvements in software. “Devices that are taking their place in the enterprise,” he says, “emerge only after the software gets there.”