The 2015 Speech Industry Star Performers: Google
Google Searches for Better Ways to Use Speech
In late May, Google credited its advances in neural networks and deep learning for cutting its speech recognition error rate to 8 percent, down from 23 percent in 2013, according to Sundar Pichai, senior vice president of Android, Chrome, and Apps at Google.
The company's vastly improved text-to-speech engine now powers even more applications. For example, it can be used by Google Play Books to read aloud users' favorite books; by Google Translate to speak translations aloud so users can hear word pronunciations; and by TalkBack and several other accessibility applications for spoken feedback across devices.
Google also successfully brought speech technology to the car with its launch of Android Auto, a voice-enabled application that pairs Android smartphones with in-car systems. Android Auto, which first appeared in the 2015 Hyundai Sonata with Navigation in late May, brings together many of the familiar Android phone applications—such as Google Maps, Google Now, messaging, phone calling, and Google Play—and lets users control them by voice, steering-wheel controls, or touchscreen.
Now Google is making that same technology available to developers. It recently introduced the Voice Interaction API as part of its Android M platform. With it, users can interact with apps via spoken dialogue. While Google has supported voice commands and text-to-speech for some time, the API creates a true dialogue. For example, if a user tells a smart home app to "Turn on the lights," the Voice Interaction API can ask which room the user means.
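The smart-home exchange above can be sketched as a simple dialogue turn. This is illustrative logic only, not the actual Android Voice Interaction API; the function and room names are invented for the example:

```javascript
// Sketch of the dialogue pattern the Voice Interaction API enables:
// when a spoken command is ambiguous, the app asks a follow-up question
// instead of failing. (Illustrative only -- names are invented.)
const rooms = ["kitchen", "living room", "bedroom"];

function handleCommand(utterance) {
  if (/turn on the lights/i.test(utterance)) {
    const room = rooms.find((r) => utterance.toLowerCase().includes(r));
    if (!room) {
      // Ambiguous request: respond with a clarifying prompt.
      return { action: "ask", prompt: "Which room do you mean?", options: rooms };
    }
    return { action: "lights_on", room };
  }
  return { action: "unknown" };
}
```

A command like "Turn on the lights" yields the clarifying question, while "Turn on the lights in the kitchen" resolves directly to an action.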
"I can't wait for the day when I can talk to my watch and phone like they do in the movies," Sunil Vemuri, Google's voice actions product manager, says in a YouTube video. And while he admits that the world is still "a ways away from that," he does note that the new API on Android M "takes us an important step in that direction."
Google also opened up the voice command functionality on most Android mobile devices to third-party apps. With the usual "OK Google" command, users can launch applications with nothing more than the sound of their voice. For now, only a select few apps support the feature: users can find their next homes with Trulia, Zillow, and Realtor.com; shop at Walmart.com; explore music with Shazam; make travel plans with TripAdvisor; and listen to NPR. Google says many more integrations are on the way.
"It's not so much just launching an app, but having it open and performing an action as well," Derek Ross, an Android enthusiast and blogger at Phandroid, wrote in a blog post.
Google's innovations in the past year also extend to continued development around Web Real-Time Communications (WebRTC). The company was one of the pioneers and early champions of WebRTC, which gives Web browsers, mobile applications, and Internet-connected devices the ability to communicate with one another via simple JavaScript APIs and a common set of protocols. After partnering with Vocalcom in February, Google is bringing that same capability to the contact center. As a result, contact center agents will be able to use the Vocalcom cloud contact center solution on Google Chromebook laptops through WebRTC-enabled interfaces. Tapping the Chrome browser's native support of WebRTC, users can engage in multichannel communications without having to install additional software. Within a single window, the agent will have the customer record front and center. Customers can also initiate contact with live agents directly from the Web site without relying on additional third-party applications to make the connection.
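Those "simple JavaScript APIs" amount to only a few calls on the browser side. A minimal sketch of the caller side of an audio session follows; the `startCall` function name is ours, and the signaling callback is a stand-in, since WebRTC deliberately leaves the signaling channel to the application:

```javascript
// Minimal sketch of the caller side of a WebRTC audio session.
// The connection, media stream, and signaling callback are passed in;
// in a real page they would come from the browser:
//   const pc = new RTCPeerConnection();
//   const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
// "sendSignal" stands in for whatever signaling transport the app uses.
async function startCall(peerConnection, localStream, sendSignal) {
  // Attach the local audio tracks to the peer connection.
  for (const track of localStream.getTracks()) {
    peerConnection.addTrack(track, localStream);
  }
  // Relay ICE candidates to the remote peer as they are discovered.
  peerConnection.onicecandidate = (event) => {
    if (event.candidate) {
      sendSignal({ type: "candidate", candidate: event.candidate });
    }
  };
  // Create and publish the SDP offer that starts the session.
  const offer = await peerConnection.createOffer();
  await peerConnection.setLocalDescription(offer);
  sendSignal({ type: "offer", sdp: offer.sdp });
  return offer;
}
```

Because Chrome supports these APIs natively, an agent desktop built this way needs no plug-in or separate softphone install.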
"Self-service features, the contact center, social, and mobile have never been intimately connected," said Anthony Dinis, CEO of Vocalcom, in a statement. "Cloud-enabled solutions like this one drive [total cost of ownership] savings, increase flexibility, accelerate ramp time, and make it easier to implement secure agent desktops."