Voice Search Still Looking for Its Tipping Point
SpeechTEK 2007 (NEW YORK) -- During a SpeechTEK keynote on Tuesday morning, Malcolm Gladwell posited that for speech technology to reach its tipping point, the watershed moment at which society embraces the technology, it might have to "re-frame" itself, finding exciting new applications that fully exploit its potential.
And that potential is certainly there for speech technology, particularly in the area of search: using voice as an interface to search for data. During a panel called Voice Search, Leo Chiu, Apptera's chief technology officer, described the ideal developers are striving to reach: real-time performance, low recognition error rates, and open-ended voice search.
However, as Charles Galles, Intervoice's principal speech solutions architect, said in the same panel, that ideal is still a long way off. Grammars are prone to aging, and a search system would require constant tuning. Galles relayed an incident from the beginning of the Clinton administration in which an attempt to search for "President Clinton" returned only "President Bush." Such tuning problems continue to plague the technology. A voice search system, he said, "must be more dynamic," and engineers must be able to change search criteria and word associations "on the fly." Still, the fact that large enterprises like IBM and Google are devoting resources to the problem suggests a hopeful future for voice search.
Part of the success of voice search, according to Chiu, will depend on how open-ended it is, which would require natural language lookups. Implementing natural language, however, brings its own problems. For instance, an end user might approach the system as if it were capable of understanding everything he says. In the panel Advances in Natural Language Processing, one speaker said a natural language system would have to sift through unnecessary verbiage and recognize the relevant spoken strings. Thus, a natural language interface needs to be trained on a large body of real data. At the same time, overstuffing the system with concepts would cause recognition errors.
And because natural language is still a nascent technology, the speech industry should probably use it sparingly and in specific circumstances, such as applications that involve large item lists or that require end users to speak freely. In most cases, a simple directed dialogue interface is adequate, according to Jim Larson, vice president of Larson Technical Services. "People already know what to expect with directed dialogue," he said, adding that a sudden switch within a system to natural language could confuse end users' responses and, consequently, the speech recognition engine. Thus, while the technology shows promise, it should be deployed with care.
And yet, even with all of these developments, Larson wondered if the industry hadn't already missed the boat. During a Q&A following the Speech in the Mainstream panel, Larson said he wasn't completely optimistic about the industry's future. "The iPhone was a missed opportunity," he said, referring to its lack of speech technology. This was particularly ironic, he said, because a phone would seem like the natural place to host a voice-driven interface. "Americans are just so used to a graphical user interface," he said, wondering if society would ever be willing to shift to a voice-driven one. But if the tipping point does come, it will, as Gladwell suggested in his keynote, probably arrive quickly and unexpectedly. It would behoove designers and developers to hone their technologies, and to keep developing new ones, if they hope to usher that moment along.