Speech Wars: Round Two
The Voice Browser Working Group has finished the technical work on the three major languages in the W3C Speech Interface Framework: VoiceXML 2.0, the Speech Recognition Grammar Specification, and the Speech Synthesis Markup Language. These languages will soon become check-off items in the feature lists of the leading speech platforms, because most speech platform vendors will offer them as a standard part of their platforms. To compete, vendors need to find new ways to differentiate their platforms from those of their competitors. Major differentiators will include new software development and management facilities that accelerate the speech application development process and enable sophisticated monitoring of deployed speech applications.

Vendors will no longer boast, "Our platform supports VoiceXML and yours doesn't." Those were the speech wars of yesteryear. Vendors will now do battle with a range of tools that make life easier for speech application developers. These tools will fall into three categories: development environments, information centers and control centers.

Development Environment

A development environment contains integrated tools that accelerate the speech development process. Useful tools include:

Code editors: Text editors that help the developer create VoiceXML 2.0, Speech Recognition Grammar Specification and Speech Synthesis Markup Language code that syntactically conforms to the associated XML Schema. Any syntax error is flagged immediately so the developer can resolve it.
Graphical dialog designer: Enables developers to draw dialog states and transitions and then automatically generate VoiceXML 2.0 code.
Prompt manager: Collects all of the text prompt messages into a single file so that voice talent can easily record the equivalent spoken prompts.
Pronunciation specification tool: Enables developers to specify the pronunciation of words by selecting and sequencing sounds for each phoneme in a word.
Grammar specification tool: Converts developer-created flow charts, spreadsheets or tree structures into grammar rules.
Rehearsal tool: Enables the designer to walk through a VoiceXML application without using speech recognition or speech synthesis, by reading textual prompts on a screen and typing responses on a keyboard. The developer debugs the dialog logic without dealing with speech recognition errors and misunderstood synthesized speech.
Debug tools: Display the contents of internal buffers and the actions performed by the VoiceXML interpreter.

Information Center

The information center provides tools that present developers with information about how speech applications are used, including:

Log file: Captures timestamps and other information for each <log> tag that developers embed into a VoiceXML application.
Log file report generator: Summarizes information calculated from the log file, including durations (task duration, response latency, mean system turn duration) and counts (turns to task completion; number of help, nomatch and no-response events; number of reprompts).
User evaluation results: Summarizes users' answers to usability questions, including likes, dislikes, preferences and other subjective responses.

Control Center

The control center contains performance monitoring tools such as:

Application activity tool: Summarizes speech recognition response times and application activity.
Platform activity tool: Summarizes page fault rates and fetch times, communication delays, and congestion at platform resources. System administrators can dynamically reconfigure the distributed system to better support application performance and offload processing to backup hosts during peak processing loads.

Who will win the new speech wars to provide better development environments, information centers and control centers? Vendors who pay attention to speech application developers and provide a usable collection of tools that satisfies their needs will win by selling more platforms than their competitors. This is a war in which application developers also win, by developing and deploying new applications faster.

Let round two of the speech wars begin.
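As a rough illustration of what a log file report generator of the kind described under the information center might compute, here is a minimal Python sketch. Everything specific in it is an assumption made for the example: the one-event-per-line log format, the event names (prompt, input, nomatch, noinput, help, reprompt, task_complete) and the function name summarize are hypothetical and do not come from any speech platform. It derives only a few of the measures mentioned: task duration, a simple count of user turns, and counts of nomatch, no-input, help and reprompt events.

```python
from collections import Counter
from datetime import datetime

# Hypothetical log format (an assumption for this sketch, not a standard):
#   <ISO-8601 timestamp> <call id> <event name>
SAMPLE_LOG = """\
2004-01-01T10:00:00 call1 prompt
2004-01-01T10:00:04 call1 nomatch
2004-01-01T10:00:05 call1 reprompt
2004-01-01T10:00:09 call1 input
2004-01-01T10:00:10 call1 prompt
2004-01-01T10:00:15 call1 input
2004-01-01T10:00:16 call1 task_complete
"""


def summarize(log_text):
    """Return per-call task duration, user turn count and event counts."""
    calls = {}
    for line in log_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        stamp, call_id, event = line.split()
        when = datetime.fromisoformat(stamp)
        call = calls.setdefault(
            call_id, {"first": when, "last": when, "events": Counter()}
        )
        call["first"] = min(call["first"], when)
        call["last"] = max(call["last"], when)
        call["events"][event] += 1

    report = {}
    for call_id, call in calls.items():
        events = call["events"]  # Counter yields 0 for events never seen
        report[call_id] = {
            # Time from the first to the last logged event of the call.
            "task_duration_s": (call["last"] - call["first"]).total_seconds(),
            # Assumption: every recognized, unmatched or missing response
            # counts as one user turn.
            "user_turns": events["input"] + events["nomatch"] + events["noinput"],
            "nomatch": events["nomatch"],
            "noinput": events["noinput"],
            "help": events["help"],
            "reprompts": events["reprompt"],
        }
    return report


if __name__ == "__main__":
    for call_id, stats in summarize(SAMPLE_LOG).items():
        print(call_id, stats)
```

A real report generator would read whatever the platform actually writes for each logged event and would likely add latency and system turn measures. The point of the sketch is only that such summaries are simple aggregations over timestamped events.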
Dr. Jim A. Larson is an adjunct professor at Portland State University and Oregon Health Sciences University. He can be reached at jim@larson-tech.com and his Web site is http://www.larson-tech.com