Getting to an SIV Module for VoiceXML
Market interest in speaker identification and verification (SIV) has grown tremendously. VoiceXML developers are being asked to add SIV to existing applications and to build new SIV applications, while VoiceXML-compliant companies, such as Voxeo, Genesys Telecommunications Laboratories, and Angel.com, have active partnerships with SIV technology companies.
All of this favors the development of a solid SIV module for VoiceXML 3.0 as soon as possible so standards-based SIV can be integrated into VoiceXML and other applications quickly and easily. Fortunately, the World Wide Web Consortium’s Voice Browser Working Group (VBWG) began that process by releasing the first draft of an SIV module in December. It is still very much a work in progress.
The module covers three operations, or core functions: enrollment, identification, and verification. Early on, the VBWG decided against including a broad spectrum of core functions, such as classification and speaker separation, in the first version of the module so it could delve more deeply into the requirements of the most widely used SIV operations.
The draft document, which can be viewed at www.w3.org/TR/2010/WD-voicexml30-20100304/, explains that SIV is “an additional recognition resource (along with the existing touch-tone and speech recognition resources) which can be activated alone or simultaneously with the other recognition resources,” but which also has unique characteristics. For example, the resource uses “voice models” rather than grammars.
The steps of a typical SIV operation, from capturing audio to comparing the audio input with a reference voice model, are displayed in the flow chart below. Some steps contain additional information that will likely be the basis for crafting code for those steps. For example, “assessing audio quality” is augmented by various reasons audio quality might be lacking, such as when there’s too much noise. Speech recognition must also evaluate audio quality, which suggests that existing VoiceXML functions for audio quality and similar crossover operations could be extended to SIV.
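Separately from the flow chart, the steps described above can be sketched in a few lines of Python. This is only an illustration of the general shape of a verification pass, not syntax from the draft: the names AudioSample, assess_audio_quality, and verify, the quality thresholds, the reason strings, and the use of cosine similarity as the comparison are all assumptions made for the example.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AudioSample:
    """Hypothetical captured utterance: features plus quality measurements."""
    features: List[float]
    snr_db: float       # estimated signal-to-noise ratio (assumed metric)
    duration_s: float   # seconds of speech captured


def assess_audio_quality(sample: AudioSample,
                         min_snr_db: float = 15.0,
                         min_duration_s: float = 1.0) -> Optional[str]:
    """Return None if the audio is usable, otherwise a reason string."""
    if sample.snr_db < min_snr_db:
        return "too-noisy"
    if sample.duration_s < min_duration_s:
        return "too-short"
    return None


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Toy stand-in for a real SIV engine's scoring function."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def verify(sample: AudioSample, voice_model: List[float],
           threshold: float = 0.8) -> dict:
    """Check audio quality, then compare the input with the reference voice model."""
    problem = assess_audio_quality(sample)
    if problem is not None:
        # Poor audio yields no decision rather than a rejection.
        return {"decision": "no-decision", "reason": problem, "score": None}
    score = cosine_similarity(sample.features, voice_model)
    decision = "accepted" if score >= threshold else "rejected"
    return {"decision": decision, "reason": None, "score": score}
```

The point of the sketch is that the quality check runs before any comparison and reports why the audio was unusable, which is the kind of step a speech recognizer performs as well.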
The SIV resource itself consists of two elements, a data model and a state model, and performs a constrained set of actions as it proceeds through the steps of the processing cycle, such as handling results and termination. The document also provides the shell of expected syntax and semantics for the SIV module.
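To make the pairing of a data model with a state model concrete, here is a minimal Python sketch. The state names, event names, and fields are hypothetical and chosen only for illustration; the draft defines its own, and readers should consult it for the actual definitions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SivState(Enum):
    """Hypothetical states; not taken from the draft."""
    IDLE = "idle"
    CAPTURING = "capturing"
    PROCESSING = "processing"
    DONE = "done"


# State model: the constrained set of (state, event) pairs the resource accepts.
TRANSITIONS = {
    (SivState.IDLE, "activate"): SivState.CAPTURING,
    (SivState.CAPTURING, "audio_complete"): SivState.PROCESSING,
    (SivState.PROCESSING, "result"): SivState.DONE,
    (SivState.CAPTURING, "terminate"): SivState.DONE,
    (SivState.PROCESSING, "terminate"): SivState.DONE,
}


@dataclass
class SivResource:
    """Data model: what the resource knows while it runs (assumed fields)."""
    operation: str          # "enrollment", "identification", or "verification"
    voice_model_uri: str    # reference voice model, used in place of a grammar
    state: SivState = SivState.IDLE
    result: Optional[dict] = None

    def handle(self, event: str, result: Optional[dict] = None) -> SivState:
        """Apply one event; reject anything the state model does not allow."""
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event {event!r} not allowed in state {self.state.value}")
        self.state = TRANSITIONS[key]
        if event == "result":
            self.result = result
        return self.state
```

A resource created with operation="verification" would move from idle to capturing on "activate", to processing on "audio_complete", and to done on "result" or an early "terminate"; any other sequence raises an error.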
The Challenge
The draft module is nicely crafted and described in language familiar to the VoiceXML professionals who will use it. However, it is only a skeleton. For it to evolve into a full-bodied component of VoiceXML, it must address the needs of developers, vendors, integrators, and others involved in crafting SIV solutions and bringing them to market. That cannot happen unless members of all of those constituencies examine the current draft with a critical eye and give the VBWG substantive feedback about the good, the bad, and the ugly.
Too often, that kind of input is provided late in the standards-development process, after critical design and functionality decisions have already been made. As the editor of a standard developed within the biometric committee of the American National Standards Institute, I have firsthand experience with receiving valuable suggestions so late in the process that it wasn’t possible to implement good changes without reworking the draft standard.
That’s why I invite VoiceXML developers and VoiceXML-compliant companies to be the first to provide feedback to the VBWG. You have the knowledge and hands-on experience to assess the extent to which the SIV module meshes with the rest of VoiceXML. I also encourage you to involve your SIV technology partners in your review. They can provide insight into events, functions, parameters, and other requirements from the core technology perspective.
For those of you who want additional information, I recommend two documents published by the VoiceXML Forum’s Speaker Biometric Committee:
- “Speaker Identification and Verification (SIV) Requirements for VoiceXML Applications,” and
- “Speaker Identification and Verification Applications.”
Both documents are posted on the W3C’s Web site (www.w3.org).
Judith Markowitz, Ph.D., is president of J. Markowitz Consultants and a leading independent analyst in the speech and voice biometrics fields. She can be reached at judith@jmarkowitz.com.