Combining Technologies I
The idea that technologies can be used together to produce more flexible, accurate, powerful and user-friendly systems is certainly not new. It has been receiving increasing attention in the computing industry because multimedia devices and fast, powerful processors are now widely available. This column focuses on combining speech recognition with speaker verification and identification. Each of these technologies is effective on its own. The goal of combining them is to apply their complementary strengths to solve problems, improve performance, and offer other useful enhancements.
About Speaker Verification and Identification
Speaker verification and identification are the most highly commercialized forms of what I call voice biometrics. They use acoustic features of a person's voice either to assign an identity to an unknown speaker (identification) or to verify that the person is who she or he claims to be (verification). These technologies cannot understand what that person is saying. In contrast, speech recognition can neither identify a voice nor verify a claim of identity. Speech recognition is not a biometric technology, but it does an excellent job of understanding speech. All biometrics, including speaker verification and identification, differ from other forms of security in their ability to provide "positive identification of the person." This means that they can tell who is accessing a secured system. No other form of security can do that. At best, other types of security can verify that the person knows a piece of information (e.g., a PIN or password) or possesses an access-control device (e.g., a card or token).
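To make the difference between the two tasks concrete, here is a minimal Python sketch. It assumes, purely for illustration, that each enrolled speaker's voiceprint is reduced to a short feature vector and that cosine similarity stands in for a real matching score. `VOICEPRINTS`, `similarity`, `verify`, `identify`, and the 0.95 threshold are hypothetical names and values, not any product's API. Verification compares a sample against one claimed voiceprint (1:1); identification searches all enrolled voiceprints for the best match (1:N). Neither function has any idea what words were spoken.

```python
import math

# Hypothetical enrolled voiceprints: each speaker ID maps to a made-up
# acoustic feature vector. Real systems use far richer speaker models.
VOICEPRINTS: dict[str, list[float]] = {
    "alice": [0.9, 0.1, 0.3],
    "bob": [0.2, 0.8, 0.5],
}


def similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity, used here only as a stand-in matching score."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def verify(claimed_id: str, sample: list[float], threshold: float = 0.95) -> bool:
    """Verification (1:1): does the sample match the claimed speaker's voiceprint?"""
    voiceprint = VOICEPRINTS.get(claimed_id)
    if voiceprint is None:
        return False  # no enrolled voiceprint for this claim
    return similarity(sample, voiceprint) >= threshold


def identify(sample: list[float]) -> str:
    """Identification (1:N, closed set): which enrolled speaker matches best?"""
    return max(VOICEPRINTS, key=lambda sid: similarity(sample, VOICEPRINTS[sid]))
```

For example, `identify([0.88, 0.12, 0.31])` returns `"alice"`, and `verify("bob", [0.88, 0.12, 0.31])` returns `False`.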
Adding voice biometrics to ASR
When speaker verification or identification is added to a speech-recognition (ASR) application, it is generally to provide security. This is not surprising. Sometimes voice biometrics replaces an existing form of security, such as a PIN. In other cases, voice-based security is incorporated into an existing speech-recognition system so that the system can be extended to operations that require security; for example, stock transactions can be added to an existing stock-quote system. The power of positive identification also makes speaker verification and identification attractive for uses other than security. If you know who is using the speech-recognition system, you can personalize that system. Personalization can be as simple as greeting the person or as complex as loading profiles and setting preferences. This use of voice biometrics is an example of what I call "1+1=3": the benefits gained are different from, and go beyond, the ones usually considered when one thinks about speaker verification and identification.
Adding ASR to verification
Speech recognition can be added to a speaker-verification application to create a total voice solution, in which the front-end security leads into voice-activated command and control or a dialogue system. Sometimes speech recognition supplements or replaces touch-tone input.
When someone claims to be person X, most commercial speaker-verification systems ask that person to say person X's password or to repeat specific numbers or phrases (e.g., "say 42 57"). If what the person says matches person X's voiceprint poorly, the system rejects the claimant as an impostor. Sometimes, however, a poor match occurs simply because the person failed to give the expected response. When this happens, the system can use speech recognition to determine that the person said the wrong thing. Similarly, when a system encounters a questionable matching score, it might ask a personal question, such as "What is your mother's maiden name?" Only speech recognition can determine whether the person has answered correctly. Used this way, speech recognition extends the degree of automation that can be built into a speaker-verification deployment.
Speech recognition can also shorten the verification process. When a speaker-verification system requests a user ID (e.g., an account number), speech recognition can interpret what the person has said (e.g., "743 952"), retrieve the voiceprint associated with that ID, and then hand processing over to the speaker-verification system.
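The division of labor described above can be sketched in Python. This is a minimal illustration under stated assumptions, not how any particular product works: the ASR transcript and the verification score are passed in as already-computed inputs, and the names `Outcome`, `decide`, `check_answer`, and `lookup_voiceprint`, as well as the 0.8/0.5 thresholds, are hypothetical. Speech recognition supplies what was said; speaker verification supplies how well the voice matches.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    ACCEPT = "accept"
    REJECT = "reject"
    WRONG_RESPONSE = "wrong_response"  # ASR heard something other than the prompt
    ASK_QUESTION = "ask_question"      # borderline score: fall back to a personal question


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so transcripts compare cleanly."""
    return " ".join(text.lower().split())


def decide(expected: str, recognized: str, score: float,
           accept_at: float = 0.8, reject_below: float = 0.5) -> Outcome:
    """Combine an ASR transcript with a speaker-verification score.

    The thresholds are hypothetical placeholders, not recommended values.
    """
    # Speaker verification cannot tell what was said; ASR can.
    if normalize(recognized) != normalize(expected):
        return Outcome.WRONG_RESPONSE
    if score >= accept_at:
        return Outcome.ACCEPT
    if score < reject_below:
        return Outcome.REJECT
    return Outcome.ASK_QUESTION


def check_answer(recognized_answer: str, stored_answer: str) -> bool:
    """Only ASR can judge the content of an answer to a personal question."""
    return normalize(recognized_answer) == normalize(stored_answer)


def lookup_voiceprint(recognized_id: str,
                      voiceprints: dict[str, list[float]]) -> Optional[list[float]]:
    """Turn a spoken ID such as "743 952" into a key and fetch its voiceprint."""
    account = "".join(ch for ch in recognized_id if ch.isdigit())
    return voiceprints.get(account)
```

Checking the transcript before the score is a design choice in this sketch: if the claimant said the wrong words, a low matching score says little about whether that person is an impostor.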
Judith Markowitz is the associate editor for Speech Technology Magazine and is a leading independent analyst in the speech technology and voice biometric fields. She recently completed a market analysis of speaker verification and identification. She can be reached at (773) 769-9243 or Jmarkowitz@pobox.com.