Speech in the Car
Defined as the integration of telematics and interactive speech technology, voice telematics makes it practical to use an audio interface to do things while driving. Driver distraction is minimized because the driver's eyes can stay on the road. Examples of voice telematics include getting traffic reports, receiving driving directions, radio control, personal voice dialing, climate control and info-service call routing. The list of possible applications is long, but the most significant are those that are vehicle-centric and involve GPS-derived vehicle location. The key in this market is the right mix of applications, convenience and cost to the vehicle owner, and cost depends directly on how the services are implemented.
Voice telematics in the vehicle can be achieved in a number of ways. The two primary architectures are referred to as embedded solutions and off-board solutions. The term embedded means that all components of the application and speech technology reside within the vehicle. By contrast, off-board implementations are server-based, meaning the audio is transmitted between the vehicle and the server, much like conventional telephony applications, except that hands-free microphones are commonly used.
Hands-free microphones are an important component of voice telematics. Unfortunately, their speech signal can be difficult to recognize because of speakerphone-like acoustic properties. In the automotive environment, these include low signal-to-noise ratios, low-frequency road noise, background voices (for example, from the back seat), acoustic echo, half-duplex artifacts and microphone distortion caused by wind turbulence. A close-talk microphone would clearly be better for recognition, but it is simply not practical from a human factors perspective. For voice telematics to be successful, recognition accuracy must be high for both embedded and off-board solutions.
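Low signal-to-noise ratio heads the list of hands-free microphone problems above. As a minimal sketch of what that figure measures, the Python below computes SNR in decibels as 10 * log10(P_signal / P_noise), where P is the mean squared amplitude. The signals are synthetic stand-ins chosen for illustration (a 1 kHz tone for speech and Gaussian noise at two levels for cabin noise), not measurements from a vehicle, and the helper names `mean_power` and `snr_db` are hypothetical.

```python
import math
import random


def mean_power(samples):
    """Average power of a signal: the mean of its squared sample values."""
    return sum(x * x for x in samples) / len(samples)


def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    return 10.0 * math.log10(mean_power(signal) / mean_power(noise))


# Synthetic stand-ins (assumption): a 1 kHz tone represents speech energy,
# Gaussian noise represents broadband cabin noise; one second at 8 kHz.
RATE = 8000
speech = [0.1 * math.sin(2 * math.pi * 1000 * n / RATE) for n in range(RATE)]

rng = random.Random(0)  # fixed seed so the run is repeatable
quiet_cabin = [rng.gauss(0.0, 0.01) for _ in range(RATE)]
noisy_cabin = [rng.gauss(0.0, 0.07) for _ in range(RATE)]

print(f"quiet cabin: {snr_db(speech, quiet_cabin):5.1f} dB")
print(f"noisy cabin: {snr_db(speech, noisy_cabin):5.1f} dB")
```

With these assumed levels the quiet case comes out near 17 dB and the noisy case near 0 dB, where speech and noise carry roughly equal power. Real road noise is concentrated at low frequencies, which this white-noise stand-in does not model.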
Embedded solutions are thin on processing, which makes accuracy a challenge for complex grammars. Off-board solutions are thick on processing, but the audio quality may be degraded after network transmission. For both embedded and off-board systems, the hands-free microphone is typically used for voice applications and telephone conversations. To improve hands-free telephone conversations, noise and acoustic echo canceling software runs on board the vehicle. The degree to which such signal processing helps recognition accuracy remains controversial. Also highly debated in telematics is distributed speech recognition: splitting the recognizer so that the back-end recognition processing takes place off-board.
The main advantage of embedded voice technology is cost. The size, memory and CPU requirements are minimal and well suited to single-user applications (as opposed to a multi-port server). Off-board voice solutions are also cost-effective when sized appropriately (i.e., when the ports are well utilized). Other off-board advantages include information access, system flexibility, maintainability and scalability.
Disadvantages of an off-board voice solution include inconsistent audio quality and system latency. Wireless communication does have its shortcomings, but fortunately people are conditioned to them. As for recognition, if the audio is reasonably intelligible to a human, then a properly designed recognizer will perform adequately. Two sources of application latency matter most: connection setup time and delay during the user dialogue. Both must be minimized and carefully managed. If it consistently takes 60 seconds to connect to an off-board voice service, don't expect repeat users. Likewise, if an application consistently responds slowly to user commands, don't expect high user satisfaction.
Hybrid voice solutions, in which embedded and off-board systems are integrated, are also emerging.
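The claim that off-board solutions are cost-effective when sized appropriately can be made concrete, although the article does not say how sizing should be done. One common approach, swapped in here as an assumption, is the Erlang B formula from telephony traffic engineering. Given the offered load in erlangs (calls per hour times average call duration in hours), it gives the probability that a new call finds every port busy, assuming random (Poisson) arrivals and blocked calls that are turned away rather than queued. The sketch below finds the smallest port count that meets a target blocking probability and reports the resulting utilization. The traffic figures and the function names `erlang_b` and `ports_needed` are hypothetical.

```python
def erlang_b(traffic_erlangs: float, ports: int) -> float:
    """Erlang B blocking probability, via the standard stable recursion:
    B(E, 0) = 1;  B(E, m) = E * B(E, m-1) / (m + E * B(E, m-1))."""
    blocking = 1.0
    for m in range(1, ports + 1):
        blocking = traffic_erlangs * blocking / (m + traffic_erlangs * blocking)
    return blocking


def ports_needed(traffic_erlangs: float, max_blocking: float) -> int:
    """Smallest number of ports whose blocking probability is <= max_blocking."""
    ports = 0
    while erlang_b(traffic_erlangs, ports) > max_blocking:
        ports += 1
    return ports


# Hypothetical busy-hour load: 600 calls averaging 3 minutes each.
calls_per_hour = 600
avg_call_minutes = 3.0
offered = calls_per_hour * avg_call_minutes / 60.0  # load in erlangs

ports = ports_needed(offered, max_blocking=0.01)
# Carried traffic excludes the calls that were blocked.
carried = offered * (1.0 - erlang_b(offered, ports))

print(f"offered load: {offered:.1f} erlangs")
print(f"ports for <= 1% blocking: {ports}")
print(f"average port utilization: {carried / ports:.0%}")
```

Raising the blocking target or the number of ports changes utilization in opposite directions, which is the trade-off behind "properly utilized": too few ports turn callers away, too many sit idle and raise cost per call.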
Many believe that the hybrid approach represents the future of voice telematics. The primary reason is that some automotive voice applications require embedded processing while others require off-board processing. For example, changing a radio station by voice only makes sense as an embedded application. On the other hand, voice-delivered traffic reports are only possible by accessing dynamic off-board information. Even today, drivers use in-vehicle embedded voice dialing to reach off-board voice services such as stock quotes and weather reports.
The telematics industry is in the process of taking interactive voice technology to the next level of performance. Consistent, intuitive voice user interfaces will provide safety, convenience and value to the vehicle owner as driver distraction is minimized. Embedded voice telematics will converge with off-board voice solutions, leading to more and more services. Whether you're controlling your in-vehicle climate or receiving driving directions to a designated destination, the human factors design will be consistent and intuitive.
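The split described above, where some requests must be handled in the vehicle and others need off-board information, can be sketched as a small dispatcher. Everything in it is hypothetical: the intent names, the `ROUTES` table and `route_request` are illustrations rather than any vendor's interface, and the sketch assumes an embedded recognizer has already turned the utterance into an intent. The table assignments follow the article's examples (radio control and voice dialing on board; traffic, stock quotes and weather off-board).

```python
from enum import Enum


class Route(Enum):
    EMBEDDED = "embedded"    # handled entirely by in-vehicle processing
    OFF_BOARD = "off-board"  # audio or request sent to a server-based service


# Hypothetical intent table; assignments follow the examples in the text.
ROUTES = {
    "change_radio_station": Route.EMBEDDED,
    "voice_dial": Route.EMBEDDED,
    "traffic_report": Route.OFF_BOARD,
    "stock_quote": Route.OFF_BOARD,
    "weather_report": Route.OFF_BOARD,
}


def route_request(intent: str, network_available: bool) -> str:
    """Decide where a recognized intent is handled (hypothetical helper)."""
    route = ROUTES.get(intent)
    if route is None:
        return f"'{intent}' is not supported"
    # Off-board services depend on the wireless link; embedded ones do not.
    if route is Route.OFF_BOARD and not network_available:
        return f"'{intent}' needs the off-board service, which is unreachable"
    return f"'{intent}' -> {route.value}"


for intent in ("change_radio_station", "traffic_report"):
    print(route_request(intent, network_available=False))
```

Keeping the routing decision in one table is one way to let a service move between embedded and off-board handling without changing the dialogue itself, which supports the consistent user interface the article calls for.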
Dr. Thomas Schalk is the Voice Principal at ATX Technologies, a telematics company that provides voice services to the high-end automotive market. He is a member of the AVIOS board of directors and a former president. He can be reached at
tschalk@atxtelematics.com