Tell Me About It: Why Speech Recognition Might or Might Not Be Working For You
What's so weird about talking to your computer? For regular readers of this magazine, probably nothing. But we have all seen, and at some time in the past experienced, the wide range of responses to the idea of talking to a PC. For some, it's the most natural thing in the world, certainly more natural than typing on a plastic keyboard. For others, however, it's not that simple: using speech recognition software to create documents or control a computer just feels strange. Why?
One reason may be that speaking to a PC, or to any inanimate object, is fundamentally different from speaking to another person. In fact, the PC may be the first business machine we've ever spoken to without intending our words to be heard by another human being. We speak through telephones knowing someone is listening on the other end. We record speech knowing that the ultimate audience will be a transcriptionist or another person. Of course, we all talk to our cars (sometimes in colorful language!) or plants on occasion, but on the whole we're used to speaking to people, not to inanimate objects. We think of speech as an exchange with someone who understands us. And until now, inanimate objects didn't.
With the introduction of speech recognition software, all this has changed. Users are being told that dictating text and speaking commands into a headset to control Windows 98 is as easy as speaking to their best friend. For some users, it is just that easy: over the last two years they have experienced a dramatic increase in productivity and comfort by using continuous speech technology. For others, whose experience with speech technology hasn't met their high expectations, we hope this article (and two to follow in future issues) will explain how today's products can yield positive results with proper set-up and training, helpful tips and suggestions, and a better understanding of the software's capabilities.
Innovations often slowly adopted
People have been wary of some of the world's most useful inventions. A cartoon by humorist James Thurber, famous in the 1930s, showed a worried woman eyeing a suspicious light socket in the ceiling. The caption read, "Electricity was leaking all over the house."
The fax machine is an extreme example of slow technology adoption. Although the first patent for facsimile transmission over wires was filed in 1843, and millions of dollars went into development, as late as 1970 there were still only 50,000 fax machines in the entire U.S. It took another 20 years for people to drop their preconceptions about what the technology could and couldn't do and start using it as a practical business tool, not as a replacement for existing mail and telephone service but as a complement to them.
Speech recognition technology may also suffer from users' overall hesitancy to adopt new technologies, compounded by unrealistic expectations based on popular science fiction such as Star Trek or 2001: A Space Odyssey, in which people rattle off rapid-fire conversational questions and commands to computers. Those new to dictation software are often surprised to learn they must wear a headset and undergo an "enrollment" or training period, during which the software learns to recognize an individual human voice. Today's speech recognition software doesn't match their expectations, and this obscures the real benefits of the new technology.
"Write out loud" There is a definite psychology to speaking to a PC. First of all, users have to be willing to wear a headset with a noise-canceling microphone. Todays built-in PC microphones arent sophisticated enough to recognize a users voice while filtering out background noise. This means that anybody using dictation software today has to wear a headset microphone and be literally hardwired to a PC. Though more and more people seem to be wearing headsets in public witness Madonna, Garth Brooks, Wall Street stock traders and federal agents it still feels strange at first to most users, especially after seeing the way Dave commanded Hal to "open the pod bay doors" in 2001 while moving about the spaceship. The kind of microphone someone uses and how they use it is fundamental to achieving good results with a speech recognition product. Dictating into a computer requires users to take a fundamentally different approach to gathering and communicating thoughts. This is because speech recognition relies on a language model that mimics the way sentences are written, which is sometimes entirely different from how we speak to each other in conversation. Even though the software recognizes continuous speech (it used to only recognize speech with pauses after each word), it still works best with "full sentence" speech that sounds like written text. Consequently, to achieve good results with todays products, users must "write out loud," speaking in complete sentences at a measured pace just as if one were reading text off a page. To get a better idea of how this works, users might try dictating first by reading text without looking at the computer screen until they have finished dictating, and then try the same exercise while looking at the results. Often the appearance of text on the screen can be distracting and can subtly but significantly affect the pace and clarity of the users voice, and hence the final dictation results. 
Users should be seated comfortably, with a glass of water nearby, and should avoid leaning over the keyboard. These simple changes in technique and approach can often dramatically improve the accuracy of speech recognition software, another area that ensuing articles will explore.
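For readers curious about why "full sentence" speech helps, here is a toy sketch in Python of the language model idea described in this section. It is an illustration only, not how any commercial dictation product is implemented: the training text, the two sample sentences, and the function names (train_bigrams, log_score) are all made up for this example. It counts which words follow which in a small sample of written-style text, then scores how familiar a new word sequence looks.

```python
import math
from collections import Counter

# Hypothetical sample of "written-style" text, invented for illustration.
TRAINING_TEXT = (
    "please send the report to the office . "
    "the report is due on friday . "
    "please call the office on friday . "
)


def train_bigrams(text):
    """Count single words and adjacent word pairs in the training text."""
    words = text.split()
    unigrams = Counter(words)
    bigrams = Counter(zip(words, words[1:]))
    return unigrams, bigrams


def log_score(sentence, unigrams, bigrams):
    """Average log-probability per word pair, with add-one smoothing
    so that unseen pairs get a small but nonzero probability."""
    words = sentence.split()
    vocab = len(unigrams)
    total = 0.0
    for prev, cur in zip(words, words[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab))
    return total / max(len(words) - 1, 1)


unigrams, bigrams = train_bigrams(TRAINING_TEXT)

written = "please send the report to the office ."  # reads like written text
spoken = "um the report uh send it to office ."      # conversational fragments

print("written-style score:", log_score(written, unigrams, bigrams))
print("conversational score:", log_score(spoken, unigrams, bigrams))
```

The written-style sentence scores higher (closer to zero) because every word pair in it resembles the text the model was built from, while the hesitations and fragments in the conversational version are unfamiliar to it. A model that favors familiar sequences has more to work with when the speaker dictates in complete, written-sounding sentences.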
Two speech extremes
Speech recognition software seems to engender two extremes in its users. There are those who use it constantly and aggressively, for letters, memos, white papers, business plans, e-mail and anything else they need to do while sitting in front of a computer screen. Then there are those who try it once or twice and never use it again.
The vast majority of those who stop using speech recognition do so right away, for one of two simple reasons. The first is purely technical: dictation software won't run on a user's computer if the machine doesn't meet the processor, memory or sound card requirements. However, thanks to ever-increasing processor speeds and memory capacity, this is becoming less of an issue. In fact, almost any new PC purchased in 1999 will support speech recognition software.
The second reason is accuracy. People who do get past the PC configuration hurdles can be initially disappointed by the software's accuracy rate. Users generally expect 100 percent accuracy; they don't expect to have to check the recognized words. The end result is shelfware: frustrated users give up on the product before it has a chance to adapt to their particular voice. Once again, 2001's HAL computer has raised expectations beyond what users' patience will bear.
But consider those who do continue to use speech software. Studies suggest that they are often extraordinarily evangelistic about it because they have spent time training the products to obtain maximum accuracy. They don't just use speech recognition a little bit. They use it more than 50 percent of the time they spend on their PCs.
Between these two extremes lies a large number of users who could benefit from practical suggestions on how to get the most out of dictation software: tips and tricks for maximizing comfort, boosting accuracy rates and improving their overall experience with speech technology. Dictation software is only the first of many new applications users will operate using speech.
In the coming years, users will control not only their computers, but also their household devices, automobile electronics and personal digital assistants by speaking to them. Microsoft Chairman Bill Gates has said that "Speech is not just the future of Windows, but the future of computing itself." If that's the case, what new users really need is a practical guide to using, integrating and maximizing speech recognition in a wide variety of home and office environments. We hope to provide that with this and future articles.
Paul McNulty is the vice president of Lernout & Hauspie's PC Applications Group and can be reached through Lernout & Hauspie at http://www.lhs.com.