Continuous Dictation: The Future of Speech Recognition
Looking into our crystal ball, what do we see for the future of computer dictation and speech recognition? Allow me to make some predictions. Recognition accuracy will improve gradually over time, but it will never be perfect. Correction methods and paradigms will continue to evolve and improve. Dictation will gain acceptance incrementally, among more and more people, and will be used ever more widely, but it will never become the universal and exclusive method of data input. Typing and the mouse will remain ubiquitous, and handwriting recognition may become more common. Keyboards, mice, and electronic pens will become more closely integrated with speech recognition. Microphone positioning and usage will become more natural, with increasing use of hand-held microphones and of microphones built into small hand-held computers and recording devices. Headset microphones will continue to be used by individuals who need their hands free and by those who find headsets comfortable; nevertheless, hand-held, desktop, and especially built-in microphones will greatly expand the range of microphone options used in the field, the office, and the home. As faster processors and more memory become available, small hand-held writing tablets for speech recognition dictation and data entry will be developed. Speech recognition will be done more and more in random access memory (RAM), with less need to consult the hard disk during the recognition process. Speech recognition systems will make more use of larger context in order to achieve higher recognition rates. This is already happening in some systems: dates, cash amounts, email addresses, telephone numbers, and zip codes, among others, are being formatted with increasing accuracy. More use will be made of "common sense" in choosing the correct word for a particular context.
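The formatting of spoken numbers mentioned above (often called inverse text normalization) can be illustrated with a short Python sketch. It assumes, hypothetically, that the recognizer emits each spoken digit as a separate word; the names `DIGIT_WORDS`, `format_digits`, and `normalize` are invented for illustration and do not describe any particular product. A ten-digit run is formatted as a US telephone number, and any other run (such as a five-digit zip code) is left as plain digits.

```python
# Hypothetical table of spoken digit words ("oh" is a common spoken zero).
DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7",
    "eight": "8", "nine": "9",
}


def format_digits(digits: str) -> str:
    """Format a run of digits by its length.

    Ten digits become a US telephone number; anything else
    (for example a five-digit zip code) stays a plain digit string.
    """
    if len(digits) == 10:
        return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"
    return digits


def normalize(transcript: str) -> str:
    """Collapse runs of spoken digit words into formatted numbers."""
    out = []
    run = []  # digits collected from consecutive digit words
    for word in transcript.split():
        digit = DIGIT_WORDS.get(word.lower())
        if digit is not None:
            run.append(digit)
            continue
        if run:  # a non-digit word ends the current run
            out.append(format_digits("".join(run)))
            run = []
        out.append(word)
    if run:  # flush a run that ends the transcript
        out.append(format_digits("".join(run)))
    return " ".join(out)


print(normalize("call six one seven five five five zero one nine nine today"))
# prints: call (617) 555-0199 today
print(normalize("the zip code is zero two one three eight"))
# prints: the zip code is 02138
```

This naive version would also turn "one of them" into "1 of them", which is exactly the kind of "silly" mistake that larger context is meant to prevent; a real system would decide from the surrounding words whether a run of digits is a number at all.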
The "common sense research project," which was based in Texas, has tried to understand some of the implicit assumptions underlying everyday language use. Recognition systems that disregard context will sometimes make "silly" mistakes, and context recognition remains an ongoing challenge for computers. As the writer shifts topics, the computer's target vocabulary may change automatically. When baseball is mentioned, for example, a baseball vocabulary may become more prominent within the system's choice structure. In addition, greater "programmed understanding" of grammar and syntax will improve recognition accuracy, and "grammar checkers" will be incorporated more closely into recognition algorithms. Greater use will be made of "intelligent systems" that attempt to guess what the speaker intended to say rather than what was actually said, since people often misspeak and make unintentional mistakes. Partially spoken phrases might be finished, and sentences in which the speaker's voice trails off at the end may be properly terminated by the system rather than left incomplete or inaccurately finished. This resembles the way the human brain tries to make sense of what it hears, bringing incomplete thoughts and sentences to some orderly completion or trying to make sense of obscure statements. Command and control, however, will never be entirely speech driven. Recognition errors in commands may have disastrous results, and keyboards, mice, electronic pens, and buttons are easy to use in conjunction with speech systems. Also, many people work in places where noise might interfere with speech recognition. In addition, we are often in non-private environments where dictation and spoken commands might be overheard. Material recorded onto small hand-held tape or digital recorders will continue to be played into computers for later recognition.
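The topic-driven vocabulary shift described above can be sketched as a simple rescoring step. In this minimal Python illustration, the topic lists, the candidate scores, and the multiplicative `boost` factor are all hypothetical assumptions: the most recently mentioned topic keyword selects a vocabulary, and candidate words from that vocabulary have their recognizer scores boosted before the best word is chosen.

```python
from typing import Dict, List, Optional, Set

# Hypothetical topic vocabularies; a real system would use much larger
# word lists or a separate statistical language model per topic.
TOPIC_VOCAB: Dict[str, Set[str]] = {
    "baseball": {"pitcher", "inning", "bunt", "shortstop", "umpire"},
    "cooking": {"whisk", "simmer", "saute", "ladle"},
}


def current_topic(history: List[str]) -> Optional[str]:
    """Return the most recently mentioned topic keyword, if any."""
    for word in reversed(history):
        if word.lower() in TOPIC_VOCAB:
            return word.lower()
    return None


def choose_word(candidates: Dict[str, float], topic: Optional[str],
                boost: float = 2.0) -> str:
    """Pick the best-scoring candidate, boosting words in the active topic."""
    vocab = TOPIC_VOCAB[topic] if topic is not None else set()
    return max(candidates,
               key=lambda w: candidates[w] * (boost if w in vocab else 1.0))


# Two acoustically similar candidates with made-up recognizer scores.
scores = {"picture": 0.6, "pitcher": 0.4}
history = "we watched the baseball game and the".split()

print(choose_word(scores, None))                    # prints: picture
print(choose_word(scores, current_topic(history)))  # prints: pitcher
```

Without context, the acoustically stronger "picture" wins; once "baseball" has been mentioned, the boosted score for "pitcher" (0.4 × 2.0 = 0.8) overtakes it.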
However, recorders will be partly replaced by small hand-held computers that perform recognition on the spot, so that users can make corrections immediately, both for convenience and while the ideas are fresh. Systems will be designed to adapt more quickly, more completely, and continuously to the writing habits of an individual or group; constant adaptation is needed. Similarly, microphone and sound systems will be designed to adapt more quickly to changing background noise levels and different environments, and to recognize better the extraneous sound that should be discarded. All in all, systems will become smoother, more accessible, and easier to use. They will become snappier, more accurate, more integrated, and more intuitive. There is tremendous room for improvement. Nevertheless, current systems are magnificent technological achievements and are eminently usable in their present form.
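One simple way a sound system can adapt to changing background noise is to track a running noise floor and treat only frames well above it as speech. The Python sketch below is a minimal, hypothetical energy gate, not how any specific product works. It assumes per-frame energies in decibels are computed elsewhere, and its parameter values (`margin_db`, `alpha_noise`, `alpha_speech`) are illustrative only.

```python
from typing import Optional


class AdaptiveNoiseGate:
    """Classify audio frames as speech or noise against an adaptive noise floor.

    All levels are in dB; the parameter defaults are illustrative assumptions.
    """

    def __init__(self, margin_db: float = 6.0, alpha_noise: float = 0.05,
                 alpha_speech: float = 0.005) -> None:
        self.margin_db = margin_db        # how far above the floor counts as speech
        self.alpha_noise = alpha_noise    # fast floor update on noise-only frames
        self.alpha_speech = alpha_speech  # slow drift on speech frames
        self.noise_db: Optional[float] = None

    def is_speech(self, frame_db: float) -> bool:
        if self.noise_db is None:
            # Assume the first frame is background noise.
            self.noise_db = frame_db
            return False
        speech = frame_db > self.noise_db + self.margin_db
        # Exponential moving average toward the current frame level,
        # much slower while the frame is classified as speech.
        alpha = self.alpha_speech if speech else self.alpha_noise
        self.noise_db += alpha * (frame_db - self.noise_db)
        return speech


gate = AdaptiveNoiseGate()
print([gate.is_speech(db) for db in [40, 40, 41, 40, 60, 61, 40]])
# prints: [False, False, False, False, True, True, False]
```

The slow drift on speech frames is a deliberate choice: if the floor adapted only on noise frames, a sudden and lasting rise in background noise (an air conditioner switching on, say) would be classified as speech indefinitely. With the small `alpha_speech`, the floor eventually catches up, and the louder background is then treated as noise.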
Peter Fleming, a speech recognition consultant, may be reached at aris@world.std.com, or (617) 923-9356.