Speaking Frankly

In a report published in the April/May 2000 issue of this magazine, Sergei Kochin of Knowles Electronics surveyed 13,690 speech recognition purchasers and found that fewer than 15 percent of them used speech software at least one hour per week. In other words, roughly five out of every six buyers use the software rarely or never! Internal Microsoft studies put the abandonment rate for speech recognition software even higher, at 95 percent.

Existing usability problems with desktop speech software do not appear to be slowing the development of speech for personal digital assistants, as speech companies point to both the limited input methods and the popularity of existing portable devices. Lernout & Hauspie is developing a voice-activated PDA, and IBM is thought to be adapting its technology for portable devices, too. Palm Computing CEO Carl Yankowski has said that his company will release a voice-activated product by the end of the year.

Voice-activated PDAs face more technical obstacles than voice software on the desktop. Noisy, changing sound environments, reduced memory and processing power, and the lack of close-talking headset microphones make accurate speech recognition on a handheld a substantial technical challenge.

I have no wish to see the voice-activated PDA follow the Newton and be written off as a useless toy. The first voice PDAs should not be marketed as free-form dictation solutions. Given handhelds' smaller processors and environmental noise, I find it hard to believe that large-vocabulary dictation will work well enough to be practical. If five out of six people give up on speech recognition on their desktops, I'm skeptical that users will have better results dictating to their PDAs. Voice PDA makers should limit and simplify, following the example of the Palm.
By requiring users to write in Graffiti instead of allowing free-form handwriting, the Palm simplifies the recognition task, asking people to change their habits a little in exchange for acceptable accuracy. The Palm also favors simplicity over features and does not play up its handwriting recognition in its marketing.

Voice-activated features also face a higher usability hurdle in PDAs than in desktop software. Scheduling and address book software requires nearly 100 percent accuracy. Finding and fixing misrecognitions is frustrating enough when dictating text. I do not look forward to creating a voice-scheduled appointment for "June 8" and having it disappear into never-never land in my calendar, perhaps to be found on June 18, 10 days late. Some suggestions for creating successful voice PDAs:

Limit the available voice command set, allowing only dates, times, phone numbers and a few navigation commands. Free-form dictation could be confined to a special mode to avoid widespread misrecognitions.
Include an "undo" command for easily resurrecting events that disappear into the wrong date or to-do list category.
Include an optional "push to talk" button to activate the microphone. This makes the user do a bit of extra work, but avoids picking up extraneous sounds.
Have a menu structure designed for easy voice navigation, prompting the user with what to say. (See my article, "Speech Interfaces That Require Less Human Memory," in the July/August 2000 issue of this magazine.)

The Apple Newton overpromised and underdelivered. Voice PDAs must be designed and marketed carefully to avoid ending up as the punchline in a cartoon.

Dan Newman is president of the speech software consulting firm Say I Can Inc. in Berkeley, Calif. His free newsletter is available online at www.SayICan.com .