Beyond Voice Search

Mobile devices have become easier to use because freely spoken speech can now be recognized and converted to text. When composing an email, a text message, or a query for a search box, speaking is easier than typing on a mobile keypad, at least for most people. But how practical is speech-enabled typing?

Since last year’s first Voice Search Conference, many of the speech applications demonstrated there have become a reality on a variety of personal devices. Although voice search capabilities have existed on mobile devices for a couple of years via Microsoft and Nuance Communications, recent iPhone releases from Google and Vlingo seem to have attracted real attention. Moreover, the general feeling now is that speech works for most people. Interestingly, the speech recognition is server-based: it uses the device’s Internet connection to transmit packetized audio and returns recognition results almost instantly. The key to accuracy is live speaker adaptation, a technique first introduced in the early 1990s in network-based voice dialing.

When people talk on their mobile phones, they usually hold them up to their ears with the bottom of the phone oriented toward their mouths. Generally, when the phone’s microphone is held close to the user’s mouth, recognition performance is good because speech stands out strongly over the background noise. So perhaps whenever it’s practical to have a phone conversation, it would also be sensible to speak instead of type. This is not always the case. For mobile devices, speech-enabled typing doesn’t necessarily involve holding the device near one’s mouth, because seeing the display can be critical. Speech as an input does not imply hands-free interaction, because the user is inclined to look at the display while entering information and waiting for the results. You do see people holding their devices near their mouths, but only briefly while speaking a phrase or sentence.

Initiating the dialogue involves a tactile interaction with the device, which effectively performs the end-pointing for the subsequent speech recognition task carried out off-board. This can be done in a few ways, but pressing something while speaking seems to be the most robust technique and the one users adapt to most quickly.
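To make the press-to-speak idea concrete, here is a minimal sketch, assuming audio arrives from the microphone as a stream of byte frames. The class `PushToTalkEndpointer` and its methods are hypothetical names for illustration, not any vendor's API. The button press marks the start of the utterance and the release marks its end, so the device needs no acoustic end-point detection before handing the audio off for server-side recognition.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PushToTalkEndpointer:
    """Hypothetical push-to-talk end-pointer: press = start, release = end."""
    pressed: bool = False
    frames: List[bytes] = field(default_factory=list)

    def press(self) -> None:
        # Button down: begin a fresh utterance.
        self.pressed = True
        self.frames = []

    def on_audio_frame(self, frame: bytes) -> None:
        # Microphone frames are kept only while the button is held;
        # anything captured before the press is discarded.
        if self.pressed:
            self.frames.append(frame)

    def release(self) -> Optional[bytes]:
        # Button up: the utterance is complete. Return the captured audio
        # for off-board recognition, or None if nothing was captured.
        if not self.pressed:
            return None
        self.pressed = False
        utterance = b"".join(self.frames)
        self.frames = []
        return utterance if utterance else None


if __name__ == "__main__":
    ptt = PushToTalkEndpointer()
    ptt.on_audio_frame(b"noise")      # ignored: button not yet pressed
    ptt.press()
    ptt.on_audio_frame(b"\x01\x02")
    ptt.on_audio_frame(b"\x03\x04")
    utterance = ptt.release()
    print(len(utterance))             # 4 bytes ready to send to the server
```

The appeal of this design is that the user's own gesture defines the utterance boundaries, which sidesteps the hardest cases for automatic end-pointing, such as noisy rooms or long pauses mid-sentence.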

But just how practical is it for people to talk to their mobile devices to generate text strings? Even assuming perfect speech recognition, it may be surprising that speech isn’t always preferred over typing. Some people simply prefer typing to speaking. Then consider privacy while talking in public. Finally, consider environmental conditions that make speech impractical.

Talking to Text

Texting is a mainstream communication method, and speech-enabled texting is expected to become a significant mobile device feature. But if you pay attention to the younger generation, you’ll see high-speed typing on a mobile keypad, often using only thumbs. Would such skilled individuals rather talk if given the option? And what about when you’re around other people and don’t want them to know what you’re texting, much less to whom? Or when you’re in a meeting or a classroom and want to text someone? Or when you’re in a room with many other people who are also talking? Or if you have a strong accent?

On the other hand, typing is neither practical nor safe while you’re driving a car. This applies to entering information into a navigation system, browsing, and texting, all of which could benefit from speech-enabled typing. Thus, we expect speech to play a critical role in enabling personal device functionality inside the vehicle, where effective, simple, easy-to-use, and safe multimodal interfaces are needed.

So the general question becomes: how often is it practical to talk to, rather than type on, a mobile device? Overall, my guess would be about half of the time.

Indeed, the mobile device has evolved from a wireless telephone into a miniature laptop with far more functionality than one would expect. And because so much functionality sits behind a limited keyboard on a small device, speech-enabled typing can greatly improve usability. We are seeing more and more downloadable applications that let users create email and text messages by speaking, along with voice search, voice dialing, and speech-enabled social networking. It would be interesting to see usage statistics comparing speech with tactile input for typing.

Thomas Schalk, Ph.D., is vice president of voice technology at ATX, a provider of telematics services to the automotive industry. He can be reached at tschalk@atxg.com.
