The Tipping Point of Speech Proliferation, Part II

What uses of speech technologies are gaining momentum, and are any of them revolutionary? At the end of my last column ("The Tipping Point of Speech Proliferation, Part I," September 2009), I discussed the forces driving the proliferation of speech, including speech in unified communications (UC) and mobility applications.

UC solutions tie together communication applications to optimize business processes, helping users communicate and collaborate more effectively no matter where they are or what device they're working on. These solutions include presence, instant messaging, unified messaging, collaboration, and audio and videoconferencing. Speech has become an integral, enabling technology for many UC applications.

Let's look at the devices themselves. If one of the goals of UC is to let people conduct business wherever they are, then solutions should be accessible from any device, even while the user is driving. To this end, speech recognition has long been used to control applications on mobile devices, including voice-activated dialing and command and control of familiar desktop applications, such as accessing and updating calendar entries and contacts. More recently, a wave of companies, such as Pinger, SpinVox, Vlingo, Utterli, and Travelling Wave, has introduced further applications, such as speech-to-text and text-to-speech (TTS) services. These provide hands-free capabilities for dictating many forms of communiqués, including voicemail, text messages, tweets, and blog posts, and for having them read aloud. Speech-to-blog capabilities, for instance, allow a person to dictate a blog entry and have it posted to the Web sites or social networking sites of his choice. Jott Networks, recently acquired by Nuance Communications, not only provides desktop and mobile applications for tasks such as updating contacts, calendars, and social networking feeds, but also offers speech-enabled access to Salesforce.com, along with access to dozens of application mashups.

What is the difference between mobility applications in general and the mobile applications found in UC solutions? There is certainly overlap. The applications described above, which use speech as an interface, are mobility applications, but they aren't always used as part of a UC solution. Dozens of mobile applications, such as voice search of the Web from any phone, are turning up everywhere. Consider Vlingo, which allows mobile users to control cameras and calendars; dictate email, text messages, and personal notes; dial; and search the Web, all by voice. As a sign of how widespread such applications are becoming, Vlingo is now being preloaded on millions of Nokia phones in Europe and is available in several European languages, following on the heels of its U.S. debut last year.

In addition, applications that let users access speech-enabled interactive voice response systems, particularly those with a multimodal user interface, are mobility applications, but they are not necessarily part of a UC solution. For example, Movidilo, a start-up spun off from Ydilo, offers a mobile search engine based on more than 50 million keywords that provides voice query and search for customer service applications. It does this by displaying visual customer service screens on the user's phone and offering query functionality with search results that the user can select by voice or from a visual list. This gives the user a rich interface and lets him choose when to use speech and when to use the keypad.

Enhancing the user experience further are technologies such as predictive text, which predicts the next word the user will input on a mobile handset, greatly speeding up the process of keying in text. T9 and XT9 predictive text software from Nuance, for example, adapts to an individual’s preferences and language patterns, supports more than 73 languages, and allows a user to text in two languages simultaneously.
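To make the idea of next-word prediction that "adapts to an individual's preferences and language patterns" more concrete, here is a minimal sketch assuming a simple bigram model: it counts which words have followed each word in the user's own past messages and suggests the most frequent followers. This is an illustrative assumption only, not how Nuance's T9 or XT9 actually works; the class `NextWordPredictor` and its methods are hypothetical.

```python
from collections import Counter, defaultdict


class NextWordPredictor:
    """Hypothetical bigram predictor that adapts to a user's own messages.

    Assumption: a plain bigram count model, used only to illustrate adaptive
    next-word prediction; it is not a commercial product's algorithm.
    """

    def __init__(self) -> None:
        # Maps each word to a Counter of the words that have followed it.
        self.following: defaultdict[str, Counter] = defaultdict(Counter)

    def learn(self, message: str) -> None:
        """Update the counts from one message the user has sent."""
        words = message.lower().split()
        for prev, nxt in zip(words, words[1:]):
            self.following[prev][nxt] += 1

    def suggest(self, prev_word: str, k: int = 3) -> list[str]:
        """Return up to k likely next words, most frequent first."""
        # .get() avoids inserting an empty entry for unseen words.
        counts = self.following.get(prev_word.lower())
        if not counts:
            return []
        return [word for word, _ in counts.most_common(k)]


if __name__ == "__main__":
    predictor = NextWordPredictor()
    for msg in ["see you at lunch", "see you tomorrow", "meet you at lunch"]:
        predictor.learn(msg)
    print(predictor.suggest("you"))  # prints ['at', 'tomorrow']
```

Because the counts come only from what the user has typed, the suggestions shift toward that user's habits over time, which is the adaptive behavior the paragraph describes in general terms.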

Then there is speech technology in cars. Besides the ever-growing use of TTS in navigation units, speech recognition can be used for command and control, such as selecting music or setting the temperature, as well as for voice-activated dialing and requesting driving directions.

The market for these applications is just heating up, with strong interest both from investors funding start-ups in the mobile space and from companies acquiring them.

Nancy Jamison is the principal analyst at Jamison Consulting. She can be reached at nsj@jamison-consulting.com.
