Speech of the Future: Mobile, Multimodal, and Multilingual

Three major trends will influence the future of speech technology, according to Dave Burke, engineering director at Google and opening keynote speaker at the second annual SpeechTEK Europe conference, held recently in London.

The first is the growth of smartphones and the applications that run on them. “Smartphones and tablets will be the playground for speech apps,” Burke says, noting that the inflection point will come by 2012, when global shipments of mobile phones will outstrip those of PCs.

The second trend that Burke highlighted is the growth of storage and cloud-based services, with ever-cheaper memory leading to faster and lower-cost devices. Third, HTML5 will have a major impact as “the next era of the Web,” Burke says. “We will start to see speech recognition and synthesis on mobile devices, especially devices that are now touch-based.”

Also important will be multimodal apps, Burke says, because “people can speak faster than they can type, but they listen slower than they can read.”

In his keynote, which opened the second day of the conference, Alexander Waibel, a professor at Carnegie Mellon University and Karlsruhe Institute of Technology and director of the International Center for Advanced Communication Technologies, said it is a myth that everyone speaks English; in the European Union alone, 506 language directions must be translated. Language technology solutions can preserve cultural diversity while enabling communication and collaboration, he argued.

To illustrate the point, Waibel simultaneously translated his own presentation into Spanish using Jibbigo, the world’s first commercially available speech translator running on an iPhone. According to Waibel, such applications will not put human translators out of a job, but they are useful when no translator is available. These apps have been used by doctors and aid workers on humanitarian missions, as well as by tourists and travelers.

Moreover, Waibel predicts that the technology available now will usher in the next generation of translation systems within two to 10 years.

SpeechTEK Europe consisted of two days of conference sessions and a parallel exhibition, preceded by a day of practical, half-day workshops covering the latest speech technologies and their applications. Attendees came from 20 countries in Europe, Asia, and the Middle East. Some of the most popular sessions covered customer testing of natural language dialogue, voice application design, speech technologies in the car, voice biometrics, and multichannel strategies.

Also at the event, the winners of SpeechTEK Europe’s Multimodal Challenge were announced. The challenge has been running all year, with entries covering a range of innovative applications, including personal assistants for mobile devices, an electronic entry doorman, and tools to aid the disabled.

The following entries for the challenge were submitted:
• Textpilot from Norwegian firm Include: This intelligent, multisensory software helps you read, compose, and comprehend text.
• Personal Health Assistant from Openstream: This mobile application lets users create and update personal health records through multimodal interfaces. It combines reminder and alerting services with location-based searches, maps and directions, image and video capture and annotation, and secure group sharing of selected information.
• SafeRise from FST21: This is an intelligent multimodal recognition system for automated, real-time authentication of building access. The system performs biometric recognition based on face, voice, and behavioral patterns, correlated with live information and database cross-queries. SafeRise provides convenient access to authorized tenants and approved visitors while maintaining the highest level of security.
• FlexT9 for Android from Nuance Communications: This combines voice and predictive input to give mobile consumers more control over how they send emails or text messages, search the Web, and update their social media status. By combining Nuance’s Dragon Dictation, T9 Trace, T9 Write, and XT9, FlexT9 permits users to switch between input methods with a simple tap. Supported languages include U.K. and U.S. English, French, Italian, German, and Spanish.
• Talk@TV from OceanBlue Software: This digital set-top box benefits users with visual impairments by speaking the on-screen program guide and menu items. Broadcast data, such as the narrative for programs, and static data, such as menu items and layouts, are clearly described by voice.
• Prime III from Prime Voting Systems: Prime III allows private and secure voting for people with disabilities by letting them use touch or voice, or both.
• “The Last Person on Earth Who Doesn’t Use Online Banking” from Garanti Bank: This project aims to increase the use of online banking among customers of Garanti, one of Turkey’s largest banks. The process begins with an email to customers, who click a link that plays a news video about the last person not using Garanti Internet Banking. During the video, the customer receives a call on his or her mobile phone, and an interactive conversation begins that aims to persuade the customer to use Garanti Internet Banking.
• MobileCareMyAssistant from Convergys: This blends technology with assisted services to deliver a more personalized, effective, and secure way for on-the-go customers to transact business whenever and wherever they would like.
• Angel Multimodal Application (Angel-MM) from Angel: Angel-MM is an iPhone application that lets users engage IVR systems in real-time multimodal interactions. Users can engage the IVR by speaking, pushing buttons, and listening, as well as by typing responses, reading text, and looking at images pushed to the user by the IVR.

“The Multimodal Challenge submissions demonstrated a variety of user input technologies, including keyboard, speech recognition, speaker recognition, face recognition, and special input devices,” says Jim Larson, SpeechTEK Europe program chair and a member of the Multimodal Challenge judging panel. “Our panel of multimodal experts evaluated each submission using the four criteria of originality, user experience, required user training, and quality. The judges unanimously voted Textpilot, from Include, as the winner.”

In addition to the Expert’s Prize, delegates on site at SpeechTEK Europe were invited to vote for their favorite application. The winner of the People’s Choice award this year was Angel-MM.