Speech and Language Technology: Going Global, Thinking Local
In China, Tom.com, a voice portal, provides virtually anytime, anyplace automated access to stock, entertainment and weather information in Mandarin - in a country where cell phones far outnumber PCs. In Japan, ASIMO, a walking, talking robot, charms visitors in Japanese and moonlights as a celebrity host at public events. In Germany, bankers at Deutsche Bank receive English-language research reports from their colleagues in London and run them through machine translation so they can read them in German more quickly. In the U.S., T. Rowe Price plan participants can call a virtual account representative to get the account information they need. Thanks to natural language understanding, participants can steer the conversation where they want it to go instead of wading through lengthy menus.

After years of expectation, speech technology is fulfilling its promise. Faster chip speeds and more sophisticated algorithms mean voice recognition is performing better than ever before. New speech-enabled applications are hitting the market as businesses and consumers realize that voice is the most natural way to access information from the Internet, mobile phones, car dashboards or handheld organizers. Voice technology may have started with desktop computers, but today speech is moving beyond the desktop to the many touch points of an increasingly mobile e-business world. The question is, how can it be deployed across the globe? Of all the capabilities information technology offers today, speech and language technologies are perhaps the most dependent on cultural context. Voice technology, long confined to research, is now putting a natural interface on the computing environment, from end-user devices to the infrastructure behind the scenes - and it is crossing national boundaries.

Worldwide spending on voice recognition will reach $41 billion by 2005, according to the Kelsey Group, a market research firm. Several forces are driving that growth:
· Companies view voice as a way to improve service from their call centers while also reducing costs. Voice recognition lets companies serve customers over the phone, 24/7, without subjecting them to hold times or forcing them through rigidly structured menus. Then there are the business savings: a typical customer service call costs $5 to $10 to support; automated voice recognition can lower that to 10 to 30 cents. The market research firm Datamonitor says call center managers are seeing growing customer acceptance of automation and self-service, along with cost savings.
· The rise of telematics, which combines computers and wireless telecommunications in motor vehicles to provide customized services such as driving directions, emergency roadside assistance, personalized news, sports and weather information, and access to e-mail and other productivity tools. The Kelsey Group predicts U.S. and European spending on telematics will exceed $6.4 billion by 2006.
· Companies looking to voice-enable the Internet and their IT operations, whether by providing information to consumers through "voice portals" or by letting employees access corporate databases through spoken commands over the phone.
· The ability to squeeze convenient speech recognition into ever-smaller devices, such as phones, PDAs and other mobile hardware.

This is happening not only in the U.S. but across the globe. For the most part, companies looking to deploy voice face similar issues. They want to know which business applications will bring more value to their customers and set them apart from the competition. The underlying question, of course, is whether the technology is available in their language. But because the goal of speech is to put a natural interface across technology and lower communication barriers, language is not the only consideration speech providers need to bear in mind. Cultural context is key. Speech providers need to consider not only whether specific applications will transfer across borders, but also how people in a country are likely to ask questions or request services, how demographics vary, and what types of technology they are likely to warm to. This can differ not only by country, but also among regions within a country.

Take something as basic as demographics. Even in a largely English-speaking country like Singapore, English accents vary. Data collected for automatic speech recognition show that older Singaporeans, influenced by a largely British education system, speak with a UK English accent. Younger ones, who have grown up on MTV and American movies, lean toward U.S. English.

Another factor is language input. Because entering Chinese characters on a keyboard is difficult, Chinese software developers have brought to market applications that combine dictation, keyboard and pen input. Having more than one input method gives users added flexibility and convenience.

Apart from language, we also need to consider the extent to which technology is embraced. Go to Akihabara in Japan and you'll find a proliferation of devices and gadgets. Japanese teenagers, and even adults, are glued to wireless services for communication, entertainment and social interaction. Speech-enabled toys and games are a natural fit. A video game called Seaman, for example, has players interacting with a character that looks like a cross between a man and a fish. You talk to it; it talks back and asks you pointed questions.
To the average American, the game might seem slow, even tedious. Another example is ASIMO, Honda's humanoid robot, which the company rents out for functions and events. The robot speaks Japanese, walks, understands voice commands, and can be directed by an operator tens of yards away using voice alone. It also knows which direction to face when a person talks to it. Honda says this is one step toward building robots of the future that "work in harmony" with people.

In Europe, and especially Northern Europe, where wireless usage rates are much higher, telecommunications companies are looking to deploy value-added, Internet-related services to retain customers and build loyalty. Apart from Web-related wireless transactions, companies are interested in applications such as a personal dialing assistant, which lets you connect calls using your voice. Businesses are also keen to deploy mobile workforce applications - for instance, letting salespeople in the field access price lists and order information and transact wirelessly, by voice or through multimodal applications. In Europe, more than anywhere else, there is pent-up demand for transactional capabilities using voice and, in the future, multimodal devices. That way, a salesperson can ask for product specifications and have the information returned as a graphic. What companies want are robust voice applications for constant business use.

The combination of voice and machine translation should not be overlooked. In the coming years, machine translation will bring an added dimension to speech as the two are coupled. Taken together, speech and language technologies are set to grow, and they have the potential to bring business value and differentiation to companies worldwide. Deutsche Bank already uses machine translation to produce gist translations of the research reports and e-mails it receives from English-speaking counterparts. A gist translation gives readers enough accuracy to understand the meaning of a document, without the polish of text edited by a human translator. The bank's staff feel they are often more productive reading the material in German than in English. Taking this further, European developers and service providers are looking at how to offer machine translation over wireless networks and devices - and what types of services the public, as well as corporations, would subscribe to.

In the U.S., businesses are voice-enabling their call centers to handle simple, repetitive tasks so that live agents can be put to more complex, value-added work. Natural language understanding (NLU) technology has helped participants in T. Rowe Price's system, for example, become receptive to using voice for simple inquiries. NLU not only allows end users to speak naturally to the system rather than being bound by menus, it also enables the system to understand context. After you ask, for instance, for the price of the Franklin Templeton Growth Fund, you can ask for the objective of "that" fund and get your question answered; a brief sketch of this kind of context carry-over appears at the end of this section.

There is no doubt that businesses here and abroad are looking to speech technology to help them do business more efficiently, save costs and offer better customer service. Speech technology has come a long way and has become a practical way to implement a range of applications. The base technology is becoming more robust, with improved algorithms and chip speeds, and with devices as powerful as the laptops of yesteryear.
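To make that notion of context carry-over a little more concrete, here is a minimal sketch, in Python, of how a dialogue manager might resolve a follow-up question such as "the objective of that fund." It is purely illustrative: the fund names, prices, intent labels and class names are invented for the example and are not drawn from T. Rowe Price's actual system or from any particular NLU product.

```python
# Illustrative only: hypothetical funds, prices and intents, not real data.
FUNDS = {
    "growth fund": {"price": 42.17, "objective": "long-term capital appreciation"},
    "income fund": {"price": 11.03, "objective": "current income with modest risk"},
}

class DialogueContext:
    """Remembers the most recently discussed fund so "that fund" can be resolved."""

    def __init__(self):
        self.last_fund = None

    def handle(self, intent, fund=None):
        # When the caller says "that fund", the recognizer supplies no fund name,
        # so we fall back on the fund remembered from the previous turn.
        fund = fund or self.last_fund
        if fund is None:
            return "Which fund do you mean?"
        self.last_fund = fund
        info = FUNDS[fund]
        if intent == "get_price":
            return f"The {fund} is priced at ${info['price']:.2f} per share."
        if intent == "get_objective":
            return f"The objective of the {fund} is {info['objective']}."
        return "Sorry, I didn't catch that."

ctx = DialogueContext()
print(ctx.handle("get_price", "growth fund"))  # "What is the price of the growth fund?"
print(ctx.handle("get_objective"))             # "What is the objective of that fund?"
```

The design point is simply that the dialogue manager carries the entity from one turn to the next, so a reference like "that" can be resolved without making the caller repeat the fund name.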
Speech will increasingly be part of critical applications, and it has the potential to become the means by which large segments of the world's population access technology naturally and easily. The industry now needs to focus on making the interface more natural and on tailoring it to specific cultural differences. Only then will speech move rapidly across cultural and national boundaries and become truly pervasive.
Ozzie S. Osborne is general manager of IBM Voice Systems.