May 21, 2002
By Judith Markowitz Principal - J. Markowitz, Consultants
Forward Thinking

Speaking in Tongues

We live in a global world where it is no longer unusual for even a small business to market its goods and services globally. Wireless networks are expanding the global reach of business by making it possible to provide telecommunications services to areas that could not have wired services. The Internet, which is one of the cornerstones of the global marketplace, is being hooked to telephones, increasingly sophisticated hand-held devices and wearable systems. One of our biggest challenges in this global marketplace is that the communications channels are becoming multilingual and, increasingly, one must think about reaching customers and partners in their languages rather than in English.

Technology and Applications
The number of speech recognition and synthesis systems that support languages other than English is growing at a rapid pace. It is no longer difficult to find vendors who offer speech recognition systems in a number of European and Asian languages. On the synthesis side, there are established parametric TTS synthesis systems covering all major European languages, and we are starting to see concatenative synthesis systems that support multiple languages as well. Although the base technology is moving forward quickly, significant challenges remain at the application level. Having a speech system capable of serving speakers of two or more languages is not simply a matter of appending additional languages to an English-language system. Furthermore, the issues that must be addressed go well beyond the core technology and are affected by cultural, political and usability concerns.

For Swahili, press…
One of the questions that a multilingual system must address is how to make users aware of the availability of more than one language. Traditionally, this question has been solved by placing a prompt at the start of the interaction, asking the user to “say” the name of a language or to “press” a touch tone key to select a language. Some systems establish one language as a default and offer other languages as alternates in the initial prompt. Sometimes the selection of a default language is politically charged, such as in Canada where companies risk angering one group of customers if they set English as the default instead of French, and vice versa.

Parlez Usted Deutsch?
Some speech recognition systems are beginning to implement language identification technology to determine the language that the user is speaking. Using language identification does not solve the problem of how to let users know which languages the system understands, but it makes the system more responsive to the language the user actually speaks. Some systems with language identification are also designed to switch seamlessly from language to language. This is useful when, for example, the wrong language is selected. Another benefit of such polyglot systems is the ability to support speakers who mix languages within and between utterances. Multilingual utterances are not unusual in geographic areas where two or more languages are spoken or where a third language, such as English or Swahili, is used as a lingua franca.
Selection of the proper language to use is only one facet of the multilingual challenge. In areas and situations where English is used as a lingua franca (as it is in many technical conferences), the acoustic models for English speech recognition need a great deal of data about how non-native speakers talk. This need is what spurred the creation of the Translanguage English Database (TED), which recently made the news when the European Language Resources Agency (ELDA) and the U.S. Linguistic Data Consortium (LDC) agreed to cooperate more closely on the distribution of TED and other spoken language databases and corpora. TED is a corpus of recordings made of conference presentations in English by non-native speakers.

Localization
Usable interfaces need to be culturally appropriate as well. Design of culturally appropriate interfaces takes into account differences in word and phrase selection, vagueness, ambiguity, formality, dialect, voice selection and many other facets that are bundled into the term “localization.” As the world moves towards globalization where non-English and multilingual systems become the norm, localization will begin shifting into the foreground. This is not to say that the challenges involved in designing culturally sensitive IVR systems are not already significant. They are, but the incorporation of highly flexible, automatic approaches to speech recognition, such as statistical processing, will bring with it far greater need to address linguistic and cultural complexity. The apparent naturalness of statistical processing makes it appear as if those systems actually understand what is being communicated. They do not. Furthermore, a statistical processing/N-gram system is only as good as the data given to it. Certainly, the fact that their models are constructed from data can make them more sensitive to cultural and linguistic miscues than manually developed IVR systems. This means that in order to have reasonably good N-gram systems, the developers need to make sure that linguistic and cultural knowledge is going into the data used to construct the system models and that such knowledge is part of other analytical elements of those systems (e.g., goal structures). Ultimately, what is needed is a true partnership of approaches: linguistic, cultural, statistical and object-oriented. That partnership can be part of the power of speech-processing solutions for the global marketplace.
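To make the point about data dependence concrete, here is a minimal sketch of an unsmoothed bigram model, assuming whitespace tokenization and maximum-likelihood estimates. The training phrases, the test phrases and the function names are hypothetical illustrations, not drawn from any deployed system, and real recognizers use smoothing and far larger corpora.

```python
from collections import Counter


def train_bigrams(sentences):
    """Count context words and bigrams over whitespace-tokenized sentences."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        contexts.update(tokens[:-1])             # every token that precedes another
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent word pairs
    return contexts, bigrams


def sentence_probability(sentence, contexts, bigrams):
    """Unsmoothed maximum-likelihood bigram probability of a sentence."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        if contexts[prev] == 0:
            return 0.0  # context never seen in training
        prob *= bigrams[(prev, word)] / contexts[prev]
    return prob


# Hypothetical training data reflecting one regional phrasing only.
TRAINING = [
    "check my account balance",
    "check my checking balance",
    "pay my bill",
]

contexts, bigrams = train_bigrams(TRAINING)

# A phrasing present in the data receives a non-zero score.
seen = sentence_probability("check my account balance", contexts, bigrams)

# A phrasing from another variety of English was never observed,
# so this unsmoothed model assigns it zero probability.
unseen = sentence_probability("top up my mobile", contexts, bigrams)

print(f"seen: {seen:.3f}  unseen: {unseen:.3f}")
```

The model has no notion that the second request is perfectly reasonable; it simply never saw those word sequences. Smoothing would soften the zero, but only data that includes the target community's phrasing would let the system handle it well.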
