One System, Many Languages
Multilanguage speech systems are not new. Nevertheless, few of them can work with more than two languages at once.
Still, many countries and regions have populations speaking three or four languages, often including growing immigrant communities that financial institutions, government services, and other organizations need to accommodate.
One example is my native Israel, with a population of 7.5 million people. Though Hebrew is the national default language, at least 1.5 million people speak Arabic, 1 million speak Russian, and English speakers also form a community. Most self-service commercial and government systems support these four languages, if not more.
But the truth is that not many multilanguage speech systems exist. In the United States, we find mainly English/Spanish systems. In Europe, we might expect to find speech services in multiple languages, but that isn't the case: few speech systems work with more than two languages. Call centers today still need to hire multilingual customer service representatives.
Further examination of touch-tone-based systems with multilanguage service shows that 80 percent of the interaction is done in the default language, and the remaining 20 percent is divided among the others.
The question is whether this split holds only for touch-tone interaction, or whether it would also hold when callers must actually speak the default language, the well-known difference between competence and performance. It seems that call center operators do not take chances with this issue.
Perhaps a way to answer that is to look at the challenges of designing and developing a multilingual system, from both the technological and the user interface aspects. In terms of system components, the acoustic model can sometimes be adjusted to accommodate similar languages, but the language models, grammars, and lexicons are entirely different. Hebrew, English, Arabic, and Russian could hardly be more different in structure, acoustic and language modeling, and cultural tendencies.
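To make that separation concrete, here is a minimal sketch in Python of how per-language recognition resources might be declared. All names here are hypothetical, not any particular vendor's API; the shared acoustic model for the two Semitic languages merely illustrates the kind of adjustment mentioned above.

```python
from dataclasses import dataclass

@dataclass
class LanguagePack:
    """Recognition resources needed for each language."""
    code: str            # language tag
    acoustic_model: str  # can sometimes be shared across similar languages
    language_model: str  # entirely language-specific
    grammar: str         # entirely language-specific
    lexicon: str         # entirely language-specific

# Hypothetical resource names, for illustration only.
PACKS = {
    "he-IL": LanguagePack("he-IL", "am_semitic", "lm_he", "gram_he.grxml", "lex_he"),
    "ar-IL": LanguagePack("ar-IL", "am_semitic", "lm_ar", "gram_ar.grxml", "lex_ar"),
    "ru-RU": LanguagePack("ru-RU", "am_slavic", "lm_ru", "gram_ru.grxml", "lex_ru"),
    "en-US": LanguagePack("en-US", "am_english", "lm_en", "gram_en.grxml", "lex_en"),
}
```

Only the acoustic model column can ever be shared; every other column must be built four times over.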
From a technology standpoint, Israel needs to support four different applications on a single platform, assuming the same engine supports them all. Even if such a speech recognition engine exists, a text-to-speech engine to complement it across all four languages is nowhere in sight. That means you have to record your prompts and data, and if you need to splice a prompt together with dynamic data, you have a quality problem.
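Consider, for example, reading back an account balance. Without TTS, the application must stitch a recorded carrier prompt to recorded data segments, and every splice point is a potential audible seam. A minimal sketch, with hypothetical file paths and a deliberately simplified digit-by-digit reading (natural number grammar differs per language, which is exactly the difficulty):

```python
def balance_prompt_segments(lang: str, amount: int) -> list[str]:
    """Build a playback queue of pre-recorded audio files: a carrier
    prompt plus one recorded segment per digit. Each boundary between
    files is where the quality problem can be heard."""
    segments = [f"prompts/{lang}/your_balance_is.wav"]
    segments += [f"prompts/{lang}/digits/{d}.wav" for d in str(amount)]
    segments.append(f"prompts/{lang}/currency.wav")
    return segments

# balance_prompt_segments("he-IL", 350) ->
# ['prompts/he-IL/your_balance_is.wav', 'prompts/he-IL/digits/3.wav',
#  'prompts/he-IL/digits/5.wav', 'prompts/he-IL/digits/0.wav',
#  'prompts/he-IL/currency.wav']
```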
In designing these four applications, one may ask how different they really are from one another. Clearly they are not very different on the technological side, but they are on the user interface side. Parts of the application can be replicated, but they definitely need to be redesigned with a new interface. To maintain the quality of the existing interaction, rich grammars with broad enough coverage must be designed for each language. This requires a skilled linguistic team experienced in many languages, and then not only does the project's complexity grow, so do the costs.
Ideally, we would deploy multilanguage speech systems with all languages at once, so customers are never confused about which languages are supported.
But our responsibility as designers is to make sure complexity and costs do not stop projects from happening. We need to find a way to accommodate customer needs in the best way possible while justifying the project's cost. I would suggest the following steps:
- Set priorities. Discover which language covers the highest percentage of use and treat that as the top-priority application.
- Research your customers and their cultural sensitivities. Perform usability tests to identify their behaviors when they need to interact in the default language. Conduct focus groups to check their reactions to the set of priorities that was established.
- Plan according to ROI. Choose a gradual path: first justify the cost by serving the 80 percent of the population the default language covers, then add the other languages in order of their expected return. This way, part of the investment is recovered before the next step begins.
- Tune. Dividing the project has additional value: you can examine behaviors and results for the first one or two languages, then tune and modify the system accordingly before adding the others.
- Design a flexible platform. Allow the system to grow gradually and transparently, so new language modules can be added in an integrative fashion (a minimal sketch of such a modular design follows this list).
- Involve marketing in the process. Create synergy between the needs of the customer and those of the organization.
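To sketch what the flexible-platform step could look like in code, reusing the hypothetical LanguagePack and PACKS definitions from the earlier sketch: a simple registry lets language modules be deployed one phase at a time, without touching the core call flow.

```python
class SpeechPlatform:
    """Core platform; language modules are registered incrementally."""

    def __init__(self, default_lang: str):
        self.default_lang = default_lang
        self.modules: dict[str, LanguagePack] = {}

    def register(self, pack: LanguagePack) -> None:
        """Add a language module without changing the core call flow."""
        self.modules[pack.code] = pack

    def pack_for(self, requested: str) -> LanguagePack:
        """Serve the requested language if deployed, otherwise fall
        back to the default language."""
        return self.modules.get(requested, self.modules[self.default_lang])

# Phase 1: deploy only the top-priority language and recover some ROI.
platform = SpeechPlatform("he-IL")
platform.register(PACKS["he-IL"])
# Later phases add modules as each one's ROI is justified:
# platform.register(PACKS["ru-RU"]), and so on.
```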
The challenge presented is not a simple one, but meeting it is one more step in the evolution of speech systems, an evolution driven by growing demand for service and by the importance of the customer experience.
Nava Shaked, Ph.D., leads a professional practice focusing on voice applications. She is a member of the AVIOS board and established the Israeli local chapter, of which she is the chairperson. She can be reached at nava@navashaked.com.