April 1, 2007
By Leonard Klie, Editor, Speech Technology and CRM magazines
Features

The Art and Science of War

Nearly four years after U.S. troops first entered Iraq, very few soldiers, sailors, airmen, and marines speak Iraqi Arabic or any of the other native languages and dialects spoken by most people there.

U.S. Department of Defense efforts to step up language training for service members have produced few results, prompting the Defense Advanced Research Projects Agency (DARPA), based in Alexandria, Va., to look to technological fixes instead. Because there simply aren't enough human translators to go around, DARPA reportedly has spent between $15 million and $20 million a year over the past five years to develop and field mobile translator technology under the umbrella of the Spoken Language Communication and Translation System for Tactical Use (TRANSTAC) and the Global Autonomous Language Exploitation (GALE) initiatives.

"We are short on translators, and sometimes you can't trust the locals to translate for us because of the possibility of conflicting loyalties," explains Dr. Mari Maeda, program manager of the Information Processing Technology Office at DARPA. "Also, it's an extremely dangerous job for locals to work for the U.S. military in Iraq, so not many people want to do it."

As a result, some members of the U.S. military stationed there are being equipped with voice-enabled translators that allow them to communicate with the locals—both friendly and hostile. With these devices, an average GI can speak commands, greetings, or hundreds of other English phrases into a headset or find them among an extensive list of entries in a preprogrammed, scroll-through menu; the device then automatically translates the phrase into any number of languages and either displays it on a screen or repeats it through an integrated speaker system.
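The phrase-matching behavior described above can be illustrated with a minimal sketch. It is not the Phraselator's or the VRT's actual implementation: the phrase table, the placeholder outputs, and the function names are all hypothetical, and fuzzy text matching from Python's standard `difflib` module stands in for the speech recognizer.

```python
import difflib

# Hypothetical phrase table: English phrase -> {language: prerecorded output}.
# The values are placeholders, not real translations or recordings.
PHRASES = {
    "stop": {"iraqi_arabic": "<recording: stop>"},
    "show me your identification": {"iraqi_arabic": "<recording: show id>"},
    "do you need medical help": {"iraqi_arabic": "<recording: medical help>"},
}


def match_phrase(heard, cutoff=0.6):
    """Return the closest preprogrammed phrase, or None if nothing is close."""
    normalized = heard.lower().strip()
    candidates = difflib.get_close_matches(
        normalized, list(PHRASES), n=1, cutoff=cutoff
    )
    return candidates[0] if candidates else None


def translate(heard, language):
    """Look up the prerecorded output for the best-matching phrase."""
    phrase = match_phrase(heard)
    if phrase is None:
        return None  # outside the preprogrammed phrase set
    return PHRASES[phrase].get(language)
```

Anything that does not resemble a stored phrase returns `None`, which mirrors the limitation of a fixed phrase set: the device can only play back what was recorded in advance.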

Among the most widely distributed devices currently in use are the Phraselator from VoxTec International and the Voice Response Translator (VRT) from Integrated Wave Technologies. About 5,000 Phraselators and 1,500 VRTs are in use in Iraq, Afghanistan and the Pacific Rim right now among Army, Navy, Coast Guard, and Marine Corps forces. Phraselators were first deployed with U.S. forces in Afghanistan in the spring of 2002.

The devices have proven "very useful for the military in combat situations; they also help in rolling operations and force protection," says Timothy McCune, president of Integrated Wave Technologies. "They remove pulling the trigger as the first option. They can eliminate a lot of the misunderstandings that can occur so that a young soldier does not have to shoot first."

The technology got its start in law enforcement. The military climbed on board when several members of Congress saw the equipment in action among police departments and authorized funding for military development.

The devices also have found a home in the construction, healthcare, airline, and airport security industries. A number of Native American Indian tribes are even using the devices to preserve their native languages and as teaching aids for future generations (see sidebar).

"It is highly adaptable for paramedics, hospital triage, retail stocking or other situations and trains well with speakers who may have serious accents or speech impediments," says Capt. Kenneth Pence of the Nashville, Tenn., Police Department, one of the first users of Integrated Wave Technologies' VRT system.

"With the troops is where we see the most need right now," says John Hall, president of VoxTec.

The military version of VoxTec's Phraselator can be programmed for up to 30 language choices and features up to one gigabyte of storage capacity to hold about 30,000 phrases. The VRT can be programmed to store thousands of phrases in up to 125 languages and dialects, and also contains 15 minutes of prerecorded cultural awareness information for Iraq. The prerecorded phrases are contained within subdirectories covering seven mission types, including combat, medical, intelligence-gathering, and humanitarian aid. "The device is tailored for where and how it will be used. It can pretty much get a member of the military through whatever he would need to do," McCune says.

But before the devices could be approved for military use, they had to go through extensive design and redesign to meet certain criteria. Background noise reduction was one of the major criteria.

The VRT is able to work even with background noise levels of up to 100 decibels. To do so, though, it must be trained on one user's voice. "It has to be user-dependent," McCune says. "It has to latch onto one speaker's voice because he will invariably be in noisy situations with a lot of gunfire and people talking and yelling."

Phraselators are user-independent. They rely on Nuance Communications' VoCon 3200 speech recognition engine, can accommodate an unlimited number of male or female voices, and don't have to be trained to recognize just one. They use directional microphones that block out background noise.

The devices also had to be lightweight and small enough to fit on a servicemember's belt, yet rugged enough to handle the harshest of battlefield environments, terrains, and climates.

Both the Phraselator and VRT were originally designed as handheld devices using touch screens, but Integrated Wave Technologies has since redesigned its devices to operate hands-free and eyes-free using an integrated microphone and external speaker. For a soldier on a hostile mission, that means he doesn't have to take his eyes off the enemy or his hands off his weapon.

While the Phraselator is still a handheld device, VoxTec will begin production later this month of a hands-free version called the Squid.

So far, the equipment has received positive reviews from the U.S. troops using it. "The VRT system is one thing that I think every marine should carry. I am an important part of raids and cordon knocks because I can communicate to the people," says Lance Cpl. Josh Noble of Marine Expeditionary Force 3/2 Lima, one of many units stationed in Iraq.

While Integrated Wave Technologies promises between 95 percent and 100 percent accuracy with its VRT, "I am confident that if you speak articulately and stay within the domains for which the software was designed, you can get about ninety percent reliability," DARPA's Maeda says. "The problem is that it's preprogrammed with certain phrases, so you can't have free-flowing conversations. A soldier can't just say what he wants and [the device] can't understand what the Iraqi is saying back." Additionally, the devices "need to take better advantage of context-building so they can better address conversations in specific situations," she adds.

Videogame Training

U.S. service members are getting plenty of help in that regard with another piece of speech-enabled technology that is also being developed and fielded with funding from DARPA. Thousands of troops at U.S. training bases all over the world have been using a highly interactive 3-D videogame that simulates real-life social interactions involving not only spoken dialogues but also cultural protocols, non-verbal gestures and norms of politeness and etiquette. Developed by Alelo, the Tactical Language & Culture Training System uses computational models of language, culture and learning; artificial intelligence-based psychological situations that dynamically control the unscripted behaviors of the game's animated characters; and contextual, speaker-independent speech recognition.

The game has no shooting. Instead, trainees advance through the game's levels by completing assigned tasks with help from the game's Iraqi characters. If the user communicates properly, taking into account all the needed social and language skills, the game's characters cooperate and provide the answers needed to advance. Users speak to the game's characters through headset microphones and get immediate feedback from the game in real time.

"The game is designed to make the troops functionally communicative, able to build a rapport with the Iraqis and have meaningful communications," explains Richard Koffler, president and CEO of Alelo. "It engages them in the language rather than having them just parrot it back."

The game has also been used to train U.S. troops in Pashto, one of the main languages spoken in Afghanistan.

Going Both Ways

Beyond the Battlefield and Base Camp
Translators are being used on the homefront to preserve native languages

Speech technology originally developed to help members of the U.S. military communicate with the locals in war-torn regions of the Middle East is being used by Native American Indian tribes throughout the western and Midwestern United States, Alaska, North Carolina, and western Canada to store their native languages as a teaching aid for future generations.

About 50 tribes are currently using the Phraselator, a speech-enabled, hand-held electronic translation device from VoxTec International, to store thousands of phrases from their native languages. The devices can also be used to store complete songs, stories, and prayers in their entirety in the native languages, and deliver them in CD-quality sound.

With the devices, users speak into a microphone and the device produces a prerecorded translation of the phrase. Users can also search for specific words or phrases using voice, touch screen, or scroll-down menus.

For the Native American tribes that are using the technology, the Phraselator offers a way to prevent their languages from becoming extinct, as a good number already have. "About 500 years ago, before the first European settlers came to North America, there were about 300 different Native American languages spoken by tribes across the continent," explains Don Thornton, an Oklahoma Cherokee whose California-based company, Thornton Media, is working with the tribes to help them control their own language destiny. "Today, there are about 200 left, and given the age of those who still speak them, there might be only 20 or so left in about 20 years. That's how close to the brink many of these languages are." Tribes that are using Phraselators are developing language programs for the classroom, for one-on-one instruction, and for use around the home. Many are training language apprentices, whose job it will be to teach the languages to future generations.

Wayne Wells, a member of the Prairie Island Dakota tribe in Minnesota, is filling that role for his people. He teaches his native language to a core group of students, ages 5 to 15, for about an hour every Tuesday night. He just signed on to begin using Phraselators in class.

Among the Prairie Island Dakota and other tribes, Thornton and tribal officials work with older tribal members who still speak the languages to program the Phraselators and record the native phrases onto the devices. Most people can generally input about 500 to 800 phrases per day, right at their kitchen tables, using little more than a computer and scrolling through the device's preprogrammed menus with a touch screen or toggle button.

"We started programming it in the morning and I was already using it that night in class," says Wells, whose first experience with his native tongue came when he took a class on it at the University of Minnesota. "My goal is to have my students able to teach others by the time they get to college. They already know a lot more of their language than I did at their age," he says.

To input the language into the devices, Thornton Media provides some general content and it is up to the individual tribes to program the rest as they go along. The company has created a set of training materials specifically for this purpose, and has even drafted a set of 1,000 phrases that it feels should be included on any device. "You'll never be able to put an entire language on the system, so you have to supplement it with other written and recorded materials," Thornton maintains.

For tribal leaders, using the Phraselator does present its share of problems, the greatest of which is cost. Units sell for about $3,300 apiece, and many tribes are struggling financially. And while it would be easier, and certainly cheaper, for groups of tribes from the same region to get together on a language project, most tribes and languages don't work that way. "The challenge that tribes have is that there may be hundreds of small reservations, and each one can virtually have a different language," Thornton says. "Sometimes a single tribe can have three or four dialects. They may speak the same language, but they speak it differently, and each tribe is very protective of those little differences.

"When dealing with language revitalization, all the little nuances are important, and there's no way anyone would move forward without protecting them," he adds.

Wells agrees, noting that there are many Dakota tribes in Canada, Minnesota, North Dakota and South Dakota, and each has its own dialects and cultural differences. "We really wanted to preserve our specific dialect of the larger Dakota dialect," he says. "We also wanted to fuse our ancient language with the new, contemporary terms within the language."

Another difficulty with preserving many Native American languages is that often the spoken and written words don't match phonetically. Sentence structure and grammatical constructions are also quite different from English. That's why it's important for students of the languages to hear the phrases through the Phraselators with a human voice. "It definitely helps in the training to have a familiar voice," Wells maintains.

"There's been a lot of success with the tribes, and [Phraselators] are really taking a hold in the Native American community," says John Hall, president of VoxTec International.

"Reviving a language is not an easy thing to do, and many linguists think it's impossible," Thornton says, "but a lot of tribes are committed to doing it and the young people are taking a real interest. Tribes that take steps now have a better chance at success. It only becomes harder the longer you wait."

"The Phraselator won't save our language, but it will be a critical tool in preserving it," Wells says.

VoxTec and Integrated Wave Technologies, as well as a few other competitors, are also working with DARPA to develop two-way translators that will also convert foreign phrases back into English. Among others in this market are IBM and SRI International, both of which are currently field-testing advanced two-way speech-to-speech translation software with the U.S. Joint Forces Command in Iraq. IBM's solution, which it first shipped to the Middle East in October, is called Multilingual Automatic Speech-to-Speech Translator (MASTOR), and it can recognize and translate a vocabulary of more than 50,000 words in English and 100,000 words in Iraqi Arabic. SRI's solution, IraqComm, has been in the field since last spring and features a vocabulary of nearly 40,000 English words and 50,000 Iraqi Arabic words.

Both IraqComm and MASTOR integrate automatic speech recognition, machine translation, natural language understanding, and text-to-speech synthesis technologies. To initiate the dialogue, users speak into an integrated microphone; the software then recognizes and translates the speech and vocalizes the translation in the target language. The foreign language speaker can also speak into the microphone in his own language and have it translated and vocalized into English. Translations can also appear on screen as text.
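The recognize, translate, and vocalize sequence described above can be sketched as a simple pipeline. This is an illustration only, not IBM's or SRI's design: the class, the stage signatures, and the stub functions are hypothetical stand-ins for real speech recognition, machine translation, and text-to-speech components.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class SpeechToSpeechPipeline:
    """Chains three hypothetical stages: recognition, translation, synthesis."""
    recognize: Callable[[bytes, str], str]      # (audio, source language) -> text
    translate: Callable[[str, str, str], str]   # (text, source, target) -> text
    synthesize: Callable[[str, str], bytes]     # (text, language) -> audio

    def run(self, audio: bytes, source: str, target: str) -> Tuple[str, bytes]:
        text = self.recognize(audio, source)
        translated = self.translate(text, source, target)
        speech = self.synthesize(translated, target)
        # Return the text as well, since translations can also be shown on screen.
        return translated, speech


# Stub stages so the sketch runs end to end; real engines would replace these.
def stub_recognize(audio: bytes, language: str) -> str:
    return audio.decode("utf-8")  # pretend the audio bytes are already text


def stub_translate(text: str, source: str, target: str) -> str:
    return f"[{source}->{target}] {text}"  # tag the text instead of translating


def stub_synthesize(text: str, language: str) -> bytes:
    return text.encode("utf-8")  # pretend the encoded text is audio


pipeline = SpeechToSpeechPipeline(stub_recognize, stub_translate, stub_synthesize)
```

Because each stage takes the language as an argument, the same pipeline can serve both directions of a conversation by swapping the source and target arguments in `run`.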

The two software solutions can run on desktop, laptop, or tablet PCs. The solutions currently being piloted in Iraq are running on ruggedized laptops.

In developing MASTOR, which started in 2001, IBM did a lot of work to effectively mitigate the impact of speech recognition errors and non-grammatical inputs common in conversational speech. MASTOR's innovations include methods that automatically extract, store, and act upon the most likely meaning of the spoken phrase; methods for statistical natural language generation of that phrase into the target language; generation of proper inflections by filtering hypotheses with a statistical language model; and algorithms that provide high accuracy in noisy environments.

IraqComm's innovations include an archiving capability that allows conversations to be logged for later study or review, an interface that provides a shortcut menu of frequently used phrases, and the ability to change the phrase to be translated by editing it with the keyboard or by selecting a similar-sounding phrase with a single button press. In addition to SRI's own DynaSpeak voice recognition technology, IraqComm uses technology components from Language Weaver and Cepstral.

DARPA's goal in piloting MASTOR and IraqComm is to capture user feedback and use the operational test results to provide input on automated translation system requirements and performance expectations. "Our goal is to enable mobile units operating in areas where human interpreters are scarce to communicate effectively with speakers of different languages in real-world tactical situations," Wayne Richards, deputy branch chief, U.S. Joint Forces Capabilities Division, said in a statement. "Additionally, USJFCOM and DARPA seek to significantly improve our nation's capability in emerging languages and dialects to include low-density languages in other potential conflict areas."

Despite early success with the software, though, full two-way translation capabilities are probably a few years away. "To build a reliable two-way device, we'll need a lot of data: English and Iraqi speech patterns, phrases, dialects, situations, etc. We're making real progress, and I am confident that we'll have a usable two-way device in about five years' time," DARPA's Maeda says.

The technology "still needs to be fine-tuned," Hall adds. "There are a few things in prototype right now, but most of them run on laptops, and we're all working with DARPA to scale them down, maybe on something like a PDA."

Other devices also exist that will translate from Arabic into English. Fonix Speech and Brilliant Systems, for example, have already begun marketing to Middle Easterners a hand-held speaking dictionary, encyclopedia, and translator. The device, the 7700 Super, is powered by Fonix' DECtalk text-to-speech engine and allows users to hear spoken definitions of Arabic words in English or French. Users select the TTS voice from among prerecorded male, female, or children's voices, and can alter the pitch and speed of the voice as well.

The device features prerecorded Arabic words and phrases, as well as general, scientific, medical and economic dictionaries, a thesaurus, electronics and computer references, trading and banking references, geography references, an information manager, and even a few games.

At What Cost?

In fielding such equipment, though, the U.S. military in general and DARPA in particular have met with harsh criticism. Such devices can be expensive, as a single unit can cost up to $3,300.

Critics have argued that the money could be better spent on language training for the troops. Maeda would be the first to agree, and notes that service members have been taking language classes at the Defense Language Institute and are using language training software like Alelo's Tactical Language & Culture Training System. She also says it's hard to predict where U.S. troops will be called upon to serve next, and therefore, critical language needs can change at a moment's notice. "We have to have translation services in place right away. We can't wait two years" for the troops to learn the language before deploying there.
