Inside SRI's STAR Labs

In January, SRI International received its second U.S. patent for voice-to-database (V2DB) technology, which enables one-step, voice-based access to items in multimillion-record databases. Many people have heard of SRI, originally Stanford Research Institute, a nonprofit, independent R&D organization based in Menlo Park, Calif. However, unless you know SRI as the birthplace of Nuance Communications, which started at SRI in the early 1990s, or of Discern Communications, which was acquired by Spanlink, you might not know the depth of its research in speech technologies.

Created in SRI’s Speech Technology and Research Laboratory (STAR), the latest V2DB patent is one of about 50 patents in SRI’s portfolio. Since 1946, SRI, and eventually STAR, has been conducting client-sponsored research, licensing its technologies, forming strategic partnerships, and creating spin-off companies like Nuance and Discern.

With a staff of approximately 25 engineers, computer scientists, and linguists, STAR does research for the government and commercial sectors in all areas of speech technologies. Besides developing high-accuracy, large-vocabulary systems and speaker-independent speech recognition, STAR has emphasized some of the more vexing problems that speech technologies can solve. For instance, it is developing speech recognition for use in challenging audio environments where recognizing the intended speaker is difficult. STAR has also honed speaker identification and verification techniques for use in biometric applications. In natural language recognition, STAR projects have included pronunciation scoring for language learners, which produces pronunciation rankings that mimic the judgment of native speakers; end-of-utterance detection that relies on more human-like cues rather than on silence; and research on emotion detection in speech.

STAR also concentrates on speech recognition and automatic translation. Another focus area is acoustic modeling for specialized populations, such as non-native speakers and children. In fact, its EduSpeak software development kit, a high-accuracy, speaker-independent continuous speech recognition system for computer-based training and learning, offers outstanding accuracy for children ages 4 and up. EduSpeak has been licensed by Benesse, a Japanese corporation that creates software to help Japanese children learn English.

EduSpeak is only one of a number of technologies resulting from research projects that SRI licenses to companies wanting to incorporate speech technology into their products. Other examples include the SRI Language Modeling Toolkit, a toolkit for building and applying statistical language models, primarily for use in speech recognition, statistical tagging, and segmentation; and DynaSpeak, a small-footprint, high-accuracy, speaker-independent speech recognition engine for industrial, consumer, and military products and systems.

Current Projects
STAR currently has more than a dozen research projects in progress. One prominent example is the Global Autonomous Language Exploitation (GALE) project, the largest unclassified speech and language program ever funded by the U.S. Department of Defense’s Defense Advanced Research Projects Agency (DARPA). GALE’s goal is to produce a system that takes broadcasts and text documents, distills the pieces of information most pertinent to an inquiry, and translates them into English. The project covers Arabic, English, and Mandarin.

Another high-visibility project is IraqComm, funded by DARPA under the Spoken Language Communication and Translation System for Tactical Use (TRANSTAC) program. IraqComm is a speech-to-speech translator that enables conversations between an English speaker and an Arabic speaker, facilitating greater communication for military forces in Iraq and Afghanistan while minimizing risk to the limited number of skilled translators available.

Dozens of companies and organizations use and benefit from research done in the STAR Laboratory, fund research projects at SRI, or license the resulting technology, although the general public may not be aware of the origins of the underlying speech technologies. However, if you look at some of the more publicly visible products from spin-off companies such as Nuance and Discern, or at highly beneficial projects such as IraqComm, it is clear that STAR is doing valuable work.

Nancy Jamison is the principal analyst at Jamison Consulting. She can be reached at nsj@jamisons.com.
