Meta Develops Speech-to-Speech Translation for Oral Language
Meta last week introduced an AI-powered speech-to-speech translation system for Hokkien, a primarily oral language spoken within the Chinese diaspora.
The translation system is part of Meta's Universal Speech Translator project, which is developing AI methods to allow real-time speech-to-speech translation across many languages.
Since Hokkien doesn't have a standard written form, producing transcribed text as the translation output doesn't work, so Meta focused on speech-to-speech translation. The company developed a variety of methods, such as speech-to-unit translation, which converts input speech into a sequence of discrete acoustic units and then generates waveforms from those units. It also leveraged text from a related language, in this case Mandarin Chinese.
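The speech-to-unit idea can be illustrated with a toy sketch: quantize each input frame to its nearest entry in a codebook of discrete units, then have a stand-in "vocoder" map the unit sequence back to waveform samples. This is purely illustrative; the codebook values and function names below are hypothetical, not Meta's actual model.

```python
# Toy illustration of a speech-to-unit pipeline (hypothetical; not Meta's model).
# A real system learns the unit codebook and uses a neural vocoder.

CODEBOOK = [0.0, 0.25, 0.5, 0.75, 1.0]  # pretend acoustic-unit centroids

def speech_to_units(frames):
    """Quantize each speech frame to the ID of its nearest codebook unit."""
    return [min(range(len(CODEBOOK)), key=lambda i: abs(CODEBOOK[i] - f))
            for f in frames]

def units_to_waveform(units):
    """Stand-in 'vocoder': map discrete unit IDs back to waveform samples."""
    return [CODEBOOK[u] for u in units]

frames = [0.1, 0.6, 0.9]            # pretend input speech frames
units = speech_to_units(frames)     # discrete acoustic units, e.g. [0, 2, 4]
waveform = units_to_waveform(units) # reconstructed samples
```

The key point is that translation operates on the discrete unit sequence rather than on text, which is what makes the approach viable for a language without a standard orthography.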
The Hokkien translation model is still a work in progress and can translate only one full sentence at a time.
Alongside the model, Meta released SpeechMatrix, a large collection of speech-to-speech translations mined with LASER, its natural language processing toolkit. These resources will enable other researchers to build their own speech-to-speech translation systems.