The 2017 Speech Industry Star Performers: rSpeak
rSpeak Speaks with Greater Accuracy
Though rSpeak Technologies is based in the Netherlands, its family of embedded and server- and cloud-based text-to-speech (TTS) applications hails from around the world. As the company adds voices and languages, one of its voices, Sophie (a female U.S. English voice), recently earned the highest overall score against some of the market’s top TTS technologies in independent testing by Voice Information Associates on a corpus of nearly 1,600 phrases. The test corpus included numbers, homographs (two or more words spelled the same but not necessarily pronounced the same and having different meanings and origins), words of foreign origin, acronyms, abbreviations, proper names, and addresses. Of the 12 products tested, rSpeak’s proved the most accurate, at 98.6 percent. The average score across all products tested was just 79.2 percent.
rSpeak leverages a unique combination of statistical parametric speech synthesis using deep neural networks, voice adaptation, text normalization, lexical look-up, grapheme-to-phoneme modeling, prosody modeling, manuscript creation, voice talent recording, and acoustic database creation to develop its premium TTS voices. The company’s research and development efforts have resulted in advanced expertise in deep neural networks, deep learning for speech analysis, prosody models, and pronunciation modeling. And its TTS technology keeps learning, even after it is launched, thanks to an extremely agile feedback and update system.
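To make the front-end steps named above more concrete (text normalization, lexical look-up, and grapheme-to-phoneme modeling), here is a minimal Python sketch of how such stages are commonly chained. It is not rSpeak's implementation: the abbreviation table, the tiny lexicon, the phoneme symbols, and the function names `normalize` and `to_phonemes` are all hypothetical, and the letter-by-letter fallback merely stands in for a trained grapheme-to-phoneme model.

```python
import re

# Toy tables, assumed for illustration only. Production systems use large
# lexicons and context-aware rules (e.g. "St." as "street" vs. "saint").
ABBREVIATIONS = {"dr.": "doctor", "st.": "street"}
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]
LEXICON = {
    "doctor": ["D", "AA", "K", "T", "ER"],
    "street": ["S", "T", "R", "IY", "T"],
    "four": ["F", "AO", "R"],
    "two": ["T", "UW"],
}


def normalize(text):
    """Text normalization: expand abbreviations and digits into plain words."""
    words = []
    for raw in text.lower().split():
        if raw in ABBREVIATIONS:
            words.append(ABBREVIATIONS[raw])
        elif re.fullmatch(r"[0-9]+", raw):
            # Read numbers digit by digit; a real normalizer would also
            # handle cardinals, dates, currency, and so on.
            words.extend(DIGITS[int(d)] for d in raw)
        else:
            cleaned = re.sub(r"[^a-z']", "", raw)  # strip punctuation
            if cleaned:
                words.append(cleaned)
    return words


def to_phonemes(word):
    """Lexical look-up first; fall back to a placeholder G2P step."""
    if word in LEXICON:
        return LEXICON[word]
    # Placeholder for a trained grapheme-to-phoneme model:
    # emit one pseudo-phoneme per letter.
    return [ch.upper() for ch in word if ch.isalpha()]


words = normalize("Dr. Smith lives at 42 Elm St.")
phonemes = [to_phonemes(w) for w in words]
```

The ordering is the point of the sketch: normalization runs first so that the lexicon only ever sees ordinary words, and the grapheme-to-phoneme step is consulted only for words the lexicon does not cover, such as an unfamiliar proper name.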
The accuracy also comes from rSpeak’s use of a technique called unit selection synthesis. The resulting speech sounds very natural, but the technique requires recording many hours of speech with a professional speaker, which is costly. And to avoid glitches at points where speech units are pasted together, the speech is recorded in a very neutral speaking style with little variation in pitch.
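Unit selection is usually described as a search: for each sound to be spoken, choose one recorded snippet so that each snippet fits its target and neighboring snippets join smoothly. The Python sketch below illustrates that general idea with a dynamic-programming search over a toy database. It is a textbook-style simplification under stated assumptions, not rSpeak's engine: the `Unit` fields, the use of average pitch as the only feature, and the cost functions are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    unit_id: str     # hypothetical ID of a snippet in the recording database
    phone: str       # phoneme label the snippet realizes
    pitch_hz: float  # average pitch; the only feature in this toy model


def target_cost(unit, desired_pitch_hz):
    """How far the snippet is from what the sentence calls for."""
    return abs(unit.pitch_hz - desired_pitch_hz)


def join_cost(left, right):
    """Pitch jump at the seam; large jumps are heard as glitches."""
    return abs(left.pitch_hz - right.pitch_hz)


def select_units(targets, database, join_weight=1.0):
    """targets: list of (phone, desired_pitch_hz) pairs.

    Returns (best unit sequence, total cost) via dynamic programming.
    """
    lattice = [[u for u in database if u.phone == phone] for phone, _ in targets]
    if any(not candidates for candidates in lattice):
        raise ValueError("no recorded unit for at least one target phone")

    # best[i][j] = (cheapest cost of a path ending in lattice[i][j], backpointer)
    best = [[(target_cost(u, targets[0][1]), None) for u in lattice[0]]]
    for i in range(1, len(targets)):
        row = []
        for unit in lattice[i]:
            cost, back = min(
                (best[i - 1][k][0] + join_weight * join_cost(prev, unit), k)
                for k, prev in enumerate(lattice[i - 1])
            )
            row.append((cost + target_cost(unit, targets[i][1]), back))
        best.append(row)

    # Trace back from the cheapest final unit.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(lattice[i][j])
        j = best[i][j][1]
    path.reverse()
    return path, total


database = [
    Unit("h1", "h", 120.0), Unit("h2", "h", 180.0),
    Unit("a1", "ay", 125.0), Unit("a2", "ay", 200.0),
]
path, total = select_units([("h", 150.0), ("ay", 150.0)], database)
```

In this toy database, the two snippets with similar pitch ("h1" and "a1") are chosen because the seam between them is nearly inaudible. The same logic suggests why a neutral recording style helps: when pitch varies little across the database, almost any pair of units joins cheaply.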
rSpeak’s 11 TTS voices cover English (U.S., U.K., and Australian), Spanish, French, German, Dutch, Swedish, and Italian, and can give a voice to online or offline content, multimedia applications, and embedded or mobile devices and platforms.
rSpeak offers its technologies as cloud services or application programming interfaces (APIs), including a Streaming Web API that allows users to access the voices in their web or mobile apps without bundling the full TTS engine. It also offers an online audio production environment and software development kits for the most popular platforms.
rSpeak Technologies was spun off from ReadSpeaker in 2012, though both companies remain part of the same family: rSpeak’s voices are part of ReadSpeaker’s multi-vendor language offering. ReadSpeaker most recently adopted rSpeak’s Italian and Spanish voices.