NVIDIA Releases RAD-TTS

At the Interspeech 2021 conference last week, NVIDIA introduced RAD-TTS, a technology that lets individuals use their own voices to train artificial intelligence systems on pacing, tone, pitch, and other vocal qualities. With RAD-TTS, users can also deliver one speaker's words in another person's voice.

RAD-TTS also lets users go frame by frame to fine-tune the synthesized voice: emphasizing or de-emphasizing specific words, modifying the pace of the narration, altering the pitch, and more.

"With this interface, our video producer could record himself reading the video script and then use the AI model to convert his speech into the female narrator's voice. Using this baseline narration, the producer could then direct the AI like a voice actor — tweaking the synthesized speech to emphasize specific words, and modifying the pacing of the narration to better express the video’s tone," an NVIDIA executive wrote in a recent blog post.

NVIDIA is distributing RAD-TTS as open source through its NeMo Python toolkit.

