
Deepgram Launches Aura-2 Text-to-Speech Model


Deepgram, a voice artificial intelligence platform provider, has launched Aura-2, its next-generation text-to-speech (TTS) model for real-time voice applications in mission-critical business environments.

Engineered for clarity, consistency, and low-latency performance, and deployable via cloud or on-premises APIs, Aura-2 enables developers to build scalable, human-like voice experiences for automated interactions, including customer support, virtual agents, and AI-powered assistants. It is built on Deepgram's Enterprise Runtime infrastructure.
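For developers, the hosted path is a single HTTP call to Deepgram's `/v1/speak` endpoint, which accepts the text to synthesize and returns audio. The sketch below builds such a request in Python; the specific Aura-2 voice identifier (`aura-2-thalia-en`) and the sample text are assumptions for illustration, so check Deepgram's API reference for the current model list.

```python
import json
import urllib.request

# Deepgram's hosted TTS endpoint. The voice/model name "aura-2-thalia-en"
# is an assumption for illustration -- consult the docs for available voices.
DEEPGRAM_TTS_URL = "https://api.deepgram.com/v1/speak?model=aura-2-thalia-en"


def build_tts_request(text: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a synthesis request for the given text."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        DEEPGRAM_TTS_URL,
        data=payload,
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Example: numerals go in as plain text, with no special tagging.
req = build_tts_request("Your claim number is 4 0 2, 1 1 7.", "YOUR_API_KEY")
# urllib.request.urlopen(req) would return the synthesized audio bytes.
```

Sending the request (via `urllib.request.urlopen` or any HTTP client) streams back the synthesized audio, which can be written to a file or piped into a telephony or agent pipeline.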

With Aura-2, Deepgram extends its enterprise speech technology to TTS, enabling businesses to deliver natural, responsive, and contextually accurate conversations at scale. It ensures precise handling of industry terminology, accurately pronouncing healthcare terms, financial jargon, product names, and complex numerals without special tagging.

Aura-2 includes more than 40 voices spanning U.S. English and localized accents, with personas ranging from empathetic and charismatic to calm and professional. It intelligently adjusts pacing, pauses, tone, and expression based on context, whether delivering a phone number, handling a support escalation, or navigating a transactional interaction. It also supports thousands of concurrent requests while maintaining consistently low latency and high-quality speech output across high-volume deployments.

"Our customers need more than just voices that sound good; they need voices that communicate precisely and reliably in professional contexts," said Scott Stephenson, CEO of Deepgram, in a statement. "Aura-2 delivers the perfect balance of natural speech and enterprise-grade accuracy, enabling organizations to create voice experiences that truly enhance customer engagement while maintaining operational efficiency."

Key capabilities of Aura-2 include the following:

  • Automated model adaptation that continuously improves performance through high-value data curation, synthetic data generation, and automated training, allowing speech models to evolve over time.
  • Model hot-swapping that enables instant model changes in production, supporting real-time personalization and rapid iteration.
  • Extreme lossless compression that significantly reduces compute load and operational costs without compromising quality.
  • Flexible deployment, with support for public cloud, private cloud, and on-premises environments.
  • Interruption handling and end-of-thought detection that support dynamic, overlapping speech patterns.

By running on the same enterprise runtime that powers Deepgram's Nova-3 for speech recognition and the Voice Agent API for conversational AI, Aura-2 benefits from shared learning, unified deployment, and a seamless developer experience. This unified architecture enables continuous cross-model learning, where improvements in speech recognition automatically enhance speech synthesis through the shared runtime. As the platform learns and adapts to specific industry terminology and user interactions, it transforms isolated voice components into a cohesive voice AI platform that strengthens with every interaction.

"Our years developing Nova-3 and other STT models gave us deep insight into real-world speech patterns," said Natalie Rutgers, vice president of product at Deepgram, in a statement. "With the Enterprise Runtime, Aura-2 directly leverages our acoustic models and pronunciation datasets to deliver precise, industry-specific speech synthesis in real time."
