-->

Deepgram Introduces Nova-3 Speech-to-Text Model


Deepgram, a voice artificial intelligence platform provider, has launched the Nova-3 speech-to-text (STT) model for AI-driven transcription. Deepgram's platform and high-performance runtime include automation and data capabilities, such as synthetic data generation and model curation, along with model hot-swapping and integrations.

Nova-3 uses a latent space architecture that encodes complex speech patterns into a representation designed to improve transcription accuracy, even in noisy or specialized settings. Its capabilities cover the following:

  • Adverse acoustic conditions, like distant, noisy, and multi-speaker scenarios.
  • Real-time multilingual support.
  • Industry-specific and domain-specific terminology for specialized fields like medical and legal transcription.
  • Precision data handling, with numeric recognition for retail, banking, and finance while supporting real-time redaction of sensitive information for compliance and data privacy.

"Nova-3 represents a significant leap forward, extending the frontier of real-time accuracy while once again bending the cost curve—two critical components for enterprise speech-to-speech use cases," said Scott Stephenson, CEO of Deepgram, in a statement. "By integrating advanced architectural enhancements and extensive training across diverse datasets, we've developed a model that not only meets but exceeds the evolving needs of our clients across various industries."

Nova-3 allows users to fine-tune the model for specialized domains, and with the addition of Keyterm Prompting, developers can instantly improve transcription accuracy by optimizing up to 100 key terms.
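
As a rough illustration of how a developer might pass key terms with a transcription request, the sketch below only builds a request URL; it does not call any service. The endpoint URL, the `keyterm` query parameter name, and the helper `build_listen_url` are assumptions that do not appear in this article and should be checked against Deepgram's API documentation. The 100-term cap is the limit stated above.

```python
from urllib.parse import urlencode

# Limit on key terms, as stated in the article.
MAX_KEYTERMS = 100


def build_listen_url(keyterms, base_url="https://api.deepgram.com/v1/listen", model="nova-3"):
    """Build a transcription request URL carrying key terms.

    Hypothetical helper: base_url and the "keyterm" parameter name are
    assumptions, not details confirmed by the article.
    """
    # Drop empty entries and surrounding whitespace.
    terms = [t.strip() for t in keyterms if t.strip()]
    if len(terms) > MAX_KEYTERMS:
        raise ValueError(f"at most {MAX_KEYTERMS} key terms are allowed, got {len(terms)}")
    # One query parameter per term; urlencode escapes spaces and punctuation.
    params = [("model", model)] + [("keyterm", t) for t in terms]
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_listen_url(["tretinoin", "account number"]))
```

Validating the term count on the client side is a design choice in this sketch, so that an oversized list fails before any request is sent.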

In independent benchmarking, Nova-3 achieved a word error rate of 5.26 percent. In streaming, its word error rate was 6.84 percent.
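
For readers unfamiliar with the metric, word error rate is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the model's output, divided by the number of words in the reference. The following is a minimal sketch of that standard calculation; the function name and the sample sentences are illustrative only, and it is not the benchmarking procedure used for the figures above, which the article does not describe.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (rolling-row Levenshtein distance).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Toy example: one substituted word out of six reference words.
    wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
    print(f"{wer:.2%}")  # prints 16.67%
```

Real evaluations typically normalize casing and punctuation before scoring; this sketch compares whitespace-separated tokens exactly as given.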
