Amazon launches Nova Sonic, a fast and natural-sounding AI voice model

HIGHLIGHTS

Amazon claims Nova Sonic is faster than OpenAI’s GPT-4o and about 80% cheaper.

The model powers Alexa+ and supports real-time speech transcription.

It will rival Google's and OpenAI's latest AI voice models.

Amazon has officially introduced Nova Sonic, a next-generation generative AI voice model. It is designed to deliver natural-sounding speech, real-time voice interaction, and low latency, and it is aimed squarely at OpenAI’s and Google’s latest AI voice technologies.


Nova Sonic is pitched at enterprise AI applications: it is integrated into Amazon’s Bedrock developer platform and accessible through a bi-directional streaming API. Amazon positions the model as the most economical option among frontier voice models, claiming that it is not only faster than OpenAI’s GPT-4o but also about 80% cheaper.

“Available via a new API in Amazon Bedrock, the model simplifies the development of voice applications, such as customer service call automation and AI agents across a broad range of industries, including travel, education, health care, entertainment, and more,” the company said in a blog post.

According to a TechCrunch report citing Amazon officials, Nova Sonic is more accurate than other voice models. The report lists the following results on benchmarks that measure speech recognition across languages and dialects, as well as response speed:

  • A 4.2% word error rate (WER) across five languages on the Multilingual LibriSpeech benchmark.
  • A 46.7% improvement in speech recognition accuracy over GPT-4o in noisy, multi-speaker scenarios.
  • An industry-best average latency of 1.09 seconds, beating OpenAI’s Realtime API at 1.18 seconds.
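For readers unfamiliar with the metrics above, here is a minimal Python sketch of how word error rate is conventionally computed (word-level edit distance divided by the number of reference words) and how the latency gap works out as a percentage. The function names and sample sentences are illustrative assumptions, not part of Amazon's or TechCrunch's methodology, and real benchmarks normally apply text normalisation that this sketch skips.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words (rolling-row Levenshtein).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, value: float) -> float:
    """Fraction by which `value` is lower than `baseline`."""
    return (baseline - value) / baseline


if __name__ == "__main__":
    # Made-up sentences: one dropped word out of six reference words.
    wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
    print(f"WER: {wer:.1%}")

    # Latency figures quoted in the article: 1.09 s versus 1.18 s.
    gap = relative_reduction(1.18, 1.09)
    print(f"Latency reduction: {gap:.1%}")
```

By this arithmetic, a 4.2% WER corresponds to roughly four word errors per 100 reference words, and the reported latency gap of 0.09 seconds is a reduction of a little under 8%.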

Nova Sonic also powers Alexa+, Amazon's upgraded digital assistant, improving its ability to handle natural conversations, decipher mumbled or noisy speech, and respond with human-like timing. The model also produces real-time transcripts of user speech, which opens up new integration options for developers building productivity, accessibility, and customer service applications.

In the longer term, Amazon plans to introduce more models that can understand and interact across multiple modalities, including voice, vision, and sensory data. The company has not yet shared details of these upcoming models.

Ashish Singh

Ashish Singh is the Chief Copy Editor at Digit. He's been wrangling tech jargon since 2020 (Times Internet, Jagran English '22). When not policing commas, he's likely fueling his gadget habit with coffee, strategising his next virtual race, or plotting a road trip to test the latest in-car tech. He speaks fluent Geek.
