Better than Google Gemini and ChatGPT? Indian startup Sarvam AI claims to beat global models

HIGHLIGHTS

Sarvam AI says its Bulbul V3 delivers more natural, stable speech for Indian languages, accents, and mixed-language conversations than global rivals.

In independent blind listening and telephony quality evaluations, Bulbul V3 outperformed several competitors and showed advantages over Google Gemini and ChatGPT in select tasks.

Launched ahead of the India-AI Impact Summit 2026, Bulbul V3 strengthens India’s homegrown AI ecosystem, with real-time speech, enterprise features, and consent-based voice cloning.

Bengaluru-based startup Sarvam AI recently launched Bulbul V3, a new text-to-speech model designed for Indian languages, accents, and real-world use cases. The company says the model delivers more natural and stable speech than global rivals and has already outperformed tools from Google and OpenAI in key evaluations. With Bulbul V3, Sarvam is positioning itself as a serious player in voice AI, an area long dominated by US-based companies. Bulbul V3 is one of several tools Sarvam has launched in a 14-day rollout ahead of the India-AI Impact Summit 2026 in New Delhi. The startup is also among the 12 entities selected under the Rs 10,300 crore India AI Mission, where sovereign Indian AI models are expected to be unveiled later this month.

Sarvam says Bulbul V3 is designed around the realities of Indian speech. People often mix languages in a single sentence, pronounce the same word differently across regions, and use names or expressions that global systems struggle to handle. According to the company, Bulbul V3 manages these challenges without breaking flow or meaning.

According to reports, the model can generate speech with natural pauses, emphasis, and pacing. It also supports real-time audio output, which is useful for live conversations, call centres, and interactive apps. Sarvam says fast response times matter in such settings, as delayed responses can hurt the user experience.

Bulbul V3 was tested by an independent third party through blind listening studies across 11 languages. Human listeners compared audio clips from different AI models without knowing which system produced them. While ElevenLabs ranked highest in overall sound quality, Bulbul V3 beat competitors like Cartesia Sonic-3 in general evaluations.

Sarvam also said Bulbul V3 performed best in telephony quality tests, which are important for phone-based services. The model showed fewer skipped words and mispronunciations than rivals. In related document and speech tasks through Sarvam Vision, the company has previously claimed better results than Google Gemini and ChatGPT on certain benchmarks.

The new model also allows users to create custom AI voices through consent-based voice cloning. Sarvam says the feature includes safeguards and is built for large enterprise use. Developers can access the model through the Sarvam Dashboard, with unlimited API usage available until February 28, 2026.

Bhaskar Sharma

Bhaskar is a senior copy editor at Digit India, where he simplifies complex tech topics across iOS, Android, macOS, Windows, and emerging consumer tech. His work has appeared in iGeeksBlog, GuidingTech, and other publications, and he previously served as an assistant editor at TechBloat and TechReloaded. A B.Tech graduate and full-time tech writer, he is known for clear, practical guides and explainers.
