OpenAI has introduced three new realtime voice AI models designed to help developers build smarter, more natural voice-based applications. The models focus on live conversations, real-time translation, and instant speech transcription.
‘Together, the models we are launching move realtime audio from simple call-and-response toward voice interfaces that can actually do work: listen, reason, translate, transcribe, and take action as a conversation unfolds,’ OpenAI said.
The first new model is GPT‑Realtime‑2, which is built for live voice conversations. OpenAI says the model can keep the conversation moving while it reasons through a request, calls tools and handles corrections or interruptions. Developers can enable short responses like ‘let me check that’ so users know the AI is processing a request.
OpenAI has also expanded the context window from 32K to 128K tokens, allowing longer and more detailed conversations. Developers can also adjust the reasoning level depending on whether they want faster responses or deeper thinking.
OpenAI also introduced GPT‑Realtime‑Translate, a realtime translation model for multilingual conversations. The model supports more than 70 input languages and can translate speech into 13 output languages in real time.
The third model is GPT‑Realtime‑Whisper, a new low-latency speech-to-text model. It can transcribe spoken audio live as a person speaks. ‘Teams can power captions for meetings, classrooms, broadcasts, and events; generate notes and summaries while conversations are still in progress; build voice agents that need to understand users continuously; and create faster follow-up workflows for customer support, healthcare, sales, recruiting, and other high-volume spoken interactions,’ OpenAI said.
OpenAI says all three models are now available through its Realtime API. GPT-Realtime-2 is priced at $32 per 1 million audio input tokens and $64 per 1 million audio output tokens. GPT-Realtime-Translate costs $0.034 per minute, and GPT-Realtime-Whisper costs $0.017 per minute.
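For developers who want a rough sense of what these rates mean in practice, the arithmetic can be sketched in a few lines of Python. This is only an illustrative estimate based on the prices quoted above. The function and constant names are hypothetical, not part of any OpenAI SDK, and the sketch assumes audio tokens and per-minute billing are the only charges, which may not hold for real usage.

```python
# Rough cost estimate from the prices quoted in this article (USD).
# All names here are hypothetical helpers, not part of any OpenAI library.

REALTIME_2_INPUT_PER_MILLION = 32.0    # $ per 1M audio input tokens
REALTIME_2_OUTPUT_PER_MILLION = 64.0   # $ per 1M audio output tokens
TRANSLATE_PER_MINUTE = 0.034           # $ per minute, GPT-Realtime-Translate
WHISPER_PER_MINUTE = 0.017             # $ per minute, GPT-Realtime-Whisper


def realtime_2_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate GPT-Realtime-2 cost from audio token counts."""
    input_cost = input_tokens / 1_000_000 * REALTIME_2_INPUT_PER_MILLION
    output_cost = output_tokens / 1_000_000 * REALTIME_2_OUTPUT_PER_MILLION
    return input_cost + output_cost


def per_minute_cost(minutes: float, rate_per_minute: float) -> float:
    """Estimate cost for a model billed per minute of audio."""
    return minutes * rate_per_minute


if __name__ == "__main__":
    # Example figures are made up for illustration only.
    print(f"Conversation: ${realtime_2_cost(500_000, 250_000):.2f}")
    print(f"1 hour translated: ${per_minute_cost(60, TRANSLATE_PER_MINUTE):.2f}")
    print(f"1 hour transcribed: ${per_minute_cost(60, WHISPER_PER_MINUTE):.2f}")
```

At the quoted rates, an hour of transcription costs half as much as an hour of translation, while GPT-Realtime-2 costs depend on how many audio tokens a conversation actually consumes.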