OpenAI introduces advanced voice intelligence capabilities in its API

OpenAI has launched new voice intelligence features in its API, enabling developers to build more natural, responsive, and real-time AI voice experiences.

Shivangi Yadav

May 19, 2026 - 09:32

OpenAI announced on Thursday that it is expanding the capabilities of its API with a new set of voice intelligence tools to help developers build applications that can speak, transcribe, and translate conversations in real time.

The update introduces three audio-focused models, GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper, all designed to support more advanced conversational experiences across apps and services. GPT-Realtime-2 is OpenAI’s latest realtime model, built to generate more natural and realistic spoken interactions. The company said the model is powered by GPT-5-class reasoning capabilities, enabling it to handle more complex user requests and conversations than the earlier GPT-Realtime-1.5 model.

OpenAI said the upgraded model is designed to move beyond simple voice interactions and support conversations that require more contextual understanding and reasoning. The company is also launching GPT-Realtime-Translate, a new feature focused on live translation during conversations. OpenAI said the tool can translate speech in real time while keeping pace with natural dialogue.

The translation model currently supports more than 70 input languages, meaning the system can understand speech in those languages, while responses can be delivered in 13 supported output languages.

In addition to voice conversation and translation, OpenAI introduced GPT-Realtime-Whisper, a new speech-to-text capability that provides live transcription during conversations. The company said the feature captures spoken interactions as they happen, enabling real-time transcription experiences for developers and businesses.

OpenAI said in its announcement that, together, the models it is launching move realtime audio beyond simple call-and-response toward voice interfaces that “can actually do work: listen, reason, translate, transcribe, and take action as a conversation unfolds.”

The company said the new tools are expected to be particularly useful for businesses building customer support systems, and noted that the technology could also support use cases in education, media, live events, and creator-focused platforms. At the same time, OpenAI acknowledged concerns about the potential misuse of advanced voice technologies, including spam, fraud, and other forms of online abuse.

To address those concerns, the company said it has added safety guardrails and monitoring systems to the new models. OpenAI explained that certain triggers have been embedded into the system to detect harmful or abusive activity, allowing conversations to be interrupted if they violate the company’s content safety policies.

All of the newly announced voice intelligence tools are being made available through the OpenAI Realtime API. OpenAI said GPT-Realtime-Translate and GPT-Realtime-Whisper will be billed on a per-minute basis, while GPT-Realtime-2 pricing will depend on token usage.

Shivangi Yadav reports on startups, technology policy, and other significant technology-focused developments in India for TechAmerica.Ai. She previously worked as a research intern at ORF.