
OpenAI previews Realtime API for speech-to-speech apps

By admin

Oct 2, 2024



OpenAI has introduced a public beta of the Realtime API, which lets paid developers build low-latency, multi-modal experiences combining text and speech in their apps.

Introduced October 1, the Realtime API supports natural speech-to-speech conversations using the API's preset voices, much like ChatGPT's Advanced Voice Mode. OpenAI is also introducing audio input and output in the Chat Completions API for use cases that do not need the Realtime API's low-latency benefits: developers can pass text or audio inputs to GPT-4o and have the model respond with text, audio, or both.
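
Where speech fits the ordinary request-response model, the audio-enabled Chat Completions call looks roughly like the sketch below. It assumes the official openai Python SDK, the "gpt-4o-audio-preview" model name, and the modalities/audio parameters OpenAI described at launch; an OPENAI_API_KEY environment variable is expected.

```python
# Minimal sketch: audio output from the Chat Completions API.
# Assumes the "gpt-4o-audio-preview" snapshot and the modalities/audio
# parameters as announced; details may change during the beta.
import base64

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

completion = client.chat.completions.create(
    model="gpt-4o-audio-preview",       # audio-capable GPT-4o snapshot
    modalities=["text", "audio"],       # ask for a transcript plus audio
    audio={"voice": "alloy", "format": "wav"},
    messages=[
        {"role": "user", "content": "Say a short, friendly hello."},
    ],
)

# The audio arrives base64-encoded alongside the text response.
message = completion.choices[0].message
with open("hello.wav", "wb") as f:
    f.write(base64.b64decode(message.audio.data))
```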

With the Realtime API and the new audio support in the Chat Completions API, developers no longer have to chain multiple models together to power voice experiences; they can build natural conversational experiences with a single API call, OpenAI said. Previously, creating a similar voice experience meant transcribing audio with an automatic speech recognition model such as Whisper, passing the text to a text model for inference or reasoning, and playing the model's output with a text-to-speech model. That approach often lost emotion, emphasis, and accents, and added noticeable latency.
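
The "single API call" here is a persistent WebSocket session rather than a one-shot request. The sketch below, written against the third-party websockets package (pip install websockets), assumes the beta endpoint, the OpenAI-Beta header, and the event names (response.create, response.text.delta, response.done) that OpenAI published with the announcement; keyword names for passing headers vary across websockets versions, so treat the details as illustrative.

```python
# Minimal sketch: one Realtime API session over WebSocket, per the beta docs.
import asyncio
import json
import os

import websockets  # third-party; header kwarg may differ by version

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01"
HEADERS = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "OpenAI-Beta": "realtime=v1",
}

async def main() -> None:
    async with websockets.connect(URL, extra_headers=HEADERS) as ws:
        # One persistent, bidirectional session: ask the model for a
        # response (text here; audio works the same way).
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {
                "modalities": ["text"],
                "instructions": "Greet the user in one sentence.",
            },
        }))
        # Stream server events until the response is complete.
        async for raw in ws:
            event = json.loads(raw)
            if event["type"] == "response.text.delta":
                print(event["delta"], end="", flush=True)
            elif event["type"] == "response.done":
                break

asyncio.run(main())
```

Audio flows through the same session: per the beta documentation, microphone chunks are streamed in as input_audio_buffer.append events and synthesized speech comes back as response.audio.delta events, which is what removes the transcribe-reason-speak round trips of the older pipeline.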


