Imagine a world where your voice isn’t just yours anymore. Where a simple text input and a short audio sample are all it takes for AI to mimic your voice, crafting speech that sounds remarkably like you. This could be the future, as OpenAI has unveiled its Voice Engine model.
The Voice Engine model can clone voices with striking accuracy, potentially revolutionising the way we interact with technology.
In this article, we delve into the details of OpenAI’s Voice Engine, exploring how it works.
OpenAI’s Voice Engine model uses text input and just a 15-second audio sample to generate natural-sounding speech that closely resembles the original speaker.
According to OpenAI, this model can generate “emotive and realistic voices.”
Voice Engine was first developed by OpenAI in late 2022 and has since been used to power the preset voices featured in its text-to-speech API, as well as ChatGPT Voice and Read Aloud.
In an interview with TechCrunch, Jeff Harris, a member of the product staff at OpenAI, revealed that the Voice Engine model was trained on a mix of licensed and publicly available data.
It’s important to note that OpenAI is not releasing the Voice Engine model widely right now.
OpenAI stated that its partners have agreed to follow its usage policies. These policies prohibit impersonating others without their consent or legal right, require explicit and informed consent from the original speaker, bar partners from building tools that let individual users create their own voices, and mandate disclosing to listeners that the voices are AI-generated.
OpenAI has also implemented safety measures for Voice Engine, such as watermarking to trace the origin of any generated audio and proactive monitoring of how the model is used.