Create lifelike audio from a 15-second clip with OpenAI's Voice Engine

OpenAI is previewing a model called Voice Engine, which uses a single 15-second audio clip and a text prompt to generate longer audio in the speaker's voice.

Development of Voice Engine

OpenAI first developed Voice Engine in late 2022 and has since been testing it in various scenarios. The model operates by using a 15-second audio sample and text input to generate natural-sounding and emotive speech that closely resembles the original speaker. This capability is notable because it can produce life-like voices with inflection and tone, rather than a robotic drone.

Early applications of Voice Engine

Voice Engine has been tested in several applications, showcasing its versatility:

Providing reading assistance
Translating content
Reaching global communities
Supporting people who are non-verbal
Helping patients recover their voice

Safety and Responsible Use

OpenAI is committed to developing AI that is safe and beneficial for all, and it recognizes the serious risks associated with generating speech that resembles people’s voices, especially in sensitive contexts.

OpenAI has therefore implemented strict usage policies for Voice Engine, requiring explicit and informed consent from the original speaker and prohibiting impersonation without consent or legal right. Additionally, partners must disclose to their audience that the voices they’re hearing are AI-generated. To further safeguard against misuse, Voice Engine employs safety measures such as watermarking, which traces the origin of any generated audio, and proactive monitoring of how the model is used.

Future Outlook

OpenAI is previewing Voice Engine but has chosen not to widely release the technology at this time. This decision aligns with the company's approach to AI safety and its voluntary commitments. The preview is intended not only to highlight the potential of Voice Engine but also to motivate society to strengthen its resilience against the challenges posed by increasingly convincing generative models.

OpenAI encourages phasing out voice-based authentication as a security measure, developing policies to protect the use of people’s voices in AI, educating the public on AI capabilities and limitations, and accelerating the development of technology that identifies inauthentic voices. These steps are crucial as the world moves towards a future where technologies like Voice Engine become more prevalent.
