Voice-Cloning Audio Feature Presented by OpenAI
The company is currently showcasing initial demonstrations and practical applications from a preliminary trial of their text-to-speech model, named Voice Engine. According to a spokesperson, they have made this available to approximately 10 developers for testing and feedback.
OpenAI is revealing initial findings from a trial of a feature capable of articulating text with a remarkably authentic human voice, signaling an exciting advancement in artificial intelligence while also raising concerns about potential deepfake implications.
We're sharing our learnings from a small-scale preview of Voice Engine, a model which uses text input and a single 15-second audio sample to generate natural-sounding speech that closely resembles the original speaker. https://t.co/yLsfGaVtrZ
— OpenAI (@OpenAI) March 29, 2024
The company is showcasing early demonstrations and practical applications from a limited-scale preview of their text-to-speech model, known as Voice Engine, which has been made available to approximately 10 developers for experimentation, according to a spokesperson.
Despite briefing reporters on the feature earlier this month, OpenAI has opted against a broader release at this time.
A representative from OpenAI shared that the company made the decision to limit the release of the tool after gathering input from various stakeholders, including policymakers, industry experts, educators, and creatives. Initially, they had intended to offer the tool to up to 100 developers through an application process, as mentioned in an earlier press briefing.
“In light of the potential risks associated with generating speech that closely resembles individuals’ voices, particularly during an election year, we have decided to proceed with caution,” the company stated in a blog post on Friday. “We are actively collaborating with partners both in the US and internationally, spanning government, media, entertainment, education, and civil society, to ensure that we incorporate their insights as we continue to develop this technology.”
Previous instances of AI-generated audio content have raised concerns, particularly after a realistic but fabricated phone call purportedly from President Joe Biden circulated in January, urging people in New Hampshire not to vote in the primaries. Such events have heightened apprehensions about the misuse of AI technology ahead of crucial global elections.
Unlike OpenAI’s earlier audio generation efforts, Voice Engine can closely replicate an individual’s voice, capturing their unique speech patterns and nuances. The software needs only a 15-second audio sample of a person speaking to reproduce their voice.
During a demonstration of the tool, Bloomberg had the opportunity to listen to a clip of OpenAI’s CEO, Sam Altman, providing a brief explanation of the technology in a voice that sounded remarkably similar to his own, despite being entirely AI-generated.
Jeff Harris, a product lead at OpenAI, remarked, “With the right audio setup, it’s almost indistinguishable from a human voice.” He added, “The technical quality is quite impressive.” However, Harris also emphasized the importance of handling such technology with great care, considering the potential ethical implications surrounding the accurate replication of human speech.
One of OpenAI’s current partners in using the tool is the Norman Prince Neurosciences Institute at Lifespan, a not-for-profit health system, which is using the technology to help patients regain their ability to speak. For instance, the tool helped restore the voice of a young patient who had difficulty speaking clearly due to a brain tumor, by replicating her speech from an earlier recording made for a school project.
OpenAI’s customized speech model also possesses the capability to translate the generated audio into various languages, making it particularly valuable for companies operating in the audio industry, such as Spotify Technology SA. Spotify has already leveraged the technology in a pilot program to translate podcasts hosted by popular personalities like Lex Fridman. Additionally, OpenAI highlighted the potential of the technology in creating diverse voices for educational content aimed at children.
As part of the testing program, OpenAI mandates that its partners adhere to strict usage policies, including obtaining consent from the original speaker before using their voice and disclosing to listeners that the voices they’re hearing are AI-generated. Furthermore, the company is implementing an inaudible audio watermark to enable the identification of audio pieces created using its tool.
Before making a decision on whether to make the feature available to a wider audience, OpenAI has emphasized the importance of gathering feedback from external experts. “It’s crucial that people worldwide are aware of the direction in which this technology is evolving, regardless of whether we ultimately decide to release it on a larger scale ourselves,” the company stated in its blog post.
OpenAI also said it hopes the preview of its software will underscore the need to strengthen societal resilience against the challenges posed by increasingly sophisticated AI technologies. For instance, the company urged banks to consider phasing out voice authentication as a security measure for accessing bank accounts and sensitive data. Additionally, OpenAI advocates public education initiatives to raise awareness of deceptive AI-generated content, as well as the development of better techniques for distinguishing real audio from AI-generated audio.