December 27, 2023
By Kashyap Kompella, founder, RPA2AI Research and AI Profs

Generative AI and Speech Technology: Proceed with Caution

It is no exaggeration to say that generative artificial intelligence has captured the attention and imagination of both the general public and the technology industry in the past year. Generative AI is also being discussed as companies finalize their technology road maps for 2024.

There are generative AI tools that can, based on simple natural language prompts, produce text, images, code, designs, video and audio output, you name it. It’s been a sort of Cambrian explosion of synthetic, or auto-generated, content. While the prospects of leveraging all these new tools and capabilities are very promising, there are also significant concerns and risks to address.

Policymakers and regulators in different countries are deliberating about the applicability and scope of intellectual property rights regimes for AI-generated content. Courts are hearing cases on whether the creators of image, code, and text generators have the rights to the training data their tools are ingesting. There are also questions about where the responsibility lies and how to address the liabilities involved when users rely on generative AI tools. Judicial rulings and new regulations on the horizon will help frame the rules of engagement around AI for the coming years.

Prior to the rise of generative AI, the world of speech technology had been grappling with some of these issues, and generative AI has added to the list. We can broadly classify them into three categories—ethical concerns, new risks, and areas requiring legal clarity. A detailed enumeration is beyond our scope, but let me illustrate with a few examples of each.

Can AI be used to generate the voices of deceased people for use in new content? In what circumstances is it OK to do so, and when is it not? A documentary about Anthony Bourdain that included a small portion of audio generated by AI illustrates the dilemma.
What about using AI to automatically remove any traces of foreign accents from customer support agents’ speech as they talk to customers? Does that increase or decrease bias?
We are familiar with phishing emails. With speech AI, are we now going to be flooded with new types of audio-based attacks and phishing scams? Examples include an incident in which a CEO’s voice was spoofed in a call to a company’s employee, who was asked to make a fraudulent wire transfer, and a recent fake-kidnapping scam based on voice cloning.
What about the risk of proliferation of misinformation? A purported interview with the Formula 1 racing champion Michael Schumacher, who has been out of the public eye after an accident years ago, consisted of AI-generated responses, causing an uproar (the editor was fired). There are similar risks of audio deepfakes putting words into people’s mouths to defame or malign them or to spread rumors or hate speech. There’s a lot of discussion about video deepfakes, but audio deepfakes are also a big concern.
AI can be used to generate a music track or vocals, as we saw in the case of the fake Drake song “Heart on My Sleeve,” which many fans and critics conceded was pretty good. Such AI-generated music is rife with contentious legal issues and opens a Pandora’s box. If performing voice impressions and imitations is OK, why is it wrong to use AI to generate vocals?
Relatedly, who has the right to use an artist’s name and likeness? Does AI-generated music based on artists’ work (and without seeking their permission) harm those artists and their right of publicity? Does voice cloning violate existing laws? The answer is perhaps clearer in the music industry, but these questions will have to be worked out in other domains, sooner rather than later.

Artificial intelligence is poised to transform the world of speech technology. As enterprises look to take advantage of speech AI, they should be aware of the potential issues and plan for their mitigation. Speech AI vendors, for their part, must put into place guardrails against potential misuse and abuse of powerful new capabilities. Both buy-side and sell-side diligence are required to realize the full potential of speech AI.

Kashyap Kompella is CEO of rpa2ai Research, a global AI industry analyst firm, and co-author of Practical Artificial Intelligence: An Enterprise Playbook.