• July 19, 2022
  • By Leonard Klie, Editor, Speech Technology and CRM magazines
  • FYI

Microsoft Reins in Its Voice Capabilities

Microsoft in late June announced plans to scale back some of its synthetic voice and facial recognition technologies to make its artificial intelligence more inclusive. It is also setting stricter guidelines to “ensure the active participation of the speaker” whose voice is re-created, limiting which customers get to use the service and highlighting what it considers to be acceptable uses.

The tech giant said its measures would amount to a significant update to the “Responsible AI Standard” it first released in 2019 and an acknowledgment that some of its AI systems for speech-to-text and facial recognition could contribute to societal biases and inequities. Concerns about the illegal or unethical use of deepfakes also reportedly played into the decision.

“This technology has exciting potential in education, accessibility, and entertainment,” Natasha Crampton, Microsoft’s chief responsible AI officer, said in a blog post, “and yet it is also easy to imagine how it could be used to inappropriately impersonate speakers and deceive listeners.”

The steps amount to a limited access policy on the AI technologies that underpin Microsoft’s Azure Cognitive Services product line, including Custom Neural Voice, a cloud-based application that enables users to create synthetic voices that sound nearly identical to the original sources. Among the reasons for the move, Microsoft cited evidence that speech-to-text error rates for Black speakers are nearly double those for white speakers.

Other Microsoft AI-based products affected include the Face API, Computer Vision, and Video Indexer. The company said it would remove from the Face API service capabilities that infer emotional states and identity attributes, such as gender, age, smile, facial hair, hair, and makeup.

But while Microsoft is going in one direction with its voice technologies, Amazon is reportedly taking its Alexa voice-enabled virtual assistant in the opposite direction. At its re:MARS conference in Las Vegas in late June, the company said Alexa might soon be able to re-create the voices of family members, even those who have died, using less than a minute of recorded audio. The announcement sparked widespread concerns about deepfakes and the ethical use of synthetic voice technology.

Rohit Prasad, an Amazon senior vice president and head scientist for Alexa, said the capability would endow Alexa with more “human attributes of empathy and affect.”
