2023 Speech Industry Award Winner: NVIDIA Is Making Voice AI Better for Almost Everyone

NVIDIA saw blowout second-quarter results, surging margins, and incredible demand, which prompted one analyst from Constellation Insights to conclude that “it’s clear the company has little competition and a lot of pricing power.”

NVIDIA, which is based in Santa Clara, Calif., also has a lot of partnering power, which is no doubt contributing greatly to its financial success. This past year saw NVIDIA team up with some of the biggest names in speech, computing, and gaming to advance its technology.

Interactions, a provider of conversational artificial intelligence, for example, paired with NVIDIA to bring its Riva automated speech recognition and text-to-speech technologies and expanded language libraries to the Interactions intelligent virtual assistant.

“By integrating Riva with its expanded language libraries into Interactions, we’re empowering the world’s most customer-centric brands to reshape their customer support ecosystems using a combined best-in-class AI and human experience,” said Michael Iacobucci, CEO of Interactions, in a statement at the time.

Fellow speech technology vendor Speechmatics also partnered with NVIDIA to supercharge its machine learning capabilities, having deployed NVIDIA’s AI systems and software to train its larger models.

In yet another joint AI initiative, computer manufacturer Dell launched a total of 15 Dell PowerEdge systems available with NVIDIA acceleration, providing companies the foundation for a wide range of AI applications, including speech recognition, cybersecurity, recommendation systems, and language-based services.

But NVIDIA didn’t stop there. The company this year introduced a speech AI ecosystem built with Mozilla Common Voice to develop automatic speech recognition models that work across languages worldwide. The new ecosystem focuses on developing crowd-sourced multilingual speech corpora and open-source pretrained models and on expanding the speech data that is available for low-resource languages.

According to NVIDIA, the initiative will focus on helping AI models understand speaker and language diversity, accents, and noise profiles. Developers will be able to train their models on Mozilla Common Voice datasets and then offer those pretrained models as open-sourced ASR architectures.

The Mozilla Common Voice platform currently supports 100 languages and includes more than 24,000 hours of speech data available from 500,000 contributors worldwide.

Through the Mozilla Common Voice platform, users donate their audio datasets by recording sentences as short voice clips, which Mozilla validates to ensure dataset quality upon submission.

NVIDIA also turned its AI attention to the gaming world with the launch of Avatar Cloud Engine (ACE) for Games, a custom AI model foundry service for video games that brings intelligence to nonplayable characters through AI-powered natural language interactions.

Developers of middleware, tools, and games can use ACE for Games to build and deploy customized speech, conversation, and animation AI models in their software.

ACE for Games delivers AI foundation models for speech, conversation, and character animation. These include NVIDIA Riva for ASR and TTS; NVIDIA NeMo, for building, customizing, and deploying language models using proprietary data, customized with lore and character back stories and protected against counterproductive or unsafe conversations via NeMo Guardrails; and NVIDIA Omniverse Audio2Face, for creating expressive facial animation of game characters to match speech tracks.

Several game developers and startups are already using NVIDIA Audio2Face in their workflows. These include GSC Game World, which is using it in S.T.A.L.K.E.R. 2: Heart of Chornobyl; Fallen Leaf, which is using it in Fort Solis, a sci-fi thriller that takes place on Mars; and Charisma.ai, which is using it to power the animation in its conversation engine.

Convai, a developer of conversational AI for virtual game worlds, has also integrated NVIDIA’s ACE modules into its real-time avatar platform. In a demo called Kairos, players interact with Jin, the purveyor of a ramen shop. Although he is a nonplayable character, Jin replies to natural language queries realistically and in a way that is consistent with the narrative back story.

“With NVIDIA ACE for Games, Convai’s tools can achieve the latency and quality needed to make AI nonplayable characters available to nearly every developer in a cost-efficient way,” said Purnendu Mukherjee, founder and CEO of Convai, in a statement.
