Vertical Markets Spotlight: Speech in Gaming and Entertainment

The gaming industry has become so hot that even office furniture stores like Office Depot and Staples offer a line of chairs made specifically with gamers in mind. Even big business has embraced gaming—or gamification—to better engage employees in learning and development activities.

Gamers are kept engaged through increasingly lifelike interactions that allow them to be virtually—even literally—immersed in the action.

Despite its now mind-numbingly slow movement, lack of color, minimal sound, and lack of interaction with others, Atari’s Pong was the first commercially successful video game. At its launch in 1972, it was innovative enough to capture the minds and money of many. It was a forerunner to an industry that is now massive and continually evolving to provide ever more realistic and engaging experiences for users.

From the ping-pong of early Atari to today’s hyper-immersive games like Red Dead Redemption and Elden Ring, gaming has become big business. The static black-and-white look of Pong has given way to fast action-movie-quality movement, sound, and voice recognition that allows players to interact vocally with their games—and with other players.

Not surprisingly, COVID-19 has had a significant impact on the industry. As people hunkered down in their homes, many either amped up or began their interactions with a wide range of gaming apps and options.

In 2020 the gaming industry generated $155 billion in revenue, according to Investopedia, and is predicted to reach $260 billion by 2025. In fact, it says: “The video game sector is larger than the movie and music industries combined.” That stunning growth potential hasn’t been lost on some major companies that, according to Investopedia, have plans to enter the gaming industry. They include Google, Meta, and Apple.

And with the gaming industry growing so quickly, the opportunities for speech technology vendors are huge, experts agree.

Speech recognition technology, says Brent Hale, chief content strategist at TechGuided.com, is one of the best advanced artificial intelligence (AI) applications. “It has become a common feature in our devices, like smartphones, cars, and smart home devices. So we do expect the same features everywhere,” he says.

In gaming, speech technology lets players interact with games, and with other players, using their voices. Speech recognition appeared in games as far back as the late 1990s, when Sega released Seaman, one of the first games to use voice recognition, with Leonard Nimoy providing voice-over narration.

“The state of speech technology today is fairly advanced, with most systems offering accurate recognition rates of around 95 percent,” says Morshed Alam, founder of Savvy Programmer and a software developer with more than 10 years of IT industry experience. However, he adds, there are still some limitations. “For example, poor acoustic conditions or high levels of background noise can affect recognition accuracy. Additionally, speech technology is often reliant on a user’s accent and pronunciation.”
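To make a figure like "95 percent" concrete: recognition accuracy is commonly reported as one minus the word error rate, which is the number of word substitutions, insertions, and deletions needed to turn the recognizer's output into the correct transcript, divided by the number of words in the correct transcript. The sketch below is illustrative only. The function names and sample sentences are assumptions, not anything Alam or a particular vendor describes.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Accuracy as 1 - WER, clamped at zero for very noisy output."""
    return max(0.0, 1.0 - word_error_rate(reference, hypothesis))


# Hypothetical 20-word command with one misheard word ("shields" -> "fields").
spoken = ("raise the shields and set a course for the nearest starbase "
          "then hail the captain on the main bridge channel")
heard = ("raise the fields and set a course for the nearest starbase "
         "then hail the captain on the main bridge channel")
print(round(word_accuracy(spoken, heard), 2))  # prints 0.95
```

One wrong word in 20 is already a 5 percent error rate, which is why background noise or an unfamiliar accent can pull accuracy down quickly.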

Rob Bartlett, founder and CEO of WTFast.com, a client/server solution company, agrees. Speech technology is rapidly improving, Bartlett says, but he adds that “in terms of widespread application in video games, it still has a long way to go.”

Remy Cadic, CEO of Acapela Group, a voice solutions provider, says one of the things his company is doing holds a great deal of promise: the creation of diverse voices that represent characters that could be from different cultures, countries, or backgrounds. Acapela recently created two new digital voices—Darius, a male African-American English voice, and Tamira, a female African-American English voice.

Technologies like augmented and virtual reality, eSports, and 5G continue to drive innovation in gaming options. But game developers have a way to go to truly leverage these options. As nippon.com reported in 2018, Seaman creator and designer Saito Yutaka felt that voice recognition was still in its infancy in terms of “responding to specific requests or following set patterns.” His goal is to “create a conversation engine for the Japanese language that will make talking with artificial intelligence a more natural experience.”

“In gaming technology, especially VR, voice recognition can be a game changer,” Hale says. He points to the recent launch of a VR speech sandbox by Ubisoft Studio and IBM. It uses IBM’s Watson Unity SDK, Watson Speech to Text, and Watson Conversation to simulate “a virtual world for developers, where they can create interactive interfaces with the power of voice recognition in virtual reality.” One of the best developments here, he says, is Star Trek: Bridge Crew, where players can give voice commands to the virtual crew while playing in real time.
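Once a speech-to-text service has turned a player's words into text, the game still has to decide which command was meant. The sketch below shows one simple approach, fuzzy matching against a fixed phrase list with Python's standard `difflib` module. It is not how Star Trek: Bridge Crew or the Watson services work. The command phrases, action names, and the 0.75 cutoff are all assumptions made for illustration.

```python
import difflib

# Hypothetical command set; a real game would define its own phrases and actions.
COMMANDS = {
    "raise shields": "SHIELDS_UP",
    "lower shields": "SHIELDS_DOWN",
    "engage warp drive": "WARP",
    "fire torpedoes": "FIRE",
}


def match_command(transcript: str, cutoff: float = 0.75):
    """Return the action for the closest known phrase, or None if nothing is close."""
    # Normalize case and whitespace so "Raise  Shields" and "raise shields" match.
    normalized = " ".join(transcript.lower().split())
    close = difflib.get_close_matches(normalized, list(COMMANDS), n=1, cutoff=cutoff)
    return COMMANDS[close[0]] if close else None


print(match_command("Raise the shields"))       # prints SHIELDS_UP
print(match_command("open the pod bay doors"))  # prints None
```

Returning `None` for anything below the cutoff lets the game ask the player to repeat the command instead of guessing, a sensible default when recognition is imperfect.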

Perforce, a software development firm, surveyed more than 500 game developers in 2020 to assess the state of the industry. As Perforce predicts: “Immersive experiences will become standard. 8K resolution will arrive. Unreal Engine 5 (UE5) will drive technological advancements. And AR/VR will no longer be a separate category. It will become an expected feature in game development.”

The gaming industry and gamers alike are looking forward to more sophisticated voice recognition and more lifelike interactions.

“While voice recognition in video games has been teased since the early 2010s in flagship games like Call of Duty: Black Ops and even World of Warcraft, we haven’t seen truly developed speech tech become a fundamental and reliable part of gameplay,” Bartlett says. Even though games like Starship Commander, released in 2018, made waves with their use of speech, Bartlett says the tech was still somewhat simple. However, he adds, “while we haven’t seen much growth in the gaming industry, the tech itself continues to improve, so maybe we’ll see a watershed crossover soon.”

At Acapela, Cadic says, “our vision is that we are entering into what we call voice-first environments with voice becoming the main communication channel between humans and devices.”

The next challenge, Cadic adds, “is diversity and the ability to create exactly the voice needed by each of our customers.”

And even beyond that, he says, the goal is “creating the voice that every single person needs. We are 8 billion people on Earth, and we could create 8 billion different voices, or even more than that.”

In gaming, Cadic says, the next step will be the ability to design one voice for each gamer. “After a few minutes of recording we would be able to generate the voice of the gamer and use that voice in the game.” It would be “very, very interesting to watch different types of voices going into different characters that would be created by the gamers,” he says. So, for instance, the gamer’s character would interact with other characters using the gamer’s actual voice. That inches ever closer to the creation of a lifelike virtual world, or metaverse.

Cadic feels there is great potential for voice technology in the metaverse—a concept that is “characterized by persistent virtual worlds that continue to exist even when you’re not playing—as well as augmented reality that combines aspects of the digital and physical worlds.”

Cadic says Acapela has been having ongoing discussions about how its voice technology could be used to “create the voice of any person or any non-fungible token (NFT) so that the voice could be closely linked to the NFT.” That, he says, could be the voice of a person—or the voice of a brand, another new area of exploration.

Interestingly, advances in gaming technology don’t just benefit gamers. Developers are also reaping the benefits of more advanced capabilities in the apps and programs they use to create their virtual worlds.

Text-to-speech provider ReadSpeaker in March released its first-ever text-to-speech multi-platform plug-in for the Unreal and Unity game engines. As gaming continues to attract users around the world, Matt Muldoon, president for North America at ReadSpeaker, sees enhancing the experience for game developers as critical to allow them to experiment with new ideas and concepts while delivering the best product possible more quickly. “With that being said, digital environments and the development of the metaverse are becoming more limitless, putting pressure on developers to deliver unique and higher-quality experiences for players—including voice technology,” he says.

The ability to leverage apps like ReadSpeaker, Muldoon says, means that “developers do not need to produce and manage individual audio files and can code speech directly into the game, improving accessibility by leveraging text-to-speech to add user interface narration and audio description. With these capabilities, players can have text read aloud to them with near zero latency and also have the option to have in-game chat messages narrated to them, a critical component of action-heavy multiplayer games.”
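As a rough illustration of what coding speech directly into the game can look like, the sketch below queues incoming chat messages and hands each one to a text-to-speech function as plain text, with no pre-recorded audio files involved. It is a minimal sketch under stated assumptions, not ReadSpeaker's plug-in API. The `ChatNarrator` class, the `speak` callback, and the three-message limit are hypothetical.

```python
from collections import deque
from typing import Callable, Deque


class ChatNarrator:
    """Queues chat lines and passes them, as text, to a text-to-speech callback."""

    def __init__(self, speak: Callable[[str], None], max_pending: int = 3):
        # `speak` is a hypothetical stand-in for whatever TTS call an engine
        # plug-in exposes; it receives the text to be read aloud.
        self._speak = speak
        # A bounded deque silently drops the oldest line when full, so
        # narration does not fall far behind a fast-moving chat.
        self._pending: Deque[str] = deque(maxlen=max_pending)

    def on_chat_message(self, player: str, text: str) -> None:
        self._pending.append(f"{player} says: {text}")

    def flush(self) -> int:
        """Speak every pending line in order; return how many were spoken."""
        count = 0
        while self._pending:
            self._speak(self._pending.popleft())
            count += 1
        return count


# Usage with print() standing in for a real speech synthesizer.
narrator = ChatNarrator(speak=print)
narrator.on_chat_message("Ana", "enemy on the left")
narrator.on_chat_message("Raj", "covering you")
narrator.flush()
```

Capping the queue is one possible design choice for action-heavy multiplayer games, where a stale callout is less useful than a fresh one.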

Linda Pophal is a freelance business journalist and content marketer who writes for various business and trade publications. Pophal does content marketing for Fortune 500 companies, small businesses, and individuals on a wide range of subjects, from human resource management and employee relations to marketing, technology, healthcare industry trends, and more.
