The Game Changer in Gaming: Voice Recognition Technology

Gaming is improving its user experience more rapidly than ever before. I can’t believe that only a few short decades ago, gaming companies were just introducing games like Pong and PAC-MAN. But as new technology develops, so does our entertainment. From virtual reality to motion sensor technology, video games are becoming more immersive, and the line between real life and art is blurring. One of the advances I am most excited about is also one of the most complicated: the implementation of voice recognition technology. Familiarity with voice recognition software has grown rapidly with our use of smartphones (thank you, Siri!), voice controls in our cars, and smart home devices like Google Home or the Amazon Echo. It seems like voice recognition technology is appearing everywhere I turn. So why would I want my gaming experience to be any different?

Fun fact: In the mid-80s, a video game console called Halcyon was released by RDI Video Systems. It was thought to be the beginning of a meteoric rise in voice recognition-based video game systems, even though RDI only actually created about 12 units. At a staggering $2,500 per console, the Halcyon featured a laserdisc player and an attached computer, and the company promised it would be entirely voice-activated, with an AI like HAL 9000 (for those of you who don’t get the reference, check out the all-time sci-fi classic 2001: A Space Odyssey). In the end, only two games were released for it, and the software left much to be desired. At the time, the field of voice recognition technology was still in its infancy, and processing voice input took a considerable amount of time and data. Additionally, the variations from speaker to speaker almost guaranteed that the games would be slow to respond and frustrating at best.

Still, I applaud the work of the people at RDI Video Systems. The technology of their era gave them little to work with, and their struggles give us insight into why the video gaming world has been a bit slower on the uptake when it comes to implementing voice recognition technology.

Voice Recognition in Video Games: Taking Inventory of the Obstacles

Making a video game is incredibly difficult--and making a successful one, even more so. Developing a plot, characters, dialogue, etc. can take years, and that’s just the basics. When creating an open world game, developers have to anticipate multiple scenarios for each part and in every possible sequence. Making the game compatible with speech recognition adds yet another level of headaches to the mix.

Integrating the technology in and of itself can become overwhelming. On top of that you need to have a cache of voice data collected so that the gaming system has enough information to process different inflections and variations in different voices. If the game is to be distributed globally, accents, dialects, and foreign languages all need to be taken into consideration to make sure the software recognizes the input from each speaker. Because the aim of the game is to allow for natural speech as opposed to requiring players to stick to simple commands, different ways of saying a certain phrase or responding to a prompt need to be considered.
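To make the "different ways of saying a certain phrase" problem concrete, here is a minimal sketch in Python. It assumes the speech has already been transcribed to text, and it maps several phrasings onto one game command by simple word overlap. The intent names, example phrases, and the 0.5 threshold are all hypothetical choices for illustration; a real game would rely on collected voice data and a trained language model rather than a hand-written table.

```python
import re

# Hypothetical intents and example phrasings (assumptions for illustration only).
INTENT_PHRASES = {
    "open_door": ["open the door", "unlock the door", "let me in"],
    "fire": ["fire", "shoot", "open fire", "return fire"],
    "reload": ["reload", "change the magazine", "load a new clip"],
}


def normalize(text):
    """Lowercase the text, drop punctuation, and split it into words."""
    return re.sub(r"[^a-z' ]+", " ", text.lower()).split()


def match_intent(transcript, min_overlap=0.5):
    """Return the intent whose example phrase best overlaps the transcript.

    The score is the fraction of a phrase's words that appear in the
    transcript. Returns None when no phrase reaches min_overlap.
    """
    words = set(normalize(transcript))
    best_intent, best_score = None, 0.0
    for intent, phrases in INTENT_PHRASES.items():
        for phrase in phrases:
            phrase_words = set(normalize(phrase))
            score = len(words & phrase_words) / len(phrase_words)
            if score > best_score:
                best_intent, best_score = intent, score
    return best_intent if best_score >= min_overlap else None


if __name__ == "__main__":
    for spoken in ["Could you please open the door?", "Return fire!"]:
        print(spoken, "->", match_intent(spoken))
```

Even this toy version shows why the table grows quickly: every new phrasing, accent-driven transcription quirk, or language needs its own entries, which is exactly the data-collection burden described above.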

With all the complexity that voice recognition technology brings to the gaming world, why are developers still deciding to move forward? The answer is simple. Voice recognition makes games feel more immersive and more real, and therefore enhances the players’ experience.

Advancing Bit by Bit: The Early Stages of Voice Recognition

A more recent example of a video game attempting voice recognition is Bot Colony, released by North Side Inc. in 2014. Players navigate through a future world where robots have become man’s new best friend and helper. The creators, though limited by the technology available at the time, were forward thinkers who understood the benefits of players using natural language as opposed to scripted commands. Unfortunately, even after players went through a “training” session to get the game acclimated to their voice, the game had continuous issues recognizing and processing input.

Even earlier, the people at Nuance decided to take a different route. Their contribution to voice recognition gaming: The Dragon Gaming Speech Pack. People familiar with speech recognition software for PCs will recognize the name Dragon immediately. Nuance has actually created a line of products that incorporate voice recognition into many PC programs. This is especially helpful for people who are physically unable to type.

The Dragon Gaming Speech Pack allows players to use voice recognition for simple but numerous commands. Unfortunately, the system fell into the same trap as its predecessors: variations in voice and intonation proved problematic. This meant that the voice recognition didn’t always work and the game didn’t respond right away, which is definitely not something you want when trying to return fire in Call of Duty.

Games Gain Forward Momentum

Though voice recognition software has a long way to go, especially when it comes to gaming systems, that doesn’t mean developers can’t start melding it with other emerging technologies. In fact, I believe that pairing voice recognition with other new technologies can help to expose where the limits can be pushed and can introduce entirely new directions where voice recognition technology can go. And it just so happens that another gaming trend, virtual reality, is an ideal place to test voice recognition, flush out its problems and find solutions.

Enter Ubisoft Studio and IBM. This new duo started by creating the “VR Speech Sandbox,” which combines IBM’s Watson Unity SDK with two other services: Watson Speech to Text and Watson Conversation. The sandbox is a virtual world in which developers can build and adapt innovative user interfaces by combining voice interaction with virtual reality. The most exciting development from this collaboration is Star Trek: Bridge Crew. Using natural language, players can give their virtual crew commands while still playing in real time with other human players.

As these interfaces become more common, other large companies are diving in. Nuance and Samsung have created developer platforms similar to IBM’s Sandbox, opening up their own virtual spaces that allow independent developers to integrate voice recognition technology into their games and apps. It comes as no surprise, then, that the market is expected to just keep growing. By 2021, it is anticipated that voice recognition technology will have 1 billion consumers.

Scouting the Voice Horizon

Though speech technology in video games had a bumpy start, the end goal has remained the same: to create art and stories that are immersive while pushing ever closer to experiences that are indistinguishable from real life. A second goal is to make the gaming experience as convenient and efficient as possible, because every second counts when it comes to gaming.

Until recently, voice recognition technology was sidelined by the major developers in the gaming world, but that is starting to change. Perhaps the most significant push for utilizing voice recognition in games has been its growing familiarity in other facets of our lives. Apps, voice controls in cars, and smart home devices have made voice recognition a regular feature of our day. As our familiarity with it increases, its absence in video games becomes more conspicuous.

Ubisoft is starting to invest in voice recognition gaming, and if other popular video game developers such as Bandai, Blizzard, EA, or Nintendo want to follow suit, they could really push the technology forward. And, fortunately for them, there are companies out there that provide services for speech technology integration. As we start to see these partnerships form, we get closer and closer to a future where speech recognition is incredibly accurate, localized, and totally ubiquitous in the gaming world.

