Market Spotlight: Gaming--Playing with Speech

The November 2006 release of the Nintendo Wii revolutionized video gaming with a gesture-based interface. After the console had been on the market for only a year, The Financial Times declared the Wii the world’s best-selling gaming console.

A large part of this success was due to the intuitive, easy-to-use interface targeting casual players. Voice interfaces in video games, by contrast, are rare, as game developers are still figuring out how and when to incorporate speech controls into gaming scenarios.

Harmonix’s music video game Rock Band offers the most pervasive use of speech in a video game. Released for PlayStation 2, PlayStation 3, and Xbox 360 during the 2007 holiday season, Rock Band uses Fonix’s speech recognition engine to accommodate the game’s karaoke aspect. Rock Band was well received, yet some gamers noticed that the game’s speech recognition tracks pitch instead of recognizing actual words. Consequently, users can cheat the system by humming, a problem that plagued previous karaoke games like Konami’s Karaoke Revolution and Sony’s SingStar. Moreover, music games are a specialized case: by definition, they require an audio recognition component.
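To see why humming can fool a pitch-based scorer, consider the minimal sketch below. It is an illustration only, not Rock Band’s or Fonix’s actual algorithm: the function names, the half-semitone tolerance, and the sample frequencies are all assumptions. The scorer compares detected pitches against target notes and never looks at words, so a hummed melody scores the same as a sung one.

```python
import math

def hz_to_midi(freq_hz):
    """Convert a frequency in hertz to a (fractional) MIDI note number."""
    return 69 + 12 * math.log2(freq_hz / 440.0)

def pitch_score(detected_hz, target_midi, tolerance=0.5):
    """Fraction of notes within `tolerance` semitones of the target.

    Hypothetical scorer: it receives only pitches, never lyrics,
    so it cannot tell singing from humming.
    """
    hits = 0
    for freq, target in zip(detected_hz, target_midi):
        if abs(hz_to_midi(freq) - target) <= tolerance:
            hits += 1
    return hits / len(target_midi)

# Target melody: A4, B4, C#5 as MIDI note numbers.
target = [69, 71, 73]

# Illustrative detected pitches; the values are the same
# whether the player sings the lyrics or simply hums.
hummed = [440.0, 493.88, 554.37]

print(pitch_score(hummed, target))
```

Closing the loophole would require a second check that compares recognized words against the lyrics, which is a harder recognition problem than tracking pitch.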

However, Datamonitor analyst Daniel Hong says speech recognition will continue its gradual uptake in the gaming industry, largely because of hardware: headsets are standard features for the Xbox 360 and PlayStation 3, and vastly increased CPU power can accommodate voice controls.

"User interface [UI] design is paramount in gaming," Hong says. "Speech recognition provides differentiation and opens up UI choices for the gamer."

But according to John Falcone, senior editor of CNET Reviews, voice controls in the past have "been a tacked-on thing, where [the games] might recognize a handful of terms." More recent games, like the SOCOM series, Ghost Recon: Jungle Storm, Rainbow Six: Lockdown, and Rainbow Six: Vegas, have all incorporated limited speech controls that allow users to communicate with the AI (see box below).

In fall 2008, Ubisoft will release Tom Clancy’s EndWar for the Xbox 360 and PlayStation 3. EndWar is a real-time strategy game that uses Fonix’s speech recognition as a core element in the UI. In real-time strategy games, players command entire armies; the complexity of the controls typically requires a full keyboard and mouse, which limits game play to PCs.

While EndWar will also be available for the PC, its speaker-independent voice controls are essential on consoles, which lack a keyboard and mouse. Instead of highlighting various battle units with a mouse and issuing commands by punching them into a keyboard, players hold the control pad’s right trigger to activate the voice recognition system and issue orders verbally.

Currently, speech recognition accuracy is 90 percent, but developers want it to be higher than 95 percent by the game’s release. Jeff Bakalar, editor at CNET for gaming and home theater, was impressed after witnessing a demo of the game. "I’m not going to lie," he says. "It was perfect. And this guy had a really thick accent, too. Every five seconds he issued a new command, and we watched him play for 20 minutes."

According to Ubisoft Shanghai creative director Michael de Plater, developers for EndWar used a neural network engine that trains on a language’s sounds instead of a specific word set. As a result, the speech recognition is less susceptible to accents. In addition, developers devoted enough processing power to ensure the accuracy of the recognition engine.

"It’s going to take a breakthrough title for [speech] to hit the big time," CNET’s Falcone states. From a marketing standpoint, EndWar has advantages: The real-time strategy genre is immensely popular, as are video games in the Tom Clancy library. If voice in EndWar proves successful, Falcone anticipates greater uptake of speech recognition within the video game community. "It’s a very trend-oriented industry," he says. "If you look at something like Grand Theft Auto that came out in 2001, it’s been copycatted a thousand times. People were just trying to duplicate the genre, and that’s true of the actual features in the game as well."  

Still, de Plater is hesitant to pronounce voice recognition the interface of tomorrow. "The future for video games isn’t necessarily in one specific form of input but in making the right input for the right experience," he said in a recent Q&A. "Voice command is to strategy games what steering wheels are for driving games, a gun for a shooting game, or a musical instrument for a music-based game. It’s simply the best interface for the job."

Video Games with Voice Commands
Rainbow Six 3: Raven Shield
   Number of Voice Commands: 80
   Number of Keywords: 25
   Release Date: March 2003
Ghost Recon: Jungle Storm
   Number of Voice Commands: 160
   Number of Keywords: 27
   Release Date: March 2004
Rainbow Six: Lockdown
   Number of Voice Commands: fewer than 10
   Number of Keywords: fewer than 10
   Release Date: September 2005
Rainbow Six: Vegas
   Number of Voice Commands: fewer than 10
   Number of Keywords: fewer than 10
   Release Date: November 2006
Tom Clancy’s EndWar
   Number of Voice Commands: 5,000
   Number of Keywords: 77
   Release Date: Fall 2008
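
One plausible reason the command counts in the box run far higher than the keyword counts is that a command is a combination of keywords, so a small vocabulary multiplies into many valid phrases. The sketch below illustrates that arithmetic under assumed conditions: the three slots and every word in them are hypothetical and are not taken from any of the games listed.

```python
from itertools import product

# Hypothetical slot vocabularies, for illustration only.
UNITS = ["unit one", "unit two", "unit three"]
ACTIONS = ["attack", "move to"]
TARGETS = ["alpha", "bravo", "charlie"]

def all_commands(*slots):
    """Build every phrase formed by taking one keyword from each slot."""
    return [" ".join(words) for words in product(*slots)]

commands = all_commands(UNITS, ACTIONS, TARGETS)
keywords = len(UNITS) + len(ACTIONS) + len(TARGETS)

print(keywords, "keywords ->", len(commands), "commands")
```

Here the keyword count grows by addition (3 + 2 + 3) while the command count grows by multiplication (3 x 2 x 3), so each added keyword widens the gap.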
