April 26, 2005
Features

Speech Technologies Make Video Games Complete

Speech recognition technology has reached a significant milestone: its use in mainstream console-based video games. That might seem like a dubious achievement considering the serious return on investment (ROI) that speech recognition offers large corporations in automated call centers, but the video game industry is huge, bigger than the movie industry. The cost of producing a new video game makes the introduction of any new feature like speech recognition a serious risk. And for speech technology, the video game environment is a major challenge: recognizing speech from the loud, excited, untrained voices of users young and old, of both sexes, against a noisy background. But today, speech recognition engines are running on major video game consoles in a growing number of titles. Verbal commands for action and dialogue with virtual characters are offering a new dimension in game play.

The Video Game Industry
The video game industry, which had its start with Pong in 1972, is big business. Estimates of worldwide revenues in 2004 are about $30 billion. This compares to estimates of $20 billion for worldwide movie box office receipts.

The rewards in the video game industry come with substantial risks. The cost of developing a $50 console video game is comparable to that of making a movie: millions of dollars. A developer must be confident that a new feature like speech recognition will add value. For the speech industry, convincing potential customers of the value of speech recognition technology is a familiar task, and we seem to have made it to the first level of the video game. Console manufacturers, game publishers, and game developers have all moved to implement speech. An advantage for speech vendors is that the video game industry is used to working with middleware vendors. These vendors provide graphics engines, animation, and audio software, which now includes speech recognition technology. The speech software development kits (SDKs) offered by Fonix (VoiceIn) and ScanSoft (VoCon) are specifically designed to allow game developers to spend more time creating new games and less time coding.

In addition to the console-based video game market, the smaller PC segment of the games market represents a healthy $3 billion in annual worldwide sales. In this market, Fonix and ScanSoft are joined by several other speech recognition vendors, including eDimensional, which offers Voice Buddy; Realize Software, with Realize Voice; and Mindmaker, with Voice Commander.

Video Games with Speech
The Microsoft® Xbox, Nintendo GameCube™, and Sony PlayStation® 2 consoles all offer games with speech input/output. Currently, most games are war-action-shooter games. In these, speech recognition provides high-level commands to virtual teammates who respond with a variety of recorded quips. In online games the headset microphone can also be used for VoIP chats with other real players. The PlayStation® and Xbox use USB microphones available from Logitech, Microsoft, Plantronics, and others. Nintendo’s GameCube™ uses a custom handheld microphone for its new Mario Party 6 game.

To get some hands-on, voice-in experience, I used a PlayStation® 2 console. I tried several games, including the graphically realistic, tactical, squad-based shooter games Ghost Recon 2 and SOCOM II: U.S. Navy Seals. The speech recognition systems for these games are provided by Fonix and ScanSoft, respectively. Despite my raw recruit status, I was able to get a good taste of what speech input can do to enhance the game experience. In both games the speech recognition, which was activated by a designated push-to-talk button on the gamepad, worked flawlessly.

The *Ghost Recon 2* Captain and his team with the Recognized Speech on the Screen

In Ghost Recon 2, the user is the leader of a team of three secret Special Forces soldiers who must capture various military targets in North Korea in the year 2007. The team is critical to the user’s survival under enemy gunfire. Saying “Move out!” directs the team to move ahead of you as you make your way through the virtual, hilly terrain toward various objectives. The speech commands (“Move out,” “Covering fire,” “Grenade,” “Take point,” “Hold position,” “Regroup”) are easily recalled, high-level instructions to the team members. The commands that can be obeyed depend on the immediate situation. If you say, “Take point,” and the hostile fire is too great, the designated team member may say, “No can do, Captain.” Occasionally, the retort is somewhat less respectful.
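The situation-dependent behavior described above can be pictured with a small sketch. It is an illustration only, not the game's or Fonix's actual logic: the six commands and the “No can do, Captain.” reply come from the game as described here, while the `respond` function, the `under_heavy_fire` flag, and the "acknowledged" and "ignored" replies are hypothetical names invented for the example.

```python
# The six commands quoted in the article.
COMMANDS = {
    "move out", "covering fire", "grenade",
    "take point", "hold position", "regroup",
}


def respond(command: str, under_heavy_fire: bool) -> str:
    """Return a teammate's reply to a recognized command (illustrative only)."""
    # Normalize case and drop surrounding spaces or simple punctuation.
    command = command.lower().strip(" !.")
    if command not in COMMANDS:
        return "ignored"  # hypothetical reply for out-of-vocabulary input
    # Hypothetical rule: moving ahead of the leader is refused under heavy fire.
    if command == "take point" and under_heavy_fire:
        return "No can do, Captain."
    return "acknowledged"  # hypothetical reply for an accepted command


print(respond("Take point", under_heavy_fire=True))
print(respond("Move out!", under_heavy_fire=True))
```

The point of the sketch is that the recognizer only has to identify one of a handful of phrases; whether the command is obeyed is decided afterward by the game state.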

*SOCOM II: U.S. Navy Seals*: The first person player with some speech command choices on screen

In SOCOM II: U.S. Navy Seals, a team of four men, including the first-person leader, attempts to stop an arms smuggling group in rural Albania. The team has to avoid the enemy, meet an informant, blow up weapons caches, and make their escape. The speech commands in this game are spoken in three parts, using a simple grammar. The first part is the addressee: commands may be directed to “Fireteam” (all other team members) or to individuals like “Able” (your partner). The second part is one of approximately 12 action commands, including “Fire at will,” “Deploy,” “Move to,” and “Get down.” The third part is one of nine letters of the military alphabet (“Charlie,” “Delta,” etc.), which tells the team where “Move to” and similar commands should be carried out. The letters represent the specific locations of game objectives.
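A three-part grammar of this kind is simple enough to sketch in a few lines. The following is a minimal illustration, not ScanSoft's or the game's implementation. The vocabularies contain only the words quoted above, so they are deliberately partial, and the `Command` type, the `parse_command` function, and the rule that only “Move to” takes a location are assumptions made for the example.

```python
from typing import NamedTuple, Optional

# Partial vocabularies: only the words quoted in the article are listed.
ADDRESSEES = {"fireteam", "able"}
ACTIONS = {"fire at will", "deploy", "move to", "get down"}
LOCATIONS = {"charlie", "delta"}
# Assumption for this sketch: only "move to" requires a location.
ACTIONS_NEEDING_LOCATION = {"move to"}


class Command(NamedTuple):
    addressee: str
    action: str
    location: Optional[str]


def parse_command(utterance: str) -> Optional[Command]:
    """Parse '<addressee> <action> [<location>]'; return None if out of grammar."""
    words = utterance.lower().replace(",", " ").split()
    if not words or words[0] not in ADDRESSEES:
        return None
    addressee, rest = words[0], words[1:]
    # Try the longest action phrase first so multi-word actions match whole.
    for n in range(len(rest), 0, -1):
        action = " ".join(rest[:n])
        if action not in ACTIONS:
            continue
        tail = rest[n:]
        if action in ACTIONS_NEEDING_LOCATION:
            # Exactly one known location word must follow the action.
            if len(tail) == 1 and tail[0] in LOCATIONS:
                return Command(addressee, action, tail[0])
            return None
        # Actions without a location must end the utterance.
        return Command(addressee, action, None) if not tail else None
    return None


print(parse_command("Fireteam, move to Charlie"))
print(parse_command("Able get down"))
```

Constraining input to a fixed slot structure like this keeps the set of valid utterances small, which is one reason such grammars suit noisy game environments.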

In these games the use of speech interaction felt like the way a team leader would control his men in a real life situation. The relatively complex and varied movements of several men were easily executed. The voice commands could also be executed using button presses on the gamepad controller, but using the gamepad for some commands would require more than 10 separate button pushes. Speech input allowed the gamepad controller to remain free to exercise full control of the first person shooter. In both games, I found the use of voice commands fast, easy, and very empowering. The verbal interaction seemed appropriate and natural. My only suggestion is that the speech input option should always include a “now-what-should-I-do?” command.

Modern video games are complex, with intricate rules and simultaneous tactile, visual, and audible information thrust on the user. Many require as much strategy as shooting, or more. In this environment, speech commands are highly useful. They may help novice players progress through the levels of a game and let expert players use a wider range of skills and strategies.

Training users to remember natural voice commands would not seem to be a major issue in video games for several reasons. Speech input is valuable even if limited to a few obvious commands. When more complex commands are required, prompts may appear on the screen. In any case, gaining proficiency from experience is a part of the engaging nature of video games, from learning how to use the multi-button gamepad to psyching out the game strategy.

Meeting the Needs of Video Game Developers: SDKs
The interface between video game developers and the speech technology vendors is the SDK. The Fonix VoiceIn and ScanSoft VoCon SDKs offer tools and documentation for programmers and designers working on the major video game consoles. For game designers, vocabularies and grammars can be created without coding or concerns about speech technology. Acoustic models for words or phrases are generated automatically. For programmers, sample code is provided to make the implementation of speech input fast and reliable. The Fonix games SDK features neural net-based speech recognition with a small computing footprint and is cross-platform compatible with the Xbox, PlayStation®, and PC platforms. The ScanSoft SDK supports Nintendo and PlayStation® consoles and features special adaptation of its acoustic models for children’s voices. Both SDKs offer support for multiple languages.

The Future for Speech Recognition in the Video Games Environment
The video games market should provide a strong opportunity for speech recognition technology, which has proven successful in its technical capability and its value to the game experience. The video game market is projected to grow by 20 percent per year over the next few years, leading to a worldwide market of more than $55 billion by 2008, according to a PricewaterhouseCoopers report. The market penetration for speech recognition technology is still very small, with speech-enabled video games currently accounting for only a fraction of the total. Almost all the console video games with speech recognition that have been developed so far are in the action-adventure genre. Significant opportunities for speech exist in other genres as well. In sports games like football and basketball, calling plays or defensive responses by voice is natural. In golf, players can choose clubs and shots. Racecar drivers can make voice requests and responses to their pit crews. In role-playing games, speech can be used to cast spells and manipulate objects. Multiple uses for speech input exist in education and in both training and entertainment games. The added cost of speech is comparatively modest in both development resources and royalties.
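As a rough check on the growth projection cited above, the arithmetic can be worked through directly. This sketch assumes the 2004 revenue estimate of about $30 billion given earlier in this article as the starting point and a constant 20 percent compounded annually; the report's own base figure and method are not stated here and may differ. The function name `projected_revenue` is made up for the example.

```python
def projected_revenue(base_billion: float, annual_growth: float, years: int) -> float:
    """Compound a starting revenue figure at a fixed annual growth rate."""
    return base_billion * (1.0 + annual_growth) ** years


BASE_2004 = 30.0  # article's estimate of 2004 worldwide revenues, in $ billions
GROWTH = 0.20     # projected annual growth rate

for year in range(2005, 2009):
    value = projected_revenue(BASE_2004, GROWTH, year - 2004)
    print(f"{year}: ${value:.1f} billion")
```

Under these assumptions the figure first passes $55 billion in 2008, which is consistent with the projection.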

The expanded use of speech input/output (I/O) in video games will be accelerated by the expected introduction of new, more powerful hardware by major console manufacturers. These platforms will create new opportunities for speech input using these computing resources. Larger vocabularies, more complex grammars, more natural language processing, and more artificial intelligence (AI) technology will be available for engaging, interactive dialogues. In most video games, except those that are basically of the “swivel and shoot” type, the extra computing power will provide a much richer experience. Expanded vocabularies and natural language processing will allow players to issue commands or make requests in many different ways and be understood. AI capability will create opportunities for human players to consult with virtual players to gain information and insights on strategy. Virtual reality entertainment applications like the popular Sims series could utilize complex speech interaction with emotional content in their virtual worlds. Dialogue will be an essential new element in these games.

A research group at TeliaSonera in Sweden is developing a fairy tale-based video game called NICE to explore spoken dialogue systems in which children use speech as the “primary means of progression through the story.” In the U.S., Michael Mateas of Georgia Tech and Andrew Stern, an AI game developer, have created a social interaction game incorporating emotion and interpersonal conflict. If speech rather than text input were used to interact in games of this genre, their immersive power would be increased dramatically.

The creative freedom of video games using speech recognition and synthetic speech response may lead to the development of more powerful dialogue systems for commercial applications. It may also provide some models for advanced human-computer interactions like speech-bots or conversational agents. Online transactions or information retrieval based on voice interaction allows human beings to use the channel with the greatest bandwidth from the brain to the outside world: speech.

Thanks to David Thomson, CTO of SpeechPhone (http://www.speechphone.net/), for his assistance with this article.

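Video Games with Speech Recognition

| Console Vendor | Game | Publisher | Developer | Speech Technology |
| --- | --- | --- | --- | --- |
| Microsoft: Xbox | Ghost Recon 2 | Ubisoft | Red Storm | Fonix |
| Microsoft: Xbox | Rainbow Six 3 | Ubisoft | Red Storm | Fonix |
| Microsoft: Xbox | Rainbow Six 3: Black Arrow | Ubisoft | Red Storm | Fonix |
| Microsoft: Xbox | SWAT: Global Strike Team | Vivendi Universal/Sierra | Argonaut | Fonix |
| Nintendo: GameCube | Mario Party 6 | Nintendo | Hudson Soft | ScanSoft |
| Sega: Dreamcast | Seaman (Japan) | Sega | | ScanSoft |
| Sony: PlayStation 2 | Deka Voice (Japan) | Sony Computer | | ScanSoft |
| Sony: PlayStation 2 | Ghost Recon 2 | Ubisoft | Red Storm | Fonix |
| Sony: PlayStation 2 | Ghost Recon 2: Jungle Storm | Ubisoft | Red Storm | ScanSoft |
| Sony: PlayStation 2 | LifeLine | Konami Digital | | ScanSoft |
| Sony: PlayStation 2 | Operator’s Side (Japan) | Konami Digital | | ScanSoft |
| Sony: PlayStation 2 | Rainbow Six 3 | Ubisoft | Red Storm | ScanSoft |
| Sony: PlayStation 2 | SOCOM: U.S. Navy Seals | Sony Computer | | ScanSoft |
| Sony: PlayStation 2 | SOCOM II: U.S. Navy Seals | Sony Computer | | ScanSoft |
| Sony: PlayStation 2 | SWAT: Global Strike Team | Vivendi Universal/Sierra | Argonaut | ScanSoft |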

John A. Oberteuffer is the chairman, advisory committee at Fonix Corporation and a member of the board of directors of AVIOS. He was the founder and editor of the speech industry newsletter ASRNews. He can be reached at: JAO@fonix.com
