A Look at AVIOS' Speech and Multimodality Contest
The Applied Voice Input-Output Society (AVIOS) announced the winners of its second annual Speech Application Student Contest at the recent Voice Search Conference in San Diego. This year’s 19 submissions comprised voice-only applications on a BeVocal host and multimodal applications written in the X+V, VoiceXML, or SALT markup languages and running on Internet Explorer 6, Opera, Voxeo/Skype, Firefox, or Windows Mobile 5 browsers. The applications represented the collective work of 37 students and eight faculty advisers at nine institutions in four countries.
Applications were evaluated by five speech technology leaders from Microsoft, Nuance, Convergys, and Fonix on the basis of technical superiority, innovation, user friendliness, and usefulness. Winners received software packages, popular hardware, or remunerative awards, such as airfare and lodging for the next Voice Search Conference, from corporate sponsors Google, Microsoft, Samsung, and Voice Objects. Student resumes were made available to the sponsoring corporations.
The winning application in the voice-only category comprised five children’s educational games about counting, adding, feelings, days of the week, and seasons of the year. This cleverly designed voice interface made good use of barge-in and offered prompts that cued the child to the task and narrowed it enough to enable good recognition performance with children’s voices. The runner-up’s application accepted voice input of common fast-food items and provided a calorie count for the selected lunch. Again, clever use of prompts appropriately narrowed the task for the speech recognition system.
The winning application in the multimodal category was a speech therapy game in which the player (a child) spoke the names of items along a path that led to a picture of a cake, the reward. Speaking the name of each item provided practice with phonemes (e.g., /l/, /k/) that are typically difficult for children to produce. Numerous encouraging retry prompts and error-handling actions helped the child reach the goal while learning to pronounce the items on the path correctly. The runner-up multimodal application converted English units to metric units for values of temperature, length, and weight. It made solid use of speech recognition and text-to-speech.
Two additional prizes were awarded for compelling applications that were outstanding in one or more, but not all, of the evaluation criteria. One was a GPS-based directory assistance application that ran on the Windows Mobile platform. It included a navigation tool that accepted a spoken city name, brought up the corresponding Google map, and allowed voice commands to change the map scale and compass direction. The other was a Visual Flight Rules communication tutorial that presented textual training material alongside a graphic aircraft control panel, then posed typical communication tasks in which a student pilot would speak to an airport flight controller. Training feedback included displaying and saying the correct communication statement, then having the student try the task again. The goal was to have the student successfully complete the simulated dialogue with the flight controller.
Other student projects included a voice-controlled dictionary interface, a multimodal voting application, an adventure game framework, and a step-by-step tutorial for repotting an orchid. To access and sample the applications, visit our Web site at www.avios.org/contest/.
AVIOS’ goals in sponsoring the contest were to foster creative thinking in the use of speech technology and to encourage student participation in AVIOS. A beneficial side effect of the contest was helping students learn more about the corporate sponsors, while helping the corporate sponsors find emerging talent. Comments from participants indicate that the contest was a success. From students: "The contest was a great chance for me to gain some in-depth knowledge." "I was very satisfied with learning to write voice applications." "There is a notable difference between theory and practice in speech recognition." And from a sponsor: "Wow! I want to contact that student!"
Planning for next year’s contest is already under way. Visit our Web site for details or to let us know your suggestions for improvements.
Patti Price has more than 20 years of experience in developing and transferring speech and language technology. She also cofounded Nuance, BravoBrava!, and Soliloquy Learning. Bill Scholz, Ph.D., is a speech technology consultant with more than 30 years of experience in research and product development in computer-based training and expert systems. Matt Yuschik, Ph.D., is a human factors specialist at Convergys.