Speech Technology Magazine

New Uses for Speech Demonstrate Its Multimodality

By Meghan Goth - Posted Aug 19, 2008

NEW YORK (SpeechTEK 2008) -- Multimodal applications are finding new uses, in everything from voting booths to college students' computers, researchers said today at SpeechTEK 2008 at New York’s Marriott Marquis.

Juan Gilbert and David Thornton, both of the Human-Centered Computing Lab at Auburn University, presented ideas for using multimodal applications in ways that could affect just about everyone.

Gilbert has spent five years researching Prime III, a voting system on which voters can speak and touch a screen interchangeably to cast their votes.

"Multimodality can break into novel domains," Gilbert said. "In 2012, it’s likely you all could be voting on a system like this."

The visual side of the system uses large fonts and candidate names that voters touch on the screen. Each screen shows a single race, and the voter confirms the ballot twice before the vote is recorded.

On the audio side, Prime III uses a headset with earphones through which the system speaks to the voter. Confirmation is given through a microphone built into the headset, so a voter can select a candidate by speaking or simply by making a sound. The system responds to any sound, even someone blowing into the microphone. This helps a voter who is paraplegic or unable to speak, Gilbert said. With the multimodal system, anyone with a disability can vote independently.

Could a voter who sneezes or coughs in the booth inadvertently vote for a candidate he did not intend to choose? "You can’t accidentally cast a ballot," Gilbert said. "You would have to cough at the exact time. The prompts are set up so the next prompt would only cause you to change something, not cast a ballot."

Gilbert's research is funded by an Auburn University research grant, and he received the award for most usable system in the University Voting Systems Competition.

Thornton followed Gilbert with a talk on his study of speech-based cursor control.

Thornton’s research asked whether a computer could be controlled with speech instead of a mouse or joystick.

One problem he ran into was that many interfaces contain large numbers of selectable objects that look similar or identical and carry no text label, so there is no easy way to tell them apart by voice.

Thornton asked college students to sit in front of a computer displaying several objects of similar size and shape. Their task was to drop one of the objects into a larger box, essentially moving it from point A to point B. Students could move the objects either by speech, saying the name of a labeled object, or with a joystick, picking the object up and placing it in the larger box.

He also had the students compare the joystick with a grid cursor, a 3 x 3 grid displayed over the screen. Saying "one" shrinks the grid to fit the first cell, the one labeled with the number one, and the user keeps narrowing the scope this way until the correct object is selected.

Thornton found that the grid can get down to a very small size. "It’s slow, but it can reach every spot," he said.
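To make the grid cursor's narrowing concrete, here is a minimal Python sketch. It assumes the nine cells are numbered 1 to 9 row by row from the top-left (the article only says the first cell is labeled one), and the names `Region`, `select_cell`, `narrow` and `utterances_needed` are hypothetical, not taken from Thornton's system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A rectangular area of the screen, in pixels."""
    x: float
    y: float
    width: float
    height: float

    def center(self) -> tuple[float, float]:
        return (self.x + self.width / 2, self.y + self.height / 2)


def select_cell(region: Region, cell: int) -> Region:
    """Shrink the region to one cell of a 3 x 3 grid.

    Hypothetical numbering: cells 1-9, row by row from the top-left.
    """
    if not 1 <= cell <= 9:
        raise ValueError("cell must be between 1 and 9")
    row, col = divmod(cell - 1, 3)
    w, h = region.width / 3, region.height / 3
    return Region(region.x + col * w, region.y + row * h, w, h)


def narrow(screen: Region, spoken_cells: list[int]) -> Region:
    """Apply one spoken cell number after another, as in the grid cursor."""
    region = screen
    for cell in spoken_cells:
        region = select_cell(region, cell)
    return region


def utterances_needed(screen_width: float, target_width: float) -> int:
    """Count the utterances until the cell is no wider than the target."""
    k, width = 0, screen_width
    while width > target_width:
        width /= 3  # each utterance keeps one third of the width
        k += 1
    return k


if __name__ == "__main__":
    screen = Region(0, 0, 1920, 1080)
    print(narrow(screen, [5, 5, 5]).center())  # close to the screen center
    print(utterances_needed(1920, 10))
```

Because each utterance keeps only a third of the width and height, a cell on a 1920-pixel-wide screen drops below 10 pixels after five utterances (1920 / 3^5 is about 7.9 pixels). That illustrates both halves of Thornton's remark: every spot is reachable, but small targets take several spoken steps.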

Labels, by contrast, are much faster than the grid, but naming everything would clutter the desktop, and labeling every object may be impossible.

Thornton found that the joystick worked better than the grid alone because it was simpler. But when he let students both label the objects and use the grid cursor to move them, the combination was faster than the joystick.

Finally, Thornton let students use whichever mode they preferred. "When they were able to select objects with only speech, they chose to select it with speech 84 percent of the time," he said. He also found that the more objects there were on the screen, the more the multimodal approach outperformed the joystick.

"People who had used the joystick performed very well with it," Thornton said, "but the fact that speech beat the joystick out at some point is very impressive."
