Encouraging Good VUI Design

Voice dialing is a great speech technology application. As a frequent user, both on my cell phone and on my company's automated operator, I find it quick and easy: hands-free, eyes-free, and numbers-memory-free. So why, when the contract from my wireless carrier entitled me to a newer model with a more advanced voice dialer (as well as a camera and other goodies), did I go back to my old cell phone?

Easy. The voice dialer on my old phone (which I re-activated) required only one button push and one voice command to make a call. The voice dialer on the new phone (donated to a battered women's shelter via our local police) required up to seven separate voice commands (see sidebar). Neither of the new phone's two speech recognition improvements, speaker independence and more voice entries, outweighed its tedious voice dialing process.


Voice Dialing

Old Cell Phone (Re-activated):

(1) Push button on side of phone
(2) Name please
(3) "John Smith, work"
(4) Calling
(5) Call is connected

New Cell Phone (Discarded):

(1) Push button on side of phone
(2) Please say a command
(3) "Call someone"
(4) Please say a name
(5) "John Smith"
(6) Did you say John Smith?
(7) "Yes"
(8) Call mobile?
(9) "No"
(10) Call home?
(11) "No"
(12) Call work?
(13) "Yes"
(14) Calling
(15) Call is connected

For years the speech industry suffered the ridicule of users and commentators: "I ordered a pepperoni pizza and it called my mother-in-law," or "Text-to-speech sounds like a drunken Swede." Those days are over. Speech recognition, even in small computing footprints, can be robust and speaker independent, and can provide voice models for any text word or phrase. Text-to-speech can be highly intelligible in embedded applications and almost completely natural in server implementations. However, poor voice user interface (VUI) designs often exasperate users and obscure the advantages of the underlying speech technology. Everyone is at least an occasional user of speech technology in telephone interactive voice response (IVR) systems, so the problems of VUI design are widely recognized, from skits on "Saturday Night Live" to the tips on the GetHuman Web site for opting out of IVR systems to reach a human being.

Several issues stand in the way of good VUI design: human- vs. machine-like interfaces, over-emphasis on ROI, and a lack of effective resource integration.

More machine-like systems can be simple and efficient, with short greetings, minimal menu choices, and an operating paradigm that acknowledges their inherent limitations. As the voice dialer on my old cell phone shows, less can definitely be more. Efficient doesn't mean rude; it just means not trying to put lipstick on a robot. One example of a useful, well-designed speech application is Tellme's 1-800-555-1212 information system. The recognition is excellent, and the response is generally faster than a live operator look-up. The interface design is clear to a first-time user, and barge-in is available for power users.

Over-emphasis on ROI doesn't benefit the user or the company in the long run. Using the greeting as a commercial, offering multiple layers of long menus, and not providing an easy "zero out" option to a live agent all reinforce the perception that the company is wasting the user's time to reduce its costs.

Technology resources peripheral to the automated telephone system are not well integrated in many IVR systems. Having to repeat to a live agent information already spoken or punched into the automated system is a common complaint. The use of caller ID to reduce the information required from a caller seems rare. Walt Tetschner of "ASRNews" relates an example of this failure to integrate company information into a speech system. His wife was informed by Amtrak's "Julie" that her train was on time, but she discovered at the station that her New York to Boston segment had been cancelled. We expect more of intelligent automation.

In addition to simple applications like voice dialing, speech technology offers the potential for information-rich, fast-responding, transactional IVR systems for a variety of tasks. If users are turned off by the current shortcomings of automated speech systems, then the acceptance of these valuable applications will be delayed. As enthusiastic promoters of speech technology, we need to encourage the adoption of truly efficient, user-centered VUI designs.


John A. Oberteuffer is chairman of the Advisory Committee at Fonix Corporation and a member of the board of directors of AVIOS. He founded and edited the speech industry newsletter ASRNews. He received his doctorate in physics from Northwestern University in 1969.
