In the 1990s, before the dot-com bubble burst, when the Internet still seemed an untamed, untapped wilderness, everyone wanted to cash in. It wasn’t just start-ups or venture capitalists looking to make a quick million. It was flocks of regular people, too. But rather than devising schemes to capitalize on e-commerce, the latter thought of the Web as another channel through which they could express themselves. Personal Web site providers like Angelfire, GeoCities, and AOL made that possible and inspired people of every demographic to stake their acreage within the Web landscape. Of course, all of this freedom came with a caveat: Sites created by amateurs weren’t guaranteed to be perfect. Design purists recoiled at animated .gif files, seemingly seizure-inducing color schemes, and pages upon pages of information of little use to anyone but the site’s creator. As the Web evolved, so did the demand for better content and site design. Industries like graphic design and Web software development answered the call. Lackluster sites remained, but most people chose to go to the pros.
The same can be said for voice user interface (VUI) design. Though VUIs are built with different code than the Web and harness the flow of dialogue rather than visual information, VUI design still works under the same mandates laid down by the Internet generation. Should a VUI project be left in the hands of novices, or should an interactive voice response (IVR) system be designed by the pros? That is for each business to decide, but neither side of the argument is going down without a fight. While some VUI consultants push for greater awareness of their field (chiefly, that there is much more to designing an IVR than meets the eye), some companies think the IVR could use some democratization. The case for speech may seem nearly won, but subindustries within the field haven’t quite reached a conclusion.
VUI design, the process by which an organization creates the dialogue, flow, and responses used in an IVR powered by automatic speech recognition (ASR), is still hotly contested. And for good reason: Both sides of the argument say VUI design can determine whether an IVR succeeds or fails. In their 2004 book Voice User Interface Design, authors Michael Cohen, James P. Giangola, and Jennifer Balogh state that "The VUI is perhaps the most critical factor in the success of any ASR system, determining whether the user experience will be satisfying or frustrating, or even whether the customer will remain one."
Statements like that are not rare within the industry, but what remains to be seen is whether members of a regular IT team can learn to design an IVR as successfully as a professional. Will they try feverishly to learn the skills needed and create something usable? Or will their IVR end up just like those early Web pages, full of bells and whistles, neither of which takes the user anywhere?
Susan Hura, founder and principal at SpeechUsability, says the odds that a do-it-yourself (DIY) design will succeed are slim.
"People can try to do it themselves, but I think the chances of the average person, with no training in speech or human factors and no understanding of how people use conversation, are pretty small," Hura states. "There’s a whole technical side of things. I can’t imagine writing a grammar with no knowledge of speech acoustics. Many of these things require some background; if it works, it’s luck."
First Impressions Matter
Though other channels, such as the Internet, let consumers contact a company, the phone remains the most popular point of contact for consumers looking to solve a problem. They may have tapped a company’s Web site for background information, but they come to the call center with unresolved questions or tricky situations. As a result, a consumer’s first conversation with a company is typically with its IVR. So when it comes to designing dialogue flows, Eduardo Olvera, senior user interface designer at Nuance Communications, thinks it’s always smart for a company to put its best foot forward. For him, that means bringing in the big guns to do some of the work.
"If a company is deploying [an IVR] as the first application they’ve done, for me it’s really important to engage someone who has some experience," Olvera states. "They will know what to expect, and chances are most people won’t know what to expect. Really deploying an IVR and introducing it to consumers is straightforward at this point, but if you’re introducing speech, you need to know or have someone in your team who knows how to introduce that to customers."
This, however, is an ideal. While companies may throw money at marketing initiatives, they do not always view the IVR as a means to drive customer loyalty or retention. As a result, the IVR can become an afterthought, simply assigned to the same developers who built the underlying system but who may have no experience with VUI design. Until recently, VUI designers and IVR developers had no tools with which to express their ideas graphically. Then came products like Voxeo’s Designer and Microsoft Speech Server, both of which let companies build an IVR dialogue flow using a graphical interface. Users can literally drop a "transfer to operator" command into a call tree as they construct it on screen. While both products help democratize the space, Voxeo and Microsoft have different views on how their DIY products should be used. For Jose deCastro, Voxeo’s lead architect for Designer, it’s all about helping companies get a foot in the door.
"These professional services companies are in abundance, and there is a lot of possibility to negotiate per volume," deCastro says. "Either way, there’s a pretty high cost to entry. What Designer is about is that we allow people to come in with low cost-to-entry, prototype their applications, get [an application] to 80 [percent] to 90 percent to where they want them, and then have professional services come in."
Another way to approach DIY VUI design is to compare VoiceXML with HTML or with a programming language like Visual Basic. While VoiceXML is a standard language that speech technology developers use to build IVR systems, it is not as well-known as HTML. Some, however, view the skill set as transferable. That was the thinking of Microsoft’s Speech Server developers when they rolled out their DIY IVR software. Like Voxeo’s Designer, Speech Server lets customers build their own IVRs, but rather than push for professional services once the system is developed, Microsoft wants its users to become educated. Albert Kooiman, senior business development manager for Microsoft Speech Server, says his company wants to break down the barriers maintained by the IVR industry’s old guard.
"If you look at [Speech Server] now, the people we are targeting are not the typical IVR telephony people; we are very much targeting the .NET developer—the people who use Visual Studio day in and day out," Kooiman says. "The message we’re trying to get across to them is that everyone who can write a Visual Basic, .NET, or C-Sharp program can actually build speech IVR apps for self-service over the telephone."
Statements like this make designers such as Olvera nervous, because they invite more inexperienced people to create poorly planned IVRs.
"When [Microsoft] released Speech Server, that was their motto: ‘Our Web developers can now develop your speech applications,’" Olvera says. "They really believe that since you’re a developer you can develop a VUI application. It’s very different. One is visual and one is audible."
Bolster Your Staff
In VUI design and IVR deployment, the best-laid plans are often the simplest. While the move to speech-enable a system may gain momentum quickly, experts advise taking things slowly at the outset. Rather than rushing to automate every aspect of the business, Olvera says, a company needs to master basic functions first. This is especially true for small to midsize businesses (SMBs).
"If you’re an SMB, don’t try to do everything at once," Olvera notes. "Concentrate on the low-hanging fruits—the features that give you the most bang for your buck, that you can automate well—and just go with that."
He also stresses keeping the technical aspects of the implementation in mind. Larger companies, for example, may face extra challenges in integrating the IVR with other systems. In this case, Olvera says it’s usually best to bring in the vendor that sold a company its systems because the vendor has more experience with its own products. SMBs, however, usually require only minor tweaks that an independent or freelance consultant can handle.
The type of IVR also determines how much involvement is needed from outside sources. While natural language and directed dialogue systems require more planning, dual-tone multifrequency (DTMF) systems, which operate using touch tones, are often less complex. Hura concedes that most DTMF systems do not require outside help, but says the more labor-intensive speech systems do.
For large companies deploying new IVRs, Hura also recommends assembling an in-house speech team, which could even include a new hire with VUI design experience.
"Having someone who has ownership of the IVR within your company is a really important thing," she says. "One of the dangers of just going with the [outside] experts is that they hand over your stuff, they get your app up and running, and then you don’t have anyone on your staff who says, ‘This is mine,’ who is the primary form of contact. If you don’t choose to do it yourself, you still do need to have somebody who is going to be the primary person [in charge of the IVR]."
Bolstering your speech team also means keeping its members up-to-date with developments in the industry. While Microsoft often points Speech Server users to training seminars by user interface designer Bruce Balentine, Hura teaches speech teams every year at the SpeechTEK conference through a program called Speech University. Even Microsoft, which preaches DIY as a means of freedom from vendors, says solid training is essential to strong deployments.
"[We] encourage people who build speech apps to take training in how to build a good VUI," Kooiman explains. "That doesn’t mean, however, that you cannot build a VUI without having gone to that training."
Why? Because, he says, programs like Speech Server, which retails for $699, are easy to use out of the box. One major component of successful IVR deployments, however, should never be underestimated and cannot be replicated with software: usability testing.
Metrics Matter
Usability testing determines an IVR’s ease of use and effectiveness by having real end users put the system through its paces. While not a replacement for in-house or outside VUI expertise, usability testing can help a company find the cracks in its IVR’s dialogue flow and verify overall functionality. As Nuance’s Olvera explains, companies must define their metrics before testing begins, and he suggests that the testing itself take place before deployment.
"Have a clear set of metrics in place; normally they leave those to the end," Olvera says. "When you’re starting something new, you need to measure the success of the application. It forces you to think of what you’re trying to achieve: higher automation rates, customer satisfaction, or reducing transfers. Each of those things can be designed in very different ways. [Testing] will guide your design in very critical places, such as how many layers of transfers you will offer."
According to Hura, a company is unlikely to run top-notch usability testing in-house. Metrics can be difficult to define, but even more trying is analyzing the resulting data in a way that shows how to tune the system, based on user feedback, to make it more efficient and effective.
"If you can measure your metrics accurately, I don’t see why you’d need to hire an expert," Hura says. "But I don’t think the chance of you, in-house, running an accurate and reliable usability test is likely."
Still Divided
With so many disagreements in the speech world about the best path for VUI design, a middle-ground consensus seems unlikely in the coming years, if ever. Companies like Microsoft and Voxeo have opened the door for organizations that could not otherwise afford to outsource both the technology and the consulting. But industry experts and human factors specialists worry that more sloppily designed IVRs will lead to continued user distrust of, and lack of faith in, the systems. Some systems, such as DTMF IVRs, remain straightforward and do not need the same level of attention as fully speech-enabled IVRs. But with increasing pressure to adopt natural language and directed dialogue, more complex IVRs call for more professional services.
Kooiman, however, still believes that outside speech professionals aren’t all they’re cracked up to be. He points to sites like GetHuman.com, which rate IVRs on ease of use, noting that many of the systems rated there were designed by outside vendors and consultants.
"All those people who claim to be the experts, they still have a lot to learn as well," Kooiman explains. "[GetHuman-rated IVRs] are sometimes built by those high-paid professionals, and still there’s a lot of crap out there."
Professionals like Olvera, however, say this view no longer holds. Contact centers are no longer just about coding an IVR; they’re a tangible representation of the company’s philosophy and customer service abilities. And the phone is evolving. Today’s IVR could be tomorrow’s doorway into multimodal interactions, in which the Web and telephony converge further. For that kind of sea change, early adopters had better be ready to harness the resources of outside professionals.
"To me, there are a lot of things you need to be aware of—human factors, usability, linguistics, cultural settings, and localization, especially if you’re planning to deploy something in different regions or languages," Olvera states. "Marketing and branding—what sort of experience do you want to give those callers? As of late, in terms of how the Web and phone are merging, multimodal interactions are gaining popularity. That’s a whole new set of skills a VUI designer must have."