Blade Kotelly of Edify explains why it is so easy to keep callers playing along: the designer need only treat callers the way they would want to be treated if they had to use the system themselves.
Q. Congratulations on your recent move to Edify. A little over a year ago you published "The Art & Business of Speech Recognition: Creating the Noble Voice" (Addison-Wesley, 2003). Has anything changed since the book came out?
A. Well, as the saying goes, "Plus ça change, plus c'est la même chose" (the more things change, the more they stay the same). In this industry we're always improving recognition performance, packaging more parts of the design into bigger and bigger components, and improving the connectivity between the platform, the design, the back end, and so on. What has changed is that ordinary people can now start to play with these technologies.
Since the book came out, I've started teaching a class at Tufts University, and I am continuing to do so this year and next. The class is unusual because in just half a semester I can get students to learn the underpinnings of design and create their first speech application. No other technology lets students make something so emotionally compelling in such a short time, and as a by-product of making an application that sounds good, they refine their communication skills. By the end of the semester, a group of two computer-science majors and one human-factors major had deployed a TiVo that you could program over the phone by saying the name of the show (it actually worked). Another group of undeclared freshmen made a fan-club line for the TV show.
Q. Speaking of Edify, what is your new role with the company?
A. I'm heading up the Edify Design Collaborative (the EDC) worldwide. The EDC is a group of senior designers and senior speech scientists focused on what I see as the three key elements of how the company interacts with its users (our clients) and with end users. The three elements are:
- Custom solutions for clients who need a unique solution to a business problem;
- Packaged applications for clients who want to leverage the rock-solid Edify platform with a pre-tested, pre-QA'ed user interface, yet need the flexibility to turn functions on and off and move whole sections of the user interface around with the assurance that they won't break the user experience; and
- Working with the marketing team to influence the future direction of the user-interface product suite.
The EDC is an integral part of Edify's complete methodology for providing speech solutions to our customers. Our approach ensures that the customer receives a total end-to-end view, from VUI/persona design through application development, back-end integration, connectivity, and deployment. Additionally, we provide ongoing performance management to fine-tune grammars and incorporate changes. We focus on our customers and on helping them solve their business issues with solutions that drive real business benefits.
Q. You designed the United Airlines flight information system, which prompted Julie Vallone of Investor's Business Daily to write, "By the end of the conversation, you might want to take the voice to dinner." Can you share some of your thoughts on why that application was so successful?
A. Up until that point, speech systems hadn't been designed by human-factors experts; they had been designed by people from speech labs. And while some of those laboratory designs worked, and even tested well in usability studies, they generally weren't designed to be fun to use or to sound engaging. When I directed the voice for United Airlines, we spent almost three hours on the first 14 prompts (that's a lot of time!). Because those were the key prompts that almost every caller would hear, they had to be perfect. For instance, a word like "Oh" at the beginning of a sentence had never been used in a speech application, yet people say it all the time to recapture someone's attention at the end of a long phrase. Asking the voice talent to smile or furrow their brows while recording the prompts also helped create the perception that the voice is talking to you, the caller, and not to you, the mass audience. Sincerity is what separates the good applications from the best ones.
Q. What are a few other applications you would list as successful and why?
A. There are lots of great applications out there, and the work Edify has done is astounding: excellent production, brilliant use of ANI (Automatic Number Identification, the caller's phone number) to shorten navigation on follow-up calls, and a very clean and compelling design. TicketMaster is an example where speech recognition is used in conjunction with a touchtone system. Not every application needs to use only speech recognition, and not every application is well suited to it. Insisting otherwise is as absurd as telling someone in 1984 that every application on their home computer should be run with only a mouse and no keyboard. Touchtone has great value as either a primary or a fallback technique. However, speech can help users in ways touchtone can't, and it can create a branded experience that touchtone interaction doesn't allow, since the psychological bond between the user and a touchtone application can't be as strong.
Q. What would you suggest for enterprises that are having problems with users not using the speech application?
A. The best strategy is to help the caller stay in control of the experience. If callers think the speech system can't help them, they are either right (in which case they should talk to an agent and not waste time in the speech system) or wrong (in which case they don't understand or believe that the speech system can help). Let them decide, and let them know the consequences of their choices. In a system I designed for Apple Computer, a user can press zero to speak with a representative. However, the system then tells them that although pressing zero again will transfer them to an agent, they will jump ahead of the line if they let the speech system route the call, since the agent they would wait to speak to only routes calls into another queue. We're telling them the truth, but letting them decide whether to skip the speech system or give it a try.
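The zero-out handling described above can be pictured as a tiny state machine: the first press of zero explains the trade-off, a second press transfers, and a spoken request is routed by the speech system. This is only a minimal sketch under stated assumptions, not the actual Apple or Edify implementation; the event strings (`dtmf:0`, `speech:...`), the class and action names, and the prompt wording are all hypothetical.

```python
from enum import Enum, auto


class Action(Enum):
    REPROMPT = auto()           # input not understood; ask again
    WARN_AND_OFFER = auto()     # explain the consequences of zeroing out
    TRANSFER_TO_AGENT = auto()  # caller confirmed they want a person
    ROUTE_BY_SPEECH = auto()    # let the speech system route the call


# Hypothetical wording paraphrasing the behavior described in the interview.
WARNING_PROMPT = (
    "Press zero again and I'll transfer you to an agent. "
    "Or tell me what you're calling about and you'll jump ahead of the line."
)


class ZeroOutPolicy:
    """Hypothetical policy: the first '0' warns, a second '0' transfers."""

    def __init__(self) -> None:
        self.zero_presses = 0

    def on_input(self, event: str) -> Action:
        if event == "dtmf:0":
            self.zero_presses += 1
            # Tell the truth about the trade-off before honoring the request.
            if self.zero_presses == 1:
                return Action.WARN_AND_OFFER
            return Action.TRANSFER_TO_AGENT
        if event.startswith("speech:"):
            # The caller chose to give the speech system a try.
            return Action.ROUTE_BY_SPEECH
        return Action.REPROMPT
```

The design choice worth noting is that the caller is never trapped: two presses of zero always reach an agent, and the only cost of the first press is hearing one honest sentence about the alternative.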
Q. What are some of the barriers you believe exist for more mainstream speech technology adoption? What can we do to improve on this situation and create more speech solutions?
A. Speech technology isn't mainstream in the minds of end users yet, and that takes time to change. If you ask someone to imitate how a computer talks, they speak in a monotone, but we all know that advances in text-to-speech make computers sound amazingly natural. If you watch a television advertisement for a car with speech recognition, you either hear something ridiculous to illustrate the point (like the commercial in which someone said "Rain, stop." before the user said "CD player, on") or you hear nothing at all. Speech simply isn't part of end users' common parlance yet. However, this is changing. In the next five years, end users will start expecting speech systems when they call a company, and that will drive adoption.
Q. How important are human-factors design issues, compared to the underlying technology such as recognition accuracy, for a successful speech solution? What are ways customers can improve the design process?
A. The user experience is the combination of a well-designed, well-produced user interface and a set of technologies that empower that interface. Great designers can overcome bad technology, but nothing beats having great technology in the hands of designers who know how to take advantage of it. No matter how good the technology is, how accurate the recognizer is, or how clever the design tool is, the user experience is in the hands of the designer, because only the designer knows how to move the caller through this psychological experience while incorporating the brand of the company providing the service. I've consistently found that good customers understand their business logic and know their brand. The best customers understand how those two concepts work together and can give designers the best insight.
Q. Thanks, Blade. Do you have any last thoughts you would like to leave with us?
A. What I consistently notice when using speech systems is how little effort the designers have put into making the first few prompts compel me to use the system. I want to zero out as much as the typical user does. And yet it's so easy to keep callers playing along if the designer just makes sure to treat the caller the way they would want to be treated if they (or better yet, their mom) had to use that system. When my mom tried the first system I designed, right when it was code-complete, she couldn't use it easily, said "Well, back to the drawing board," and hung up. I was devastated, but I worked on it until I thought I had solved the problem. That's how the United Airlines flight information system got so good. Thanks, Mom!