Peter Leppik, CEO, Vocalabs
Q. Peter, Vocalabs doesn't actually build speech applications, so what part does it play in a speech deployment?
A. Our role is to gather data about the performance of customer service systems, whether they're speech-based, touch-tone, or staffed by live agents. We don't design or build speech systems, nor do we make vendor recommendations: we feel it is very important to remain as neutral as possible in order to give our clients an accurate picture of how their applications are performing.
We're typically brought into a speech project at four different points. First, when the project is initially being scoped out, we can measure the caller satisfaction and completion rates the existing agents or automated systems are providing. This provides a baseline for comparison and realistic goals, helping a company make a rational decision about whether speech makes sense.
Next, we are often brought in to gather data about a prototype VUI, either a rapid prototype or a Wizard of Oz (WOZ) prototype. We have a service called the One Hour Assessment which provides call recordings and surveys from a minimum of 30 different callers in (as the name implies) an hour. This allows VUI designers to get very rapid feedback about how the prototype is working, so they can test several different designs while still in the prototype stage.
Frequently we're brought in to benchmark the performance of a new speech system just before or as it is being deployed, as part of an acceptance test. These tests typically involve at least 500 callers in order to provide accurate statistics and to make sure that caller satisfaction, automation rates, completion rates, and other performance metrics are where they should be.
Finally, we perform post-deployment testing, often looking at the performance of the entire customer service operation, including both the automated and live-agent aspects. This allows our clients to avoid "application drift," in which an accumulation of small changes leads to big drops in performance, and also to understand where new improvements can be made.
Since we're the only company capable of measuring statistics like caller satisfaction, automation rates, completion rates, and other customer-focused performance metrics throughout the project lifecycle, our clients especially appreciate our ability to make meaningful comparisons between, for example, a prototype VUI and a live agent call center. We can tell the client (before most of the money has been spent) whether the VUI design will lead to better or worse caller satisfaction, and whether it will meet targets for things like automation rate and call completion.
Q. Please provide some examples of where you all have improved an application.
A. On average, when a client uses the data we gather to improve its speech application, we see a 9.5 percentage point improvement in call completion rates—for example, going from 80 percent of callers completing their tasks on the first call to nearly 90 percent of callers completing their tasks on the first call.
Caller satisfaction typically posts dramatic improvements, too, with the average system jumping two letter grades in our satisfaction benchmark, which runs from A to D.
These improvement statistics are based on direct before-and-after comparisons of the same system, and they translate into hard-dollar savings for the client. If your speech application is handling 100,000 calls a month, a 9.5 point improvement means 9,500 more callers every month completing their calls in the system, and that has a huge impact on the ROI for the entire system.
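To make that arithmetic concrete, here is a small illustrative calculation; the call volume and rates are the example figures from the interview, not real client data.

```python
# Illustrative sketch of the completion-rate arithmetic above.
# All numbers are the example figures from the text, not client data.

monthly_calls = 100_000      # assumed monthly call volume
completion_before = 0.80     # 80% of callers complete on the first call
improvement = 0.095          # the average 9.5-point gain cited above

completion_after = completion_before + improvement
extra_completions = round(monthly_calls * improvement)

print(f"Completion rate: {completion_before:.1%} -> {completion_after:.1%}")
print(f"Additional completed calls per month: {extra_completions:,}")
```

Running this prints a completion rate going from 80.0% to 89.5% and 9,500 additional completed calls per month, matching the figures in the example.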
Q. You're conducting a One Hour Assessment workshop at SpeechTEK West. Tell us about this workshop.
A. One problem in the technology industry in general is companies that overpromise and underdeliver. We're not like that: we've got the real deal, and when people see how we can generate lots of recordings and feedback from live callers in real time, other test methods suddenly look inadequate.
So, at SpeechTEK West we're going to be doing live demonstrations of our One Hour Assessment in our booth on the show floor. These will be real tests of real customer service operations from companies, with nothing hidden. People at the show will see exactly what a client would see.
Frankly, we enjoy showing off like this. We've been doing it for the past two years at the SpeechTEK Challenge, where we've tested each Challenge application with 500 different callers in under 48 hours. We're expecting to see some "Aha!" moments during the course of the show.
Q. How does the One Hour Assessment work? What are the steps?
A. The One Hour Assessment is designed to be a very simple way for our clients to get rapid feedback about a system. All the client does is tell us what information they want to learn from the callers, and then make the test system available for us to dial into. We have about 85,000 consumer panelists available, so when we open the phone lines for an hour, we can often get 50-100 calls and survey responses (we guarantee a minimum of 30 responses).
And that's all there is to it. After an hour, we close the phone lines, and all the survey data and call recordings are available to the client through a secure Web interface. We gather survey statistics, call statistics, and demographic data—we can even tell you if the same person called back more than once to figure out how to use the system. From those statistics, you can use our Interactive Drilldown reports to quickly isolate the problem calls and focus on those areas where callers had difficulty.
Q. Can you elaborate on the "Interactive Drilldown" format that Vocalabs uses for the assessment?
A. Interactive Drilldown is what we call our intuitive report format, which lets clients go from high-level statistics to individual call details and recordings in just a couple of clicks.
Think of it this way: if you're performing a study with 1,000 participants, that's a lot of data and a lot of call recordings: 50 hours of recordings if the calls average three minutes. If a call went well, then it probably won't be of much help when trying to find ways to improve the speech application. What you want to spend your time on are the calls which went badly. You want to focus on the 10 percent of the people who said they couldn't do the task at all, and not spend much time on the 50 percent who said everything went smoothly.
With Interactive Drilldown, a single click allows you to focus on just those 100 calls of particular interest. You can go straight to the most important data and start finding out how to improve the system immediately, and then go back to the rest of the data later.
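The drill-down idea described above can be sketched in a few lines of Python; the record fields and outcome labels here are hypothetical, not the actual Vocalabs report format.

```python
# Minimal sketch of drilling down from a full study to the problem calls.
# Field names and outcome labels are hypothetical, not Vocalabs' format.

calls = [
    {"id": 1, "outcome": "completed", "recording": "call_001.wav"},
    {"id": 2, "outcome": "failed",    "recording": "call_002.wav"},
    {"id": 3, "outcome": "completed", "recording": "call_003.wav"},
    {"id": 4, "outcome": "failed",    "recording": "call_004.wav"},
]

# One "click": keep only the calls where the caller could not finish.
problem_calls = [c for c in calls if c["outcome"] == "failed"]

for call in problem_calls:
    print(call["id"], call["recording"])  # review just these recordings
```

The point of the interface is the same as this filter: out of a large study, surface only the subset of calls worth a designer's attention.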
Q. How can customers benefit from this assessment? What are the takeaways?
A. The One Hour Assessment really turns usability testing for speech applications on its head. Usability testing used to be a major time and money sink: it could take weeks to schedule a usability test, and thousands of dollars to recruit and compensate participants—just for a dozen or so participants. This meant that a design might go through one or two rounds of testing at the most, and if more usability testing was needed, it would be a major setback for the project timeline and budget.
With the One Hour Assessment, usability testing becomes something you do as often as needed and whenever you need it. A VUI designer can literally set up a prototype, go to lunch, and have data from dozens of callers by the time she gets back, for as little as a few hundred dollars. It is even possible to do multiple tests of different prototypes in the same day.
All this results in a better speech application, completed in less time and at lower cost.
Q. Is there a significant benefit to doing the Two Hour Assessment versus the One Hour? If so, what is it?
A. The Two Hour Assessment, and its even bigger brother the Four Hour Assessment, are extended versions of the One Hour Assessment. The bigger the study, the more problems you uncover, and the more reliable the statistical results. Where a One Hour Assessment typically gets 50-100 responses, a Two Hour Assessment usually gets 100-200, which is starting to become statistically meaningful. A Four Hour Assessment can get as many as 400 responses, which is enough to reliably benchmark the system's performance.
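As a rough illustration of why larger samples are more reliable: the standard 95 percent margin of error for a surveyed proportion shrinks with the square root of the sample size. This is generic survey math, not a Vocalabs formula, and the sample sizes below are just the rough figures from the answer above.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion p from n responses."""
    return z * math.sqrt(p * (1 - p) / n)

# Sample sizes roughly matching the One/Two/Four Hour Assessments above.
for n in (50, 100, 200, 400):
    # p = 0.5 is the worst case (widest margin of error)
    print(f"n={n:3d}: margin of error about ±{margin_of_error(0.5, n):.1%}")
```

At 50 responses the margin of error is about ±14 points; at 400 it drops to about ±5 points, which is why the larger assessments give more dependable benchmarks.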
Q. You have a new book coming out called Gourmet Customer Service; tell us a little bit about it.
A. This book is about using a scientific, methodical approach to improving customer service, whether automated or live-agent. In it, we lay out a five-step process that uses hard data about customers' perceptions of the service experience to make improvements, then validates those changes and looks for new improvements.
So often, companies think that if they somehow manage to answer the customer's questions then they're doing a good job with customer service. We call this "Junk Food Customer Service" because—just like a bag of potato chips may relieve your immediate hunger pangs—Junk Food Customer Service may meet the caller's minimum needs without that interaction being a worthwhile experience.
But the fact is that Gourmet Customer Service—service which goes beyond the bare minimum to actually make the customer feel valued and special—is critical in today's competitive environment.
Consider this: in many industries, a phone call to customer service may be the primary (or even only) direct contact a customer has with the company. Those three to five minutes on the phone will indelibly shape the customer's perception of the company. Do customers hang up feeling like the company cares about them? Or do they hang up feeling frustrated, irritated, or fleeced?
Gourmet Customer Service lays out the big picture in a way that is easy to digest, even for someone not steeped in the lingo of the industry. It is intended for everyone from IVR programmers to marketing executives who need to understand why quality service is important and how to get there. We lay out the important ideas and major pitfalls, but the details will always depend on a particular company's goals and budget.
By the way, we're doing something a little different with this book: we've made all the chapters available for free on our Web site (http://www.vocalabs.com/GCS/). That's because we think our message is compelling enough that people who read what we're saying will want to click over to Amazon.com and buy the paper version.