Speech Technology Magazine

 

Baselining: Making the Case for New or Upgraded Speech Applications

By Nathan David - Posted Jan 6, 2005

New applications based on speech are redefining how enterprises deliver support to their employees and customers. Speech applications are moving quickly into the mainstream, with numerous applications identified for business process automation. With availability rising and costs declining, speech technology will have a major impact on business communications. 

Speech vendors are enabling enterprises to respond quickly to this demand by providing blueprints for managing the speech application lifecycle. As with any application lifecycle, organizations should continually re-evaluate their existing offerings. Companies investing in speech platforms must deliver substantial improvements over existing services, and genuinely delight callers, if they are to change current calling habits.

The Speech Application Lifecycle
Fig. 1: The speech application lifecycle

To effectively improve speech application acceptance, organizations must first baseline existing speech applications from two focused angles to diagnose current performance. By combining analysis of the voice user interface (VUI) design with quantification of the infrastructure responsible for delivering the application, organizations can target upgrades and enhancements with precision. Unfortunately, if you emphasize only one of these two areas, there is a good chance your delivered applications will be easy to use but unpredictable in behavior, or, conversely, always available but short of customer expectations for ease of use.

Instead of relying on opinions and gut feelings, baselining speech applications allows us to focus on data. Instead of starting with specification and design, we can start with measurements. What is working reliably today and what is not? How do call success rates vary by time of day, season, caller need, region, and load? Where are the bottlenecks? What is costing us the most right now?
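These questions can be answered directly from call records. Below is a minimal sketch that computes containment rates by hour of day; the log format of (timestamp, outcome) tuples is hypothetical, and "contained" stands in for a successfully self-served call.

```python
# Sketch of data-driven baselining: containment (success) rate by hour.
# The call-log format below is an invented example; adapt it to your IVR's
# actual call detail records.
from collections import defaultdict
from datetime import datetime

call_log = [
    # (call start time, outcome)
    ("2005-01-03 09:12", "contained"),
    ("2005-01-03 09:47", "transferred"),
    ("2005-01-03 14:05", "contained"),
    ("2005-01-03 14:31", "contained"),
    ("2005-01-03 14:58", "abandoned"),
]

totals = defaultdict(int)
successes = defaultdict(int)
for stamp, outcome in call_log:
    hour = datetime.strptime(stamp, "%Y-%m-%d %H:%M").hour
    totals[hour] += 1
    if outcome == "contained":
        successes[hour] += 1

for hour in sorted(totals):
    rate = successes[hour] / totals[hour]
    print(f"{hour:02d}:00  {rate:.0%} of {totals[hour]} calls contained")
```

The same grouping works for any of the dimensions above (season, region, caller need) once the log carries those fields.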

This kind of empirical approach is what we call data-driven usability.


Fig. 2: Design and delivery of speech applications are both critical to creating a great experience for callers.

Design
Though ostensibly designed to improve productivity and provide faster customer response, many IVRs feature poor user interfaces and complex menus that cause callers to zero out to customer service representatives for assistance.  Application development begins with an understanding of the targeted caller population and their expectations. Organizations need to stay closely involved during the design and definition stage to create a successful outcome for their application.
In his book "It's Better to Be a Good Machine ...", Bruce Balentine, EVP and Chief Scientist of Enterprise Integration Group, states that there are some straightforward concepts to consider when it comes to design:

  • Customer satisfaction
  • User friendliness
  • Keeping it natural
  • User experience

According to Balentine, one would think that "customer satisfaction" means exactly that — satisfying the calling need of the customer. For example, if a caller wants to learn whether a specific check has or has not cleared, then the best way to satisfy that customer is to deliver information about the check as quickly and clearly as possible. If, after discovering the answer, the same customer now wants to transfer money — implying that a composite goal initiated the call — then the best machine behavior is now to serve that newly identified goal.

Of course, the user interface challenge for an IVR is not to deliver the answer or perform the transaction—that’s easy. The challenge is identifying the user’s goal, which is to say, distinguishing it from the myriad other reasons that the customer might have called. The majority of user interaction is therefore about task identification, not about task completion.

Balentine goes on to say that this is indeed how customers themselves describe it: "I call for a reason, and the best thing you can do for me is to satisfy that reason. Fast." These callers define the word "satisfaction" in a practical and personally focused way.

But we can’t manage to keep it that simple. And the reason is that we are using human language to describe our own design goals—and human language is open to interpretation. The same recognition difficulties we see our customers experience in the IVR are difficulties that we ourselves encounter in our business meetings and telephone conferences. We all mean different things when we describe our goals.
So what happens as we interact with our colleagues is that each of us "colors in" the simple outlines of customer satisfaction with our own individually-preferred decoration. "Customer satisfaction" comes to mean many things:

  • User delight
  • Positive user impression
  • Emotional bonding with the IVR
  • By extension, emotional bonding with the enterprise (i.e. branding)

In other words, we spend more time looking for ways to make the user "like" our application than we do trying to ensure that the application really works well — across all users, despite background noise, in the face of anger or confusion, and regardless of occasional misunderstandings on the part of the caller or the IVR.
These are all good goals. But they are to a degree in conflict with our IVR utilization goals, which is to say our ROI goals, and we can’t be sure where the best tradeoffs are. Part of the conflict is understandable. After all, the reason we went to speech in the first place is because our customers don’t like touchtone, right? So user "liking" is an important offset to the backlash of the past. And yet, we also want our customers to succeed quickly, and "liking" alone won’t accomplish that.

Delivery
Companies that adopt a comprehensive speech strategy will significantly improve contact center performance by redirecting customer calls to self-service and decreasing call-handling time by customer service representatives. But what happens when the infrastructure is unable to support the business’ objectives?  The entire value proposition of speech, no matter how well designed, is diminished as customer behavior turns toward manual agent intervention due to inconsistent service delivery.  By focusing on the "delivery" component, organizations can provide customers with consistent application availability and performance, thus increasing adoption of speech applications.

When evaluating the success of IVR applications, organizations traditionally look at the percentage of callers that stay within the automated system (the containment rate). Developers, however, must have a complete understanding of what makes up this containment rate. As we have seen, there are VUI design issues to address, but there are also reasons outside the design that cause customers to abort automation. Baselining your existing applications allows you to surface some of these influencers.


Fig. 3: The bar chart on the left depicts a typical breakdown of calls into contained or not contained in the IVR. The bar chart on the right breaks down the detail, demonstrating that more than half of calls that were not contained went to agents due to a technology problem.
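The breakdown illustrated in Fig. 3 can be reproduced from transfer-reason codes in the call records. The sketch below uses invented counts that mirror the case described in the caption, where technology problems account for more than half of non-contained calls.

```python
# Hypothetical breakdown of non-contained calls by transfer reason.
# The reason labels and counts are illustrative, not measured data.
reasons = {
    "caller requested agent": 120,
    "VUI design (caller lost or confused)": 95,
    "technology failure (timeout, recognizer outage)": 240,
}

not_contained = sum(reasons.values())
for reason, count in sorted(reasons.items(), key=lambda kv: -kv[1]):
    print(f"{reason}: {count / not_contained:.0%} of non-contained calls")
```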

Callers are no longer surprised to have their call answered by a machine, but they will quickly grow discouraged if their words are not understood. Baselining your existing application's infrastructure allows you to focus efforts on providing a consistent customer experience. There are several key focus areas when addressing these issues:

  • Recognition baseline
  • Application baseline

Recognition baselining evaluates recognizer performance. A baseline begins with an understanding of the targeted caller population. Using automated baseline strategies, testing systems generate hundreds of utterances by talking to the application, just like real callers. Realistic customer stimulus attributes to consider include:

  • Male and female speakers (age, volume)
  • Different dialects (cadence, slang, garbage words such as "uh" and "umm")
  • Different noise conditions (car, conference room, airport, outside)
  • Different call quality conditions (cell phone, speaker phone)
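A baseline test plan typically crosses these attributes so that every combination is exercised. A sketch of enumerating that test matrix, using illustrative placeholder values for each attribute:

```python
# Enumerate a test-utterance matrix over the stimulus attributes listed above.
# The attribute values are invented placeholders; substitute your own
# caller-population profile.
from itertools import product

speakers = ["male", "female"]
dialects = ["midwest", "southern", "new-england"]
noise    = ["quiet", "car", "airport"]
channel  = ["landline", "cell", "speakerphone"]

# Each tuple is one test condition under which utterances are played.
test_matrix = list(product(speakers, dialects, noise, channel))
print(len(test_matrix), "stimulus combinations")  # 2 * 3 * 3 * 3 = 54
```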

Accuracy is measured by comparing the recognition results to a transcription of the utterances. Barge-in, speaker verification, subscriber profiles, and dynamic grammars should also be tested for accuracy with a variety of speakers and calling conditions.
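The comparison against reference transcriptions can be sketched as a simple utterance-level accuracy score; production baselines usually also compute word error rate and grammar-slot accuracy. The utterance pairs below are invented examples.

```python
# Score recognizer output against reference transcriptions.
# Utterance-level exact match only; the (reference, hypothesis) pairs are
# invented for illustration.
results = [
    ("transfer five hundred dollars", "transfer five hundred dollars"),
    ("check my balance",              "check my balance"),
    ("pay my bill",                   "play my bill"),  # misrecognition
]

correct = sum(1 for ref, hyp in results if ref == hyp)
accuracy = correct / len(results)
print(f"utterance accuracy: {accuracy:.0%}")
```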

Application baselining breaks into two focus areas: dialog traversal and system load. When traversing dialogs, your baseline should create and execute a series of test cases covering all possible paths through the dialog to verify that:

  • The right prompts are played;
  • Each state in the call flow is reached correctly; and
  • The universal, error, and help behaviors are operational.
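A minimal sketch of what such a traversal check looks like; the call flow below is an invented example, and a real baseline would be generated from the application's call-flow specification so that every path, not just one, is covered.

```python
# Dialog-traversal sketch: walk one scripted path through a call flow and
# collect the prompts played so they can be checked against expectations.
# The states and prompts are invented for illustration.
call_flow = {
    # state: (expected prompt, next state on this scripted path)
    "welcome":   ("Welcome to Acme Bank.", "main_menu"),
    "main_menu": ("Say balance, transfers, or agent.", "balance"),
    "balance":   ("Your balance is one hundred dollars.", None),
}

def traverse(start):
    """Follow the scripted path from `start`, returning prompts in order."""
    prompts, state = [], start
    while state is not None:
        prompt, state = call_flow[state]
        prompts.append(prompt)
    return prompts

played = traverse("welcome")
print(played)
```

A test case then simply asserts that the played prompts match the design document, that every state was reached, and (in separate scripted paths) that error and help behaviors fire.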

Keep in mind that applications perform differently under real production load than they do in the lab or under low call volumes.  Given the complexity and systemic dependencies within speech infrastructures, it is critical to inject load onto these systems to see how they will perform under real-world conditions.  System load simulates a high in-bound call volume to ensure that:

  • Expected caller capacity can be handled – know how many callers are leaving the IVR simply because the system cannot handle high call volumes;
  • Proper load balancing occurs across the system – overloading a single server while others sit idle produces negative customer experiences; and
  • Feature/function behavior is consistent – customers experience consistent service delivery without time-of-day quality deviations due to load or cell phone recognition failures.
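The behavior a load test is looking for can be illustrated with a toy model (invented numbers, not measured data) in which error rates hold steady until offered load exceeds the platform's rated capacity and then climb sharply:

```python
# Toy model of error rate vs. offered load. The rated capacity and failure
# slope are invented; a real load test measures these curves empirically by
# injecting simulated calls at increasing concurrency.
def error_rate(concurrent_calls, rated_capacity=96, base_rate=0.02):
    """Flat base error rate until capacity is exceeded, then linear growth."""
    overload = max(0, concurrent_calls - rated_capacity)
    return min(1.0, base_rate + 0.01 * overload)

for load in (24, 48, 96, 120, 144):
    print(f"{load:3d} concurrent calls -> {error_rate(load):.0%} errors")
```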


Fig. 4: This chart, from an actual IVR testing engagement, shows that error rates tend to rise as call load rises. Many systems perform well under light call loads, but fail miserably under heavy load. A pre-deployment load test can shake out potential issues.

Conclusion
The value of an integrated baseline approach, in which logging, monitoring, and testing are a comprehensive part of the design effort and design is incorporated directly into the testing of delivery, both from the very beginning, cannot be overstated. Data-driven usability will not only help you answer the question, "Is it time to upgrade your speech application?" It will also guide you through the entire lifecycle of your automated system as you upgrade, tune, change, adapt, and manage your IVR in the future.
