The Xs and Os of the Speech Game

Football, more than any other sport, isn’t a game of individual heroics, but of overall scheme. While television broadcasters tend to focus on a quarterback’s late-game leadership or the gutsy pounding of a star running back, football is really a team effort focused on executing the coach’s game plan.

Similarly, responsibility for implementing a successful speech application—be it biometric, analytic, or a standard IVR—doesn’t just rest on the designers who created the application. How the designers, stakeholders, and systems integrators work together is incredibly influential in determining the success or failure of a speech deployment. And while deploying customer self-service for an airline differs greatly from building a biometrics solution for a bank, there are a few overarching guidelines to which it is important to adhere.

Opening Kickoff
Just as a football squad needs an offense, defense, and special teams units to win a game, numerous components are necessary to execute a successful speech solution. Not only does the speech vendor have its own group of experts to work on the app, there also might be an enterprise customer team, a telephony partner team, and an integrator or implementation team. Often, multiple companies are involved—sometimes as few as two, though occasionally as many as four or five. "I’ve worked on projects where I was the designer, but there was a designer or a customer-care person focused on design on the enterprise-customer side," says Deanne Harper, manager of Speech University at Nuance Communications.

It’s particularly important that all the staff members critical to a speech project are present at the kick-off meeting. For Harper, this means any key decision-makers, stakeholders, marketing representatives, call center managers, systems administrators, and application designers.

And of course, it’s crucial to include people well-versed in speech. "A classic mistake that people continue to make is that in order to save money, they don’t use speech experts," says Donna Fluss, principal at DMG Consulting. "You want someone who’s a speech scientist, someone who knows the vertical, and has a proven track record of success. The folks who do this well really know what’s going to work and what’s not going to work."

This initial meeting is the start of the requirements gathering and marks an opportunity to educate individuals not familiar with the idiosyncrasies of speech-related tasks. The project time frame should be outlined, project expectations clarified, necessary benchmarks and metrics agreed upon, and accommodations for usability testing and tuning ironed out. "A lot of this stuff is complex," Harper says. "It’s not necessarily difficult, but it can become difficult without good planning."

Team Drills
It’s particularly important that the individuals on the enterprise side undergo training to work harmoniously with the vendor’s designers and developers. But because vendors are hired specifically to provide services, why must the customer force what seems like unnecessary education on its own staff, especially given a deadline crunch and dwindling budget?

"You don’t go to a three-day training class and be as good as Nuance or one of the vendors of platform services," Harper says. "But you can certainly talk to them as much or more efficiently because of that three-day class. You can engage in a conversation that lets those things happen." And once the enterprise takes over the application, even a little speech training will provide a better foundation for understanding how the system works and should be maintained.

In fact, at Nuance’s Conversations 2007 conference in Boca Raton, Fla., Glen Graham, senior vice president of IVR platform strategy at Bank of America, emphasized the importance of well-trained technicians on the enterprise-customer side. "Increased speech technology usage and application development requires additional well-trained staff," he said.

"Of course, my bias is training," Harper concedes. "It’s not because I was hired to do that; it’s because I genuinely know and believe it’s critical."

Scout in Advance
Bell Canada faced a potential public relations crisis when it implemented PerSay’s VocalPassword technology for its Voice Identification Service. The new security system did not rely on the passwords and PINs that were familiar to Bell customers, so the primary concern was customer acceptance.

"We did some customer research to make sure what we were doing fit within our customers’ perceptions of Bell," says Charles Giordano, Bell’s marketing lead. To access their financial information, users had to speak a prompt tied to their voiceprint. Suggestions were made for tweaks and for the proper scripting to use in the prompts. Gone, for instance, was the obtrusive prompt "Bell is my telecommunications company." The company swiftly changed it to "At Bell, my voice is my password."

"If you want to implement quickly, be ready to be nimble. Make sure the players are at the table, ready to address the key issues so the project can proceed forward," Giordano says. In Bell’s case, those issues were privacy, security, scripting, training, technology, and the way the company’s existing technology would coexist with the new.

Bell’s swift deployment underscores the importance of doing extensive research before deploying a speech solution; doing so will benefit both the vendors that build the apps and the enterprises that need them installed.

A speech analytics solution provides enterprises with a wealth of information, but what good is that information if it’s unclear what is valuable and what is expendable? Enterprises should understand the various contexts available for keywords and verify application compatibility to determine whether the solution will function in the current and future technological landscapes.

"One of the things we really talk about is requirements gathering and thinking about them in a number of categories—caller requirements, because we want caller-centered design; system requirements; business customer requirements; our clients’ requirements," Harper says. "The whole requirements-gathering, documentation sign-up phase is certainly critical."

She recalls how a colleague once spoke to a customer ready to deploy a new version of a speech application, only to realize that some recording requirements had been neglected. Harper was surprised. They had discussed every stage of the application in great detail. How could they miss something like this?

"You want to know the whole life cycle so that at every stage, you’re anticipating not only what you need to do but how it might impact something else," she says. "So again, this ended up being a bit of a speed bump just as we’re going to deployment. Suddenly someone on the customer side created a much more detailed list than we had ever gone through."

Call Smart Plays
When a traveler calls flight information at United Airlines, the first thing the system says is, "Tell me the flight number or say ‘I don’t know.’" By contrast, Continental Airlines asks, "Do you know the flight number?" Both instances require a flight number, but United’s IVR initially assumes the caller knows it. The reason for this difference is that business customers, who tend to know their flight numbers, typically fly United. Continental, on the other hand, flies a larger percentage of casual travelers who might not know to keep their numbers readily available.

Thus, research into an enterprise’s customer base allows designers to both understand and justify their choices. If changes are made in a speech application, it’s important to know the ramifications.

Even that can be a problem, though. Depending on the size of a project, there can be anywhere from three to 15 people on a team, not to mention multiple teams, so procedural conflicts are inevitable. Melanie Polkosky, a clinical speech-language pathologist and VUI design consultant, recalls a particular dust-up when she designed an application for an electronics retailer. She wanted to resequence the prompts, but the customer commissioning the project balked. So she put her design through usability testing.

"One of the advantages of being a psychologist is we’re very good at getting data to support our assertions," she says. "And I ended up very strongly proving my case and the customer acquiesced. It was awesome. When we implemented, it was a night-and-day difference in how much better the stuff in the new application was."

Calling an Audible
With IVRs, a bad voice on an otherwise well-designed application can undermine the entire system. In a recent white paper, Harper emphasized that amateur voice talents, or talents familiar only with touchtone systems, might not be acceptable for speech applications. "The progression from one prompt to the next [should produce] a consistent and smooth effect," she wrote.

Accents can also be distracting. Experts agree that the best voice has a neutral, newscaster accent.

And while different enterprises may be better served with a male or female voice, Fluss emphasizes consistency. Accents and voices shouldn’t change in the middle of the system because it can confuse callers. "You have to do what’s right for your entire customer base," she says. "And it gets more complex when you try to do something international. That’s a whole other article."

Get Benched
Benchmarks vary from project to project, so it’s important to establish early, ideally as early as the kick-off meeting, what will and what will not be measured.

"I’ve had these conversations constantly with our customers," Harper says. "Figure out what you want to measure early. You’ll recognize additional measurements later that you hadn’t fully anticipated, but everybody does as much as they can to anticipate metrics."

Even that anticipation can be tricky. It seems logical that a common IVR might have a standard metric for correct recognition. Yet such scenarios aren’t always that simple. For instance, if the IVR is a survey application asking callers yes or no questions, or if it’s an application that collects telephone numbers or ZIP codes, then a recognition success rate in the mid- to high 90s makes sense.

"But if it’s address recognition or email recognition, I can’t possibly say what that number should be," says Frances McTernan, multimedia applications speech solutions leader at Nortel. "That number is a red herring; it’s misleading. The environment is so variable and there are many things that go into it."

Like Harper, McTernan emphasizes the importance of establishing expectations up front. If an email recognition application is requested, the client will have to accept a recognition rate as low as 40 percent to 50 percent—substantially lower than the 90 percent expected from a yes-no grammar.
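The point about grammar-specific expectations can be made concrete with a minimal sketch. It assumes a hypothetical log of (grammar, recognized correctly) pairs and a hypothetical table of targets agreed at kickoff; the grammar names, target values, and sample data are illustrative only and simply echo the ranges quoted above.

```python
from collections import defaultdict

# Hypothetical targets agreed at the kick-off meeting. The values echo the
# ranges quoted in the article and are not a standard.
TARGETS = {"yes_no": 0.90, "email": 0.40}


def recognition_rates(log):
    """Return the fraction of correctly recognized utterances per grammar."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for grammar, was_correct in log:
        totals[grammar] += 1
        if was_correct:
            correct[grammar] += 1
    return {grammar: correct[grammar] / totals[grammar] for grammar in totals}


def meets_targets(rates, targets=TARGETS):
    """For each grammar that has an agreed target, report whether it was met."""
    return {g: rates[g] >= targets[g] for g in rates if g in targets}


# Made-up sample data: 20 utterances per grammar.
sample_log = (
    [("yes_no", True)] * 19 + [("yes_no", False)] * 1
    + [("email", True)] * 9 + [("email", False)] * 11
)

rates = recognition_rates(sample_log)
print(rates)
print(meets_targets(rates))
```

Keeping a separate target per grammar avoids judging an email grammar against a yes-no benchmark, which is the "red herring" McTernan warns about.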

But it’s important not to equate a superb technical performance with customer or caller satisfaction. "Even if we can make everything perfect," McTernan says, "it still doesn’t mean the user understands it. We might be using terms the caller doesn’t understand and therefore we have low customer satisfaction."

"We have a very rigorous way of measuring because our success is closely tied to the success of call flows," says Phillip Hunter, vice president of the Voice Interaction Design Group at SpeechCycle. "We look at hard data that says how many people got into the application and how many people acknowledged to us that their problem was solved." It’s a particularly accurate measurement in that, instead of generically questioning whether or not the caller emerged from the application satisfied, it considers exactly what task the application was supposed to accomplish and how thoroughly that goal was met.

Document the Game Plan
The inherent complexities of a speech project mean that, despite scrupulous planning, some problems go unanticipated and force modifications and redirections. Of course, stakeholders from the customer enterprise typically have more lucrative things to do than monitor every turn of fortune. Yet when the stakeholders return at the end of the project, they’re occasionally surprised and disappointed because the final product doesn’t represent what was discussed at kickoff.

To avoid any such surprises, project managers should fully document discussions and changes; they also need to get the go-ahead from the higher-ups. "The stakeholders don’t necessarily have to be at the weekly status call, but whenever anything changes and occurs, there needs to be a sign-off and it should be included," Harper says.

Additionally, keeping an accurate log of team activity will allow managers to stay on top of every aspect of an application, including what’s been updated and changed.

Run Practice Drills (for Usability)
Given time and budget constraints, testing and tuning are often first under the guillotine. This is painfully ironic because usability testing is perhaps the most important phase in developing and deploying a speech solution. "It’s something I think is so critically important," Polkosky says. "Even people with a really deep knowledge base can’t anticipate in every circumstance how somebody will react to a certain application."

Though testing is typically farmed out to for-hire companies like SpeechUsability or VocaLabs, it’s still important to test an application as frequently as possible, particularly after a design alteration. It’s also imperative to get a sampling of real callers to test the app in a live environment, as doing so often reveals problems that hadn’t surfaced in simulated scenarios.

It’s beneficial to conduct an early-design test, when the big ideas are already laid out but before any large specifications have been committed. "Once you have a finished design and a built application, do another usability test to make sure you didn’t get off-track somewhere and correlate that to the results of your earlier test," SpeechCycle’s Hunter advises.

Ultimately, a good speech solution should be tested with an array of people who represent the enterprise’s entire customer base. "You’re not only testing your technology, but you’re also testing your script," says DMG Consulting’s Fluss. It’s easy to run a checklist of features that the system should have, "but when you’re talking about speech, you want to make sure it’s tested by a set of people that well represents your customer base."

And More Drills (for Integration)
But even after the application has been developed, several problems might arise with deployment. For instance, if information needs to circulate across multiple channels, are the enterprise customer’s databases in sync? This was one issue that Giordano faced when Bell implemented VocalPassword. "We had issues regarding the history of Bell having multiple businesses," he says. "And in order for one customer to identify himself across the business units, the databases have to read to each other."

This can be difficult considering all of the engines and platforms that run behind a speech solution. When a customer calls into an IVR, a telephony platform answers and activates an application server based on either the number dialed or the initial question that routes the caller. A client listens to the call to support barge-in: if a caller interrupts a prompt, the prompt stops and the caller’s utterance goes immediately to the recognizer. Additionally, there might be a text-to-speech engine or a verification engine.
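The call flow described above can be pictured with a toy simulation. This is only a sketch under stated assumptions: the routing table, phone numbers, application names, and both functions are hypothetical, and a real platform handles audio streams rather than lists of words.

```python
# Hypothetical routing table: dialed number -> application to start.
ROUTES = {"8005550100": "flight_info", "8005550101": "store_locator"}


def route_call(dialed_number, routes=ROUTES, default="main_menu"):
    """Pick the application to activate based on the number dialed."""
    return routes.get(dialed_number, default)


def play_prompt(prompt, utterance, interrupt_after_words=None):
    """Simulate a prompt that stops as soon as the caller barges in.

    interrupt_after_words is how many words play before the caller speaks;
    None means the caller waits for the whole prompt.
    """
    if interrupt_after_words is not None and interrupt_after_words < 0:
        raise ValueError("interrupt_after_words must be non-negative")
    words = prompt.split()
    barged_in = (
        interrupt_after_words is not None and interrupt_after_words < len(words)
    )
    played = words[:interrupt_after_words] if barged_in else words
    # Either way, the caller's utterance is what reaches the recognizer.
    return {
        "played": " ".join(played),
        "barged_in": barged_in,
        "to_recognizer": utterance,
    }


app = route_call("8005550100")
result = play_prompt(
    "Tell me the flight number or say I don't know",
    utterance="flight 212",
    interrupt_after_words=5,
)
print(app, result)
```

Even in this toy form, the routing step and the barge-in step are separate pieces that must agree with each other, which is why integration testing covers them together.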

"Integration testing is so critical," Nortel’s McTernan says. "That’s the point when everything has to come together. And it’s the customer’s joint responsibility to do their level of testing, just as we’ve gone through and done the integration testing, to make sure those pieces are ready. It’s like putting a puzzle together. In the beginning, it’s easier, but at the end, it’s tougher getting those last pieces to fit in."

Some components are obvious, such as making sure the back-end system—a database that validates the caller as a customer—works. "What we’re also prepared for," McTernan says, "are those special cases that are out of the norm. All of life is this 20-80 rule. Eighty percent of everything will work fine, but then you’ve got this 20 percent of conditions or circumstances that will put you out of your 80 percent norm."

McTernan describes a store locator application she designed that looked good after usability testing. But she noticed a tiny percentage of interactions had long, awkward pauses. Ultimately, she traced the pauses to a handful of stores in Cincinnati that had provided incomplete data, mostly phone numbers and ZIP codes. "It’s not that the information wasn’t in the database," she says. "It’s that the information was incomplete. No data is one of those obvious paths. Any developer who’s done any kind of code will always test whether it works or not. That’s obvious. This was more subtle than that."
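A check for this kind of subtle gap might look like the sketch below. It is not McTernan’s method; it assumes hypothetical store records held as dictionaries, with hypothetical field names, and flags records that exist but lack a usable phone number or ZIP code.

```python
# Hypothetical fields the store locator needs in order to read a result aloud.
REQUIRED_FIELDS = ("phone", "zip_code")


def incomplete_records(stores, required=REQUIRED_FIELDS):
    """Map each store ID to the required fields that are missing or blank."""
    problems = {}
    for store in stores:
        missing = [
            field for field in required
            if not str(store.get(field) or "").strip()
        ]
        if missing:
            problems[store["store_id"]] = missing
    return problems


# Made-up sample data: every store is present, but two are incomplete.
stores = [
    {"store_id": "A1", "phone": "555-0100", "zip_code": "45202"},
    {"store_id": "B2", "phone": "", "zip_code": "45203"},
    {"store_id": "C3", "phone": "555-0102"},
]

print(incomplete_records(stores))
```

Running a check like this against the production database before deployment covers the case where a record is found but cannot be spoken in full, which a simple "no data" test would miss.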

Train the Agents
Speech applications don’t work in a vacuum. They’re often bolstered by human agents, yet it’s surprising how little agents sometimes know about an application. "If the caller gets the agent and is upset about something that happened in the application," Harper says, "it’s so valuable for the agent to understand what’s happening in the application, to be able to explain it to the caller, and to pass that information back to the people running that application."

In London, local government authorities are installing Digilog’s Advanced Validations Solution, a voice risk analysis (VRA) application that detects when individuals are lying about insurance claims. The system, however, is supplemented by a trained human agent. "Only when we have a correlation between what the technology analyzed and what the operator, through his own skills, came to a conclusion with, do we [decide whether to award the insurance claim or conduct further investigation]," says Lior Koskas, business development director at Digilog.

One in the Win Column
Customer service as a whole is incredibly complex, and a speech application, though it’s only a facet of the customer service gestalt, is also incredibly complex. Yet because a speech application is so often the voice of an institution, it behooves enterprises to devote the time and funds needed for preparation.

"The real reason for using these applications is not to improve the quality of service," Fluss says. "It’s to reduce costs. But the best way to realize the benefits is to develop them from the customer’s perspective. We should give the customer what they want to do, not what we want to do."
