Speech in the Call Center

Over the past several years, speech technology has evolved to become an integral part of companies’ customer service operations. The airline and stock brokerage industries led the way in adopting speech in their call center environments. Other vertical markets, including health care, utilities, government and financial services, are now following the early adopters. The need to reduce operating expenses in today’s economy is hastening this trend. The reason for speech technology’s continued emergence is clear: speech helps enterprises to save money, improve customer service and increase revenues. In some ways, call center speech applications are like symphony orchestras; there are a lot of moving parts, or “sections”, that must come together simultaneously to achieve harmony and great results. If one section or element is not functioning effectively and in sync with the others, it can negatively impact the entire caller experience. The following highlights some of the key elements of a good call center speech application and underscores the contribution that each element makes to the overall caller experience. It also highlights the importance of synchronizing these elements to ensure a positive caller experience.

Need a High-Quality Speech Engine as a Starting Point
Several years ago, the key question call center managers asked about speech recognition was “does speech software really work?” At that time, the answer was a qualified “yes”; industry players and early adopters knew that while the technology could perform basic recognition tasks well, further advances were required for speech recognition to be highly effective and proliferate into the mainstream customer service environment. Those advances have been made and continue to be made. Today, the latest speech recognition software from the major vendors works very well in call centers and other operating environments. The most recent releases have enhancements that can meaningfully improve the caller experience. Prospective buyers or renters of the technology who haven’t investigated these products should do so. However, just as the brass or percussion section alone doesn’t make a complete orchestra, speech recognition software by itself does not make a good call center application. There are other important components that must come together successfully to achieve great performance. These components include dialog design, host interface connectivity, computer telephony integration, application personalization and robust, comprehensive testing of your application at various stages.

A Great Dialog Design Is A Key Success Factor For Your Call Center Speech Application
Dialog design, or the scripting of the system-caller interaction, highly influences the caller experience. You can have the highest quality speech software in the world, but if your call center speech application interface is not intuitive to callers, you’ll have recognition problems – not because of the speech engine, but because of poor dialog or prompt design. If you’re considering speech for your call center, make sure you ask your prospective vendor/outsourcer about their dialog design expertise, e.g. who are their “maestros” and what are their qualifications? Ask them to show you a sample dialog specification document from a live, production application. A dialog specification outlines the call flow, prompts, error messaging and related information. One additional important point: Although your company may have DTMF (touch-tone) system scripting experts in-house, past DTMF design experience does not necessarily qualify these individuals to be good speech recognition dialog designers. In fact, DTMF experience can be more of a hindrance than anything else. Be sure to get the expert advice you need in this area; otherwise your speech application could end up being no better than some of the bad touch-tone systems that are out there today.

Wizard of Oz (WOZ) Testing, Usability Testing And Pilot Testing Are Also Vital Elements Of A Good Call Center Speech Application
Before orchestras perform for live audiences, they rehearse to ensure that everything is running in a smooth, synchronized manner. The equivalent to the orchestral rehearsal in the speech recognition world is the combination of WOZ testing, usability testing, pilot testing and tuning. Without carrying out these important tasks, your “performance” is doomed to failure. We noted above the importance of good dialog design, but the fact is, no matter how good you believe your dialog design may be, you must test your assumptions with real callers before you roll out your application in a production environment. That’s where testing and tuning come into play. These are good ways to catch (and fix!) potential design and recognition problems in the application before going on center stage for the “live performance”. Take some time to “rehearse” and “tune your sound” - you can’t go wrong! If you need help or advice on how to carry out the above-mentioned tests, ask your prospective vendor or outsourcer for assistance.

Computer-Telephony Integration (CTI) Can Help Make Your Speech Application More Caller-Centric
How many times have you called your credit card company, punched your 16-digit number into their touch-tone system, and then waited on hold for 10 minutes only to be asked by the agent “can I have your 16-digit credit card number please?” This isn’t a consequence of your having used touch-tone, but rather an indication that the company you’re calling has not integrated the “voice” and “data” components of their customer-serving applications. Most call center speech applications (like the touch-tone example mentioned above) require operator backup for numerous reasons.
As you contemplate implementing speech in your call center, look for an opportunity to use CTI to “close the integration gap” mentioned above, so that operators can have access to the information your callers speak into the speech system - and don’t have to repeat the same questions when initiating their portion of the call. In addition to providing a fantastic caller experience, the speech-CTI combination helps to shorten the operator-handled portion of the call, which in turn saves you money. One final note on the topic of connectivity to the call center: In addition to CTI, another way to facilitate a smooth transfer of a caller from a speech recognition application to an operator is to minimize the caller’s “connect to operator” time. There are a number of technology vendors who provide sophisticated, network-based call routing functionality for call centers. Check with your internal IT group or ask your speech vendor/outsourcer for more information. Enhanced call routing functionality can positively impact the performance of your call center speech recognition application, and the way your callers perceive their self-service experience.

Host Interface Connectivity is Required to be Able to Perform Meaningful Transactions Using Automation
Without host connectivity, your speech application will be of limited value to your callers and your company because you won’t be able to offer callers automated access to real-time, accurate information (e.g. account balances) or the ability to transact business (e.g. buy airline tickets) using self-service. As you proceed with your call center speech project, be sure to seek some expert advice and assistance on host interface and CTI-related matters; they’re critical components of an effective, caller-friendly, self-service speech recognition system.

Personalization of Your Speech Application Can Also Enhance the Caller’s Experience
Just as orchestras sometimes tailor their repertoire for specific audiences, a good call center speech application should be tailored to address specific subsets of callers. The more the experience can be customized, the more positively it is likely to be perceived by the audience. You can personalize your speech application for callers by using data attached to their Automatic Number Identification (ANI) and/or their specific account information (member number, etc.) as the “key” from which to tailor the remainder of the call. (ANI is the technology that tells you which telephone number is calling and so, in most cases, who specifically is calling you, e.g. your spouse, a particular friend, etc.) For example, if a hotel chain can detect when their frequent guest number A58W (Melinda Lopez) is calling their reservations line, they can use the customer information attached to her ANI to make proactive suggestions, e.g. “Do you want your King-sized bed on the 18th floor again, Ms. Lopez?” Not only does this make for a better caller experience, but it’s also much faster than going through the question and answer session that would normally be required to collect her accommodation request information. Eliminating the need to ask callers a series of “identifying” questions can reduce call length by 30 to 40 percent in a typical speech self-service session and help save you some money as well!
Your system won’t always be able to detect the incoming ANI, and even if it can, there is no guarantee that the person calling from that ANI is the person you think it is. However, the point here is that in over 60 percent of cases, ANI will correctly identify a specific caller, enabling you to offer them a customized speech session, which will make for an improved caller experience and, in turn, save you money.
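The ANI-keyed personalization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `CallerProfile` type, the in-memory `PROFILES_BY_ANI` table, the sample telephone number and both prompt strings other than the hotel greeting are hypothetical, and a production system would look the caller up through its host interface rather than a dictionary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CallerProfile:
    """Hypothetical customer record attached to a calling number."""
    name: str
    guest_number: str
    last_request: str


# Hypothetical data; a real application would query the host system.
PROFILES_BY_ANI = {
    "4025550123": CallerProfile("Ms. Lopez", "A58W",
                                "King-sized bed on the 18th floor"),
}


def normalize_ani(ani: Optional[str]) -> Optional[str]:
    """Keep digits only; return None when no ANI was delivered."""
    if not ani:
        return None
    digits = "".join(ch for ch in ani if ch.isdigit())
    return digits or None


def opening_prompt(ani: Optional[str]) -> str:
    """Choose a personalized prompt when the ANI matches a known caller."""
    key = normalize_ani(ani)
    profile = PROFILES_BY_ANI.get(key) if key else None
    if profile is None:
        # No ANI, or no customer data linked to it: ask the caller
        # to identify themselves before proceeding.
        return "Please say your frequent guest number."
    return f"Do you want your {profile.last_request} again, {profile.name}?"


print(opening_prompt("(402) 555-0123"))
print(opening_prompt(None))
```

Because a matching ANI does not guarantee who is actually on the line, the personalized prompt is phrased as a question the caller can decline rather than as an assumption.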
If you cannot collect ANI, or there is no customer data linked to the ANI, the next best thing to do is to have the caller speak their self-identifying information (account number, frequent guest number, etc.) before proceeding. By collecting this information up-front, you still may be able to avoid asking extensive identifying questions and shorten the call time for callers. In addition to host interface connectivity, CTI, ANI use and other techniques and factors mentioned above, here are some additional things you can do to ensure you have a caller-centric speech recognition application in your call center:

Offer Confirmation Numbers After Callers Have Performed Self-Service Speech Transactions
Unless you offer transaction confirmation numbers, you will find that at least 50 percent of the customers who have performed self-service transactions in the speech system will call back to speak to a live agent to ensure that their transaction was completed (e.g. that their money was received, etc.). Callers are accustomed to receiving confirmation numbers from call center operators, so be sure that your speech recognition system offers the same feature!
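As a rough illustration of this feature, the sketch below generates a confirmation number and formats it so a prompt can read it back one digit at a time. The function names, the eight-digit length and the digits-only format are assumptions made for the example, not requirements of any particular speech platform.

```python
import secrets

DIGITS = "0123456789"


def new_confirmation_number(length: int = 8) -> str:
    """Return a random, digits-only confirmation number (assumed format)."""
    return "".join(secrets.choice(DIGITS) for _ in range(length))


def spoken_form(confirmation: str) -> str:
    """Separate the characters so a prompt reads them out one by one."""
    return " ".join(confirmation)


number = new_confirmation_number()
print(f"Your confirmation number is {spoken_form(number)}.")
```

In practice the number would also be stored with the transaction record, so that an operator can look it up if the caller does phone back.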
Make Sure Your Operators Know About The Speech Recognition System
This way, they will be able to offer intelligent, accurate responses to callers who may ask about a particular transaction they just did in the speech system or who may have more general questions about the system. The more your operators know about the speech recognition system, the more streamlined the entire transaction will be for callers.
Carefully Plan, Proactively Manage And Continually Improve The Speech System
Make sure to do detailed capacity planning and provide for a fully redundant speech application so that callers won’t be inconvenienced by busy signals, ring-no-answer scenarios and messages that your speech system is “unavailable”.
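One common starting point for this kind of capacity planning is the Erlang B formula, which estimates the share of calls that would meet a busy signal for a given number of ports and a given offered load. The article does not prescribe this method; the sketch below is an assumption-laden illustration, and the traffic figures in the usage lines are invented.

```python
def offered_load(calls_per_hour: float, avg_seconds_per_call: float) -> float:
    """Offered traffic in erlangs: arrival rate times average holding time."""
    return calls_per_hour * avg_seconds_per_call / 3600.0


def erlang_b(ports: int, erlangs: float) -> float:
    """Blocking probability for `ports` lines, using the standard recursion."""
    blocking = 1.0  # with zero ports, every call is blocked
    for n in range(1, ports + 1):
        blocking = erlangs * blocking / (n + erlangs * blocking)
    return blocking


def ports_needed(erlangs: float, max_blocking: float) -> int:
    """Smallest port count whose blocking probability meets the target."""
    if max_blocking <= 0:
        raise ValueError("max_blocking must be greater than zero")
    ports = 0
    while erlang_b(ports, erlangs) > max_blocking:
        ports += 1
    return ports


# Invented example: 600 calls in the busy hour, 3 minutes each in the system.
load = offered_load(600, 180)
print(load, ports_needed(load, 0.01))
```

The result covers a single site at the stated busy-hour load only. A fully redundant deployment would need that capacity to remain available after the loss of any one system or site.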

Also, don’t rely on customers to report problems with your speech recognition system – manage quality proactively by making regular test calls (using both automated and manual systems) to ensure the speech recognition application is functioning properly. It’s also a good idea to survey callers who have used the speech system and make improvements based on their feedback (and on information acquired via meetings with the call center agents, etc.). This way, the system will remain functional and fresh – and self-service usage will continue to grow.

Designing and developing high-quality call center speech recognition applications that callers will love and use frequently requires strong focus on the caller experience. Some key elements that impact a caller’s experience and perceptions include the quality of speech software, dialog design and call flow, and the amount and type of testing and tuning done on the application before it’s rolled out in production. Additionally, there are some critical, non-speech related factors that can impact the caller’s experience, including your use of CTI (so callers don’t have to repeat themselves to operators), host interface connectivity (so callers can retrieve information and perform transactions in real time), use of ANI and other caller-identifying information (to facilitate a more concise, personalized call) and the use of confirmation numbers following speech transactions. Outstanding call center speech applications integrate all of these elements and more to provide callers with a well-orchestrated audio experience.

Bruce Pollock is a speech recognition consultant with West Corporation. He can be reached at brpollock@west.com.
