The Business Case for Speech in the Call Center

Speech recognition technology is "ready for prime time" and is about to move beyond the early adopter stage into many mainstream customer service organizations. Speech can be a major cost saver for inbound call centers in a variety of applications, according to several recent studies.

"Unless an organization has very serious security issues, directed speech recognition can be used very cost effectively in customer service applications," said Donna Fluss, research director at Gartner Group, a market research firm based in Stamford, CT. Examples of effective speech recognition applications include package tracking, stock quotes and trading, insurance claims and travel booking. Research at Gartner has indicated that by the year 2003, 30% of new customer service voice response ports purchased will use speech recognition.

"There is room for improvement with the technology," said Ms. Fluss, "but the accuracy rates have dramatically improved in the last two years to a point where they are well over 90%, and that is good enough for most applications."

Ultimately, speech recognition is not just IVR with a voice, Fluss said. Speech's real value is in allowing responses that are too cumbersome for IVR systems. For example, IVR systems are often ineffective in handling inquiries that require both alphabetic and numeric data because it is too difficult to differentiate between letters and numbers on a keypad. Users also find it tedious to enter a great deal of input on touch-tone IVR systems. ("Press your 16-digit credit card number now.")

With well-engineered scripts, speech recognition technology can produce a return on investment within nine to 18 months in organizations with 50 or more customer service representatives, according to Fluss. The key to success with speech in the call center is the scripting of the dialogues. "When IVR scripts fail, and they still sometimes do," said Fluss, "it is usually because of the script. I expect that will be happening with speech recognition as it is implemented."

Cost Savings
No matter how difficult it is to write the scripts, the cost savings are such that call centers almost have to adopt speech recognition. Call center managers, like most managers, are always searching for ways to do more with less. If a technology allows the call center to handle the same number of calls with fewer agents, that technology will eventually be implemented. A recent study by Nuance Communications indicates that, by offloading calls from customer service agents, speech recognition can save more than 90% of the cost of a call. A comparison of the costs of customer service agents and automated systems shows how.
Key Comparisons between Agents and IVR

                    Live Agent    Speech-enabled IVR Port
Calls per Year
Cost per Year
Cost per Call

The Nuance study showed the time to recoup the cost of a complete speech system can be as low as two months.

Payback Analysis for a 72-port Speech-enabled IVR System
Savings per Call
Savings per Day
Payback Period (in days)

Cost of Agents
Speech recognition systems produce the greatest savings by automating self-service applications that could not be automated before, or that had a low success rate. Flight information, stock purchases and other complex transactions are examples of interactions that are extremely difficult to automate using touch-tone phones. Multiple layers of menus, awkward methods for spelling out words, and the near impossibility of correcting mistakes drive callers to avoid touch-tone systems and demand access to customer service agents whenever possible. A study by VCS last year found that almost 40% of all callers "zero out" (dial zero to talk to a representative) even when they know the extension of the party they are calling. This becomes expensive quickly. Labor, equipment, supervision, recruitment and training all add up to make the annual cost of an agent $30,000 or more, even if the base pay rate for that employee is only $18,000 per year.
Estimated Customer Service Representative Costs
Annual Salary
Payroll Taxes, Benefits (at 20%)
Facilities, Computer Equipment, Overhead
Supervision, Quality Control
Recruiting and Training (Annual and New Hire)
Total Annual Cost per Agent

In fact, many call centers face significantly higher costs than this, with high turnover and associated hiring costs, increasing salaries, overhead, training, and quality assurance. At the same time, agent productivity can only be so high. Due to the complexity inherent in most transactions, calls with customer service agents can take a long time. Allowing for vacation days, training days, and holidays, an agent may handle around 30,000 calls in a year. This yields a cost estimate of around $1.00 per call.

Customer Service Representative Productivity
Calls per Hour (Average length 3 minutes, 90% productivity)
Hours Worked per Year (225 work days, 7 phone hours per day)
Total Calls per Year
Agent Cost per Call

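The agent cost per call can be reconstructed from the parameters named in the productivity rows above. The following is a minimal sketch, not the study's own worksheet: it assumes the $30,000 annual agent cost quoted earlier, and the function and variable names are illustrative.

```python
def agent_cost_per_call(annual_cost, avg_call_minutes, productivity,
                        work_days, phone_hours_per_day):
    """Return (calls_per_hour, hours_per_year, calls_per_year, cost_per_call)."""
    # Calls an agent completes per hour, scaled by the productivity factor.
    calls_per_hour = (60 / avg_call_minutes) * productivity
    # Phone hours actually worked in a year.
    hours_per_year = work_days * phone_hours_per_day
    calls_per_year = calls_per_hour * hours_per_year
    return calls_per_hour, hours_per_year, calls_per_year, annual_cost / calls_per_year


# Inputs taken from the article: 3-minute calls, 90% productivity,
# 225 work days, 7 phone hours per day, $30,000 annual cost (assumed total).
per_hour, hours, per_year, cost = agent_cost_per_call(30_000, 3, 0.90, 225, 7)
print(f"{per_hour:.0f} calls/hour, {hours} hours/year, "
      f"{per_year:,.0f} calls/year, ${cost:.2f} per call")
```

With these inputs the sketch gives about 18 calls per hour, 1,575 hours and 28,350 calls per year, or roughly $1.06 per call, which is consistent with the article's rounded figures of "around 30,000 calls" and "around $1.00 per call".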
Automated Solutions
Automated solutions can provide per-call costs that are significantly lower than those of live agents. To illustrate the cost savings enabled by speech recognition, it is useful to have an estimate of the cost per call. This is found by calculating the total cost of ownership of an IVR system over the life of the system, and then dividing that number by the volume of calls the system is expected to handle during this period. The study shows the approximate development and installation costs of a 72-port system.
Estimated Installation Cost of a 72-port IVR System
VRU and Associated Hardware
IVR and Speech Software
Application Development
Total Cost of System

Estimated Total Cost over Four Years

                            Initial     Annual     Total
Net Installation            $494,000               $494,000
Maintenance and services                $89,000    $356,000
Overhead                                $25,000    $100,000
Total Cost of Ownership                            $950,000

Including maintenance, the full cost of an IVR port over four years might be around $13,200 (per port for 72 ports). The annual cost of a port would therefore be approximately $3,300. This is less than one-tenth the annual cost of a customer service agent. At the same time, an IVR port can handle even more calls than a customer service agent, because it is available 24 hours a day, 365 days a year. One IVR port can handle 32,000 three-minute calls per year. The per-call cost of the system is therefore under ten cents.
The Cost per Call for Speech-enabled IVR
Fully Loaded Cost per Port per Year
Calls per Port per Year
Cost per Call

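The per-port and per-call figures follow directly from the four-year total cost of ownership. A short sketch of that arithmetic, using only numbers quoted in the article ($950,000 over four years, 72 ports, 32,000 calls per port per year); the function name is illustrative:

```python
def ivr_cost_per_call(total_cost_of_ownership, ports, years, calls_per_port_per_year):
    """Return (cost_per_port, cost_per_port_per_year, cost_per_call)."""
    cost_per_port = total_cost_of_ownership / ports
    cost_per_port_per_year = cost_per_port / years
    cost_per_call = cost_per_port_per_year / calls_per_port_per_year
    return cost_per_port, cost_per_port_per_year, cost_per_call


per_port, per_port_year, per_call = ivr_cost_per_call(950_000, 72, 4, 32_000)
print(f"${per_port:,.0f} per port, ${per_port_year:,.0f} per port per year, "
      f"${per_call:.3f} per call")
```

The unrounded results are slightly below the article's rounded values of $13,200 and $3,300, and the cost per call comes out at about ten cents.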
ROI for Speech
Speech recognition solutions can provide service at around 10% of the cost of a customer service agent ($0.10 per call vs. $1.06 per call), so every call the system successfully processes could save at least ninety cents.

[Figure: Cumulative Costs of Agents and IVR Ports]

With a cost differential of this magnitude, it is worthwhile looking more closely at the payback period that a speech-enabled system may have. Two variables will affect the system's ability to provide the projected level of savings.

The first variable is the successful completion rate. Any customers who are unable or unwilling to use the system will need to be assisted by a customer service agent. In general, Nuance has found that the share of such callers is very low (a few percent). The vast majority of callers find speech-enabled systems faster and easier to use. In fact, surveys have shown that many callers prefer speech-enabled systems to customer service agents because they are connected right away and get the information they need quickly.

The second variable is the capacity utilization rate. The system may be under-utilized in order to leave room for unusual call volume spikes or growth in the customer base. Most companies choose to over-provision the number of ports on their IVR because it is inexpensive and ensures a high grade of service.

By lowering the estimated savings to account for both unsuccessful callers and under-utilization, it is possible to create a more realistic assessment of how much a speech-enabled system can save.
Payback Analysis for Speech Recognition
Installed System Cost
Call Capacity per Day
Savings per Call: savings of automated service over customer service agents
Successful Completion Rate: percent of callers who perform the transaction successfully within the IVR
Capacity Utilization: percent of provisioned capacity that is used on a normal operating basis
Realized Savings per Day: estimated savings adjusted for throughput and capacity utilization (8860*.96*.95*.8)
Days to Payback: system cost divided by savings per day
Savings over One Year

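The realized-savings formula quoted in the table (8860*.96*.95*.8) can be worked through as follows. This is a sketch under stated assumptions: the four factors are read, in row order, as call capacity per day, savings per call, successful completion rate and capacity utilization, and the installed system cost is assumed to be the $494,000 net installation figure given earlier. Neither mapping is confirmed by the surviving text.

```python
def payback(installed_cost, calls_per_day, savings_per_call,
            completion_rate, utilization, days_per_year=365):
    """Return (realized_savings_per_day, days_to_payback, savings_per_year)."""
    # Discount the theoretical savings for callers who fall back to an
    # agent and for provisioned capacity that normally sits idle.
    savings_per_day = calls_per_day * savings_per_call * completion_rate * utilization
    days_to_payback = installed_cost / savings_per_day
    return savings_per_day, days_to_payback, savings_per_day * days_per_year


# Factors from the table's formula; the $494,000 system cost is an assumption.
per_day, days, per_year = payback(494_000, 8860, 0.96, 0.95, 0.80)
print(f"${per_day:,.0f} saved per day, payback in {days:.0f} days, "
      f"${per_year:,.0f} saved per year")
```

Under these assumptions the realized savings are roughly $6,460 per day, which pays back the assumed system cost in about 76 days and amounts to well over $2 million in a year. A different installed cost or utilization rate changes the payback period proportionally.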
That is a remarkably high level of savings, but Nuance maintains the numbers are realizable. Speech systems can automate tasks that touch-tone IVR cannot. Some customers with deployed systems have been able to automate over 100,000 calls per day using speech recognition. Others have been able to reduce costs significantly and have reported payback periods of well under six months.

Other Benefits
The benefits of speech recognition go beyond replacing agents. Other savings can be realized through the reduction of call holding times and more efficient use of IVR resources. Callers to toll-free numbers often wait on hold for several minutes. At around 7 cents a minute, the cost per call adds up quickly before the caller is even connected with a customer service agent. An automated interface will reduce holding times and save money on telecommunications.

Another area where significant savings can be found is in integrating speech recognition with CTI technology. When customers identify themselves and provide other pertinent information before being connected to an agent, the agent is freed from asking routine questions like "where are you traveling to?" and "what is your account number?" This saves money by increasing agent productivity, and the information can also be used to route callers to the appropriate agent faster. An agent's time can cost around $0.35 a minute, while an IVR may cost $0.03 a minute. Therefore, having the IVR perform routine qualifying tasks before bringing an agent on line has an immediate payback. Shortened connect times and less agent time on the phone can combine to save hundreds of thousands of dollars per year.

In addition, speech recognition interfaces can automate new types of services that were not feasible using touch-tone interfaces and were too expensive to service with live agents. By making information easier to access and transactions faster to complete, speech recognition allows call centers and customers to do more with the telephone.

The study referred to in this article is available from Nuance Communications at http://www.nuance.com/products/roi.htm.
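The per-minute rates quoted above ($0.35 for an agent, $0.03 for an IVR) give a simple way to estimate what pre-qualifying callers in the IVR is worth. In this sketch the 30 seconds shifted per call and the one million calls per year are purely hypothetical inputs chosen for illustration; they do not come from the study.

```python
AGENT_COST_PER_MINUTE = 0.35  # from the article
IVR_COST_PER_MINUTE = 0.03    # from the article


def prequalification_savings(minutes_shifted_per_call, calls_per_year):
    """Annual savings from moving routine questions from the agent to the IVR."""
    saving_per_minute = AGENT_COST_PER_MINUTE - IVR_COST_PER_MINUTE
    return saving_per_minute * minutes_shifted_per_call * calls_per_year


# Hypothetical scenario: 30 seconds of routine questions moved to the IVR
# on each of 1,000,000 agent-handled calls per year.
print(f"${prequalification_savings(0.5, 1_000_000):,.0f} saved per year")
```

At a difference of $0.32 per minute, the hypothetical scenario saves on the order of $160,000 a year, which is in line with the "hundreds of thousands of dollars" the article describes for larger centers.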
Directed vs. Natural
The Gartner Group makes a distinction between what they call "directed" speech applications, the term the research firm uses to describe applications from Nuance, SpeechWorks, Philips Speech Processing and Voice Control Systems, and "natural language" recognition, which is the term more commonly used by speech vendors to describe their own products. To use Gartner’s terminology, natural language recognition allows users to speak in complete sentences while interacting with the application, without direction from the system to respond with specific words or to use a set vocabulary. Directed speech means that the caller is prompted for specific responses with pre-defined grammar and vocabulary. It is Gartner’s view that directed speech is a more mature technology than natural language recognition. Their bottom line is that directed speech is recommended for call centers, and natural speech is not ready yet.