A New Year


The speech industry, much like every other industry, found itself struggling during a tough 2010 economy. Some might say the downturn has come to an end, though it seems closer to the truth to say the economy is improving, yet still has a way to go.

As the economy recovers, many businesses are seeing the benefits of using speech, and many industry insiders and analysts are predicting exciting new ways that speech will be integrated with other technologies. In particular, new speech integrations could help repair the bad reputation that interactive voice response (IVR) systems have earned. IVRs themselves will likely change for the better, hosting will become even more prominent, and smartphones will continue to become practically ubiquitous.

In addition, speech and video will become fused in the 3.0 update to VoiceXML. Jim Larson, a VoiceXML trainer and speech applications consultant, says customers will be able to both read and hear directions. “Anyone who has tried to assemble a child’s bicycle on Christmas Eve can identify with that need to use a video and hear directions,” he says.

Larson also says companies will be able to, for example, read long faxes to customers. Basically, speech is becoming a more necessary interface in many places, not just on mobile devices. Even HTML might soon be speech-enabled, allowing computers to speak to people, and people to respond with certain commands, he adds.

Dan Faulkner, vice president of product management and marketing at Nuance Communications, also says speech will be better at predictive care in the coming year. “Speech and predictive technology have reached a level of sophistication where it is now possible to understand what people want and mean, not just what they say, when they contact customer care.”

He adds that multichannel care will become even more popular because it provides a richer customer experience that includes not only voice, but text and graphics. “When you consider that by 2011, 99 percent of all mobile phones will be data-capable and the fact that 25 percent of consumers are mobile-only households, the way companies will communicate with their customers has to evolve to adjust to the new mobile paradigm.” 

Some companies and Web sites, such as Live Mocha, already have begun to use speech and video together to help those who want to learn foreign languages. Speech and video will continue to be used together in more ways in the translation field, says Bill Scholz, president of the Applied Voice Input/Output Society (AVIOS) and president and founder of speech consulting firm NewSpeech Solutions. The ability to hear exactly how a word is pronounced is crucial to the learning process, he adds, and interactive voice and video response (IVVR) will assist those seeking to learn another language.  

“The only way to do a reasonable language learning application is to be able to present a video of some sort, or at least recorded audio that can take the correct pronunciation that you’re trying to convey to your student. That’s a very good outlet for interactive voice and video response,” he continues.

Scholz is currently working on merging speech recognition technology with the kind of 3D animation and graphics usually seen only in video games. “It’s often extended to the point it can approach reality in its degree of realism, hence the term virtual reality,” he says. “It’s a fun new area.”

The Frontiers of IVR

IVR hasn’t always been a popular area of speech with the public, but hopefully change is on the way. Larson contends that because customers have been so unhappy with bad speech systems, IVRs will not only be improved, but companies also will relax more on call routing, and dialogues will become more flexible.

In addition, Scholz predicts dialogues, which have already seen much improvement, will become even better. He expects adaptive dialogues, which can change dynamically and in real time to fit the caller, to become even more important. “If a person starts talking, [the system] indicates some particular aspect in the conversation that we can relate to a content area—now all of a sudden we have a greater expectation for words that fall within that [category],” he explains.
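
To make the idea of an adaptive dialogue concrete, here is a minimal sketch in Python of the behavior Scholz describes: once the caller's words point to a content area, the system raises its expectation for other words in that area. Everything in it is an assumption for illustration, not something Scholz or any vendor describes: the topic vocabularies, the function names detect_topic and adapt_weights, and the boost factor are hypothetical. A production recognizer would adjust its grammar or language model rather than a plain dictionary.

```python
# Hypothetical topic vocabularies; a real system would derive these from data.
TOPIC_WORDS = {
    "banking": {"checking", "savings", "balance", "transfer"},
    "travel": {"flight", "city", "ticket", "departure"},
}


def detect_topic(utterance_words):
    """Return the topic whose vocabulary overlaps most with the words heard,
    or None if no topic word was heard at all."""
    best_topic, best_hits = None, 0
    for topic, vocab in TOPIC_WORDS.items():
        hits = sum(1 for word in utterance_words if word.lower() in vocab)
        if hits > best_hits:
            best_topic, best_hits = topic, hits
    return best_topic


def adapt_weights(base_weights, topic, boost=3.0):
    """Return a renormalized copy of base_weights in which every word that
    belongs to the detected topic has been multiplied by `boost`."""
    vocab = TOPIC_WORDS.get(topic, set())
    raised = {
        word: weight * boost if word in vocab else weight
        for word, weight in base_weights.items()
    }
    total = sum(raised.values())
    return {word: weight / total for word, weight in raised.items()}


if __name__ == "__main__":
    # Before the caller speaks, all four words are equally expected.
    base = {"savings": 0.25, "flight": 0.25, "balance": 0.25, "city": 0.25}
    topic = detect_topic(["my", "checking", "balance"])
    print(topic)
    print(adapt_weights(base, topic))
```

In this sketch, a caller who mentions a checking balance shifts the expectations toward banking words for the rest of the call, so "savings" becomes more expected than "flight" even though neither word has been spoken yet.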

Also, systems will be able to detect a caller’s emotions and know, for example, to transfer an annoyed customer to an agent. “When a customer becomes cranky or genuinely angry, we need to engage a different style of interaction with him or her than [we would] if we were just trying to obtain information,” Scholz maintains.

IVRs are also being designed and tracked better, which will hopefully reduce the number of frustrated callers. Donna Fluss, president of DMG Consulting and a former IVR designer, says she never would have built an IVR without trackers to see when callers opted out, but adds that not everyone had been doing so. “It used to be that people thought it was a ‘field of dreams,’ [where] you put it in and people will come, but they’re realizing that you really do need to invest in improving the performance,” she says.

Improvement means more than what many in the industry simply refer to as tuning, Fluss adds. It’s more important to go through scripts and look at the customer interfaces and technology on an ongoing basis, she contends.

Also, as some companies market their customer service as IVR-free—as is the case with Chase’s Sapphire credit card—other companies will loosen restrictions around IVR containment, meaning callers will be able to opt out of the IVR and reach an agent more quickly, Larson predicts. While containment keeps costs lower, customer service will improve when companies let customers get to agents without hassle. “[Allowing opt-out] usually provides a better customer experience, better customer satisfaction, and more return customer business,” he remarks.

NL Versus IVR

Natural language (NL) might seem like the future, the stuff of science fiction: a system with which we can interact naturally without worrying about adjusting our speech. Larson points out, however, that the phrase is a bit misleading. “Natural language is a marketing term used by companies wanting to impress their customers. All it really does, however, is [add] to the confusion and [lead] to disappointed users who can’t speak naturally,” he says.

Joe Outlaw, principal analyst of contact centers at Frost & Sullivan, says NL is, in some regard, still the frontier of speech; many companies, such as Nuance Communications, West, and Microsoft/Tellme, are already working on these kinds of systems.

“One of the things that the industry talks about as the Holy Grail is an application that acts like the Hal computer of 2001: A Space Odyssey—not a vicious personality, but it’s open to what would seem almost like a human conversation versus a directed dialogue,” Outlaw points out.

However, while NL interfaces are especially useful in some industries, a touch-tone interface might cost less in some cases, and cost weighs heavily for companies looking for the best possible business case in this tight economy.

Daniel Hong, lead analyst at Ovum, says the majority of last year’s speech applications were built with dual-tone multifrequency (DTMF) interfaces. DTMF certainly has its place, and Larson says a need will always exist for touch-tone because it’s better-suited to customers entering numerical information, such as account numbers and PINs. However, he adds that many companies have switched over to voice inputs because in some circumstances it’s easier to speak an option than to remember a digit.

And while building applications with statistical language models (SLMs) can have high start-up costs, some of those costs might be mitigated, according to Scholz, who cites new ways of gathering information for an SLM. Because builders can draw on common characteristics of earlier SLMs, a company might not have to spend as much money recording new responses for the model.

“I can reach back into the dozens, hundreds, or even thousands of samples and suck them into this new language model that I’m building and save them the necessity of collecting the information again,” he explains.

According to a recent Frost & Sullivan survey of more than 300 enterprises collecting data about their customer service/contact center plans, most companies see a clear value and business case for speech and plan to build speech applications in the future. That said, many of those firms weren’t looking to change existing DTMF systems that were working well. Outlaw adds: “When we asked about their plans for building IVR/voice portal-based applications over the next 12 to 18 months, about half said they were looking to replace some applications, while 17 percent said all of their new applications would be built with speech interfaces.”

Speech, particularly NL, can handle tasks DTMF applications cannot, which can make it more desirable. One of the top priorities in 2011 will be a greater focus on customer service, Outlaw argues. “Speech interfaces can be more natural and easier to use,” he says, noting that menus can require callers to press numbers or listen to a long list of options, both of which can take a while. With speech, he says, one can be asked a question, such as, “What city do you want to fly to?” and simply answer it. “Also the navigation can be more direct, so instead of having to go through nested menus to get to the thing you want to do, you can just say, ‘I want to change my checking account,’ and it goes directly to that instead of saying, ‘Do you want savings or checking?’”

Natural language might also facilitate more customer personalization. “No two customers are alike, so a ‘cookie-cutter’ customer care strategy won’t work if retaining loyal customers is a key business objective,” Nuance’s Faulkner says. He further expects advancements in natural language to be a key driver in personalization.

Overcoming the Recession

Even in an area with a good business case, however, the economy has taken its toll. Ovum’s Hong cites small growth from 2009, with budgetary constraints inhibiting IVR investments.

The economic downturn has slowed movement, at least for 2010, but that doesn’t mean enterprises changed their minds about the fundamental value of speech, Outlaw adds. Instead, those projects were frozen. “I thought that as the economy picked up—I also had sort of a rosy view as to when it was going to pick up—that those projects would become unfrozen and that there would be quite a bit of speech application development taking place in 2010,” he says.

Some of the work was turned on again, Outlaw elaborates, but some of it continues to wait for the economy to get better.

Even though development stalled in 2010, speech can certainly improve customer service, which makes it attractive to all sorts of companies, though costs are always an issue. One way the enterprise can take advantage of a speech solution is by opting for a hosted solution as opposed to an on-premises one. In general, hosted offerings have seen a dramatic increase, and most analysts agree they will likely continue to be a popular choice.

Hosting is a growing trend, says Outlaw, who calls it a misconception that only smaller companies use hosted IVRs. Larger companies do too, even those that still keep one or more large in-house call centers. “The highest adoption rates are in the 500-agent seats and above [call center] category,” Outlaw affirms.

Companies with at least some hosted offerings cite rapid deployment, flexibility, and scalability as reasons for adoption. “The most popular function with a hosted call center is hosted IVR voice applications or voice portal applications,” Outlaw says. “Speech is still relatively complex and expensive, and companies that don’t have the in-house resources or don’t want to have them look to the hosting provider to build and support those applications with their expertise. It enables perhaps a company that doesn’t have that to get advanced technology in a way that they don’t have to ramp up their people to do it.”

Essentially, hosted IVR can be easier and, to some extent, more cost-effective, since one can rent and not buy, which also minimizes the risk.

“More money was spent on hosted IVR in 2010 than premises-based IVR,” Hong says.

Larson shares that view. “Smaller companies may not have the expertise or experience to host their own IVR. Hosting companies vary in how many services they provide, and how the customer [company] manages the IVR software, backup services, and cost. When the company feels it has the expertise and experience, it may switch from hosting to on-premises by doing its own hosting,” he states. Many of the hosting IVR providers also provide continuous application improvement that is bundled into the cost of the service, Outlaw adds.

Fluss, who also points out that the largest hosted market in the contact center arena is IVR, says while IVRs used to stand alone, companies now will be turning to vendors that offer more than just a hosted IVR.

In call centers, speech analytics will always play an important role, and that role will only grow. It makes sense that it would: Agent performance can be improved when managers are able to respond immediately.

Outlaw says the most powerful piece of real-time speech analytics will be the ability for the system to listen to the customer, especially as the agent performs other tasks. “Often the agents will lock into the primary reason the caller called, and they’re working away. In the meantime, the customer is chatting away, the agent is trying to focus on this primary thing, and perhaps they’re missing some things while the customer talks about three other topics,” he says.

WAC Still at Work

On yet another front, the newly formed Wholesale Applications Community (WAC) brings a lot of potential for the development of speech solutions on mobile devices. WAC—whose goal is to simplify application development by giving developers the opportunity to write apps that can be deployed across multiple platforms and multiple operators, and to address a potential global market of more than 3 billion users—now has its first application in beta, with demos due out in February.

“We’re looking to enable a really simple model where a developer can come to WAC to create an application, submit it, and then it is made available on the operator and other storefronts that are connected to the wholesale platform,” says Tim Haysom, marketing director at WAC.

Because the application uses Web technology, it can be downloaded anywhere the Internet is available, which these days is everywhere—on the TV, in the car, or on a mobile device. Also, because the application is Web-based, a developer can create an interactive application with minimal skills, Haysom explains.

“One of the key things that we will be enabling is Web developers to write something using the technology and standards that they are used to,” Haysom says, “so anybody who can write a Web site can now create something that will work as an application on a mobile phone.”

That, he says, “is quite a big deal, because if you look at the number of people who can write Android apps versus those who can write a Web site, you’re talking an order of a magnitude of difference.”

The WAC is somewhat limited, though, in that some industry leaders, such as Apple and Nuance, have yet to join. Haysom did say, however, that membership continues to grow, with about 16 new companies ready to join the fold very soon.

Also, it could take some effort to make sure the application will work on various devices. “The developer can’t abdicate responsibility and say, ‘I’m writing this, and it will just work on every single device WAC supplies,’” Haysom cautions. “It’s not that simple.”

It will, however, greatly simplify development from the existing model where an Android app will likely work only on an Android phone and won’t work on an iPhone. Haysom also adds that WAC is not trying to be a standards body, but instead is looking to produce specifications that can be implemented far more quickly than standards can.

Socializing with Speech

Mobile devices, especially smartphones, are becoming ubiquitous, and their small screens and keypads are making speech an increasingly important interface. Also, users looking to interact with their devices in the car are getting more speech interfaces that are easier to use and, in some states, required by law.

“We’ll see more introductions of speech technology into social media,” predicts Scholz, who says social media has become increasingly popular and the ability to navigate it by voice will be particularly helpful when users have to keep their eyes on the road.

Over the next year, more companies will develop a social media strategy for customer care, Faulkner asserts. While business use of social media is still very new, he sees customers’ expectations changing. “According to Forrester Research, 75 percent of people would prefer to do business with a company that is involved in social media,” he says.

Translating voice into text for text messages and email will also become more popular, says Scholz, who predicts recognition accuracy will continue to improve, even in the face of more background noise.

“It can be frustrating when a person tries to operate a speech-activated device and they happen to be in a car going 70 miles an hour, the windows rolled down with all the air rushing around. Algorithms are now being developed to increase quality,” Scholz adds.

Overall, the use of speech in cars, and improvements to the technology in other areas, will likely make speech an easier, more intuitive way to perform all kinds of tasks. That will hold true in 2011, and is likely to continue in the years ahead.
