Hosted Speech: Trend or Foe?
The cost of implementing an on-premises speech solution can be steep, pushing the technology lower on many businesses' priority lists. A hosted speech solution provides the benefits of speech with lower upfront costs, standards-based applications, and multiple deployment options. What makes hosted speech so attractive is that it is not only easier on the bottom line, but also relieves the enterprise client of some of the risks associated with deploying a speech solution. Depending on the type of deployment, the vendor can alleviate the time, skills, and manpower needed to build, deploy, and manage the lifecycle of the solution.
Associate editor Stephanie Staton spoke with Daniel Hong, a senior analyst at Datamonitor; Bruce Balentine, executive vice president and chief scientist at EIG Labs; Bill Andrews, general manager and vice president of speech solutions at Convergys; and John Hibel, vice president of marketing at Voxeo. Combined, the four have decades of experience in the hosted speech solutions market.
STM: What are the most common reasons for selecting a hosted speech solution over an on-premises solution?
Andrews: One is the expertise and knowledge that you get—from speech scientists, application developers, testers, and voice user interface designers that have built dozens, if not hundreds, of these applications—with a hosted service provider who knows how to design things very effectively and efficiently. Clients really look to draw upon that type of knowledge and expertise. Second, many businesses want to focus on what they do best and allow someone else to come in and take responsibility for key elements that enable their businesses to succeed. That is why they look to outsource it. There are also financial considerations.
Hibel: We look for companies to make up their minds based on their individual business cases, but in our experience it primarily comes down to infrastructure and expertise. Some customers have infrastructures in place and strong IT departments with some experience, related skills, or the wherewithal to project manage implementing their own speech-enabled solutions. Other customers have the infrastructure but would rather focus on other things. In terms of the hosting infrastructure, it is the same argument. It is a business decision on whether you want to invest in the capital and operating costs associated with maintaining that yourself or purchasing it by-the-minute from someone else.
Balentine: I would add an extra comment involving change management. You build an interactive voice response (IVR) differently when you think you are going to lock tight on it and not make a change for a few years than you do when you think your IVR is going to be in a constant flux. The hosting people are much more skilled at being able to develop an infrastructure that is manageable and configurable. Quite often we see enterprise customers that are interested in outsourcing all of the change management complications to the hosting service provider so that they can just enjoy the benefits of the IVR. IVRs are turning more and more into the audio equivalent of Web pages, so you don't spend six months working on them and going through quality control and then not do anything with them. They are always changing, and managing change has become a real important feature today.
Hong: What everyone said is fairly accurate and it jibes with what we have seen in the market. We did a survey at the beginning of 2006, and the number one reason that respondents gave for why they have not invested in speech was "I don't have enough in-house expertise to build, maintain, and support a speech solution." I think having a hosted option transcends this issue and provides a viable solution.
STM: What are some common customer concerns with purchasing a hosted speech solution? How are hosted speech vendors addressing these concerns?
Andrews: There are a couple of areas where they are concerned. The first is control. In terms of relinquishing control over a critical part of their customer strategy, they want to ensure a good comfort level and trust with the party to which they hand this over. There are a couple of layers for handling and adjusting to this: ensuring that the vendor has the most highly-trained individuals who spend time with the clients to quickly establish a sense of confidence in the solution that the team will be delivering and who give them more control and insight into how the application is provided, as well as offering them tools for measuring performance of the applications. The second thing that they are concerned about is security, particularly when you are talking about customer-sensitive information. Ensuring that you have an extensive approach toward security is very critical. One way vendors combat this is by offering a hybrid approach in which the application layer is pushed out of the data centers and onto the customers' premises to allow them greater control over the application, while the speech vendor continues to manage the other aspects of the operating environment in its locations.
Hibel: Adding to Bill's comments about control, one of the observations that I have made over the years is that with proprietary IVR systems, it was harder for customers to have control over their applications. One of the things accelerating the growth in hosting is the move to XML-based, standards-based applications, allowing customers to maintain a lot of control over their applications, whether they are deployed in the hosting company's facility or on premises. The standards-based architecture also allows you to deploy your application on premises and have just the telephony hosted. Moving towards the standards gives customers control over a significant chunk of the application deployment that they didn't have under the old, proprietary systems. The industry in general has been migrating to more standards-based, XML-based, hosted IVR.
Balentine: Certainly control and security are the two biggest concerns that buyers exhibit. Maybe I would just throw in trust and mistrust. That is very important because a hosting relationship with an enterprise is a long-term, committed partnership and there are always fears about what if we fall out of love, how are we going to handle any kinds of transition? Sometimes, those fears can slow sales or lead to cautious relationships between the enterprise and hosting people who have to have good access to proprietary information to be able to make these systems work well.
Hong: On a higher industry level outside of just speech and IVR, we are witnessing a shift in philosophy in terms of IT investment, and this is where you see the whole software-as-a-service/on-demand model gaining more traction in the market. With the mindset and philosophy shifting, it is a lot easier for companies to justify going to the hosted model of outsourcing a lot of the application development work and the infrastructure.
STM: Are there any new developments that are making hosted speech solutions more compelling to companies? If so, what are they?
Andrews: I would offer the concentrated expertise and knowledge as it relates to speech in the hosted speech provider organizations. VUI design and speech science are not get-it-off-the-street types of skill sets. Those are highly skilled individuals and for each organization to build and attract those types of skill sets has been a challenge. It is true across the entire speech industry that there are centers of excellence that have the knowledge to build, deploy, and manage very successful speech applications.
Hibel: There is a large group of traditional IVR customers who are building traditional IVR applications with speech enablement or other enhancements, but there is another group of innovators with developers who are jumping on XML-based telephony and have a more Web-centric approach. This group seems to be doing a lot of hosting because, for the most part, they come out of the IT/Web-centric world and are less familiar with the telephony that a hosting company can provide for them. Because they don't generally have that in place today, they are more inclined to start with hosting. We had a number of customers who started as developers (sometimes three guys in a garage) who are now multimillion-dollar customers. They started with some really cool ideas and built some really great apps, but they wouldn't have existed five or six years ago because their model wouldn't have worked very well if the standards didn't exist.
Balentine: I would add exactly the opposite: the extremely conservative buyer. They are beginning to realize now that, in certain market niches, extreme innovation is not the right strategy and they need companies with experience and scar tissue in both touchtone and speech, that have done a lot of work with how speech can degrade to touchtone and how touchtone can upgrade to speech and what the mechanisms are for prompting users when switching from modality to modality. They recognize increasingly that this is a multidisciplinary niche market of skills. They neither can afford, nor do they have the right corporate culture, to attract multidisciplinary, creative designers that are able to manage the broad set of artistic and technical skills that are required. They are finding that their hosting relationships are giving them the best of both worlds: a look toward and access to the future, solid skills, a traditional IVR architecture, touchtone dialogue designs, maintenance, and stability.
What Do They Want?
FIVE THINGS THAT CLIENTS ARE LOOKING FOR ARE:
1. corporate stability: they want a partner that is stable, secure, and committed to being a leader in this market;
2. expertise: they want an organization with a center of excellence in how to design, build, and deploy very successful applications and operating platforms;
3. scalability: they look to partner with someone that has a proven track record of scaling applications;
4. reliability: they consider a proven track record as a reliable and trusted partner very important, along with a consistent operating environment and application; and
5. security: having a good security approach is a major and growing area of focus for organizations.
Source: Bill Andrews, general manager and vice president of speech solutions at Convergys
STM: What are the various hosted speech models available to companies? Which are most popular? Why?
Hong: There are three models that we see today. The first is a traditional premises-based one where the entire solution resides on the customer's premises. The second is premises-based managed services where the application server resides on the customer's premises and the IVR, back-end work, and routing are based in the data center of the service provider. The third is fully hosted where everything resides in the hosted provider's data center. There may be a growing number of companies that will deploy the IVR on their premises and have the application provided by the hosting provider. We are really seeing the advantages of VoiceXML here because you can disaggregate the solution stack that used to be one box in a traditional IVR and proprietary system and split it among remote locations as long as there is Internet connectivity.
Hibel: The news isn't so much that one or two of those are more popular than the others, but the news is that now you have a lot of choice for how you deploy a hosted solution. This goes back to the standards enabling new architectures; it gives you the ability to map an architecture to your specific business requirement whether it is related to security or throughput or what-have-you. Customers have a lot more flexibility and, therefore, it is a lot easier for them to make a business case for hosting as a result of these abilities.
STM: What have you seen in uptake for premises-based managed services models, where the application resides on the customers' premises and the IVR and back-end are handled by the hosting provider?
Andrews: We have seen a tremendous increase in interest in that type of model. It is very attractive in a couple of areas. From a client perspective, it gives more control over the application layer and it addresses some of the security issues about having customers' sensitive information remain on-premises. It also opens up a whole new model for other application development firms. We've got some great organizations in the industry today that focus a lot on building applications and by offering them this type of hosting environment they can build applications and deploy them in a consistent and effective manner.
Hibel: It not only gives them a considerable amount of control, it gives them the feeling of control because the applications physically reside in data centers that they can control. I agree that it opens up to a broader set of vendors who can do application development, but where the application physically resides doesn't really affect who can write the application for you. The benefit is more directly related to the standards and customers can take advantage of drawing from a number of external sources and still have the flexibility to choose among any of the four models.
Hong: The premises-based managed services segment will continue rapid growth through 2010 because of the flexibility and ability to retain control over security. When you look at it as a CIO, you still want to be able to maintain as much control as possible. Although on-demand software as a service in the IT world has gained in popularity and is still making an impact on the industry, there are still the CIO pressures with having to maintain as much control as possible. At the same time, they are going down the hosted route more but they also want to have some control. This provides a level of granularity and provides a hybrid model that is more favorable in the long term.
STM: How much uptake have you seen in natural language call steering and voice authentication applications?
Balentine: Voice authentication has been a real disappointment. There has been very little uptake and it is one of those perennially promising technologies that keeps not happening. There are just so many barriers to the entry level. Nobody has found an appropriate cost/benefit balance between the value proposition for the technology and the user hassles of the enrollment, the extra database storage for enrollment, and the difficulties involved in managing voice prints. The "say anything or speak freely" technology has been real trendy lately, but there was a bigger uptake of that about three years ago because it is a more limited technology than people originally thought. There was a misunderstanding in the industry that this technology was the beginning of the future of speech and would scale up to a fully conversational, natural language environment. Instead, it is just a good call steering technology and it is quite expensive to implement, so we are seeing more and more customers now moving away from SLM and natural language and back to stable, directed dialogues. They are much more concerned now about throughput, speed of handling, performing well in noisy environments, and those kinds of things. The more exotic technologies have been back-burnered.
Hong: We are going to start seeing some voice authentication applications being deployed. The voice authentication market contains a lot of small shops that created voice authentication technologies in the past and had quite a bit of success; customers who deployed those solutions did so internally. They weren't in customer-facing applications, but now RSA Security's acquisition of Vocent's assets lends a lot of credibility to that technology. A lot of companies out there were reluctant to invest in voice authentication because the vendor didn't have any brand recognition in the market, and it seemed as if they were more technology-focused than solution-focused. Now we are seeing that a lot of the vendors on the security side are finding opportunity here after having talked to their customers and done the necessary market research to gain the proper feedback. That is why I firmly believe that we are going to see a handful of large deployments among the larger firms, such as financial services.
Top 10 Requirements
CUSTOMERS LOOK FOR FROM HOSTED SPEECH VENDORS:
1. solution and product;
2. the number of customers;
3. experience/expertise in speech and technology magnets;
4. capacity and differentiators;
5. brand leadership;
6. evangelism and culture;
7. stickiness and customer loyalty;
8. financials (the vendor has to be financially stable);
9. a business with sustainable operating leverage; and
10. a partner ecosystem.
Source: Daniel Hong, Datamonitor
STM: Which vertical markets are the early adopters of hosted speech solutions and what can we learn from them?
Hong: We classify the early adopters as communications, financial services, and travel and tourism. For communications, you have the telcos, cable service providers, wireless carriers, etc. The firms in financial services are really picking up on the hosted speech option. In travel and tourism (the airlines and travel agencies), the airlines are going to outsource everything that they can, outside of security, to reduce costs. We are also seeing an increasing number of deployments from healthcare, speech health, technology, and outsourcing markets.
The early adopter markets are having success with speech. Once the early adopters have success, they establish that speech is commercially viable, that it improves costs, customer service, and profit margins. There is also a kind of best-practices element here, where you are seeing customers establish best practices with the vendors to optimize certain areas within the customer service chain. With that said, a lot of the other markets, the more conservative verticals, are able to see that retail banks, for instance, are doing quite well with speech recognition and now are more comfortable with investing in speech down the road. The early adopters and the pragmatist markets are the ones leading the way and really showing that speech is commercially viable. The best practices that they form will only help to galvanize the practices for the rest of the verticals.