
Bruce Morse, Vice President of Contact Center Solutions, IBM

John Kelly - Over the last year, what has IBM been doing with speech and how has IBM evolved the way it is looking at speech?
Bruce Morse: Let me start in the contact center. Talking with contact center executives and managers about speech technology alone does them a disservice. Speech is an enablement capability that addresses the immediate need to improve self-service to clients, and it holds the potential to improve customer service and satisfaction while reducing operational costs. What's critical to our clients' success, however, is the ability to provide these benefits across all channels, in order to provide an on demand environment. The goal is a consistent, common customer experience regardless of the channel customers choose to use: speech, Web, e-mail, fax, chat, SMS, and IM. IBM and our business partners are best positioned to help our clients provide that experience to their customers, while also reducing their operating costs and increasing their opportunities for revenue generation.

Over the last year, we have made major enhancements to our offerings. We've worked closely with Genesys, Avaya and Cisco, among others, to allow for greater interoperability between our products. We've also become more closely aligned with them from a go-to-market perspective. We have been working to extend the Web development model to allow Web developers to do some of the speech application development, particularly in the areas of business logic and back-end connectivity, and we've worked to establish a consistent J2EE VoiceXML approach for doing this across the industry. As you know, at SpeechTEK last fall, we launched the reusable dialog component (RDC) initiative, which was specifically targeted at this purpose: to reduce the complexity of speech application development and to expand the number of developers who can contribute to it. The objective is to accelerate the growth of the market and to broaden the adoption of speech. We have also been working with a number of different tools providers on aligning our tooling activities to accomplish these same objectives.

We've also seen expanded application of speech technology in embedded solutions such as auto navigation systems and in connected environments such as health care and electronic program guides. Here, we're seeing the usefulness of adding a speech user interface to existing graphical user interfaces for environments and situations where audible interaction, in addition to traditional touch-and-see interfaces, is required. We call the combination of traditional and speech user interfaces in a single interface a multimodal interface. Basically, one outcome of process re-engineering has been the appearance of computing devices across the spectrum from desktops to dashboards. The re-engineering of business processes sometimes puts computing in new places where hands or eyes are busy but access to IT is critical. In these environments, the value of adding speech to the existing user interface quickly becomes obvious.

JK - You mentioned three IVR vendor partners; are there any other partners aside from these three main vendors?
BM: Our strategy is to work with any IVR vendor who is committed to using MRCP, and to certify their interoperability with our speech offering. We happened to pick these three initially because our IGS/BCS organization already had fairly established relationships in their contact centers. We aren't just going to stop at three. There are other vendors out there that we are currently working with, and still more that we will be working with in the future.

JK - Companies are utilizing WebSphere more now than I had perceived in the past; how has that impacted your organization?
BM: It has allowed us to leverage a market-leading offering that over 87,000 customers worldwide are using today. We take advantage of IBM's investment in this award-winning platform by exploiting capabilities such as scalability, failover, load balancing, and a common admin console. Because the WebSphere platform provides these capabilities and we built our speech offering to utilize them, we don't have to recode a lot of those underlying functions, and we benefit from enhancements made to the application server on behalf of those 87,000 customers.

The second reason for leveraging WebSphere is that we see a bigger push to integrate contact center business processes and technology into the IT side of the house. What we call horizontal integration and service-oriented architecture have the ability to link different processes and to manage things like workflows, many of which are implemented today on the WebSphere platform. So logically you don't want contact center technology or capability to exist as a business and technology "silo" outside the rest of the business and IT shop. Delivering a speech server that exploits the Web application server and tooling environment is a logical extension of the IT organization, and enables clients to take advantage of the integration work that's already going on there.

Earlier this year, IBM folded our Pervasive Computing organization, which had been focused on a number of growth initiative areas including speech, into the mainstream business.  We did this to scale the business, to better leverage the rest of the go-to-market and technical development activities, and ultimately to grow and broaden our reach in the market place.  We now benefit from far greater linkage from a product and a go-to-market perspective with the WebSphere organization. We can also leverage things like WebSphere business integration offerings to offer the contact center greater capabilities in workflow, applications and process modeling, and deployment for application connectors. In addition, we now work more closely with the Workplace organization, whose focus includes providing end customers access via a Web portal as well as an agent desktop. My team drives the requirements from the contact center into the WebSphere product teams for future product releases and go-to-market activities to maximize their relevance to the contact center.

JK - Over the last year have you seen a greater penetration within the telephony side of an organization or the more Web-centric side of an organization? I know one of your initiatives has been to knock down some of the silos within the organizations, but is there one side of the house that has taken to this approach more than the other?
BM: There are three contact points for engaging in this discussion. First is the telephony call center.  Our strategy for engaging there includes working with Avaya, Cisco, Genesys and others.  By partnering with these vendors, we have increased IBM's visibility in the contact center. In addition, the pending acquisition of Nuance by ScanSoft is increasing customers' desire for a second source vendor. It is opening up more opportunities for us with vendors and customers who are saying, "I want an alternative. I want to make sure that I am not being led down a single path here." Many of these customers will be facing migration choices, as the new Nuance/ScanSoft rationalizes its disparate product lines, and that creates opportunity for us as well.

Secondly, we are engaging with the IT side of the business. Increasingly contact centers are better integrating with the IT side. CIOs are trying to standardize the contact center infrastructure on the same IT infrastructure where possible. Our WebSphere-based speech server is getting much more attention from the IT folks, who are making a single infrastructure a requirement of consideration in an increasing number of the bids we are seeing. Certainly, the Web side has been an area where we have been very strong in the past. And customers want to converge Web and, frankly, all channels of the contact center into a consistent multi-channel contact center approach. Our market position and experience in the Web side of the business has provided us with the forum and opportunity to extend our capabilities and business discussions to other channels including speech. 

Then, finally, at the business level with the CEO, COO, CMO and CFO, if you will, they are really looking at the broader strategy of the company and saying, "How do I do a better job of differentiating my company by providing superior customer service? How do I better retain and grow revenue with my existing customers because new customer capture is expensive, and switching costs are low?" and "What is my external facing customer engagement model?" IBM Business Consulting Services helps executives answer questions about how to create competitive advantage and how to improve customer relationships and still be able to reap better returns.  BCS becomes the front end of the discussion about business transformation, about business process re-engineering, and contact center optimization.  Self-service through speech is an important component of that overall capability.

The approach that we are taking has strengthened our position significantly because it is multi-pronged.  Our partnerships help us provide a complete self-service solution to contact center executives. Our WebSphere capabilities resonate strongly within the IT organization.  And our BCS team provides unparalleled contact center transformation services.

JK - The Web side of the house has traditionally been a harder barrier for speech to penetrate. What are some of the things you are hearing from that side, and what do we need to do as an industry to improve that integration?

BM: The single thing that has been most attractive - and sort of a pleasant surprise - to the Web side of the house is again this notion that you can provide a developer environment they are familiar with and build and reuse business logic across multiple channels. In addition, you can execute that capability through portal technology using, in our case, WebSphere Portal or WebSphere Voice Application Access, which allows you to voice-enable a Web portal. So what they find attractive is the fact that they can write much of the application once and, with the exception of the user interface, basically reuse it across multiple channels while maintaining consistency in the service available to the end user, regardless of the channel being used. The result is quicker time-to-market, cheaper development, and a consistent customer experience. An XML aggregator inside the portal can emit VoiceXML for a speech server or spit out HTML for a Web interface. That programming model has been an eye-opener for the Web side of the house, as they had not understood what they could do, and IBM, along with our partners, brought that to the table with our approach.
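To make the write-once, render-per-channel idea concrete, here is a minimal sketch of the kind of VoiceXML a portal might emit for the voice channel. The portlet URL, grammar file, and field names are hypothetical; the element names follow the W3C VoiceXML 2.0 specification.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical voice rendering of a balance-inquiry portlet.
     The same back-end business logic could render HTML for the
     Web channel; only this user-interface layer differs. -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="balance">
    <field name="account">
      <prompt>Which account: checking or savings?</prompt>
      <grammar type="application/srgs+xml" src="accounts.grxml"/>
    </field>
    <block>
      <!-- Submit to the same back end that serves the Web channel -->
      <submit next="http://example.com/portal/balance" namelist="account"/>
    </block>
  </form>
</vxml>
```

The business logic behind the submit target is channel-neutral; only this markup layer is voice-specific.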

The second part is that much of the Web side is coming to understand that there are three billion voice-centric devices out there today - including non-Web-enabled devices, and devices that may be Web enabled but don't provide an easy means of inputting non-numeric data. They are realizing that Web access can no longer be managed separately from voice access. Users are much more demanding and increasingly expect consistency and continuity of service across all channels. The Web side of the house now sees speech and voice interfaces as something it needs to adapt to and leverage.

JK: What is the other side of speech at IBM?
BM: Enterprise customers are also looking for ways to improve their employees' effectiveness and efficiency by extending access to information, and the ability to transact business or initiate workflows, through a wide variety of devices. The problem is that many of these devices have limitations, such as the lack of a keyboard, or the users require or prefer hands-free interaction. Speech access is a great way to deal with these types of requirements, and this is an area we are focusing on. We are leveraging our industry solution "domain" experts to identify those business processes where speech can add real value. A good example is the solution we built with a business partner, Teges Corporation, for Miami Children's Hospital. Teges, a developer of Web-based information systems for intensive care units, uses IBM WebSphere® software that enables doctors making their rounds to enter or access patient information using speech, a keyboard, or handwriting via handhelds. IBM's software works with an integrated clinical information system that provides physicians and other health care providers with instant access to patient data through a single Web interface. This multimodal approach gives physicians and nurses the option of using spoken commands to access patient records and enter repetitive information. Teges' i-Rounds (short for "Internet rounds"), combined with IBM WebSphere® technology and based on the XHTML+Voice (X+V) multimodal markup language, is the latest advance in medical rounds for the intensive care unit. Physicians at Miami Children's Hospital are currently using the technology, which provides the entire cardiac team with real-time pediatric ICU records.

We have had great success in the embedded area. You have probably seen in the press our award-winning accomplishments in driving IBM speech capabilities into areas such as automotive telematics solutions. IBM Embedded ViaVoice powers the Honda and Acura navigation systems, and four of the six top J.D. Power-rated voice navigation systems were powered by our embedded speech capabilities. We also provide the embedded speech inside GM's OnStar offering - not in all models today, but soon in all models.

The opportunity for embedded speech exists because of the limitations of the devices - the lack of a keyboard, the need to interact safely while driving, and other situations where people simply can't key in information. We have seen a tremendous level of success in the embedded speech area with the uptake of our capabilities.

JK: IBM has a long history in speech. On the embedded side, what are you seeing that is different now from four to five years ago?
BM: First of all, the speech technology itself has reached a point of capability and maturity where it works quite well. This is the result of a number of factors, including the acoustic models, the understanding of how to build acoustic models for particular environments, the capabilities of the speech engine itself, the algorithms, and the amount of speech data collected to tune those algorithms. These elements have improved to the point where speech recognition is pretty good and, as a result, provides a more attractive, pleasing experience for the user.

The second aspect is the sheer capability of the devices these engines run on. Computing power has reached a point, not just for speech but in general, where we have a much more robust set of capabilities. This allows the speech engine to execute that much better on the device, and it allows greater amounts of data, much larger grammars, and much more information. The engine thus has a larger vocabulary. It also means that the applications and business logic you download to and execute on the device in a disconnected fashion can be significantly richer as well. The richer the capabilities of the application I can execute on the device, the more important it is that I can use speech as an interface to those processes or applications.

The third aspect is that the application programming environment and tools have made speech cheaper and easier to use in an application. The use of embedded APIs for applications such as car navigation has created powerful speech applications that solve age-old problems such as programming a navigation system. The use of markup languages such as XHTML+Voice for multimodal applications makes it easy to add speech to existing Web applications. And lastly, the use of disconnectable Web programming environments enables business transformation to reach into places where devices don't always have connectivity but still need to access and create data.
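As a rough illustration of the XHTML+Voice pattern described above, the sketch below attaches a voice dialog to an ordinary XHTML input field. This is a simplified sketch, not a spec-complete X+V document: the namespaces are the standard XHTML, VoiceXML, and XML Events ones, but the page content, grammar file, and field names are hypothetical, and the result-synchronization step is elided in a comment.

```xml
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:vxml="http://www.w3.org/2001/vxml"
      xmlns:ev="http://www.w3.org/2001/xml-events">
  <head>
    <title>City lookup</title>
    <!-- Voice dialog, reused by the visual form below -->
    <vxml:form id="say_city">
      <vxml:field name="city">
        <vxml:prompt>Say a city name.</vxml:prompt>
        <vxml:grammar src="cities.grxml" type="application/srgs+xml"/>
        <!-- A filled block would copy the recognized value into the
             visual field; the exact mechanism is elided here -->
      </vxml:field>
    </vxml:form>
  </head>
  <body>
    <!-- Focusing the field (by tab or tap) activates the voice dialog,
         so the user can either speak or type -->
    <input type="text" id="cityField" ev:event="focus" ev:handler="#say_city"/>
  </body>
</html>
```

The appeal is that the visual page stays intact; voice is layered on with additional markup rather than a separate application.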

So what has improved over the last four to five years is the voice technology, the speech programming models, and the power of the devices themselves - the capabilities, the memory, the whole bit.

JK - As a leader in the space, what do you believe the industry, as a whole, should be doing to make speech a little more pervasive than where we are right now?
BM: The number one issue is the programming model and making it much easier and cheaper for developers to create high-quality speech applications.  How many developers do you think there are in the world who can write a speech application?

JK:  I think that it is probably in the low thousands and that may be a little bit generous.
BM: I agree and to me that is the number one bottleneck of adoption of speech applications in the marketplace. If there are only a few thousand uniquely specialized programmers out there, every time you want to write or retool one application you have to go find one of them to write the application so that it works well enough that users are happy with it.

There are a couple of things we are doing to try to improve upon that. One is that, while I think VoiceXML did a great job, it left some gaps in the specification that people filled by, for example, creating their own VoiceXML runtime environments, which has fragmented the industry a little. What we are doing is working with our partners to promote Java as the application development and runtime environment, with VoiceXML as the standard. There can still be variations of tooling that appeal to different users, but it all logically integrates with the Eclipse tooling framework.

There are over three million Java developers out there, many of them developing on the WebSphere platform, and if we can enable these programmers to perform just 10 percent of the coding effort required to build a speech application, we can significantly grow the speech developer population. IBM's efforts in working with the tooling partners, driving the reusable dialog components initiative, contributing code to Apache.org, and contributing tag libraries to Eclipse.org: these are ways we are trying to give the J2EE development community greater capabilities to do some of the programming work, so that the skilled VUI programmers can concentrate on building world-class VUIs. I think this will be the single biggest enabler of broader adoption of speech in the marketplace.
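The RDC idea can be sketched as a JSP page: a J2EE developer composes prebuilt dialog components as tags, and the tag library generates the VoiceXML, prompts, grammars, and error recovery behind them. The taglib URI, tag names, and attributes below are illustrative assumptions, not the exact vocabulary of the released RDC taglib.

```jsp
<%-- Illustrative sketch only: tag names and attributes are assumed,
     not the exact RDC taglib vocabulary. --%>
<%@ taglib prefix="rdc" uri="http://jakarta.apache.org/taglibs/rdc-1.0" %>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <!-- Each component encapsulates its own prompts, grammars,
         confirmation, and error handling -->
    <rdc:date id="travelDate"/>
    <rdc:creditcardNumber id="card" confirm="true"/>
  </form>
</vxml>
```

The division of labor is the point: a Web developer wires components together in a familiar JSP environment, while the VUI expertise lives inside the components themselves.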

We've also been addressing the programming model for embedded speech applications.  WebSphere Everyplace Multimodal Environment brings the power of Web programming to embedded devices.  Now, Web application developers can quickly and cheaply speech enable their GUI Web applications.  Reducing both the cost and development time of using speech in the user interface is a significant change to make speech more affordable and therefore more pervasive.

JK - So do you believe that the tools are in place to have significantly more than the existing few thousand speech application developers?

BM: I believe we have made good progress in this regard. We introduced reusable dialog components, have made open source contributions to Apache.org and Eclipse.org, a number of our partners have aligned their tooling plans to adopt J2EE as the runtime environment and better integrate with Eclipse, and we are building and deploying applications based on RDCs. In terms of "have we moved 10 percent of the work over to the Web development community?" - I don't think we are there yet, but we are making steady progress. I think you will see, over the next 12 months in particular, more of the lower-level programming work moving over to the Web developers.

JK: Are there last comments or any other thoughts that you would like to leave us with, Bruce?
BM: One final point. IBM, unlike some of our competitors, doesn't view speech as a point solution. Speech is one of a number of important interfaces.  The contact center is the primary - and sometimes the only - way a company is visible to its customers and these customers are increasingly sophisticated and demanding. They expect consistency and continuity of service across all channels and you can't deliver that by approaching speech as a point solution. You need to look at speech in the context of a broader multi-channel approach of Voice, Web and Multimodal if you are going to deliver real value to your clients and that's what IBM is doing.   


Bruce Morse is vice president of Contact Center Solutions at IBM, with responsibility for establishing IBM as a major software provider for developing, deploying, and managing contact center solutions. His responsibilities include directing software product development, marketing, sales, and partnering activities in pursuit of this objective.
