Speech and J2EE - A Foundation for More Creative Dialog Design

Creating speech applications in any environment immediately brings dialog design to mind. You start thinking about how to interact with callers through speech recognition, and how to play back recorded prompts or use text-to-speech (TTS). However, the way you actually drive a dialog does not begin and end with audio components. IT managers now readily draw comparisons between HTML and VoiceXML, and they expect speech applications to integrate seamlessly with existing IT resources to drive more "reactive" dialogs.

Sure, speech software, persona design, and call flow are critical. Increasingly, however, the foundation for more interactive and distinctive speech applications is the ability to integrate speech systems easily with back-end data, retrieving information dynamically to feed the dialog. Integration with back-end systems can become the "secret sauce" that makes speech applications more interactive and conversational.

VoiceXML has created a strong speech parallel to the World Wide Web: it has emerged as the de facto language for delivering speech content, just as HTML is the de facto language for delivering Web content. While the differences between voice and graphical user interfaces are vitally important, both channels provide equivalent access to enterprise systems and data. The Java 2 Platform, Enterprise Edition (J2EE) is one of the key technologies that facilitate this enterprise system integration.

Stovepipe No More

Repurposing existing data for speech is often a new concept for IT managers whose expectations were shaped by the proprietary speech systems that were the only option in pre-VoiceXML days. Speech industry standards such as VoiceXML can take full advantage of J2EE, letting speech application designers more easily use data that already exists, over protocols that most organizations already consider standard. Integrating speech applications into a J2EE environment through JavaServer Pages (JSPs), servlets, and related components provides an abstraction layer between the speech application and the many agent desktop systems and data repositories found across an organization. As a result, speech can become a peer of the Web, chat, and other resources for any IT manager.

A Java foundation provides deployment flexibility for both hosted and premise-based speech infrastructure. IT managers generally don't take kindly to demands that they install a specific type of Web server to support a speech system, particularly if they are a BEA or IBM shop or are committed to another Web server. The J2EE underpinnings of today's VoiceXML platforms help reduce IT "turf battles" when speech is introduced, either by using the existing Web server environment or by adding servers that match the corporate standard exactly.

Flexible deployment of VoiceXML application servers is just one example of the significant impact J2EE infrastructure has on speech in the enterprise. The concept of hosting has been reshaped by an increasingly popular deployment model in which the speech application servers are located inside the enterprise while the hosting provider's telephony network and speech resources are still used. Hosting providers have therefore had to support a number of Web server brands as customers and partners seek to build their own speech applications, or simply to keep control of the speech application while relying on a hosted infrastructure. This hybrid model is made feasible by the combined benefits of J2EE and VoiceXML.

Terms associated with alternative deployment models, such as "flexibility" and "remote application servers," conjure up the specter of security vulnerabilities. There are certainly a number of security issues to consider whether speech is deployed as a hosted solution or on premises. A Java foundation for VoiceXML-based speech systems offers a more seamless programming resource for incorporating security technology from a variety of vendors. For example, some vendors use the Java Secure Socket Extension (JSSE) to secure communication among the many kinds of servers that speech solutions frequently require. JSSE is particularly useful for encrypting speech data exchanged between Windows and UNIX systems, because the same Java code runs on both.
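
To make the JSSE point concrete, here is a minimal Java sketch. It assumes the JVM's default key and trust stores are already configured on each server. It uses only the standard javax.net.ssl API to obtain a TLS socket factory. The SecureChannel class name is hypothetical, and a production deployment would also need certificates and protocol settings appropriate to the site.

```java
import java.security.GeneralSecurityException;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocketFactory;

/** Hypothetical helper that builds a TLS socket factory through JSSE. */
final class SecureChannel {
    private SecureChannel() {}

    static SSLSocketFactory tlsSocketFactory() {
        try {
            // Ask the default JSSE provider for its TLS implementation.
            SSLContext context = SSLContext.getInstance("TLS");
            // null arguments mean: use the JVM's default key managers,
            // default trust store, and default SecureRandom.
            context.init(null, null, null);
            return context.getSocketFactory();
        } catch (GeneralSecurityException e) {
            // NoSuchAlgorithmException and KeyManagementException both extend this type.
            throw new IllegalStateException("TLS is not available in this JVM", e);
        }
    }
}
```

A component would then open an encrypted connection with factory.createSocket(host, port), where host is the (hypothetical) address of the remote application server. Nothing in the sketch is platform-specific, so the same class runs unchanged on Windows and UNIX JVMs.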

Beyond the Speech Browser

J2EE also provides great value by letting speech application developers both integrate with enterprise systems and package that integration code as reusable components. Packaging integration elements this way shortens time-to-market for speech solutions. It also extends the packaging model of the past, which covered only dialog elements such as prompts, grammars, and call flows. If you are looking for signs of a maturing speech market, this is certainly one of them.

J2EE promotes the concept of "containers," which lets speech vendors integrate more easily within the IT framework. Web server, database, and network integration thus become a matter of leveraging existing resources. For example, speech platforms use standard Web (HTTP) requests to retrieve VoiceXML application code, along with instructions about where to fetch the required prompts and grammars. J2EE reduces multi-vendor integration with BEA, IBM, Sun, and other Web servers to having the VoiceXML browser request a JSP.
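
In this exchange, the JSP's job is simply to return a VoiceXML document over HTTP. To keep the sketch self-contained, the rendering step is written as a plain Java method rather than a JSP or servlet. The VoiceXmlPage class, the prompt text, and the grammar URI are hypothetical. Escaping matters here because the prompt text may come from enterprise data.

```java
/** Hypothetical renderer for a minimal VoiceXML 2.0 page, the kind of output a JSP would return. */
final class VoiceXmlPage {
    private VoiceXmlPage() {}

    /** Escapes the five XML special characters so dynamic data can be embedded safely. */
    static String escapeXml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    /** One-field form: speak a greeting, then listen with a grammar the browser fetches from grammarUri. */
    static String render(String greeting, String grammarUri) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<vxml version=\"2.0\" xmlns=\"http://www.w3.org/2001/vxml\">\n"
            + "  <form id=\"main\">\n"
            + "    <field name=\"choice\">\n"
            + "      <prompt>" + escapeXml(greeting) + "</prompt>\n"
            + "      <grammar src=\"" + escapeXml(grammarUri) + "\" type=\"application/srgs+xml\"/>\n"
            + "    </field>\n"
            + "  </form>\n"
            + "</vxml>\n";
    }
}
```

In a servlet, doGet would set the content type (for example application/voicexml+xml) and write the result of render to the response. A JSP achieves the same thing with markup plus embedded expressions.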

Web services have become one of the most common means of integrating with databases from a variety of vendors, and XML is a common foundation for repurposing existing data for speech. The issue is not simply the efficiency gained by reusing existing data. Perhaps more important is the ability to make speech applications more intelligent, in some cases creating the perception of personalization for each caller. For example, applications are now being built for the broadband market that instantly recognize what equipment each customer has and the customer's billing status. The call flow can then immediately treat the call as a billing issue or guide the caller through a technical support dialog.
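
Here is a sketch of the broadband example. It assumes the customer record has already been retrieved; a Map stands in for the Web service call. The CustomerRecord fields and the dialog names are hypothetical. The point is that a few lines of data-driven logic choose the opening dialog instead of a one-size-fits-all menu.

```java
import java.util.Map;

/** Hypothetical customer data returned by a back-end lookup keyed by caller ID. */
record CustomerRecord(String accountId, String equipment, boolean pastDue) {}

/** Chooses the first dialog for a caller based on retrieved account data. */
final class CallFlowRouter {
    // Stands in for a Web service or database query in this sketch.
    private final Map<String, CustomerRecord> byCallerId;

    CallFlowRouter(Map<String, CustomerRecord> byCallerId) {
        this.byCallerId = byCallerId;
    }

    /** Returns the name of the dialog (for example, a VoiceXML document) to start with. */
    String firstDialog(String callerId) {
        CustomerRecord rec = byCallerId.get(callerId);
        if (rec == null) {
            return "identify-caller";                  // unknown caller: ask who they are
        }
        if (rec.pastDue()) {
            return "billing";                          // billing issue takes priority
        }
        return "tech-support-" + rec.equipment();      // tailor support to the caller's equipment
    }
}
```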

The J2EE Advantage for Application Development

VoiceXML-based speech applications provide an overall framework not only for integrating with back-end systems but also for linking the speech application to agents and to systems that provide intelligent call routing and screen pops. Pre-packaged VoiceXML application components with a Java foundation are not new. In fact, legacy SpeechWorks Dialog Modules and Nuance SpeechObjects have been available for years, speeding application development with proven call flows and grammars. IBM has also contributed similar packaged elements to the open source community with its Reusable Dialog Components (RDCs), which Web developers can embed in JSPs to generate VoiceXML code. These elements, however, have generally focused on dialog. The Java foundation associated with VoiceXML also allows organizations to extend packaging to include links to existing corporate resources, making it easier to code database queries, CTI integration, and call transfers.

As an example, "ExpressWare" is the name Convergys gives to a library of VoiceXML and Java classes that includes both dialog and non-dialog components. Examples include application-to-application transfer using a DNIS (Dialed Number Identification Service) number, Computer Telephony Integration (CTI), and access to data stored locally in a relational database management system (RDBMS). More importantly, they also include data accessed remotely through integration with clients' data systems. Embedded in each component is the Java code that makes multi-vendor integration a matter of setting a configuration parameter rather than a custom coding effort.
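
The article does not describe ExpressWare's internals. What follows is only a generic Java sketch of what "integration as a configuration parameter" can look like: components code against one interface, and a property names the vendor adapter to load. The interface, the adapter classes, and the customer.data.vendor property key are all hypothetical.

```java
import java.util.Map;
import java.util.Properties;
import java.util.function.Supplier;

/** Hypothetical interface that every back-end integration adapter implements. */
interface CustomerDataSource {
    String fetchAccount(String accountId);
}

/** Hypothetical adapter for vendor A; a real one would query that vendor's database or service. */
final class VendorADataSource implements CustomerDataSource {
    @Override
    public String fetchAccount(String accountId) {
        return "vendorA:" + accountId;
    }
}

/** Hypothetical adapter for vendor B. */
final class VendorBDataSource implements CustomerDataSource {
    @Override
    public String fetchAccount(String accountId) {
        return "vendorB:" + accountId;
    }
}

/** Picks the adapter named in configuration, so switching vendors requires no code change. */
final class DataSourceFactory {
    private static final Map<String, Supplier<CustomerDataSource>> ADAPTERS = Map.of(
        "vendorA", VendorADataSource::new,
        "vendorB", VendorBDataSource::new);

    private DataSourceFactory() {}

    static CustomerDataSource fromConfig(Properties config) {
        // Default to vendor A when the property is absent.
        String vendor = config.getProperty("customer.data.vendor", "vendorA");
        Supplier<CustomerDataSource> supplier = ADAPTERS.get(vendor);
        if (supplier == null) {
            throw new IllegalArgumentException("Unknown data vendor: " + vendor);
        }
        return supplier.get();
    }
}
```

In practice the Properties would be loaded from a deployment file. Moving a client from one back-end vendor to another then means editing configuration, not the dialog code.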

Speech application and tool vendors can certainly offer more sophisticated combinations of dialog and integration elements. One example would be an element that encapsulates both the dialog and the integration for flight information. A flight selection element could recognize the caller's departure and arrival cities and travel dates, then look up matching flights in an airline's data system. The results could then be organized and presented as audio. Additional code could let the caller navigate the list, creating the perception that the call flow was personalized.
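
The list-navigation part of the flight selection element can be sketched as a small state machine. The sketch assumes recognition has already produced a command word and the airline lookup has already returned a list. The Flight record, the command words, and the prompt wording are hypothetical; the returned strings would be spoken by TTS.

```java
import java.util.List;

/** Hypothetical flight returned by an airline data lookup. */
record Flight(String number, String departs, String arrives) {}

/** Lets a caller step through lookup results with spoken commands. */
final class FlightListNavigator {
    private final List<Flight> flights;
    private int index = 0;

    FlightListNavigator(List<Flight> flights) {
        if (flights.isEmpty()) {
            throw new IllegalArgumentException("no flights to present");
        }
        this.flights = List.copyOf(flights);
    }

    /** Text for a TTS prompt describing the current flight. */
    String currentPrompt() {
        Flight f = flights.get(index);
        return "Flight " + f.number() + ", departing " + f.departs()
            + ", arriving " + f.arrives() + ". Say next, previous, or repeat.";
    }

    /** Applies a recognized command and returns the prompt to play next. */
    String handle(String command) {
        switch (command) {
            case "next":
                if (index == flights.size() - 1) {
                    return "That was the last flight. " + currentPrompt();
                }
                index++;
                break;
            case "previous":
                if (index == 0) {
                    return "That was the first flight. " + currentPrompt();
                }
                index--;
                break;
            case "repeat":
                break;                       // replay the current flight
            default:
                return "Sorry, I didn't get that. " + currentPrompt();
        }
        return currentPrompt();
    }
}
```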

Attention frequently defaults to complete automation, in which an application such as the flight information example above completes the call using speech recognition alone. However, there are many situations where callers engaged in a speech dialog benefit from agent assistance. Packaged VoiceXML components built on a J2EE foundation are making this hand-off less time-consuming to code and test.

Transferring a caller from speech automation to an agent falls loosely into the category of cross-channel integration. This is a hot subject in contact center circles as organizations struggle to cope with an increasing array of access demands from their customers. Many are exploring a Service-Oriented Architecture (SOA), hoping to take advantage of a more modular architecture. In such an architecture, J2EE could support reusing or repurposing code rather than customizing it for every access channel.

Integration for transferring callers from speech systems to agents has been a reality for a long time. However, the number and variety of call routing and agent desktop systems have multiplied. One way to deal with this is to package, as VoiceXML components, a layer of abstraction between the speech platform and common CTI vendor systems, with a Java foundation handling the systems integration. A speech vendor could thus give application developers a proven means of linking a caller to an agent. That link would pass along the DNIS, utterances gathered from the caller, and other details, so the caller does not have to repeat any information.
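
Here is a minimal sketch of such an abstraction layer. It assumes the speech application collects a vendor-neutral CallContext, and that each CTI system gets its own adapter behind a common CtiAdapter interface. No real CTI vendor API is used. KeyValueCtiAdapter is a hypothetical stand-in that only builds and records the payload a real adapter would send along with the transfer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Vendor-neutral data gathered by the speech application before transfer. */
final class CallContext {
    private final String dnis;
    private final List<String> utterances = new ArrayList<>();
    private final Map<String, String> attributes = new LinkedHashMap<>(); // keeps insertion order

    CallContext(String dnis) { this.dnis = dnis; }

    String dnis() { return dnis; }
    void addUtterance(String text) { utterances.add(text); }
    List<String> utterances() { return Collections.unmodifiableList(utterances); }
    void put(String key, String value) { attributes.put(key, value); }
    Map<String, String> attributes() { return Collections.unmodifiableMap(attributes); }
}

/** One adapter per CTI vendor; the speech application sees only this interface. */
interface CtiAdapter {
    /** Transfers the call, attaching data for the agent's screen pop; returns a transfer id. */
    String transfer(CallContext context, String agentQueue);
}

/** Hypothetical adapter that flattens the context into a key=value string. */
final class KeyValueCtiAdapter implements CtiAdapter {
    final List<String> sentPayloads = new ArrayList<>(); // what would have been sent, kept for inspection

    @Override
    public String transfer(CallContext context, String agentQueue) {
        StringBuilder payload = new StringBuilder("queue=" + agentQueue + ";dnis=" + context.dnis());
        payload.append(";utterances=").append(String.join("|", context.utterances()));
        context.attributes().forEach((k, v) -> payload.append(';').append(k).append('=').append(v));
        sentPayloads.add(payload.toString());
        return "T" + sentPayloads.size();
    }
}
```

With this design, supporting a new routing or desktop system means writing one more adapter. The VoiceXML dialogs, and the data they collect, stay unchanged.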


Java-based tool vendors are providing Web- and speech-oriented Integrated Development Environments (IDEs) to speed time-to-market and reduce costs. Eclipse, the open source platform originally developed by IBM, has become the foundation for some prominent VoiceXML tool vendors, delivering graphical speech development and run-time solutions. In some cases, developers can drag and drop pre-packaged speech dialog and systems integration elements to build and test speech applications quickly.


Application servers supporting the J2EE standard are proliferating across organizations, providing the intelligence behind all types of contact center and other corporate services. This opens a path toward more efficient use of existing corporate resources, potentially across channels. From a speech perspective, it gives vendors many options for personalizing applications through J2EE access to existing data. It also allows more efficient integration with agent and CRM resources. The bottom line is that the traditional barriers of costly custom integration and painful "one-off" development efforts will continue to erode. They will fade as VoiceXML-based speech integration crosses the chasm from interesting technology to an indispensable tool for customer service and cost savings.

Steve Chirokas is senior director of products and channels for Convergys Corporation's Customer Management Group (CMG). He is responsible for product management for the Speech Solutions Group, and for managing channel relationships with organizations that are partnering with Convergys to leverage the value of their speech applications within a hosted infrastructure. He reports to Bill Andrews, general manager of Speech Solutions for CMG. Chirokas holds a master's degree from Bentley College, a marketing degree from Stonehill College and a technical certification from Northeastern University.
