May 24, 2004
By Bill Scholz President - NewSpeech LLC
Features

Easing Speech Application Development With Tools

Novice speech application developers are often discouraged by the complexity of the development task. Their apprehension is reinforced by horror stories about the difficulties of grammar construction, mediocre dialog designs, the complexity of integrating business processes with the voice user interface (VUI) and the need for extensive tuning before an application performs well. Recently, however, the availability of speech application development tools has significantly reduced both the complexity of the development process and the time required to complete a development task.

Constructing a quality speech application requires the following steps:

1. VUI Design

2. VUI Development (component reuse, multi-user collaboration)

3. Prompt Creation

4. Grammar Generation

5. Back-end Integration

6. Packaging and Installation

7. Post-deployment Evaluation and Tuning

Figure 1. Steps in the Development of a VUI Application

There are more than a dozen speech application tools. Here are four specific tools and descriptions of how each addresses specific development steps.

Audium Builder¹

The Audium Builder provides a graphical user interface that permits users to create and manage multiple applications. An application is constructed by interconnecting elements selected from the Audium Elements Library. Audium Elements include general functions for managing audio, currency, numbers and digits, database, option menu, date and time, phone and transfer and yes/no, as well as specific activities such as credit card, e-mail, Social Security number and ZIP code processing. A developer initiates the process by selecting global application settings and a target voice gateway. The call flow is then built by dragging Audium Elements to the workspace. As elements are added to a project, their properties can be configured to load pre-recorded audio or text-to-speech prompts and to play them naturally to callers. Using the GUI, the developer interconnects elements by assigning their "exit states" so that the call flow reaches its end goal. The finished application can be validated and published to any Java application server.
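The exit-state idea can be illustrated with a small sketch. This is not Audium's API; it only assumes that each element returns a named exit state and that a wiring table maps each (element, exit state) pair to the next element. The element names, the `run_call_flow` helper and the scripted caller input are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# An element receives the shared session data and returns an exit state name.
Element = Callable[[dict], str]


def main_menu(session: dict) -> str:
    """Hypothetical option-menu element: the exit state is the caller's choice."""
    choice = session["inputs"].pop(0) if session["inputs"] else ""
    return choice if choice in ("balance", "agent") else "no_match"


def say_balance(session: dict) -> str:
    """Hypothetical audio element: queue a prompt, then finish."""
    session["prompts"].append("Your balance is on its way.")
    return "done"


def transfer(session: dict) -> str:
    """Hypothetical transfer element."""
    session["prompts"].append("Transferring you to an agent.")
    return "done"


ELEMENTS: Dict[str, Element] = {
    "main_menu": main_menu,
    "say_balance": say_balance,
    "transfer": transfer,
}

# Wiring table: (element, exit state) -> next element; None ends the call.
EXITS: Dict[Tuple[str, str], Optional[str]] = {
    ("main_menu", "balance"): "say_balance",
    ("main_menu", "agent"): "transfer",
    ("main_menu", "no_match"): "main_menu",
    ("say_balance", "done"): None,
    ("transfer", "done"): None,
}


def run_call_flow(start: str, inputs: List[str], max_steps: int = 20) -> dict:
    """Walk the call flow, following exit states until the flow ends."""
    session = {"inputs": list(inputs), "prompts": [], "path": []}
    current: Optional[str] = start
    while current is not None and len(session["path"]) < max_steps:
        session["path"].append(current)
        exit_state = ELEMENTS[current](session)
        current = EXITS[(current, exit_state)]
    return session
```

Calling `run_call_flow("main_menu", ["mumble", "balance"])` visits the menu twice (the first utterance exits through `no_match`) and then the balance element. The `max_steps` guard is an assumption added so that a caller who never gives a valid answer cannot loop forever.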

The Audium Builder addresses the following steps: 2. VUI Development, 3. Prompt Creation and 6. Packaging and Installation, using a smooth, intuitive user interface.

VoiceObjects Factory²

Like Audium Builder, the VoiceObjects tool suite provides a GUI that uses standard visualization techniques and a tree structure for dialog design and representation. Designers define their dialog flow and configure it to the required specifications using point-and-click authoring. The tool uses a concept called "layering," which includes system layers for event handling (no match, no input, help, etc.) and user-built layers for dialog properties, persona and other personalization. One click packages an application and deploys it on a Web server. Back-end integration is facilitated by a "connector" architecture that supports both server-side scripting and J2EE code execution.
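One way to picture layering is as a stack of lookup tables in which a user-built layer overrides the system layer wherever both define the same setting. The sketch below rests on that assumption; VoiceObjects' actual precedence rules and property names are not described here, so the layer contents are hypothetical.

```python
from collections import ChainMap

# System layer: default handling for common events.
SYSTEM_LAYER = {
    "no_match": "Sorry, I didn't understand that.",
    "no_input": "Sorry, I didn't hear anything.",
    "help": "You can say a menu option at any time.",
}

# User-built layer: a persona that overrides one default and adds a greeting.
FRIENDLY_PERSONA = {
    "no_match": "Hmm, I missed that. Could you say it again?",
    "greeting": "Hi there! How can I help?",
}

# ChainMap searches its maps left to right, so the persona layer wins
# whenever both layers define the same key.
dialog = ChainMap(FRIENDLY_PERSONA, SYSTEM_LAYER)


def prompt_for(event: str) -> str:
    """Return the prompt for an event, falling through the layers in order."""
    return dialog[event]
```

Here `prompt_for("no_match")` comes from the persona layer, while `prompt_for("no_input")` falls through to the system layer.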

The tool is Web-browser-based and includes integrated diagnostic capabilities as well as online reporting. It uses object-oriented concepts that allow the designer to focus on assembling an optimal dialog design. The VoiceObjects are grouped into four categories: components (menu, sequence, list, summary); resources (audio, plug-in, resource locator); logic (variable, assign, if, case, loop); and action (hyperlink, call transfer, record or exit). The tool's architecture carefully separates the presentation layer from the business logic layer and from any speech engine dependencies. Web services are tightly integrated, and previously-developed VoiceXML code segments can be easily included. Server-side capabilities include speech platform drivers which ease the integration with many target voice platforms. Developers write their own grammars using a comma-delimited syntax.

VoiceObjects' tools smoothly manage the following steps: 2. VUI Development, 3. Prompt Creation, 5. Back-end Integration and 6. Packaging and Installation.

Natural Language Speech Assistant³

The Unisys NLSA is a GUI-based design tool that enables developers to define an abstract dialog or "conversation flow" between a computer system and a user. A design can be tested in its entirety, using a built-in simulator which performs iterative testing as a new project is created. And, as each component is developed, it can be tested immediately using various testing features built into the tool. Developers can deploy a completed application to any J2EE-compliant Web application container serving documents to a VoiceXML browser. A unique characteristic of the NLSA is its reliance on a state-based dialog model rather than a programming-element-based model, making the tool suitable for speech application specialists rather than limiting its use to programmers.

Back-end integration is done by completing methods in classes that reflect the state-based architecture. Optionally, developers can use a closely tied third-party tool⁴ that incorporates presentation-level integration, a noninvasive integration technology that turns existing applications into reusable components at the presentation layer. These components are integrated into the speech application through a bi-directional XHTML API.

The NLSA includes a patented grammar construction tool that uses the familiar spreadsheet metaphor to identify the words, phrases and tokens that make up an anticipated response. The tool generates SRGS and other grammar formats, supporting all major vendors.
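To make the spreadsheet-to-grammar idea concrete, the following sketch converts rows of phrases and tokens into a one-rule SRGS XML grammar. It is not the NLSA's algorithm or file format: the two-column `phrase,token` layout, the `rows_to_srgs` helper and the use of the `semantics/1.0-literals` tag format to carry the token are assumptions made for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def rows_to_srgs(csv_text: str, rule_id: str = "answer", lang: str = "en-US") -> str:
    """Turn 'phrase,token' rows into a one-rule SRGS XML grammar string."""
    ET.register_namespace("", SRGS_NS)  # serialize SRGS as the default namespace
    grammar = ET.Element(
        f"{{{SRGS_NS}}}grammar",
        {
            "version": "1.0",
            "root": rule_id,
            "mode": "voice",
            "tag-format": "semantics/1.0-literals",
            f"{{{XML_NS}}}lang": lang,
        },
    )
    rule = ET.SubElement(
        grammar, f"{{{SRGS_NS}}}rule", {"id": rule_id, "scope": "public"}
    )
    one_of = ET.SubElement(rule, f"{{{SRGS_NS}}}one-of")
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Each spreadsheet row becomes one alternative: the phrase a caller
        # might say, tagged with the token the application receives.
        item = ET.SubElement(one_of, f"{{{SRGS_NS}}}item")
        item.text = row["phrase"].strip()
        tag = ET.SubElement(item, f"{{{SRGS_NS}}}tag")
        tag.text = row["token"].strip()
    return ET.tostring(grammar, encoding="unicode")


# Hypothetical sheet: several phrasings map onto two tokens.
SHEET = """phrase,token
yes,YES
yeah,YES
no,NO
no thanks,NO
"""

grammar_xml = rows_to_srgs(SHEET)
```

The point of the design is that the person filling in the sheet only lists what callers might say and what each phrase means; the grammar syntax is generated.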

The NLSA manages these steps: 1. VUI Design, 2. VUI Development, 3. Prompt Creation, 4. Grammar Generation, 5. Back-end Integration and 6. Packaging and Installation.

OpenSpeech Insight⁵

ScanSoft's OpenSpeech Insight is a tuning tool representative of similar tools from TuVox, LumenVox and Nuance that perform reporting and analysis of speech application performance. Using OSI, developers review application performance data in an easy-to-understand graphical form, and when trouble spots such as "hang ups" or frequent re-prompts are identified, they can be further analyzed and addressed quickly. Detailed reports provide important information about peak usage times, call duration, problem areas, transaction completion rates and much more. By analyzing these reports, customers can track the efficiency of the user interface and can continually improve the caller experience.
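The kind of analysis such a tuning tool performs can be sketched in a few lines. The example below does not reflect OSI's data model or report formats; the `CallRecord` fields and the `summarize` helper are hypothetical, chosen only to show how completion rates, hang-ups and re-prompts might be aggregated from call logs.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CallRecord:
    """Hypothetical per-call log entry."""
    call_id: str
    last_state: str          # dialog state in which the call ended
    completed: bool          # True if the transaction finished
    hung_up: bool            # True if the caller hung up mid-dialog
    duration_s: float
    reprompts: Dict[str, int] = field(default_factory=dict)  # per dialog state


def summarize(calls: List[CallRecord]) -> dict:
    """Aggregate call logs into a few tuning-oriented metrics."""
    total = len(calls)
    if total == 0:
        return {"completion_rate": 0.0, "avg_duration_s": 0.0,
                "worst_reprompt_states": [], "hangups_by_state": {}}
    reprompts: Counter = Counter()
    hangups: Counter = Counter()
    for call in calls:
        reprompts.update(call.reprompts)      # add up re-prompts per state
        if call.hung_up:
            hangups[call.last_state] += 1     # where callers gave up
    return {
        "completion_rate": sum(c.completed for c in calls) / total,
        "avg_duration_s": sum(c.duration_s for c in calls) / total,
        "worst_reprompt_states": reprompts.most_common(3),
        "hangups_by_state": dict(hangups),
    }


CALLS = [
    CallRecord("c1", "goodbye", True, False, 60.0, {"get_account": 1}),
    CallRecord("c2", "goodbye", True, False, 90.0),
    CallRecord("c3", "confirm", False, True, 30.0, {"get_account": 2, "confirm": 1}),
    CallRecord("c4", "get_account", False, True, 20.0, {"get_account": 1}),
]

report = summarize(CALLS)
```

In this made-up sample, the `get_account` state accumulates the most re-prompts, which is the sort of trouble spot a developer would examine first.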

Of the four tools described here, OSI is the only one that addresses this step: 7. Post-deployment Evaluation and Tuning.

Conclusion

The speech industry has matured to the point where commercial grade, field-proven tools are available to address and simplify every step of the application development process. Some developers will be most comfortable with Audium's element model; programmers with object-oriented training are likely to favor working with VoiceObjects; and dialog specialists may prefer the NLSA's state-oriented dialog model. Such tools contribute significantly to helping the developer deal with the complexities of speech application development.

Dr. K. W. 'Bill' Scholz is a co-founder of the Speech and Natural Language business initiative at Unisys Corporation and a member of the board of directors of AVIOS. He has written a detailed description of the speech application development process as a two-part article in the previous and current issues of Speech Technology Magazine.

¹Audium, New York, NY, www.audiumcorp.com

²VoiceObjects AG, Bergisch Gladbach, Germany, www.voiceobjects.com

³Unisys Corporation, Blue Bell, PA, www.unisys.com/communications/solutions

⁴Clickmarks Studio, Clickmarks, Fremont, CA, www.clickmarks.com

⁵ScanSoft Inc., Peabody, MA, www.scansoft.com/openspeech/insight
