Speech Technology Magazine

 

Tools of the Trade

Accelerating Time-to-Market Speech Development Tools
By Peter Gavalakis - Posted Mar 6, 2005

In recent years, our industry has made great strides in lowering the barriers to speech deployment by providing tools and methodologies that simplify the more challenging areas of speech. Enterprises are now able to deploy speech applications more quickly and easily - and less expensively - than they could a few years ago.

Speech tools, the enterprise application development and testing environments for standards-based speech solutions (VoiceXML or SALT), are commonly referred to as integrated development environments (IDEs) or service creation environments (SCEs). Many of today's standards-based tools have their roots in proprietary IVR systems and have established track records in enterprise voice deployments. Others are more "Web-centric": they are tightly integrated with Web frameworks and development environments and may therefore be more appealing to Web developers.

Which tool is right for you? Choosing is not always easy. Organizations need to match a tool's capabilities with their application requirements, the skill sets of their developers, and their existing telecom and Web infrastructures.

Understanding the following areas will help you choose tools that accelerate your deployment:

  • Impact of Web-centric solution architecture on tool offerings
  • Application development and deployment simplification
  • Role of pre-built speech applications
  • Testing tools and iterative refinement
  • Guidelines for selecting a tool vendor

Web-Centric Solution Architecture Leverages Existing Infrastructure

Speech applications have historically had a proprietary solution architecture in which the voice user interface, application logic, media processing, and telephony interface were all integrated. Choosing an application development tool meant committing to a hardware platform for media processing and telephony connectivity.

The introduction of the VoiceXML and SALT speech markup languages changed this situation. They enable a Web-centric solution architecture that allows enterprises to share application and back-end infrastructure across both voice and Web applications. In this model, application logic runs on a standard Web server and is independent of the underlying telephony infrastructure (e.g., T1/E1 CAS, ISDN, analog, or VoIP).
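To make the Web-centric model concrete, here is a minimal Python sketch of application logic on a Web server that queries a back end and returns a VoiceXML 2.0 page instead of HTML. The `lookup_stores` function and its data are hypothetical stand-ins for a real back-end query; a production server would wrap this logic in a Web framework.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"
ET.register_namespace("", VXML_NS)  # emit VoiceXML as the default namespace


def _tag(name: str) -> str:
    """Qualify an element name with the VoiceXML namespace."""
    return f"{{{VXML_NS}}}{name}"


def lookup_stores(zip_code: str) -> list[str]:
    """Hypothetical back-end query; a Web page could call the same function."""
    fake_db = {"97124": ["100 Main Street", "25 Oak Avenue"]}
    return fake_db.get(zip_code, [])


def store_list_vxml(zip_code: str) -> str:
    """Render the back-end results as a VoiceXML 2.0 page instead of HTML."""
    root = ET.Element(_tag("vxml"), {"version": "2.0"})
    form = ET.SubElement(root, _tag("form"), {"id": "results"})
    block = ET.SubElement(form, _tag("block"))
    prompt = ET.SubElement(block, _tag("prompt"))
    stores = lookup_stores(zip_code)
    if stores:
        prompt.text = "The nearest stores are: " + "; ".join(stores) + "."
    else:
        prompt.text = "Sorry, no stores were found near that ZIP code."
    return ET.tostring(root, encoding="unicode")


print(store_list_vxml("97124"))
```

Because the voice gateway fetches this page over HTTP like any other Web resource, the same server, business logic, and back-end data can serve both the Web and voice channels.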

Decoupling the voice and application platforms means that today's tools look and behave much more like Web development tools (e.g., Microsoft FrontPage or Macromedia Dreamweaver) than like traditional voice toolkits. This has made voice technology more accessible to the Web development community.

The trend is continuing as voice capabilities become more tightly integrated into Web frameworks. Both Microsoft and IBM now offer voice capabilities integrated with their Web frameworks. In addition, many Java-based tool suppliers plug into the Eclipse framework (http://www.eclipse.org/) and can therefore leverage the complementary tools that also plug into it. (See Figure 1a & 1b)

Speech Application Development Considerations

Two common themes emerge in the speech tools available on the market today: extended use of the GUI and modular, reusable application components.

Extended Use of the GUI: Graphical abstraction enables developers to do their jobs more quickly and easily. In the early days of speech tools, GUIs were used primarily to define call flows and dialogue states. Today they are used to build and manage more complex dialogues, prompts, and grammars, as well as in debuggers, simulators, and tuners. (See Figure 2)

Modular, Reusable Application Components: Separating voice applications into modules simplifies development and debugging. Software modules also allow the reuse of generic components across applications. Examples include grammars, audio files, subdialogues, and even "generic" applications that are designed to be customized. For example, Angel.com allows developers to fill in a form to specify parameters for a wide range of frequently used applications.

Today more tools are adding support for modular application components and may even offer libraries of reusable audio files, grammars, and sub-dialogues. (Caution: Legal restrictions may prevent you from reusing components on other platforms.)
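The sketch below illustrates a reusable component under simple assumptions: `confirm_field` is a hypothetical helper that emits a VoiceXML yes/no field using the standard built-in `boolean` field type, and two hypothetical applications reuse it. The `company` and `system` settings stand in for the kind of values a developer might enter in a form. The VoiceXML namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET


def confirm_field(name: str, question: str) -> ET.Element:
    """Reusable yes/no component built on VoiceXML's built-in boolean field type."""
    field = ET.Element("field", {"name": name, "type": "boolean"})
    ET.SubElement(field, "prompt").text = question
    return field


def password_reset_form(params: dict) -> ET.Element:
    """Hypothetical 'generic' application customized from form-style parameters."""
    form = ET.Element("form", {"id": "password_reset"})
    intro = ET.SubElement(form, "block")
    ET.SubElement(intro, "prompt").text = f"Welcome to the {params['company']} password reset line."
    form.append(confirm_field("proceed", f"Do you want to reset your {params['system']} password?"))
    return form


def store_locator_form(params: dict) -> ET.Element:
    """A second hypothetical application that reuses the same component."""
    form = ET.Element("form", {"id": "store_locator"})
    form.append(confirm_field("use_caller_area",
                              f"Shall I look for {params['company']} stores near you?"))
    return form


settings = {"company": "Example Corp", "system": "e-mail"}
print(ET.tostring(password_reset_form(settings), encoding="unicode"))
```

Fixing or improving `confirm_field` once improves every application that uses it, which is the main payoff of a component library.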

Application Design and Development

Dialogue Design: Dialogue design tools visually represent how a call will flow, including the various dialogue states and conditions for handling errors. Most tools also include pre-built dialogue components for common tasks. (See Figure 3)

For relatively simple applications, such as a retail store locator or password reset, a very sophisticated dialogue design tool is usually not needed. More complex applications - for example, those employing mixed-initiative dialogues - require more sophisticated dialogue design for error handling, disambiguation, and other functions.
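The dialogue states and error-handling conditions that a dialogue design tool draws graphically can be thought of as a state machine. This Python sketch assumes a hypothetical call flow in which any unrecognized or missing input counts against a retry limit, after which the caller is routed to an agent; real platforms distinguish nomatch from noinput events and offer richer escalation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class State:
    prompt: str
    transitions: dict[str, str]  # recognized value -> next state
    max_retries: int = 2         # re-prompts allowed before escalating
    on_failure: str = "agent"    # where to go when retries run out


# Hypothetical call flow
CALL_FLOW = {
    "main": State("Say sales, support, or store hours.",
                  {"sales": "sales", "support": "support", "store hours": "hours"}),
    "support": State("Is this about billing or a technical problem?",
                     {"billing": "billing", "technical": "tech"}),
}
TERMINAL = {"sales", "hours", "billing", "tech", "agent"}


def run_call(flow: dict[str, State], start: str, inputs: list[Optional[str]]) -> list[str]:
    """Walk the flow with scripted caller inputs (None means no input).

    Returns the sequence of states the call passed through."""
    current, path, retries = start, [start], 0
    caller = iter(inputs)
    while current not in TERMINAL:
        state = flow[current]
        heard = next(caller, None)  # an exhausted script behaves like silence
        if heard in state.transitions:
            current, retries = state.transitions[heard], 0
            path.append(current)
        else:
            retries += 1  # this sketch treats nomatch and noinput alike
            if retries > state.max_retries:
                current, retries = state.on_failure, 0
                path.append(current)
    return path


print(run_call(CALL_FLOW, "main", ["pizza", "support", "billing"]))
```

Scripted inputs like these also make a handy regression check: after any change to the flow, the same scripts should still reach the same end states.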

While much of the dialogue can be specified using graphical tools, a developer sometimes needs to see and edit the actual code. Because VoiceXML and SALT are XML languages, developers can use an XML editor to view and update code. Most tools provide convenient ways to program directly in these XML-based languages.

Grammars and Prompts: Many speech applications also require complex grammar construction. Even in seemingly straightforward applications, such as directory assistance or voice dialing, grammars need to support the many possible pronunciations of callers with different accents. Vendors take several approaches, including a "road map" paradigm, tree-structure notation, and a spreadsheet format.
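As an illustration of the spreadsheet approach, the following sketch converts rows of synonyms into a grammar in the W3C SRGS XML format, with SISR `out` tags carrying the value the application receives. The sheet layout (a value column plus `|`-separated phrases) is an assumption made for this example, not any vendor's format.

```python
import csv
import io
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
ET.register_namespace("", SRGS_NS)

# Hypothetical "spreadsheet": one row per value the application needs,
# with the caller phrasings that should map to it separated by "|".
SHEET = """value,phrases
checking,checking|checking account|my checking
savings,savings|savings account|my savings
"""


def _t(name: str) -> str:
    """Qualify an element name with the SRGS namespace."""
    return f"{{{SRGS_NS}}}{name}"


def sheet_to_srgs(sheet_text: str, rule_id: str) -> str:
    """Build an SRGS XML grammar with one <item> per phrase and an SISR tag per value."""
    grammar = ET.Element(_t("grammar"), {
        "version": "1.0",
        "root": rule_id,
        "mode": "voice",
        "tag-format": "semantics/1.0",
        XML_LANG: "en-US",
    })
    rule = ET.SubElement(grammar, _t("rule"), {"id": rule_id, "scope": "public"})
    choices = ET.SubElement(rule, _t("one-of"))
    for row in csv.DictReader(io.StringIO(sheet_text)):
        for phrase in row["phrases"].split("|"):
            item = ET.SubElement(choices, _t("item"))
            item.text = phrase.strip()
            # The semantic result the application sees, whichever phrase was spoken
            ET.SubElement(item, _t("tag")).text = f'out="{row["value"]}";'
    return ET.tostring(grammar, encoding="unicode")


print(sheet_to_srgs(SHEET, "account_type"))
```

The spreadsheet stays easy for non-programmers to review, while the generated XML is what the platform actually loads.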

Similarly, applications may use TTS, prerecorded prompts, or a combination of the two. A prompt tool makes it easy to create, edit, store, and manage prompts. Tools simplify these tasks by hiding verbose syntax from the developer and presenting the designs concisely, so that they are not spread across several pages of code. (See Figure 4a & 4b)
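A prompt catalog can be sketched as a table that maps prompt IDs to wording and, where available, a recording. This hypothetical Python example renders VoiceXML `<prompt>` elements, placing the wording inside `<audio>` so that VoiceXML renders it with TTS if the recording cannot be played, and lists prompts that still need to be recorded. Prompt IDs, wording, and file paths are invented for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class PromptEntry:
    text: str                    # approved wording; doubles as the TTS fallback
    audio: Optional[str] = None  # path to a studio recording, if one exists


# Hypothetical catalog
CATALOG = {
    "welcome": PromptEntry("Welcome to Example Bank.", "audio/welcome.wav"),
    "main_menu": PromptEntry("Say balance, transfer, or agent.", "audio/main_menu.wav"),
    "goodbye": PromptEntry("Thank you for calling. Goodbye."),
}


def render_prompt(prompt_id: str) -> ET.Element:
    """Return a VoiceXML <prompt> for one catalog entry."""
    entry = CATALOG[prompt_id]
    prompt = ET.Element("prompt")
    if entry.audio:
        # If the recording cannot be played, VoiceXML renders the <audio> content instead.
        audio = ET.SubElement(prompt, "audio", {"src": entry.audio})
        audio.text = entry.text
    else:
        prompt.text = entry.text  # no recording yet: rendered with TTS
    return prompt


def missing_recordings() -> list[str]:
    """Prompt IDs that are still TTS-only and need to be recorded."""
    return [pid for pid, entry in CATALOG.items() if not entry.audio]


print(ET.tostring(render_prompt("welcome"), encoding="unicode"))
print(missing_recordings())
```

Keeping the wording in one place means a script change is made once, and the TTS fallback lets an application be tested before every recording is finished.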

Testing and Debugging

No tool is complete unless it provides effective ways to simplify debugging and testing, so that applications move out of the lab and into trial deployments as quickly as possible. Because Web-centric voice applications run on a separate platform from the VoiceXML or SALT servers, many tools include software that simulates a voice gateway, allowing testing without an actual connection to a telephony infrastructure. (See Figure 5)

Many tools provide rehearsal facilities that let a developer read prompts (rather than invoke the speech synthesizer) and type responses (rather than invoke the speech recognizer) to test the dialogue. For example, both Tellme Networks and BeVocal offer rehearsal tools in their online Web development environments. Some tools can even detect missing parameters in the references that tie dialogues, event handlers, grammars, and prompts together.
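A rehearsal facility and a missing-reference check can both be sketched in a few lines. In this hypothetical Python example, typed text stands in for speech recognition, simple phrase sets stand in for grammars, and prompts are shown as text instead of being synthesized; the dialogue structure is an assumption, not any vendor's format.

```python
from typing import Callable

# Hypothetical application pieces, kept as plain text for rehearsal.
PROMPTS = {"ask_city": "Which city are you flying to?", "ask_day": "What day do you want to leave?"}
GRAMMARS = {"cities": {"boston", "denver", "seattle"}, "days": {"today", "tomorrow"}}
DIALOGUE = [
    {"state": "city", "prompt": "ask_city", "grammar": "cities"},
    {"state": "day", "prompt": "ask_day", "grammar": "days"},
]


def find_missing_references(dialogue, prompts, grammars) -> list[str]:
    """Report dialogue states that point at prompts or grammars that do not exist."""
    problems = []
    for step in dialogue:
        if step["prompt"] not in prompts:
            problems.append(f"{step['state']}: unknown prompt {step['prompt']!r}")
        if step["grammar"] not in grammars:
            problems.append(f"{step['state']}: unknown grammar {step['grammar']!r}")
    return problems


def rehearse(dialogue, prompts, grammars, ask: Callable[[str], str] = input) -> dict[str, str]:
    """Run the dialogue in text mode: prompts are shown, answers are typed."""
    answers = {}
    for step in dialogue:
        while True:
            reply = ask(prompts[step["prompt"]] + " > ").strip().lower()
            if reply in grammars[step["grammar"]]:
                answers[step["state"]] = reply
                break
            print("(no match; re-prompting)")
    return answers
```

Calling `rehearse(DIALOGUE, PROMPTS, GRAMMARS)` at a terminal walks through the dialogue interactively; passing a different `ask` function lets the same rehearsal run unattended from a script. Running `find_missing_references` first catches broken links before anyone dials in.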

Logging and debugging capabilities vary from tool to tool. Some let developers set breakpoints that stop execution at predetermined places in an application so they can assess its progress. This is particularly useful for applications that generate content dynamically, where the markup being executed is produced at run time.
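The breakpoint idea can be sketched as a runner that pauses at named steps and hands a snapshot of the application's variables to the developer. The step names and the stubbed balance lookup below are hypothetical; a real tool would pause inside its own interpreter rather than call a Python callback.

```python
from typing import Callable

Step = tuple[str, Callable[[dict], None]]


def run_with_breakpoints(steps: list[Step], breakpoints: set[str],
                         on_break: Callable[[str, dict], None]) -> dict:
    """Run steps in order; before each step named in `breakpoints`,
    pass a snapshot of the application variables to `on_break`."""
    variables: dict = {}
    for name, action in steps:
        if name in breakpoints:
            on_break(name, dict(variables))  # shallow copy of the current variables
        action(variables)
    return variables


def fetch_balance(v: dict) -> None:
    v["balance"] = 125  # stub standing in for a hypothetical back-end call


def build_prompt(v: dict) -> None:
    v["prompt"] = f"Your balance is {v['balance']} dollars."  # content generated at run time


STEPS: list[Step] = [("fetch_balance", fetch_balance), ("build_prompt", build_prompt)]
run_with_breakpoints(STEPS, {"build_prompt"}, lambda name, v: print(f"break at {name}: {v}"))
```

Pausing just before `build_prompt` shows exactly which back-end values the dynamically generated prompt will be built from.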

Accelerating Deployment with Pre-Built/Packaged Applications

Increasingly, vendors are offering "packaged" or "pre-built" speech applications, which can be very helpful in deploying some applications quickly and less expensively. Pre-built applications are available for many common "horizontal" functions such as voice dialing, password reset, and retail store location. Some vendors focus on pre-built applications for specific vertical markets. For example, Apptera offers a set of pre-built applications targeted at the financial industry.

A common misconception is that pre-built applications limit customization or require the purchase of expensive professional services for customization. In reality, many pre-built applications come with convenient tools that allow customization.

The pre-built application space is developing rapidly with new applications from different vendors coming on the market frequently.

Post-Deployment: Tuning, Logging, Monitoring

Every speech application requires tuning. Usability testing measures both objective performance and subjective preference scores and must be done iteratively in order to continually improve the caller experience. Logging, tuning, and testing tools make these necessary chores easier by providing event logging, event viewing, reporting, and automated analytical tools.

Three steps usually need to be taken after an application is deployed.

  1. Uncover performance or usability problems. Improvements may need to be made due to user feedback or increased call volume (e.g., during a product promotion).
  2. Discover the root cause of each performance or usability problem and determine how it can be fixed.
  3. Check that the fix really solves the problem - and does not introduce new problems!
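Step 3 can be partly automated by comparing metrics collected before and after a change. This minimal sketch assumes every metric is a "higher is better" rate (such as task completion) and uses hypothetical metric names, values, and an arbitrary tolerance.

```python
def check_fix(before: dict[str, float], after: dict[str, float],
              target: str, tolerance: float = 0.01) -> tuple[bool, list[str]]:
    """Did the targeted metric improve, and did any other metric drop by more than `tolerance`?

    Assumes every metric is a 'higher is better' rate between 0 and 1.
    A metric missing from `after` is treated as 0.0, so it is flagged."""
    improved = after[target] > before[target]
    regressions = [name for name in before
                   if name != target and after.get(name, 0.0) < before[name] - tolerance]
    return improved, regressions


# Hypothetical task-completion rates before and after a grammar change
before = {"balance_completion": 0.80, "transfer_completion": 0.90}
after = {"balance_completion": 0.86, "transfer_completion": 0.84}
print(check_fix(before, after, target="balance_completion"))
```

Here the fix helped the balance task but hurt transfers, which is exactly the kind of new problem the third step is meant to catch.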

Effective tuning begins with event logging. Most tools include automatic logging capabilities, and some allow customers to pick and choose the events to be logged (and the level of detail captured). (See Figure 6)
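Selective logging can be sketched as a logger that records only the event types a customer enables, at a chosen level of detail, and writes one JSON record per line. The event names, detail levels, and fields below are hypothetical.

```python
import json
from datetime import datetime, timezone


class EventLogger:
    """Log only selected event types, at a chosen level of detail (hypothetical design)."""

    def __init__(self, enabled: set[str], detail: str = "summary"):
        self.enabled = enabled    # event types to keep
        self.detail = detail      # "summary" or "full"
        self.records: list[dict] = []

    def log(self, call_id: str, event: str, **data) -> None:
        if event not in self.enabled:
            return  # event type not selected: drop it
        record = {"time": datetime.now(timezone.utc).isoformat(),
                  "call": call_id, "event": event}
        if self.detail == "full":
            record.update(data)  # e.g. state name, recognized text, confidence
        self.records.append(record)

    def to_jsonl(self) -> str:
        """One JSON object per line, a convenient input for viewers and reports."""
        return "\n".join(json.dumps(r) for r in self.records)


log = EventLogger({"recognition", "nomatch"}, detail="full")
log.log("c1", "call_start")
log.log("c1", "recognition", state="main", text="sales", confidence=0.82)
log.log("c1", "nomatch", state="main")
print(log.to_jsonl())
```

Logging fewer event types, or less detail per event, keeps log volume manageable on busy systems; full detail can be switched on while a problem is being investigated.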

Because manually sorting through detailed event logs can be extremely difficult, event viewers that present the information in an easier-to-read format can be great time-savers. Also important is the ability to create reports, which aid data analysis. Some tools even offer automated ways to analyze data and make adjustments to the applications.
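A simple report of the kind described here can be produced from raw events. This sketch assumes each logged event carries a dialogue state and is either a successful `recognition` or a `nomatch`, and ranks states by nomatch rate so the most problematic ones appear first; the sample events are hypothetical.

```python
from collections import Counter


def nomatch_report(events: list[dict]) -> list[tuple[str, int, int, float]]:
    """Summarize raw events as (state, attempts, nomatches, nomatch rate), worst first.

    Assumes each relevant event has a 'state' and an 'event' of either
    'recognition' (caller was understood) or 'nomatch'; other events are ignored."""
    attempts: Counter = Counter()
    misses: Counter = Counter()
    for e in events:
        if e["event"] in ("recognition", "nomatch"):
            attempts[e["state"]] += 1
            if e["event"] == "nomatch":
                misses[e["state"]] += 1
    rows = [(s, attempts[s], misses[s], misses[s] / attempts[s]) for s in attempts]
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Hypothetical sample of logged events
EVENTS = (
    [{"state": "main", "event": "recognition"}] * 3
    + [{"state": "main", "event": "nomatch"}]
    + [{"state": "account", "event": "recognition"}, {"state": "account", "event": "nomatch"}]
    + [{"state": "main", "event": "call_start"}]
)

for state, n, bad, rate in nomatch_report(EVENTS):
    print(f"{state:10} attempts={n:3} nomatch={bad:3} rate={rate:.0%}")
```

A state near the top of such a report is a natural place to start the root-cause search in step 2 above.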

Developers should also use tools for measuring user preferences. These tools ask users to perform one or more tasks, and then allow users to evaluate their experiences. Testing companies often provide this service, but usability preference evaluation systems are also easy to buy or build.
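Results from such a preference study can be summarized per task, pairing the objective outcome with the subjective score. This sketch assumes each session records the task, whether the user completed it, and a 1-to-5 satisfaction rating; the field names, scale, and sample sessions are hypothetical.

```python
from statistics import mean


def summarize_sessions(sessions: list[dict]) -> dict[str, dict[str, float]]:
    """Per task: share of sessions completed and mean satisfaction rating (1-5 scale assumed)."""
    by_task: dict[str, list[dict]] = {}
    for s in sessions:
        by_task.setdefault(s["task"], []).append(s)
    return {
        task: {
            "completion_rate": sum(s["completed"] for s in group) / len(group),
            "mean_rating": mean([s["rating"] for s in group]),
        }
        for task, group in by_task.items()
    }


# Hypothetical results from three test sessions
SESSIONS = [
    {"task": "check balance", "completed": True, "rating": 5},
    {"task": "transfer funds", "completed": True, "rating": 4},
    {"task": "transfer funds", "completed": False, "rating": 2},
]
print(summarize_sessions(SESSIONS))
```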

Guidelines for Selecting a Vendor

Keep the following in mind when selecting a tool vendor.

Vendor Ecosystem: Are you planning to build, deploy, and manage everything yourself? If not, you will need assistance. Some tool suppliers may offer direct support for large customers, but smaller organizations will need to look to the channel for support. Who can you call on for professional services and onsite installation? Many tool suppliers have robust ecosystems of dealers, OEMs, and systems integrators that can assist in specialized areas such as PBX integration and CTI, as well as with other professional services. Be sure to ask upfront.

Standards-Compliance: If the tool you are considering generates VoiceXML or SALT code, does that code work seamlessly with any VoiceXML or SALT server? Most tool vendors certify interoperability with only a limited set of voice gateway vendors. Check first, or you may unknowingly lock yourself into a limited set of voice gateway providers. If you are deploying a VoiceXML solution, the VoiceXML Forum offers a platform certification test; you may want to see which voice gateways have passed it.

Know Which Product You Are Evaluating: Is your vendor selling its own tool or private-labeling one from another vendor? Understanding a vendor's go-to-market and OEM strategies helps you make a solid evaluation.

Advanced Feature Support: Does the tool you are evaluating provide functionality beyond what the standards specify? Both VoiceXML and SALT define extensibility mechanisms: the <object> element in the case of VoiceXML and the <smex> element in the case of SALT. Tools may use these mechanisms for functions such as advanced call control, CTI, speaker identification, and database access.
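A quick portability check is to scan generated VoiceXML for `<object>` elements, VoiceXML's hook for platform-specific functionality, since each one depends on a capability that another gateway may not provide. In this sketch, the sample document and its `classid` value are hypothetical.

```python
import xml.etree.ElementTree as ET


def find_platform_extensions(vxml_text: str) -> list[str]:
    """List the <object> elements in a VoiceXML document by classid (or name).

    Each <object> relies on a platform-specific capability, so a document that
    contains any may not run unchanged on other voice gateways."""
    root = ET.fromstring(vxml_text)
    found = []
    for elem in root.iter():
        local_name = elem.tag.rsplit("}", 1)[-1]  # drop the namespace, if any
        if local_name == "object":
            found.append(elem.get("classid") or elem.get("name") or "(unnamed)")
    return found


# Hypothetical generated page; the classid is invented for illustration.
SAMPLE = """<vxml xmlns="http://www.w3.org/2001/vxml" version="2.0">
  <form>
    <object name="xfer" classid="method://vendor-x/cti/blindTransfer"/>
    <block><prompt>Transferring your call.</prompt></block>
  </form>
</vxml>"""

print(find_platform_extensions(SAMPLE))
```

Extensions are not bad in themselves; the point is to know where they are before committing to a gateway.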

Match the Tool with the Capabilities of the Developer: Understand the capabilities - and limitations - of the people building the application. Some tools clearly target experienced IVR developers while others target Web developers. Some tools are more suitable for novice developers while others are more appropriate for experienced developers. Be sure to match the tool you will be using with the capabilities of those who actually build the application.

Understand Your Existing Web and Voice Infrastructures: The VoiceXML and SALT solution architecture straddles the voice and Web environments. Do you understand your existing infrastructure in both of these areas, and can you assess a tool's compatibility with each? Your telephony infrastructure connects through a voice gateway. After learning which gateways your tool supplier supports, you should find out which telephony environments those gateways support. Similarly, a tool must fit easily within your existing application server, administration and management, and back-end data infrastructures.
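The chain of questions above (which gateways does the tool support, and which telephony interfaces do those gateways support?) reduces to a set intersection. The gateway names and interface lists below are hypothetical placeholders for information gathered from vendors.

```python
def compatible_gateways(tool_gateways: set[str],
                        gateway_interfaces: dict[str, set[str]],
                        our_interfaces: set[str]) -> list[str]:
    """Gateways the tool supports that also handle every telephony interface we use."""
    return sorted(g for g in tool_gateways
                  if our_interfaces <= gateway_interfaces.get(g, set()))


# Hypothetical answers gathered from the tool vendor and the gateway vendors
TOOL_GATEWAYS = {"Gateway A", "Gateway B"}
GATEWAY_INTERFACES = {
    "Gateway A": {"T1/E1 CAS", "ISDN"},
    "Gateway B": {"VoIP"},
    "Gateway C": {"ISDN", "VoIP"},
}
print(compatible_gateways(TOOL_GATEWAYS, GATEWAY_INTERFACES, {"ISDN"}))
```

An empty result is a signal to revisit the tool choice or the gateway choice before development starts.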

Align with a Viable Company: Will your tool vendor be in business a year from now? It is important to choose a vendor who will be available now and in the years ahead.

Here are two final tips for guaranteeing your success. First, keep in mind that a tool is no substitute for good application design. If you do not have expertise in-house, you can take advantage of a variety of design services available from the ecosystem.

Finally, remember that you can never do too much testing - unit testing, integration testing, performance testing, usability testing, stress testing, pilot testing, and ongoing monitoring. With so many excellent new tools for developing speech applications, testing can easily become the primary stumbling block to the successful deployment of world-class applications in your organization.


Peter Gavalakis is a product marketing manager in the Intel Communications Group. He is currently driving several Intel initiatives intended to advance the deployment of speech solutions. Gavalakis can be reached at: peter.gavalakis@intel.com.
