Speech Technology: Science Fiction Gives Way to Real Value

Those who develop basic speech technology have sometimes noted how fundamental speaking and listening are to human activity. Automating that capability should provide significant value in reducing costs and improving productivity for individuals and companies. The value for customers will automatically drive opportunities for vendors.

The problem with this vision is that it leads to a science-fiction view of speech technology. How often have writers compared some speech product to HAL in the movie 2001 or the ubiquitous computer in Star Trek? How many reporters have had fun with dictation software by reading it nursery rhymes and laughing at the result? Too many.

Human speech interactions depend on a foundation of experience and understanding that goes beyond today’s speech technology. Laboratories are energetically working on "natural language interpretation" and "artificial intelligence," and that research will eventually augment speech technology, but today’s commercial speech technology has achieved success by focusing on solving specific problems rather than trying to solve the problem of human communication.

The Buyers’ Guide in this issue of Speech Technology Magazine outlines the way the marketplace has defined the practical use of speech technology. Are you running a call center and need to reduce costs while improving customer service? Do you need to extend Web services to the telephone? Are you running a hospital and need to reduce the cost and delay in transcribing medical reports and speed up billing? Have you developed disabilities that make using a keyboard or screen difficult? Are you struggling to make all the new features in the smart phone you manufacture easier to use? Are you with an automobile manufacturer that wants to add features but must make them usable hands free? Are you a telephone service provider that needs to create services that retain customers or generate additional revenues? Do you need to monitor calls in your call center for business intelligence? Are you concerned about the cost of fraud and the damage it does to your company image, and need a voice biometric that works over the telephone? Do you run a distribution center and need a more efficient way for workers, whose eyes and hands are occupied, to pick merchandise for an order? Are you an entrepreneur who sees a business in over-the-telephone services that the voice user interface makes possible? In all these environments and more, speech technology has shown it can deliver.

Core technology
Once you determine you need speech technology, you have more choices than ever before. The core technologies available to you include:

  • Speech recognition: Interpreting what a person said or representing it as text;
  • Text-to-speech: Speaking text from sources such as email or a database;
  • Speaker verification: Authenticating a claim of identity using the biometric qualities of a voice;
  • Audio scanning: Finding parts of an audio file (attached to a video or otherwise) that contain specific content.

Variations in these technologies include the platform on which the software runs. Speech technologies are available in server configurations for telephone applications or for sharing across an enterprise. They can be standalone, running as a PC "desktop" application or as "embedded" software on a small device. Compatibility with operating systems, specific processors, or standards can affect a buyer’s decision.

Options for buyers
One of the barriers to faster adoption of speech technology is the number of choices a buyer must weigh in making a decision: specifying the application, selecting the tools with which it is developed, choosing the core technology suppliers, and choosing the delivery platform. The decision must consider more than the initial application and its cost: continuing maintenance, tuning, and the ability to grow in capability are key considerations.

Customers that want a unique application and full control over its evolution may want to invest in tools and staff training. At the other extreme, a customer may find a hosted, packaged telephone application that meets its needs and pay based on usage. There is a spectrum of options in between, notably tools that make the creation and ongoing management of an application easier.

Many buyers will be building on an existing solution, and the process of transitioning from that solution or adding to it can be simple if the current vendor offers an acceptable transition path. The transition can be more complex if other vendors are considered, since the legacy applications may need to be supported while the new solution is deployed and tested.

One option is to outsource an application or the voice infrastructure supporting the application. Standards and specifications such as VoiceXML and SALT allow logical and physical separation of the application code from the voice infrastructure.
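
As a rough illustration of that separation, the sketch below shows application code (written here in Python, chosen only for illustration) producing a small VoiceXML 2.0 document that a separately hosted voice platform could fetch and render. The function name `render_balance_dialog`, the customer data, and the dialog itself are hypothetical and are not drawn from any vendor's product.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C VoiceXML 2.0 Recommendation.
VXML_NS = "http://www.w3.org/2001/vxml"


def render_balance_dialog(customer_name: str, balance: str) -> str:
    """Return a minimal VoiceXML document that speaks one prompt.

    Hypothetical example: the application side only produces markup;
    recognition, text-to-speech, and telephony stay with whatever
    voice platform retrieves the document.
    """
    ET.register_namespace("", VXML_NS)  # emit VoiceXML as the default namespace
    vxml = ET.Element(f"{{{VXML_NS}}}vxml", {"version": "2.0"})
    form = ET.SubElement(vxml, f"{{{VXML_NS}}}form", {"id": "balance"})
    block = ET.SubElement(form, f"{{{VXML_NS}}}block")
    prompt = ET.SubElement(block, f"{{{VXML_NS}}}prompt")
    # ElementTree escapes special characters in the text when serializing.
    prompt.text = f"Hello {customer_name}. Your balance is {balance}."
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(render_balance_dialog("Pat", "42 dollars"))
```

Because the application emits only standard markup, the same code could in principle be served to an in-house platform or to an outsourced one without change, which is the flexibility the standards are meant to provide.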

Independent system integrators or consultants can help. Many technology and platform suppliers also have professional services organizations that can work from requirements or recommend alternatives, and there are independent firms specializing in voice user interface design and/or usability testing. Reports from analysts can also help with decision-making by providing insights into trends and vendors. The sidebar on "Buyers consider trends in call center applications" illustrates the interplay of trends and decision-making.

Buyers’ Guide
This month’s STM Buyers’ Guide lists vendors by category. Vendors are a good source of information as well as a resource for specific bids. They will of course tell you why their solution is best, but the criteria they use to do so can be useful in sorting out what is important in your specific case. The decision-making process can be complicated, but, in many cases, settling on the most important criteria for your company will make the process simpler than it sounds. In any case, waiting to pursue this powerful technology can be costly.

Buyers consider trends in call center applications
Call centers are embracing speech technology because it can reduce costs while improving customer service and agent effectiveness. Doing more with less is a theme that corporate management loves. Some of the trends in this area that companies consider in their decisions follow:

  • Standards: Standards and proposed standards such as VoiceXML, SALT, and X+V give buyers flexibility in choosing core technologies or migrating an application from one platform to another, and they allow the application to be physically separated from the voice and telephone infrastructure.
  • Development and application lifecycle management tools: A number of vendors feature tools that aim to simplify the development process and/or the process of managing tuning and upgrades to a service over time. Some of these tools allow assembly of an application from pre-tested dialog subroutines.
  • "Packaged" applications: While some customization or configuration is always required, companies are offering applications that are pre-developed and tested for specific tasks and vertical markets. These applications are sometimes combined with development tools to manage and monitor them.
  • Use of existing Web services, business rules, and databases: Most solutions have a way of interfacing with company databases, so that the application can be personalized or provide requested information. Some solutions are based specifically on Web technology, generating code dynamically using standard application-server technologies such as J2EE or .NET.
  • Integration with agent services: If defaulting to an agent is an option in an automated service, tight integration with software that supports agents can ease skills-based routing and present partial information collected by the automated system to the agent.
  • Outsourcing: Options available for outsourcing or managed services are increasing. Outsourced solutions can be highly customized or almost turnkey. Some companies both outsource and maintain internal applications, depending on factors such as variation in peak demand.
  • Consolidating multiple toll-free numbers into one branded number: Natural-language call routing can make it possible for customers to find any of a company’s services without searching for the appropriate number or navigating a confusing menu of options.
  • Multimodal support: As wireless telephones get features of Personal Digital Assistants (PDAs), speech interaction can be supplemented by larger displays, keyboards, and pointing devices. Some platforms feature optional support for using those modes of interaction with speech.
  • Integration with IP telephony: Companies that plan to transition to IP telephony are requiring that the solution they choose be compatible with that evolution, and some are making the transition to speech and IP telephony simultaneously.
  • Improvements in technology: The core technology continues to improve, enhancing the effectiveness of every application. Speech recognition continues to improve its accuracy and to deal with increasingly difficult situations, such as background noise. Text-to-speech technology creates more natural synthetic speech and is usable in many more situations. Speaker verification can protect individuals and companies with little risk of rejecting a legitimate user. Searching audio files for business intelligence or specific information is increasingly efficient and effective.

For more information on TMA Associates, TVUI or Bill Meisel please visit http://www.tmaa.com/.
