Speech Technology Magazine

VoiceXML Brings IVR into the Future

Continuing demand gives these applications an impetus for improvement.
By James A. Larson - Posted Apr 30, 2014

Customers access interactive voice response (IVR) systems via smartphones, cell phones, and POTS (plain old telephone service) phones. IVR systems help customers avoid long wait times when they need information or want to perform transactions. VoiceXML was designed for building IVR applications that replace printed, static user guides and outdated instructions, and that automate much of the work performed by human agents in call centers. Although many people find IVR interactions slow and tedious, companies have used IVR applications, most of them written in VoiceXML, to reduce customer support expenses.

Alternatives to VoiceXML

The Internet gives customers many other ways to obtain information and perform transactions: Web sites containing FAQ pages, blogs, and forms; Web-based interactive applications; interactive text-based chat; and social media sites on which a company's products and services are discussed. Smartphone applications let customers do much the same through graphical, touch-based interfaces. Many customers feel that user-directed, visual smartphone applications are faster than system-directed, voice-only IVR applications.

What will happen to VoiceXML applications? Although the smartphone market is growing quickly, many consumers still don't own a smartphone and will continue to use VoiceXML applications. So will smartphone owners who don't have the appropriate applications installed. VoiceXML applications will be around for a long time.

A New Life for VoiceXML Applications

VoiceXML applications are not perfect. Following are several ways developers can improve them:

  • Developers can display the text of VoiceXML prompts on the smartphone screen and capture text typed by the customer, turning the IVR application into a type of automated text chat. With text chat, users can switch to other activities or pause, then pick up where they left off when ready.
  • Developers can make VoiceXML applications visual by replacing voice with smartphone visual controls. Jacada's Visual IVR presents each field of a VoiceXML form as a visual widget: it displays a VoiceXML date field as a date picker, a digit field as a data entry box that accepts only digits, and a VoiceXML menu as a visual menu.
  • Developers can convert single-mode, voice-only VoiceXML applications into multimodal applications in which users read and listen, speak and type. These applications can continue to use the speech synthesis and speech recognition engines provided by VoiceXML processors, augmented with visual chat or visual widgets. Customers can read or listen to prompts, type or speak responses, and switch modes when appropriate.
  • Developers can integrate VoiceXML applications with other channels. For example, a customer could download a fax instead of listening to a lengthy list of instructions, or download a video that demonstrates a complicated process.
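The first two approaches above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Jacada's implementation: it assumes a VoiceXML 2.x document, walks its `<field>` and `<menu>` elements, shows each prompt's text as a chat line, and chooses a widget from the field's built-in `type` attribute. The widget names (`date_picker`, `digits_only_box`, and so on) and the helper functions are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical widget names; a real visual IVR product defines its own.
WIDGET_FOR_TYPE = {
    "date": "date_picker",
    "digits": "digits_only_box",
    "boolean": "yes_no_toggle",
}


def local(tag):
    """Drop the XML namespace: '{http://www.w3.org/2001/vxml}field' -> 'field'."""
    return tag.rsplit("}", 1)[-1]


def flat_text(elem):
    """All text inside an element, with runs of whitespace collapsed."""
    return " ".join("".join(elem.itertext()).split())


def prompt_text(elem):
    """Text of the element's first <prompt> child, shown as a chat line."""
    for child in elem:
        if local(child.tag) == "prompt":
            return flat_text(child)
    return ""


def to_visual(vxml_source):
    """Map each <field> and <menu> to a (chat text, widget, detail) tuple."""
    root = ET.fromstring(vxml_source)
    screen = []
    for elem in root.iter():  # visits elements in document order
        kind = local(elem.tag)
        if kind == "field":
            # Fields without a recognized built-in type fall back to free text.
            widget = WIDGET_FOR_TYPE.get(elem.get("type", ""), "text_box")
            screen.append((prompt_text(elem), widget, elem.get("name")))
        elif kind == "menu":
            choices = [flat_text(c) for c in elem if local(c.tag) == "choice"]
            screen.append((prompt_text(elem), "visual_menu", choices))
    return screen


# Hypothetical banking dialog used only to exercise the sketch.
SAMPLE = """<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <menu id="main">
    <prompt>Say balance or transfer.</prompt>
    <choice next="#balance">balance</choice>
    <choice next="#transfer">transfer</choice>
  </menu>
  <form id="transfer">
    <field name="when" type="date">
      <prompt>On what date should the transfer occur?</prompt>
    </field>
    <field name="account" type="digits">
      <prompt>Please enter your account number.</prompt>
    </field>
  </form>
</vxml>"""

for item in to_visual(SAMPLE):
    print(item)
```

Because the same VoiceXML source still drives the voice channel, a screen built this way can sit alongside speech output and recognition rather than replacing them, which is the multimodal arrangement the third approach describes.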

Developers should also consider at least two types of enhancements to VoiceXML code for use on smartphones: enabling phrase changes and allowing callers to use smartphone functions.

People sometimes use different phrases when writing than when speaking, so some phrases in VoiceXML prompts may need to change when presented as text on the screen. For example, the introductory prompt "Welcome to the Ajax automated bank teller system" might be shortened to "Ajax Bank Teller" to save space on a smartphone screen.
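One way to allow such phrase changes without forking the application is to store both wordings with each prompt and pick one per channel. A minimal Python sketch, assuming a hypothetical `Prompt` type and hypothetical channel names `"voice"` and `"screen"`:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Prompt:
    """One prompt with a spoken form and an optional shorter on-screen form."""
    spoken: str
    screen: Optional[str] = None  # None means "reuse the spoken wording"

    def render(self, channel: str) -> str:
        # Voice callers hear the full phrase; smartphone users see the short
        # text when one has been written, otherwise the spoken wording.
        if channel == "screen" and self.screen is not None:
            return self.screen
        return self.spoken


WELCOME = Prompt(
    spoken="Welcome to the Ajax automated bank teller system",
    screen="Ajax Bank Teller",
)

print(WELCOME.render("voice"))   # Welcome to the Ajax automated bank teller system
print(WELCOME.render("screen"))  # Ajax Bank Teller
```

Because both wordings live in one definition, the voice and screen versions of a prompt are edited together rather than in two separate code bases.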

Rather than asking users to enter their location, applications can use the GPS receiver built into most smartphones to obtain the user's coordinates, then look up the corresponding current address in a database.
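Turning GPS coordinates into an address is a reverse-geocoding lookup. The Python sketch below assumes a small, hypothetical in-memory address table and uses the haversine great-circle distance to pick the nearest entry; a production application would more likely query a geocoding service or a spatially indexed database. The names `ADDRESSES` and `nearest_address`, the sample coordinates, and the 0.5 km cutoff are all illustrative assumptions, and reading the coordinates from the phone itself is platform-specific and not shown.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical address table (made-up streets and coordinates).
ADDRESSES = [
    ("100 Example Street", 45.5200, -122.6800),
    ("200 Sample Avenue", 45.5300, -122.6600),
    ("300 Placeholder Road", 45.5000, -122.7000),
]


def nearest_address(lat, lon, max_km=0.5):
    """Return the closest known address, or None if none is within max_km."""
    best = min(ADDRESSES, key=lambda row: haversine_km(lat, lon, row[1], row[2]))
    if haversine_km(lat, lon, best[1], best[2]) > max_km:
        return None  # no close match: fall back to asking the user
    return best[0]


# Coordinates as a phone's GPS might report them (hypothetical values).
print(nearest_address(45.5205, -122.6795))
```

The distance cutoff keeps the application from confidently reporting a far-away address when the database has no nearby entry; in that case it can ask the user, as a voice-only IVR would.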

While these changes improve the user interface, they can cause the code for the IVR and smartphone versions of an application to diverge, leaving developers to maintain two sets of similar code.

Maintaining the VoiceXML code on a server lets all customers access the most recent version of the application remotely. When the application is instead installed on customers' smartphones, those customers might avoid some of the latency of accessing a remote application, but possibly at the cost of not obtaining the most recent information. Developers should consider a hybrid application in which time-sensitive data resides on a server and much of the processing is performed on the smartphone.
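A minimal Python sketch of one such hybrid, assuming a hypothetical `fetch` function that retrieves a value from the company's server: time-sensitive values are cached on the phone only for a short time-to-live (TTL), so the application skips repeated round trips but never reuses a cached value older than the TTL. The class name, the TTL, the key `"rates"`, and the fake server and clock used to exercise it are all illustrative.

```python
import time
from typing import Callable, Dict, Tuple


class HybridStore:
    """Fetch time-sensitive values from a server and reuse them locally
    for at most ttl_seconds; all other processing stays on the phone."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch            # hypothetical call to the remote server
        self._ttl = ttl_seconds
        self._clock = clock            # injectable so the sketch can be tested
        self._cache: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> str:
        now = self._clock()
        cached = self._cache.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]           # fresh enough: no network round trip
        value = self._fetch(key)       # missing or stale: ask the server
        self._cache[key] = (now, value)
        return value


# Exercise the sketch with a fake server and a fake clock.
calls = []


def fake_server(key: str) -> str:
    calls.append(key)
    return f"{key}-v{len(calls)}"


fake_now = [0.0]
store = HybridStore(fake_server, ttl_seconds=60, clock=lambda: fake_now[0])
first = store.get("rates")     # fetched from the server
fake_now[0] = 30.0
second = store.get("rates")    # served from the local cache
fake_now[0] = 90.0
third = store.get("rates")     # TTL expired, fetched again
print(first, second, third)
```

The TTL is the knob that trades latency against freshness: a shorter TTL keeps data closer to the server's current values, while a longer one saves more round trips.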

Businesses' investment in VoiceXML code, and developers' knowledge and skill in building VoiceXML applications, will not be wasted. The additional skills of developing visual smartphone applications will enable developers to design even more useful VoiceXML applications.


James A. Larson, Ph.D., is an independent speech consultant. He is co-program chair for SpeechTEK. He also teaches courses in speech user interfaces at Portland State University and the Oregon Institute of Technology. He can be reached at jim@larson-tech.com.

