Controlling Speech and Multimodal Applications

Many applications, including spoken dialogues, can be viewed as progressions through a set of states. For example, a voice-enabled airline reservations application might, while interacting with the user, go through these states: identifying the departure and destination cities, selecting the dates of travel, selecting flights, collecting payment information, charging a credit card, and finally finding out if the user needs other services, like a rental car or hotel reservation. An application might be designed to do two or more operations in parallel, such as presenting the user with special offers while the flight database is being searched, or, in the case of a multimodal application, simultaneously speaking to the user and displaying a map. As an application’s workflow becomes more complex and starts to include several modalities, it becomes increasingly useful to explicitly model the flow through the application. 
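
To make the idea of explicitly modeling flow concrete, here is a minimal sketch of the airline reservation example as a plain state machine. It is written in Python, not SCXML, and the state names, the `TRANSITIONS` table, and the `run` function are illustrative assumptions rather than part of any specification or product.

```python
from enum import Enum, auto


class State(Enum):
    # Hypothetical state names taken from the airline example in the text.
    CITIES = auto()
    DATES = auto()
    FLIGHTS = auto()
    PAYMENT_INFO = auto()
    CHARGE_CARD = auto()
    OTHER_SERVICES = auto()
    DONE = auto()


# Assumption: each state has exactly one successor. A real dialogue
# would branch on user input, errors, and timeouts.
TRANSITIONS = {
    State.CITIES: State.DATES,
    State.DATES: State.FLIGHTS,
    State.FLIGHTS: State.PAYMENT_INFO,
    State.PAYMENT_INFO: State.CHARGE_CARD,
    State.CHARGE_CARD: State.OTHER_SERVICES,
    State.OTHER_SERVICES: State.DONE,
}


def run(start=State.CITIES):
    """Follow the transition table from `start` and return the states visited."""
    path = [start]
    state = start
    while state is not State.DONE:
        state = TRANSITIONS[state]
        path.append(state)
    return path


if __name__ == "__main__":
    for visited in run():
        print(visited.name)
```

Keeping the transitions in a table, separate from the code that walks them, is the same separation a dedicated flow language aims for: the flow can be read and changed without touching the logic of each step.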

State Chart eXtensible Markup Language (SCXML) is a new language being developed by the Voice Browser Working Group of the World Wide Web Consortium for modeling and controlling applications, including speech and multimodal applications, through the use of states. To get a feel for SCXML and its capabilities, I talked with Jim Barnett, a member of the architecture team at Genesys Telecommunications Laboratories and editor-in-chief of the SCXML specification.

DD: What is SCXML?
Barnett: It’s a work-flow language based on Harel statecharts. It can be used for describing any kind of a work flow, but we’ve usually had speech applications and VoiceXML 3.0 in mind while we’ve been developing it. 

DD: What capabilities does SCXML add to speech?
Barnett: SCXML doesn’t have built-in speech capabilities. It’s designed as a pure work-flow language; the platforms provide extensions or the capability to call other languages to provide things like speech recognition. SCXML gives speech apps a sophisticated flow-control capability. For example, SCXML makes it easy to do several things in parallel, so you can be interacting with the caller in the foreground and validating his PIN in the background at the same time. SCXML also has a useful go-back capability that lets you back up and start over if the interaction goes astray.   
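
The foreground and background pattern Barnett describes can be illustrated outside SCXML. The following is a rough Python sketch using `asyncio`, not an SCXML document; the prompt names, the `validate_pin` rule, and the helper functions are all hypothetical and stand in for whatever a real platform would provide.

```python
import asyncio


async def talk_to_caller(log):
    """Foreground task: play a couple of hypothetical prompts."""
    for prompt in ("greeting", "menu"):
        log.append(f"prompt:{prompt}")
        # Yield control so the background task gets a chance to run.
        await asyncio.sleep(0)
    return "dialogue finished"


async def validate_pin(pin, log):
    """Background task: stands in for a slow back-end PIN lookup."""
    await asyncio.sleep(0)
    ok = pin == "1234"  # Assumption: a made-up rule, for illustration only.
    log.append(f"pin_valid:{ok}")
    return ok


async def handle_call(pin):
    """Run the dialogue and the PIN check concurrently and collect both results."""
    log = []
    dialogue, pin_ok = await asyncio.gather(
        talk_to_caller(log),
        validate_pin(pin, log),
    )
    return dialogue, pin_ok, log


def run_call(pin):
    """Synchronous entry point for the sketch."""
    return asyncio.run(handle_call(pin))


if __name__ == "__main__":
    print(run_call("1234"))
```

In a statechart the two activities would be sibling regions of a parallel state rather than coroutines, but the effect on the caller is similar: the conversation continues while the check completes.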

DD: Can SCXML interoperate with VoiceXML?
Barnett: In general, yes. SCXML is designed to be able to call out to any other language. But existing VoiceXML interpreters most likely won’t be built to handle it. That will change as SCXML matures and companies update their product lines. By the time VoiceXML 3.0 comes out, SCXML and VoiceXML should work together out of the box.  

DD: Is SCXML limited to voice applications? 
Barnett: SCXML can be used for describing pretty much any process. When we started getting public feedback on early working drafts, we were surprised at how people were using it: for controlling medical devices and similar applications. On the other hand, there’s plenty of work on voice applications as well. Professor Torbjörn Lager and his students at Göteborg University in Sweden are doing a lot of research on dialogue, games, and the like. 

DD: What SCXML implementations are available?
Barnett: An open-source implementation is available in the Apache Commons and another at the University of Göteborg in Sweden. Intervoice has a commercial implementation, and several other companies are working on products.

DD: What tools are available for SCXML developers?
Barnett: A standard XML editor is your best bet. Any commercial implementation will come with tools, but I don’t know of any open-source development tool that is freely available right now. 

DD: What skills are needed by SCXML developers?
Barnett: A basic understanding of state machines is all you really need for the SCXML part of it. After that, you’ll need to understand the domain you’re working in, but that will depend on your application.

DD: When will the SCXML standard be finished?
Barnett: Beats me. Standards work is slow. It’ll be a couple of years before it’s wrapped up with a bow around it. But that isn’t stopping people from using it and even building products with it. The core of the language is pretty stable.

DD: How can Speech Technology readers learn more or get involved with SCXML?
Barnett: Check out the latest working draft at www.w3.org/TR/scxml/, then start playing around with the Apache or Göteborg implementation. There’s also a public mailing list that the W3C maintains at www.w3.org/Mail/Lists.

Deborah Dahl, Ph.D., is the principal at speech and language technology consulting firm Conversational Technologies and chair of the World Wide Web Consortium’s Multimodal Interaction Working Group. She can be reached at dahl@conversational-technologies.com.
