SCXML—A New Language for Controlling Processes

Designers have long used state transition systems (STSs) to represent processes. An STS consists of a set of states (usually drawn as ovals) and transitions (drawn as arcs) that connect pairs of states. Each transition has a condition that, when satisfied, causes the process to move from the state at the tail of the arc to the state at the head of the arc.
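
As a rough illustration, an STS of this kind can be sketched in a few lines of Python. The names used here (`Transition`, `StateTransitionSystem`, `step`) and the ask/confirm process are assumptions made for the example; they are not taken from any specification.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Context = Dict[str, object]


@dataclass
class Transition:
    source: str                           # state at the tail of the arc
    target: str                           # state at the head of the arc
    condition: Callable[[Context], bool]  # must be satisfied for the arc to fire


class StateTransitionSystem:
    def __init__(self, initial: str, transitions: List[Transition]) -> None:
        self.state = initial
        self.transitions = transitions

    def step(self, context: Context) -> str:
        """Fire the first arc leaving the current state whose condition holds."""
        for t in self.transitions:
            if t.source == self.state and t.condition(context):
                self.state = t.target
                break
        return self.state


# Hypothetical two-step process: collect an answer, then confirm it.
sts = StateTransitionSystem(
    initial="ask",
    transitions=[
        Transition("ask", "confirm", lambda ctx: ctx.get("answer") is not None),
        Transition("confirm", "done", lambda ctx: ctx.get("confirmed") is True),
        Transition("confirm", "ask", lambda ctx: ctx.get("confirmed") is False),
    ],
)
```

Each call to `sts.step(...)` passes in whatever is currently known; the process stays in its state until one of the outgoing conditions is satisfied.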

In 1987, logician David Harel extended STSs in three important ways¹:

  • A state may itself contain an STS.  Thus, designers may represent a hierarchy of STSs.
  • Two or more STSs may be processed in parallel.  Thus, designers may represent parallel processes.
  • A state may maintain history.  If the state is re-entered, it may access the information available when the state was previously exited.
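
The three extensions above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not an implementation of Harel's statecharts or of SCXML: the class names (`Machine`, `Composite`, `Parallel`), the event-driven arcs, and the call-with-nested-dialog example are all hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Arcs = Dict[Tuple[str, str], str]  # (source state, event) -> target state


class Machine:
    """A flat state machine driven by named events."""

    def __init__(self, initial: str, arcs: Arcs) -> None:
        self.initial = initial
        self.state = initial
        self.arcs = arcs

    def send(self, event: str) -> bool:
        """Follow the arc for this event, if one exists; report whether it fired."""
        target = self.arcs.get((self.state, event))
        if target is None:
            return False
        self.state = target
        return True


class Composite(Machine):
    """Hierarchy and history: some states contain a nested machine."""

    def __init__(self, initial: str, arcs: Arcs,
                 children: Dict[str, Machine],
                 history: Iterable[str] = ()) -> None:
        super().__init__(initial, arcs)
        self.children = children     # state name -> nested machine
        self.history = set(history)  # states that resume where they left off

    def send(self, event: str) -> bool:
        child = self.children.get(self.state)
        # The nested machine gets the first chance to handle the event.
        if child is not None and child.send(event):
            return True
        if not super().send(event):
            return False
        entered = self.children.get(self.state)
        if entered is not None and self.state not in self.history:
            entered.state = entered.initial  # no history: restart on re-entry
        return True


class Parallel:
    """Parallelism: every region receives every event."""

    def __init__(self, regions: List[Machine]) -> None:
        self.regions = regions

    def send(self, event: str) -> bool:
        # Build the full list first so that every region sees the event.
        return any([region.send(event) for region in self.regions])


def make_call(keep_history: bool) -> Composite:
    # Hypothetical example: a dialog nested inside the "connected" call state.
    dialog = Machine("greeting", {("greeting", "next"): "menu",
                                  ("menu", "next"): "order"})
    return Composite(
        "connected",
        {("connected", "hold"): "on_hold", ("on_hold", "resume"): "connected"},
        children={"connected": dialog},
        history=["connected"] if keep_history else [],
    )
```

With `make_call(keep_history=True)`, sending `next`, `hold`, and `resume` returns the caller to the `menu` step of the nested dialog; with `keep_history=False`, the dialog starts over at `greeting`.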

Figure 1.  Example SCXML code fragments that invoke VoiceXML Application 1 and then conditionally invoke VoiceXML Application 2.

Software developers have used statecharts (as Harel’s notation came to be known) to represent software designs in UML (the Unified Modeling Language²).  In July 2005, the W3C Voice Browser Working Group published the first working draft of State Chart XML (SCXML)³.  This language is a candidate for representing various types of processes, including:

  • Speech dialogs - For example, each state represents a prompt to be presented to the user and each transition represents a response spoken by the user.  The Voice Browser Working Group is considering using SCXML as a control language for speech user interfaces.
  • Call control flow - For example, each state represents a state of a telephone call (idle, connecting, connected, hang up, and so forth).  The Voice Browser Working Group is considering using SCXML as an event processing language in a future version of CCXML.
  • Multimodal user interfaces - For example, each state represents information presented to the user via speech or display, and each transition represents information entered by the user via speech, keyboard, or stylus.  The Multimodal Interaction Working Group may use SCXML as a conversation manager.
  • Metalanguage - For example, developers represent each software process as a state and represent the conditions for switching between processes as transitions.
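
The call control case above can be sketched as a simple transition table. The state names come from the list; the event names (`dial`, `answered`, `failed`, `disconnect`, `released`) are hypothetical and are not taken from CCXML or SCXML.

```python
from typing import Dict, List, Tuple

# (current call state, event) -> next call state.
# Event names are illustrative assumptions, not CCXML event names.
CALL_FLOW: Dict[Tuple[str, str], str] = {
    ("idle", "dial"): "connecting",
    ("connecting", "answered"): "connected",
    ("connecting", "failed"): "idle",
    ("connected", "disconnect"): "hang up",
    ("hang up", "released"): "idle",
}


def next_state(state: str, event: str) -> str:
    """Return the next call state, or stay put if the event does not apply."""
    return CALL_FLOW.get((state, event), state)


def run(events: List[str], state: str = "idle") -> List[str]:
    """Feed a sequence of events through the table and record each state visited."""
    trace = [state]
    for event in events:
        state = next_state(state, event)
        trace.append(state)
    return trace
```

Keeping the flow in a table rather than in nested conditionals mirrors the ovals-and-arcs picture: each entry is one arc.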

SCXML has a visual representation, so dialog designers can drag and drop ovals and arcs to represent control structures, and add annotations to represent conditions, named processes, and parameters.  Figure 2 shows a possible visual representation of the textual SCXML in Figure 1.

Figure 2.  Graphical representation of SCXML code from Figure 1.

SCXML unifies the process control structures used in several areas, including VoiceXML, CCXML, multimodal user interfaces, and metalanguages, as well as in traditional high-level software design.  In the future, speech application designers may use SCXML to specify the control structures at various levels of speech and multimodal applications.
1 Harel, David, “Statecharts: A Visual Formalism for Complex Systems,” http://www.wisdom.weizmann.ac.il/~dharel/SCANNED.PAPERS/Statecharts.pdf
2 Rumbaugh, James, Ivar Jacobson, and Grady Booch, Unified Modeling Language Reference Manual, 2nd Edition, Pearson Education, 2004.
3 W3C Voice Browser Working Group Web site http://www.w3.org/voice
Dr. James A. Larson is manager of advanced human input/output at Intel Corporation and author of The VoiceXML Guide. He can be reached at jim@larson-tech.com and his Web site is http://www.larson-tech.com .