
Building User Interfaces for Multiple Devices

Users can choose from among several devices to access the World Wide Web. These devices include PCs with a visual Web browser, such as Netscape's Navigator or Microsoft's Internet Explorer, that interprets HTML files downloaded from a Web server and executed on the PC; telephones and cell phones with a verbal browser that interprets VoiceXML files downloaded from a Web server and executed on a voice server; and WAP phones with a visual browser that interprets WML files downloaded from a server and executed on the WAP phone. Additional devices, expected to appear shortly, include PDAs with wireless connections to a server that supports both visual and audio interfaces; wearable devices, such as a display and microphone on a wristband and a speaker in the user's ear, that support both visual and audio interfaces; and other devices or combinations of devices that support visual and audio user interfaces.

How can developers create a single application that supports these various devices? By separating the application implementation from each user interface implementation. An architecture that supports multiple devices with differing modes of input/output consists of:

  • A database—Enterprise information available to multiple applications. The database is a permanent repository for this shared data.
  • An application—Business logic that manipulates and manages data in the database. The application contains the business rules and logic for accessing the database, calculating information, and performing application-specific functions.
  • Multiple user interfaces—A user interface specific to each of the diverse devices through which users access the application. Each user interface supports user input (keyboard, speech, handwriting, and selecting), output (display and sound), and dialogue (system-directed dialogue in which the user answers questions, user-directed dialogue in which the user enters commands, or mixed-initiative dialogue, which combines user-directed and system-directed dialogue).
Each user interface supports different requirements. For example, a telephone user interface may require a system-directed dialogue in which the user responds to a series of questions, while a PC user interface may be user-directed, with the user selecting and initiating actions. To isolate the application from each of the user interfaces, the application should support a single data structure that can be used by every user interface. One such format is XML. Data expressed in the XML format is translated to the format required by each user interface. For example, consider a flight query application in which the user requests the arrival time and gate for a specific flight. This data could be described using XML tags, as illustrated in Figure 2.
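Because Figure 2 is not reproduced here, the following minimal sketch suggests how such flight data might be marked up; the element names and sample values are assumptions for illustration, not the article's actual tags.

<?xml version="1.0"?>
<!-- Hypothetical flight arrival data; element names and values are
     illustrative placeholders, not the actual markup of Figure 2. -->
<flight>
  <flightNumber>UA 213</flightNumber>
  <arrivesFrom>Chicago O'Hare</arrivesFrom>
  <arrivalTime>4:35 PM</arrivalTime>
  <arrivalGate>C7</arrivalGate>
</flight>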
The arrival information can be extracted, translated, and presented to the user as one or more controls. A control (sometimes called an interactor or widget) is a technique for presenting information to the user, soliciting information from the user, or both. Different devices use different controls. For example:

  • The PC user interface displays an HTML table—a two-dimensional array of values.
  • The telephony application presents the four information items as a VoiceXML verbal prompt—a verbal message created by a speech synthesizer or replayed from a voice file.
  • The WAP application presents a card—a collection of information that is displayed on the small screen of a WAP device.
  • The multimodal application presents an animation and a speech prompt.
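As a concrete illustration, the WAP card might be written in WML as in the following minimal sketch; the card content and structure are assumptions, not reproduced from the article's figures.

<?xml version="1.0"?>
<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN"
  "http://www.wapforum.org/DTD/wml_1.1.xml">
<!-- A hypothetical WML card presenting the flight arrival data. -->
<wml>
  <card id="arrival" title="Flight Arrival">
    <p>Flight UA 213 from Chicago O'Hare arrives 4:35 PM at gate C7.</p>
  </card>
</wml>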

Controls are the basic components of a dialogue. A dialogue enables the user to perform a task by interacting with the application using a sequence of controls. Some dialogues present controls in parallel—such as the simultaneous display of multiple tables on a PC screen. Other dialogues present controls serially—for example, a sequence of verbal questions and answers in a telephony user interface. To design each of the user interfaces, the designer must perform the following four activities:
1. Extract the data to be presented via the user interface.
The designer determines that the flight number, departure airport, arrival time and arrival gate should be extracted from the application for presentation to the user.
2. Select controls appropriate for the user interface.
Each device supports controls that are specific to its user interface. Continuing with the example above, the designer selects a different presentation control for each of the four devices. The designer determines that a table is the appropriate control for presenting information on a PC screen. A verbal prompt is used to present the information verbally to a telephone user. A card is appropriate for presenting the information to the WAP telephone user. For the multimodal device, the designer creates an animation showing red dotted lines indicating the direction the user should follow, along with a verbal message presenting the flight number, departure airport, arrival time, and arrival gate.
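For instance, the PC table control might be rendered with HTML such as the following sketch; the markup and values are assumptions, not the article's actual figure.

<!-- A hypothetical HTML table presenting the extracted flight data
     in a PC browser. -->
<table border="1">
  <tr>
    <th>Flight Number</th><th>Arrives From</th>
    <th>Arrival Time</th><th>Arrival Gate</th>
  </tr>
  <tr>
    <td>UA 213</td><td>Chicago O'Hare</td><td>4:35 PM</td><td>C7</td>
  </tr>
</table>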
3. Construct a dialogue by combining controls.
For this example, the sequence of control presentation is elementary for the PC and WAP users: a single table is presented to the PC user, and a single card is presented to the WAP user. Verbal interfaces require that information be presented sequentially, so the designer presents the flight number, departure airport, arrival time, and arrival gate number in a VoiceXML prompt. The multimodal user interface displays a moving dotted line while the user views a visual message in a box and hears a verbal message.
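A minimal VoiceXML sketch of such a prompt appears below; the document structure and wording are assumptions for illustration.

<?xml version="1.0"?>
<!-- A hypothetical VoiceXML document that speaks the four extracted
     items in a single system-directed prompt. -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="arrivalInfo">
    <block>
      <prompt>
        Flight UA 213 from Chicago O'Hare arrives at 4:35 PM at gate C7.
      </prompt>
    </block>
  </form>
</vxml>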
4. Add media-specific "decorations" to the data to be presented via each user interface.
Each interface is unique in the decorations it presents to the user. Decorations—control-specific specifications such as the font size and color of the HTML table, the gender and pitch of the synthesized voice in the VoiceXML prompt, and the color of the animation in the multimodal control—make the presentation pleasant for the user while emphasizing the extracted information. Figure 7 summarizes the four activities for designing each of the four user interfaces.

Figure 7. The four design activities for each of the four user interfaces (PC, Telephone, WAP, and multimodal [MM]).

1. Extract
   All four devices: Flight Number, Arrives From, Arrival Time, Arrival Gate
2. Select controls
   PC: HTML table
   Telephone: VoiceXML prompts
   WAP: WAP card
   MM: Multimodal animation and voice prompts
3. Construct the dialogue
   PC: Display a single table
   Telephone: Present a verbal prompt
   WAP: Display a card
   MM: Display a multimodal animation and voice prompt
4. Add decorations
   PC: Color, font size, font type, and position of the table on the screen
   Telephone: Voice, volume, and speaking rate
   WAP: Card format
   MM: Graphics for the animation; voice, volume, and speaking rate for the prompt

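To illustrate the verbal decorations, the earlier hypothetical VoiceXML prompt could be decorated with SSML voice and prosody markup; the gender, pitch, and rate values below are assumptions.

<!-- The hypothetical prompt decorated with SSML elements that select
     the gender, pitch, and speaking rate of the synthesized voice. -->
<prompt>
  <voice gender="female">
    <prosody pitch="high" rate="slow">
      Flight UA 213 from Chicago O'Hare arrives at 4:35 PM at gate C7.
    </prosody>
  </voice>
</prompt>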
The extracted values are transferred from the application to each of the user interfaces, or a complete user interface is generated from the extracted values. Using current Web technology, it is possible to generate the user interface dynamically from XML information supplied by the application. For example, Extensible Stylesheet Language Transformations (XSLT) is a W3C markup language used to specify such translations, and Active Server Pages (ASP) or another dynamic page generation facility can be used to generate the user interfaces illustrated in Figures 3-6.

With the careful separation of the user interface dialogue and input/output controls from an application, developers can create multiple user interfaces for different devices that enable users to access the same application. Each user interface has its own dialogue and user interface controls. As new types of devices become available, designers need only create new controls and the corresponding transformations so that each new device can access the existing application without modifying it. In effect, the XML format isolates the application from the various user interfaces and also isolates the user interfaces from each other.
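As an example of the XSLT translation mentioned above, the following sketch transforms the hypothetical flight markup into an HTML table for the PC user interface; the element names match the earlier assumed sketch rather than the article's actual figures.

<?xml version="1.0"?>
<!-- A hypothetical XSLT stylesheet that translates the flight XML
     into an HTML table for the PC user interface. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/flight">
    <table border="1">
      <tr>
        <th>Flight Number</th><th>Arrives From</th>
        <th>Arrival Time</th><th>Arrival Gate</th>
      </tr>
      <tr>
        <td><xsl:value-of select="flightNumber"/></td>
        <td><xsl:value-of select="arrivesFrom"/></td>
        <td><xsl:value-of select="arrivalTime"/></td>
        <td><xsl:value-of select="arrivalGate"/></td>
      </tr>
    </table>
  </xsl:template>
</xsl:stylesheet>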
Dr. James A. Larson is chairman of the W3C Voice Browser Working Group. He is the author of "Developing Speech Applications Using VoiceXML" and teaches courses in user interfaces and speech applications at Portland State University and Oregon Health & Science University. He may be contacted at http://www.larson-tech.com.
