Rob Kassel, Product Marketing Manager of Emerging Technologies, SpeechWorks; and Albert Kooiman, Director of Business Development, Philips.
Q What are your goals with SALT?
A-Kassel SpeechWorks sees multimodal applications, which combine speech input and output with graphical displays, as an emerging market for our speech recognition and text-to-speech technologies. We are working with several partners on projects to realize the powerful, ubiquitous information access that multimodal applications promise. We expect the SALT specification will provide a standard means of writing multimodal applications that run on a wide variety of devices, both existing and new, thus speeding development while encouraging adoption.
A-Kooiman The SALT Forum aims to create an open, royalty-free, platform-independent specification to extend existing Web markup languages (in particular HTML and XHTML) for speech input and output in telephony and multimodal applications. These applications run on a wide variety of devices, from traditional servers to handhelds such as mobile phones or PDAs.
For Philips, the SALT specification furthermore serves its goal by setting a standard for building solutions that not only use existing and widely adopted Internet standards for application creation and system integration, but also allow Philips to deploy advanced Natural Language Processing (NLP) concepts, such as natural language input and natural dialogue. With NLP, people can speak spontaneously, hesitate and correct themselves, instead of having to stick to a fixed sequence in the sentence spoken. Users can also take advantage of natural dialogue, whereby people are not forced into fixed menus and can take initiative to define the flow of the dialogue.
Q VoiceXML is currently supported by many companies. Why was the development of SALT necessary?
A-Kassel The SALT specification is designed specifically to work harmoniously with other markup languages that will be used in multimodal applications. This allows SALT developers to seamlessly blend speech interfaces with visual interfaces.
A-Kooiman The SALT specification has broader goals and runs in different environments. Instead of defining a new language with a new execution model and semantics, as in the case of VoiceXML, the SALT specification is a set of lightweight extensions that can be integrated into a variety of markup languages and run on very different execution environments. Thus, the application does not have to be changed for each modality, or each type of client device.
In addition, Web developers do not have to learn a new language, so those developers that have adopted the SALT specification do not need to specify and implement completely new browsers.
This model supports speech input as well as other input and output modalities. The multi-modal and multi-device approach, together with the concept of re-using existing Web execution environments, makes the SALT specification a perfect fit for the Web world.
Because VoiceXML and the SALT specification differ in the form of the markup, programming and execution model, and the level of the programming interface available to the developer, the two specifications can both exist in their own right. To a certain extent, VoiceXML is a logical evolution from the proprietary IVR application creation environment to an XML-based application language that is portable across platforms.
However, this portability has tended to make the VoiceXML execution model, which has the Form Interpretation Algorithm (FIA) at its core, solely server-based, because it can make browsers "heavy." It also restricts the NLP technology Philips already offers.
These lightweight extensions to existing markup languages, in combination with the intelligence written within scripts such as ECMAScript or Java, make SALT-enabled technology an attractive and powerful option. Since NLP technology can run on an Internet Application Server, as well as re-use a high degree of existing Web-based application code, speech becomes simply an added modality.
It is important to note that the SALT specification incorporates a great deal of development work from current standardization efforts originating in the Voice Browser Group of the World Wide Web Consortium (W3C), mostly in the area of speech. Examples of this are the grammar formats for automatic speech recognition (ASR) and text-to-speech (TTS).
Also, the Call Control implementation of the specification allows for a fully object-oriented approach with libraries provided by card manufacturers in combination with existing XML call control standards, such as the XML format of the CSTA specification (ECMA-323).
Q What is the next phase for SALT?
A-Kassel The SALT Forum is working diligently to complete the SALT specification by mid-year as planned. Once that is done, the Forum intends to submit the SALT specification to one or more international standards bodies in the hopes that it will eventually become a formal open standard.
A-Kooiman The Technical Working Group (TWG) of the SALT Forum has nearly finished its proposed 1.0 specification, which is planned to be published by midyear.
The intention of the Board of the SALT Forum is to submit this 1.0 specification to one or more of the most appropriate standards bodies. In doing so, the SALT Forum strives for the widest adoption possible by receiving and incorporating input from the broadest array of organizations committed to furthering the specification.
The SALT Forum will begin encouraging adoption of the SALT specification by establishing an Adopters program that will allow organizations to work with the specification on a royalty-free basis.
The TWG will continue to serve as a group in which Adopters can help develop a common vision and strategy with regard to the specification. Furthermore, the TWG will serve as the forum for creating new SALT profiles, providing more sample code and acting as an exchange for clarifications as well as bug fixes.
The specification beyond Version 1.0 will of course be part of the standardization effort mentioned before.
All work of the TWG will be shared for educational purposes on a developer web site that is planned to include tutorials, FAQs, demos, sample applications, code libraries, etc.
In the longer term, the TWG will look at conformance testing, and every six months, we will take stock of the situation and re-charter ourselves towards new goals if necessary.
However, Philips recognizes that the most important metric of the SALT specification's success is implementations of solutions based on the SALT 1.0 specification. To this end, Philips is working on products and solutions to be deployed in the field as early as this year.
Q What types of comments are you getting on Version 0.9 of the SALT specification?
A-Kassel The comments we've received on the preliminary SALT specification have generally been favorable. The SALT specification uses existing and popular draft standards whenever practical, and some were pleasantly surprised to see that approach. Several have commented on the minimalist, elegant design that speeds implementations. Many strong supporters of VoiceXML have commented that there is much merit to the SALT specification.
A-Kooiman Philips has received a great deal of positive feedback on both the basic principles of the specification and the ease of upgrading from telephony applications to multimodal applications using the SALT specification.
From its partner organizations, Philips has received many requests for clarification, mainly centered on the advantages of the SALT specification over VoiceXML. We've also received questions on technical details concerning the SALT elements and objects, such as how they affect existing platforms, prompting mechanisms and call control.
The SALT Forum has received feedback from the Contributors that joined the group after Version 0.9 was published. The feedback has been discussed in the TWG and will be reflected in the 1.0 specification.
Q Much has been made about SALT being royalty-free. Why do you think this is important?
A-Kassel Royalty-free standards are easier for companies to embrace. The more companies that embrace a standard, the healthier the resulting market will be. The healthier the market, the easier it is for customers to embrace the underlying technology.
A-Kooiman As far as the SALT Forum Members are concerned, we feel that offering the SALT specification on a royalty-free basis will assure the widest possible adoption.
Q What does SALT mean for enterprises and service providers who would deploy speech solutions?
A-Kassel The SALT specification will enable an entirely new class of speech solutions with unprecedented appeal. Handheld devices will soon feature wireless connectivity for both data and voice along with high-resolution color graphic displays. Consumers will come to expect continuous access to the Internet through such devices, and they will expect easy access using spoken commands rather than a keyboard. The SALT specification is a critical piece of the infrastructure to support this vision.
A-Kooiman The SALT specification, as noted, is based on two principles: it is a lightweight extension of existing markup languages, and it uses scripts for all application logic. This has many positive effects for the enterprise - most notably, an enterprise organization can take an existing Web or WAP application and easily enhance it with speech input and output.
This will likely support high re-usability of the application code of other modalities. While the user interface will be different, most application logic can be re-used.
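The re-use described above can be pictured with a minimal sketch. It is written in Python rather than in SALT markup and script, and every name in it (Mailbox, file_message, on_button_click, on_spoken_command) is a hypothetical illustration, not part of the SALT specification. It assumes a recognizer that delivers text shaped like "file <subject> in <folder>". The point is only that two different front ends call the same application logic.

```python
from dataclasses import dataclass, field


@dataclass
class Mailbox:
    """Modality-independent application state (hypothetical model)."""
    folders: dict[str, list[str]] = field(
        default_factory=lambda: {"inbox": [], "archive": []}
    )

    def file_message(self, subject: str, folder: str) -> bool:
        """Shared application logic: move a message out of the inbox."""
        if subject not in self.folders["inbox"] or folder not in self.folders:
            return False
        self.folders["inbox"].remove(subject)
        self.folders[folder].append(subject)
        return True


def on_button_click(mailbox: Mailbox, subject: str, folder: str) -> str:
    """Visual front end: returns the status text shown on screen."""
    ok = mailbox.file_message(subject, folder)
    return f"Moved '{subject}' to {folder}" if ok else "Could not move message"


def on_spoken_command(mailbox: Mailbox, utterance: str) -> str:
    """Speech front end: returns the prompt text to be spoken.

    Assumes recognized text shaped like 'file <subject> in <folder>'.
    """
    text = utterance.lower().strip()
    if not text.startswith("file "):
        return "Sorry, I did not understand."
    # Split on the last ' in ' so subjects may themselves contain 'in'.
    subject, sep, folder = text[len("file "):].rpartition(" in ")
    if not sep:
        return "Sorry, I did not understand."
    ok = mailbox.file_message(subject, folder)
    return f"Filed {subject} in {folder}." if ok else "I could not file that message."
```

Only the two handler functions differ between modalities; file_message is untouched, which is the kind of re-use the answer has in mind.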
From the perspective of the solution providers, applications and platforms become increasingly de-coupled. Solution providers will likely begin investing in each of these two elements independently because large, proprietary, "lock-in" systems will become less attractive. The telephony server, for example, will contain fewer and fewer applications, for they will run on the Internet application server.
Q What are some of the problems SALT was designed to address?
A-Kassel The SALT specification was specifically designed to address the problem of accessing web content via spoken commands. This is applicable to the desktop, but it is necessary in handheld devices and automotive applications.
A-Kooiman Different clients (e.g. PDAs, mobile phones, desktops) have very different capabilities. In our increasingly mobile world, it is important to consider these various devices. Because speech is just one modality among many, the SALT specification was designed to expand its focus beyond speech to integrate these various other modalities.
For the most part, multi-modality requires an asynchronous execution model. The SALT specification supports this, but it also needed to be designed to run in very different environments, such as SMIL, that do not support scripting. All this had to be fulfilled by re-using as many existing Web standards as possible.
Q Describe your version of multi-modality. Why is multi-modal, multi-device an important concept?
A-Kassel Multimodality is ultimately the ability to access any information in any situation. For example, when in a meeting, you might want to view e-mail messages on a screen and file them by pressing on-screen buttons. However, while driving you might prefer to hear messages read to you and file them by speaking commands. Multimodal applications allow users to complete more tasks, no matter where they may be, in the manner that suits them best.
A-Kooiman Philips wants to be a high growth technology company that plays in a space we have named Ambient Intelligence. Ambient Intelligence is defined by 'ubiquitous computing,' with electronic notepads and whiteboards everywhere, swarms of embedded micro devices and walls painted with 'smart electronic dust'. What allows these different, distributed technology components to be everywhere, all the time, is their ability to work together transparently.
This can only be useful to people if ubiquitous computing devices have intelligent and social user interfaces that are multimodal by design and easy to use. Speech interfaces can be personalized and are able to reflect the emotion and experience of their users.
The SALT specification allows users to build these intelligent and social user interfaces with speech in- and output and embed them into the glue that brings all those components together: XML.
Q Why is it important to use event-wiring models for Web developers?
A-Kassel Speech is inherently linear in that effective spoken communication requires turn-taking by all participants. Conversation structure simplifies how speech-only applications are built today. However, multimodal applications allow speech to be interspersed with other actions, and these must be integrated to form a correct response. An event-driven model simplifies supporting such free-form interaction.
A-Kooiman The telephony world is mostly asynchronous. Consider, for example, call control: messages are sent to and received from the call control module asynchronously. This is also true for multi-modal environments. Different modalities are integrated by messages being passed back and forth that can come at any point in time from any modality. The Web event model adopted by the SALT specification is perfectly suited to fulfill these requirements. Events are messages, and the application can react to these messages asynchronously by defining event handlers. Most importantly, this model is well known to Web developers - every HTML programmer is familiar with this technology.
Q What is the estimate of the total number of developers SALT could impact?
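As a rough illustration of "events are messages" handled by registered handlers, here is a minimal Python sketch. It is not SALT's API: the EventBus class and the event names ("speech.recognized", "gui.click", "call.disconnected") are assumptions made up for this example. Three sources post messages at arbitrary times, and the application reacts through handlers no matter which modality speaks first.

```python
import asyncio
from collections import defaultdict
from typing import Callable


class EventBus:
    """Tiny stand-in for a Web-style event model (hypothetical, not SALT's API)."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)
        self._queue: asyncio.Queue = asyncio.Queue()

    def on(self, event_type: str, handler: Callable[[dict], None]) -> None:
        """Register a handler, much like wiring an onclick attribute in HTML."""
        self._handlers[event_type].append(handler)

    async def post(self, event_type: str, detail: dict) -> None:
        """Any modality may post a message at any time."""
        await self._queue.put((event_type, detail))

    async def run(self, expected: int) -> None:
        """Dispatch events in arrival order until `expected` have been handled."""
        for _ in range(expected):
            event_type, detail = await self._queue.get()
            for handler in self._handlers[event_type]:
                handler(detail)


async def demo() -> dict:
    form: dict = {}  # shared form that every modality fills in
    bus = EventBus()
    bus.on("speech.recognized", lambda d: form.update(city=d["text"]))
    bus.on("gui.click", lambda d: form.update(date=d["value"]))
    bus.on("call.disconnected", lambda d: form.update(hung_up=True))

    async def source(delay: float, event_type: str, detail: dict) -> None:
        # Simulates a modality that produces its message after some delay.
        await asyncio.sleep(delay)
        await bus.post(event_type, detail)

    await asyncio.gather(
        bus.run(expected=3),
        source(0.02, "speech.recognized", {"text": "Boston"}),
        source(0.01, "gui.click", {"value": "Friday"}),
        source(0.03, "call.disconnected", {}),
    )
    return form
```

Calling asyncio.run(demo()) returns the filled-in form. The handlers never need to know the order in which speech, GUI and call control messages arrive, which is the property the answer attributes to the event model.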
A-Kassel Every web developer in the world.
A-Kooiman Given the fact that developers of speech-enabled applications are currently working with proprietary IVR technology, VoiceXML and now the SALT specification, they will all have to get a better understanding of the design principles and inherent advantages and disadvantages of each. This affects tens of thousands of people.
On one hand, writing SALT applications will become mainstream through the specification's inclusion in development tools like Visual Studio. On the other hand, the SALT specification will create an environment for professional application developers with a lot of know-how on adding speech to professional web applications.
To this end, Philips is preparing to provide the necessary services and tools to its partners to create these professional applications for its carrier customer base as well as its enterprise customers.
Q How does SALT enhance the user experience?
A-Kassel The SALT specification gives the user a more powerful interface that is also more natural. Have you ever known what you wanted to accomplish in an application but not known how to perform that task? The SALT specification solves that problem by giving control through simple spoken commands.
A-Kooiman It will provide a key element for bringing speech technology into the mainstream.