4 Speech Technology Standards That Need to Happen ASAP
It’s an understatement to say that speech, natural language, and multimodal technologies are rapidly evolving. New development tools like Amazon’s Alexa Skills Kit, Nuance Mix, and IBM Watson appear on a regular basis. Many basic language technologies are also becoming available as open-source code. For enterprises that want to develop conversational applications, this rapidly developing ecosystem raises important questions: How can these tools work together? And if our current platform doesn’t meet our requirements, what will it take to migrate to a different one?
Right now, there aren’t good answers. Making systems work together means developing integration code, and migrating to a new platform means doing a possibly significant amount of rework. It doesn’t have to be this way.
Some argue that standards stifle innovation in new technologies. I believe they do the opposite: by factoring out unimportant variations between systems, standards let developers focus on creating new functionality rather than reinventing the wheel. When technologies are new, developers explore a variety of solutions to common problems; before long, it becomes obvious that those solutions share similar features. That convergence points to a place for a standard.
When this realization hits a technical community, what’s next? The first step is for the community to talk over common problems. The World Wide Web Consortium (W3C) sponsors workshops on emerging technical issues related to the web. The W3C also offers free, easy-to-start community groups that can focus on any web-related topic. This initial discussion doesn’t have to be sponsored by a formal standards organization. Technical conferences are also a good place to get started. These might include satellite workshops, birds-of-a-feather lunches, or even just hallway conversations. Once it’s clear there’s interest, a more structured group can be organized. These groups can publish notes or white papers to get feedback about their ideas. If some participants are willing to implement open-source reference code, it can give an emerging standard a huge boost.
There are quite a few interesting areas for potential community agreement on aspects of speech technology. Here are four.
2. Semantic representation. This is closely related to natural language toolkit output but is concerned with representing actual meanings, as opposed to creating a common output format. An effort addressing this problem is currently sponsored by ISO, and an “Interoperable Semantic Annotation” workshop was due to take place at the Conference on Computational Semantics.
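To make the distinction concrete, here is a minimal sketch of what a shared semantic representation could buy us. The two platform outputs and the common-frame schema below are invented for illustration; they are not drawn from any published standard, and real platform formats differ in detail:

```python
# Two hypothetical NLU outputs for the utterance "book a flight to Boston",
# loosely styled after different platforms. The formats differ, but the
# underlying meaning is the same.
platform_a = {"intent": "BookFlight",
              "slots": {"Destination": {"value": "Boston"}}}
platform_b = {"intentName": "BOOK_FLIGHT",
              "entities": [{"type": "CITY", "text": "Boston"}]}

def to_common_frame(intent, destination):
    """Normalize into a platform-neutral semantic frame (illustrative schema)."""
    return {"act": "request",
            "intent": intent,
            "arguments": {"destination": destination}}

frame_a = to_common_frame("BookFlight",
                          platform_a["slots"]["Destination"]["value"])
frame_b = to_common_frame("BookFlight",
                          platform_b["entities"][0]["text"])

# With an agreed-upon semantic representation, both platforms converge on
# one meaning, and downstream dialogue logic only has to handle one format.
assert frame_a == frame_b
```

The point is not this particular schema but the mapping step: with a standard target representation, each platform needs one adapter, rather than every pair of systems needing custom integration code.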
3. Better defined dialogues. VoiceXML was a groundbreaking standard that is still widely used in IVR applications, but technology has moved a long way since VoiceXML was standardized. Dialogues defined in VoiceXML generally assume an underlying speech grammar that defines expected inputs in a way that doesn’t take advantage of the capabilities of current speech recognition technology. On the other hand, current proprietary approaches are also limited and don’t go much beyond simple question-response interchanges. Can we create a dialogue standard that supports a richer dialogue style, taking better advantage of current speech and language technologies? The W3C Voice Interaction Community Group and the AVIOS Advanced Dialog Forum are currently looking at this problem.
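To illustrate the limitation described above, here is a minimal VoiceXML fragment of the kind still common in IVR applications. The form collects one value, and the recognizer will only accept inputs covered by the referenced grammar (the grammar file name is a placeholder):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="transfer">
    <field name="amount">
      <!-- Recognition is constrained to a predefined SRGS grammar;
           anything the caller says outside it is a no-match. -->
      <grammar src="amounts.grxml" type="application/srgs+xml"/>
      <prompt>How much would you like to transfer?</prompt>
      <filled>
        <prompt>Transferring <value expr="amount"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```

This grammar-per-field style made sense when recognizers needed tightly constrained inputs, but it is exactly what prevents these dialogues from exploiting today’s large-vocabulary, open-ended speech recognition.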
4. Interoperable intelligent agents. The Voice Interaction Community Group is also discussing how intelligent agents could interoperate. For example, how does a generic agent, such as Siri or Alexa, find a specific enterprise agent (like your bank’s customer support agent) that can help you with a specific problem?
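No such interoperability standard exists yet, which is the point of the discussion. Purely as a thought experiment, the hand-off might look something like the sketch below; the registry, its lookup function, and every field name are invented for illustration:

```python
# Hypothetical agent discovery: a generic assistant consults a shared
# registry to find an enterprise agent that advertises the needed
# capability, then hands the user's request off to it. Everything here
# (registry contents, capability names) is invented for illustration.
REGISTRY = {
    "bank.example.com": {"capabilities": ["check_balance", "dispute_charge"]},
    "airline.example.com": {"capabilities": ["book_flight", "check_status"]},
}

def find_agent(capability):
    """Return the first registered agent advertising the capability, else None."""
    for agent, info in REGISTRY.items():
        if capability in info["capabilities"]:
            return agent
    return None

# A generic assistant could route "I want to dispute a charge" like this:
handler = find_agent("dispute_charge")
print(handler)
```

A standard would need to define at least the registry (or another discovery mechanism), the vocabulary of capabilities, and the message format for the hand-off itself.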
The key to a successful standards effort is a group of dedicated individuals who are deeply motivated to create a standard that addresses real problems without limiting new ideas. If you are intrigued by any of these ideas, or if you have other ideas for standards that could solve a problem you’re facing, join one of the discussions listed above, or start your own.
Deborah Dahl, Ph.D., is principal at speech and language consulting firm Conversational Technologies and chair of the World Wide Web Consortium’s Multimodal Interaction Working Group. She can be reached at firstname.lastname@example.org.