May 8, 2015
By Deborah Dahl, Principal, Conversational Technologies
Standards

Talking to Everything: User Interfaces for the Internet of Things

An increasing number of objects can connect to and interact with other objects and with people. This Internet of Things (IoT) has certainly been the subject of plenty of hype, but its potential is real. Although some basic technology for supporting it is in place, the full potential of this world of connected objects has not quite been realized. Obviously, we need to have connectable objects in the first place, but this is well under way. According to Intel’s 2014 earnings report, the IoT is a $2.1 billion business for Intel alone, up 19 percent from the previous year.

Less obvious is the requirement for a way for people to interact with connected objects. Many connected things will communicate directly with each other, but at some point, people will need to interact with many of them. For this, there has to be a user interface, most likely in the form of an app. But app-based user interfaces (UIs) pose a problem: if there are a few hundred things in our environments that we want to interact with and each has its own user interface, the total number of user interfaces will be overwhelming. Worse, popular connected objects may have more than one UI. Right now there are more than 100 apps in the iTunes Store and Google Play Store for controlling just one type of light bulb, the Philips Hue. Finding and using the right app for all of our connected objects will be very difficult. To manage this, we need a natural, generic means of communication that doesn’t require installing and learning to use hundreds of apps.

Natural language, especially spoken language, is a good way to communicate with many kinds of connected objects. You can say, “Turn the light in the bedroom on.” Or you can say, “Is the garage door closed?” without scrolling through a lot of screens. In fact, natural language is starting to become an important user interface in personal assistant applications such as Amazon Echo, VoicePod, and other home control systems. Voice interfaces in cars, such as Ford Sync or Lexus Voice Command, provide a general-purpose user interface to sets of services in the car. Personal assistants such as Apple's Siri provide a similar aggregation function for many Web services, and it's reasonable to predict that personal assistants will eventually control connected objects as well. Before any of this can be successful, though, personal assistant APIs will have to become more open to developers, so that developers can connect the assistant to new objects and services.

The number and variety of possible connected objects make it unrealistic to expect a single company to provide interfaces to more than a fraction of them. But right now Siri is completely closed, Microsoft Cortana only supports launching (not interacting with) an app from its interface, and Google Now's API is only open to selected partners. A few smaller personal assistant platforms (Xowi and Pandorabots) do have open APIs, and more personal assistant APIs will become open as time goes by. But just being open isn't enough. IoT interfaces will be much easier to develop if the results they produce are not only available to developers, but are in a standard format too. That way developers won't have to work with completely different API formats for all the current personal assistants and all those that may come along in the future. In fact, there is a standard for speech understanding results that would be an excellent interface between personal assistants and connected objects: EMMA (Extensible Multimodal Annotation), a standard from the World Wide Web Consortium. EMMA provides a standard way to represent not only user intents but also important metadata about user inputs, such as alternative results, confidences, timestamps, the processor used to create the results, and more.
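To make this concrete, here is a minimal sketch of what an EMMA result for "Turn the light in the bedroom on" might look like, and how a developer could read it with Python's standard library. The EMMA namespace and the emma:confidence, emma:tokens, emma:start, emma:end, emma:process, emma:medium, and emma:mode annotations come from the EMMA 1.0 specification. Everything else is an assumption made for illustration: the payload elements (action, device, location), the confidence values, the timestamps, the example.com processor URI, and the parse_emma helper are all hypothetical, since EMMA leaves the application-specific content of each interpretation up to the developer.

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"  # EMMA 1.0 namespace

# Hypothetical output of a speech/language processor: two alternative
# interpretations of one utterance, each with its own confidence.
# The <action>/<device>/<location> payload is invented for this sketch.
SAMPLE_EMMA = """\
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:one-of id="nbest" emma:medium="acoustic" emma:mode="voice"
               emma:start="1431072000000" emma:end="1431072002500"
               emma:process="http://example.com/asr-nlu">
    <emma:interpretation id="int1" emma:confidence="0.82"
                         emma:tokens="turn the light in the bedroom on">
      <action>turn_on</action>
      <device>light</device>
      <location>bedroom</location>
    </emma:interpretation>
    <emma:interpretation id="int2" emma:confidence="0.11"
                         emma:tokens="turn the light in the bathroom on">
      <action>turn_on</action>
      <device>light</device>
      <location>bathroom</location>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
"""


def parse_emma(xml_text):
    """Return all interpretations, highest confidence first."""
    root = ET.fromstring(xml_text)
    results = []
    for interp in root.iter(f"{{{EMMA_NS}}}interpretation"):
        results.append({
            "confidence": float(interp.get(f"{{{EMMA_NS}}}confidence", "0")),
            "tokens": interp.get(f"{{{EMMA_NS}}}tokens", ""),
            # Application-specific payload: one slot per child element.
            "slots": {child.tag: (child.text or "").strip() for child in interp},
        })
    results.sort(key=lambda r: r["confidence"], reverse=True)
    return results


results = parse_emma(SAMPLE_EMMA)
best = results[0]
print(best["slots"], best["confidence"])
```

Because the alternatives and their confidences travel with the result, code that controls a connected object could decide for itself whether to act on the top interpretation or ask the user to confirm, regardless of which personal assistant produced the EMMA document.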

If the output of the speech and language processing components of personal assistants were in the form of EMMA, developing new interfaces to connected objects would be much simpler because developers would only have to work with EMMA-formatted results. Combining natural language interfaces with a standard output format such as EMMA will dramatically accelerate the growth of the IoT by making it easier for users and developers to connect with the enormous variety of current and future connected objects.

Deborah Dahl, Ph.D., is principal at speech and language consulting firm Conversational Technologies and chair of the World Wide Web Consortium’s Multimodal Interaction Working Group. She can be reached at dahl@conversational-technologies.com.