As we enter the 'immersive age,' we need to prepare for a whole new way of interacting
A historian might divide the history of human-machine interaction into three general periods. In the industrial age, operators manipulated physical levers, wheels, and switches and observed gauges, signals, and the machine itself. Operators usually stood next to the machine as it ran or, occasionally, sat inside it (e.g., a car or truck). Next came the information age, in which operators interacted with electronic representations of machines by typing, clicking, or touching a screen; operators usually worked at a PC or similar device that was separate from the machine.
Recently, we’ve entered the immersive age, in which operators interact with smart objects connected to the internet. These smart objects can be controlled by gesturing and speaking; for example, a user might bring up a Dancing with the Stars episode simply by speaking to the TV. The manner in which people interact with machines, then, will be dramatically different from that of the earlier ages.
How Will the User Interface Operate?
In the immersive age, users speak requests that trigger one or more smart objects to respond. Sometimes, operators will gesture, such as moving a hand upward or downward to increase or decrease TV volume. When operators cannot remember the set of functions to be invoked, a screen will display a menu of options and parameters; operators can either speak or point to the desired one. For example, if you want to turn off lights, you might say “Turn lights off” and indicate which lights by touching light icons within a house layout. Then, you can confirm which lights were turned off by walking from room to room or by monitoring the light icons within the house layout. Only as a last resort will a keyboard appear onscreen so the operator can enter information by “typing.”
One Controller or Many?
In the immersive age, the “controller” refers to a device with a screen, microphone, and speaker that is used to operate other devices, systems, and appliances. Today, there are many such controllers in every house. Some are in a permanent location, such as a thermostat mounted on a wall or a garage door opener mounted outside the house next to the garage door. Others are mobile, such as smartphones or TV remote controls.
Will one of today’s controllers evolve to become the controller for all of the home’s smart devices? A controller that relies on its screen alone is unlikely to be able to control more than a handful of smart things: The display will either become overloaded with icons and labels or become too complex for operators to navigate. But if the controller has a microphone, operators can speak the name of the smart thing (or group of smart things) and either speak or touch an icon representing the smart thing’s parameters. For example, an operator might say “turn lights off” and select the desired light icon, or say “adjust TV volume” and touch a slider to indicate where the volume should be. Using a single controller also avoids the confusion of having multiple controllers lying around the house.
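The “turn lights off” interaction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the SmartThing and Controller classes, the handle method, and the three-word “turn <group> on|off” pattern are all hypothetical and stand in for a real speech recognizer and a real device protocol. The sketch shows speech naming the group of smart things and touch narrowing the request to particular ones.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SmartThing:
    """A hypothetical smart device with a single on/off state."""
    name: str
    group: str
    is_on: bool = True


@dataclass
class Controller:
    """Hypothetical controller: speech names the group, touch narrows the targets."""
    things: list[SmartThing] = field(default_factory=list)

    def handle(self, utterance: str, touched: set[str] | None = None) -> list[str]:
        """Apply a 'turn <group> on|off' request and return the names changed."""
        words = utterance.lower().split()
        # Assumption: only this simple three-word pattern is recognized.
        if len(words) != 3 or words[0] != "turn" or words[2] not in ("on", "off"):
            raise ValueError(f"unrecognized request: {utterance!r}")
        group, target_state = words[1], words[2] == "on"
        changed = []
        for thing in self.things:
            if thing.group != group:
                continue
            # With no touch selection, the request applies to the whole group.
            if touched is not None and thing.name not in touched:
                continue
            thing.is_on = target_state
            changed.append(thing.name)
        return changed


controller = Controller([
    SmartThing("kitchen", "lights"),
    SmartThing("hallway", "lights"),
    SmartThing("living room", "tv"),
])
# The operator says the request and touches the kitchen light icon.
changed = controller.handle("Turn lights off", touched={"kitchen"})
```

Because the spoken group name does the filtering, the screen only needs to show the icons for that group, which is one way a single controller might avoid the icon overload described above.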
Will the Controller Disappear?
Many homes have security cameras and smoke detectors in various locations and rooms. Each room could also have one or more microphones, screens, and speakers. As users move from room to room, they could simply speak the name of devices they want to operate, select the appropriate parameters, and see the effects: Lights dim, the garage door opens, the oven turns on. In effect, the controller is the house and the operators live inside the controller.
For this immersive user interface, designers should consider the following:
• It needs to be conversational so that users won’t have to memorize the names of smart devices or their functions.
• It needs to be extensible so that new smart devices can be easily added.
• It needs to be knowledgeable enough to recognize pronouns and their referents (“Turn on the TV, turn its volume up”) and when the operator has shifted topics, as well as other complex dialogue structures.
• It needs to be trainable, to interpret new verbal instructions and parameters.
• It needs to include modalities beyond speech, such as touch, haptics (vibrations, pressure), 2-D gestures (swiping a screen), 3-D gestures (nodding head or waving hands), location, and orientation.
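Two of the considerations above, extensibility and pronoun resolution, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the DialogueContext class, its register and resolve methods, and the rule that a pronoun refers to the most recently mentioned device are hypothetical, and a real conversational interface would need far richer language understanding. Device names are also assumed to be single words.

```python
from __future__ import annotations


class DialogueContext:
    """Hypothetical dialogue tracker that remembers the last device mentioned."""

    PRONOUNS = {"it", "its"}

    def __init__(self) -> None:
        self.devices: set[str] = set()
        self.last_device: str | None = None

    def register(self, name: str) -> None:
        """Extensibility: a new smart device is added by name at run time."""
        self.devices.add(name.lower())

    def resolve(self, utterance: str) -> str | None:
        """Return the device an utterance refers to, or None if unknown."""
        words = utterance.lower().replace(",", " ").split()
        for word in words:
            if word in self.devices:
                self.last_device = word  # an explicit mention shifts the topic
                return word
        if any(word in self.PRONOUNS for word in words):
            return self.last_device  # a pronoun refers back to the last mention
        return None


ctx = DialogueContext()
ctx.register("TV")
ctx.register("oven")

first = ctx.resolve("Turn on the TV")       # explicit mention
second = ctx.resolve("turn its volume up")  # pronoun, refers to the TV
third = ctx.resolve("Turn on the oven")     # topic shift
fourth = ctx.resolve("turn it off")         # pronoun, now refers to the oven
```

The sketch tracks only one referent at a time; handling the “other complex dialogue structures” mentioned in the list would require keeping a longer history of the conversation.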
In short, these user interfaces should enable natural language and multimodal interaction with smart devices. Are you ready to move forward into the immersive age?
James A. Larson, Ph.D., is program chair for the SpeechTEK 2017 Conference and teaches courses in voice applications and user interfaces at Portland State University in Portland, Oregon.