Speech Finds a Home
In Star Trek: The Next Generation, Captain Jean-Luc Picard enters his quarters and tells the ship's computer to adjust the lights, prepare a cup of hot Earl Grey tea, play a Klingon opera, and open a video conference with an admiral at Starfleet Command in San Francisco—all with simple voice commands. His requests are carried out flawlessly, and not just because he's the guy in charge of the Enterprise. The ship's computer responds just as well to verbal commands from an ensign.
That level of man-machine interaction isn't entirely possible just yet, but industry insiders agree we're pretty close.
"The technology is here today. [This] kind of futuristic scenario could be days or weeks away," says Bill Scholz, president of the Applied Voice Input/Output Society (AVIOS) and president and founder of NewSpeech Solutions, a speech technology consultancy.
Gary Clayton, chief creative officer at Nuance Communications, agrees. "From a pure technology standpoint, we're already there, and have been for quite a while," he says.
"Star Trek is a very realistic paradigm," adds Todd Mozer, CEO of Sensory, a provider of embedded speech technologies for consumer products.
Speech interfaces not only exist today, but they are widely popular for everything from smartphones and car telematics to TVs and video game consoles. Experts agree that it's just a matter of time before they spread to other devices and applications in both the consumer and business worlds.
On the consumer front, if a research project underway in Europe is successful, the prototype of a fully voice-enabled home—complete with voice-enabled lighting, thermostat, security system, and a host of everyday appliances and mobile devices—will be completed by 2015, well ahead of Star Trek creator Gene Roddenberry's vision. That three-year project, which launched in January 2012 with $6.3 million in funding from the European Union, is called Distant Speech Interaction for Robust Home Applications (DIRHA). It involves research and development around multichannel acoustic processing, speech recognition and understanding, speaker identification and verification, near-field and far-field speech processing, and spoken dialogue management. It even considers power consumption and energy efficiency.
DIRHA's goal is to create a pervasive microphone array so users can say what they want from anywhere in the house and have their requests recognized and understood. The project also seeks to make it possible for the system to identify and capture an individual speaker from several yards away, in a crowded room, and with music playing.
The DIRHA project has the potential not only to dramatically change the way people interact with technology but also to make a real difference for those who can't easily move around, such as the elderly or disabled. In addition to the home scenarios, the distant-speech interaction systems can find use in robotics, telepresence, surveillance, video conferencing, and industry automation.
A part of this project would be the development of systems that do not rely on the user to push a button to initiate a voice command. Rather, the system would need to be in a constant listening mode.
"We shouldn't have to push buttons to use the speech recognition," Mozer says. "The whole purpose of speech in this case is so we would not have to use a lot of button pushes."
His company, Sensory, has been at the forefront of development in this area. Its TrulyHandsfree Trigger is a voice technology that makes devices and applications come alive—or "wake up"—with a spoken word or phrase.
Another company that is tackling this problem is Conexant. It, too, has created wake-up technology for speech-enabled devices. "Power has been a big issue. If the device has to be on all the time, it can really run hot and burn a lot of power," says Saleel Awsare, the company's vice president and general manager.
"Our equipment scans for a human speech pattern, and when it detects one, it brings up the speech engine to listen for a command, and then it activates the appliance or device to carry out the command," he explains.
But despite all the advances and ongoing research, a fully speech-enabled home is still a very ambitious project. That's not to say that it can't be done. In fact, many people think research and development will gain momentum as the economy improves. "As housing construction starts up again, you'll see more [smart-home technology] in the development plans," predicts Jim Larson, an independent consultant and professor at Portland State University in Oregon.
Larson sees smart-home technology playing out especially well in the senior living environment, leading to a whole host of applications that can help the elderly lead independent lives and stay connected with loved ones.
These technologies, he says, can provide greater security, convenience, and connectivity, and can even be instrumental in monitoring and reporting on daily activities and healthcare regimens. The same system can, for example, keep track of a person's medicines and alert him when he forgets to take a pill at the designated time.
Deborah Dahl, chair of the Multimodal Interaction Working Group at the World Wide Web Consortium (W3C) and principal at speech and language consulting firm Conversational Technologies, shares Larson's enthusiasm. "It's been a dream for so long for the disabled, the elderly, and the just plain lazy to be able to control the lights, heat, air-conditioning, TV, etc., without having to move around a lot," she says. "There's no reason that we couldn't have speech as a central interface. It makes a lot of sense [to have] one speech interface so we wouldn't have to worry about so many different dials, knobs, and buttons."
But for the speech-enabled home to meet with any success, it will need to gain consumer acceptance, something Bill Meisel, president of TMA Associates and an executive director of AVIOS, thinks will not be easy. "Yelling across the room, while possible, is not something people are comfortable with," he says. "The whole idea of walking around the house talking to the walls does not appeal to a lot of people."
Perhaps even more problematic will be getting speech vendors, system developers, appliance manufacturers, and others to come together on a set of standards. So far, all the work has centered on individual devices and appliances, with nothing connected or integrated, and all the technology has been offered through OEM agreements with individual device manufacturers. Very little is offered directly to consumers as an after-market or add-on product.
"The ideal is to have one platform that everything plugs into rather than each device having its own system," Nuance's Clayton asserts. "At the end of the day, the user doesn't want ten different systems that he has to remember commands and passwords for."
Yaron Oren, chief technical officer at iSpeech, a home automation systems provider, also sees the need for a single, converged platform. "The whole space will take a huge step forward when someone brings together one complete solution that works well with multiple systems," he says.
Dahl maintains that this is very possible. "We'll eventually see speech decoupled from each device itself and run on a network that sends commands directly to all of the devices on that network," she says.
According to Dahl, many of the pieces for this single network are already in place to some degree. However, "all the people involved need to put them together," she says.
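One way to picture the arrangement Dahl describes is a single hub that owns the speech front end and forwards recognized text to whichever device it names. The Python sketch below is a hypothetical illustration, not a description of any vendor's product: SpeechHub, register, dispatch, and the single-keyword matching rule are all invented for the example, and a real system would need genuine language understanding rather than a word lookup.

```python
from typing import Callable, Dict, Optional


class SpeechHub:
    """One shared speech front end that routes commands to networked devices."""

    def __init__(self) -> None:
        # Device name -> handler that receives the rest of the utterance.
        self._devices: Dict[str, Callable[[str], str]] = {}

    def register(self, name: str, handler: Callable[[str], str]) -> None:
        """Add a device to the hub under a spoken name (assumed one word)."""
        self._devices[name.lower()] = handler

    def dispatch(self, utterance: str) -> Optional[str]:
        """Send the words after the first known device name to that device.

        Returns the device's reply, or None when no registered device
        is mentioned in the utterance.
        """
        words = utterance.lower().split()
        for i, word in enumerate(words):
            handler = self._devices.get(word)
            if handler is not None:
                return handler(" ".join(words[i + 1:]))
        return None
```

With handlers registered for "lights" and "thermostat", a call such as dispatch("turn the lights off") would hand the word "off" to the lights handler, while an utterance that names no registered device would return None. The devices themselves carry no speech logic, which is the decoupling the sketch is meant to show.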
The Remote Control App
Others don't believe the industry will ever come to agreement on a single standard, citing the still-fragmented environment around HDTV despite the government's 2009 mandate. They think moving all the technology to the cloud is a more likely possibility.
Already Veveo, a natural language conversational interface and search platform provider, has turned all its efforts to the cloud. Veveo's technology has been adopted by some of the largest TV service providers to help subscribers search programming choices by voice right from their living room couches.
"We don't put anything on the set-top box," says Sam Vasisht, chief marketing officer at Veveo. "A lot of these set-top boxes are already IP-enabled."
ABI Research recently predicted that the worldwide IPTV subscriber base would grow to 79.3 million people this year, continuing the steady growth that began a few years ago.
"Everything has to be IP-based," Vasisht continues. "You really need some central hub for all the commands. There has to be something that sends specific commands to the device."
For Veveo, that one hub is the mobile phone. And the company is not alone in thinking this way.
"What's really going to happen is it will all converge on the mobile phone," Meisel says. "You will have an app on your mobile phone that you will use to control everything."
But in the mobile space as well, a lack of standards could derail any immediate progress toward a speech-enabled home.
"With smartphones you can do a lot today, but there are so many different models, manufacturers, and carriers," Awsare says. Getting the device manufacturers to agree on a single mobile platform, or to develop technologies that can be deployed across all platforms, will be tough, he adds.
It's What's On
The car and smartphone markets today are saturated with voice interfaces that let users perform a host of functions in eyes-free and hands-free modes. The TV, therefore, is seeing the lion's share of the development efforts and consumer attention today. Major manufacturers of TVs and set-top boxes, as well as cable and satellite service providers, are investing in speech systems that let consumers search for and access TV programming and other content with voice commands.
Nuance's Dragon TV technology, for example, is incorporated into models from Samsung, Panasonic, LG Electronics, and several other manufacturers, allowing consumers to use voice commands to search program guides, change channels, search for content on the Web, connect with friends and family via Skype, and access social media content from sites such as Facebook, Twitter, and YouTube.
"Traditional search on televisions is tedious and amazingly outdated," said Michael Thompson, senior vice president and general manager of Nuance Mobile, in a statement. "Dragon TV brings an amazing voice experience directly to the living room, similar to what people do every day on their phones and in their cars."
Other developers of these smart TVs include ActiveVideo Networks, Veveo, iSpeech, Sensory, and Conexant. Even Apple, Google, and AT&T have publicly stated their interest in smart TV technology.
It's What's Next
Beyond the TV, the larger smart-home market is only starting to take shape, with iSpeech and a few other manufacturers taking a leadership position. Last year, iSpeech unveiled iSpeech Home, a complete home automation system that blends speech recognition and text-to-speech technologies to interact with a number of systems and devices around the home.
Panasonic, though, is right on iSpeech's heels, with its release in Japan last fall of a number of smart, connected appliances, including air conditioners, refrigerators, microwaves, washer-dryers, rice cookers, blood pressure monitors, and calorie counters. Owners of these appliances can remotely program and operate them with their iPhones or Android smartphones. These devices can connect with cloud-based support services to automatically report device faults to the manufacturer.
Google also announced its entry into the home automation market last year with the launch of Android@Home, a network of connected accessories that would use Android as the central operating platform. Google has said little about the initiative since then, however.
And then there's Vivint, the largest home automation services company in North America. The company offers electronic door locks, door and window sensors, lighting and small appliance controls, an IP camera, and a programmable thermostat, all of which can be controlled through smartphones. Its clients number more than 675,000 throughout the United States and Canada.
Of these companies, iSpeech is the only one so far that has committed to speech as an interface. "Voice control is a natural interface to offer a more compelling experience," Oren says.
"We have been pleasantly surprised at how many device manufacturers are looking at this," Oren adds. "There is quite a lot going on beyond just the TV.
"There are a limitless number of products out there that could be made easier to use with a speech interface," he continues. "The technology has such broad applicability."
The Business Case
In business, the same technology that lets a user talk to an alarm clock, and have it talk back, could be used to control printers, fax machines, phones, computers, calculators, and any other piece of office equipment.
One industry that really stands to gain from these technologies is healthcare. "Hospitals can really use this," Awsare says. "They want to keep doctors' hands sterile, and so they don't want them to touch a lot of stuff."
Other opportunities exist in the entertainment, government, education, retail, automotive, security, and telecommunications sectors, many contend.
The technology will also expand opportunities for e-commerce, giving TV viewers the opportunity to buy products via voice right through their TVs. "If you see a commercial on TV with a call to action, shouldn't you be able to take that action?" Clayton asks. "If you see an ad for a travel destination, you can book a trip. If you see an ad for a restaurant, you can make a dinner reservation."
The business benefits could be great, Veveo's Vasisht states. "All of this automation puts information and service at our fingertips," he says. "If you can see an ad or a program about a vacation and the system takes you somewhere to book a trip to that location, that's really powerful. This frictionless interaction can be a real benefit to any company that runs ads on TV."
That, Clayton maintains, is an easy way for companies to become more involved in the daily lives of consumers.
Sensory has taken it a step further. In May, it launched a speaker identification product to accompany its TrulyHandsfree speech recognition technology, allowing companies to personalize offerings based on information about the types of programs a customer views, for example.
"A lot of companies have figured out ingenious ways to make their products more useful by offering recommendations and tracking individual usage models and settings. But in a shared device situation, this oftentimes becomes meaningless because each user has individual preferences and desires," Mozer says. "For example, I often get recommended movies targeting twelve-year-old girls, not because that is my preferred viewing experience, but because I have a twelve-year-old daughter who also uses the TV."
As voice control of consumer devices becomes more prevalent, Mozer expects speaker identification technology to improve the user experience on shared devices by providing recommendations and suggestions based on an individual's habits, behavior, and lifestyle or by automatically adjusting the device to the user's unique preferences.
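The shared-device problem Mozer describes comes down to keeping one usage history per speaker rather than one per device. The Python sketch below is a minimal illustration under stated assumptions: it presumes some speaker identification step has already produced a speaker label, and the class, method names, and genre labels are hypothetical rather than part of Sensory's product.

```python
from collections import Counter, defaultdict
from typing import Optional


class SharedDeviceProfiles:
    """Track viewing per identified speaker, plus a household-wide total."""

    def __init__(self) -> None:
        self._by_speaker = defaultdict(Counter)  # speaker -> genre counts
        self._household = Counter()              # genre counts for everyone

    def record(self, speaker: str, genre: str) -> None:
        """Log one viewing of a genre by an identified speaker."""
        self._by_speaker[speaker][genre] += 1
        self._household[genre] += 1

    def top_genre(self, speaker: Optional[str] = None) -> Optional[str]:
        """Return the most-watched genre for a speaker.

        Falls back to the household total when the speaker is unknown
        or has no history, which mimics a device with no speaker ID.
        """
        counts = self._by_speaker.get(speaker) if speaker is not None else None
        if not counts:
            counts = self._household
        if not counts:
            return None
        return counts.most_common(1)[0][0]
```

If a parent has watched two documentaries and a daughter three teen comedies, asking for the household's top genre returns the comedies, while asking on the parent's behalf returns documentaries. That difference is the improvement speaker identification is expected to bring to recommendations.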
Not for Everything
And while almost everyone would agree that speech could easily become the most natural interface for interacting with many of the products we use every day around the house or office, it might not be the right fit for everything. Having a voice interface for a washing machine, for example, might seem attractive at first, but it isn't really all that practical, Awsare says. After all, with a washer, you still have to load the laundry into the machine and add the detergent. Voice is not going to do that for you.
Similarly, to use a microwave, someone still needs to stand in front of the machine, open the door, put the food in it, and then close the door. Having to push a button to start it doesn't detract from the user experience, Awsare says.
Advances in other technologies, such as robotics, could change that. "Our personal lives are getting automated very quickly—much more quickly than our business lives," Vasisht says. "As consumers get more comfortable [with technology], they'll demand it everywhere."
That was the case when Apple first introduced Siri on the iPhone 4S in October 2011. Now, the technology is pervasive on mobile phones, with similar virtual assistant offerings from Nuance (Nina), Angel (Lexee), and Taptera (Sophia). But Apple really gets all the credit for taking the speech interface to the mainstream.
"Siri has done a lot to raise the consciousness about speech applications. When Siri came out, it changed the way people [thought] about speech technology and how to use it," Clayton says.
"Apple did a good job of showing the consumer what you can do with voice. Now it's up to the rest of the industry to show what else it can do," Awsare adds.
News Editor Leonard Klie can be reached at email@example.com.