As Voice Assistants Multiply, When Will We Get a Registry?
Voice assistants such as Apple’s Siri, Samsung’s Bixby, Amazon Alexa, SoundHound’s Hound, and Google Home have become increasingly popular, and individual companies are hopping on the bandwagon to create assistants for their brands. Brand-specific voice assistants can help customers purchase, install, debug, fine-tune, and train a company’s products, and they can also help customers actually use those products and services; for example, Comcast’s X1 service enables users to verbally select and view TV programs on demand. As a way to connect customers with companies, voice assistants are poised to outperform both smartphone apps and websites.
Introducing the VAR
The ubiquity of voice assistants points to the need for a voice assistant registry (VAR). What is a VAR? It’s a new component that identifies the domain name associated with a voice assistant and maintains database information for each one, including the following:
• Verbal name. Users speak the verbal name as a default wake-up word to initiate a discussion with the voice assistant. The verbal name is represented using a phonetic alphabet such as the International Phonetic Alphabet (IPA).
• Domain name. A unique identifier associated with the voice assistant—basically, the domain name refers to the website where the voice assistant resides.
• Description. A description of the voice assistant’s capabilities and functions.
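As a rough illustration of the three fields above, a registry entry and a lookup by verbal name might be sketched as follows. This is a minimal in-memory sketch, not a proposed standard: the class names, the method names, the IPA string, and the `.example` domains are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AssistantRecord:
    """One hypothetical VAR entry holding the three fields listed above."""
    verbal_name_ipa: str  # default wake-up word, written in IPA
    domain_name: str      # unique identifier for the voice assistant
    description: str      # capabilities and functions


class VoiceAssistantRegistry:
    """In-memory registry; a real VAR would presumably be a networked service."""

    def __init__(self):
        self._by_domain = {}
        self._by_verbal_name = defaultdict(list)

    def register(self, record):
        # Domain names are the unique key, so reject duplicates.
        if record.domain_name in self._by_domain:
            raise ValueError(f"domain already registered: {record.domain_name}")
        self._by_domain[record.domain_name] = record
        self._by_verbal_name[record.verbal_name_ipa].append(record)

    def lookup(self, verbal_name_ipa):
        # Several assistants may share one spoken name, so return a list.
        return list(self._by_verbal_name.get(verbal_name_ipa, []))


registry = VoiceAssistantRegistry()
# Illustrative entries: both names are assumed to share one IPA transcription.
registry.register(AssistantRecord("sɛns", "cents.example", "Budgeting help"))
registry.register(AssistantRecord("sɛns", "sense.example", "Home sensor setup"))
matches = registry.lookup("sɛns")
```

Because verbal names are not guaranteed to be unique, `lookup` returns every matching record rather than a single one, leaving it to the caller to pick among them.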
To route a user request to the appropriate voice assistant, a VAR must resolve the following problems: (1) users may replace the default wake-up word with a personal wake-up word; (2) a single voice assistant may have several different wake-up words, for example because users have different accents or speak different languages; and (3) a single wake-up word may wake up several voice assistants (for example, voice assistants named “Cents” and “Sense” sound the same when spoken, so the wrong voice assistant might be initiated).
Potential solutions to these problems include the following:
• Track personal wake-up words and their relationship to the default wake-up word.
• Use local context. Consider which voice assistants the user previously used.
• Enable user choice. Ask users to choose the voice assistant with the description that most closely matches their needs.
• Use AI. Maintain a history of user utterances tagged with relevant voice assistants and then use this data to train deep neural networks to select the appropriate voice assistant.
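The first three approaches in this list can be combined into a simple resolution order: map a personal wake-up word to its default, prefer an assistant the user recently used, and otherwise hand the remaining candidates back for the user to choose. The sketch below assumes that order; the function name, its parameters, and the sample data are hypothetical, and the AI-based approach is deliberately left out.

```python
def resolve(heard, candidates_by_wake_word, personal_aliases, recent_domains):
    """Return (domain, None) when resolved, or (None, choices) when the
    user has to pick. All inputs are plain dicts and lists for illustration.

    heard                   -- wake-up word as recognized (e.g., an IPA string)
    candidates_by_wake_word -- default wake-up word -> list of assistant domains
    personal_aliases        -- personal wake-up word -> default wake-up word
    recent_domains          -- domains the user used before, most recent first
    """
    # 1. Track personal wake-up words: map them back to the default.
    wake_word = personal_aliases.get(heard, heard)
    candidates = candidates_by_wake_word.get(wake_word, [])

    if not candidates:
        return None, []
    if len(candidates) == 1:
        return candidates[0], None

    # 2. Use local context: prefer the most recently used matching assistant.
    for domain in recent_domains:
        if domain in candidates:
            return domain, None

    # 3. Enable user choice: return the candidates so the caller can read
    #    out their descriptions and let the user decide.
    return None, list(candidates)


candidates = {
    "sɛns": ["cents.example", "sense.example"],  # homophones share an entry
    "haʊnd": ["hound.example"],
}
aliases = {"ˈbʌdi": "haʊnd"}  # a user renamed one assistant

by_alias = resolve("ˈbʌdi", candidates, aliases, [])
by_context = resolve("sɛns", candidates, aliases, ["sense.example"])
needs_choice = resolve("sɛns", candidates, aliases, [])
```

Accent and language variants of a wake-up word could be handled the same way as personal aliases, by adding each variant as another key that maps to the default.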
One example of a VAR is Cerence’s Cognitive Arbitrator. According to Joe Iacobelli, director of sales engineering, product, and strategy at Cerence (a spin-off from Nuance Communications), “Cerence’s Cognitive Arbitrator is an AI-powered product that makes it fast and easy to build in-car voice systems that can seamlessly support various virtual assistants, third-party services, and content.” The Cognitive Arbitrator can learn preferences over time so that it knows which assistant or content service the user prefers for specific tasks.
In Search of a VAR Standard
In the Wild West of today’s voice assistants, it is clear that a voice assistant registry is needed. Different vendors will create their own solutions based on differing technologies and customer needs. Should there be a single use-by-all VAR, or should there be several VARs for different languages, different industries, or different networks? Should there be a standard API for a VAR? One or all of the four standards working groups below could step forward to create a VAR standard:
Open Voice Network’s leader, Jon Stine, tells us, “At the top of our priority list is a Destination/Dispatch Name registry, modeled on the [Domain Name System]. We would be delighted to coordinate with other standards-development efforts to find common ground and avoid inconsistency.”
Project Connected Home over IP currently has no plans to build a voice registry, according to Heather Chesterman, PR supervisor for the Zigbee Alliance, but it is willing to work with other standards organizations on anything that would improve the user experience around connected devices.
Dirk Schnelle-Walka, cochair of the World Wide Web Consortium’s (W3C) Voice Interaction Community Group, says, “VAR is part of our efforts on interoperability for conversational interfaces. Anyone is welcome to participate in this Community Group. We would welcome a liaison with other groups.”
The fourth group, the Voice Interoperability Initiative, declined to respond to our request for comments.
James A. Larson is co-program chair, Speech Technology Conference. He can be reached at firstname.lastname@example.org.