An Open Letter to Voice Agent Platform Developers
A voice agent avalanche is coming. Soon every website owner will have a voice agent promoting their products and services. Users will control devices on the Internet of Things by speaking with voice agents. Every film, TV show, and event will have a voice agent with which to engage or attract users. Thousands of voice agents will need to find and interact with one another. You are in the right business!
Voice agents will need to cooperate and, sometimes, collaborate with each other. To be competitive, your future products should support the following six voice agent interoperability features:
1. Invoke remote voice agents. Given a specific voice agent address, any user or voice agent anywhere in the world can invoke and interact with it. If letters and postcards can reach their destinations anywhere in the world, users should be able to connect to every voice agent, regardless of platform or location.
2. Make explicit requests for a specific voice agent. If users or voice agents know the name of a specific voice agent, they should be able to connect with it. This is similar to using the telephone white pages. Today, the Domain Name System (DNS) provides this function for websites by resolving a site's name to the address that requests are routed to. A similar Voice Registry System (VRS) will let voice agent owners register the unique names of their voice agents so that users can connect to them directly.
3. Switch between voice agents. At times a voice agent user will need information from a second voice agent. For instance, a voice travel agent might need weather information to properly advise a user about travel plans. One approach for switching between voice agents is Amazon’s Voice Interoperability Initiative, allowing Amazon’s Alexa users to speak the name of a voice agent to activate it. Several multi-agent demos can be seen here.
4. Process an implicit request. An implicit request specifies products and services rather than naming the voice agent—like asking for companies that sell a product (identified by its universal product code) located nearby. Like using the telephone yellow pages, users would search the VRS for candidate voice agents by product name or location, and if several voice agents are identified, an arbitration service recommends one.
5. Share data and context between voice agents. Have you ever used multiple voice agents and had to answer the same questions from each? If one voice agent can share the user data it has collected, the context, and the controls/permissions with a second one, that redundancy goes away. An example of data sharing is “skill connections,” which enable an Alexa skill (Alexa terminology for an AI app) to use another skill to perform a specific task. A skill connection lets developers offload common tasks to outside providers. Skill connections also improve the Alexa customer experience by allowing customers to move freely between skills without having to repeat information. For example, a customer who interacts with a recipe skill can say “Print this” to print a recipe. The recipe skill can then use a skill connection to forward the request to a printing skill without asking redundant questions.
6. Extend an enterprise’s persona. A persona refers to the voice and characteristics presented by a voice agent. Just as GUI apps have a look and feel, voice agent personas have a distinctive sound and ambience that support branding. Rather than switch personas when switching between voice agents, developers can maintain the persona of the first voice agent when users switch to a second voice agent. This will minimize confusion in users’ minds and reinforce the brand.
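To make the white-pages and yellow-pages analogies in features 2 and 4 concrete, here is a minimal Python sketch of how a registry might answer explicit and implicit requests. It is only an illustration under stated assumptions: the class and field names (VoiceRegistry, AgentRecord, resolve, search, arbitrate), the example addresses, and the rating-based arbitration rule are all hypothetical and are not taken from any OVON specification or vendor API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class AgentRecord:
    """One registry entry. Every field name here is an illustrative assumption."""
    name: str            # unique registered name, used for explicit requests
    address: str         # where the agent can be invoked (made-up example URLs)
    products: frozenset  # product names or codes the agent can help with
    city: str            # coarse location used for "nearby" searches


class VoiceRegistry:
    """Toy in-memory stand-in for a Voice Registry System (VRS)."""

    def __init__(self) -> None:
        self._by_name: Dict[str, AgentRecord] = {}

    def register(self, record: AgentRecord) -> None:
        # Names must be unique, so a second registration of the same name fails.
        key = record.name.lower()
        if key in self._by_name:
            raise ValueError(f"name already registered: {record.name}")
        self._by_name[key] = record

    def resolve(self, name: str) -> Optional[AgentRecord]:
        """Explicit request: white-pages lookup by unique name."""
        return self._by_name.get(name.lower())

    def search(self, product: str, city: str) -> List[AgentRecord]:
        """Implicit request: yellow-pages search by product and location."""
        return [r for r in self._by_name.values()
                if product in r.products and r.city == city]


def arbitrate(candidates: List[AgentRecord],
              ratings: Dict[str, float]) -> Optional[AgentRecord]:
    """Recommend one candidate. The rule used here (highest rating, ties
    broken alphabetically by name) is an assumption chosen for simplicity."""
    if not candidates:
        return None
    return min(candidates, key=lambda r: (-ratings.get(r.name, 0.0), r.name))


registry = VoiceRegistry()
registry.register(AgentRecord("Acme Travel", "https://agents.example/acme-travel",
                              frozenset({"flights", "hotels"}), "Portland"))
registry.register(AgentRecord("Rainy Day Gear", "https://agents.example/rainy-day",
                              frozenset({"umbrella"}), "Portland"))
registry.register(AgentRecord("Corner Shop", "https://agents.example/corner-shop",
                              frozenset({"umbrella", "snacks"}), "Portland"))

# Explicit request: the caller already knows the agent's name.
explicit = registry.resolve("acme travel")

# Implicit request: the caller names a product and a place, not an agent.
candidates = registry.search("umbrella", "Portland")
chosen = arbitrate(candidates, {"Rainy Day Gear": 4.6, "Corner Shop": 4.1})

print(explicit.address)
print(chosen.name)
```

A real registry would presumably be distributed, as DNS is, and an arbitration service could weigh far more than a single rating. The sketch only shows that the two request types call for different lookups: one keyed on a unique name, the other a filtered search followed by a recommendation.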
Other advanced features for voice agents include discovery—locating a new service; planning—developing a sequence of voice agents that perform a complex task; and identification—identifying both the user and the voice agent so a secure connection can be made.
The Open Voice Network (OVON) is a neutral, nonprofit industry association dedicated to the development of standards and ethical use guidelines to make voice worthy of user trust. It operates as a directed fund of the Linux Foundation and is independently funded and governed. OVON is defining the data formats and message protocols for implementing the above features and is creating proof-of-concept demonstrations of interoperable voice agents. Share your ideas, needs, and concerns about interoperable voice agents with OVON (https://openvoicenetwork.org/contact/) to help set the future of interoperable voice agents.
Open Voice Network
James A. Larson is also the co-program chair of the SpeechTEK conference. He can be reached at firstname.lastname@example.org.