Answer Technology: A Necessary Step in Human-Computer Conversation
We often turn to the web for answers, and interactive voice systems are a fast way to access that resource. Specifically, digital assistants such as Apple’s Siri, Google Assistant, and Amazon’s Alexa are often used to get general information. Company-specific digital assistants for customer service are often used to answer questions about company products and services.
To be effective, these assistants require capable underlying technology in speech recognition and natural language understanding to process a spoken question. If the answer comes from a text source, text-to-speech synthesis is required to speak the result. These core technologies are continually improving.
But more is required. For digital assistants to encourage usage and to continue to engage users in voice conversations, concise spoken answers are required. A recent survey by Opus Research asked participants to agree or disagree with the statement “Voicebot responses should be short and purpose-driven”; 78 percent of respondents agreed.
Finding concise answers to user questions involves a separate technology from the core language technologies mentioned. For lack of a better term, let’s call the technology for finding answers suitable for concise spoken replies “answer technology.”
Too often, the current approach of voice-enabled digital assistants to answering questions is to use a device’s screen when available and default to a web search. An all-too-frequent response is “Here’s what I found,” followed by a list of websites. When a screen isn’t available, or when the user’s eyes and hands are otherwise engaged, this obviously isn’t an acceptable option. Even when a screen is viable, requiring users to look through lists of websites does not provide an efficient answer. Further, such responses drop users out of a voice interaction and into a text interaction, destroying the conversational flow.
Frequently asked questions (FAQs) are often handled effectively with software written specifically for each FAQ. For example, when I just asked Alexa the weather forecast for tomorrow, she said, “Tomorrow in Tarzana, there will be lots of wind and a high of 83 degrees Fahrenheit and a low of 46 degrees.” Custom treatment of FAQs can be effective when feasible, but how can an assistant deal effectively with questions beyond the most common FAQs?
Part of the solution must be finding appropriate information sources. Keywords in inquiries could be used to find potential sources, as in web searches. The sources could perhaps be further analyzed to find a summary paragraph. But if the goal is to maintain a conversation, a long delay while executing such a process is problematic. Some pre-analysis is required to provide prompt answers.
Both general digital assistants and company customer service lines have large databases of requests. They also have data on which responses appeared to satisfy users. AI techniques such as machine learning using deep neural networks (deep learning) can be used to exploit this data and provide quick responses to questions that are similar to past questions. This type of analysis has proved effective with web search and can be effective in answer technology.
An analysis could start with conventional natural language processing of the question to provide input to deep learning software. Such NLP analysis usually provides the intent of the user (e.g., to obtain the current price for a specific stock) and the specific variables required to address that intent (e.g., the stock name).
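A toy version of that front-end step, using the article's stock-price example, might look like the following. The intent names and regular-expression patterns are assumptions made for illustration; a deployed system would use a trained NLU model rather than hand-written patterns.

```python
"""Sketch: extract an intent and its slot values from a question, to be
passed to downstream answer-selection software. Patterns and intent
names are hypothetical."""
import re

INTENT_PATTERNS = [
    # (intent name, regex with named slot groups)
    ("stock_price", re.compile(r"price (?:of|for) (?P<stock>[a-z ]+?) stock")),
    ("weather", re.compile(r"weather (?:in|for) (?P<place>[a-z ]+)")),
]

def parse(question: str):
    """Return (intent, slots) for the first matching pattern, else (None, {})."""
    q = question.lower()
    for intent, pattern in INTENT_PATTERNS:
        match = pattern.search(q)
        if match:
            return intent, match.groupdict()
    return None, {}

print(parse("what is the price of acme stock"))
```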
A full solution must extend to the information sources themselves. Today, web sources are designed for text display. Both company sources and general sources such as Wikipedia could provide an initial summary paragraph suitable for speaking. A digital assistant could speak the summary and respond to “more” with a continuation of the rest of the entry, or respond to “another” with a summary from a different source.
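The "more"/"another" interaction described above amounts to a small piece of dialogue state. A minimal sketch, with invented source texts standing in for sites that publish spoken-friendly summary paragraphs:

```python
"""Sketch: a reply manager that speaks a source's summary first, then
continues the entry on "more" or switches sources on "another".
The source texts are hypothetical examples."""

class SpokenAnswer:
    def __init__(self, sources):
        # Each source: (summary paragraph, continuation of the entry).
        self.sources = sources
        self.index = 0

    def open(self) -> str:
        """Speak the current source's initial summary."""
        return self.sources[self.index][0]

    def more(self) -> str:
        """User said "more": continue with the rest of the entry."""
        return self.sources[self.index][1]

    def another(self) -> str:
        """User said "another": switch sources and speak the new summary."""
        self.index = (self.index + 1) % len(self.sources)
        return self.open()

answer = SpokenAnswer([
    ("Mars is the fourth planet from the Sun.",
     "Its thin atmosphere is mostly carbon dioxide."),
    ("Mars, the Red Planet, orbits the Sun every 687 days.",
     "It has two small moons, Phobos and Deimos."),
])
print(answer.open())     # initial summary
print(answer.more())     # "more": continuation of the same entry
print(answer.another())  # "another": a different source's summary
```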
One obvious approach, being enthusiastically developed by large companies, is AI-based text generators such as ChatGPT, a chatbot created by OpenAI, a Microsoft-backed startup. Alphabet, the parent company of Google, recently announced its own AI-powered chatbot, Bard.
Ideal would be a website designed specifically to support spoken replies, a “voiceipedia”; such a site might even include professionally spoken audio files as well as text to be spoken by text-to-speech synthesis. It could perhaps be supported by spoken ads, motivating investment aimed at expanding content and improving quality.
Effective answer technology would make digital assistants more useful. The resulting increase in use of digital assistants could then spur increased investment in answer technology, a virtuous cycle.
More generally, an exponential increase in affordable computer power (driven in part by cloud computing services) will provide support for the improved performance of digital assistants. Voice-enabled digital assistants can become the most intuitive connection between humans and computers, but providing concise answers promptly will be a necessary supporting technology.
William Meisel, Ph.D., is executive director of the Applied Voice Input Output Society and author of the recent book Evolution Continues: A Human-Computer Partnership.