Intelligent Agents Are Poised to Change the Conversation
Long one of the most exciting and potentially transformative technologies, conversational intelligent virtual agents (IVAs) seem ready to stretch the boundaries of the speech recognition industry. Here’s an overview of what such agents can do now, and what they’ll be capable of just over the horizon.
But first, let’s back up with a primer on how they’re put to work. Developers train conversational agents to extract a user’s “intent” from the user’s “expression” and then perform the associated action. An “expression” is usually a spoken utterance, a string of words, a video of a gesture, or a combination of these inputs. Associated with each expression is the user’s “intent,” the action to be performed in response to the user’s request. For example, an expression could be “What is the temperature?” The intent from that expression points to a chunk of software code that obtains the temperature from a thermometer and presents the result to the user.
Thousands of expressions with associated intents are created and fed to agents, which remember each expression and associated intent. When deployed, an agent uses complex algorithms to extract intents from both existing and new expressions. For example, the IVA could extract the same intent from the known expression “What is the temperature?” and a new expression, “How hot is it?”
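To make the expression-to-intent idea concrete, here is a minimal Python sketch. It is not how any commercial agent platform works: simple word-overlap (cosine) matching stands in for the “complex algorithms” mentioned above, and the training pairs, the intent names, and the placeholder actions are all made up for illustration.

```python
import re
from collections import Counter
from math import sqrt

# Hypothetical training data: (expression, intent) pairs.
TRAINING_PAIRS = [
    ("What is the temperature?", "get_temperature"),
    ("How warm is it outside?", "get_temperature"),
    ("What time is it?", "get_time"),
    ("Tell me the current time", "get_time"),
]


def tokenize(text):
    """Lowercase the text and count its words."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def extract_intent(expression, pairs=TRAINING_PAIRS):
    """Return the intent of the most similar remembered expression."""
    query = tokenize(expression)
    _, intent = max(pairs, key=lambda pair: cosine(query, tokenize(pair[0])))
    return intent


# Each intent points to a chunk of code. These are placeholders; a real
# action would read a thermometer or a clock.
ACTIONS = {
    "get_temperature": lambda: "temperature reading goes here",
    "get_time": lambda: "current time goes here",
}


def handle(expression):
    """Extract the intent, then run the code associated with it."""
    return ACTIONS[extract_intent(expression)]()


print(extract_intent("What is the temperature?"))  # known expression
print(extract_intent("How hot is it?"))            # new expression
```

Both calls resolve to the same hypothetical `get_temperature` intent. The toy matcher only succeeds on the new expression because it shares words with a remembered one; the statistical models inside real agents are what allow generalization beyond literal word overlap.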
Agents Recognize Speech of All Kinds
Commercially available voice assistants like Amazon Alexa, Google Assistant, and Microsoft Cortana recognize spoken expressions and perform the requested actions, such as searching the web, consulting the user’s calendar, or performing transactions.
Developers can also create specialized voice agents by carefully annotating expressions of various types. For example, a speech recognition system could be tailored to recognize dialects and accents, which are often troublesome for general speech recognition systems. Speech disfluencies, such as stuttering or mumbling, could be detected and corrected. To improve accuracy, IVAs can be trained on multiple sources of input, such as speech combined with lip movement captured by a mobile device’s camera. In a car, for instance, a camera focused on the driver’s face could improve the recognition of the driver’s speech.
These powerful tools are capable of many other types of services. Travelers can use an IVA’s translation services to convert their spoken utterances into similar utterances in a foreign language. Intelligent agents can identify and authenticate users wanting to access software and gain entry to hotel rooms, cars, bank vaults, and secure areas.
Agents Can Recognize Traits and Conditions as Well
Ongoing research is investigating how IVAs can be trained to recognize personal traits such as a person’s age and gender, adding clues to the person’s identity. Jeff Adams, CEO of Canary Speech, has postulated a new diagnostic tool, a “speech stethoscope,” that could analyze a person’s voice to detect early Alzheimer’s disease, Parkinson’s disease, depression, suicide risk, concussions, sleep apnea, and other conditions. Researchers are also studying whether IVAs can use speech recognition to detect intoxication from alcohol or from prescription or illegal drugs.
Speech carries information about a person’s stress level and emotional state. So when a customer service IVA detects a rise in a caller’s stress level, it can transfer the user to a human agent. If an IVA that assists shoppers knows the user’s emotional state, it can suggest products that increase the user’s feeling of well-being and avoid products that decrease it.
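The hand-off to a human agent can be pictured as a simple rule. The Python sketch below assumes that some upstream speech model (not shown, and hypothetical) produces a stress score between 0 and 1 for each caller utterance; the function name and both thresholds are illustrative choices, not values taken from any real system.

```python
def should_transfer(stress_scores, min_rise=0.3, ceiling=0.8):
    """Decide whether to hand the caller to a human agent.

    stress_scores: per-utterance scores in [0, 1], oldest first, assumed
    to come from a hypothetical stress-detection model.
    min_rise: how far stress must climb above the first score of the call.
    ceiling: an absolute level that always triggers a transfer.
    """
    if not stress_scores:
        return False
    baseline = stress_scores[0]
    latest = stress_scores[-1]
    return latest >= ceiling or (latest - baseline) >= min_rise


print(should_transfer([0.2, 0.25, 0.3]))  # mild drift in stress
print(should_transfer([0.2, 0.4, 0.6]))   # clear rise during the call
```

Comparing the latest score with the caller’s own starting level, rather than with a single fixed cutoff, reflects the “rise in stress” described above; a production system would likely smooth the scores over several utterances first.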
Data Is King
As we’ve seen, IVAs have the potential to provide many new types of capabilities and services, but not without an immense database of expressions with associated intents. Besides writing code to perform intents, agent developers have to create expression-intent pairs to train agents, and previous interactions between users and computers are a useful source of expressions and associated intents. Companies with large histories of customer interactions are well positioned to develop agents. (Companies should provide descriptive privacy policies to customers about the type of information captured, how it is used, and how users can opt out.)
Prepare for the New World of IVAs
Future IVA developers should evaluate alternative agent development platforms by using them to build prototype IVAs. To extract expression-intent pairs, developers should have a firm grasp of data science, the study of extracting knowledge from data in various forms. Developers also need to understand deep neural networks and the other technologies used inside agent platforms in order to fine-tune how agents work.
James A. Larson is program chair for SpeechTEK 2018, which includes sessions about agents and their development. He can be reached at firstname.lastname@example.org.