July 10, 2012
By James A. Larson, program co-chair, SpeechTEK 2021
Forward Thinking

Job Descriptions for Personal Assistants

Radar O'Reilly, a character on the TV show M*A*S*H, was the perfect assistant. If I had a butler, I'd expect him to serve my needs as well as Radar did. Most people expect virtual assistants to work like Radar, who performed the tasks listed below. How can virtual assistants do the same tasks that Radar did?

Task 1: Use multiple modes of communication. We spend the first two years of our lives learning how to speak and listen. We will surely want to use automatic speech recognition and text-to-speech synthesis to interact with virtual assistants. But there will be times when we prefer to type or write rather than speak to our virtual assistant.

Teenagers seem to text more often than they make phone calls or leave voice messages. Touchscreen users use the Graffiti alphabet to enter text or use pickers (a type of menu) to select options rather than type. Rather than type email and Web site addresses, users will take pictures of QR codes using the camera on their mobile devices and use vision processing algorithms to convert the code to text. People use multiple modes of communication, so virtual assistants should too.

Task 2: Recognize the person needing service. Virtual assistants use a variety of techniques to recognize a user and verify that he is the person he claims to be. These techniques rely on one or more of the following:

Something the user has: Virtual assistants use the mobile device camera to scan ID badges, driver's licenses, or other forms of picture ID. The mobile device itself contains its user's identity.

Something the user knows: Virtual assistants use challenge dialogues to ask users questions only the user is likely to be able to answer, such as "What is the name of your first pet?" or "What is your mother's maiden name?" I regret that we will probably continue to be plagued by artificial user identifiers and passwords that must be changed frequently.

Something about the user: Algorithms that recognize a user's signature, voice, face, or fingerprints can identify and validate that users are who they claim to be.

Because each identification technique can be compromised, a combination of techniques provides a greater deterrent to identity thieves than a single technique.
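
The idea of combining techniques can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the category labels, the FactorResult type, and the is_verified function are hypothetical names, and the rule of requiring two distinct categories is an assumed policy, not one the column prescribes.

```python
from dataclasses import dataclass

# Hypothetical labels for the three groups described above.
HAS, KNOWS, IS = "has", "knows", "is"


@dataclass(frozen=True)
class FactorResult:
    category: str  # one of HAS, KNOWS, IS
    passed: bool   # outcome of that individual check


def is_verified(results, required_categories=2):
    """Accept the user only if checks from enough distinct categories passed.

    The default of two categories is an assumption made for this sketch.
    """
    passed_categories = {r.category for r in results if r.passed}
    return len(passed_categories) >= required_categories
```

Counting distinct categories rather than individual checks means that two passed challenge questions still count as a single factor, so a thief who has learned what the user knows must also defeat a second kind of check.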

Task 3: Understand requests. In order to process a user request, a semantic interpretation algorithm must extract the user's intention from text, whether that text is generated by speech and handwriting recognition algorithms or keyboarded by the user. The semantic algorithm then puts the user's intention into a format that can be processed by the virtual assistant.
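
As a rough illustration of what "a format that can be processed" might look like, the Python sketch below turns a text string into an intent name plus named slots. The two patterns, the intent names, and the interpret function are all hypothetical, and simple pattern matching stands in here for the far more capable semantic interpretation algorithms a real assistant would need.

```python
import re

# Hypothetical patterns: each maps an intent name to a regex with named slots.
PATTERNS = {
    "set_alarm": re.compile(
        r"\b(?:wake me|set an alarm)\b.*?"
        r"\bat (?P<time>\d{1,2}(?::\d{2})?\s*(?:am|pm))",
        re.IGNORECASE,
    ),
    "send_message": re.compile(
        r"\b(?:text|message)\s+(?P<recipient>\w+)\s+(?:that\s+)?(?P<body>.+)",
        re.IGNORECASE,
    ),
}


def interpret(text):
    """Return the first matching intent and its slots as a plain dict."""
    for intent, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            return {"intent": intent, "slots": match.groupdict()}
    # Nothing matched: report that explicitly rather than guessing.
    return {"intent": "unknown", "slots": {}}
```

The same function serves recognized speech, recognized handwriting, and typed input, because all three arrive as text. For example, interpret("Please wake me up at 6:30 am") yields the intent "set_alarm" with the slot time set to "6:30 am".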

Task 4: Know the preferences and choices of the requester. Information about the user, in the form of a user profile and a history of actions performed by the user, gives the semantic interpretation algorithms insight into what the user is likely to mean. Virtual assistant designers must address privacy concerns about the information collected.

Task 5: Resolve ambiguous requests. Virtual assistants may be able to resolve ambiguous requests by consulting the user profile and history. Otherwise, the virtual assistant clarifies such requests by asking the user questions.
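
The two-step approach described here (consult the history first, ask only when that fails) might be sketched as follows. The resolve function, its return format, and the rule of preferring the candidate the user has chosen most often are assumptions made for illustration.

```python
def resolve(candidates, history):
    """Pick one candidate using the user's history, or return a question.

    candidates: possible meanings of an ambiguous request, as strings.
    history: past choices by this user, as a list of strings (hypothetical).
    """
    if not candidates:
        return {"question": "Could you rephrase your request?"}
    if len(candidates) == 1:
        return {"resolved": candidates[0]}

    # Count how often each candidate appears in the user's history.
    counts = {c: history.count(c) for c in candidates}
    best = max(counts.values())
    leaders = [c for c in candidates if counts[c] == best]

    # The history settles the matter only if there is one clear favorite.
    if best > 0 and len(leaders) == 1:
        return {"resolved": leaders[0]}

    # Otherwise fall back to asking the user.
    return {"question": "Did you mean " + " or ".join(candidates) + "?"}
```

If the user says "call John" and the contact list holds both John Smith and John Doe, a history dominated by calls to John Doe resolves the request silently, while an empty history produces the clarifying question.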

Task 6: Identify resources required to perform the request. The virtual assistant must identify the applications, Web sites, and resources needed to perform the user's request. Then the virtual assistant must develop a plan to perform the request. For simple requests, the virtual assistant can use predefined plan templates and known resources. For more complex requests, the assistant may need to (a) apply discovery protocols to learn what resources are available, and (b) apply planning algorithms to develop a strategy for accessing resources and integrating results.
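
For the simple case, a predefined plan template can be little more than a lookup table. In the Python sketch below, the template contents, the resource names, and the make_plan function are hypothetical. Returning None marks the point at which a real assistant would hand off to discovery protocols and planning algorithms, which this sketch does not attempt.

```python
# Hypothetical templates: intent name -> ordered (resource, action) steps.
PLAN_TEMPLATES = {
    "book_restaurant": [
        ("calendar", "find_free_evening"),
        ("restaurant_site", "reserve_table"),
        ("calendar", "add_event"),
    ],
}


def make_plan(intent, available_resources):
    """Return a copy of the template's steps, or None if no template fits."""
    template = PLAN_TEMPLATES.get(intent)
    if template is None:
        return None  # complex request: no predefined plan exists
    needed = {resource for resource, _ in template}
    if needed - set(available_resources):
        return None  # a required resource is not known to be available
    return list(template)
```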

Task 7: Perform the request promptly. The virtual assistant must coordinate the execution strategy involving multiple applications and Web sites, and produce the requested results in a reasonable time. The virtual assistant may need to determine the best medium for presenting results in a form the user can quickly understand and use.

The tasks listed above are very ambitious. Hundreds of work hours are necessary to apply research in semantic interpretation and planning algorithms to make virtual assistants behave as well as Radar O'Reilly. Speech processing is only a small part of the work needed. Virtual assistant developers must broaden their knowledge beyond speech processing technologies to develop new skills, tools, and standards for designing and implementing helpful virtual assistants.

James A. Larson, Ph.D., is an independent speech consultant. He is program co-chair for the SpeechTEK Conference. He also teaches courses in speech user interfaces at Portland State University and the Oregon Institute of Technology. He can be reached at jim@larson-tech.com.
