June 20, 2005
By Tom Houwing, Director, voiceandvision B.V.

Multi-Persona VUI Design

In the world of voice recognition-based applications, we encounter systems that use a clear, pre-defined single-persona concept to represent the Voice User Interface (VUI). The idea behind this approach is to provide callers with a consistent style of prompting and an overall Hear and Feel that ensures user-friendliness over time. Callers learn more easily how to get along with the voice application, turning from novices into power users, without being unnecessarily distracted by different personas representing the system output. The single-persona concept meets one of the most important VUI guidelines: a consistent VUI improves human–machine interaction.

But, as far as natural, verbal human interaction is concerned, people do not gain a better understanding of situations, issues, problems or discussions by having every topic represented by one person only. It’s the very diversity of thoughts, attitudes and opinions of others that enables us to test our own values.

Speech recognition continues to improve with current technological development, and most business decision makers are beginning to understand the impact and importance of a professionally designed VUI. Together, these trends set the course for voice recognition-based applications to become a realistic and competitive channel for automating and improving customer service.

Among other things, I see a clear trend of larger companies wanting to consolidate their full range of services, which have often evolved into individual segments over time (each often with its own phone number that the public has come to know and accept), under a so-called “one-number-concept.” Companies typically “go for voice” by automating one part at a time, which lets them monitor the process and evaluate its potential. The creation of a voice portal that integrates a potentially large number of voice services, each with its own unique Hear and Feel, becomes reality.

Hear and Feel Definition
The machine output embodies the personality of a system, and callers will perceive a personality in the spoken system whether you plan for it or not. Therefore, voice talent should be carefully selected to ensure the appropriate tone of voice, pitch and timbre. Wording and overall style need to connect with the application character, reflect the corporate identity, and meet callers’ expectations.

It is important to define the Hear and Feel of a voice application early in the project, since it affects the entire persona and prompt design, as well as the recording session.

When it comes to corporate identity, the VUI provides a great opportunity to leverage existing investments in branding and image (such as audio branding through jingles, Internet, television and radio commercials, and the telephone) and to differentiate the service from competitors’ offerings.

Multi-Persona
Conversation, or even better, a dialog, between two people differs from conversation within a group. In a group, different participants have different profiles.

In order to create an effective multi-persona experience, callers need to be able to distinguish between the different profiles. A point of view, a character or even human values represented by specifically dedicated voices create a recognizable structure. In daily life, people associate certain types of voices with certain personality values. Stereotypes help to convey certain messages: People with deep voices, who do not speak too fast, are often taken more seriously than those with high-pitched voices who talk rapidly.

Modern speech applications need to handle comprehensive and intelligent routing processes and, at the same time, give callers an auditory overview of a more complex structure. In this context, the interface of a voice recognition-based portal with a considerable number of integrated voice services could very well be represented by several voices that differentiate between specific services, customized preferences, functions or layers. A VUI approach that gives individual services and functions their own voices, audio branding and style brings auditory clarity to larger and more complex structures.

Requirement Analysis
For any type of design to be successful—no matter what it is—you have to understand which tasks are to be incorporated and who's going to do them.

In VUI design, as in any software project, requirements derive from two sources: the business goals on one hand and customer behavior and needs on the other. Examining customer requests, the task or tasks of the application, user characteristics and demographics, and technical constraints together with a solid research methodology will help generate a complete set of requirements.

Since most automated transactions represent real-world transactions between a caller and a live agent, the VUI designer needs to examine the real-world task from as many angles as possible. Careful consideration must include a detailed understanding of the callers, their environment, the various interactions involved in the task and, above all, the desired outcomes and behaviors that the VUI is designed to promote. To get a realistic picture of the real-world task, well-prepared interviews are essential, not only with business decision makers and people from the marketing division, but also with call center agents, customer service managers and technical experts. Such research helps the designer arrive at a sufficiently detailed task model and the most effective VUI design possible.

One Service With Many Voices
After a thorough requirements analysis, the VUI designer needs to find out what kind of tone, wording, speech rate and formality callers prefer and, more importantly, why. This can be done by letting the callers decide. By offering several personas for the same application, the VUI designer can gather valuable information in several ways:

A first approach within multi-persona systems is to randomly select one of the system personas for each call and evaluate caller feedback through dedicated testing.
Another approach is to enable callers to select a preferred persona. This can be done by voice, within the voice application itself, where several personas are introduced with prompt samples, or by selecting a checkbox on a supporting Web site, where the available personas are introduced with sound files, personality profiles and pictures.
A third approach is to assign a certain persona to a certain target group, for one reason or another (age, gender, education, income). This, of course, can only be realized when callers are identifiable (through ANI or caller identification by, e.g., client number or account number).

By combining several approaches, caller preferences can be evaluated and truly user-centered design can be achieved.
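As a minimal sketch of how these approaches might be combined, the Python fragment below picks a persona for a single call. The persona names, the segment labels and the choose_persona helper are illustrative assumptions and not part of any particular voice platform. An explicit caller preference wins, an identified caller falls back to a target-group persona, and everyone else hears a randomly selected persona whose reception can be evaluated later.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical persona names and target-group mapping, for illustration only.
PERSONAS = ("formal", "casual", "energetic")
SEGMENT_PERSONA = {"senior": "formal", "student": "casual"}


@dataclass(frozen=True)
class Caller:
    caller_id: Optional[str] = None          # e.g. from ANI or an account number
    preferred_persona: Optional[str] = None  # chosen by voice or on a Web site
    segment: Optional[str] = None            # assumed target-group label


def choose_persona(caller: Caller,
                   personas: Sequence[str] = PERSONAS,
                   rng: Optional[random.Random] = None) -> str:
    """Pick a persona: explicit preference, then target group, then random."""
    # Approach 2: the caller has explicitly selected a persona.
    if caller.preferred_persona in personas:
        return caller.preferred_persona
    # Approach 3: only possible when the caller is identifiable.
    if caller.caller_id is not None and caller.segment in SEGMENT_PERSONA:
        return SEGMENT_PERSONA[caller.segment]
    # Approach 1: random assignment, to be evaluated through testing.
    return (rng or random).choice(personas)
```

Passing a seeded random.Random instance as rng keeps the random assignment reproducible during testing.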

Either way, providing a voice application with several personas puts callers in control, letting them customize the voice application according to their preferences. And, as we all know by now, it’s considered solid VUI design when callers are “in the driver’s seat.”

Or Every Service With Its Own Voice...
In designing a VUI for a voice portal that brings together a variety of services, it’s important to offer consistent navigation. Choosing an “anchor persona” as a moderator, presenting an introductory welcome as well as the menu structure, always-active commands, shortcuts and maybe additional information, will help callers orient themselves. The concept assigns the function of moderator to a particular voice. This “gatekeeper” could, in turn, hand things over to another persona embodying a chosen service. After using the service, the caller may be routed back to the moderator, who metaphorically represents, for example, the main menu. In this way, different voices represent not only different functions, but also services, service characters and structures. The various voices create a structure that enables callers to navigate by sound only. Within a multi-persona VUI design, each individual service can be branded with its own voice, creating great added value.
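The hand-over between moderator and service personas can be pictured as a very small state machine. The following Python sketch is only one possible reading of the concept. The service names, persona names and the PortalSession class are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

MODERATOR = "moderator"  # the anchor persona that presents the main menu

# Hypothetical mapping from portal services to their branded voices.
SERVICE_PERSONAS = {"banking": "banking_voice", "travel": "travel_voice"}


@dataclass
class PortalSession:
    active_persona: str = MODERATOR
    handovers: List[Tuple[str, str]] = field(default_factory=list)

    def enter_service(self, service: str) -> str:
        """Hand the caller over from the current persona to the service's voice."""
        persona = SERVICE_PERSONAS.get(service)
        if persona is None:
            # Unknown service: the current persona keeps the floor.
            return self.active_persona
        self.handovers.append((self.active_persona, persona))
        self.active_persona = persona
        return persona

    def return_to_menu(self) -> str:
        """Route the caller back to the moderator, i.e. the main menu."""
        if self.active_persona != MODERATOR:
            self.handovers.append((self.active_persona, MODERATOR))
            self.active_persona = MODERATOR
        return self.active_persona
```

Recording each hand-over makes it possible to check afterwards that callers always pass through the moderator, which is what gives the portal its consistent navigation.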

Multi-Persona Design and Event-handling
With certain voice applications, the VUI designer may want to associate a certain function with a dedicated persona. For example, the initial dialog might be covered by one persona, while the “No Match,” “No Input” and “Help” prompts might be covered by another one. The designer might even want to set the entire event-handling within the context of an effective metaphor. Sooner or later, callers will associate a certain type of voice with a certain state in the dialog. A particular voice might represent detailed information or assistance. Any change of persona will be clearly noticed and will trigger heightened caller attention.
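One way to picture this split is a simple lookup that decides which persona speaks the next prompt. The sketch below is an assumption-laden illustration. The persona names, the event labels and the recovery wordings are invented for the example and do not come from any specific platform.

```python
from typing import Optional, Tuple

DIALOG_PERSONA = "host"    # hypothetical persona for the regular dialog
EVENT_PERSONA = "helper"   # hypothetical persona for event handling

# Illustrative recovery wordings for the three events named in the text.
RECOVERY_PROMPTS = {
    "no_match": "Sorry, I didn't catch that.",
    "no_input": "I didn't hear anything.",
    "help": "Here is what you can say.",
}


def next_prompt(dialog_text: str, event: Optional[str] = None) -> Tuple[str, str]:
    """Return (persona, prompt text) for the next system turn."""
    if event in RECOVERY_PROMPTS:
        # The change of voice itself signals that the dialog state has changed.
        return EVENT_PERSONA, f"{RECOVERY_PROMPTS[event]} {dialog_text}"
    return DIALOG_PERSONA, dialog_text
```

Because the persona is returned together with the text, the same lookup can later drive the choice of recorded voice for each prompt.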

Multi-Persona Design on a Prompt-Level
Imagine, for example, a main menu represented by two or more personas having an animated conversation. On calling in, callers are “thrown into” a situation or ongoing action. The personas, in the middle of a conversation, detect the caller and react accordingly. One persona could very well deliver the prompts that invite caller input, while another supplies additional information and yet another offers assistance or more detailed instruction when things go wrong. By having more than one persona servicing callers, the VUI designer can set a tone and create an atmosphere that connects very effectively with the target group.

An information service represented by several personas, each representing a certain expertise, again creates auditory structure. For example, a medical doctor and an insurance company representative could each speak for their own field, together with a moderating persona responsible for independent consumer information.

Creating Caller Acceptance
Speech recognition-based systems are not foolproof. In every new project, designers and developers will face recognition, tuning and grammar problems time and again. In this respect, the VUI designer might want to personify weaknesses in speech recognition in order to create caller acceptance. In a dual-persona design approach, one character could be set up as the responsible one and the other as a subordinate. The subordinate character is invoked when recognition is weak or fails. A normal human reaction is to want to help the underdog. By presenting the subordinate character as good-hearted, passive and somewhat weak, and by having the superior boss him around, the designer leads callers to sympathize with the subordinate character. If the subordinate character is the one put in charge of confirming names or numbers, and makes a mistake in doing so, callers will show more patience and understanding. Instead of confronting callers with speech recognition’s weak spots and thereby exposing them, this kind of VUI creates caller acceptance and “service with a smile.”
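A dual-persona confirmation step of this kind might be sketched as follows. The confidence threshold, the persona labels and the prompt wordings are assumptions made for illustration. A real recognizer would report confidence in its own way and the threshold would need tuning.

```python
from typing import Optional, Tuple

LEAD, SUBORDINATE = "lead", "subordinate"
CONFIDENCE_THRESHOLD = 0.6  # hypothetical value, to be tuned per application


def confirmation_prompt(heard: Optional[str],
                        confidence: float = 0.0,
                        threshold: float = CONFIDENCE_THRESHOLD) -> Tuple[str, str]:
    """Choose which character confirms a recognized name or number."""
    if heard is None:
        # Recognition failed outright: the subordinate takes the blame.
        return SUBORDINATE, "Oh dear, I missed that completely. Could you say it once more?"
    if confidence < threshold:
        # Weak recognition: the subordinate asks for confirmation.
        return SUBORDINATE, f"I hope I got this right. Did you say {heard}?"
    # Strong recognition: the responsible character simply moves on.
    return LEAD, f"{heard}. Thank you."
```

Keeping the decision in one function makes it easy to adjust how often callers meet the subordinate character as recognition quality changes.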

Depending on the requirements analysis and overall structure, a multi-persona VUI opens up colorful new concepts and creates additional possibilities. In larger systems that bring individual services together, multi-persona VUIs will be inevitable.

Being aware of design techniques and problems up front might provide VUI designers with guidelines on how to handle a multi-persona concept successfully, ensuring great VUI design and caller acceptance.

Tom Houwing created and leads the VUI group at VoiceObjects. In his role as principal VUI, he is responsible for the overall quality and art direction of the company’s VUI design.
