Expanding & Improving Upon Traditional Usability Focus Groups with Online Usability Surveys

Speech applications have traditionally relied upon small focus groups to assess performance from the caller’s perspective.  What happens when a group of a dozen or so people includes that one person who is difficult to please, or worse yet, the person who gives conflicting opinions such as, “I was very dissatisfied with the system but I had a great call experience”?  The answer is many hours lost trying to understand conflicting data and trying to find solutions to problems that may not even exist. 

A more efficient and less expensive alternative to the traditional focus group is the Web-based usability survey, which engages hundreds of panelists rather than a small group of people.  Participants are invited to join a survey via an e-mail with instructions and a task to complete.  After making the call into the application, they're directed to an online questionnaire to fill out.  The survey sample size is statistically significant, and the feedback acquired from each participant is detailed and abundant.  Nortel partners with Vocal Laboratories, Inc. (www.vocalabs.com) to provide turnkey usability surveys and has been using them successfully to gauge the performance of speech applications for several years.

Much like traditional focus groups, usability surveys have four components: the recruitment process, questionnaire development, survey launch, and data analysis and reporting.  An extensive pool of tens of thousands of potential participants allows the flexibility to choose panelists who will most closely match the caller population of the speech application, and they can be selected based upon known demographics such as age, education, or geography.  Panelists don't call into the customer's system directly, and every participant call into the survey is recorded and available for future reference.  During the survey analysis phase, having the calls "at our fingertips" provides instant clarity into the participant's experience.

A typical usability survey takes advantage of at least 500 panelists per study, and this greatly increases the likelihood of identifying problems that otherwise might not come to light until full production when thousands of callers are using the application.  For example, if a problem affects only five percent of the callers, a sample size of 10 has a 40 percent chance of finding it.  With a sample size of 100, the chance of finding that problem jumps to 99 percent.  The ability to recruit hundreds of panelists per survey means increasing the likelihood of finding common and not-so-common problems that will affect real users.
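The arithmetic behind these detection rates follows from simple probability: if a problem affects a fixed fraction of callers, the chance that at least one panelist in a random sample encounters it is one minus the chance that every panelist misses it.  A minimal sketch:

```python
# Probability that a random sample of n callers surfaces a problem
# that affects a given fraction of the caller population.
def chance_of_finding(problem_rate: float, sample_size: int) -> float:
    """Chance that at least one sampled caller hits the problem."""
    return 1 - (1 - problem_rate) ** sample_size

# A problem affecting 5 percent of callers:
print(round(chance_of_finding(0.05, 10) * 100))   # ~40% with 10 panelists
print(round(chance_of_finding(0.05, 100) * 100))  # ~99% with 100 panelists
print(round(chance_of_finding(0.05, 500) * 100))  # near-certain with 500
```

This assumes panelists encounter the problem independently at the population rate, which is the standard simplification for sample-size planning.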

The online questionnaire that panelists complete after making a call into the application usually consists of a dozen or so questions.  The development team and the customer deploying the speech application have the flexibility to design a survey that will best test the usability of the system.  For instance, some customers prefer to execute a "blind" survey where participants don't know the name of the company represented by the application, while other customers prefer that the panelists know exactly which company they're reaching.  Specific areas of concern or interest, such as how well the recognition performed or whether the participant has ever used an automated system similar to the one being tested, can easily be incorporated into the questionnaire, making it completely custom, flexible, and effective.  Another invaluable part of the questionnaire is the free-form question, in which participants can provide honest, anonymous, and often blunt feedback on their experience.

The survey task given to the participants can be the same for everyone, or there can be variations on the same task.  A recent survey for a store locator application asked half the panelists to locate a store using a ZIP code in their area, while the remaining panelists were given a scenario about being away from home and "told by a friend" that a store could be located in a nearby city.  When specific scenarios are constructed, the flexibility of the survey allows the supporting information given to the panelists to be varied.  For instance, the city name in the away-from-home scenario was drawn from a set of locations chosen for variety, because some of the stores appeared farther down the list of possibilities presented to the caller.  This provided enough variation in the participants' experiences to make the survey as close to "real world" as possible.  Information in the survey instructions is kept to a minimum, but any necessary identifiers, such as a telephone number or account number that may be needed to complete the call, are provided to the participants.

In order to benchmark the application against similar applications in the industry, two primary questions are included in every survey.  The first question measures caller satisfaction and the second determines the call completion rate.  The caller satisfaction measurement is the percentage of panelists who answered that they were "very satisfied" or "satisfied" (on a five-point scale) with the overall experience.  For comparable benchmarking, the call completion rate is limited to those panelists who were able to complete the assigned task in a single call.  While panelists may make more than one call into the application before completing the task, just as real callers may, for purposes of a consistent measurement only single-call completions are counted in the survey's call completion rate metric.  The remaining 10 or so survey questions can provide insight into specific questions ranging from how pleasing the voice of the system was to whether or not the menu choices were clear, but the caller satisfaction and call completion metrics are the cornerstones of effectively measuring the usability pulse of the application.
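The two benchmark metrics described above can be sketched in a few lines.  This is an illustrative calculation only; the response fields (`satisfaction`, `task_completed`, `calls_made`) are hypothetical names, not part of any actual survey system:

```python
# Sketch of the two benchmark metrics, computed from a hypothetical
# list of survey responses. Field names are illustrative assumptions.
def caller_satisfaction(responses):
    """Share of panelists answering 'very satisfied' or 'satisfied'
    on the five-point overall-experience scale."""
    happy = sum(1 for r in responses
                if r["satisfaction"] in ("very satisfied", "satisfied"))
    return happy / len(responses)

def call_completion_rate(responses):
    """Share of panelists who completed the task in a single call;
    multi-call completions are excluded for consistent benchmarking."""
    done_in_one = sum(1 for r in responses
                      if r["task_completed"] and r["calls_made"] == 1)
    return done_in_one / len(responses)

sample = [
    {"satisfaction": "very satisfied", "task_completed": True,  "calls_made": 1},
    {"satisfaction": "satisfied",      "task_completed": True,  "calls_made": 2},
    {"satisfaction": "dissatisfied",   "task_completed": False, "calls_made": 1},
    {"satisfaction": "neutral",        "task_completed": True,  "calls_made": 1},
]
print(caller_satisfaction(sample))    # 0.5
print(call_completion_rate(sample))   # 0.5
```

Note how the second panelist, who completed the task but needed two calls, counts toward satisfaction yet is excluded from the call completion rate, matching the single-call rule above.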

Once the panelist pool is chosen, the questionnaire is written, and the application is ready to support the assigned tasks, the survey itself is launched.  E-mail invitations are sent to the chosen panelists with instructions on how to participate and a task to be completed, and a typical 500-panelist survey takes about a week to complete.  When initiating a survey, the invitations are staggered so that busy signals are kept to a minimum, since 12 IVR phone ports are typically used during the survey.  Attention is also paid to when the survey begins, since invitations sent out on a Friday afternoon or just before a holiday may result in longer survey collection times.
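The staggering logic amounts to sizing each invitation batch so that the expected number of simultaneous calls stays within the available ports.  The sketch below is a back-of-the-envelope model under stated assumptions; the call duration and response rate are invented numbers, and only the 12-port figure comes from the article:

```python
# Hedged sketch: size invitation batches so expected concurrent calls
# stay within the IVR port count. Call duration and response rate are
# assumptions for illustration; only the 12 ports come from the text.
def batch_size(ports: int, avg_call_minutes: float,
               batch_interval_minutes: float, response_rate: float) -> int:
    """Largest invitation batch whose expected calls fit the ports,
    assuming responses spread evenly over the batch interval."""
    calls_per_port = batch_interval_minutes / avg_call_minutes
    return int(ports * calls_per_port / response_rate)

# 12 ports, ~4-minute calls, invites spaced an hour apart, and an
# assumed 20 percent of invitees calling within that hour:
print(batch_size(ports=12, avg_call_minutes=4,
                 batch_interval_minutes=60, response_rate=0.2))  # 900
```

A real scheduler would also account for response clustering shortly after each e-mail goes out, which is why conservative batch sizes and off-peak launch times matter in practice.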

Surveys are usually completed quickly and efficiently, and when the maximum number of participants is reached, the survey is closed and the data analysis begins.  The online data is easily accessible and can be viewed, filtered, and “sliced and diced” by the speech specialist analyzing the data.  The caller satisfaction and call completion rate are immediate and accurate indicators of how the application is performing.  The remaining survey questions often provide valuable insight into possible trouble spots either with the IVR platform or with the application itself.

A recently completed store locator survey illustrates the advantage of larger sample sizes during usability testing: some store information coming from the back-end database was incomplete.  Out of several hundred participants who were asked to use a local ZIP code to locate a store, two panelists noted in their free-form survey responses that the information playback sounded "odd."  A review of these two survey calls found that the data for these two particular stores was incomplete and that, indeed, the playback had awkward and confusing silences.  While it would be nearly impossible to test the full set of data for all of the stores that might be accessed by the application, it was possible to enhance the application by checking for and accommodating missing data elements to "smooth out" the caller's experience when necessary.
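The fix described above amounts to validating each record before building the playback prompt, so a missing field yields a graceful fallback rather than dead air.  A minimal sketch, with illustrative field names that are assumptions rather than the actual application's schema:

```python
# Sketch of the missing-data accommodation: check store record fields
# before building the playback prompt so incomplete back-end data
# doesn't produce awkward silences. Field names are illustrative.
def build_store_prompt(store: dict) -> str:
    parts = []
    if store.get("name"):
        parts.append(store["name"])
    if store.get("street"):
        parts.append(f"located at {store['street']}")
    if store.get("hours"):
        parts.append(f"open {store['hours']}")
    else:
        # Fall back gracefully instead of playing an empty slot.
        parts.append("please call the store for hours")
    return ", ".join(parts) + "."

complete = {"name": "Main Street Store", "street": "100 Main St",
            "hours": "9 to 5"}
partial = {"name": "Elm Plaza Store", "street": "", "hours": None}
print(build_store_prompt(complete))
# Main Street Store, located at 100 Main St, open 9 to 5.
print(build_store_prompt(partial))
# Elm Plaza Store, please call the store for hours.
```

The point is that the check runs at prompt-assembly time, so every store record is covered without having to audit the entire database up front.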

Usability surveys are most valuable when conducted during the integration testing phase just before the application is deployed so that issues that impact caller satisfaction and completion rates can be identified and corrected before full production.  Usability surveys performed at any stage, however, can provide a valuable check of performance as well as providing insight into how well the application measures up to similar applications that have also gone through usability surveys.

Online usability surveys are less expensive to execute and easier to coordinate and deliver than more traditional usability focus groups.  While focus groups might include up to a dozen or so participants, usability surveys take advantage of e-mail and the Internet to broaden the sample size to several hundred participants.  Larger sample sizes increase the likelihood of finding problems that will affect most callers along with those that will affect only five percent or fewer of the callers.  A statistically significant sample size allows the overall application performance to be measured while permitting the dismissal of conflicting data points from a few participants.  Attempting to understand why a caller might say he was "very dissatisfied with the experience, but loved the system" can be left as just an interesting notation in the usability testing.  Finally, because of the ease with which online usability surveys can be conducted, they offer an excellent way to perform periodic "health checks" on an application to ensure and maintain quality on an ongoing basis.

Fran McTernan has been working in the IVR field for over 20 years, the last six of which have been spent specializing in deploying speech recognition applications.  She leads the team of speech specialists in Nortel’s Professional Services Organization and has been involved with the successful deployment of speech applications for utilities, wireless carriers, and railroad companies, among others.
