September 1, 2010
By Melanie Polkosky, Human Factors Psychologist & Consultant - IBM/Center for Multimedia Arts (University of Memphis)
Interact

To Usability Test...Or Not?

Ever since I officially hit middle age a couple of months back, I’ve found myself doing things I once labeled as Stuff I’ll Never Do. During a recent exchange of usability testing war stories with a colleague, I heard myself utter a stunning declaration: “You know, I think there are situations where it’s better to not even do any usability testing at all!” I cocked my head to the side while my younger, more idealistic self did angry flips in the back of my brain. What!? Did I really say that?

Usability testing provides feedback on a designer’s work. We’ve been drilled with the notion that a script can only benefit from having input from its intended audience. IVR scriptwriting, like any other type of writing, gains infinitely from the writer seeing firsthand how words land on other people. In fact, in teaching students to write, cycles of feedback and rewriting make up the bulk of instruction, which ensures the writer’s intent and the reader’s experience are in sync.

If feedback is essential to a well-executed IVR design, then why would I ever suggest ditching it? No designer worth his salt would cancel usability testing, right? However, in real-world projects, stakeholders often dictate aspects of a test that render its results meaningless. Even worse, poor testing might actually precipitate less usable design decisions than if there were no test at all.

Let’s look at three ways a usability test design can go awry:

1. Recruiting employees: This is probably the most egregious problem imposed on a test. It often happens because it’s difficult to recruit people, there’s no time, there’s no money, and/or someone just wants to use the engineers on the development team or the guys in finance. The difference between anyone associated with the organization and John Q. Public is often substantial. Employee samples might be less diverse, more educated, more technically savvy, more familiar with internal structure and business processes, and of higher socioeconomic status than samples drawn from the general population. All of these discrepancies translate into test results that describe the engineers’ or finance guys’ usage behavior, plus the incorrect assumption everyone else can use an application “because we tested it.”

2. Testing edge cases: In this form of test design sabotage, stakeholders select their pet or most controversial usage scenarios as the tasks participants will complete. Sure, it’s generally a good idea to test some less-common scenarios in a representative mix of tasks, but the bulk of test scenarios should be the most frequently traversed paths of the application. If you test only those once-in-a-blue-moon cases, then you don’t get a view of how the user experience operates for the most common call reasons. Worse, you risk making design changes that degrade the majority’s experience for the sake of just a few people who almost never call.

3. Oversimplifying measurement: When you’re testing usability, you’re necessarily testing complex human behavior. Thus, measuring it can also be complicated: You draw an adequate picture of what’s happening only if you test in a multidimensional manner, using a broad range of question types (e.g., yes/no, open-ended, closed-set, numeric rating scales) and targeting both behaviors and perceptions. This is my preferred approach, and it recently surprised me: in one test, only the open-ended questions revealed subtle perceptual differences between two applications. Other similar tests have hinged on objective performance measures. If I had relied on only one type of measure in these tests, my analyses would have been skewed and my design recommendations flawed. Past research has shown qualitative testing to be subject to a wide range of reliability and validity problems, making it problematic as a sole basis for design recommendations. A hybrid of qualitative and quantitative measures offers the best of both approaches while mitigating the limitations of each.
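
For the second pitfall, one way to keep the task mix honest is to allocate test scenarios in proportion to how often each call reason actually occurs. The Python sketch below is only an illustration under assumed inputs: the function name `allocate_tasks`, the call reasons, and the call volumes are all hypothetical, and a real test plan would draw on the application’s own call data.

```python
from math import floor


def allocate_tasks(call_volumes, n_tasks):
    """Split n_tasks test scenarios across call reasons in proportion to volume.

    Uses the largest-remainder method so the counts sum exactly to n_tasks.
    """
    total = sum(call_volumes.values())
    if total <= 0 or n_tasks < 0:
        raise ValueError("need positive call volume and a non-negative task count")

    # Ideal (fractional) share of the task slots for each call reason.
    quotas = {reason: n_tasks * vol / total for reason, vol in call_volumes.items()}

    # Start with the whole-number part of each share.
    counts = {reason: floor(q) for reason, q in quotas.items()}

    # Hand the leftover slots to the reasons with the largest fractional remainders.
    leftover = n_tasks - sum(counts.values())
    by_remainder = sorted(quotas, key=lambda r: quotas[r] - counts[r], reverse=True)
    for reason in by_remainder[:leftover]:
        counts[reason] += 1
    return counts


# Hypothetical monthly call volumes by call reason (illustrative numbers only).
volumes = {
    "check balance": 5500,
    "make payment": 2500,
    "report lost card": 1200,
    "dispute charge": 800,
}

plan = allocate_tasks(volumes, n_tasks=10)
print(plan)
```

With these made-up volumes, the ten tasks split 6, 2, 1, and 1, so the rare “dispute charge” path still gets one scenario without crowding out the common ones.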

Any one of these three common flaws can make a usability test go very, very wrong. It can erroneously proclaim user experiences are working great when they’re not, or imply terribly flawed experiences are just peachy. If you’re putting the time and expense into conducting a usability test, then it needs to be carefully designed so the mirror you’re holding up reveals the script’s blemishes, wrinkles, sagging, and spots accurately. This is not the place for a “skinny” mirror; you want to know the full, ugly, naked truth. You’re going to be taking a scalpel to your script after this usability test, after all. Would you want to give it liposuction if it really needs a brow lift? If a test has been compromised, then you’re not going to be able to make appropriate recommendations. The old gal in me says it’s better to just scrap the test plan and start over. Or don’t even bother, even if axing usability testing falls firmly in the category of Stuff I’ll Never Do.

Melanie Polkosky, Ph.D., is a social-cognitive psychologist and speech language pathologist who has researched and designed speech, graphics, and multimedia user experiences for more than 13 years. She is currently a human factors psychologist and senior consultant at IBM. She can be reached at polkosky@comcast.net.