What is Usability Testing?
Looking around the industry, it is apparent that "usability testing" means a number of different things to a number of different people.
While there are consistencies in methods and techniques among many speech industry usability analysts, there is no obvious consensus as to the purpose of usability testing or on any particular way to conduct usability tests.
For many, usability testing refers to a pre-coding phase in the speech application development process. The idea is to test the acceptability of one or more design approaches prior to committing the approaches to code. Typically this entails use of a Wizard of Oz procedure in which users interact with a simulated system. Sometimes the simulated system is the product of a prompt-playing, data-collecting software environment. Other times, users may simply interact with a person who "pretends" to be the system by reading prompts and following the decision logic of a dialog design specification.
Usually testers attempt to create a testing environment that realistically reflects the system's eventual production environment. For example, most telephony-based application testing takes place over an actual telephone connection.
For others, usability testing refers to a post-production process. Typically the idea is to tune an existing application by finding and fixing its shortcomings. Sometimes this means testing groups of selected experimental subjects, while at other times usability investigators may study actual "live" user interactions. Some investigators adopt the goal of proving that a design works. Others embrace the goal of trying to make the system fail.
Acceptability and shortcomings
How one measures the "acceptability" of a voice user interface (VUI), or even a particular "shortcoming" in a VUI, is also subject to debate.
Some approach the problem with a product marketing mentality wherein the apparent intent is to determine what people think or feel about a system. Users are exposed to a system in some way or another and subsequently questioned about their experience. Focus groups are often used in these procedures, as are follow-up user questionnaires.
Other investigators take more behavioral, data-centric approaches. One of the more powerful methods is the observational study. In an observational study, users are generally videotaped while using a system to perform one or more assigned tasks. A usability analyst will subsequently infer the presence of VUI problems from specific user behaviors that suggest confusion or annoyance.
None of these approaches and methods is without merit and all can provide particularly valuable insights into the workings of a VUI. On the other hand, none of these approaches is particularly effective in avoiding subject and experimenter bias. And as any social scientific researcher will attest, uncontrolled experiments cannot strongly support any particular analytical conclusions.
A better way?
In order to avoid the introduction of bias, principles of sound experimental design and appropriate statistical analyses must be employed. Additionally, investigators should always clearly state the experimental questions they intend to ask and identify each independent and dependent variable that they intend to measure. Furthermore, the industry would be well served by the adoption of standard questions that usability testing is intended to answer. Several basic, behaviorally defined questions might concern:
- Overall percent task completion
- Percent task completion by task
- Mean duration for all tasks
- Mean duration for each task type
- Incidence of recognition failure for all tasks
- Incidence of recognition failure for each task type
- Incidence of inactivity errors for all tasks
- Incidence of inactivity errors for each task
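As a rough illustration, the measures listed above could be computed from per-trial test logs along the following lines. This is a minimal sketch, not a standard tool: the `Trial` record, its field names, the sample task names, and the choice to express "incidence" as a mean count per trial are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trial:
    """One subject's attempt at one assigned task (hypothetical log record)."""
    subject: str
    task: str
    completed: bool
    duration_s: float
    recognition_failures: int
    inactivity_errors: int


def _stats(rows):
    """Compute the four measures for a non-empty list of trials."""
    n = len(rows)
    return {
        "pct_completion": 100.0 * sum(r.completed for r in rows) / n,
        "mean_duration_s": mean(r.duration_s for r in rows),
        # Assumption: "incidence" is reported as mean count per trial.
        "recognition_failures_per_trial": sum(r.recognition_failures for r in rows) / n,
        "inactivity_errors_per_trial": sum(r.inactivity_errors for r in rows) / n,
    }


def summarize(trials):
    """Return the measures overall and broken down by task."""
    if not trials:
        raise ValueError("at least one trial is required")
    by_task = defaultdict(list)
    for trial in trials:
        by_task[trial.task].append(trial)
    return {
        "overall": _stats(trials),
        "by_task": {task: _stats(rows) for task, rows in by_task.items()},
    }


# Invented sample data, for illustration only.
sample = [
    Trial("s1", "check_balance", True, 42.0, 0, 0),
    Trial("s1", "transfer_funds", False, 95.0, 2, 1),
    Trial("s2", "check_balance", True, 38.0, 1, 0),
    Trial("s2", "transfer_funds", True, 80.0, 0, 1),
]
report = summarize(sample)
```

Reporting every measure both overall and by task from the same records keeps the two views consistent with each other, which matters if different investigators are to compare results.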
The most important question of all?
According to Dr. Susan Hura of Intervoice, "Usability is not a single quality per se, but rather the combination of several factors, including learnability, effectiveness and user satisfaction."
Dr. Hura is certainly correct but, unfortunately, she is one of relatively few in the usability community to emphasize the importance of learnability. Learnability is a property of an application's design. It is essentially a measure of the application's ability to teach its users how to use it efficiently. To the extent that the application is designed in accordance with the laws of learning psychology, its learnability will increase. Learnability can be assessed with a randomized-task, repeated-measures experimental design. Not to be confused with simply asking users to do the same task repeatedly, this experimental design can reveal evidence that an instructional design in the application is at work across the application's various tasks. In order to assess learnability, one slightly alters experimental questions such as those listed above. For example:
- What is the overall percent task completion as a function of repeated trials? or,
- What is the incidence of recognition failure for all tasks as a function of repeated trials?
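The first of these questions might be explored as sketched below: percent task completion is tabulated by trial position across subjects, and a simple least-squares slope summarizes the trend. The record layout, the sample data, and the function names are hypothetical, and the sketch assumes that task order was already randomized for each subject. The slope is only descriptive; it is not a substitute for an appropriate statistical test.

```python
from collections import defaultdict


def completion_by_trial(records):
    """Percent task completion at each trial position.

    records: iterable of (subject, trial_number, completed) tuples, where
    trial_number is the position of the trial in that subject's session.
    """
    totals = defaultdict(int)
    successes = defaultdict(int)
    for _subject, trial_number, completed in records:
        totals[trial_number] += 1
        successes[trial_number] += int(completed)
    return {n: 100.0 * successes[n] / totals[n] for n in sorted(totals)}


def least_squares_slope(curve):
    """Ordinary least-squares slope of percent completion against trial number."""
    xs = list(curve)
    ys = [curve[x] for x in xs]
    if len(xs) < 2:
        raise ValueError("need at least two trial positions")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


# Invented sample data: three subjects, three trials each.
records = [
    ("s1", 1, False), ("s1", 2, True), ("s1", 3, True),
    ("s2", 1, False), ("s2", 2, False), ("s2", 3, True),
    ("s3", 1, True), ("s3", 2, True), ("s3", 3, True),
]
curve = completion_by_trial(records)
slope = least_squares_slope(curve)  # change in percent completion per additional trial
```

A rising curve across randomized tasks is the kind of evidence that would suggest users are learning the application rather than merely rehearsing a single task. The same tabulation could be applied to recognition failures by swapping the completion flag for a failure count.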
Ironically, investigating learnability can provide uniquely valuable insights into other experimental variables such as effectiveness and user satisfaction. It stands to reason: if an application teaches its users how to use it efficiently, its designers will find it more effective and its users will find it more satisfying.
Dr. Walter Rolandi is the founder and owner of The Voice User Interface Company in Columbia, SC. Dr. Rolandi provides consultative services in the design, development and evaluation of telephony-based voice user interfaces (VUI) and evaluates ASR, TTS and conversational dialog technologies. He can be reached at wrolandi@wrolandi.com.