Moving Beyond 'I'm Sorry, I Didn't Get That'
At SpeechTEK West in San Francisco last year, a group of voice user interface designers participated in a workshop directed by James Larson of Larson Technical Services and Lizanne Kaiser of Genesys Telecommunications Laboratories. Participants developed recommendations for dealing with error messages in speech applications from four perspectives. This is the second in a series of four articles summarizing the recommendations reached by these experts. Contributors to this article are:
• Jenni McKienzie, Travelocity;
• Paul Greiner, Viecore (a Nuance Communications company); and
• Sunil Issar, Convergys.
Writing the positive call flow is the truly fun part of voice user interface (VUI) design. However, miscommunication between the caller and the system can occur for a number of reasons. Without careful attention to the error-handling strategy and prompts, callers can enter the death spiral of IVR hell to the point of hanging up out of frustration, requesting an agent, or having the system initiate the transfer to an agent. None of these outcomes is good for customer satisfaction, brand image, or containment rates.
The goal of error strategies is to guide the caller to provide necessary information in a way that has a good chance of recognition with a high confidence score.
Let’s first consider the term "error." The word implies a departure from a smoothly flowing dialogue having a single question-and-answer turn for each step through the call flow. In this sense, error-free dialogues are a lofty and unnecessary goal, given that they don’t even exist in human-to-human communication. Asking your conversation partner to repeat or explain is not a detour from a human conversation, but rather a natural and integral part of it.
"That darn Laura makes me so mad!"
"Lauren?"
"No, Laura."
"Why’s that?"
Yet the term error remains a convenient way to describe the two traditional dialogue event categories of no input and no match. And we’ll continue to use the term here.
It goes without saying that designing an error strategy always begins with writing well-crafted initial prompts that reduce errors in the first place. But moving beyond that, there unfortunately is no generic recipe for writing error prompts, no hard and fast rules that apply universally to all error circumstances. The first step in writing effective error prompts is to understand why and how dialogue errors occur.
First let’s look at no input. It appears to be pretty simple on the surface: Callers don’t say anything. But why not? Are they not paying attention? Do they not have the information requested (like an account number)? Do they not understand what’s being asked? Are the choices too similar and they’re not sure which one is correct? Are they hoping that playing possum will get them to an agent? Are they not talking loudly enough? Each of these situations would lead the designer to write a different type of error prompt.
When we look at no matches, the breadth of possibilities grows even more. Callers might simply be restarting or self-correcting, attempting to give an in-grammar response. Out-of-grammar responses fall into a lot of categories as well. Callers might be rambling or supplying unanticipated prefiller or postfiller. They might be trying to make the system error out and transfer them. Maybe side speech or background noise is causing the no match. Then there’s the case of when the caller says something that’s completely covered by the grammar but just not recognized. Once again, each situation would lead the designer to a different reprompt.
Many Errors, One Response
There are many different types of errors, but we designers have tended to have a single way of dealing with them all, either due to time or technological constraints. We can take what we know about the error and tailor our handling of it, crafting specific verbiage to handle each gracefully.
In the old days of VUI, we said things like, "I’m sorry, I didn’t get that." Most of us have moved away from apologizing, but do we need a transition phrase at all? We’re finding that getting straight to the reprompt often works better. Sometimes simply changing the prosody can express equally well what robotic transition phrases have done in the past. For example, asking "Whaaat was that account number?" with a drawn-out "what" does a great job of conveying that the IVR knows you were trying to answer, but needs you to try it again. The trick here, of course, is coaching your voice talent to deliver this line in an effective and believable manner.
Changing the prosody is especially effective in situations where you think the caller was trying to respond correctly but had some kind of restart/self-correcting/recognition error where no extra information is needed, just another opportunity to answer.
Other types of errors are best handled by a strategy of providing more information. The caller might be better served by a prompt that gives tips on how to say an account number or where to find it.
Most designs now offer dual-tone multifrequency (DTMF) input in initial prompts when there’s the potential for privacy issues, such as with account numbers or medical procedure menus. But even for simple menus, explicit mention of DTMF options is certainly appropriate to remedy situations of restarts, out-of-grammar responses, or ongoing background noise.
We also need to consider that the caller might reasonably not know the requested information. The caller might not have his account number in front of him. In this case, he needs an out.
So for our simple account number prompt, we’ve now come up with several possible reprompts. Here they are all together, each listed with the error type and the situation in which it would be appropriate to use:
• Error: No match
Caller restarted, was distracted, some digits detected, or prefiller "My account number is..." recognized.
Prompt Variant: Whaaat was that account number?
• Error: No input
Caller typically has a statement in front of him and just needs help finding the account number on it.
Prompt Variant: Please give me your account number. It’s in the blue box in the top left corner of your statement.
• Error: No match with low confidence
Caller is saying the account number digits as natural numbers, e.g., "51-13" because they’re printed in groups of two on the statements.
Prompt Variant: Please say or enter your account number one digit at a time. For example, three four two, etc.
• Error: No match or no input
Company knows a small number of callers don’t know or don’t yet have an account.
Prompt Variant: Say or enter your account number, or say "I don’t have it."
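One way to picture how these variants might be chosen at run time is the minimal sketch below. It assumes the platform can tell us a little about each error event; the `ErrorEvent` fields, the `choose_reprompt` function, and the order in which the checks run are all hypothetical, not something any particular IVR platform provides.

```python
from dataclasses import dataclass


@dataclass
class ErrorEvent:
    # All field names are illustrative assumptions, not a platform API.
    kind: str                      # "no_match" or "no_input"
    prefiller_heard: bool = False  # e.g. "My account number is..." was recognized
    partial_digits: bool = False   # some digits were detected
    low_confidence: bool = False   # a match came back, but with a low score
    attempt: int = 1               # how many times this prompt has failed so far


def choose_reprompt(event: ErrorEvent) -> str:
    """Pick one of the account-number reprompts listed above."""
    # The caller was clearly trying to answer: just offer another chance.
    if event.kind == "no_match" and (event.prefiller_heard or event.partial_digits):
        return "Whaaat was that account number?"
    # Something matched, but not confidently: suggest the digit-by-digit format.
    if event.kind == "no_match" and event.low_confidence:
        return ("Please say or enter your account number one digit at a time. "
                "For example, three four two, etc.")
    # First silence: help the caller find the number on the statement.
    if event.kind == "no_input" and event.attempt == 1:
        return ("Please give me your account number. It's in the blue box "
                "in the top left corner of your statement.")
    # Anything else, including repeated failures: give the caller an out.
    return 'Say or enter your account number, or say "I don\'t have it."'
```

For instance, `choose_reprompt(ErrorEvent("no_input", attempt=2))` falls through to the last variant, on the assumption that a caller who stays silent twice may not have the number at all.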
Now let’s look at how errors tie into grammars. If the grammar’s format is flexible, the initial prompt can be format-free, but the reprompt can suggest the most recognizable format and limit the grammar. For example, look at a flight information prompt. The initial prompt may be as simple as Where are you leaving from? The grammar accepts BOS, Boston, Boston Massachusetts, Boston Logan, and Logan. But if the departure airport cannot be determined on the first attempt, reprompt with more directive instructions like this: Please tell me the city and state from which you’re leaving. This guides the caller to a specific way to formulate the answer and aids recognition accuracy, since there is no Austin/Boston confusion when it’s followed by Massachusetts.
Something to consider in this case is using a stripped-down grammar that only accepts city/state combinations. Combining the directions with the smaller, less confusable grammar should increase the likelihood of successful recognition. The flip side is that a caller who originally said Logan and was misrecognized may barge in over the reprompt and say Logan again, and the stripped-down grammar gives that caller no chance of being recognized. At this point, if we really want to strip down this grammar, we may have to turn barge-in off for the reprompt to make sure the caller hears what we’re saying.
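The pairing of reprompt, narrower grammar, and barge-in setting could be sketched as follows. The grammars are reduced to plain sets of phrases for illustration, and `PromptConfig` and `departure_prompt` are hypothetical names; a real deployment would express this in whatever grammar and dialogue format its platform uses.

```python
from dataclasses import dataclass

# Illustrative phrase lists only; a real grammar would cover every airport.
FULL_GRAMMAR = frozenset(
    {"bos", "boston", "boston massachusetts", "boston logan", "logan"}
)
CITY_STATE_GRAMMAR = frozenset({"boston massachusetts", "austin texas"})


@dataclass(frozen=True)
class PromptConfig:
    text: str
    grammar: frozenset
    barge_in: bool


def departure_prompt(attempt: int) -> PromptConfig:
    """Return the prompt, grammar, and barge-in setting for a given attempt."""
    if attempt == 1:
        # Format-free question, flexible grammar, barge-in allowed.
        return PromptConfig("Where are you leaving from?", FULL_GRAMMAR, True)
    # Directive reprompt with the stripped-down grammar. Barge-in is off so
    # the caller hears the instruction before answering.
    return PromptConfig(
        "Please tell me the city and state from which you're leaving.",
        CITY_STATE_GRAMMAR,
        False,
    )
```

Here "logan" is accepted on the first attempt but not on the second, which is exactly why the sketch turns barge-in off for the reprompt.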
Multislot grammars allow for even more possibilities, and this is one area with clear guidelines. First, if the original question fills more than one slot, take whatever you get and fill the others later. If you’re asking for a departure date and destination city, write the grammar to accept either or both, adding a follow-up question if necessary. Second, if the recognizer comes back with a complete no match, you can back off the multislot prompt and ask for one thing at a time.
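Both multislot guidelines fit in a few lines of dialogue logic. The sketch below assumes the recognizer hands back a dictionary of whichever slots it filled (empty on a complete no match); the prompt wording, slot names, and the `next_prompt` function are made up for illustration.

```python
from typing import Dict, Optional

# Hypothetical prompt wording; slot order decides which follow-up comes first.
COMBINED_PROMPT = "When and where would you like to fly?"
SLOT_PROMPTS = {
    "date": "What day are you leaving?",
    "destination": "Where are you flying to?",
}


def next_prompt(filled: Dict[str, str],
                last_result: Optional[Dict[str, str]]) -> Optional[str]:
    """Decide what to ask next; returns None once every slot is filled.

    `filled` is updated in place. `last_result` is None before the first
    question, and an empty dict after a complete no match.
    """
    if last_result:
        filled.update(last_result)  # take whatever we got
    missing = [slot for slot in SLOT_PROMPTS if slot not in filled]
    if not missing:
        return None
    if last_result is None:
        return COMBINED_PROMPT      # first turn: ask for everything at once
    # Partial fill or complete no match: ask for one thing at a time.
    return SLOT_PROMPTS[missing[0]]
```

A caller who answers the combined question with only a city gets the date question next, and a caller who produces a complete no match is walked through the slots one by one.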
Most recognizers don’t return what they think was recognized when the confidence is low. But that information may still be of use. Let’s go back to our airport example. You might want to change the error handling based on what the caller was trying to say. For example, if the recognizer returns an easily recognized phrase like Indianapolis Indiana with low confidence, the caller probably didn’t actually say that. It was more likely something out of grammar or background noise that the recognizer matched to something in the grammar. But if it returns I-N-D (a harder to recognize phrase) with (not surprisingly) medium confidence, it’s probably a valid response. In this case, you could now prompt the caller to say the city and state instead of just the airport code.
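As a rough sketch of that reasoning, assume a recognizer that does expose its best hypothesis and a confidence score even when the score is low. The thresholds, the list of hard-to-recognize phrases, and the `next_action` function below are all assumptions; real values would come from tuning against recorded calls.

```python
# Assumed thresholds on a 0.0 to 1.0 confidence scale.
LOW, HIGH = 0.4, 0.7

# Phrases we expect to score poorly even when spoken correctly,
# such as spelled-out airport codes. Illustrative only.
HARD_TO_RECOGNIZE = {"i n d"}


def next_action(hypothesis: str, confidence: float) -> str:
    """Choose error handling from the hypothesis and its confidence."""
    if confidence >= HIGH:
        return "accept"
    # Medium confidence on a hard phrase is probably a genuine answer:
    # steer the caller toward the more recognizable city-and-state form.
    if confidence >= LOW and hypothesis.lower() in HARD_TO_RECOGNIZE:
        return "ask_city_state"
    # Low confidence on an easy phrase was more likely noise or an
    # out-of-grammar response, so fall back to the ordinary reprompt.
    return "generic_reprompt"
```

With these numbers, Indianapolis Indiana at 0.2 gets the ordinary reprompt, while I-N-D at 0.5 leads to the city-and-state question.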
Slotting Allowances
Along those same lines, passing back prefiller and postfiller as slots can help. Let’s say that in response to our airport prompt, Where are you leaving from? the caller says, I’m leaving from DFW. The grammar will, of course, accept this prefiller of I’m leaving from based on the prompt. But we as designers tend to throw that away. If instead the grammar was written with a prefiller slot and an airport slot and written to return either or both, we would know from the successful filling of the first slot that the caller was genuinely attempting to answer the question.
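To make the idea concrete, here is a small sketch that uses a regular expression as a stand-in for a speech grammar with two optional slots. The phrase lists, the `parse` function, and the `caller_was_trying` helper are hypothetical; a production grammar would be written in the platform's own grammar format.

```python
import re
from typing import Dict, Optional

# Stand-in phrase lists; a real grammar would be far larger.
PREFILLER = r"(?P<prefiller>i'm leaving from|i am leaving from|leaving from|from)"
AIRPORT = r"(?P<airport>boston|logan|dfw|bos)"

# Both slots are optional, so either or both can be returned.
PATTERN = re.compile(rf"^\s*(?:{PREFILLER}\s*)?(?:{AIRPORT}\b)?")


def parse(utterance: str) -> Dict[str, Optional[str]]:
    """Return the prefiller and airport slots; unfilled slots are None."""
    match = PATTERN.match(utterance.lower())
    return {
        "prefiller": match.group("prefiller"),
        "airport": match.group("airport"),
    }


def caller_was_trying(slots: Dict[str, Optional[str]]) -> bool:
    """A filled prefiller slot suggests a genuine attempt to answer."""
    return slots["prefiller"] is not None
```

If the caller says "I'm leaving from" and then trails off, the airport slot stays empty but the prefiller slot is filled, which is the signal to use a short reprompt rather than a longer explanation.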
We said we were going to focus on no-match and no-input situations, but allow us an interesting exception. Date collection brings with it endpointing issues. It can be hard for the recognizer to distinguish between February fifth and February fifteenth, triggering a false accept error. Consider this healthcare example: The caller is requesting a claim for service on February 15. The system comes back and says it couldn’t find anything for February 5. Would the caller like to try another date? If the caller says yes, a reprompt could be written to say, To help me understand you better, please say the date like this: the 12th of April. Instructing the caller to put the day before the month eliminates the endpointing issues and increases the likelihood of correct recognition. The caveat is you have to be able to effectively communicate the format to the caller.
A special case of a no-match error occurs at the first prompt of a call. Almost all calls start with some kind of greeting and maybe a monitoring message. Often barge-in is turned off for these, and callers may be continuing their side conversations while these messages go on. Then we turn barge-in on and hit them with the first question. If they’re still talking to somebody else, their side speech barges in and cuts the prompt off, so they miss all but the first couple of syllables. They’ll quickly realize what’s happened. At this point, it’s probably best to act as if nothing happened and simply repeat the first question as originally written.
The bottom line is that the type of error should guide the reprompt. If the caller was trying to play along and just wasn’t recognized, a very short reprompt is enough. If he doesn’t know how to respond, then examples or further explanation are warranted. The trick is differentiating between them. And as noted earlier, the more information you have available to you, the more variation you can have in your reprompts. At some point there are probably diminishing returns to writing all these variations. The goal, after all, is to get the caller successfully through the prompt. If a single version can do that for all situations, that’s the way to go. You just have to find that magical single version.
Editor’s Note: In the next installment of this series, another team of VUI designers will provide ways to get beyond error messages by avoiding the tendency to harp on errors that can be resolved later in the call flow.