Natural Language Processing: The Next Steps
Speech interfaces in which users respond in their own words to open-ended prompts like "How may I help you?" are being deployed more and more widely. They are most often used in call-routing applications, where the system's task is to identify the topic of a caller's request and transfer the call to the part of the system where the request can be addressed.
These interfaces are generally called "natural language" systems because users can express their requests in their own words, that is, in natural language. Does the fact that users can use natural language with these systems automatically make them natural? No. They are more natural than directed dialog systems in one respect: users word their requests more as they would word them to a human. As Walter Rolandi of The Voice User Interface Company maintains1, the term "natural language processing" can be misleading in that it implies a sophisticated, human-like dialog competence that goes well beyond the capabilities of today's systems.
Interacting with these systems isn't as natural as interacting with a person, although, as Rolandi points out, it's probably correct to characterize them as "less unnatural" than Dual Tone Multi Frequency (DTMF) or directed dialog systems. However, the real goal of a speech interface isn't to be natural, but to help users achieve their goals quickly and accurately. "Naturalness" is a means to that end, not an end in itself. By analogy, even though driving a car isn't like walking, it serves our goal of getting from one place to another very well. The more interesting questions about these systems aren't about how natural they are, but about the user experience: how do users interact with these applications, what do they say about them, and what has been learned from deployments?
This article explores natural language applications from the end users' point of view. For the purposes of this article we leave aside other important questions having to do with other aspects of these systems, such as implementation. For background information, readers may wish to consult previous articles in Speech Technology Magazine, which have discussed the technology that underlies these systems,2,3 the business case for these applications in the call center,4 and case studies of these applications in action.5
To begin, consider the following two dialogs from a hypothetical application:
Conversation A:
System: Please state whether you have a question about your account, would like to report a problem with your service, or would like to set up an account.
User: Service.
System: Are you having problems accessing the Internet, problems with email, or are you having a different problem?
User: Accessing the Internet
System: Has a technician looked at your problem?
User: Yes. (transfer)
Agent: I understand you're having problems with your Internet service.
Conversation B:
System: How can I help you?
User: I'm having problems with my Internet service. The technician was here yesterday and it worked for a while after she left, but now it's not working anymore. (transfer)
Agent: I understand you're having problems with your Internet service.
Conversation A represents a directed dialog. If the set of possible problems is large, the process of working through the deeply nested tree of possibilities is tedious, and any errors in selecting a choice can lead to wasted time for both customers and agents. Conversation A requires users to mentally map their goals onto the concepts, terminology, and choices offered by the dialog, whether or not those choices match what the user actually wants to do. Moreover, this has to be done even though in most cases the user has no way of anticipating the choices that will be offered in upcoming menus. I recently encountered this problem when I called a DTMF telephone application with the goal of canceling an appointment. I was offered seven options (some with up to four suboptions), none of which referred to canceling or rescheduling an appointment. I realized that the menu design suffered from the common problem of failing to include choices that might be considered negative. I hung up, called back, and selected "scheduling an appointment." This, as it turned out, was the correct option for canceling an appointment. So I had to map my goal, "canceling an appointment," onto the application's option, "scheduling an appointment." It would have been much easier if I had just been able to say "I need to cancel my appointment" at the outset.
In contrast, Conversation B avoids these problems by enabling users to state their requests in their own words. Conversation B is an example of a natural language processing application, because users are able to state their requests in natural, unconstrained, language, without having to conform to an application's structured menus.
How Callers Interact
Users exhibit a variety of reactions when they interact with natural language systems, particularly when they encounter an open-ended prompt for the first time. In one study performed by West, about 10 to 20 percent of callers at first zeroed out or didn't respond at all to the open-ended prompt. In this study, a second, follow-up prompt that included an example of how to speak to the system was able to reduce nonresponses to one to three percent. On the other end of the spectrum, about 10 percent of users responded with lengthy, sometimes rambling utterances. These responses often contain multiple requests, or utterances that seem unrelated to any possible user goal in that application. In many cases these rambling responses can actually be handled by the application. In any case, they demonstrate that in a significant percentage of cases users feel free to express themselves in very open-ended language.
Here are some examples of utterances collected by the vendors I talked with. All of them were spoken in response to open-ended prompts, such as "How may I help you?"
Spoken to Intervoice applications:
1. "I need to get a payoff and also the residual value of my car."
2. "I need to, well, you know, the lease is up and I need to make arrangements to return the car."
Spoken to another vendor's systems:
1. "My dial tone has that stutter. And, yet, when I called the message center I have no messages, but every time I pick up the phone to make a call I hear the stuttering. When I call the message center it says you have no messages. I don't know how to get rid of that stutter, temporarily. I like that I have it."
2. "I wanna take the block off a my telephone. My daughter can't leave an answering… leave her, her message. Somebody put a block on my telephone the last couple of days."
3. "Repair SBC was out here today at this community, at the mobile home park, and they were rerunning some wires and now mine doesn't work [says phone number] and I need somebody to come out here and take care of it a s a p. I am the manager of this park."
Spoken to a Nortel System:
1. "I just eliminated a swimming pool and I wanted to talk to someone about it."
2. "I have half power and no power."
3. "I've been on in and out of the hospital and I know I'm late on it and I'm… I'm… I'm wondering, I'm out of the hospital now and they finally took my cast off, but I still can't work and I can't walk and I'm wondering…."
4. "Oh, I'm calling now to report that there's no light here in this little corner of 2474 South 11th Street on this side of the Cowah and there's no, no, no power, no light, or nothing and all the homes are in the same position."
5. "I need to build a nuclear plant."
What Users Say
West reports that in its usability testing, users preferred speaking naturally to using DTMF. Users seem most pleased with the efficiency of natural language systems, and interestingly, user comments focus almost exclusively on the overall positive experience of using the system rather than on the specific contribution of the natural language processing technology itself. In other words, they're focused on achieving their goals with the system, not on how much using the system is like interacting with a person. The most commonly reported negative experience has to do with confusion about what to say in response to the open-ended prompt. The next section describes some suggestions for helping users know what to do.
Enhancing the Experience
In this section we look at when to use natural language systems, how to get users to know what they can say, and suggestions for handling uncertain results.
The first question about the user experience in a natural language application is whether it's the right choice for a user interface in a particular application. Sometimes directed dialog is the better user interface choice. Most of the vendors I talked to agreed that when there are only a few possible destinations, directed dialogs are preferable because it's easy to tell users what the possibilities are. When there are 15 or more possibilities, it makes more sense to use an open-ended natural language system*.
Manish Sharma of Nortel found that in "lopsided" applications, where most calls fall into one of only a few categories but the total number of categories is large, it works well to start with a directed dialog that handles the top few categories directly. If the user selects the "other" category, Nortel found that a natural language system could then assign the request to one of the more numerous but less frequent remaining categories. He notes that this finding was based on an analysis of thousands of calls, not just developers' intuitions, and he highly recommends that answers to these design questions be based on analysis of actual caller behavior.
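This "lopsided" strategy can be sketched as a two-stage router: a directed menu first, with a natural language classifier reserved for the "other" branch. The category names and the keyword-based classifier below are illustrative assumptions for a utility-style application, not Nortel's actual implementation (a deployed system would use a trained statistical classifier rather than keywords).

```python
# Hypothetical sketch: directed dialog for the few high-volume categories,
# natural language classification only for calls routed to "other".

TOP_MENU = {"1": "billing", "2": "outage", "3": "new_service", "4": "other"}

# Stand-in for a natural language router over the long tail of categories.
# Keywords and category names are invented for illustration.
LONG_TAIL_KEYWORDS = {
    "meter": "meter_reading",
    "tree": "tree_trimming",
    "pool": "load_change",     # e.g. "I just eliminated a swimming pool"
    "deposit": "deposit_refund",
}

def classify_long_tail(utterance: str) -> str:
    """Assign an open-ended utterance to one of the remaining categories."""
    text = utterance.lower()
    for keyword, category in LONG_TAIL_KEYWORDS.items():
        if keyword in text:
            return category
    return "agent"  # fall back to a human when no category matches

def route_call(menu_choice: str, utterance: str = "") -> str:
    """Directed dialog first; natural language only for the 'other' branch."""
    category = TOP_MENU.get(menu_choice, "agent")
    if category == "other":
        return classify_long_tail(utterance)
    return category
```

The design point is that the expensive, open-ended recognition step is only reached by the minority of calls that the cheap directed menu cannot handle.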
Jerry Carter of Nuance cautions, however, that combining natural language with directed dialog must be done with care. He states that once callers have learned they can give natural language responses, they are 20 to 30 percent more likely to respond in natural language to subsequent directed dialog prompts.
Under some circumstances, natural language routing can outperform agents. Chris Nichols at Intervoice describes a deployment at a large consumer products company with a complex set of products, where 45 percent of calls routed by agents were misrouted. The natural language system, which was implemented by Intervoice to perform this task, correctly routed 93 percent of calls after tuning. The relatively poor performance of the agents can be ascribed to high agent turnover as well as the complexity of the set of destinations. Since misroutes clearly interfere with the efficient achievement of the users' goals, we can surmise that the automated system probably resulted in better user satisfaction than the agents in this case.
Knowing What to Say
Although most people respond to open-ended prompts with an utterance that can be successfully processed, the biggest user interface issue that developers report is confusion about how to respond to an open-ended prompt. This may be because users are familiar with the directed dialogs they've encountered in previous experiences with applications.
This is the case not only with IVR applications but also with agents, who often follow a very directed script. West and Intervoice cited giving callers examples as one way to help them express their requests. However, giving examples doesn't necessarily work well in every case. West had success giving examples to users in troubleshooting applications, but has not found it necessary with all call-routing applications. West also had success giving users examples for just the first one or two times they call, and then omitting the examples as users become accustomed to the interface. Intervoice likewise reports that tracking the number of times a user has called and changing the system's behavior for repeat callers is effective. Finally, Walter Rolandi found that users provide more specific responses to an opening prompt such as "Tell me what you would like to do" than to "How may I help you?" because it (1) informs users that they can speak, (2) makes users think of their requests in terms of doing something, and (3) increases the probability that the response will take the form of "I want/would like/need to (do something)."
Handling Uncertain Results
Handling uncertain results is important in all speech systems, whether based on natural language or directed dialog.
Several techniques were recommended for handling uncertain results in natural language systems:
1. Reprompt with an example
2. Switch to a directed dialog
3. Transfer directly to an agent
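The three techniques above can be sketched as a confidence-threshold policy driven by the recognizer's score for its best routing hypothesis. The thresholds, the attempt limit, and the data structure below are illustrative assumptions, not values from any deployed system.

```python
# Minimal sketch of a fallback policy for uncertain recognition results.
# All thresholds are hypothetical; real systems tune them from call data.

from dataclasses import dataclass

@dataclass
class RoutingResult:
    destination: str
    confidence: float  # 0.0 - 1.0, as reported by the recognizer

HIGH_CONFIDENCE = 0.80  # accept the route outright
LOW_CONFIDENCE = 0.40   # below this, retrying is unlikely to help
MAX_ATTEMPTS = 2        # after two unclear turns, stop reprompting

def next_action(result: RoutingResult, attempt: int) -> str:
    """Choose among: route the call, reprompt with an example,
    switch to a directed dialog, or transfer directly to an agent."""
    if result.confidence >= HIGH_CONFIDENCE:
        return f"route:{result.destination}"
    if result.confidence < LOW_CONFIDENCE:
        return "transfer_to_agent"          # technique 3
    if attempt < MAX_ATTEMPTS:
        return "reprompt_with_example"      # technique 1
    return "switch_to_directed_dialog"      # technique 2
```

One design choice worth noting: very low confidence skips straight to an agent, since repeatedly reprompting a caller the system cannot understand tends to do more damage to the experience than an early transfer.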
We still need more experience with these applications to fully understand the best ways to apply these alternatives in systems.
Advertise Automation
Finally, it's important to make sure that the users know they're talking with an automated system. David Attwater of the Enterprise Integration Group recommends an initial prompt such as "Hello, this is the automatic operator, how can I help you?" He notes, "discovering the deception deeper in the call was detrimental to customer service and brand expectations."
When All Is Said and Done
Are natural language systems natural? Certainly what users can say is more natural than what they can say to a directed dialog system, but interacting with a natural language system isn't like interacting with a person. The real criterion for automated systems is not naturalness, but how well they enable users to accomplish their goals. If systems that are more natural can do this, then they represent a valuable addition to current speech technologies. They don't have to be "just like talking to a person" to be successful. Judging from what users do with these systems and what they say to them, natural language systems are very helpful in getting users to accomplish their tasks quickly and easily.
Deborah Dahl is a consultant in speech and natural language technologies and their application to business solutions, with over 20 years of experience. Dahl is also involved in speech and multimodal standards, serving as the chair of the W3C's Multimodal Interaction Working Group. She is the editor of the recent book "Practical Spoken Dialog Systems."
References
1 W. Rolandi, "What's Natural about Natural Language Processing?," in Speech Technology Magazine, vol. 9, 2004.
2 D. A. Dahl, "Is Natural Language Real?" in Speech Technology Magazine, vol. 9, 2004, pp. 34-36.
3 B. Pollack, "2006: The continued emergence of natural language speech applications," in Speech Technology Magazine, vol. 11, 2006, p. 22.
4 B. Suhm, "Lessons Learned from Deploying Natural Language Call Routing," in Speech Technology Magazine, vol. 9, 2004, pp. 10-12.
5 "Troubleshooting with Speech," in Speech Technology Magazine, vol. 9, 2004, pp. 8-10.
*While fifteen or so is the minimum number of destinations, we don't yet have clear data on the maximum number of destinations—some vendors report success with natural language systems with over 100 possible destinations.