Detecting Emotion: Prevention Is Better Than Cure
Fascinating Technology
I am not sure how much progress has been made in detecting all possible emotional states in users, but detecting anger can be relatively easy. When people become frustrated (read, "angry"), they often evince some stereotypical verbal behaviors. Angry speakers are likely to exhibit changes in the volume, pitch and pace of their speech. Changes in the length and the latency of their utterances are also likely.
Tell-tale Signs
Volume and pitch are probably the most predictable variables. Research (well corroborated by the common sense experience of "raised voices") has shown that the volume or loudness of speech increases significantly in both males and females when they become angered. Pitch typically increases as well, particularly in females. These changes are not difficult to detect with software, and using these two variables alone can provide a very high accuracy rate in detecting anger in a user's voice.
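As a rough illustration of how software might measure these two variables, the sketch below computes the loudness of a short audio frame as an RMS level in decibels and estimates its pitch with a simple autocorrelation search, then compares a current frame against a calm baseline. The function names, the 6 dB and 15 percent thresholds, and the synthetic test tones are all assumptions made for the example; they do not come from any particular product or study.

```python
import math

def rms_db(samples):
    """Loudness of a frame as RMS level in dB relative to full scale (1.0)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(max(mean_square, 1e-12))

def estimate_pitch_hz(samples, sample_rate, min_hz=75, max_hz=500):
    """Crude pitch estimate: the lag with the highest autocorrelation."""
    min_lag = sample_rate // max_hz
    max_lag = sample_rate // min_hz
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

def louder_and_higher(baseline, current, sample_rate,
                      db_rise=6.0, pitch_ratio=1.15):
    """True when the current frame is both louder and higher than baseline.

    db_rise and pitch_ratio are hypothetical thresholds chosen for
    illustration only.
    """
    db_change = rms_db(current) - rms_db(baseline)
    ratio = (estimate_pitch_hz(current, sample_rate)
             / estimate_pitch_hz(baseline, sample_rate))
    return db_change >= db_rise and ratio >= pitch_ratio

def tone(freq_hz, amplitude, sample_rate=8000, n=800):
    """Synthetic sine tone standing in for a frame of voiced speech."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]

calm = tone(200, 0.1)     # quieter, lower "voice"
raised = tone(250, 0.3)   # louder, higher "voice"
print(louder_and_higher(calm, raised, 8000))
```

Real speech is far messier than a sine tone, so a production system would need voicing detection, smoothing over many frames and a per-speaker baseline. The point is only that both measurements are computationally simple.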
The other speech variables can provide corroborative evidence of anger. For example, a wide variation in the pace or rate of speech often accompanies anger. This can take either extreme form. The speaker's utterances might become extremely rapid ("snapped") or extremely slow and deliberate:
Normal: I want to make a transfer.
"Snapped": I-said-I-want-to-make-a-transfer!
Exaggeratedly Slow: I----said----I----want----to----make----a----trans----fer!
Similar extremes can be seen in utterance length. The utterance can become much shorter or much longer:
Normal: I want to make a transfer.
Shortened: Transfer!
Elongated: I've told you twice but I'll tell you again! I want to make a transfer! A transfer!
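Because both pace and length can swing to either extreme, a check for them has to be two-sided. The sketch below flags an utterance whose speaking rate or word count is at least a chosen factor above or below a baseline utterance. The factor of 2 and the utterance durations are assumptions invented for the example; the texts are the ones shown above.

```python
def words_per_second(text, duration_s):
    """Speaking rate from a transcript and its duration in seconds."""
    return len(text.split()) / duration_s

def is_extreme(value, baseline, factor=2.0):
    """True when value is at least `factor` times above or below baseline."""
    return value >= baseline * factor or value <= baseline / factor

def pace_and_length_cues(text, duration_s, baseline_text, baseline_duration_s):
    """Report whether pace and length deviate sharply in either direction."""
    pace_flag = is_extreme(
        words_per_second(text, duration_s),
        words_per_second(baseline_text, baseline_duration_s),
    )
    length_flag = is_extreme(len(text.split()), len(baseline_text.split()))
    return {"pace": pace_flag, "length": length_flag}

# Baseline utterance; the 2.0 second duration is an assumed value.
normal = ("I want to make a transfer.", 2.0)

snapped = pace_and_length_cues("I said I want to make a transfer!", 1.0, *normal)
slow = pace_and_length_cues("I said I want to make a transfer!", 8.0, *normal)
shortened = pace_and_length_cues("Transfer!", 0.5, *normal)
elongated = pace_and_length_cues(
    "I've told you twice but I'll tell you again! "
    "I want to make a transfer! A transfer!",
    6.0, *normal)

for name, cues in [("snapped", snapped), ("slow", slow),
                   ("shortened", shortened), ("elongated", elongated)]:
    print(name, cues)
```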
Finally, a dramatic drop in response latency, the pause between the end of a prompt and the start of the user's reply, can also signal that the user is angry:
System: What would you like to do?
User: I want to make a transfer.
System: Sorry, I didn't get that. What would you like to do?
User: I want to make a transfer.
System: Sorry, I didn't quite get that ee… (user barges in)
User: I wanna make a transfer!
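The exchange above can be expressed in terms of timestamps. In the sketch below, latency is the time from the end of a prompt to the start of the user's speech, so a barge-in shows up as a negative value. The timestamps, the function names and the 50 percent drop threshold are hypothetical and chosen only to mirror the three turns of the dialogue.

```python
def response_latency(prompt_end_s, speech_start_s):
    """Seconds from end of prompt to start of user speech.

    A negative result means the user barged in before the prompt finished.
    """
    return speech_start_s - prompt_end_s

def latency_dropped(latencies, drop_fraction=0.5):
    """True when the latest turn is a barge-in, or when its latency falls
    below drop_fraction of the average latency of the earlier turns."""
    *earlier, latest = latencies
    if latest < 0:
        return True
    if not earlier:
        return False
    baseline = sum(earlier) / len(earlier)
    return latest < baseline * drop_fraction

# (scheduled prompt end, user speech start) in seconds for three turns.
# In the third turn the user starts speaking before the prompt would end.
turns = [(4.0, 5.2), (9.0, 10.0), (14.0, 12.5)]
latencies = [response_latency(end, start) for end, start in turns]

print(latency_dropped(latencies[:2]))  # after the second turn
print(latency_dropped(latencies))      # after the barge-in
```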
Note that a single utterance from an angry user might exhibit all of these characteristics at once. If, as above, the user is annoyed by repeated recognition failures, he might simply interrupt the recognition failure prompt by "raising his voice" and shouting, "Transfer!" So timed, this one-word response would demonstrate increased volume and pitch while concomitantly demonstrating decreased length, pace and latency.
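Since one utterance can show several cues at once, a detector could simply count how many are present. The sketch below is one hypothetical way to do that: it requires at least one of the two most predictable cues (volume or pitch) plus a minimum total. The class, the field names and the minimum of three cues are assumptions for illustration, not a validated rule.

```python
from dataclasses import dataclass

@dataclass
class UtteranceCues:
    """Per-utterance flags; each would come from its own measurement."""
    louder: bool
    higher_pitch: bool
    extreme_pace: bool
    extreme_length: bool
    low_latency: bool

def anger_cue_count(cues):
    """Number of anger cues present in the utterance."""
    return sum([cues.louder, cues.higher_pitch, cues.extreme_pace,
                cues.extreme_length, cues.low_latency])

def likely_angry(cues, min_cues=3):
    """Hypothetical rule: volume or pitch must fire, and at least
    min_cues cues must be present overall."""
    return (cues.louder or cues.higher_pitch) and anger_cue_count(cues) >= min_cues

# The shouted, barged-in "Transfer!" would set every flag.
shouted_transfer = UtteranceCues(True, True, True, True, True)
calm_request = UtteranceCues(False, False, False, False, False)

print(anger_cue_count(shouted_transfer), likely_angry(shouted_transfer))
print(anger_cue_count(calm_request), likely_angry(calm_request))
```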
Application Opportunities?
While accurately detecting user anger is clearly possible, the value of doing so is less clear. To be sure, it would be nice if an application could sense the displeasure of its user and somehow attenuate it by making socially appropriate conversational gestures. As it turns out, this is not a trivial problem. Being able to say something sufficiently appropriate to disarm someone else's anger is a talent unpossessed by many of us humans and it is unclear to me how the ability might be simulated in machines.
In any event, the value of a "charming and disarming" application would still be questionable. By the time the system detects its user's anger, the damage, it would seem, has already been done. Knowing that a user is angry implies that there is something wrong with the VUI's design or function. Wouldn't it be better to simply design systems that do not anger their users?
Anger Management
What is undeniably clear is that user anger and speech recognition do not mix. Speech recognition accuracy varies inversely with the changes that often occur in an angry user's speech. The angrier the user, the less accurate speech recognition becomes.
To date, anger detection algorithms have mostly been employed in call center systems. The software monitors conversations between customers and customer service representatives for signs of heated emotions. When anger is detected, the call may be brought to the attention of a manager or otherwise elevated in importance.
Similarly deployed, the technology could be very valuable in the IVR world. Detecting anger in the user might trigger a transfer to a "charming and disarming" customer service representative. On the other hand, such a practice, in the long run, could reinforce emotional outbursts in users if the transfer functioned as a reward.
In the meantime, reacting to user anger should be less important than preventing it.
Walter Rolandi is the founder and owner of The Voice User Interface Company in Columbia, S.C. Rolandi provides consultative services in the design, development and evaluation of telephony-based voice user interfaces (VUI) and evaluates ASR, TTS and conversational technologies. He can be reached at wrolandi@wrolandi.com.