Combining IVR and Smartphones
Customers speak and listen to IVR systems, and they read, type, and tap in smartphone applications. How can these two modes be combined to enhance the user experience? Following are some simple short-term solutions as well as a comprehensive long-term approach.
Short term: Make traditional IVR applications more effective by using displays on smartphones. Visual prompts and messages have advantages over audio prompts and messages. Visual displays containing graphics and animations tend to be faster and easier to comprehend than verbal messages. Users can recognize and react to graphics and text faster than they can listen to and comprehend audio messages.
Graphics and video help users understand how to perform tasks better than textual instructions do. The bandwidth of information flow is greater in multimodal systems than in systems restricted to voice. Visual screens can present more choices than verbal menus, which are usually restricted to three or four options. Displaying the n-best list (the speech recognizer's ranked list of alternative candidates for what the user said) on screen makes correcting misrecognitions easier than presenting the list via audio. Many IVR systems would be improved by using visual displays on smartphones.
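To make the n-best idea concrete, here is a minimal Python sketch of how a smartphone screen might list a recognizer's candidates so the user can tap the correct one. The `Hypothesis` type, the function names, and the confidence scores are hypothetical illustrations, not part of any particular recognizer's API.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """One candidate transcription with a confidence score (assumed 0.0 to 1.0)."""
    text: str
    confidence: float


def rank(hypotheses, max_items=5):
    """Return the top candidates, highest confidence first."""
    ordered = sorted(hypotheses, key=lambda h: h.confidence, reverse=True)
    return ordered[:max_items]


def format_n_best(hypotheses, max_items=5):
    """Build the numbered lines a visual display would show."""
    ranked = rank(hypotheses, max_items)
    return [f"{i}. {h.text}" for i, h in enumerate(ranked, start=1)]


def choose(hypotheses, selection, max_items=5):
    """Return the text the user tapped, given its 1-based position on screen."""
    ranked = rank(hypotheses, max_items)
    if not 1 <= selection <= len(ranked):
        raise ValueError("selection is not on the displayed list")
    return ranked[selection - 1].text
```

A screen can show all of these candidates at once, whereas an audio menu would have to read them out one at a time and wait for the caller to respond.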
Vendors like Genesis and VoicePilot have extended VoiceXML 2.0 to support video. VoiceXML 3.0 supports video and images, but no target dates have been set for implementations.
Long term: Evolve traditional IVR applications into a new type of smartphone application. Smartphones are popular because they are portable, convenient, and easy to use. Currently most apps are driven by the user, who touches or clicks visual options. Though few apps support speech, one class could benefit from it: applications that require the user’s hands and/or eyes to perform tasks other than manipulating the phone. I call these consumer product applications because they help consumers use products. These apps would replace user manuals and tutorials.
Consumer product applications can demonstrate to the product’s owner how to assemble, install, operate, troubleshoot, and repair a product. These apps would let product owners manipulate the product while listening to instructions and navigate the instructions by speaking commands such as “next step.”
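The navigation described above can be sketched as a small state machine. This Python sketch assumes a speech recognizer has already turned the owner's utterance into text. Only "next step" comes from the description above; the "previous step" and "repeat" commands, the class name, and the response wording are hypothetical.

```python
class InstructionNavigator:
    """Walks through a product's instructions in response to spoken commands."""

    def __init__(self, steps):
        if not steps:
            raise ValueError("at least one step is required")
        self.steps = list(steps)
        self.index = 0

    def current(self):
        """Text to speak (or display) for the current step."""
        return f"Step {self.index + 1} of {len(self.steps)}: {self.steps[self.index]}"

    def handle(self, utterance):
        """Apply one recognized command and return the response to speak."""
        # Normalize case and spacing so "Next  step" matches "next step".
        command = " ".join(utterance.lower().split())
        if command == "next step":
            if self.index == len(self.steps) - 1:
                return "That was the last step."
            self.index += 1
        elif command == "previous step":
            if self.index == 0:
                return "This is the first step."
            self.index -= 1
        elif command != "repeat":
            return "Sorry, I did not understand. Say next step, previous step, or repeat."
        return self.current()
```

Because the owner drives the pace by voice, both hands stay on the product rather than on the phone.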
Consumer product applications, which are always available when the product owner needs them and wherever the owner might be, obviate searching for missing user manuals. Consumer product applications are kept updated by downloading revised apps or connecting to the consumer app in the cloud, which gets updated by the product vendor.
In turn, product owners would be able to download up-to-date consumer product applications from application stores in the cloud and would need to contact the vendor’s help desk only when they face difficult problems.
Implementation issue: While IVR apps are “system-directed” (the system prompts and the caller responds), smartphone apps are “user-directed” (the user chooses what to do next). Thus companies should maintain system-directed IVR applications for traditional phones and cell phones and create user-directed applications for smartphones.
The Apple iPhone, Google Android, Microsoft Windows Phone, and phone Web browsers all require different implementation techniques, meaning each necessitates a unique consumer product application. Several companies provide cross-platform development tools that permit a developer to write a single generic application and transfer it to different smartphones. However, none of these cross-platform development tools currently supports voice technologies.
Recommendations: Several players are needed to create infrastructure for developing consumer product apps:
• IVR vendors should implement VoiceXML 3.0 as soon as the spec becomes stable; VoiceXML will support the implementation of consumer product applications.
• Cross-platform vendors should extend their tools to support speech technology so that applications built with those tools can be used by people whose hands and/or eyes are busy.
• Smartphone browser vendors should support the evolving W3C HTML-speech specs when they become stable.
What can consumer product developers do now? Begin by building and deploying a consumer product application for a single popular smartphone platform. That would give you experience in building consumer product applications and a chance to be first to market with applications that support your product.
James A. Larson, Ph.D., who offers VoiceXML training courses, is a former chair for the W3C Voice Browser Working Group and the program chair for the SpeechTEK and SpeechTEK Europe conferences. Email jim@larson-tech.com.