
Can VUI and GUI Survive an Interface Marriage?


Overcoming Interface Obstacles

Many believe consumers will welcome the pairing of the two interfaces. "I see [VUI and GUI] coming together gradually more from the perspective of mobile customer-facing apps because customers want more self-service; they want more options," says Ahmed Bouzid, CEO of ZOWi and former senior director of product and strategy at Angel.

However, pairing the two interfaces requires some careful consideration.

Greg Pal, vice president of marketing, strategy, and business at Nuance Communications' enterprise division, points out that, historically, GUI designers have been responsible for Web sites with a lot of screen real estate and now have to get used to shrinking down functionality for mobile devices. "From a voice user interface perspective, ironically, you're actually moving in the opposite direction. You're going from an input modality where there's no visual support whatsoever. It's entirely auditory."

"The challenge is that you're trying to bring together totally different ways for the customer to interact," says David Lloyd, president and CEO of IntelliResponse, a provider of virtual agent technology solutions for the enterprise. "It forces the perspective about creating a compelling customer experience. There are two approaches that people seem to be taking, instead of a holistic one."

Experts suggest that, above all, the customer needs to come first, and multimodal design should be simple and intuitive, no matter who designs it.

According to Susan Hura, principal at SpeechUsability, both VUI and GUI designers need to think more abstractly about design, starting from the user's perspective. Designers typically begin with the specific tasks the automation is meant to help users accomplish: visual designers may sketch screen shots, while VUI designers do something similar, creating what she calls "flow diagrams."

"Rather than jumping straight in at that level, it would be helpful to abstract the design and think of things like, how would a user check [her] account balance or make a flight reservation? If you can think about what a user's goals are, you can think about what information is needed from the user and what information the system needs to report back to [her]. This will allow you to make smarter decisions [and say] 'This is the information that we want to speak to the caller' versus 'This is the information that we want to display visually,'" Hura says.

Patrick Nguyen, chief technology officer at [24]7, suggests that with multimodal capabilities, consumers are becoming more empowered than ever. "We're moving into a world where customers have more control, they tend to switch devices, and they can use multiple channels to get things done," he says.

"In that environment, the enterprise needs to be much more responsive to the way consumers want to interact with their brands."

To do so, however, organizations must create what many vendors are calling omnichannel environments, which let users switch easily among the phone, the Web, social media, and other channels.
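
One way to picture an omnichannel environment is as conversation state keyed to the customer rather than to the channel, so a session begun over the phone can be resumed on the Web. Below is a minimal sketch, assuming an in-memory store; all class and field names are hypothetical, and a real deployment would use shared storage visible to every channel.

```python
# Minimal sketch of channel-independent session state.
class OmnichannelSessions:
    def __init__(self):
        self._sessions = {}  # customer_id -> conversation state

    def update(self, customer_id: str, channel: str, step: str):
        state = self._sessions.setdefault(customer_id, {"history": []})
        state["history"].append((channel, step))
        state["last_step"] = step

    def resume(self, customer_id: str) -> str:
        state = self._sessions.get(customer_id)
        return state["last_step"] if state else "start"

sessions = OmnichannelSessions()
sessions.update("cust-42", channel="phone", step="entered claim number")
# The customer hangs up and opens the Web site; the state follows them.
print(sessions.resume("cust-42"))  # -> "entered claim number"
```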

Early Progress

The often heralded voice assistant Siri is a great example of a VUI, but it is not truly multimodal, explains Deborah Dahl, principal at Conversational Technologies and chair of the World Wide Web Consortium's Multimodal Interaction Working Group. "With Siri, you're just talking to it and it's showing you things, but you can't really interact with it," Dahl says. "When you ask Siri a question it shows you something and you touch your screen, but Siri doesn't know what you've touched. It's like saying, 'I gave you your information and I'm out of the picture now.'"

Thanks to its acquisition of Angel.com earlier this year, Genesys Telecommunications Laboratories has continued developing the personal assistant Lexee, which Genesys's Pelland sees as a good example of integrating GUI and VUI.

"We're at the early stages of people realizing that you can build VUI and GUI using the same development platform," Pelland says. "A lot of times today, when you're doing a voice application, you use one development environment, and if you're trying to do a handset app, you're using a different environment. Tying your voice and data to your graphical interface is a real pain in the neck because you've built two different environments. The key thing is [building] a simple application that has voice and graphical on the same platform, which is what we've done with Lexee."

Dahl singles out AT&T's Speak4it (a speech and gesture recognition search app for the iPhone, iPad, and iPod touch) as a true multimodal app. "You can draw circles on the screen and say 'Show me the restaurants around here,' so it really knows what you did on the screen and you can interact with it," Dahl says.
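
The difference Dahl describes, the system actually knowing what the user did on the screen, comes down to fusing the gesture event with the spoken utterance into a single interpretation. The sketch below is a hypothetical illustration of that kind of fusion, not Speak4it's implementation:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Gesture:
    kind: str        # e.g., "circle" drawn on the map
    lat: float
    lon: float
    radius_km: float

def interpret(utterance: str, gesture: Optional[Gesture]) -> dict:
    # Fused interpretation: the deictic "around here" is resolved
    # against the region the user just circled on the screen.
    query = {"intent": "find_restaurants"}
    if "around here" in utterance.lower() and gesture and gesture.kind == "circle":
        query["area"] = {"lat": gesture.lat, "lon": gesture.lon,
                         "radius_km": gesture.radius_km}
    return query

circle = Gesture(kind="circle", lat=40.75, lon=-73.99, radius_km=1.0)
print(interpret("Show me the restaurants around here", circle))
```

In a Siri-style one-way design, the gesture never reaches the interpreter; fusing both inputs is what makes the interaction genuinely multimodal.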

Another true multimodal example is from OpenStream, which focuses on context-aware mobile app development for the enterprise.

Dahl maintains that OpenStream's technology is well suited to hands-busy voice use, particularly for field service workers. The interface of the company's Mobile Force Automation (MoFA) solution combines speech and stylus input. Dahl gives the example of a car insurance adjuster using MoFA while inspecting a car that has been in an accident: from that remote location, the adjuster could record information and send it back to the office or share it with other employees.

Nuance points to Nina, its virtual assistant for mobile customer service apps, as a good example of multimodality. Nina Mobile, used primarily by telecoms and financial institutions, enables companies to add speech-based virtual assistant capabilities to existing Apple iOS and Google Android mobile apps. It combines Nuance's cloud-hosted speech recognition, text-to-speech, voice biometrics, and natural language understanding, so it understands both what is being said and who is saying it.

"With our Nina mobile product, one of the key design parameters that we've had from the very beginning is that users should have the flexibility to interact with the system at any given point in that interaction through whatever modality or method makes the most sense," Pal says. "The systems shouldn't dictate to users that they have to interact exclusively through speech or touch; they should be able to mix and match in whatever combination makes sense, or what their environmental situation is. [For example], are they in a public place or is there a lot of background noise?"
