Best Practices for Bringing Voice Assistants to Mobile Apps

Voice assistants have become an important tool for most companies, not only in call centers but also in mobile apps. In fact, for consumers today who are glued to their smartphones virtually 24/7, the expectation that they can interact with their preferred companies via voice is high. Companies that can meet these expectations in a user-friendly way will have an edge.

Voice assistants are being used for both input and output. As an input, voice commands can be used to navigate apps, search for information, dictate messages, and perform specific actions within the app. From an output standpoint, apps can provide audio responses, read information aloud, and offer voice-based alerts and notifications.

Big companies have been quick to adopt voice assistants to offer consumers ease and convenience when interacting with them via mobile apps. Some of the best-known and most widely used include the following:

  • Google Assistant. Powered by artificial intelligence, Google Assistant lets users interact with their smartphones (both Android and iOS) to send text messages, check the weather, get driving directions, and more. It even integrates with other apps and services so that users can do things like order food, schedule ride shares, or search for local businesses or other types of information.
  • Starbucks unveiled its mobile voice assistant back in 2017, allowing customers to order via voice. The app can understand and process the most detailed custom requests—like “a double upside-down macchiato half-decaf with room and a splash of cream in a grande cup.” It also lets customers process payments and provides a pickup time.
  • Bank of America’s Erica app lets customers access and manage their accounts, research investment opportunities, receive notifications of charges and bill reminders, and more. It’s considered one of the most innovative applications.

Voice interactions like these aren’t new. After all, look at the top two tools that made voice interaction a household experience: Apple introduced Siri in the iPhone in 2011, and Amazon introduced Alexa in its Echo devices in 2014.

What is new, though, is how companies and consumers think about voice interactions. And even early adopters of voice interactions in mobile apps are continuing to evolve and improve.

The Evolution of Voice in Apps

“If you go back more than a decade to the launch of Siri and Alexa, voice got really, really hot for a short period of time,” says Derrick Johnson, CEO of Encounter AI, an AI startup for drive-through restaurant ordering.

Back then, “everyone was talking about voice, and companies started building Alexa skills,” says Johnson, a former voice engineer at Roku. But it’s a trend that really didn’t take off, he adds. “It didn’t really do what they thought it was going to do for their brands, for efficiency, or for consumer engagement. It just didn’t pan out.”

Voice technology has gone through a few generations, Johnson continues. First-generation apps focused on simple FAQ-style interactions, like asking for the time or the weather. The second generation could handle more complex or compound requests, like asking for information on movie showings and times at the local theater. The current generation of voice interactions features human-level interfaces with faster response times and more natural discourse.

Today, Johnson says, a breakthrough is happening where the technology is getting much better in terms of transcribing and interpreting what’s being said, and “people are starting to realize that the voice-to-voice experience is actually quite flawed.”

Voice offers big benefits in terms of access, Johnson notes, “because we can speak three to five times as fast as we can type.” It’s a more efficient way of interacting. But it’s not voice-to-voice interactions consumers are really looking for.

Apps like Alexa and Siri, while popular, don’t really provide the rich and natural kind of experience that consumers would like—especially from the companies with which they interact most often. And voice-to-voice has limitations that can be clunky and frustrating.

Johnson says: “One of my favorite quotes is from the head of user experience for U.S. Bank, who said: ‘I’ve got 300 functions in my app. There’s no way I can design an interface on a mobile device that allows users to easily find those 300 functions. What I need is a voice interface where the user can just say “reorder my checks” or “transfer money to savings.”’”

The big breakthrough in thinking, Johnson points out, is that “voice doesn’t mean building an Alexa skill or something on top of Siri. What voice means is actually building a multimodal interface for my mobile application.”

That’s where individual brand apps come into play, allowing companies to better engage and interact with consumers. Doing this effectively requires thoughtful consideration of how voice, text, images, and video can be incorporated into the experience.

Implications of Multimodal Design

When it comes to effective voice integration, multimodal is really where it’s at, says Tobias Dengel, president of WillowTree, a TELUS International company, and author of The Sound of the Future: The Coming Age of Voice Technology. “Integrate voice capabilities within existing mobile apps rather than creating stand-alone, voice-only interfaces,” he says. For instance, allow users to speak commands, but show responses on the screen for faster comprehension.

Dengel says that the multimodal approach is the real breakthrough in voice technology because it addresses the limitations of earlier voice-to-voice implementations and provides a more practical and efficient user experience.

Johnson likewise recommends building voice into existing mobile apps rather than creating stand-alone, voice-only interfaces. He also recommends forming cross-functional teams that can design end-to-end customer experiences incorporating voice.

Until recently, two major barriers have hindered the integration of voice capabilities into apps, according to Dengel. The first was technology limitations, and the second was a lack of multimodal design vision or capabilities.

“Over the last 24 months, the technology needed for voice capabilities has made huge strides, especially with local on-device transcription and large language models,” Dengel says. “Historically, relying on cloud-based transcription resulted in slower processing and higher error rates. However, on-device processing supported by LLMs now offers near-real-time accuracy. Additionally, LLMs excel at interpreting human intent, addressing a long-standing challenge in voice technology implementation.”

Breakthroughs with design are also on the horizon, Dengel says. He points to Apple as a design innovator. Apple’s announcement of Apple Intelligence, he says, “signifies they believe the technology is ready to make multimodal voice interaction part of its operating system (OS) that will be available to developers later this year. This will unleash voice-powered apps, and we believe Android will work quickly to bring these advancements to that platform as well.”

The key for organizations considering creating or updating voice assistants in their mobile apps is to focus on the user experience (UX), understanding the ways that voice can either enhance or detract from the experience.

Crafting the Use Case

Companies interested in integrating voice technology should begin designing multimodal experiences for the most straightforward use cases, Dengel says. These should go beyond the basics to consider the kind of detailed searches customers might be conducting. For instance: “Give me one-way flight options from D.C. to Philly on November 12,” or “I need to order a new light bulb for my GE 350 CFM stove hood.”

How do your customers currently interact with your organization through traditional channels like the call center or newer ones like social media? What are the commonly asked questions, or most common searches? Dengel recommends going through call logs to determine why people are calling and why they access customer service most often.

Look for scenarios where voice input is significantly better than touch interfaces, he says. He recommends targeting common use cases like the following:

  • Frequent actions for which users want shortcuts, like ordering coffee the same way every morning.
  • Rare actions that are hard to find in app menus, like reordering checks.
  • Tasks requiring extensive data input, like booking flights based on specific criteria.

Keep in mind, Dengel points out, that customers aren’t the only potential users of voice technology. The employee audience also offers opportunities for streamlining processes and improving service through voice-assisted apps designed specifically for their needs.

Companies also need to think carefully about both user needs and potential concerns, offering options that address both. For instance, Dengel points out, be mindful of privacy concerns. Some users might prefer typing over voice for sensitive information.

“There are some things that you’d prefer to type vs. talk out loud,” Dengel says. This might be the case when interacting with organizations like healthcare providers or when providing credit card or Social Security numbers. The data for existing customers should already be on hand, but new customers would need to provide that information. Consider the varied needs of different audiences and offer options that meet them.

From a technical implementation standpoint, Johnson notes that a handful of providers, his company included, provide modular JavaScript software development kits that can be plugged into existing mobile applications. “You then need to have what’s called a voice interaction layer that’s normally in the form of an SDK, where someone can integrate it and then it gives you that full duplex conversational capability.”

Most companies, Johnson adds, have already built APIs that serve their customers directly and expose a lot of functionality.
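
As a rough illustration of the voice interaction layer Johnson describes, the Python sketch below routes a transcribed utterance to functions standing in for a company’s existing APIs and returns something to show on screen. Every name in it (ScreenResponse, reorder_checks, transfer_to_savings, route) is hypothetical, and the simple phrase matching is only a placeholder for whatever intent model a real SDK would supply.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class ScreenResponse:
    """What the app would render on screen after a spoken command."""
    title: str
    body: str


# Hypothetical stand-ins for APIs the company has already built.
def reorder_checks() -> ScreenResponse:
    return ScreenResponse("Checks reordered", "Your checks will ship soon.")


def transfer_to_savings() -> ScreenResponse:
    return ScreenResponse("Transfer", "How much would you like to move to savings?")


# Phrase -> existing API call. Phrase matching is an assumption made for
# brevity; a production layer would use a proper intent model instead.
INTENTS: Dict[str, Callable[[], ScreenResponse]] = {
    "reorder my checks": reorder_checks,
    "transfer money to savings": transfer_to_savings,
}


def route(transcript: str) -> Optional[ScreenResponse]:
    """Return the screen response for a transcribed utterance, or None."""
    text = transcript.lower().strip()
    for phrase, handler in INTENTS.items():
        if phrase in text:
            return handler()
    return None  # no match: the app can fall back to its normal menus
```

Calling route("Please reorder my checks") would hand back the check-reorder screen, while an unrecognized request returns None so the touch interface stays available.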

Company apps are built on existing data that companies likely have on their websites, in their content management systems, and in other systems. Companies likely already have much of the needed functionality in their apps, and voice interfaces will need to tie into existing APIs. But careful data preparation that keeps information accurate and up to date is critical to ensuring accurate interactions and responses, both Dengel and Johnson say.

Then it’s important to do the following:

  • Implement robust data validation and quality control processes.
  • Ensure that data is clean and scrubbed to contain only the most relevant and current information.
  • Remove outdated procedures or policies, because machine learning engines need to rely on only the most recent versions.
  • Continually monitor and evaluate the performance of voice assistants to help identify and address issues.
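
The checklist above could be sketched along these lines in Python. This is a minimal example, not a description of any vendor’s pipeline: the PolicyDoc record and the latest_valid function are hypothetical, and the only rules applied are dropping empty records and keeping the most recent version of each policy.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class PolicyDoc:
    """Hypothetical record for one version of a policy or procedure."""
    policy_id: str
    updated: date
    text: str


def latest_valid(docs: Iterable[PolicyDoc]) -> List[PolicyDoc]:
    """Keep only non-empty documents, one per policy, newest version wins."""
    newest: Dict[str, PolicyDoc] = {}
    for doc in docs:
        if not doc.text.strip():
            continue  # validation step: skip records with no usable text
        current = newest.get(doc.policy_id)
        if current is None or doc.updated > current.updated:
            newest[doc.policy_id] = doc  # replace the outdated version
    return sorted(newest.values(), key=lambda d: d.policy_id)
```

Running a step like this before content reaches the assistant means an old return policy, for example, cannot be read back to a customer once a newer version exists.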

Cross-functional teams of designers, data specialists, and developers, whether internal or external, will be an important part of this process. When working with external providers, Dengel stresses the importance of choosing a partner that has done it before and of having a clear understanding of how you want customers to perceive your brand and the outcomes you want to achieve. Make sure you have good data, you’re choosing the right problems, and your voice assistant fits both your customers or users and how you want your brand to be perceived.

Finally, have metrics and measurements in place to monitor success and drive continuous improvement. Customer engagement with the voice assistant and customer satisfaction will be two important considerations. In addition, Johnson suggests, since efficiency is an important reason for implementing voice assistance, request processing time is a key metric. He tells of a bank mobile app in which it took 10 to 11 minutes to open an account; the bank’s goal using voice and a reimagined user flow was to get that down to 2.5 minutes. That both provides a better user experience and allows the organization to process more requests. The length of time required to complete tasks can also be measured and compared across interfaces, for instance, call center vs. app.
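
To make the bank example concrete, going from 10 or 11 minutes to 2.5 minutes works out to a reduction of roughly 75 to 77 percent. The short Python sketch below shows that arithmetic and one possible way to compare task times across interfaces; the function names and the idea of using the median are assumptions for illustration, not something the experts prescribed.

```python
from statistics import median
from typing import Dict, List


def percent_reduction(before_minutes: float, after_minutes: float) -> float:
    """Percentage by which a task got faster."""
    return (before_minutes - after_minutes) / before_minutes * 100


def median_by_interface(samples: Dict[str, List[float]]) -> Dict[str, float]:
    """Median task time per interface, ignoring interfaces with no samples."""
    return {name: median(times) for name, times in samples.items() if times}


# Figures from the bank example: 10 to 11 minutes down to a 2.5-minute goal.
low = percent_reduction(10, 2.5)
high = percent_reduction(11, 2.5)
print(f"{low:.1f}% to {high:.1f}% faster")
```

The same helper can be pointed at timings collected from the call center and the app to see which interface completes a given task faster.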

Dengel also notes that some of the same metrics companies currently track in terms of customer service would also apply as these transactions are moved to voice.

Experts also advise that, because technology and user needs are continually evolving, it’s important to consider how these shifts might affect functionality in the future.

Future Trends and Opportunities

Right now, Dengel says, users represent “a bathtub curve.”

Users under 15 years of age, he says, are incredibly heavy users. But so are users over 65. That might come as a surprise, but older users “never really had to learn to type the way that the rest of us did in our 20s and 30s,” he explains. “They were in kind of a pre-typing world. So, particularly on mobile devices, they find typing incredibly slow and frustrating. They’re happiest just using their voices.”

Recognize, too, that voice, and even multimodal voice, won’t be adopted by everyone. The investment in the technology and its implementation is an add-on to, rather than a replacement for, the other ways consumers interact with your company.

“These technologies all require an adoption period,” Dengel points out. “For some period of time you need the old technology. You can’t just abandon it.”

Dengel compares the current state of voice technology to the early days of mobile adoption, suggesting that companies that embrace voice interfaces now will gain a significant competitive advantage. He projects that the combination of improved AI technology and multimodal interfaces will lead to widespread adoption and new business opportunities, similar to the impact of the iPhone on mobile computing.

Now is the time to consider how your company could implement or upgrade voice assistant mobile app interfaces to deliver an exceptional user experience that delights and engages customers, prospects, and employees, experts agree.

Linda Pophal is a freelance business journalist and content marketer who writes for various business and trade publications. Pophal does content marketing for Fortune 500 companies, small businesses, and individuals on a wide range of subjects, from human resource management and employee relations to marketing, technology, healthcare industry trends, and more.
