A Tangled Web of Intelligent Assistants
According to Voicebot.ai, as of April, nearly 90 million U.S. adults had smart speakers, and the number is growing fast. Intelligent assistants like Amazon's Alexa, Google Assistant, Microsoft's Cortana, and Apple's Siri are becoming increasingly commonplace. They simplify everyday activities like checking the weather or finding out quick facts, especially when we're on the go. But there is incredible, largely untapped potential for more complex, enterprise-specific applications.
Think about things that you currently turn to the web for—shopping, product support, order tracking. Yes, if you bought something from Amazon, you can locate your order by asking Alexa, but not if you ordered from somewhere else.
Not just businesses but governments and nonprofits could potentially offer intelligent assistant apps. These could give users relevant information like where to renew a driver's license, how to register to vote, and where to go for COVID-19 testing. Applications internal to organizations could offer human resources information, employee directories, and building-closure notices. Intelligent assistants would be a quick, frictionless way to get all of this information. So where are they?
If you’re reading this, I think you know my answer will pertain to standards. Here’s where they can help.
Discovery: In the early days of the web, there were no search engines—if you wanted to see a website, you had to know the exact web address. That meant that you had to know that it existed, and you had to know the kinds of information it provided. This is similar to where we are with intelligent assistants. If you know the application you want, you can ask the assistant, but you can’t expect the assistant to find an application for you based on just your spoken request. And if there’s no application on the platform that meets your needs, you’ll have to try another platform.
Intelligent assistant standards could be very similar to web standards—for example, the Domain Name System (DNS) standards that let web browsers find web servers. When a user makes a request of an assistant, the assistant could pass the request on to another, more specialized assistant that can fulfill it, much as DNS routes a browser to the right server. For example, if you want to order takeout, your assistant could help you find a restaurant and then pass your request on to the restaurant's assistant. That assistant could then help you choose food, schedule delivery, and collect your payment information.
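To make the DNS analogy concrete, here is a minimal sketch of what a discovery registry for assistants might look like. Everything here—the function name, the registry entries, and the example addresses—is a hypothetical illustration, not part of any published standard.

```python
from typing import Optional

# Hypothetical registry mapping request phrases to the network addresses
# of specialized assistants, analogous to DNS mapping names to servers.
ASSISTANT_REGISTRY = {
    "order takeout": "assistant.restaurant.example",
    "track package": "assistant.shipping.example",
    "renew license": "assistant.dmv.example",
}

def find_assistant(request: str) -> Optional[str]:
    """Route a spoken request to a specialized assistant's address,
    the way DNS resolves a domain name to a server."""
    for phrase, address in ASSISTANT_REGISTRY.items():
        if phrase in request.lower():
            return address
    return None  # no registered assistant can handle this request

print(find_assistant("I'd like to order takeout for dinner"))
# -> assistant.restaurant.example
```

A real standard would of course need far more than keyword matching—authentication, capability descriptions, and a shared request format—but the routing idea is the same.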
But discovery isn’t enough. Intelligent assistant applications must be able to work together. Again, we can compare this to the web. Once a relevant website is located, any browser can display it. This is because web pages are written in standard languages, primarily HTML, that every browser knows how to interpret. Some browsers will be superior to others because they are faster, more secure, or have more configuration options, but they can all display standard web pages. So we also need…
Interoperability: An application for intelligent assistants should be able to work with any intelligent assistant, just as web pages work with any browser. Right now, each of the intelligent assistants has its own formats for training data and for processing results. This means that applications accessed from Alexa, Google Assistant, Cortana, and Siri need to be developed independently. If we could agree on standard formats for intelligent assistant data, the same application could run on all of the assistants. Because one implementation would replace four, that could cut development effort by roughly 75 percent, a huge savings.
Interoperability could be accomplished if the companies that provide assistants all adopted the same formats, but getting that kind of consensus might be difficult. There is another way, though: Interoperability could be achieved through third-party middleware that translates between formats, or through developer tools that automatically produce the different assistant formats from a single generic application design.
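The middleware approach can be sketched as a pair of translators that turn one generic application definition into platform-flavored formats. The field names below are illustrative assumptions loosely modeled on public platform schemas; real Alexa and Google interaction models are considerably richer.

```python
# Hypothetical middleware sketch: one generic intent definition,
# translated into two platform-style formats. All field names here
# are simplified illustrations, not the actual platform schemas.

GENERIC_INTENT = {
    "name": "OrderPizza",
    "phrases": ["order a pizza", "get me a pizza"],
}

def to_alexa_style(intent: dict) -> dict:
    """Emit an Alexa-flavored intent (Alexa models call examples 'samples')."""
    return {"name": intent["name"], "samples": intent["phrases"]}

def to_dialogflow_style(intent: dict) -> dict:
    """Emit a Dialogflow-flavored intent ('displayName', 'trainingPhrases')."""
    return {"displayName": intent["name"], "trainingPhrases": intent["phrases"]}

print(to_alexa_style(GENERIC_INTENT))
print(to_dialogflow_style(GENERIC_INTENT))
```

The developer writes the generic definition once; the translators absorb the per-platform differences. That is exactly the saving the paragraph above describes.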
How to find out more? There are ongoing standards activities aimed at achieving a web-like open ecosystem for virtual assistants. Here are two:
The Open Voice Network is “dedicated to making voice assistance worthy of user trust—especially for a future of voice assistance that will be multi-platform, multi-device, multi-modal, and multi-use.”
The Voice Interaction Community Group of the World Wide Web Consortium works on developing an interoperable architecture for virtual assistants. The group has also published a proposal for a standard format for representing the results of processing—JSON Representation of Semantic Information (JROSI).
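To give a flavor of what a standard result format buys you, here is a purely illustrative example of the kind of structure such a format might define. This is not the actual JROSI specification—consult the Community Group's published proposal for the real format; every field name below is an assumption for illustration only.

```python
# Illustrative only: the sort of semantic-processing result a standard
# format might represent. NOT the actual JROSI specification.
semantic_result = {
    "utterance": "order a large pepperoni pizza",   # what the user said
    "intent": "OrderPizza",                          # recognized intent
    "entities": {"size": "large", "topping": "pepperoni"},
    "confidence": 0.92,                              # recognizer's confidence
}

# Any assistant that emits results in a shared shape like this could
# hand them to any application, regardless of which platform did the
# speech and language processing.
print(semantic_result["intent"])
```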
Please look into these organizations, read their proposals, and join their efforts!
Deborah Dahl, Ph.D., is principal at speech and language consulting firm Conversational Technologies and chair of the World Wide Web Consortium’s Multimodal Interaction Working Group. She can be reached at dahl@conversational-technologies.com.