Video: Current Challenges in Enterprise Speech Tech

Learn more about enterprise speech technology implementations at the next SpeechTEK conference.

Read the complete transcript of this clip:

Ellen Juhlin: One of the challenges that we've seen is that speech-to-text is still what I call Starbucks-style name recognition. So, I'm "Allen" everywhere, whether I say my name to a computer or at Starbucks. And when we're working in an environment that is person-to-person, like, "Hey, Bob, can you go get another ladder for us?" You know, "Bob" is pretty easy, but "Nivedita" is a harder one.

In areas where we're trying to convert names to commands or have a computer act for a specific person, you know, that's one of the areas that still needs to catch up a little bit. And there's just dealing with people who aren't as familiar with using smartphones. You know, when working with task workers, smartphones are scary, and make them feel dumb. And so, asking them to know about Bluetooth or apps makes them feel challenged. Creating a specific device smooths that over a little bit, but still, anytime that you bring up an app, you have to be conscious of which workers you're asking to use it.

Cory Treffiletti: So, I agree with all that, but I'll give you a different spin, which is, it's expectations. Two things come to mind.

The first one is that, when we first launched the product and we talked about the AI transcribing a conversation, recognizing intent, and extracting information, people's expectations were that they should be able to walk out and everything is gonna be absolutely perfect. And that's not possible, because audio quality and so many other things have an impact on whether or not the machine can even hear what it's trying to transcribe, and whether it can identify the right kinds of intent. So, that's one thing.

The second thing is that there's also this expectation that when people see what they've said in writing, they'll be like (and we use this analogy) Martin Luther King in how well they spoke and how eloquent they were. No one speaks like that. So, what happens is, a lot of times, you'll look at a transcript for a conversation and say, "This is crap. This is horrible." Then, you'll go listen to the audio, and it's verbatim, 100% exactly what was said, with five people talking over each other, interrupting each other, and saying "Um, you know, uh, yeah, like" a lot.

What happens is that what people hear and what they expect are misaligned. You have to take that and also factor in that, when we first launched, our expectation was that people understand Alexa, so they will use voice commands in a meeting.

Then we found that to interrupt the flow of a conversation to use voice commands is really awkward, so let's swing the pendulum the entire other way, have the AI do all the intent recognition. And there were too many false positives, and too many things it was capturing that aren't really as important.

So, we ended up shifting back into the middle, where it's a combination of explicit and implicit commands that are being given in a conversation. This is all stuff that we're doing on the fly, because while we're trying to build this, people are getting more comfortable with an Alexa device. People are getting more comfortable with how they speak.

We've seen some interesting things. Just last week, I heard feedback from a user that having the AI in their meetings makes them slow down, makes them think about what they're going to say before they say it, and makes them much more cohesive in their speech, which is something we hypothesized would happen, but to have someone tell us, unsolicited, "This is what's happening" was interesting.

Video: How to Assess if a Conversational UI Is Right for You

Allstate Conversational Designer Katie Lower outlines working models for assessing the viability of a conversational interface with multiple teams within an organization in this clip from her presentation at SpeechTEK 2019.

06 Sep 2019

Video: How to Map the Customer Journey (and Why)

Allstate Conversational Designer Katie Lower defines the customer journey map as a visualization of the customer's process and explains why it's valuable in this clip from her presentation at SpeechTEK 2019.

30 Aug 2019

Video: Implications of a Speech UI

Grand Studio Lead Designer Diana Deibel discusses the ethical implications of speech UIs and remaining cognizant of the inherent human elements of speech and conversation in this clip from her presentation at SpeechTEK 2019.

21 Aug 2019

Video: Enabling Transparency in VUI Design

Grand Studio Lead Designer Diana Deibel discusses multiple approaches to making VUI design transparent--Google vs. Alexa, system-initiated vs. user-initiated--in this clip from her presentation at SpeechTEK 2019.

16 Aug 2019

Video: What Is the Minimum Amount of Speech for Authentication?

Pindrop Director of Product Marketing Ben Cunningham discusses best practices for voice authentication in IVR design in this clip from his panel at SpeechTEK 2019.

08 Aug 2019

Video: Demo: Gridspace Grace Autonomous Call Center Agent

Gridspace Co-Founder and Co-Head of Engineering Anthony Scodary demonstrates Grace, Gridspace's new autonomous call center agent, in this clip from his keynote at SpeechTEK 2019.

02 Aug 2019

Video: Emerging Trends in Speech Tech Adoption

451 Research Senior Analyst Raul Castanon discusses new findings from a recent survey on speech technology adoption in the enterprise and how adoption of devices in the consumer space has impacted enterprise adoption in this clip from his panel at SpeechTEK 2019.

19 Jul 2019
