Speech Technology Magazine

 

The Case for Active Listening in Speech Recognition (Video)

SpeakEasy AI CEO Frank Schneider explains how AI and active listening improve ASR accuracy in this clip from SpeechTEK 2018.
Posted Sep 21, 2018

Learn more about customer self-service at the next SpeechTEK conference.

Read the complete transcript of this clip:

Frank Schneider: What we have challenges with is that noise, accents, and similar sounds are killing us. Any kind of data. I don't mean homophones, I mean things like "let's recognize speech" versus "let's wreck a nice beach." Or, "I want to open a home loan." I'm from Philly. I have a really bad Philly accent, and I can't even make fun of my own accent, but I'm always going to sound like "I want to pen Home Alone" as opposed to "I want to open a home loan" for a bank. Paring down the gap between transcription and intent, understanding without having to push it, is the holy grail.

So, the question that we have is, "Can I listen better?" And what we've tried to do is actually develop an AI that could do this.

When I was at my chatbot company, we were often asked, "Get this thing to work in my IVR." We were always having that fight over how to get it to work over there.

We learned to develop something that, from a listening perspective, can be pared down, not for accuracy's sake but for improvement's sake. Because these things change as quickly as digital chatbots do from a programming standpoint, the question becomes: how can you quickly make a change for the next person who comes in tomorrow, without a long IVR life cycle?

What we started to talk about internally was that we needed to develop an algorithm, or code, that had active listening as its intent. Active listening, which is a human term, is listening that has understanding as the prime objective. The prime objective is not to respond. The prime objective is not to give you your answers. We've always tried to build our platform and our consulting practice on our ability to listen to and consume as much voice as possible, and to use that to inform decisions, not necessarily to interact with the customer.

A lot of the AI we've been working on is AI that's been put together primarily to understand, not to reply.

When you think about what's said versus what the person wants, one thing we always do when we're looking at a deployment is ask whether we're looking for a court stenographer or a counselor. We had to understand: do we want someone to just type it out, or to actually listen, read it back, and confirm acknowledgment to get to a resolution?

We want to make sure that we know where they've been able to maintain context and memory. When it's one of those "50 First Dates" scenarios, what can I use, or who can I ask, to prepare to understand them? Tobias was talking about Jeff Bezos, and it's like, you have a bot or an AI for Yelp, and that's where you go for the best restaurant recommendation. If you want to talk about movies, you'd talk to a friend who's really into cinema and knows all kinds of different screenplays and things like that. It's the same way with AI.

We're trying to have this thin layer of understanding, an AI helping guide. So, if you're listening, how many things can be "on the line," if you will, while you're listening?

It can be more than just Google. Everything they could possibly have on the Google screen could be there listening, potentially. In Google's case, it's still more transcription and push: we'll be there when we push.

For us, we're trying to say: they're having two million monthly chatbot conversations right now, and I think I know who their vendor is. That should be on the line when they call the IVR, and not just that. So, how many things can you bring into what's unsaid in that conversation?
