Video: Current Challenges in Enterprise Speech Tech
Learn more about enterprise speech technology implementations at the next SpeechTEK conference.
Read the complete transcript of this clip:
Ellen Juhlin: One of the challenges that we've seen is that speech-to-text is still what I call Starbucks-style name recognition. So, I'm "Allen" everywhere, whether I say my name to a computer or at Starbucks. And when we're working in an environment that is person-to-person, like, "Hey, Bob, can you go get another ladder for us?" You know, "Bob" is pretty easy, but "Nivedita" is a harder one.
In areas where we're trying to convert names to commands or have a computer act for a specific person, you know, that's one of the areas that still needs to catch up a little bit. And there's also just dealing with people who aren't as familiar with smartphones. You know, when working with task workers, smartphones are scary and make them feel dumb. And so, asking them to know about Bluetooth or apps makes them feel challenged. Creating a specific device smooths that over a little bit, but still, anytime that you bring up an app, you have to be conscious of which workers you're asking to use it.
Cory Treffiletti: So, I agree with all that, but I'll give you a different spin, which is that it's about expectations. Two things come to mind.
The first one is that, when we first launched the product and we talked about the AI transcribing a conversation, recognizing intent, and extracting information, people's expectation was that they should be able to walk out and everything was going to be absolutely perfect. And that's not possible, because audio quality and so many other things have an impact on whether or not the machine can even hear what it's trying to transcribe and whether it can identify the right kinds of intent. So, that's one thing.
The second thing is that there's also this expectation that when people see what they've said in writing, they're like, and we use this analogy, they're like Martin Luther King in how great they spoke and how eloquent they were. No one speaks like that. So, what happens is, a lot of times, you'll look at a transcript of a conversation and say, "This is crap. This is horrible." Then you'll go listen to the audio, and it's verbatim, 100% exactly what was said, with five people talking over each other, interrupting each other, and saying "Um, you know, uh, yeah, like" a lot.
What happens is that what people hear and what they expect are misaligned. You have to take that into account along with the fact that, when we first launched, our expectation was that people understood Alexa and would use voice commands in a meeting.
Then we found that interrupting the flow of a conversation to use voice commands is really awkward, so we swung the pendulum the entire other way and had the AI do all the intent recognition. And there were too many false positives, too many things it was capturing that weren't really that important.
So, we ended up shifting back into the middle, where it's a combination of explicit and implicit commands that are being given in a conversation. This is all stuff that we're doing on the fly, because while we're trying to build this, people are getting more comfortable with an Alexa device. People are getting more comfortable with how they speak.
We've seen some interesting things which, just last week, I heard feedback from a user where having the AI in their meetings makes them slow down, makes them think about what they're going to say before they say it, and makes them be much more cohesive in their speech, which is something we hypothesized would happen, but to have someone tell us, unsolicited, "This is what's happening" was interesting.
Allstate Conversational Designer Katie Lower outlines working models for assessing the viability of a conversational interface with multiple teams within an organization in this clip from her presentation at SpeechTEK 2019.
Allstate Conversational Designer Katie Lower defines the customer journey map as a visualization of the customer's process and explains why it's valuable in this clip from her presentation at SpeechTEK 2019.
Grand Studio Lead Designer Diana Deibel discusses the ethical implications of speech UIs and remaining cognizant of the inherent human elements of speech and conversation in this clip from her presentation at SpeechTEK 2019.
Grand Studio Lead Designer Diana Deibel discusses multiple approaches to making VUI design transparent--the Google vs. Alexa, system-initiated vs. user-initiated--in this clip from her presentation at SpeechTEK 2019.
Pindrop Director of Product Marketing Ben Cunningham discusses best practices for voice authentication in IVR design in this clip from his panel at SpeechTEK 2019.
Gridspace Co-Founder and Co-Head of Engineering Anthony Scodary demonstrates Grace, Gridspace's new autonomous call center agent, in this clip from his keynote at SpeechTEK 2019.
451 Research Senior Analyst Raul Castanon discusses new findings from a recent survey on speech technology adoption in the enterprise and how adoption of devices in the consumer space has impacted enterprise adoption in this clip from his panel at SpeechTEK 2019.
Grand Studio Lead Designer Diana Deibel discusses best practices for culturally inclusive access in voice UI design in this clip from her presentation at SpeechTEK 2019.
Gridspace Co-Founder and Co-Head of Engineering Anthony Scodary discusses the transactional nature of speech and how that understanding impacts effective, AI-driven call center analytics in this clip from his keynote at SpeechTEK 2019.
Conversational Technologies Principal Deborah Dahl lays out a plan for making virtual assistants more effective in this clip from her keynote at SpeechTEK 2019.
Conversational Technologies Principal Deborah Dahl discusses the state of the art for the three pillars of conversational systems in this clip from her keynote at SpeechTEK 2019.
Conversational Technologies Principal Deborah Dahl explains how more targeted enterprise knowledge could make VAs more effective in organizations in this clip from her keynote at SpeechTEK 2019.
Nuance Communications' Roanne Levitt delineates the differences between text-dependent and text-independent biometrics and what the advent of text-independent means for IVR applications in this clip from SpeechTEK 2019.