Video: The Current State of Conversational Systems
Learn more about conversational systems at the next SpeechTEK conference.
Read the complete transcript of this clip:
Deborah Dahl: I looked at what you might call the three pillars of conversational systems: speech, understanding and conversation. The speech recognition and text to speech, natural language understanding and generation, and conversation or dialogue. We'll take a look at each of those, going over the state of the art very briefly to concentrate on the future. So, where are we now? Speech recognition works very well for the best cases; the quiet environments, the English, the careful speakers, the adults.
I don't know if anybody went to Jeff Adams' presentation yesterday, but he gave a really nice layout of how different factors in the speech environment affect the success of speech recognition. So, going to the easiest possible situations, speech recognition works very well. And going to the more difficult situations, the noisy children talking over each other, speaking multiple languages, that kind of really hard situation, we're very far from being able to do that.
I'm gonna assume that everyone has interacted with Alexa and Siri and Google Home. Has anyone not? Okay, they're pretty ubiquitous now. So most of the examples I'm gonna give are from Alexa and Siri. We'll see cases where they do a very good job, and cases where they don't. So, moving on to the second pillar of conversational interaction, natural-language understanding. They can do a really good job at what I might call generic information. That's the kind of information that is pretty pervasive within a culture. So, I asked Alexa a question.
[Deborah Dahl's Recorded Voice] Alexa, what was the Phillies' record in 1956?
[Alexa's Recorded Voice] In the 1956 regular season, the Phillies had a record of 71 and 83.
They're all very good at sports information for some reason. But this is generic information that pretty much most Americans could probably ask about. Well, they'd have to be a Phillies fan, but they could ask it about other teams, and they could ask about long, long ago, and Alexa was right there. It was sitting there waiting to be asked.
As far as dialogue goes--the third pillar of conversational interaction--they're all very good at what you would call one-and-done, which is not really even a dialogue. It's just ask a question, get an answer. Things like the weather, the temperature, the time. Things that you can just ask once and get the answer to. They're very good at that. Then there are slightly more sophisticated dialogues, or what you could call follow-up. That's where you ask a question and then say, "Well, what about some other thing?" without mentioning what you're interested in. So a better way to look at that is to find an example.
[Deborah Dahl's Recorded Voice] Alexa, what's the weather gonna be like tomorrow?
[Alexa's Recorded Voice] Weather alert for Plymouth Meeting. There's a tornado watch in effect until Friday, April 26th, 9:00 p.m. Here's the forecast for tomorrow. Look for partly sunny weather with a high of 60 degrees, and a low of 45 degrees.
[Deborah Dahl's Recorded Voice] Alexa, how about Washington, D.C.?
[Alexa's Recorded Voice] Weather alert for Washington, District of Columbia. There's a tornado watch in effect until Friday, April 26th, 9:00 p.m. Tomorrow, you'll see mostly sunny weather, and can expect a high of 68 degrees, and a low of 54 degrees.
So I didn't say anything about weather when I was asking about Washington, but Alexa knew that I was talking about weather and it gave me the information that I was interested in. And that's the information that all of you would expect if I asked about weather in Philadelphia, and then I asked about Washington. Let's try another one.
[Deborah Dahl's Recorded Voice] Alexa, what time is it?
[Alexa's Recorded Voice] The time is 1:57 p.m.
[Deborah Dahl's Recorded Voice] Alexa, how about in New York?
[Alexa's Recorded Voice] New York. New York was founded in 1624, 395 years ago.
Somebody sat there back at Amazon and typed in "weather follow-up," but they forgot to type "time follow-up." Interestingly, Siri actually does do the time follow-up, but it doesn't mean that either one of them really has a general idea what a follow-up question is. The third kind of dialogue that's the state of the art is slot filling, which is when you ask... Let's say I asked a question like, "I wanna make some travel plans." You can't really answer that question without getting some more information from the user: where they wanna go, when they wanna go, where they're leaving from.
So, the paradigm is going back through that task and saying, "OK, where do you wanna go, when do you wanna go, where are you going?" So those are kinda slots you fill in during the conversation. And that's a pretty well understood dialogue paradigm. What you don't find is, we'll see a lot of things that you don't find, actually, but they're starting to do things like at least be able to chain together several slot-fillings. Travel planning is a good example. Maybe you make a plane reservation, and then maybe move on to a related task like hotel reservation, and then move on to another related task like car reservation. There are systems that will let you chain together those individual tasks.
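Editor's illustration, not part of the talk: the slot-filling paradigm and the chaining of related tasks described above can be sketched in a few lines of Python. Everything here is hypothetical and chosen only for illustration: the `Task` class, the `run_dialogue` function, the slot names, and the prompt wording. Real assistants are not claimed to work this way internally, and the user's replies are simulated by a fixed list rather than by speech recognition.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """One slot-filling task: a name plus a question for each required slot."""
    name: str
    prompts: dict                               # slot name -> question to ask
    slots: dict = field(default_factory=dict)   # slot name -> user's answer

    def next_prompt(self):
        """Return (slot, question) for the first unfilled slot, or None if done."""
        for slot, question in self.prompts.items():
            if slot not in self.slots:
                return slot, question
        return None


def run_dialogue(tasks, answers):
    """Fill every slot of each task in order, chaining tasks back to back.

    `answers` stands in for the user: one reply per question, in order.
    Returns the list of questions asked and the filled slots per task.
    """
    asked = []
    replies = iter(answers)
    for task in tasks:
        step = task.next_prompt()
        while step is not None:
            slot, question = step
            asked.append(question)
            task.slots[slot] = next(replies)
            step = task.next_prompt()
    return asked, {task.name: task.slots for task in tasks}


# Hypothetical travel-planning example: a flight task chained to a hotel task.
flight = Task("flight", {
    "destination": "Where do you want to go?",
    "date": "When do you want to go?",
    "origin": "Where are you leaving from?",
})
hotel = Task("hotel", {"nights": "How many nights do you need a hotel?"})

asked, filled = run_dialogue(
    [flight, hotel],
    ["Washington, D.C.", "Friday", "Philadelphia", "2"],
)
```

The system keeps asking until a task has no empty slots, then moves on to the next task in the chain. Note that this sketch carries nothing over between tasks; sharing the destination or date from the flight with the hotel task would be an extra step.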
Allstate Conversational Designer Katie Lower outlines working models for assessing the viability of a conversational interface with multiple teams within an organization in this clip from her presentation at SpeechTEK 2019.
Allstate Conversational Designer Katie Lower defines the customer journey map as a visualization of the customer's process and explains why it's valuable in this clip from her presentation at SpeechTEK 2019.
Grand Studio Lead Designer Diana Deibel discusses the ethical implications of speech UIs and remaining cognizant of the inherent human elements of speech and conversation in this clip from her presentation at SpeechTEK 2019.
Grand Studio Lead Designer Diana Deibel discusses multiple approaches to making VUI design transparent--the Google vs. Alexa, system-initiated vs. user-initiated--in this clip from her presentation at SpeechTEK 2019.
Pindrop Director of Product Marketing Ben Cunningham discusses best practices for voice authentication in IVR design in this clip from his panel at SpeechTEK 2019.
Gridspace Co-Founder and Co-Head of Engineering Anthony Scodary demonstrates Grace, Gridspace's new autonomous call center agent, in this clip from his keynote at SpeechTEK 2019.
Orion Labs Head of Product Ellen Juhlin and Voicea CMO Cory Treffiletti discuss persisting challenges in speech-to-text, AI identifying intent, user expectations, and more in enterprise speech tech applications in this clip from their panel at SpeechTEK 2019.
451 Research Senior Analyst Raul Castanon discusses new findings of a recent survey on speech technology adoption in the enterprise and how adoption of devices in the consumer space have impacted enterprise adoption in this clip from his panel at SpeechTEK 2019.
Grand Studio Lead Designer Diana Deibel discusses best practices for culturally inclusive access in voice UI design in this clip from her presentation at SpeechTEK 2019.
Gridspace Co-Founder and Co-Head of Engineering Anthony Scodary discusses the transactional nature of speech and how that understanding impacts effective, AI-driven call center analytics in this clip from his keynote at SpeechTEK 2019.
Conversational Technologies Principal Deborah Dahl lays out a plan for making virtual assistants more effective in this clip from her keynote at SpeechTEK 2019.
Conversational Technologies Principal Deborah Dahl explains how more targeted enterprise knowledge could make VAs more effective in organizations in this clip from her keynote at SpeechTEK 2019.
Nuance Communications' Roanne Levitt delineates the differences between text-dependent and text-independent biometrics and what the advent of text-independent means for IVR applications in this clip from SpeechTEK 2019.