SpeechTEK 2019: Making Machines More Human

Deborah Dahl, principal at Conversational Technologies and co-chair of the SpeechTEK Conference, kicked off the last day of SpeechTEK 2019 with a talk titled "Just Like Talking to a Person: How to Get There from Here?" Dahl says that conversational interfaces are much improved but asks, "Are they like talking to a human?" The simple answer is, "Not yet."

The most common virtual assistants--Siri, Alexa, and Google Assistant--all still struggle with some basic tasks, like follow-up questions. They excel at doing tedious tasks--like setting alarms--and relaying sports information, which is probably why the websites for all three still promote their most practical attributes.

Dahl says there are ten areas she sees as ripe for improvement, each of which would help virtual assistants get closer to carrying on a conversation like a human. In descending order they are:

Knowledge
Language
Detecting user state
Learning
Dealing with bad audio situations
Multimodal inputs
Multi-lingual
Prosody
Grounding (the ability to learn without being taught)
Carrying on an extended, productive conversation

Dahl's keynote served as a good lead-in to "The Engineering of Emotion," a talk by Intuit's Wolf Paulus. When creating a voice-based app for Mint, Paulus was challenged to make sure that answers to questions like, "Alexa, ask Mint, can I go out for dinner tonight?" don't sound judgemental. He says the original answer to that question, "There is still $70 left in your restaurants budget, but you also significantly overspent in all other categories," didn't rate well with sentiment analyzers. So he started reworking the text--changing out this word or that--until he came up with a more positive answer: "You still have $70 in your restaurants budget, but please understand, in all other categories, you are in the red."

Determining what's positive and what's negative can be challenging. Paulus' team asked 156 people to tag thousands of sentences on a scale from totally negative to totally positive. Of all responses, 93% exactly matched the sentiment analyzer's score or were one off from it. But there were some sentences that really seemed to throw a wrench into the works, like, "He was evil through and through, and I am glad the bastard is dead and won’t be able to hurt her anymore." It's the kind of sentence that you need context to truly comprehend.
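
For readers curious how an "exact match or one off" figure like the one above might be computed, here is a minimal Python sketch. It is not Paulus' code: the function name, the 5-point scale (0 for totally negative through 4 for totally positive), and the sample ratings are all assumptions made up for illustration, since the talk recap does not say how many points the scale had.

```python
def within_one_agreement(human_scores, analyzer_scores):
    """Return the fraction of human ratings that equal the analyzer's
    score for the same sentence or differ from it by exactly one step."""
    if len(human_scores) != len(analyzer_scores):
        raise ValueError("both lists must have one score per sentence")
    if not human_scores:
        raise ValueError("at least one rating is required")
    close = sum(
        1 for human, machine in zip(human_scores, analyzer_scores)
        if abs(human - machine) <= 1
    )
    return close / len(human_scores)


# Hypothetical ratings on an assumed 0-4 scale, one pair per sentence.
human = [0, 1, 4, 2, 3, 4, 0, 2]
analyzer = [0, 2, 4, 2, 1, 3, 0, 4]

rate = within_one_agreement(human, analyzer)
print(f"{rate:.0%} of ratings matched or were one off")
```

A sentence like the "evil through and through" example would show up here as a pair with a large gap between the human and analyzer scores, which is exactly the kind of case a tolerance of one step does not absorb.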

Still, he developed a tool that could parse sentences, showing the creator what was positive or negative about each one. In the end, though, Paulus says the easiest way to know if you've delighted your user is if they say "please" and "thank you" when interacting with your voice app.