How Users Interact with Different Types of Intelligent Agents (Video)

Ulster University Professor Michael McTear discusses how interactions differ with different intelligent agents, from one-shot dialogue to system-directed dialogue and mixed-initiative in this clip from his presentation at SpeechTech 2018.
By The Editors of Speech Technology - Posted Aug 31, 2018
Michael McTear: Interactions can differ slightly from one type of intelligent agent to another. You could have a voice-user interface such as you find in the traditional systems.

More recently, of course, we have voice assistants on our phones. Predominantly, there are two types of interaction that are used in these systems. One type is the one-shot dialogue. That's where you say everything that you want in one go. So for example, "What's the weather in Washington for next Thursday?" The system finds the answer and tells you, and that's it. Or you say, "Set an alarm for 7:30 tomorrow morning." It says, "Alarm set," and that's it. It's a one-shot query. If that works, and that's the way most of these systems are at the moment, that's fine.

The other type is the system-directed dialogue where you want to, for example, book a flight and there are different parameters that have to be filled. The system needs to know where you want to go, the date and the time, and maybe various other things, such as which particular airline. Gathering all those different bits of information is what we call slot-filling. Once that's done, the system will look up and find one or more flights that match those parameters.
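
As a rough illustration of the slot-filling idea described above, here is a minimal Python sketch. It is not taken from the talk or from any particular product: the slot names, the prompts, and the `run_dialogue` and `answer_fn` names are all assumptions made for the example. The system keeps asking for whichever required slot is still empty and stops once every slot has a value.

```python
# Hypothetical slot-filling sketch. Slot names and prompts are illustrative
# assumptions, not part of any real assistant's API.
REQUIRED_SLOTS = {
    "destination": "Where do you want to go?",
    "date": "What date do you want to travel?",
    "time": "What time do you want to leave?",
}


def run_dialogue(answer_fn, max_turns=10):
    """Ask for each missing slot in turn until all are filled.

    answer_fn takes the system's question (a string) and returns the
    user's reply (a string). An empty reply leaves the slot unfilled,
    so the same question is asked again on the next turn.
    """
    slots = {name: None for name in REQUIRED_SLOTS}
    for _ in range(max_turns):
        missing = [name for name, value in slots.items() if value is None]
        if not missing:
            return slots  # every parameter is known; a lookup could start here
        name = missing[0]
        reply = answer_fn(REQUIRED_SLOTS[name]).strip()
        if reply:
            slots[name] = reply
    # After the final turn, return the slots only if that turn filled the last one.
    if all(value is not None for value in slots.values()):
        return slots
    raise RuntimeError("dialogue ended before all slots were filled")


# Scripted user for demonstration; a real system would read speech or text.
scripted_answers = {
    "Where do you want to go?": "Washington",
    "What date do you want to travel?": "next Thursday",
    "What time do you want to leave?": "7:30 in the morning",
}
filled = run_dialogue(lambda question: scripted_answers[question])
```

In this sketch the system always takes the initiative and asks one question per turn, which is what makes it system-directed. A real system would also need to interpret the replies, for example turning "next Thursday" into a calendar date, before searching for flights.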

The third type is mixed-initiative, which is sort of more in the future. It's what people are looking for, where you might be doing some troubleshooting or getting advice on which insurance to take out, where it can't really just be handled with a one-shot query or simply slot-filling. There, a sort of negotiation goes on. The user may be asked some questions as well. That's called mixed-initiative dialogue.

Messaging interfaces are more text-based. In some cases, not even text. The input is a button, where the responses that the user can give are just little buttons. They're called quick replies. At the next level, you can have fairly constrained text input. And then at the higher level, you have natural language understanding with open-ended input. And then, sometimes these can be voice-enabled as well.

Then we can interact with intelligent agents, such as the smart speakers that we have in the home, or with social robots. There again, we have the same sorts of queries.

But what we might see in the future, again, is what we call open multi-turn dialogues. And these are dialogues that actually take much longer and will be mixed-initiative.

Then there's the internet of things, where voice interaction is also possible. In some cases, it may be very restricted. If you're talking to the thermostat, there are not that many different things you're going to be able to say. You're just adjusting the temperature. But again, there may be one-shot queries like, "Raise the temperature to 70 degrees." And in some cases, there may be slot-filling dialogues as well, and even open multi-turn dialogues.

