Speech Technology Magazine

 

Google’s Duplex Lets a Bot Be Your Voice

Thanks to the Duplex technology, Google Assistant can make simple calls on your behalf. But how much automation is too much?
By Phil Shinn - Posted Oct 8, 2018

This spring Google rolled out Duplex, its technology for conducting two-way real-world conversations over the phone, with Google Assistant holding up your end of a call. It was demoed at the Google I/O developer conference by Google’s CEO, Sundar Pichai, who noted that 60% of small businesses don’t have automated reservation systems. So if you need to make a haircut appointment or get your car fixed or book a table at a restaurant, more often than not you’ll have to call. The star of the demo was Google Assistant, which will make that call for you. Pichai played two recordings: a call to a hair salon to book an appointment, and a call to a restaurant to reserve a table.

Some initial press was positive, but critics focused on the two recordings, which were not live, insinuating that they were staged. There was a dial tone in the audio, but the human answerers didn’t say the name of their business; there was no ambient background noise; and neither the salon nor the restaurant asked for the caller’s phone number or other contact info.

You don’t need to be in the speech space long to learn that you might want to avoid antagonizing the live demo gods, so I for one have no problem with recordings. And one could presume the business names were edited for privacy. But what if the demo was scripted and/or edited? Why, I am sure this has never, ever been done before! 

What bothered other folks was that the system did not disclose its lack of humanity. In the demo recordings, there was no “I’m a bot,” no earcon, no asking for DTMF input. Critics pointed to clever features the designers threw in (like ums and ers and other conversational dialogue markers) as “deception by design.”

Here was Google’s response: “We understand and value the discussion around Google Duplex—as we’ve said from the beginning, transparency in the technology is important. We are designing this feature with disclosure built-in, and we’ll make sure the system is appropriately identified. What we showed at I/O was an early technology demo, and we look forward to incorporating feedback as we develop this into a product.”

VUI designers have been wrestling with agent transparency for a long time. There are actually laws requiring you to tell people when they are being recorded, which is why “Your call may be monitored…” is played to humans a billion times a day.

Should a bot use first-person pronouns? It’s interesting that a lot of identity claims over the phone start with first names only: “Hi, this is Julie, how can I help you?” Should we put in earcons at the start to let users know what they’re dealing with? Or use a flat-affect TTS like Data from Star Trek?

In the early days, when a voice bot asked you to punch buttons, it was pretty clear a bot was on the other end of the line. Later we got speech recognition and natural language processing grammars, so it wasn’t as clear. Some designers did in fact recommend playing an earcon at the start of the interaction to make the automation apparent to users, mainly because computers were still pretty lame when it came to having a dialogue and it was a good idea to set realistic expectations. 

Now it’s like when you know three sentences of French, go to Paris, use them, and fail. The trouble with being too clever by half is this: how do you unwind a bot when it breaks? Just because you can pass the Turing Test doesn’t mean you have a right to.

Personally, I like to know who or what I’m talking to up front. This should be the fourth law of robotics. Maybe I need a bot to answer calls to figure out if it’s a bot calling. Soon people will propose working out the details of meetings or projects by saying, “My bot will touch base with your bot.”

We know machines are stupid now when it comes to real conversation. (Not to mention ethics. When I get a call from “Barbara” pitching timeshares in the Bahamas, I hang up. This is why nobody answers the phone anymore.) Eventually, however, bots will probably get so smart that they’ll start feigning stupidity in order to get to talk to a person—and pass the Turing Test with flying colors.  
