Q&A: Jonathan Eisenzopf, chief technology officer at Discourse.ai, about using human-to-human corpora to train bots faster
By using human-to-human corpora, bots can be trained faster. James Larson, co-chair of the SpeechTEK conference, talked with Jonathan Eisenzopf, chief technology officer at Discourse.ai, about this topic, which will be part of Eisenzopf's SpeechTEK 2020 presentation.
What information can chat logs and call transcriptions provide to dialogue system developers?
Quite a lot. If you think about it, every conversation a company has with customers today likely also occurred yesterday, last week, and last month. Assuming that a dialogue system will automate some of the same kinds of conversations, many of the patterns we need are already in the data; the challenge is how to identify and classify them. More specifically, we have models that can extract the intents, slots, entities, and flows, along with the associated agent prompts and customer responses. This is a two-step process: we first must classify the agent interactions using a model that represents a theory of how conversation works, then identify which features might be useful in a dialogue system.
Can extracted commands be automatically labeled with the appropriate intent?
Yes. The Discourse.ai system automatically groups together conversations that are similar to each other, then labels the intent for those conversations based on one or more utterances within the conversation. This is actually more complicated than it sounds because people don't necessarily fully state their goals at the beginning of conversations the same way that they would in a directed dialogue or statistical language model dialogue.
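The details of the Discourse.ai system are proprietary, but the grouping step can be illustrated with a minimal sketch: a greedy clustering of customer utterances by lexical (Jaccard) overlap. The similarity measure, threshold, and sample utterances below are illustrative assumptions, not the production approach.

```python
# Toy sketch of grouping similar customer utterances into candidate intent
# clusters. Real systems typically use learned embeddings; plain word-set
# overlap (Jaccard similarity) is used here only to keep the idea visible.

def tokens(utterance):
    """Lowercase word set as a rough lexical fingerprint of an utterance."""
    return set(utterance.lower().split())

def jaccard(a, b):
    """Overlap of two word sets: |intersection| / |union|."""
    return len(a & b) / len(a | b)

def cluster_utterances(utterances, threshold=0.3):
    """Greedily place each utterance into the first cluster whose seed
    utterance is similar enough; otherwise start a new cluster."""
    clusters = []  # each cluster is a list of utterances
    for u in utterances:
        for cluster in clusters:
            if jaccard(tokens(u), tokens(cluster[0])) >= threshold:
                cluster.append(u)
                break
        else:
            clusters.append([u])
    return clusters

# Hypothetical transcript snippets for illustration.
calls = [
    "I want to cancel my subscription",
    "please cancel my subscription today",
    "how do I reset my password",
    "I forgot my password and need to reset it",
]
for c in cluster_utterances(calls):
    print(c)
```

With these inputs, the cancellation utterances end up in one cluster and the password-reset utterances in another, ready for an analyst (or a naming model) to label with an intent.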
Given that human conversation unfolds over time, we also need to account for how the intent might evolve or change during the course of the conversation, based on clarifying questions or statements and on the expression and handling of certainty.
I actually don't like the word intent and wish the industry would use different terminology, but we're stuck with it now, so I'll instead try to define it better. I would define intent as the desire (or action potential) of an interlocutor to achieve an outcome that meets a goal (or need).
I would go further to also define a conversation as a collaborative act between two or more interlocutors to achieve a goal.
These definitions can then be applied to the intent of any speaker (human or bot) in a conversation and give us a basis for analyzing both human-to-human and human-to-bot corpora.
What are the common features of product offerings for extracting intents?
This is something that I will be presenting at the SpeechTEK conference, but I'll summarize them here. I would categorize this task as intent discovery, which I define as the process of uncovering the relationship between customer intents and a company's ability to maximize conversion of the customer's action potential as it relates to the company's brand, products (or services), and processes.
This sounds a bit broad because the term is used across different domains, but it also applies here. The important distinction is that intents are usually weighted toward the company and not the customer. We can debate whether that's good or bad, but it does make a difference in how a company identifies and names what an intent is and means.
To be more specific in the domain of extracting intents from human conversations for the purposes of building a dialogue system, what we're extracting from conversations is usually the customer utterance(s) where they are stating their goal(s) of the conversation in their own words. This typically involves a manual collection process whereby a business analyst reviews and extracts those utterances from transcribed agent calls or online chats. They then label each utterance with a specific intent.
Identifying which utterance(s) best represent the customer's intent, grouping similar utterances into clusters, and then suggesting a potential name for the associated intent are the three primary features that should be supported.
What are the significant differences among product offerings?
The biggest differences at the moment have to do with the level of data pre-processing required of the customer to use the product, how much work is required by the vendor to load and process the data, what kind of visualization tools are provided for business analysts and designers, and which bot frameworks are supported. This is a new market, and it's not yet clear what the right approach is. Still, any automation is an improvement over manual collection and analysis.
What are the major difficulties that can be encountered when using these products, and how can these difficulties be overcome?
There are several, some of which I alluded to previously. Intent discovery tools that require developers to manually collect intent utterances save some time, but leave most of the work to the customer or professional services. Other tools automatically cluster similar conversations together, but some still require data scientists at the vendor to complete this work.
The first two steps (identifying intent utterances and clustering them) are the hardest to automate and require the most work. Partially automating the last third of the process, suggesting intent names, would also be useful, but I don't believe these tools will be fully embraced by businesses until the entire process is automated, which I believe will happen as these products mature.
The last difficulty is deciding how to visualize many conversations in a way that is easy to understand. I've worked for more than 10 years to find the right way to visualize conversations for tasks like intent discovery, and I'm still not entirely happy with the options. The market will tell us the best way to solve the problem when customers tell us which method makes the most sense to them.
To see presentations by Jonathan Eisenzopf and other speech technology experts, register to attend the SpeechTEK Conference in Washington April 27-29.