Q&A: Building a Conversational Chatbot for Google Assistant
Michael McTear is an Emeritus Professor at Ulster University with a special research interest in spoken language technologies. He has been researching in the field of spoken dialogue systems for more than 15 years and is the author of the widely used textbook Spoken Dialogue Technology: Toward The Conversational User Interface (Springer Verlag, 2004). He is also co-author (with Kristiina Jokinen) of the book Spoken Dialogue Systems, and of the book Voice Application Development for Android (with Zoraida Callejas). He is presenting the SpeechTEK University course “Build a Conversational Chatbot for Google Assistant” at SpeechTEK 2018. SpeechTEK co-chair, James Larson, talked to McTear in advance of his conference session.
Q: What do you feel are the major issues/challenges in building a conversational chatbot for Google Assistant?
A: Building a conversational chatbot for Google Assistant, or indeed for any of the other available platforms, would appear to be a simple process if we go by promises of quick success such as “build voice and chatbots in minutes” or “create an assistant without any code.” Indeed, there are tools available on the Actions on Google developers website as well as on many other platforms that enable developers to get started quickly with an app. However, creating a robust and useful application requires the same amount of effort as any other software development. It is important at the outset to establish appropriate use cases and then to design the dialog flow carefully to provide an effective and satisfying experience for users.
Extensive testing with real users in realistic usage situations is essential to expose any problems with the application and to address them through iterative refinement. The Actions on Google website provides useful guidelines for designing voice-based apps. There is also a wealth of expertise, accumulated over more than two decades by designers of voice user interfaces, that can provide useful guidelines and highlight pitfalls to be avoided.
Q: What tools are available to assist in designing and implementing the chatbot?
A: Google provides several tools for developing a chatbot.
Templates enable developers to create simple apps without writing any code. Some of the settings within the template can be customized, and developers do not have to worry about designing conversational flows as this is handled by the template. However, a template solution is only appropriate if there is a template that is closely matched to the form and structure of the app being developed.
Dialogflow (formerly known as API.AI) provides an easy-to-use web IDE that takes care of many of the complexities involved in creating an action for Google Assistant, such as linking to external web services. Dialogflow also includes a Natural Language Understanding (NLU) engine that can process the user’s spoken or written inputs.
The Actions SDK requires the developer to create action packages for Google Assistant manually and to deploy them using a command-line interface. While this option provides the developer with greater control, it is less convenient and less developer-friendly than the IDE provided by Dialogflow. However, the Actions SDK allows developers to use their own Natural Language Understanding tool. This may be desirable if the text to be processed goes beyond the capabilities of the Dialogflow NLU tool.
Q: How many utterances should the developer use to train the chatbot?
A: The Dialogflow NLU tool is based on an approach to natural language understanding involving intents and entities. This approach is used in most chatbot development tools. The user’s input is parsed and mapped to an intent that represents what the user wants to achieve with their utterance and what action the app should take. The entities (or parameters) of the intent are extracted, providing the data required to fulfill the intent. The task of the developer is simply to list sufficient examples of user utterances for each intent. Depending on the complexity of the intent, around 15–20 sample utterances should be sufficient initially.
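The intent-and-entity idea can be illustrated with a small Python sketch. It is a toy and not Dialogflow's API: the `Intent` class, the `get_weather` intent, the `CITIES` entity list, and the `build_result` helper are all hypothetical names, and entity extraction here is a plain substring lookup rather than anything a real NLU engine would do.

```python
from dataclasses import dataclass, field

# Hypothetical entity type: a fixed list of known city names.
CITIES = ["boston", "los angeles", "denver"]


@dataclass
class Intent:
    """An intent is a name plus sample utterances supplied by the developer."""
    name: str
    sample_utterances: list = field(default_factory=list)


# Hypothetical intent; a real one would have around 15-20 samples.
GET_WEATHER = Intent(
    name="get_weather",
    sample_utterances=[
        "what is the weather in boston",
        "will it rain in denver tomorrow",
        "weather forecast for los angeles",
    ],
)


def extract_city(utterance):
    """Return the first entry of CITIES that occurs in the utterance, or None."""
    text = utterance.lower()
    for city in CITIES:
        if city in text:
            return city
    return None


def build_result(intent, utterance):
    """Bundle a matched intent with the parameters extracted from the utterance."""
    return {"intent": intent.name, "parameters": {"city": extract_city(utterance)}}
```

Calling `build_result(GET_WEATHER, "What is the weather in Boston?")` pairs the intent name with a `city` parameter, which is the data a fulfillment step would need.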
The NLU tool then uses machine learning to create classifiers that enable the tool to recognize inputs that are similar to but not necessarily a complete match of the sample utterances provided. Additionally, during further testing, the developer can explore the logs of the interactions to find inputs that were not matched or that were matched incorrectly, and these can be corrected and added to the tool’s training. Thus, over time, the NLU capabilities of the system become more extensive and more accurate as a result of iterative refinement. The main advantage of Dialogflow’s approach is that it frees the developer from having to create a huge number of variations of the sample utterances provided at the initial stage of development and from having to write rules to perform the matching and extraction processes.
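To make the idea of matching inputs that are similar to, but not identical with, the sample utterances concrete, here is a deliberately simple sketch. It is not how Dialogflow works internally: word-overlap (Jaccard) similarity stands in for the machine-learned classifier, and the `TRAINING` data, the `classify` function, and the `threshold` value are illustrative assumptions.

```python
import re

# Hypothetical training data: intent name -> sample utterances.
TRAINING = {
    "get_weather": [
        "what is the weather in boston",
        "will it rain tomorrow",
        "weather forecast for today",
    ],
    "book_flight": [
        "i want to book a flight to denver",
        "find me a flight for tomorrow",
        "book a plane ticket",
    ],
}


def tokens(text):
    """Lower-case the text and return its set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a, b):
    """Word-overlap similarity between two token sets, from 0.0 to 1.0."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def classify(utterance, threshold=0.2):
    """Return (intent, score) for the closest sample, or (None, score) if too weak."""
    words = tokens(utterance)
    best_intent, best_score = None, 0.0
    for intent, samples in TRAINING.items():
        for sample in samples:
            score = jaccard(words, tokens(sample))
            if score > best_score:
                best_intent, best_score = intent, score
    if best_score < threshold:
        return None, best_score
    return best_intent, best_score
```

An input such as "how is the weather in denver" is not among the samples but still lands on `get_weather`, while an unrelated input falls below the threshold and is left unmatched. Unmatched inputs are the kind a developer would review in the logs and add to the training data.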
Q: What do you see as the current limitations imposed by Dialogflow? (e.g., utterances containing references to previously discussed objects, etc.)
A: The approach of identifying intents and extracting entities works well with simple inputs to a well-structured task. However, this approach has its limitations. For example, it is not possible to associate more than one intent with a single utterance, as in “switch on the kitchen light and turn on the cooker.” Complex utterances involving multiple constraints would also pose a problem, for example: “I want to book a flight to Denver that will get me there in the evening but not too late to get a bus into the city center.” Here a much more powerful NLU tool would be required.
Referring to previously discussed objects is addressed to some extent in Dialogflow using the concept of follow-up intents. A follow-up intent is linked to a parent intent by specifying an output context for the parent intent and a matching input context for a follow-up intent. This means that subsequent utterances that match in this way can be processed, for example: “What is the weather in Boston?” and subsequently “What about Los Angeles?” Several built-in follow-up intents are provided in which the contexts are generated automatically. However, specifying which intents are to be treated as follow-up intents has to be done manually, requiring the developer to predict in advance where any follow-up might occur in a dialog, so that this approach only seems to work for narrowly constrained dialogs.
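The linkage between a parent intent's output context and a follow-up intent's input context can be sketched as follows. This is a simplified model and not the Dialogflow API: the `Intent` and `Session` classes, the keyword-based matching rule, and the `weather-followup` context name are assumptions made for illustration, and the sketch ignores details such as how long a context stays active.

```python
from dataclasses import dataclass


@dataclass
class Intent:
    name: str
    keywords: tuple                         # toy rule: every keyword must appear
    input_contexts: frozenset = frozenset()   # contexts that must be active
    output_contexts: frozenset = frozenset()  # contexts activated on a match


# Hypothetical parent intent and its manually declared follow-up intent.
INTENTS = [
    Intent("weather", ("weather",),
           output_contexts=frozenset({"weather-followup"})),
    Intent("weather_followup", ("what", "about"),
           input_contexts=frozenset({"weather-followup"}),
           output_contexts=frozenset({"weather-followup"})),
]


def match(utterance, active_contexts):
    """Return the first intent whose input contexts are active and whose keywords match."""
    text = utterance.lower()
    for intent in INTENTS:
        if not intent.input_contexts <= active_contexts:
            continue  # follow-up intents are skipped unless their context is active
        if all(keyword in text for keyword in intent.keywords):
            return intent
    return None


class Session:
    """Tracks which contexts are active between turns of one conversation."""

    def __init__(self):
        self.contexts = set()

    def handle(self, utterance):
        intent = match(utterance, self.contexts)
        if intent is None:
            return None
        # Simplification: the matched intent's output contexts replace the old ones.
        self.contexts = set(intent.output_contexts)
        return intent.name
```

In this sketch, “What about Los Angeles?” is matched only after “What is the weather in Boston?” has activated the context. Asked at the start of a conversation, it matches nothing, which mirrors the need to predict in advance where follow-ups can occur.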
Q: What do you see as the differences between developing chatbots and techniques for creating speech-based dialogs?
A: In a narrow definition, a chatbot is an application that runs on a messaging platform using text, whereas other applications, such as voice user interfaces and intelligent assistants, use voice. However, chatbots are often defined more widely as applications with a conversational interface, and in this respect, they are similar to applications involving speech-based dialogs. While speech adds the complexity of dealing with speech recognition errors, the conversational requirements of any application with a conversational interface are similar. For example, there are the issues of how to respond appropriately to a wide range of inputs from the user, including follow-up questions, clarification requests, corrections, and changes of topic, and of how to keep the conversation on track by initiating clarification requests, offering guidance, making suggestions, and detecting and repairing miscommunications. There is an extensive literature on these topics by developers of spoken dialog systems and voice user interfaces that provides useful guidelines and techniques for developers of conversational chatbots.