How to Make ChatGPT Usable for Enterprises
It’s hard to avoid hearing or reading about new advances in natural language understanding (NLU) technology that demonstrate its astonishing abilities to understand and generate natural language. ChatGPT, developed by OpenAI, is the latest example. Given a simple natural language prompt, it can generate amazingly humanlike, fluent, and complex responses. It has been trained on huge amounts of written language obtained from the web, using the newest and most effective machine learning algorithms. Basically, it has absorbed this training data and can recombine it to respond to users’ questions. Systems like ChatGPT and the related system, GPT-3, are examples of large language models, or LLMs, which are based on this approach.
A lot of ChatGPT examples that you’ll find on the web are entertaining but not very practical, and you might wonder how LLMs can be put to work in actual applications. Certainly, LLMs and their accompanying fanfare are going to encourage many entrepreneurs to put them to work in applications. How can standards help? One way would be to make it easier for developers to use LLM results to access enterprise information.
Besides writing poetry or giving answers to science questions, there actually are some ways that LLMs can be useful in voice assistants and chatbots. A simple example is generating application-specific training data. Getting more training data for natural language applications is always helpful, and ChatGPT can do this quite well if asked. For example, I asked ChatGPT to generate requests for account balances for a hypothetical banking application. It came up with a few plausible (if a little stiff) examples, like the following.
- “Can you please let me know my checking account balance as of today?”
- “Hello, I would like to request an update on my checking account balance.”
- “Could you please let me know the current amount in my account and any recent transactions that have occurred? Thank you for your assistance.”
These could all be used as training data in a banking application. Someone does need to check what ChatGPT produces, though, because it can “hallucinate,” or come up with confident-sounding but unfounded and inaccurate responses that you wouldn’t want in your training data.
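Since a human still has to review what the model produces, it helps to do some mechanical cleanup first. Below is a minimal sketch of such a post-processing step; the function name and filtering thresholds are my own illustrative choices, and the raw strings stand in for what would normally come back from an API call.

```python
def clean_generated_utterances(raw_lines):
    """Normalize, deduplicate, and drop empty or rambling candidates so a
    human reviewer only sees plausible training utterances."""
    seen = set()
    cleaned = []
    for line in raw_lines:
        text = line.strip().strip('"').strip()
        if not text or len(text.split()) > 30:
            continue  # drop blanks and overly long outputs
        key = text.lower()
        if key in seen:
            continue  # drop near-duplicates that differ only in case
        seen.add(key)
        cleaned.append(text)
    return cleaned

# Stand-ins for generated output, including a duplicate and a blank line
raw = [
    '"Can you please let me know my checking account balance as of today?"',
    '"can you please let me know my checking account balance as of today?"',
    '"Hello, I would like to request an update on my checking account balance."',
    "",
]
for utterance in clean_generated_utterances(raw):
    print(utterance)
```

A filter like this doesn't catch hallucinated content, of course; it only reduces the volume of text the human reviewer has to inspect.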
More interestingly, ChatGPT can be fine-tuned for specific applications like chatbots or voice assistants by providing it with the right data, such as examples of classification and slot extraction. For example, it could be told to extract “checking account” to fill an “account type” slot from the previous examples. The GPT-3 documentation shows how to provide it with this kind of data and how much data it needs for different applications.
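To make this concrete, here is a sketch of what fine-tuning data for intent classification and slot extraction might look like as prompt/completion pairs in JSONL. The intent and slot names, and the idea of encoding the target as JSON in the completion, are my own assumptions for illustration; consult the provider's fine-tuning documentation for the exact required schema.

```python
import json

# Hypothetical training pairs: each prompt is a user utterance, and each
# completion encodes the intent and extracted slots as a JSON string.
examples = [
    {
        "prompt": "Can you please let me know my checking account balance as of today?",
        "completion": json.dumps(
            {"intent": "check_balance", "slots": {"account_type": "checking account"}}
        ),
    },
    {
        "prompt": "Hello, I would like to request an update on my checking account balance.",
        "completion": json.dumps(
            {"intent": "check_balance", "slots": {"account_type": "checking account"}}
        ),
    },
]

# Fine-tuning data is typically uploaded as JSONL: one JSON object per line.
jsonl = "\n".join(json.dumps(ex) for ex in examples)
print(jsonl)
```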
Probably the most time-consuming part of NLU application development is getting information from a proprietary back-end source of enterprise content like a database. LLMs are of no help in connecting natural language queries to enterprise data, which by its nature is not, and shouldn’t be, publicly available to the LLM training process. Most of this data—for example, your bank account or medical history information—is protected by passwords or paywalls that the LLMs can’t get past. If asked, ChatGPT will correctly refuse to answer these kinds of questions, pointing out that it doesn’t have access to that information.
LLMs can be fine-tuned to understand enterprise-specific queries, but once the fine-tuned LLM understands the user’s question, the question still has to be answered. This means mapping the LLM result to enterprise knowledge sources. That mapping would be simpler if different LLMs returned their results in a standard format, which would make the results easier to access. There are at least two existing formats for speech recognition results that could also serve for LLM results: the World Wide Web Consortium (W3C)’s Extensible Multimodal Annotation (EMMA), an XML format (https://www.w3.org/TR/emma20/), and the W3C’s JSON Representation of Semantic Information (JROSI), a JSON format (https://w3c.github.io/voiceinteraction/voice%20interaction%20drafts/emmaJSON.htm). Both of these are worth looking into and could be very useful as LLMs begin to be used for more practical purposes.
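To show the idea of a standard result envelope, here is a minimal sketch that wraps a fine-tuned LLM's intent/slot output in a JSON structure loosely modeled on EMMA's notion of an interpretation with a confidence score. The specific field names are illustrative assumptions, not the actual EMMA or JROSI schema; the real specifications should be consulted for conforming output.

```python
import json

def to_standard_result(utterance, intent, slots, confidence):
    """Wrap an LLM's intent/slot output in a standard-style envelope so
    downstream enterprise code can consume results from any model the
    same way. Field names here are illustrative, not a W3C schema."""
    return {
        "interpretation": {
            "intent": intent,
            "slots": slots,
        },
        "tokens": utterance,       # the original user utterance
        "confidence": confidence,  # model's confidence in [0, 1]
    }

result = to_standard_result(
    "Can you please let me know my checking account balance as of today?",
    intent="check_balance",
    slots={"account_type": "checking account"},
    confidence=0.92,
)
print(json.dumps(result, indent=2))
```

With a shared envelope like this, the code that maps an interpretation to a database query wouldn't need to change when the underlying LLM does.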
In case you’re wondering, I did ask ChatGPT to write this column for me, but it was confused about the difference between speech technologies and speech technology standards. For example, it told me that “one of the most important speech technology standards is speech recognition,” which is of course a speech technology, not a speech technology standard. It looks like I’ll be holding on to my column-writing duties for a while longer!
Deborah Dahl, Ph.D., is principal at speech and language consulting firm Conversational Technologies and chair of the World Wide Web Consortium’s Multimodal Interaction Working Group. She can be reached at firstname.lastname@example.org.