ChatGPT: Why the Hype and How Does It Affect Speech Technology?
Given the amount of attention paid to ChatGPT, the AI chatbot that’s been wowing the internet over the past few months, it feels as though humanity just discovered the wheel. OpenAI, the company behind ChatGPT, was founded in 2015 as a nonprofit independent AI research lab to promote and develop “friendly” AI in a way that “benefits humanity as a whole.” Elon Musk, Peter Thiel, and other tech figures pledged $1 billion toward its goals.
In early 2019, OpenAI’s leaders said that $1 billion wouldn’t be enough to compete with the well-resourced AI labs at companies such as Alphabet and Meta, so they created a new investment vehicle, called OpenAI LP. It is now a capped-profit company (with investors limited to a maximum return of 100 times their investment), and it took a $1 billion investment from Microsoft. (At press time, rumors suggested that Microsoft was near an announcement of another $10 billion investment.)
In May 2020, OpenAI introduced GPT-3, “an autoregressive language model that uses deep learning to produce human-like text.” On Sept. 22, 2020, Microsoft announced that it had licensed exclusive use of GPT-3; others can still use the public API to receive output, but only Microsoft has access to GPT-3’s underlying model. Currently GPT-3 is one of the largest language models; OpenAI reports that the model has 175 billion parameters, or variables that the model uses to make predictions.
On Nov. 30, 2022, OpenAI released a public beta of the chatbot ChatGPT, and the hype ensued. More than 1 million accounts were created in less than a week, and off they went to ask ChatGPT about, well, everything: to make observations about what type of pizza is best, to write standup comedy, to compose poetry and piano sonatas, and on and on.
Entertainment aside, this technology could soon pair with speech tech for a range of business uses.
The 1 million-plus users’ inputs and feedback are further training the language model. And OpenAI isn’t alone; it has competition from several smaller, less well-funded companies that appear to be moving toward more focused language models by industry or application. In the general space where OpenAI’s ChatGPT operates, well-funded rivals like DeepMind, a subsidiary of Alphabet/Google, and Meta, with its OPT model, are applying tremendous resources to keep up with or surpass OpenAI and its Microsoft funding.
Adding more training data to a language model can unexpectedly give rise to capabilities that were never planned for, known as emergent abilities. The recently published research paper “Emergent Abilities of Large Language Models” states, “Although there are dozens of examples of emergent abilities, there are currently few compelling explanations for why such abilities emerge in the way they do.” And more confounding is that some large language models have emergent abilities, while others, such as GPT-3, do not—much like human siblings where one is a math prodigy and the other is a language genius.
OpenAI has discussed some of the capabilities and characteristics of GPT-4, which will be released sometime this year. OpenAI says the model will not be much larger than GPT-3, but it will be optimized, and the model will be denser. In a dense model, all parameters are used to process any given input, which is comparable to asking Albert Einstein what color the sky is on a sunny day.
All these large language models require massive computing resources, but Google researchers have released a research paper proposing “confident adaptive language modeling” (CALM), which dynamically allocates resources depending on the complexity of individual parts of a task, using an algorithm to predict whether something needs full or partial resources. “Color of the sky? We don’t need Einstein; let’s ask that 10-year-old.” So the largest shortcoming of LLMs may soon be overcome.
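The early-exit idea behind CALM can be illustrated with a toy sketch. Everything here is invented for demonstration (the “layers” and the confidence scores are stand-ins); a real model would derive confidence from its own output distribution, but the principle is the same: easy inputs exit after a few layers, hard inputs use the full network.

```python
# Toy sketch of confidence-based early exiting, the idea behind CALM.
# Layers and confidence values are hypothetical, not a real model.

def generate_token(layers, hidden, threshold=0.9):
    """Run layers one at a time, stopping once the model is confident."""
    for depth, layer in enumerate(layers, start=1):
        hidden, confidence = layer(hidden)
        if confidence >= threshold:
            return hidden, depth      # easy input: partial resources suffice
    return hidden, len(layers)        # hard input: full network used

def make_layer(gain):
    """Build a fake layer whose state doubles as a confidence proxy."""
    def layer(state):
        state = state + gain
        return state, min(state, 1.0)
    return layer

# An "easy" input gains confidence quickly; a "hard" one never does.
easy_layers = [make_layer(0.5)] * 4
hard_layers = [make_layer(0.2)] * 4

_, easy_depth = generate_token(easy_layers, 0.0)
_, hard_depth = generate_token(hard_layers, 0.0)
print(easy_depth, hard_depth)  # prints "2 4": early exit vs. full depth
```

The design point is simply that depth becomes a per-input decision rather than a fixed cost, which is what lets a CALM-style model answer “what color is the sky?” without waking Einstein.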
As discussed here in September 2022, “New Omnichannel Offerings Share the Power of NLU,” the merging of text/chat with voice natural language models is well under way, and that is where ChatGPT and its competitors intersect with speech technology.
Currently these chatbot LLMs use text as input. But with a capable speech-to-text engine on the front end and a text-to-speech engine on the back end, such a chatbot becomes a powerful speech-enabled voice assistant or a miraculously humanlike interactive voice response system. These are the underpinnings of the “conversational cloud,” where using speech when appropriate and text when it’s not results in the same outcomes and becomes second nature to virtually everyone.
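That three-stage pipeline can be sketched in a few lines. All three components below are hypothetical stand-ins (no real STT, LLM, or TTS API is being called); the point is only the shape of the chain: audio in, transcript to the chatbot, spoken answer out.

```python
# Sketch of a speech-enabled assistant: STT front end, chatbot LLM in the
# middle, TTS back end. Each stage is a placeholder, not a real service.

def speech_to_text(audio: bytes) -> str:
    # Stand-in for a real transcription engine; pretend the bytes are words.
    return audio.decode("utf-8")

def chatbot_llm(prompt: str) -> str:
    # Stand-in for a call to a large language model.
    return f"You asked: {prompt}"

def text_to_speech(text: str) -> bytes:
    # Stand-in for a real speech-synthesis engine.
    return text.encode("utf-8")

def voice_assistant(audio: bytes) -> bytes:
    """Chain the three stages: spoken question in, spoken answer out."""
    transcript = speech_to_text(audio)
    reply = chatbot_llm(transcript)
    return text_to_speech(reply)

audio_out = voice_assistant(b"What are your hours?")
print(audio_out)  # prints b'You asked: What are your hours?'
```

Swapping any placeholder for a production service changes only that one function, which is why the same chatbot core can sit behind both a chat window and a phone line.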
It seems that the confluence of the cloud, language models, and the desire to make them provide real value has speech technology moving near the speed of sound.
Kevin Brown is a customer experience architect at Cognizant with more than 25 years of experience designing and delivering speech-enabled solutions. He can be reached at email@example.com.