2024 State of AI in the Speech Technology Industry: GenAI-Fueled Speech Analytics Enable Real-Time Results

Some very early iterations of artificial intelligence have been used in speech analytics for a few years. In 2023, however, the technology made significant advances thanks to the introduction of generative AI’s large language models. Those advances led to faster and more accurate analytics that could be used for real-time coaching, transcriptions, and summarizations, providing companies with more complete insights into the best ways to immediately help customers, customer trends and attitudes, and marketing and sales results.

AI developments in speech analytics and elsewhere have been in the news ever since ChatGPT made its debut a little more than a year ago, says Daniel Ziv, Verint vice president of experience management and analytics go-to-market strategy.

“It’s definitely had a big impact and will continue to have a big impact in 2024,” Ziv explains, noting that AI has become increasingly important as more companies are using speech analytics and are expecting additional benefits from the technology.

Speech analytics engines have been improving steadily, but the jump in contact center volume spurred by the COVID-19 pandemic and the subsequent spike in e-commerce and remote communications led to huge growth in demand for the technology.

AI has answered that call, with significant growth last year, according to Ziv. “Now we have an AI-driven bot that scans the transcripts periodically, identifies words that are not in the large language model, and automatically adds them. That’s a huge enhancement for the thousands of customers who use our speech analytics. The model continues to evolve and learn.”

Brett Weigl, general manager of digital and AI at Genesys, considers generative AI’s impact to be minimal so far, but says benefits will come this year and beyond.

That’s not to say that there haven’t already been use cases for AI within the analytics field. On the contrary. So far, success has been seen in agent coaching, transcriptions, summarizations, and a few other areas.

The speed with which AI can comb through vast amounts of information has made a huge difference in real-time coaching, which experts expect will continue to evolve in 2024. The technology offers prompts and suggestions to agents as calls unfold, meaning faster and more accurate customer service, Ziv says. “It continues to get better and more cost-effective. There is really strong ROI for that.”

Barry Cooper, president of the CX Division at NICE, agrees, noting that AI has also democratized speech analytics.

“No longer is [analytics] in the hands of a few select analysts. Real-time speech transcription and phrase detection are combined with AI models to score agent behaviors, produce live customer sentiment metrics, and guide agents on the next best action. This could include recommending that agents build rapport with the caller or inform the customer of a self-service option, like a new mobile app,” he says.
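To make the phrase-detection step Cooper describes concrete, here is a minimal sketch in Python. It is an illustration only, not NICE's implementation: the rule table, the phrases, the guidance text, and the function name `detect_guidance` are all hypothetical, and a production system would rely on trained models rather than literal substring matching.

```python
# Hypothetical phrase-to-guidance rules, invented for illustration.
# A real system would pair phrase detection with trained AI models.
GUIDANCE_RULES = {
    "cancel my account": "Acknowledge the frustration and offer retention options.",
    "reset my password": "Mention the self-service option in the mobile app.",
}


def detect_guidance(utterance, rules=GUIDANCE_RULES):
    """Return the guidance tips whose trigger phrase appears in the utterance."""
    # Lowercase and collapse whitespace so matching ignores case and spacing.
    text = " ".join(utterance.lower().split())
    return [tip for phrase, tip in rules.items() if phrase in text]
```

Each transcribed utterance could be passed to `detect_guidance` as it arrives, with any returned tips shown to the agent during the call.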

On the transcription front, AI has added the ability to fine-tune the outputs that are then fed into analytics engines.

“The result is that transcriptions are far superior than they were a year earlier,” Ziv says, noting that this, too, is a trend that is expected to accelerate in 2024.

Ziv’s company, Verint, offers a transcription tuning bot that automatically creates a custom language model by identifying and adding new terms and phrases relevant to a specific customer environment. That improves transcription accuracy, which in turn improves the quality of both speech analytics and real-time coaching.

“The tuning bot impacts everything because it impacts the accuracy of the sentiment and the accuracy of the redactions,” Ziv says.
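The general idea behind such a tuning bot, scanning transcripts for terms missing from the vocabulary and adding them, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Verint's method: the function names, the `min_count` threshold, and the use of a plain word set in place of a language model are all hypothetical.

```python
import re
from collections import Counter


def find_new_terms(transcripts, vocabulary, min_count=2):
    """Return out-of-vocabulary words seen at least min_count times."""
    known = {word.lower() for word in vocabulary}
    counts = Counter()
    for text in transcripts:
        # Simple tokenizer: lowercase words that may contain digits, ' or -.
        for word in re.findall(r"[a-z][a-z0-9'-]*", text.lower()):
            if word not in known:
                counts[word] += 1
    # most_common() yields words from most to least frequent.
    return [word for word, count in counts.most_common() if count >= min_count]


def tune_vocabulary(transcripts, vocabulary, min_count=2):
    """Return the expanded vocabulary and the list of terms that were added."""
    new_terms = find_new_terms(transcripts, vocabulary, min_count)
    return set(vocabulary) | set(new_terms), new_terms
```

The frequency threshold is a design choice meant to keep one-off transcription errors out of the vocabulary; a real system would presumably apply far more sophisticated filtering.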

A bot for redactions was another advancement that Verint introduced in 2023. Accurate redactions help ensure that personally identifiable information and other sensitive data aren’t disclosed while the underlying information is analyzed, Ziv explains.

“Speech includes some very unique data, but also some very sensitive data. We don’t want to expose that information to everyone who uses the [AI or analytics] tool,” he says.
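As a rough illustration of what redaction involves, the Python sketch below masks a few common patterns in a transcript before it reaches analysts. It is not Verint's redaction bot: the patterns, labels, and the `redact` function are assumptions made for the example, and pattern matching alone would miss sensitive data that is spoken as words (for instance, a card number read aloud digit by digit and transcribed as text).

```python
import re

# Hypothetical patterns for illustration; real redaction needs far broader coverage.
PII_PATTERNS = [
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    # 13 to 16 digits, optionally separated by single spaces or hyphens.
    ("CARD", re.compile(r"\b(?:\d[ -]?){12,15}\d\b")),
]


def redact(text):
    """Replace each matched pattern with a bracketed label such as [CARD]."""
    for label, pattern in PII_PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text
```

Replacing sensitive spans with labels rather than deleting them keeps the transcript readable, so the surrounding conversation can still be analyzed.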

Other companies also released AI-based enhancements to their transcription engines.

NICE’s ElevateAI, launched at the very end of 2022, gained quick traction in 2023, according to Cooper. The technology works across audio, transcripts, and chats to build smart CX applications. Through APIs, organizations of all sizes can understand the voice of their customers with automated speech recognition (ASR) technology trained on billions of interactions from the world’s leading CX dataset.

In mid-2023, Genesys unveiled expanded generative AI capabilities for experience orchestration, designed to help organizations get deeper customer and operational insights using LLMs.

Also key to analytics success is call summarization, another area where AI is already being used.

Once a customer call is completed, the agent needs to summarize the conversation for analytics, compliance, and other processes. Along with being time-consuming, particularly for lengthy calls, manual summarization is prone to errors and inconsistency, according to Ziv. “Agents don’t like this part of their job.”

Automated call summarization driven by AI bots and LLMs alleviates this burden, is more reliable, and is fast enough that the summarization can go from one agent to another in the event of a call transfer. This provides the next agent with all of the details already collected so that the caller doesn’t have to repeat everything.

Ziv points out that providing good, fast, automated summarization of calls was “a nearly impossible task” before the advent of LLMs.

“There are still issues; it’s not a trivial thing to do this securely and effectively and to put it into the workflow,” Ziv admits. But generative AI’s accuracy and speed have made good, automated summarizations a reality.

The Year Ahead

Speech analytics will continue to benefit from the ongoing evolution of generative AI, Ziv and others say. “The power of the system is that it understands questions and can respond meaningfully. That’s what speech analytics does. You ask your analytics team what is impacting our cost structure, sentiment, conversion rates, etc.”

With traditional techniques, the analytics team would have to go through call transcripts or listen to calls themselves, build categories, then analyze findings. This process could take days or weeks, while generative AI automates everything and provides immediate responses, Ziv says.

But the technology needs to be monitored. If trained on incorrect information, generative AI will provide incorrect responses. If its knowledge base doesn’t include the answer, generative AI can start hallucinating, sometimes providing wildly incorrect responses.

“The challenge is that large language models themselves don’t have the answers in the model,” Ziv says. “They don’t know why people are calling because they have never seen your calls.”

So companies such as Verint use their vast amounts of data to train the LLMs, with a security layer to help ensure that data is safe. Using that internal data rather than data from the internet—which isn’t reliable—will be the big breakthrough in 2024, according to Ziv. “There are still some kinks to figure out, especially around the security and the availability of the data.”

Ziv also expects speech technology providers and users to develop and launch more AI-driven bots this year. Another advancement he expects is better integration of AI and speech across all channels to give a more comprehensive 360-degree view of customer interactions and analysis of those interactions.

Speech analytics results will continue to be delivered faster in 2024 as machine learning continues to enhance performance and users become more familiar with the technology, according to Maurice Kroon, founder and CEO of VoxAI.

“It’s going to be much more real-time than what we have now,” Kroon explains. “That opens up a lot of possibilities. There will be a lot of different use cases, like getting immediate, direct feedback.”

Better speed also means quicker insights from sentiment analysis. Though sentiment analysis has been around for a while, AI will provide more granularity, Kroon adds.

Another noteworthy development has been the emergence of open-source frameworks for emotion detection using voice analytics, which in some cases could recognize more than 60 types of emotions from voice alone. “This is both impressive and somewhat surprising; I didn’t even know there were so many,” Kroon says.

The improved real-time sentiment analysis is important, according to Kroon. “You can say something in a number of different ways. If we do not catch the nuances, there is a disconnect,” he says. “But due to improved analytics, there will be less friction, because we’ll get to know the nuance of what is actually meant because you can analyze it better and provide a better response.”

AI will also provide more comprehensive insights for organizations in 2024, Kroon predicts. “You have real-time analysis of not just a single conversation, but you also have a real-time picture of what is going on in your entire call center.”

Kroon also expects more start-ups to begin offering AI-driven speech technology solutions.

Genesys’ Weigl expects to see current versions of AI-driven analytics for agent assist evolve into full copilots that provide more comprehensive information for agents than the current agent assist technologies. They will not only include the customer behavior and triggers (for upsells, etc.) of today’s technology, but the generative AI will also be able to pull together and analyze customer journey information for predictive modeling.

If a customer calls for a refund, for example, AI-driven speech analytics will be able to identify that purpose and guide the agent through the things that sequentially tend to happen, Weigl explains.

Copilot technology will also give agents immediate or near-immediate access to data and reports relevant to caller queries, according to Weigl.

AI’s infusion into speech analytics really began to take hold in 2018 or so, and today one would be hard-pressed to find a speech analytics technology vendor that does not offer some variation of AI embedded into its solutions.

Following its acquisition of Nuance Communications in March 2022 for nearly $20 billion, Microsoft hopes to use AI to make significant inroads in the speech analytics technology market in 2024. Central to that mission was its introduction late last year of the Azure OpenAI Studio’s Chat playground, powered by Azure AI Speech. The playground is designed to enhance the chat interaction experience with advanced multimodal input and output.

Azure AI Studio enables users to try out Microsoft’s speech analytics capabilities, including summarization, redaction of personally identifiable information, post-call analytics, and agent assist.

But this is only the beginning of what speech analytics with AI will be able to do.

Speech analytics tools like keyword spotting and smart transcription enable users to analyze calls, including the words used during calls. And this is not just in the call center. Through automated call scoring and conversion logs, users can determine the ROI of their marketing and sales efforts, and tools that can understand those conversations will only grow in importance.

Phillip Britt is a freelance writer based in the Chicago area. He can be reached at spenterprises1@comcast.net.
