Extracting User Data from Speech Applications

Data makes the world go round. Whether it’s the trail you leave while browsing the web or shopping with your credit card, hardly anyone escapes the reality of our data-driven economy. Siri and Alexa are no exception to this rule. Voice search and personal assistants, along with the growing popularity of chatbots, have led to the realization that a lot of potentially valuable information likely sits within the massive amounts of voice data now stored in the cloud and on-site at a growing number of companies.

The data can be used both proactively (to triage and direct consumer inquiries, for instance) and retroactively (to analyze the data to glean consumer insights).

According to Search Engine Land, voice search, which first emerged in 2008, represents 20% to 25% of current searches. A ComScore analysis predicts voice search will rise to 50% by 2020, as reported by Business 2 Community. Smart speaker use is also on the rise: about 13% of U.S. households currently own a smart speaker, with adoption predicted to grow to 55% by 2022.

The Movement to Voice (Smartphone, AI, Chatbots)

Aamer Ghaffar is the founder of Kordinator, a speech technology firm that uses artificial intelligence (AI) and advanced neural networks to triage requests and queries related to employee benefits for legal, finance, and HR departments. The software contains ingrained sensitivity sensors, which can determine a voice’s urgency, anger, or dissatisfaction and escalate such calls to human agents. It’s a firm focused on boosting customer service through predictive technology and analysis. All interactions—whether human or automated—are collected and analyzed to continually improve the algorithms. It’s the same kind of process that’s occurring in many settings, with new applications continuing to emerge. The platform integrates with Salesforce and is used by companies such as Safety and Jamba Juice.

High-touch services like employee benefits, health benefits, and pension and retirement benefits also involve significant repetition—as high as 30% to 35%, says Ghaffar. “Where there’s repetition, that’s a huge opportunity for automation to complement the involvement of humans,” he says.

Harnessing voice data analytics holds multiple possibilities for companies, some yet to be imagined. The ability to mine more precise and reliable insights from massive amounts of data means that companies can better understand the customers they serve and identify trends and potential opportunities from these insights.

The Emergence of Voice Analytics

Voice analytics solutions can tell all sorts of things about you: age, gender, whether you’re angry or not. Some emerging apps can even use voice data to diagnose illnesses. Some cable companies use voice analytics to determine, for instance, whether customers are serious about canceling their subscriptions, which can help agents decide how to treat customers according to voice cues.

How does it work?

Richard Stevenson is the CEO of Red Box, a speech technology firm that has been in business for 30 years; the company, based in the United Kingdom, has about 3,000 customers globally—primarily representing police, Coast Guard, Secret Service, and other emergency services clients. Up until about 18 months ago, Stevenson says, the reasons for capturing data were centered on compliance, QA, training, and related uses.

About two years ago, he says, “we started seeing more general business leaders from different sectors within large organizations starting to take notice of speech, voice data, and technology. Consumerization has been a big driver.” That includes people experiencing the internet via voice within their homes through devices like Amazon’s Echo and Google Home. Both of these companies, he says, “are really trying to extract the value of voice.”

Both transcribed and non-transcribed data are being extracted from voice recordings, says Stevenson. Non-transcribed data relates to “things like intonation, trying to analyze intent, or trying to analyze emotion,” he says.

“We have written...natural language processing models that are domain specific,” says Ghaffar—for instance, domains like employee benefit services and financial health. The technology is used primarily by midmarket insurance companies, he says, that are concerned with providing better service and higher satisfaction to employees.

The technology not only provides organizations with the ability to streamline operations to gain efficiencies and reduce costs but also yields data that can help them better understand customer needs.

How Voice Data Is Extracted

Voice and audio data can be captured in formats like WAV, MP3, and WMA and translated into a computer-readable format. In a blog post for Analytics Vidhya, titled “Getting Started with Audio Data Analysis Using Deep Learning,” Faizan Shaikh gives some examples of how that data is already being put to practical use:

Indexing music collections according to their audio features.
Recommending music for radio channels.
Similarity search for audio files (e.g., Shazam).
Speech processing and synthesis—generating artificial voice for conversational agents.
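As a rough illustration of what translating a WAV recording into a computer-readable format can look like, the Python sketch below decodes 16-bit mono PCM audio into a list of numbers and computes a simple loudness measure. It uses only the standard library. The helper names (make_test_tone, wav_to_samples, rms) and the synthetic test tone are assumptions made for this example; they do not come from Shaikh's post or from any vendor quoted here.

```python
import io
import math
import struct
import wave


def make_test_tone(freq_hz=440.0, seconds=0.1, rate=8000):
    """Build a small 16-bit mono WAV file in memory (a stand-in for a real recording)."""
    n = int(seconds * rate)
    frames = b"".join(
        struct.pack("<h", int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * i / rate)))
        for i in range(n)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 2 bytes per sample = 16-bit
        w.setframerate(rate)
        w.writeframes(frames)
    return buf.getvalue()


def wav_to_samples(wav_bytes):
    """Decode 16-bit mono PCM WAV bytes into (sample_rate, samples scaled to [-1, 1))."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        if w.getsampwidth() != 2 or w.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        rate = w.getframerate()
        raw = w.readframes(w.getnframes())
    count = len(raw) // 2
    ints = struct.unpack("<%dh" % count, raw)
    return rate, [s / 32768.0 for s in ints]


def rms(samples):
    """Root-mean-square level, a crude proxy for how loud the audio is."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


rate, samples = wav_to_samples(make_test_tone())
print(rate, len(samples), round(rms(samples), 3))
```

Once audio is in this numeric form, it can be fed to feature extraction or a machine learning model. Compressed formats such as MP3 and WMA would need a separate decoding step that this sketch does not cover.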

Specific types of data can also be extracted from large data sets based on lists (or dictionaries) and patterns. Currently, though, says Stevenson, much of that data is housed in silos.
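To make the idea of list-based and pattern-based extraction concrete, here is a minimal Python sketch that runs over a call transcript that has already been converted to text. The term list, the regular expressions, and the function name extract_from_transcript are all hypothetical, chosen only for illustration; a production system would use far richer dictionaries and models.

```python
import re

# Hypothetical dictionary of phrases that may signal intent to cancel.
CANCEL_TERMS = {"cancel", "terminate", "close my account"}

# Hypothetical patterns for structured details mentioned in a call.
PATTERNS = {
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "amount": re.compile(r"\$\d+(?:\.\d{2})?"),
}


def extract_from_transcript(transcript):
    """Return dictionary terms and pattern matches found in a transcript."""
    text = transcript.lower()
    # Dictionary lookup: whole-word (or whole-phrase) matches only.
    terms = sorted(
        t for t in CANCEL_TERMS
        if re.search(r"\b" + re.escape(t) + r"\b", text)
    )
    # Pattern lookup: collect every match for each named pattern.
    found = {name: pat.findall(transcript) for name, pat in PATTERNS.items()}
    return {"terms": terms, **found}


call = "I want to cancel my plan. Call me at 555-010-1234 about the $49.99 charge."
print(extract_from_transcript(call))
```

A result like this could feed the kind of call routing or agent prompting described in this article, for example by flagging calls in which a cancellation term appears.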

“Consolidation of, and sovereignty over, voice data is top of mind for organizations that are recognizing the power that voice data extraction may hold,” he says. “As they start to consider what they can use voice data for, they want to get it into one application.” Typically, that data may have been housed in vertical silos—one for call centers, one for mobile, one for back offices, etc. While the cloud has been serving as the repository of much consumer data—including voice data—Stevenson suspects that this may begin to change as concerns over sovereignty and the protection of data continue to grow.

Call centers are one obvious use case for analytics because the ability to extract data and make inferences from it can help to better direct calls or provide call center staff with prompts or tips on how to best engage callers (to identify a caller who is serious about canceling a service, for instance). Trading floors are another, says Stevenson. There, he says, opportunities center primarily on productivity. For instance, rather than having to write call notes, traders simply speak into their phones and the call gets transcribed into the call record. In addition, calls can be analyzed to determine how conversations are changing as market events occur, who traders are talking to, and how frequently those conversations take place.


The Voice Data Gap and How to Close It

Companies still aren't doing enough to capture voice data, but advances in transcription, AI, and machine learning could change things

27 May 2019
