Five Must-have Speech Recognition Capabilities For the Modern Contact Center

Contact center solutions continue to gain in popularity as companies focus on improving customer relationship management and overall customer experience. At the heart of these solutions is automatic speech recognition (ASR), also called speech-to-text (STT) technology, which provides the data that the analytics platform uses to uncover sales, marketing, and operational insights. Because of this key capability, many contact center solution providers are embedding ASR technologies into their offerings.

The latest advances in speech-to-text technology enable analytics software to deliver the most accurate and complete customer insights. To do so, robust ASR solutions must be able to do the following:

Capture All Call Data

When you consider that a contact center might have 10,000 or more call-center representatives, each with an average call time of about 7 minutes, that’s a lot of information being recorded on an ongoing basis. Being able to easily retrieve and analyze all of this data is critical to contact centers – providing them with insights into customer behavior and perspectives, as well as agent practices and effectiveness. Unfortunately, until recently it has been costly and impractical for contact centers to derive actionable insights from all of the data available in call recordings.

This limitation drastically reduces the effectiveness of analytics software, which depends on access to large volumes of data to produce the best results. Advanced ASR technology, however, can cost-effectively transcribe 100% of contact center calls. Advances in AI and improvements in computing power (especially graphics processing units, or GPUs) enable high-speed processing of massive quantities of data with an order of magnitude less hardware than was required just a few years ago. Providing access to all customer data is a game-changer for analytics solutions, giving contact centers newfound, valuable insight into their data.

Provide Comprehensive Language Support

While support for multiple languages is important in today’s global marketplace, it’s actually more important that ASR software is able to adapt to multiple dialects and accents within the same language. In fact, STT adaptation capabilities need to go even further in order to effectively address the specific needs of contact centers. Since jargon, acronyms, and terminology vary from industry to industry, ASR solutions need to be able to provide industry-specific language models, enabling the software to understand and accurately transcribe industry-specific language. Moreover, as each individual company has its own internal terms that need to be transcribed accurately – such as product and company names – ASR solutions need to be flexible enough to adapt to each unique implementation.

Conduct Sentiment and Emotion Analysis, as Well as Gender Identification

As improving the customer experience becomes a key priority, it’s important for contact centers to be able to identify caller sentiment and emotion. Advanced STT technology uses acoustic analysis – based on an AI methodology known as “deep learning” – to analyze the caller’s tone, pitch, volume, intensity, rate of speech, and other characteristics to identify customer emotion. This capability goes beyond simply identifying callers showing obvious signs of distress, such as shouting. Subtle combinations of acoustic features that convey frustration or disappointment can be identified using modern machine learning methods.

Simultaneously, the ASR system uses another AI model to conduct linguistic analysis on text produced from speech to identify the sentiment being expressed. The sentiment analysis sub-system is trained on text segments labeled as being positive or negative. It then uses what it has learned from these examples to identify novel phrases as being positive or negative. As with emotion detection, sentiment analysis can pick up on subtle cues in people's word choices as well as more obvious cases, such as “I want to cancel my account,” or “I am upset” to indicate a negative sentiment or “This was very helpful” to indicate positive sentiment.
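The train-on-labeled-examples idea described above can be illustrated with a very small sketch. It uses a Naive Bayes word-count classifier, which is far simpler than the AI models a production ASR system would use, and the function names (tokenize, train, classify) and the four training phrases are assumptions chosen for illustration only.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Lowercase and strip simple punctuation; real systems use richer tokenizers.
    words = (w.strip(".,!?\"'").lower() for w in text.split())
    return [w for w in words if w]

def train(examples):
    """examples: list of (text, label) pairs, e.g. label 'positive' or 'negative'."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    label_counts = Counter()            # label -> number of training segments
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return word_counts, label_counts, vocab

def classify(model, text):
    """Return the label with the highest Naive Bayes log-probability."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        n_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total)
        for w in tokenize(text):
            if w in vocab:  # ignore words never seen in training
                # Add-one smoothing so unseen word/label pairs do not zero out.
                score += math.log((word_counts[label][w] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical labeled text segments (illustrative, not real call data).
model = train([
    ("I want to cancel my account", "negative"),
    ("I am upset", "negative"),
    ("This was very helpful", "positive"),
    ("Thank you, that was great", "positive"),
])
print(classify(model, "That was helpful"))
print(classify(model, "I want to cancel"))
```

With only four training phrases this toy model generalizes poorly; the point is simply that labels are learned from examples and then applied to phrases the system has not seen before.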

Emotion and sentiment results can be combined into a single score and tracked over the course of a call to identify whether the customer is becoming more or less happy as the call progresses, which can be used as an indicator to the call agent in real-time or used for agent scoring. These scores also aid with the identification of call recordings that are good examples to include in agent training.
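As a rough sketch of how such a combined score might be tracked, the example below blends per-segment emotion and sentiment values into one number and compares the second half of the call with the first. The score range of -1 to 1, the equal weighting, and the names Segment, combined_score, and call_trend are all assumptions made for illustration; an actual product may combine the signals quite differently.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    emotion: float    # acoustic emotion score, assumed to lie in [-1, 1]
    sentiment: float  # text sentiment score, assumed to lie in [-1, 1]

def combined_score(seg, emotion_weight=0.5):
    # Weighted blend of the two signals; the weight is a tunable assumption.
    return emotion_weight * seg.emotion + (1 - emotion_weight) * seg.sentiment

def call_trend(segments, emotion_weight=0.5):
    """Mean combined score of the second half of the call minus the first half.

    A positive value suggests the customer became happier as the call progressed.
    """
    scores = [combined_score(s, emotion_weight) for s in segments]
    if len(scores) < 2:
        return 0.0
    mid = len(scores) // 2
    first_half = sum(scores[:mid]) / mid
    second_half = sum(scores[mid:]) / (len(scores) - mid)
    return second_half - first_half

# Hypothetical call that starts badly and ends well.
call = [
    Segment(-0.8, -0.6),
    Segment(-0.4, -0.2),
    Segment(0.2, 0.4),
    Segment(0.6, 0.8),
]
print(round(call_trend(call), 2))
```

The same per-segment scores could be surfaced to the agent during the call, while the end-of-call trend is the kind of summary value that might feed agent scoring.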

In some situations, such as when a contact center conducts customer surveys or wants to differentiate between different speakers in a household, it may be important to identify the gender of the speaker. A robust ASR system will include the ability to identify the most likely gender of the speaker using voice characteristics.

Redact Sensitive Data

In today’s highly regulated business environment, contact centers must protect customer privacy and data security. Beyond redacting personally sensitive information (such as social security numbers) from all calls, contact centers must comply with specific regulations governing data protection and privacy. For example, the Payment Card Industry Data Security Standard (PCI DSS) requires organizations to secure any payment card data they store. The Health Insurance Portability and Accountability Act (HIPAA) regulates the handling of a customer’s personal health information, including how it is stored. For individuals in the EU, the General Data Protection Regulation (GDPR) specifies how customer data may be processed and stored to protect privacy. Customer phone numbers, credit card numbers, and other personally identifiable information (PII) provided during the call must be protected and kept private.

The most effective way to remain compliant with these regulations is to automatically redact sensitive information during transcription. In order for an ASR system to reliably identify sensitive words, there must be a high level of confidence in the accuracy of the words being transcribed, so that private information is not inadvertently missed during redaction. It is also vital that sensitive data be removed from both stored audio and transcripts. Redaction provides protection from both external and internal threats by completely and permanently deleting sensitive information from call data.
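A minimal sketch of transcript redaction is shown below, assuming the ASR output renders numbers as digits. The two patterns (a U.S. social security number format and a 13- to 16-digit card number) and the function name redact are illustrative assumptions, not a complete or compliant rule set, and the sketch covers text only; removing the matching span from the stored audio would additionally require word-level timestamps from the transcription.

```python
import re

# Illustrative patterns only; a real deployment needs a much broader rule set.
PATTERNS = {
    # U.S. social security number written as 123-45-6789
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # 13 to 16 digits, optionally separated by single spaces or hyphens
    "CARD": re.compile(r"\b(?:\d[ -]?){12,15}\d\b"),
}

def redact(transcript):
    """Replace every match of each pattern with a labeled placeholder."""
    for label, pattern in PATTERNS.items():
        transcript = pattern.sub(f"[{label} REDACTED]", transcript)
    return transcript

text = "My social is 123-45-6789 and my card is 4111 1111 1111 1111."
print(redact(text))
```

Pattern matching of this kind only works when the digits were transcribed correctly in the first place, which is why transcription accuracy matters so much for redaction.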

Integrate With Other Solutions in the Contact Center Ecosystem

A technology ecosystem is quickly becoming established in the contact center industry. Interactive Voice Response (IVR) systems, ASR, SMS text messaging, data visualization systems, call recording systems, analytics software, and CRM systems all need to be able to work together seamlessly. To play a key role in this new environment, ASR systems need an open API architecture, so the overall contact center solution can scale and grow without interoperability problems and customization costs. This also means being vendor-agnostic, so the ASR solution can integrate with any recording or analytics platform. Additionally, the solution should be deployable both in the cloud and on-premises to adapt to customer needs.

The market is moving toward more sophisticated and integrated contact center solutions – a broader contact center ecosystem. ASR technology plays a key role in this environment and must be easy to integrate with other contact center components. In addition, several other advanced ASR capabilities are must-haves for helping analytics platforms perform more accurately and effectively. By enabling contact center solutions to capture all call data, provide comprehensive language support, identify sentiment and emotion, and redact sensitive information, ASR technology will unlock data insights that will be a game-changer for analytics solutions and contact center end-users alike.
