Five Must-Have Speech Recognition Capabilities for the Modern Contact Center


Contact center solutions continue to gain popularity as companies focus on improving customer relationship management and the overall customer experience. At the heart of these solutions is automatic speech recognition (ASR), also called speech-to-text (STT) technology, which produces the transcripts that analytics platforms mine for sales, marketing, and operational insights. Because ASR is so central, many contact center solution providers now embed ASR technology in their offerings.

Recent advances in speech-to-text technology let analytics software deliver more accurate and complete customer insights. To support this, a robust ASR solution must be able to do the following:

Capture All Call Data 

A large contact center might have 10,000 or more agents, each handling calls that average about 7 minutes, so it records an enormous volume of information on an ongoing basis. Being able to easily retrieve and analyze all of this data is critical: it gives contact centers insight into customer behavior and perspectives, as well as agent practices and effectiveness. Until recently, however, it was costly and impractical for contact centers to derive actionable insights from all of the data in their call recordings.

This limitation drastically reduces the effectiveness of analytics software, which needs large volumes of data to produce the best results. Advanced ASR technology, however, can now cost-effectively transcribe 100% of contact center calls. Advances in AI and in computing power (especially graphics processing units, or GPUs) enable high-speed processing of massive quantities of data with an order of magnitude less hardware than was required just a few years ago. Access to all customer data is a game-changer for analytics solutions, giving contact centers new and valuable insight into their operations.
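To put that scale in numbers, a back-of-the-envelope estimate of daily recorded audio is easy to sketch. The agent count and average call length are the article’s figures; calls_per_agent is a hypothetical assumption (40 calls per agent per day) chosen only for illustration.

```python
def daily_audio_hours(agents: int, calls_per_agent: int, avg_call_minutes: float) -> float:
    """Hours of recorded audio produced per day across all agents."""
    total_minutes = agents * calls_per_agent * avg_call_minutes
    return total_minutes / 60

# 10,000 agents and ~7-minute calls come from the article;
# 40 calls per agent per day is a hypothetical assumption.
hours = daily_audio_hours(agents=10_000, calls_per_agent=40, avg_call_minutes=7)
print(f"{hours:,.0f} hours of audio per day")
```

Under that assumption the result is 2.8 million minutes, or roughly 46,667 hours, of new audio every day.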

Provide Comprehensive Language Support

While support for multiple languages is important in today’s global marketplace, it is even more important that ASR software can adapt to the many dialects and accents within a single language. STT adaptation also needs to go further still to address the specific needs of contact centers. Because jargon, acronyms, and terminology vary from industry to industry, ASR solutions need industry-specific language models that let the software understand and accurately transcribe each industry’s vocabulary. And because every company has its own internal terms that must be transcribed accurately, such as product and company names, ASR solutions must be flexible enough to adapt to each individual deployment.
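Vendors implement this kind of adaptation in different ways, often by biasing the language model itself. As a minimal illustration of the idea, the sketch below uses one simple technique, n-best rescoring with a phrase boost: the recognizer is assumed to return several candidate transcripts with log-probability scores, and each occurrence of a company-specific term adds a fixed bonus. The product name “Acme ProLink”, the scores, and the boost value are all hypothetical.

```python
import re
from typing import List, Tuple

def rescore_nbest(nbest: List[Tuple[str, float]],
                  custom_terms: List[str],
                  boost: float = 2.0) -> str:
    """Pick the best hypothesis after boosting company-specific terms.

    nbest: (hypothesis text, log-probability score) pairs from a recognizer.
    custom_terms: product names, acronyms, or jargon to favor.
    boost: bonus added per matched term (a hypothetical tuning value).
    """
    # Whole-word, case-insensitive match for each custom term.
    patterns = [re.compile(r"\b" + re.escape(t) + r"\b", re.IGNORECASE)
                for t in custom_terms]

    def boosted_score(item: Tuple[str, float]) -> float:
        text, score = item
        matches = sum(len(p.findall(text)) for p in patterns)
        return score + boost * matches

    return max(nbest, key=boosted_score)[0]

# Hypothetical recognizer output for one utterance.
candidates = [
    ("I want to upgrade to acme pro link", -4.1),
    ("I want to upgrade to Acme ProLink", -4.6),
]
print(rescore_nbest(candidates, ["Acme ProLink"]))
```

Without the custom term the first candidate wins on raw score; with it, the boost (-4.6 + 2.0 = -2.6) lifts the correctly spelled product name to the top.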

Conduct Sentiment and Emotion Analysis, as Well as Gender Identification 

As improving the customer experience becomes a key priority, contact centers need to be able to identify caller sentiment and emotion. Advanced STT technology uses acoustic analysis, based on the AI methodology known as “deep learning,” to analyze the caller’s tone, pitch, volume, intensity, rate of speech, and other characteristics in order to identify emotion. This goes beyond flagging callers who show obvious signs of distress, such as shouting: modern machine learning methods can also detect subtle combinations of acoustic features that convey frustration or disappointment.
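Production systems learn emotion cues with deep neural networks, but those models start from measurable acoustic properties like the ones listed above. A minimal sketch of computing two of them from one frame of audio, using only the standard library: RMS energy as a proxy for volume, and an autocorrelation-based pitch estimate. The 8 kHz sample rate (typical of telephone audio), the 75 to 400 Hz pitch search range, and the synthetic test tone are assumptions; this is feature extraction only, not an emotion classifier.

```python
import math
from typing import List

def rms_energy(frame: List[float]) -> float:
    """Root-mean-square amplitude of the frame: a simple loudness proxy."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def estimate_pitch_hz(frame: List[float], sample_rate: int,
                      fmin: float = 75.0, fmax: float = 400.0) -> float:
    """Estimate fundamental frequency as the lag with peak autocorrelation.

    Only lags corresponding to pitches between fmin and fmax are searched.
    Returns 0.0 if no positive correlation is found (e.g. silence).
    """
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_corr = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        corr = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag if best_lag else 0.0

# Hypothetical test signal: a 50 ms frame of a 200 Hz tone sampled at 8 kHz.
SAMPLE_RATE = 8000
frame = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(400)]
print(round(rms_energy(frame), 3), round(estimate_pitch_hz(frame, SAMPLE_RATE), 1))
```

For the test tone this prints 0.707 200.0. A real system would compute such features frame by frame across the call and feed the resulting sequences to a trained model.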

In parallel with the acoustic analysis, the ASR system uses a second AI model to perform linguistic analysis on the transcribed text and identify the sentiment being expressed. This sentiment analysis subsystem is trained on text segments labeled as positive or negative, and it uses what it learns from those examples to classify new phrases as positive or negative. As with emotion detection, sentiment analysis can pick up on subtle cues in people’s word choices as well as obvious cases: “I want to cancel my account” or “I am upset” indicates negative sentiment, while “This was very helpful” indicates positive sentiment.
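The train-then-generalize process described above can be illustrated with a multinomial naive Bayes classifier, a classic and much simpler stand-in for the deep models production systems use. The tiny training set reuses the article’s example phrases plus a few invented ones; it is far too small for real use and only shows the mechanics.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

def tokenize(text: str) -> List[str]:
    """Lowercase whitespace-separated words and strip simple punctuation."""
    words = (w.strip(".,!?;:\"'").lower() for w in text.split())
    return [w for w in words if w]

class NaiveBayesSentiment:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.word_counts: Dict[str, Counter] = {}
        self.doc_counts: Counter = Counter()
        self.vocab: set = set()

    def train(self, examples: List[Tuple[str, str]]) -> None:
        for text, label in examples:
            tokens = tokenize(text)
            self.doc_counts[label] += 1
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.vocab.update(tokens)

    def predict(self, text: str) -> str:
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            # log P(label) + sum of log P(word | label); smoothing keeps unseen words from zeroing out
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for tok in tokenize(text):
                score += math.log((counts[tok] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)

model = NaiveBayesSentiment()
model.train([
    ("I want to cancel my account", "negative"),  # from the article
    ("I am upset", "negative"),                    # from the article
    ("This was very helpful", "positive"),         # from the article
    ("This is the third time I have called and nothing works", "negative"),  # invented
    ("Your service is terrible", "negative"),      # invented
    ("Thank you so much, great service", "positive"),  # invented
    ("I am happy with the fix", "positive"),       # invented
    ("That solved my problem, thanks", "positive"),    # invented
])
print(model.predict("I am really upset, I want to cancel"))  # phrase not in training data
print(model.predict("very helpful, thanks"))
```

The two novel phrases are classified as negative and positive respectively, because their words occur more often in the matching class of training examples.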

Emotion and sentiment results can be combined into a single score and tracked over the course of a call to show whether the customer is becoming more or less satisfied as the call progresses. That score can serve as a real-time indicator for the agent or feed into agent scoring. It also helps identify call recordings that make good examples for agent training.
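The article doesn’t specify how the two signals are merged; one simple possibility is a weighted average per call segment, with the trend measured as the least-squares slope across segments (a positive slope means the caller is becoming happier). This sketch assumes both scores are already normalized to the range -1 (negative) to +1 (positive); the equal weighting and the example scores are hypothetical.

```python
from statistics import mean
from typing import List

def combined_scores(emotion: List[float], sentiment: List[float],
                    emotion_weight: float = 0.5) -> List[float]:
    """Blend per-segment emotion and sentiment scores (each in [-1, 1])."""
    if len(emotion) != len(sentiment):
        raise ValueError("need one emotion and one sentiment score per segment")
    return [emotion_weight * e + (1 - emotion_weight) * s
            for e, s in zip(emotion, sentiment)]

def trend(scores: List[float]) -> float:
    """Least-squares slope of score versus segment index (change per segment)."""
    n = len(scores)
    if n < 2:
        return 0.0
    x_mean, y_mean = (n - 1) / 2, mean(scores)
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(scores))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den

# Hypothetical scores for four consecutive segments of one call.
emotion = [-0.8, -0.4, 0.0, 0.6]
sentiment = [-0.6, -0.2, 0.2, 0.8]
scores = combined_scores(emotion, sentiment)
print([round(s, 2) for s in scores], round(trend(scores), 2))
```

Here the combined score climbs from -0.7 to 0.7, a slope of about 0.46 per segment, so the caller is ending the call happier than they started. An agent-assist display could, for example, raise an alert when the latest score or the slope drops below a chosen threshold.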

In some situations, such as when a contact center conducts customer surveys or needs to distinguish between speakers in the same household, it may be important to identify the speaker’s gender. A robust ASR system will include the ability to estimate the speaker’s most likely gender from voice characteristics.

Redact Sensitive Data

In today’s highly regulated business environment, contact centers must protect customer privacy and data security. Beyond redacting sensitive personal information (such as Social Security numbers) from all calls, contact centers must comply with specific data protection and privacy regulations. For example, the Payment Card Industry Data Security Standard (PCI DSS) requires organizations to secure payment card data, including any card data they store. The Health Insurance Portability and Accountability Act (HIPAA) regulates how protected health information is handled, including how it is stored. For individuals in the EU, the General Data Protection Regulation (GDPR) specifies how personal data may be processed and stored to protect privacy. Customer phone numbers, credit card numbers, and other personally identifiable information (PII) provided during a call must be protected and kept private.

The most effective way to remain compliant with these regulations is to redact sensitive information automatically during transcription. For an ASR system to identify sensitive words reliably, it must transcribe them with a high level of confidence in their accuracy; otherwise, private information can be missed during redaction. It is also vital that sensitive data be removed from both the stored audio and the transcripts. Redaction protects against both external and internal threats by completely and permanently deleting sensitive information from call data.
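A minimal sketch of redaction during transcription, assuming the recognizer emits word-level timestamps and confidence scores and has already converted spoken digits to numerals (e.g. “4111”). Runs of digits are redacted if they pass the Luhn checksum used by payment card numbers (13 to 19 digits) or are exactly 9 digits long, a deliberately broad match for U.S. Social Security numbers. The matching time spans are returned so the same segments can be silenced in the stored audio, and low-confidence words that were left unredacted are flagged for review, since a misrecognized digit could otherwise slip through. The Word structure and the 0.85 confidence threshold are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

Span = Tuple[float, float]  # (start, end) in seconds

@dataclass
class Word:
    text: str          # recognizer token; digits assumed already normalized ("4111")
    start: float       # start time in seconds
    end: float         # end time in seconds
    confidence: float  # recognizer confidence, 0.0 to 1.0

NUMERIC = re.compile(r"[0-9]+")

def luhn_valid(digits: str) -> bool:
    """Luhn checksum, used to validate payment card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0

def redact(words: List[Word], min_confidence: float = 0.85) -> Tuple[str, List[Span], List[Word]]:
    """Return (redacted transcript, audio spans to mute, unredacted words needing review)."""
    text_out: List[str] = []
    mute_spans: List[Span] = []
    review: List[Word] = []
    i = 0
    while i < len(words):
        j = i
        while j < len(words) and NUMERIC.fullmatch(words[j].text):
            j += 1                        # extend a run of consecutive numeric tokens
        run = words[i:max(j, i + 1)]      # the numeric run, or a single ordinary word
        digits = "".join(w.text for w in run) if j > i else ""
        is_card = 13 <= len(digits) <= 19 and luhn_valid(digits)
        if is_card or len(digits) == 9:
            text_out.append("[REDACTED]")
            mute_spans.append((run[0].start, run[-1].end))
        else:
            text_out.extend(w.text for w in run)
            review.extend(w for w in run if w.confidence < min_confidence)
        i += len(run)
    return " ".join(text_out), mute_spans, review

# Hypothetical recognizer output; 4111 1111 1111 1111 is a standard test card number.
call = [
    Word("my", 0.0, 0.2, 0.98), Word("card", 0.2, 0.5, 0.97), Word("is", 0.5, 0.6, 0.99),
    Word("4111", 0.6, 1.2, 0.95), Word("1111", 1.2, 1.8, 0.93),
    Word("1111", 1.8, 2.4, 0.96), Word("1111", 2.4, 3.0, 0.94),
    Word("order", 3.0, 3.4, 0.62), Word("12345", 3.4, 4.0, 0.91),
]
transcript, spans, flagged = redact(call)
print(transcript)                 # my card is [REDACTED] order 12345
print(spans)                      # [(0.6, 3.0)]
print([w.text for w in flagged])  # ['order']
```

A production system would need far broader coverage (spoken digit strings, expiry dates, names, addresses); the sketch only shows how transcript and audio redaction can share one set of word timings.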

Integrate With Other Solutions in the Contact Center Ecosystem

A technology ecosystem is quickly taking shape in the contact center industry. Interactive Voice Response (IVR) systems, ASR, SMS messaging, call recording systems, data visualization tools, analytics software, and CRM systems all need to work together seamlessly. To play a key role in this environment, ASR systems need an open API architecture, so the overall contact center solution can scale and grow without interoperability problems or customization costs. They also need to be vendor-agnostic, so the ASR solution can integrate with any recording or analytics platform. Finally, the solution should be deployable both in the cloud and on-premises to meet each customer’s needs.
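One common way to stay vendor-agnostic is to have each component depend on a small interface rather than on a specific product. This Python sketch uses typing.Protocol to define hypothetical Transcriber and AnalyticsSink interfaces; any ASR engine or analytics platform wrapped in an adapter with matching methods can be swapped in without changing the pipeline. The interface names, method signatures, and in-memory stand-ins are illustrative assumptions, not any vendor’s API.

```python
from typing import Dict, List, Protocol

class Transcriber(Protocol):
    """Anything that turns call audio into text (cloud or on-premises engine)."""
    def transcribe(self, audio: bytes) -> str: ...

class AnalyticsSink(Protocol):
    """Anything that accepts finished transcripts (analytics, CRM, visualization)."""
    def ingest(self, call_id: str, transcript: str) -> None: ...

def process_call(call_id: str, audio: bytes,
                 asr: Transcriber, sinks: List[AnalyticsSink]) -> str:
    """Transcribe one recorded call and fan the transcript out to every sink."""
    transcript = asr.transcribe(audio)
    for sink in sinks:
        sink.ingest(call_id, transcript)
    return transcript

# Hypothetical in-memory stand-ins for real vendor adapters.
class FakeTranscriber:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")  # pretend the "audio" is already text

class InMemorySink:
    def __init__(self) -> None:
        self.records: Dict[str, str] = {}

    def ingest(self, call_id: str, transcript: str) -> None:
        self.records[call_id] = transcript

crm, dashboard = InMemorySink(), InMemorySink()
process_call("call-001", b"I want to cancel my account", FakeTranscriber(), [crm, dashboard])
print(crm.records, dashboard.records)
```

Because Protocol uses structural typing, the adapters do not need to inherit from the interfaces; a static type checker such as mypy verifies that their method signatures match.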

The market is moving toward more sophisticated and integrated contact center solutions, forming a broader contact center ecosystem. ASR technology plays a key role in this environment and must be easy to integrate with other contact center components. Beyond integration, the other must-have ASR capabilities described here help analytics platforms perform more accurately and effectively. By enabling contact center solutions to capture all call data, provide comprehensive language support, identify sentiment and emotion, and more, ASR technology will unlock data insights that are a game-changer for analytics solutions and contact center end users alike.
