The 2019 State of Speech Engines

A Look Ahead

While vendors have been extending their proprietary systems, interest in open-source solutions is on the rise. Mozilla sponsored two speech recognition projects, Project DeepSpeech and Project Common Voice. Through Common Voice, the group released a voice data set with contributions from nearly 20,000 individuals globally. DeepSpeech includes prebuilt packages for Python and NodeJS, plus a command-line binary, that developers can use right away to experiment with speech recognition.

The movement helps the industry in a few ways, according to Speechmatics’ Firth. Open-source sharing of ideas improves product quality as companies either use the systems themselves or replicate the best parts. Also, open-source systems enable students and hobbyists to gain experience and possibly entry into the field, which gives companies a wider pool of candidates to recruit from and raises the quality of speech development work.

Sentiment analysis is another area of emphasis. “A lot of interesting work is being done in areas like examining vocal pitches and emotion detection, so businesses can respond to their customers more emphatically,” notes Rakesh Tailor, director of product management for speech at Genesys.

The goal is to have the system understand a person’s mood and respond appropriately. This capability could be especially useful in the contact center. In some cases, a customer may prefer to talk to a person rather than a machine and become frustrated when having to wade through a series of voice prompts to get where she wants to go. Sentiment analysis solutions can identify consumers who are ready to blow their tops, so the firm can take steps to correct the situation.

However, many of the early attempts at sentiment analysis have been a bit crude. Vendors have been trying to understand how to parse word usage and identify words or phrases that illustrate an individual’s mood. The challenge can be daunting. “Words have different meanings in different contexts and from different individuals, so determining a person’s mood is quite complex,” says Stephen E. Arnold, analyst with ArnoldIT.com and producer of DarkCyber, a video blog.
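To make the difficulty concrete, here is a minimal sketch of the word-spotting approach, written in Python. It is not any vendor's method; the word lists and the `mood_score` function are invented for illustration. The second call shows how a sarcastic caller can fool a scorer that ignores context.

```python
import re

# Hypothetical word lists, invented for this illustration only.
NEGATIVE = {"frustrated", "angry", "terrible", "cancel", "ridiculous"}
POSITIVE = {"great", "thanks", "helpful", "perfect", "love"}


def mood_score(utterance: str) -> int:
    """Return the count of positive words minus the count of negative words."""
    words = re.findall(r"[a-z']+", utterance.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives


# A plainly unhappy caller scores negative, as expected.
print(mood_score("This is ridiculous, I want to cancel"))  # -2

# A sarcastic caller scores positive, because the words are judged without context.
print(mood_score("Oh great, another menu. Perfect."))  # 2
```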

Consequently, vendors are looking at ways to connect different solutions and build more rounded views of the customer. Vendors are pairing speech with other solutions, like speech analytics, social media, and facial recognition, to respond more effectively to customers. Suppliers typically start with speech analytics applications that are equipped with artificial intelligence and machine learning, so the applications can draw inferences from the data they collect.

Another area with potential is pairing facial recognition with speech recognition, according to Arnold. Correlating two inputs has promise because each channel can be used as a check on the other. However, the work is just beginning, and tangible, consistent results are lacking.
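The idea of one channel checking the other can be sketched in a few lines of Python. This is an assumption-laden illustration, not a description of a shipping product: it supposes that a speech model and a facial model each produce a mood score between -1 (upset) and 1 (pleased), and the `combined_mood` function and its `tolerance` parameter are hypothetical.

```python
from typing import Optional


def combined_mood(
    speech_score: float, face_score: float, tolerance: float = 0.5
) -> Optional[float]:
    """Average two mood scores in [-1, 1] when they roughly agree.

    Returns None when the channels differ by more than `tolerance`,
    signaling that neither reading should be trusted on its own.
    """
    if abs(speech_score - face_score) > tolerance:
        return None
    return (speech_score + face_score) / 2


# Both channels suggest an unhappy customer, so the readings reinforce each other.
print(combined_mood(-0.75, -0.5))  # -0.625

# The channels conflict, so the result is withheld rather than guessed.
print(combined_mood(0.5, -0.5))  # None
```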

Speech engines continue to work their way into more applications as suppliers continue their quest to improve the systems’ accuracy. Now those suppliers are extending their systems to open source and sentiment analysis, changes that promise to further widen their usage base.

Paul Korzeniowski is a freelance writer who specializes in technology issues. He has been covering speech technology issues for more than two decades, is based in Sudbury, Mass., and can be reached at paulkorzen@aol.com or on Twitter @PaulKorzeniowski.
