The 2019 State of Speech Engines

“Fine-tuning” may be a good way to describe the speech engine market today. Suppliers such as Acapela Group, Amazon, Google, Nuance, and Speechmatics have built solid foundations, based on natural language processing, that recognize spoken words with a high degree of accuracy. The vendors continue to refine their algorithms so that more words are recognized, expand their systems to support more languages, tailor their solutions to vertical applications, and give customers more ways to customize these solutions to improve their businesses.

The Year in Review

For speech engines, accuracy is a never-ending quest. “Accuracy is still a key driver for nearly all markets, but accuracy does not mean just WER [word error rate],” says Ian Firth, vice president of products at Speechmatics. “Each industry has its specific factors that influence accuracy. For example, it can mean punctuation, speaker detection, and also ‘real world’ accuracy, where noise and over-talk are issues, [which are] not found in pure, clean, and ‘fake’ scenarios.”
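
WER is conventionally computed as (S + D + I) / N, where S, D, and I are the word substitutions, deletions, and insertions needed to turn an engine's output into a reference transcript of N words. The Python sketch below is only an illustration of that standard formula, not any vendor's scoring method; the function name `word_error_rate` is hypothetical. It compares whitespace-separated words exactly, with no normalization of case or punctuation, and it measures nothing about speaker detection or over-talk, which is the gap Firth points to.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Illustrative WER: (S + D + I) / N via word-level edit distance.

    Hypothetical helper; tokens are compared exactly after splitting
    on whitespace, with no case or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # dp[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or exact match)
            )

    # Total edits divided by the number of reference words
    return dp[len(ref)][len(hyp)] / len(ref)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sat on mat")` returns 1/6 (about 0.167): one deleted word against a six-word reference. Because insertions are counted, WER can exceed 1.0 when an engine outputs many extra words.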

In response, vendors have taken steps to expand the vocabularies their systems recognize. In April, Speechmatics launched a Custom Dictionary feature, which allows businesses to add any word to the list of words available for recognition. The company claims that the new feature increases system accuracy by more than 10%.

Customization is another area of emphasis. While speech engines can be used in almost any market and for almost any application, customer service has received particular attention. These solutions enable companies to shift work from human agents to automated systems and reduce their operating costs. “In the long term, the use of speech engines, chatbots, and virtual agents will lead to increased automation in the contact center,” says Ian Jacobs, a principal analyst at Forrester Research. “In some instances, the bulk of the simpler interactions will be handled by machines, and staff will be assigned to more complex problems.”

In March, Nuance Communications enhanced the voice engine used in the Nuance Omni-Channel Customer Engagement Platform. The Zoe voice provides natural-sounding text-to-speech output that can be customized to fit an organization’s brand. Customers can choose from 53 languages and 119 voice options, 17 of which are multilingual.

Acapela Group offers MyOwnVoice, which enables businesses and individuals to tailor its speech engine. The solution initially focused on making systems more accessible to individuals with health problems, but it has recently been extended to other areas. Interest is growing among transportation companies. “Transportation companies make a lot of public announcements and want to do so in a consistent manner,” says Rémy Cadic, vice president of sales at Acapela Group. Robotics companies are also using voice to make their products easier to use, and automobile manufacturers are integrating voice into their GPS systems so that drivers are alerted to potential problems such as traffic backups and accidents. Education, financial services, and smart toys are other burgeoning markets.

In September, Speechmatics introduced Sounds Feature, which offers transcript personalization. The feature is aimed at media and broadcast companies and produces transcripts that help users understand differences in how words are pronounced. The company also partnered with Red Bee Media, part of Ericsson. Together, Speechmatics’ automatic speech recognition (ASR) technology and Red Bee Media’s Subito Live subtitling platform provide same-language subtitling for social media and other unregulated channels, making that content accessible to people who are deaf or hard of hearing.
