Speech Engines: Improving Accuracy and Finding New Uses
Speech Systems Gain Intelligence
Increasingly, vendors have been relying on Big Data to collect information and on analytics to make sense of their vast collections of spoken words. As data volumes grow, fine-tuning these systems has become more challenging. Rather than rely on humans to do the work, suppliers have been turning to artificial intelligence to improve system accuracy.
Machine learning is also becoming more common. Vendors are dabbling in deep learning and building neural networks, which are systems of interconnected computing nodes loosely modeled on biological neural networks. Rather than being programmed for each task, these systems learn from examples and progressively improve their own performance.
So what new applications become possible with such technical improvements? More variety is on the horizon. To date, speech recognition products have had a horizontal focus and provided generic, adult-sounding, often robotic speech responses. As the technology advances, customization in various forms is becoming possible.
In the enterprise, systems designed for specific industries are taking shape. “The push to vertical markets is certainly something we continue to see,” says Robert Weideman, executive vice president and general manager of Nuance’s Enterprise Division.
Healthcare has become a major area of emphasis. Nuance’s Dragon Medical One is designed to speed up medical data entry, allowing doctors and nurses to create notes and annotations for patient records by speaking.
In addition, the company developed a radiology application based on its Nuance PowerScribe 360, Nuance mPower, and the Nuance PowerShare Network. The product enables healthcare providers to collaborate and create diagnostic reports based on medical images. The vendor also integrated its PowerScribe Workflow Orchestration software with lung cancer screening software from Primordial to streamline doctors’ diagnoses of cancer patients.
But healthcare is just one vertical; vendors are rolling out new products for a slew of other industries. Dragon Law Enforcement replaces typing with voice entry whenever law enforcement professionals enter daily report information, check license plates, or examine arrest records. Law offices are another area of emphasis. With Dragon Legal, individuals create, edit, and format case files, contracts, and briefs by voice.
In the consumer market, meanwhile, more user interfaces are emerging. Acapela Group has been developing tools for visually impaired individuals, and its My-Own-Voice program enables people who are about to lose their voice to disease to preserve it for the future.
Emotion detection is another area of emphasis. “Microsoft and IBM have been doing interesting work with emotions and trying to find ways to interact with customers more effectively,” Dahl says.
Here, companies want to understand how individuals are feeling so they can better support them. This capability could be especially helpful in the contact center, where individuals often become frustrated working through various prompts to get help. Systems today can identify, in real time or near real time, consumers who are ready to blow their tops, so companies can quickly take steps to placate them.
Technology constraints have long ruled the speech engine. Recent developments, such as cloud computing, are making it easier for vendors to customize their services, opening the technology up to new markets.