Speech Technology Magazine


Nuance Launches the Nuance Transcription Engine

Nuance Transcription Engine can transcribe a minute of audio into text in six seconds.
By Tye Pemberton - Posted May 9, 2016

Nuance Communications has launched the Nuance Transcription Engine (NTE), an audio-to-text tool that the company says can be used in a wide range of practical applications from software development and intelligent assistance to real-time translation.

Nuance claims that the solution can achieve 88 percent accuracy and a processing speed (real-time factor, or RTF) on pre-recorded audio that ranges from one times real time at maximum accuracy to an upper limit of 10 times real time, which is fast enough to render a text transcript of a full minute of audio in six seconds.
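The six-second figure follows from simple arithmetic. The sketch below assumes that "10 times" means audio is processed ten times faster than it plays back; the function name is illustrative and is not part of NTE.

```python
def processing_seconds(audio_seconds: float, speed_multiple: float) -> float:
    """Time needed to transcribe audio at a given multiple of real time.

    speed_multiple is an assumed reading of the article's "times" figure:
    1 means as fast as playback, 10 means ten times faster than playback.
    """
    if speed_multiple <= 0:
        raise ValueError("speed_multiple must be positive")
    return audio_seconds / speed_multiple


print(processing_seconds(60, 10))  # 6.0 (one minute of audio at 10 times real time)
print(processing_seconds(60, 1))   # 60.0 (one minute of audio at maximum accuracy)
```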

"The recent market demand for big data and for voice of the customer data has encouraged Nuance to bring NTE to the market to satisfy the need," says Greg Pal, vice president of marketing, strategy, business development, and enterprise at Nuance. "Companies have recognized that they have large amounts of unstructured data that could be used to their benefit."

To that end, NTE can output in two formats. The first is designed to generate human-readable transcripts in a standard presentation. The second is more technically oriented and is optimized for analytics use cases, producing n-best lists for ambiguous audio as well as word-level time stamps to make each transcript data-rich and maximally searchable.
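To illustrate why word-level time stamps and alternative hypotheses make a transcript searchable, here is a minimal sketch. The record layout is hypothetical and does not represent NTE's actual output format; in particular, it simplifies n-best lists to per-word alternatives, and the names `Word` and `find_word` are invented for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Word:
    """One recognized word (hypothetical layout, not NTE's real schema)."""
    text: str                 # top hypothesis
    start: float              # seconds from the start of the audio
    end: float
    alternatives: List[str] = field(default_factory=list)  # other hypotheses


def find_word(words: List[Word], query: str) -> List[float]:
    """Return start times of words whose top hypothesis or alternatives match."""
    q = query.lower()
    return [
        w.start
        for w in words
        if q == w.text.lower() or q in (a.lower() for a in w.alternatives)
    ]


transcript = [
    Word("please", 0.00, 0.42),
    Word("cancel", 0.42, 0.95, alternatives=["council"]),
    Word("my", 0.95, 1.10),
    Word("order", 1.10, 1.60),
]

print(find_word(transcript, "council"))  # [0.42]
```

Because the alternatives are searched alongside the top hypothesis, a query can still locate a spot in the audio where the recognizer was unsure which word it heard.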

NTE is also equipped with multi-speaker recognition and can assign transcript elements to specific speakers (even in monaural formats, where the audio carries little positional data), and it flags elements like noise and silence. This, in combination with the engine's speed, also allows applications to transcribe and stream live audio with only a 10-second delay.

According to Pal, early use cases for the NTE include transcribing large volumes of audio content within call centers to provide rich customer insights and improve service; transcribing corporate audio and video assets, such as Webcasts, Webinars, and interviews, for rapid search and indexing; and serving as the key enabling technology for speech analytics or for insights pertaining to compliance. Veritone, a cloud-based tech company that specializes in cognitive and analytic tools, is currently using the NTE to convert streaming and recorded media into rich data sources for analysis.

Pal identifies early verticals for the NTE as broadcast media and voice of the customer data, wherein call center audio could be mined for actionable items.

NTE can be deployed on-premises or in a hosted data center and is designed to scale to meet the demands of extremely high-volume use cases.

When asked about best practices to ensure the highest success for the NTE, Pal replied, "Like any speech recognition technology, the accuracy and quality of NTE's transcription is tied to the quality of the audio that is ingested by the engine, so we work hand-in-hand with our strategic partners to explore optimal audio acquisition techniques."

Pal also noted that while the NTE is a technology license designed for implementation by client engineers, Nuance offers professional services to assist clients as needed.

Currently, the NTE supports 15 languages and 30 dialects, including U.S., U.K., and Canadian English, Japanese, Cantonese, Hebrew, and both the World and Gulf dialects of Arabic.

Nuance plans to increase the engine's speed, accuracy, and language capabilities in the near future.
