Nuance Advances Text-to-Speech Technology through Deep Learning

Nuance Communications, Inc. announced that it has advanced its text-to-speech (TTS) technology with deep neural networks (DNNs) to deliver a new standard of quality, reducing errors by 40% compared to previous speech synthesis techniques. Nuance’s Vocalizer suite of TTS solutions – including Vocalizer Embedded for embedded platforms, Vocalizer Server for cloud applications and the Vocalizer Studio development tool – combines advancements in deep learning with knowledge-based developments to enable speech output that is nearly indistinguishable from human speech. This enriches user experiences across automotive, enterprise, healthcare, IoT, and smart home offerings and results in more intuitive and conversational interaction between people and machines. The application of artificial intelligence (AI) techniques gives Vocalizer the ability to quickly learn new words, phrases, and pronunciations, and to communicate with more expression and personality across more than 50 languages.

Nuance’s approach to using deep neural networks for speech synthesis works as follows. First, the networks learn the relationship between written text and the corresponding voice characteristics from Nuance’s vast speech data. Then, the system applies this knowledge to the words and phrases in unseen text. In addition to learning the relationship between the orthographic representation of words and the acoustic output, Nuance’s deep neural nets also use the context of each utterance to ensure that words are spoken in the expressive manner appropriate for the application, with the proper pattern of stress and intonation. For example, street names and driving directions sound clearly intelligible and articulated, whereas dialogs with a virtual assistant sound more fluent and dynamic.
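
The learn-then-apply pattern described above can be illustrated with a toy sketch. This is not Nuance's system: the feature names, the made-up pitch targets, and the tiny one-hidden-layer network are all assumptions chosen only to show a model being fitted on labeled examples and then queried on an input it has not seen, with a "style" feature standing in for utterance context.

```python
import math
import random

# Hypothetical toy data, NOT real speech data. Each input is
# (stressed_syllable, phrase_final, navigation_style); each target is an
# invented relative pitch value used purely for illustration.
TRAIN = [
    ((1, 0, 0), 0.7),
    ((0, 0, 0), 0.4),
    ((0, 1, 0), 0.2),
    ((1, 0, 1), 0.9),
    ((0, 0, 1), 0.5),
    ((1, 1, 1), 0.6),
]


def init_params(n_in, n_hidden, rng):
    """Create small random weights for a one-hidden-layer network."""
    return {
        "w1": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(params, x):
    """Return (hidden activations, scalar output) for one input vector."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(params["w1"], params["b1"])
    ]
    out = sum(w * h for w, h in zip(params["w2"], hidden)) + params["b2"]
    return hidden, out


def mean_squared_error(params, data):
    return sum((forward(params, x)[1] - y) ** 2 for x, y in data) / len(data)


def train(params, data, epochs=2000, lr=0.05):
    """Phase 1: fit the network with plain stochastic gradient descent."""
    for _ in range(epochs):
        for x, y in data:
            hidden, out = forward(params, x)
            err = out - y  # gradient of 0.5 * (out - y)^2 w.r.t. out
            for j, h in enumerate(hidden):
                # Backpropagate through tanh before updating w2[j].
                grad_h = err * params["w2"][j] * (1.0 - h * h)
                params["w2"][j] -= lr * err * h
                params["b1"][j] -= lr * grad_h
                for i, xi in enumerate(x):
                    params["w1"][j][i] -= lr * grad_h * xi
            params["b2"] -= lr * err
    return params


if __name__ == "__main__":
    rng = random.Random(0)
    params = init_params(n_in=3, n_hidden=4, rng=rng)
    print("error before training:", mean_squared_error(params, TRAIN))
    train(params, TRAIN)
    print("error after training: ", mean_squared_error(params, TRAIN))
    # Phase 2: apply the learned mapping to a combination absent from TRAIN.
    unseen = (0, 1, 1)
    print("predicted pitch for unseen input:", forward(params, unseen)[1])
```

A production synthesizer would work from far richer linguistic features and predict full acoustic parameters, but the two phases, fitting on paired text and speech data and then predicting for new text in context, follow the same outline.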

Key applications of Nuance Vocalizer include:

  • Automotive in-dashboard systems and virtual assistants
  • Robotics and autonomous virtual agents
  • Digital television and set-top boxes
  • Omni-channel customer engagement services

Nuance’s enhanced text-to-speech solutions are available for the cloud today and will be made available for embedded devices this year.
