Speech Technology Magazine

Speech Synthesis With Emotion

By Leonard Klie - Posted Jan 5, 2011

>>>>> Business Problem:
Basic text-to-speech applications often sound robotic and lack the variations in tone,
pitch, speed, etc., that occur during normal conversations.

>>>>> Technology Solution: Speech Synthesis With Emotion


Product: Nuance Vocalizer 5
Delivery Method: Installed software.
Features/Functionality: Vocalizer 5 supports more than 40 languages and can either play audio from a library of static prompt recordings or generate dynamic prompts. A proper-name dictionary, which is updated periodically, ensures correct pronunciation of names and addresses. The Vocalizer Studio suite of tools allows users to create text-processing rules and modify pronunciations. Vocalizer Studio also includes PromptSculptor, an interface for customizing the intonation and expressivity of speech output.
Business Benefits: With Nuance Vocalizer, callers hear a seamless flow of speech, free from undesirable artifacts, such as the clicks and gaps that surface when separate systems are used for playback of recordings and computer-generated speech. Enhanced speech quality and accuracy are provided by organized text processing, more comprehensive pronunciation dictionaries, and complete voice refreshes.
Contact: Nuance Communications at 1-781-565-5000; http://www.nuance.com.


Product: MARY Open-Source Emotional Text-to-Speech Synthesis System
Delivery Method: Installed software.
Features/Functionality: MARY TTS is a multilingual (German, U.S. and U.K. English, Turkish, and Tibetan), multiplatform (Windows, Linux, Mac OS X, and Solaris) speech synthesis system written in Java. With the EmoSpeak tool, MARY can synthesize emotionally expressive speech using diphone voices. Users can also select expressive unit selection voices, such as a German soccer announcer. Using the Speech Synthesis Markup Language (SSML) <prosody> tag, MARY can control the intonation generated for HMM-based voices from markup and change the shape of the intonation curve. MARY/XML also supports a new <vocalization> tag, with which users can request the generation of nonverbal or paraverbal expressions, such as “yeah,” “m-hmm,” laughter, and sighs.
Business Benefits: MARY can read and interpret several markup languages, including SSML and Agent Player Markup Language (APML). Several audio formats can be generated, including 16-bit WAV, AIFF, AU, and MP3. Each request is processed in a thread of its own, which allows the server to process multiple requests in parallel.
Contact: DFKI at +49-0-631-205-750; mary.dfki.de.
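
For readers unfamiliar with SSML, the <prosody> markup mentioned in the MARY entry can be illustrated with a minimal sketch. It uses only Python's standard library to build an SSML document; it does not talk to a MARY server. The function name build_prosody_ssml is hypothetical, and the attribute values are illustrative. The attribute names (pitch, rate, contour) come from the W3C SSML specification; which values a given MARY voice honors is not stated in this article and should be checked against the MARY documentation.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C SSML 1.0 specification.
SSML_NS = "http://www.w3.org/2001/10/synthesis"
# Standard XML namespace, used for the xml:lang attribute.
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_prosody_ssml(text, lang="en-US", **prosody_attrs):
    """Hypothetical helper: wrap `text` in an SSML <prosody> element.

    `prosody_attrs` are passed through as attributes of <prosody>,
    for example pitch="+10%" or rate="slow".
    """
    # Serialize SSML elements without a namespace prefix.
    ET.register_namespace("", SSML_NS)
    speak = ET.Element(
        f"{{{SSML_NS}}}speak",
        {"version": "1.0", f"{{{XML_NS}}}lang": lang},
    )
    prosody = ET.SubElement(speak, f"{{{SSML_NS}}}prosody", prosody_attrs)
    prosody.text = text
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    # Illustrative values only: a raised pitch, a faster rate, and a
    # contour that rises and then falls over the utterance.
    print(build_prosody_ssml(
        "What a goal!",
        pitch="+10%",
        rate="fast",
        contour="(0%,+20Hz) (50%,+40Hz) (100%,+10Hz)",
    ))
```

The contour attribute is the part that corresponds to changing "the shape of the intonation curve": each pair gives a position in the utterance and a pitch target at that position.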


Product: Loquendo TTS
Delivery Method: Installed or embedded software.
Features/Functionality: The tone, pitch, speaking rate, and volume of each of Loquendo TTS’s voices can be fine-tuned and fully controlled within the Loquendo TTS Director or Loquendo TTS Voice Experience environments. Voices have been enriched with expressive cues that allow for highly emotional pronunciation. These cues, which can be typed directly into the text with the appropriate punctuation or selected from a drop-down menu, include conventional figures of speech, such as greetings and exclamations (hello, oh no, thank you), interjections (oh, well, hmm), and paralinguistic events (breathing, laughing, coughing), to convey additional layers of expressive intent, such as gratitude, doubt, or confirmation. A Pronunciation Lexicon ensures that specialized vocabulary, abbreviations, acronyms, and even regional pronunciation differences sound as the developer intended. Loquendo supports 30 languages with a total of 72 voices.
Business Benefits: Loquendo TTS is able to read any kind of dynamic data and prompts in server-based telephony, multimedia, multimodal, and embedded applications.
Contact: Loquendo at +39-011-291-3111; http://www.loquendo.com.
