TTS Finds Global Media

With the advent of multilanguage text-to-speech (TTS) that is attractive to the ear, the world is beginning to embrace the technology as a viable language solution. From television to Web sites, interactive voice response to ATMs, corporate videos to games and toys, the world is rushing toward a new destiny that will be multilingual, enabled through TTS and automated translation.

Within the next five years, at least 50,000 major Internet channels will be running 24/7, as well as hundreds of thousands of corporate channels and rentals. Information and Web media that previously were slow, text-heavy, and clunky will become fast, minimalistic, and chock-full of action and movement.

With this change in communications will come shorter content that is designed for minuscule attention spans. Longer content will be split into smaller chunks and delivered by TTS or as flying text over images. And everyone in the world will be crawling over every Web site and other media expecting to see and hear content in their choice of languages. The pressure is on for developers of TTS and automated translation to perform miracles to reach that goal.

Users will expect personalization in any language. “With TTS we are bringing human life and personality to the channels—television, mobile phones with Web interfaces, Web applications, and ATMs—all with human personalization,” says Umberto Basso, founder and CEO of H-Care, an Italian firm that develops TTS platforms for international media and Web applications. “It is all about personalizing the experience, yet being consistent with whatever the customer or prospect is doing, in multiple languages.”

Language Demands Drive Development

TTS is gearing up for the linguistic changes to world media. For example, Nuance Communications’ Vocalizer for Network has been rolled out in 24 languages and 34 voices. By the end of 2010, it will have grown to more than 40 languages and 50 voices. Loquendo offers a complete range of speech technologies, including TTS in 28 languages and 70 voices. Both produce results that astonish audiences of actors, directors, and producers, who gasp when they learn that this is today’s TTS. And the actors quake in their shoes. Now is the time for them to get on board, because the march toward TTS innovation is only just beginning, expanding quickly across Turkey, Israel, India, China, Poland, and other countries.

The European Union supports emerging speech and translation technologies by providing grants and funds to developers. This funding requires partnerships among multiple countries, and the entire EU database of dictionaries is available free of charge to any company striving to improve automated translation.

Pelle Nauclér of Nordisk Undertext in Sweden sees language and media sharing as inseparable. “Streaming is still in its infancy, and people are happy just to have streaming media,” he says. “But the demand for localized content—both voice and text—will grow fast, and discontent with monolingual content will increase until it materially affects revenue. In a few years, content that is not localized will be at the bottom of the search engines and the bottom of the number of views.”

As TTS becomes a key factor in translating video voices for product demos, corporate videos, and training multimedia, the words spoken with TTS will be a major boost to businesses, helping them rise above the global cyber clutter. PLYmedia’s Nimrod Kozlovski, who heads up strategy and legal affairs, and Yoni Silberberg, head of business development and cofounder, are among the pioneers tackling audio search optimization affected by TTS. “Our technologies extract certain keywords from the audio or text and take action on the content as part of smart optimization so that audio and video content is better processed and ingested by search engine optimization in various languages,” they said in a statement.

It will be vital for such technologies to function with each TTS result if a Web site is to be found in the upcoming noisier, more bloated World Wide Web.

TTS, Television, and the Movies

TTS is making fast headway with consumer exposure in Europe as it takes its first steps on the small screen. “One of our applications is an electronic television program guide with an avatar as the anchorman,” H-Care’s Basso says. “The avatar guides the viewer through the selection of movies, audio, and shows on the television screen. TTS reads the names of the shows, the actors’ names, and show schedules. Importantly, with TTS dynamic action we can pass the show information in advance through the TTS engine, then use tools like [Loquendo’s TTS] Director to assure that the show names will be properly spoken and tuned.”

Such tools are crucial to creating TTS that is appropriate for media, enabling different sounds, intonations, and paralinguistic events, and even changing the pronunciation and voice to better reflect the context and intent.
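
For readers curious what this kind of tuning can look like in practice, here is a minimal sketch in Python. It does not show Loquendo's Director or any vendor tool described above. It assumes instead that the target engine accepts SSML, the W3C speech synthesis markup, which this article does not mention. The function name show_to_ssml and its parameters are hypothetical, chosen only for illustration.

```python
from xml.sax.saxutils import escape, quoteattr


def show_to_ssml(title, spoken_as=None, lang="en-US", rate="medium"):
    """Wrap a show title in SSML (hypothetical helper, for illustration only).

    spoken_as, when given, is the form the engine should say aloud in place
    of the written title, expressed with the SSML <sub alias="..."> element.
    """
    text = escape(title)  # escape &, <, > so the markup stays well-formed
    if spoken_as:
        text = "<sub alias={}>{}</sub>".format(quoteattr(spoken_as), text)
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        "xml:lang={}><prosody rate={}>{}</prosody></speak>"
    ).format(quoteattr(lang), quoteattr(rate), text)


# Example: a written title that should be read aloud differently.
print(show_to_ssml("Dr. Who & Friends",
                   spoken_as="Doctor Who and Friends",
                   lang="en-GB", rate="slow"))
```

The resulting string would be sent to whichever engine is in use. How much of SSML a given product honors varies, so treat the markup as a starting point rather than a guarantee of how a name will be spoken.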

Perhaps the most explosive potential for TTS will be in the translation of movie and audio voices from one language to another. Today’s TTS developers are hustling, under enormous pressure, to discover how to extract from TTS the emotion and depth of performance that is so vital to media translation. “For media applications, in terms of the direction TTS is going, we are making great headway on the issues of quality, naturalness, and accuracy,” says Marc Fabiani, network TTS product manager at Nuance. “We will nail cloning a voice sooner than we will nail voice acting.”

According to Fabiani, the Nuance PromptSculptor, a tuning tool that comes with Vocalizer for Network, allows units from a speech database to be recombined to modify TTS. “Typical Vocalizer speech databases contain hundreds of thousands of units of speech, so there is a great deal of variety to draw upon, and you can nudge the output very similarly to the way you work with a human being,” he says. “Using this tool the TTS voice results from cloning will sound natural, like the French version of the same actor—for example, Jack Nicholson speaking in French.”

Creating a foreign voice that truly sounds like Jack Nicholson requires flexibility in pitch, stress, duration, emphasis, and expressivity. “Furthermore, in terms of inference, how can TTS know how to interpret a line of dialogue? There is not enough information in the text itself to embody interpretation. Advanced artificial intelligence is required,” Fabiani says. “Then to add complexity, how [do you] accomplish this from a different language? That is the real stretch because languages are not mapped closely.”

TTS for Business Media

Lack of passionate performance notwithstanding, TTS is ready—now—to facilitate many media needs. Global companies with worldwide branches, smaller companies trying to go global, e-learning, and Web games are all drained by the costs of producing international versions, often in dozens of languages, plus the need to internationalize immediately, not after weeks of delay. Most companies skip three-quarters of all languages for media translations, or drop translation altogether, due to the cost. Clearly, affordable TTS as an on-demand service represents a viable alternative to costly and continual recordings around the world.

TTS results can be e-retailed through various sources, like www.TranslateYourWorld.com and other localization centers. Promotion by these centers will help bridge the gap between the old image of TTS and the new, and they will be the avant-garde promoters of TTS for consumer implementation. Additionally, because TTS is computerized, it can be modified, upgraded, and improved as the original media script changes or is rewritten, bringing recurring revenue with each addition or modification.

Rosanna Duce, vice president of international sales at Loquendo, sees movement with each technological advance. “Today text-to-speech may still be used most heavily for IVR, weather forecasting, avatars, animations, online radio, and similar applications,” she says. “However, text-to-speech is preparing for media, from character roles to narration, and we’re waiting for the concept to catch on with content developers. The core question is not what are we doing now with text-to-speech, but, rather, what will be possible in the future?”

Perhaps it is time for the world at large to discover that TTS can help globalize companies—today. And TTS will improve like fine wine, growing better over time.

It is inevitable that people around the world will want to flip from voice to text—and text to voice—from one language to another, at the touch of a finger or spoken command. That wave is coming, if it’s not already here. Countries that embrace TTS and content localization will be first to the global finish line.

“Get used to the idea of text in and speech out because someday this will be Main Street. We are all moving our ship into the lock, and pumping up the water to the next level,” Nuance’s Fabiani says.

The global voyage for TTS is about to depart, and now is the time to jump on board before you miss the boat.

Sue Ellen Reager is CEO of @International Services, a global translation and localization firm. She can be reached at sueellen@internationalservices.com.
