Speech Solutions for Media Makers

For decades, the media industry has struggled with the high cost of translating media through voice redubbing, lip syncing, and other forms of human voice translation. For the most part, the vast amount of media produced each year is available in only one language. The content that is translated is generally limited to major motion pictures, and even then only into a few languages.

The term "media" covers a vast range of visual and audio material, from corporate promotional and training videos to cartoons, documentaries, dramas, business podcasts, Flash content, and live-streamed events. Translating a half hour of media costs from $6,000 per language for corporate content to $25,000 or more per language for television.

The challenge media creators face today is how to monetize distribution over the Internet. One of the strongest candidates is media globalization, in which users pay per view or by subscription to watch content in other languages. The speech industry holds the key to one of the main avenues of media globalization: enabling mass translation of content at dramatically lower cost.

Media Translation Through Speech Products

The first dramatic change speech products bring is the ability to generate subtitles automatically. When a narration track, stripped of music and effects, is run through speech recognition, subtitles in the media's original language can be created and stored automatically. These are then passed through machine translation software to produce a generally understandable rendering in another language. If desired, these rough translations can be refined in online editing interfaces.
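The pipeline described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's product. It assumes the speech recognizer has already returned timed cues; the `Cue` records below are hypothetical stand-ins for that output. `toy_translate` is a hypothetical dictionary lookup that stands in for a real machine translation service. The point is that translation replaces only the text of each cue, so the original timing carries over to every language. Each result can then be written out as a standard SRT subtitle file.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Cue:
    start: float  # seconds from the start of the media
    end: float
    text: str


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(cues: list[Cue]) -> str:
    """Serialize cues as numbered SRT blocks separated by blank lines."""
    blocks = [
        f"{i}\n{srt_time(c.start)} --> {srt_time(c.end)}\n{c.text}\n"
        for i, c in enumerate(cues, start=1)
    ]
    return "\n".join(blocks)


def translate_cues(cues: list[Cue], translate: Callable[[str, str], str],
                   target_lang: str) -> list[Cue]:
    """Translate each cue's text while keeping its original timing."""
    return [Cue(c.start, c.end, translate(c.text, target_lang)) for c in cues]


# Hypothetical recognizer output for a narration-only track.
asr_cues = [Cue(0.0, 2.5, "Welcome"), Cue(2.5, 4.0, "Thank you")]

# Hypothetical stand-in for a machine translation service.
GLOSSARY = {("Welcome", "es"): "Bienvenidos", ("Thank you", "es"): "Gracias"}


def toy_translate(text: str, lang: str) -> str:
    return GLOSSARY.get((text, lang), text)  # fall back to the source text


spanish = translate_cues(asr_cues, toy_translate, "es")
print(to_srt(spanish))
```

In practice, a call to a machine translation engine would replace the glossary. The SRT output could then be corrected in an online interface before publication.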

The major drawback of subtitles has been the need to read one or two lines of text displayed below the image, forcing viewers to spend 90 percent of their time reading the bottom of the screen. A new approach, speed-reading subtitles, may be the perfect companion for automated speech-to-text.

Speed-reading Subtitles

Speed-reading subtitles transform subtitles from two long lines fixed at the bottom of the screen into a real-time exercise in speed-reading. Words flash briefly onto the screen and then disappear immediately, so the viewer keeps looking directly at the main image rather than at the bottom of the screen. The words are read, felt, and understood more easily than with traditional reading. Because they are impressed directly upon the brain, they produce more emotion and reaction than conventionally read text.
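The article does not specify how speed-reading subtitles are timed. One plausible approach is sketched here purely as an assumption. A recognized subtitle cue is split into single-word flashes, and each word gets a share of the cue's duration proportional to its length, so longer words stay on screen slightly longer. The function name `flash_words` and the proportional rule are illustrative choices, not a published specification.

```python
def flash_words(start: float, end: float, text: str) -> list[tuple[float, float, str]]:
    """Split one subtitle cue into per-word flashes.

    Assumption: each word's display time is proportional to its character
    count, and consecutive flashes share boundaries so they tile [start, end].
    """
    words = text.split()
    if not words:
        return []
    total_chars = sum(len(w) for w in words)
    duration = end - start
    flashes = []
    chars_so_far = 0
    for w in words:
        w_start = start + duration * chars_so_far / total_chars
        chars_so_far += len(w)
        w_end = start + duration * chars_so_far / total_chars
        flashes.append((round(w_start, 3), round(w_end, 3), w))
    return flashes


for flash in flash_words(0.0, 2.0, "go see it"):
    print(flash)
# (0.0, 0.571, 'go')
# (0.571, 1.429, 'see')
# (1.429, 2.0, 'it')
```

A real player might also enforce a minimum display time per word. That refinement is left out here to keep the sketch short.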

The ideal subtitles for the Web are "overlays": text that the online video player superimposes on the picture at runtime. Overlays work on desktops, laptops, tablets, and smartphones. With overlays, a single media file can be offered in as many as 200 languages and dialects, with viewers choosing a language from a dropdown menu.
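The sketch below illustrates why overlays need only one copy of the media. The hypothetical `OverlayPlayer` class models a player that keeps one subtitle track per language and looks up the text to superimpose at playback time. The class, its methods, and the track format are assumptions made for illustration. Real Web players handle this with their own mechanisms, such as WebVTT text tracks attached to an HTML5 video. Switching languages changes only which track is consulted, never the video itself.

```python
import bisect
from typing import Optional


class OverlayPlayer:
    """Hypothetical model of a subtitle overlay player: one video, many tracks."""

    def __init__(self, tracks: dict[str, list[tuple[float, float, str]]],
                 default_lang: str):
        # tracks maps a language code to (start, end, text) cues sorted by start.
        self.tracks = tracks
        self.lang = default_lang

    def languages(self) -> list[str]:
        """Choices for the language dropdown menu."""
        return sorted(self.tracks)

    def select(self, lang: str) -> None:
        """Switch the overlay language; the video itself is untouched."""
        if lang not in self.tracks:
            raise KeyError(f"no subtitle track for {lang!r}")
        self.lang = lang

    def text_at(self, t: float) -> Optional[str]:
        """Return the subtitle text to superimpose at time t, if any."""
        cues = self.tracks[self.lang]
        starts = [c[0] for c in cues]
        i = bisect.bisect_right(starts, t) - 1  # last cue starting at or before t
        if i >= 0 and t < cues[i][1]:
            return cues[i][2]
        return None  # no subtitle showing at this moment


player = OverlayPlayer({
    "en": [(0.0, 2.0, "Hello"), (2.0, 4.0, "Goodbye")],
    "fr": [(0.0, 2.0, "Bonjour"), (2.0, 4.0, "Au revoir")],
}, default_lang="en")
print(player.languages())   # ['en', 'fr']
print(player.text_at(1.0))  # Hello
player.select("fr")
print(player.text_at(3.0))  # Au revoir
```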

Older Web subtitles were burned directly into the video, so offering 10 languages required 10 copies of the media. The cost to burn in subtitles varied but was generally around $750 per language per half hour of media. The overlay approach offers viewers a wider choice of languages, eliminates storage issues, and removes burn-in costs. Overlays can also be updated at any time and go live instantly, whereas correcting or changing burned-in subtitles costs at least $1,000 per half hour.
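A small calculator makes the burn-in arithmetic concrete. It uses the article's figures: roughly $750 per language per half hour to burn in subtitles, and at least $1,000 per half hour to change a burned-in copy. The article does not say how the correction fee applies, so this sketch assumes it is charged for each corrected language copy over its full length. Translation costs are left out because they apply to both approaches.

```python
BURN_IN_PER_LANG_PER_HALF_HOUR = 750  # USD, typical figure from the article
CORRECTION_PER_HALF_HOUR = 1_000      # USD, the article's lower bound


def burn_in_estimate(languages: int, half_hours: int,
                     corrected_copies: int = 0) -> tuple[int, int]:
    """Return (copies of the media to store, burn-in plus correction cost in USD).

    Assumption: each corrected copy is charged the correction fee for its full length.
    """
    copies = languages  # one burned-in copy per language
    burn_in = languages * half_hours * BURN_IN_PER_LANG_PER_HALF_HOUR
    corrections = corrected_copies * half_hours * CORRECTION_PER_HALF_HOUR
    return copies, burn_in + corrections


# A one-hour program (two half hours) in 10 languages:
print(burn_in_estimate(10, 2))                      # (10, 15000)
# The same program after fixing a mistake in 3 of the language copies:
print(burn_in_estimate(10, 2, corrected_copies=3))  # (10, 21000)
```

With overlays, the same program is stored once and incurs no burn-in fee, and corrections go live as soon as the subtitle text is edited.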

Combining speech recognition's ability to create original-language subtitles with automatic translation into speed-reading subtitles will let media creators take their content into dozens of languages at low cost. Speed-reading also makes subtitles a more viable option for advertising and marketing pieces.

Company Web sites will become living, breathing media over the next few years. Instead of clicking from one static page to the next, visitors will navigate by spoken word. Static text will give way to short, artfully presented bursts of information supported by voice, including text-to-speech (TTS). This shift will turn company Web sites into Internet media, ripe for new speech-to-text subtitle experiences designed to attract buyers for company products from around the world.

Sue Ellen Reager is CEO of @International Services, a language and software solutions company that performs translation, voice recording, and global system testing for speech and DTMF applications as well as media localization.