Speech Technology Magazine

 

Voice Translation Hits Its Stride

Cross-language communication is globalizing business and affecting the world.
By Sue Ellen Reager - Posted May 1, 2013

This year, the World Trade Center in Atlanta conducted its first online global seminar, in 78 languages. Air France held its first Web conference in 20 languages. The World Wildlife Fund organized its first Webinar subtitled in the major languages for participants. These are indicators that 2013 will be the year voice translation becomes an accepted part of the global landscape and affects the way the world does business.

Voice translation applications combine several speech products to produce a continuous flow of automated speech, subtitles, or both, translating what a speaker says. The result serves as an automated interpreter from one language to another, as well as a provider of automated captioning for the hearing impaired.

In recent years, many apps for cross-language communication have appeared. One glance at Google Translate (which now speaks and functions on mobile Apple and Android devices), plus focused apps such as WordLens, iSpeak Russian, Jibbigo, and TripLingo, shows the increase in popularity of spoken translation apps. With these, users speak in tweets (a limited number of characters) and then wait for the text-to-speech (TTS) audio conversion. These apps are the forefathers of today's cross-language communication software, which is only just hitting its stride. As of this year, major conferences, live conversations, and online presentations can become multilanguage, using new products like TYWI Live or Mobile Interpreter.

Feeling the Impact

How will voice translation affect us? Imagine bringing SpeechTEK online in dozens of languages and inviting the world to attend. In the past, the cost of globalizing such a conference could run $20,000--you would need multiple interpreters, interpretation booths, cables, microphones, channeled headphones for listeners, plus AV technicians, all to provide language conversion only for the people in the same room as the speaker. Budgeting for breakout rooms was beyond consideration.

But this year, interpretation arrives via the Web simultaneously in the same room and across the ocean. With these new technologies, the cost of language conversion drops to as little as 5 percent of the previous cost, and dozens more languages become available. Speeches and exhibitor demonstrations become multilanguage, transported live in real time around the world, extending the reach of the investment.

The Big Three

Why will voice translation hit its stride this year? Because our industry has reached a maturity that permits multilanguage speech applications with quality results. For decades the "quality results" were the stumbling block. But as of this year, our Big Three--speech recognition, automated translation, and text-to-speech--have become mature technologies on a global scale.
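As a rough illustration of how the Big Three fit together, the sketch below chains three stages in the order named above. Everything in it is hypothetical: the function names, the toy glossary, and the stand-in stages are assumptions made for the example, not the API of any product mentioned in this article.

```python
def make_voice_translator(recognize, translate, synthesize):
    """Chain the three stages: speech recognition, translation, text-to-speech."""
    def run(audio, source_lang, target_lang):
        text = recognize(audio, source_lang)                    # speech -> text
        translated = translate(text, source_lang, target_lang)  # text -> text
        return synthesize(translated, target_lang)              # text -> speech
    return run


# Toy stand-ins so the sketch runs without any real speech engine (assumptions).
def fake_recognize(audio, lang):
    # Pretend the audio bytes already hold the spoken words as UTF-8 text.
    return audio.decode("utf-8")


def fake_translate(text, source_lang, target_lang):
    # Word-by-word lookup in a tiny made-up glossary; unknown words pass through.
    glossary = {("en", "fr"): {"hello": "bonjour"}}
    words = glossary.get((source_lang, target_lang), {})
    return " ".join(words.get(word, word) for word in text.split())


def fake_synthesize(text, lang):
    # Pretend the synthesized audio is just the text encoded as bytes.
    return text.encode("utf-8")


translator = make_voice_translator(fake_recognize, fake_translate, fake_synthesize)
spoken_output = translator(b"hello world", "en", "fr")
```

In a real application each stand-in would be replaced by a call to an actual recognition, translation, or TTS engine; the point is only that the output of each stage feeds the next.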

All three technologies are required to function in many languages at a certain accuracy level to successfully convert one language to another. Until this year, at least one of the three was weak in each language, with weakness breeding inaccuracy and inaccuracy breeding unusability.
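One simple way to see why a single weak stage drags down the whole chain is to multiply the stage accuracies together. The sketch below does that under a loudly stated assumption: that errors in the three stages are independent and that every stage must succeed. The accuracy figures are invented for illustration and are not measurements from any product.

```python
from math import prod


def pipeline_accuracy(stage_accuracies):
    """Estimate end-to-end accuracy as the product of the stage accuracies.

    Assumes independent errors and that every stage must succeed
    (a simplifying assumption, not a measured model).
    """
    for accuracy in stage_accuracies:
        if not 0.0 <= accuracy <= 1.0:
            raise ValueError("each accuracy must be between 0 and 1")
    return prod(stage_accuracies)


# Hypothetical figures: recognition, translation, text-to-speech.
all_strong = pipeline_accuracy([0.95, 0.95, 0.95])
one_weak = pipeline_accuracy([0.95, 0.60, 0.95])
```

Under these made-up numbers, the chain with one weak stage ends up well below the chain in which all three stages are strong, which is the sense in which weakness in any one technology makes the whole application hard to use.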

Translation and speech recognition. The best text output for voice translation comes in blocks, phrases, or sentences, because automated translation is most successful when the text it converts contains a subject, verb, and object. This need for blocks results in short processing delays and increases the tug-of-war between linguists striving for accuracy and technologists striving for speed.
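The trade-off between blocks and speed can be sketched in a few lines. The hypothetical helper below buffers recognized words until it sees sentence-ending punctuation and only then hands a block on for translation. It assumes the recognizer supplies punctuation, which is an assumption of this example and not a claim about any particular engine.

```python
def blocks_from_words(words):
    """Group a stream of recognized words into sentence-sized blocks.

    Nothing is yielded until a sentence ends, which is the source of the
    short delay described above.
    """
    buffer = []
    for word in words:
        buffer.append(word)
        if word.endswith((".", "!", "?")):
            yield " ".join(buffer)
            buffer = []
    if buffer:
        # Flush whatever is left when the speaker stops mid-sentence.
        yield " ".join(buffer)


recognized = "Welcome to the seminar. Can everyone hear me? Let us begin".split()
blocks = list(blocks_from_words(recognized))
```

Shorter blocks would reach the listener sooner but give the translation stage less context, which is the tug-of-war between accuracy and speed.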

Automated translation. Even Google Translate is massively more mature than in the past. Add to that the dozens of other software options for professional automated translation and we enter a new life cycle of automated translation that can be trained and improved by users to align with their personal and business terminology.

Translation and text-to-speech. Our deepest gratitude goes to Apple's Siri, which somehow overcame the decades-long barrier to popular acceptance by attracting the consumer to our industry. Today's multilanguage text-to-speech technology has become accessible, and its output for voice translation applications ranges from understandable to high quality. The public accepts those TTS results, thanks to Siri's pioneering efforts.

At this time, voice translation is strange and surprising. It is not perfect, but it is communication. The world will become more comfortable with it, and each of us will begin to attract hundreds or thousands of new potential buyers around the world, capturing them in ways that were inconceivable until the Big Three reached maturity.


Sue Ellen Reager is CEO of @International Services, a language and software solutions company that also performs translation, voice recording, and global system testing for speech and DTMF applications as well as media localization.

