Speech Technology Magazine


WebRTC Enables Dual-Language Speech

Anticipating the powerful impact of cross-language talking.
By Sue Ellen Reager - Posted Nov 10, 2014

According to Webplatform.org, "The RTC in WebRTC stands for Real-Time Communications, technology that enables audio/video streaming and data sharing between browser clients (peers). As a set of standards, WebRTC provides any browser with the ability to share application data and perform teleconferencing peer to peer, without the need to install plug-ins or third-party software."

From a linguistic perspective, WebRTC is perhaps the most exciting change in communication in the last decade. Most of the international language community (basically, 3 billion people) does not yet know of the upcoming upheaval that will be caused by WebRTC. But its impact will be powerful, because WebRTC enables developers to create multilanguage speech applications that can be used from the same device by anyone in the world. This "translates" into applications that allow people in the same room to talk across languages using speech recognition and auto-translation.

The world has had an urgent need for across-language talking since the advent of mobility. I myself have traveled extensively on business. Each time I went to a new country, I struggled endlessly to try to extract information from an airport information booth attendant, stumbled trying to communicate with hotel receptionists, and ate mystery food at restaurants. In Rome, it took 30 minutes of hand-waving for the doctor to comprehend that I was allergic to the medication he was giving me, and after an operation in Morocco in a Bedouin clinic with no windows and flies everywhere, I awoke to an entire staff that only spoke Arabic—and I had no idea whether I would live or die.

WebRTC makes it easy for a browser to transport the sound of a speaking voice as audio to a remotely located speech recognition server for the proper language. Just a tap on a tablet or a click on a laptop can switch languages. The server returns a text result, which is then translated and can also be delivered as translated text-to-speech (TTS) audio.
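For developers, the flow just described (recognize in the speaker's language, translate, optionally speak the result, and switch languages with a tap) can be sketched in a few lines. This is only an illustration under stated assumptions: the class `DualLanguageSession`, the `TurnResult` record, and the three stand-in functions are hypothetical names invented here, not part of WebRTC or of any vendor's speech API. In a real application, WebRTC would deliver the audio, and actual recognition, translation, and TTS services would sit behind these callables.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical service signatures; real engines would sit behind these.
Recognizer = Callable[[bytes, str], str]      # (audio, language) -> text
Translator = Callable[[str, str, str], str]   # (text, source, target) -> text
Synthesizer = Callable[[str, str], bytes]     # (text, language) -> audio


@dataclass
class TurnResult:
    source_language: str
    target_language: str
    recognized_text: str
    translated_text: str
    translated_audio: Optional[bytes]


class DualLanguageSession:
    """Two speakers sharing one device, one language active at a time."""

    def __init__(self, language_a: str, language_b: str,
                 recognize: Recognizer, translate: Translator,
                 synthesize: Optional[Synthesizer] = None) -> None:
        self.languages = (language_a, language_b)
        self.active = language_a  # language of whoever speaks next
        self.recognize = recognize
        self.translate = translate
        self.synthesize = synthesize

    def other_language(self) -> str:
        first, second = self.languages
        return second if self.active == first else first

    def switch_language(self) -> str:
        """The 'tap on a tablet': hand the turn to the other speaker."""
        self.active = self.other_language()
        return self.active

    def handle_utterance(self, audio: bytes) -> TurnResult:
        source, target = self.active, self.other_language()
        text = self.recognize(audio, source)
        translated = self.translate(text, source, target)
        # TTS is optional; without a synthesizer only text is returned.
        spoken = self.synthesize(translated, target) if self.synthesize else None
        return TurnResult(source, target, text, translated, spoken)


# Stand-ins so the sketch runs without any network service (assumptions).
def fake_recognize(audio: bytes, language: str) -> str:
    return audio.decode("utf-8")


def fake_translate(text: str, source: str, target: str) -> str:
    return f"[{source}->{target}] {text}"


def fake_synthesize(text: str, language: str) -> bytes:
    return text.encode("utf-8")
```

A nurse's tablet could create one session with `DualLanguageSession("en-US", "es-ES", fake_recognize, fake_translate, fake_synthesize)`, call `handle_utterance` for the nurse's question, then call `switch_language()` so the patient can answer back in Spanish on the same device.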

Today's technology has reached that ephemeral point in time when all of the basic components are available to solve the language communication riddle. The giants of technology have been working on solutions for decades. WebRTC enables developers to access those fine solutions from anywhere in the world to solve the across-language talking problem. In layman's terms, WebRTC acts as a connector, transporting audio data through a browser to a destination. That audio can be "tagged" for identification during its travel, and thus can be organized by developers for association with a user, a place, and a time. Only a connection to the Internet is required to use this transport mechanism, a fact that is extremely important for people in lower-income economies, many of whom have so little money that they can hardly pay the monthly costs of service, much less any type of long-distance fees.
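The "tagging" idea can be illustrated with a small, application-level sketch. WebRTC itself does not prescribe how audio is labeled, so everything here is an assumption made for illustration: the `TaggedAudio` record, its field names, and the helpers `tag_audio` and `group_by_user` are hypothetical. The sketch simply attaches a user, a place, and a time to each chunk of audio and then organizes the chunks per speaker in the order they were captured.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class TaggedAudio:
    """One chunk of audio plus the identifying tags (hypothetical layout)."""
    user_id: str
    place: str
    captured_at: datetime
    payload: bytes


def tag_audio(payload: bytes, user_id: str, place: str,
              captured_at: Optional[datetime] = None) -> TaggedAudio:
    # Default to the current UTC time when the caller supplies none.
    stamp = captured_at or datetime.now(timezone.utc)
    return TaggedAudio(user_id, place, stamp, payload)


def group_by_user(chunks: Iterable[TaggedAudio]) -> Dict[str, List[TaggedAudio]]:
    """Organize chunks per speaker, each list in capture order."""
    grouped: Dict[str, List[TaggedAudio]] = defaultdict(list)
    for chunk in sorted(chunks, key=lambda c: c.captured_at):
        grouped[chunk.user_id].append(chunk)
    return dict(grouped)
```

With tags like these, a server receiving audio from many browsers could keep each speaker's utterances together and replay a conversation in sequence, regardless of the order in which the chunks arrived.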

Currently, most devices have built-in speech recognition; however, it is always in a single language. Even the incredible Dragon only enables one language at a time per computer. Thus, the current state of technology leaves users without the versatility to switch from a speaker of one language to a speaker of another language. Therefore, a nurse may be able to communicate simple phrases to a Spanish-language-speaking patient, but that patient cannot answer back. Limiting speech recognition to one language per device is a "one-to-many" experience, not conversational, and thus unusable for any type of across-language dialogue. This is where the impact of WebRTC will drive a new dynamic.

WebRTC is currently available on several browsers, mostly for laptops and desktops, and is expected to be available on mobile devices within a year. Installation on mobile devices may become the catalyst for the across-language upheaval in the international community. Microsoft and Internet Explorer appear to be moving in a slightly different direction through the interesting development of Lync, which will have its own strong advantages of a different kind.

Sue Ellen Reager is CEO of @International Services, a language and software solutions company that also performs translation, voice recording, and global system testing for speech and DTMF applications, as well as media and video localization.
