Speech Technology Magazine

 

Innovative Research in the Labs, Part VI: AT&T

Research labs appear on every vendor's site, but AT&T stands out as a research lab in and for itself
By Nancy Jamison - Posted Jan 30, 2007

This month we turn to another company with a rich history of research and development: AT&T. For more than 100 years, AT&T has produced an incredible array of technological advances, including the transistor, the solar cell, and the communications satellite. Known initially for its telecommunications research, AT&T Labs has grown far beyond that and is committed to developing the next generation of universal network and communications services, including advances in speech technology.

Among its early advances in speech were the "Voder," the first electronic speech synthesizer, introduced in 1936; SAM (speech-activated manipulator), a speech-driven robot, in 1989; and a real-time translator that used automatic speech recognition (ASR) and text-to-speech (TTS), in 1992. In addition, AT&T Natural Voices TTS, AT&T Watson speech recognition, and How May I Help You (HMIHY) conversational natural language dialogue have all been driving forces behind AT&T's VoiceTone business.

Application Focus
AT&T's research labs are divided into three areas. Speech technology falls under the IP & Voice Services branch, which covers VoIP and other communications technologies; core speech and language technologies, such as TTS and ASR; and natural language understanding. Sub-projects include dialogue management, machine learning, spoken language translation, multimedia and multimodal processing, information retrieval, user interface design, Web mining, telecom features, and session initiation protocol (SIP) signaling.

What makes the research in this lab different from that of many other labs is that it caters to one customer, AT&T, and to the needs of AT&T's thousands of communications customers, many of whom have deployed contact center and self-service, network, and mobility applications with speech technologies and are looking to extend those capabilities.

Although the lab continues to improve core technologies, its primary research is geared toward improving the customer experience, from how customers access information to how they communicate with other people. It also emphasizes using speech to improve its customers' businesses through better application usability, higher customer satisfaction, and lower costs. To this end, it has a three-part focus:

  • Multimedia and multimodal processing for IPTV;
  • Services over IP; and
  • Speech and language processing for contact center automation.

AT&T works on the infrastructure of communication, from the network to the device, and adjunct applications that improve communication. Examples include incorporating natural language technologies into email response for contact centers and the Web, and multimodal applications that enable a user to seamlessly move between input methods—such as pen, keypad, or speech—on any device.

AT&T's IP & Voice Services branch often blends the technologies of its core areas. For example, in its VoIP Meeting Service, speech enhances one portion of the conference application, allowing users to record the conference and then find vital information within the recording using AT&T's Speech Logger. Web technology also blends nicely with research in spoken language. For example, the lab is developing personalized agents for the Web to help customers navigate and conduct business easily. Imagine having your own virtual agent to whom you can say, "Contact AT&T and sign me up for its new DSL service," and, when you are done, "Translate this document into English for me."

Summary
AT&T Labs' research in voice-enabled services continues to focus on inventing and innovating technologies that have business impact and on advancing speech, language, and multimedia technologies to improve customer experiences in communications, security, and entertainment. We can expect new applications that give users multiple ways to access services and information, such as speech mining of information in a contact center, using a mobile phone and speech applications to find location-based services, or using voice commands to get to the right program on TV.


Nancy Jamison is the principal analyst at Jamison Consulting. She can be reached at nsj@jamisons.com.

 
