The 2013 Speech Luminaries

The Developers' Friend
Mazin Gilbert, Assistant Vice President of Technical Research, AT&T Labs–Research

AT&T has been in the spotlight this year, with several offerings using its proprietary Watson Speech Technologies (not to be confused with IBM's Watson). The projects have been helmed by Mazin Gilbert, assistant vice president of technical research at AT&T Labs–Research. Gilbert is responsible for overseeing the lab's efforts involving machine learning, Web mining, natural language processing, speech mining, contact center analysis, multimodal voice search, and automatic speech recognition.

In April, AT&T teamed with Interactions. AT&T Watson's speech engine will be integrated with Interactions' speech-enabled virtual assistant applications. The companies plan further development of virtual assistants for customer care.

"The customer...just wants a seamless, user-friendly experience," Gilbert says. "If you're sitting in a car and...talking to a virtual assistant...you don't really care where the technology resides. We're trying to support a connected life experience."

AT&T has also made a concerted effort to work with developers over the past year. In July 2012, the company released several AT&T Watson-enabled speech APIs aimed at developers who want to create apps with transcription and voice recognition capabilities but don't necessarily have the know-how to do so.

"[These APIs] include an open speech or generic API, and that is sort of the holy grail of being able to transcribe speech into text," Gilbert says. "That API is trained on a million-plus words and hundreds of thousands of speakers, and that's available to developers who want to do speech recognition and don't have a clear notion of what application they need."

The initial APIs offer developers technology for local business search, short message service, Web search, voicemail-to-text, general-purpose dictation, question and answer, and a U-verse electronic programming guide. More will be offered for social media, gaming, language translation, and speaker authentication.

"We're providing the software that goes into your application, and this software basically talks to our API that sends speech in real time and is able to recognize it," Gilbert says. "Some developers want to build their own APIs. Some of them don't have that expertise and they want to pull their software into their application. We're doing this so people don't have to reinvent the wheel."

The Medical Master
Juergen Fritsch, Chief Scientist and Cofounder, M*Modal

The healthcare industry is being challenged to meet the requirements of several pieces of U.S. legislation, which call for healthcare providers to use electronic medical records (EMRs). These records essentially take the place of paper and pen, making medical records easier to track, manage, and share. Getting an entire industry to comply with looming deadlines, however, is no easy task. Nonetheless, Juergen Fritsch, M*Modal's chief scientist and cofounder, is ready to help.

M*Modal offers clinical transcription, documentation services, and a proprietary Speech Understanding solution.

As chief scientist, Fritsch focuses his research on combining speech recognition and natural language processing technologies to improve medical language systems. Among his projects, Fritsch has developed what the company calls closed-loop clinical documentation workflows. The technology determines where dictated content is translated into medical documents and analyzes it for incomplete and inconsistent content.

M*Modal's products that employ Fritsch's research include M*Modal Catalyst, a cloud-based natural language understanding platform that combines structured and unstructured data found in electronic clinical systems.

Under the Catalyst umbrella, M*Modal released M*Modal Catalyst for Quality in late 2012. Aimed at clinical documentation improvement (CDI) specialists, the solution uses natural language understanding to uncover previously inaccessible data found in EMRs, such as dictated notes and lab results.

M*Modal also has a number of coding solutions, and recently struck a deal with 3M Health Information Systems that enables M*Modal's clinical documentation platform to work with the 3M 360 Encompass System. The union marries metrics, analytics, CDI, computer-assisted coding, and cloud-based speech understanding into a unified data workflow.

"We really do believe that most problems, whether it's EMR adoption or EMR's ability to actually report on quality measures or coding, [are] a documentation issue at the time a record is created," says Aaron Brauser, director of Catalyst products at M*Modal, adding that, if organizations improve this part of the interaction so the information is reusable, "it makes everything else go that much better."
