Innovative Research in the Labs Part V: Intervoice Center for Conversational Technology at the University of Texas at Dallas

This month we look at a marriage of private industry and education. Within the Human Language Technology Research Institute (HLTRI) at the University of Texas at Dallas (UTD) are eight centers of specialized, cross-disciplinary research on human language technology, one of which is the Intervoice Center for Conversational Technologies (ICCT). UTD, one of the fastest-growing universities in the country, focuses primarily on science, engineering and business. HLTRI, one of its core research areas, has some of the world's top researchers in computational linguistics, with award-winning experts in various aspects of text processing, specifically question-answering systems.

ICCT was started four years ago with a grant from Intervoice to conduct core research in computational linguistics as an additional methodology to enhance more traditional statistical approaches to speech technologies. Now, the synergies from cross-pollinating research between electrical engineering and computational linguistics have resulted in improvements to core recognition technology and natural language conversational systems.

Besides filing numerous patents and being on the cusp of commercializing some breakthrough research, ICCT has developed a test suite to benchmark automatic speech recognition (ASR) vendors. This suite allows Intervoice to measure the accuracy and load performance of the various commercial ASR engines and to pinpoint which recognizer works best for which types of applications and environments. Additionally, ICCT has developed several patent-pending technologies that will change the face of natural language applications. Some key technologies are described here:

Paraphraser

The paraphraser takes as input a set of task descriptions, each consisting of one or more text sentences defining a specific task or category in the application. It uses these descriptions to generate the most common ways that someone could request each task. Directed-dialogue grammars can be produced automatically from the list of potential utterances that the paraphraser outputs. For natural language dialogues, the paraphraser can generate training corpora for a statistical language model (SLM) ASR engine from the same category descriptions. This eliminates the need to collect thousands of user utterances when training natural language systems.
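The column does not say how the paraphraser generates its variants, so the following is only a minimal sketch of what its output might look like: a handful of hand-written carrier phrases combined with action and object variants to produce candidate utterances for a grammar or an SLM training corpus. The names `CARRIERS` and `expand_task`, and the sample phrases, are hypothetical and are not ICCT's method.

```python
from itertools import product

# Hypothetical carrier phrases. A real paraphraser would derive variants
# from the task description instead of using a fixed list like this one.
CARRIERS = ["i want to {}", "i need to {}", "can i {}", "{}"]


def expand_task(action_variants, object_variants):
    """Return every carrier/action/object combination as a candidate utterance."""
    utterances = []
    for carrier, action, obj in product(CARRIERS, action_variants, object_variants):
        utterances.append(carrier.format(f"{action} {obj}"))
    return utterances


if __name__ == "__main__":
    # Each printed line is one candidate utterance for a "check balance" task.
    for line in expand_task(["check", "hear"], ["my balance", "my account balance"]):
        print(line)
```

Even this toy version shows why generated utterances are attractive: four carriers, two actions and two objects already yield 16 candidate phrasings without collecting a single caller recording.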

Semantic Categorizer

An SLM-based ASR engine outputs a text transcription of a user's spoken utterance. Given this transcription, the semantic categorizer extracts the user's meaning, however it is phrased, and assigns the transcription to one of a set of predefined categories, using the same category descriptions used by the paraphraser. As with the paraphraser, categories are defined by a simple text description, with none of the complex, hand-built meaning extraction rules that current systems require. The categorizer can also handle text phrases in any domain equally well, without training or modification for a specific domain.
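To make the idea of categories defined by plain text descriptions concrete, here is a minimal sketch that assigns a transcription to the category whose description shares the most words with it (Jaccard overlap). This is a deliberately crude stand-in: simple word overlap cannot capture meaning "however it is phrased," which is precisely what the ICCT categorizer is said to do. The function names and the sample categories are assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def categorize(transcription, categories):
    """Return the name of the best-matching category, or None if nothing overlaps.

    `categories` maps a category name to its plain-text description.
    The score is Jaccard overlap between the two word sets.
    """
    words = tokenize(transcription)
    best_name, best_score = None, 0.0
    for name, description in categories.items():
        desc_words = tokenize(description)
        union = words | desc_words
        score = len(words & desc_words) / len(union) if union else 0.0
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical category descriptions for a banking application.
CATEGORIES = {
    "balance": "The caller wants to check or hear the balance of an account.",
    "transfer": "The caller wants to transfer or move money between accounts.",
}

if __name__ == "__main__":
    print(categorize("um I'd like to hear my account balance please", CATEGORIES))
```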

Auto Tuning

Auto tuning supports live testing: it takes recognition results in transcription form, runs them through the semantic categorizer and paraphraser, and generates new grammars to tune the application. As a side benefit, if the application sees some number of unrecognized words or phrases with the same semantic category, it can determine that users are discussing a new issue that needs to be dealt with. The system can send the list of similar utterances to a supervisor for review, who can then modify the application to address the new issue immediately.
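The new-issue detection described above can be sketched as a grouping step followed by a count threshold. In this sketch, `keyword_categorizer` is a hypothetical stand-in for the semantic categorizer, and the `threshold` of 3 is an assumption, since the column says only "some number" of utterances.

```python
from collections import defaultdict


def keyword_categorizer(utterance):
    """Hypothetical stand-in for the semantic categorizer."""
    text = utterance.lower()
    if "outage" in text or "no service" in text:
        return "service_outage"
    return None


def find_emerging_issues(unrecognized, categorize=keyword_categorizer, threshold=3):
    """Group unrecognized utterances by category and keep the large groups.

    Returns a dict mapping each category with at least `threshold`
    utterances to the list of those utterances, ready for supervisor review.
    """
    groups = defaultdict(list)
    for utterance in unrecognized:
        category = categorize(utterance)
        if category is not None:
            groups[category].append(utterance)
    return {name: items for name, items in groups.items() if len(items) >= threshold}


if __name__ == "__main__":
    calls = ["is there an outage", "i have no service", "outage in my area", "hello"]
    for category, utterances in find_emerging_issues(calls).items():
        print(category, utterances)
```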

Ontology Analyzer

Lastly, the research collaboration has developed an ontology analyzer that can read domain-specific literature in electronic form to discover the words commonly used in domain-specific conversations. The analyzer can automatically extract ontologies for specific domains or vertical markets, eliminating words not in the domain and defining the domain-specific meanings of the words that remain. For example, "bank" in finance refers to the brick-and-mortar business, not a river bank. This greatly reduces the time needed to create natural language grammars. In addition, the ontology analyzer can adapt to customer-specific applications by analyzing customer documents, detecting product-specific names and adding them to the lexicon. These ontologies are input to the paraphraser and the categorizer to assist in their operations.

N-best Re-ranking

One other patent-pending technology allows Intervoice to re-rank the N-best output of any commercial recognizer by applying computational linguistic techniques to refine the list. Each entry in the N-best list is graded for semantic coherence, syntactic correctness, and lexical and phonetic accuracy. The individual grades for each entry are combined, and the resulting rank determines which entry in the N-best list is most likely to be the original utterance. This process has produced 20 percent improvements in word error rates.
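The column does not say how the individual grades are combined, so the sketch below assumes the simplest option, a weighted sum, to show the shape of the re-ranking step. The `Hypothesis` fields mirror the four grades named above; the weights, the grade values and the sample hypotheses are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """One N-best entry with its four grades, each assumed to lie in [0, 1]."""
    text: str
    semantic: float
    syntactic: float
    lexical: float
    phonetic: float


# Hypothetical weights; the source does not specify how grades are combined.
WEIGHTS = {"semantic": 0.4, "syntactic": 0.3, "lexical": 0.15, "phonetic": 0.15}


def combined_score(hypothesis, weights=WEIGHTS):
    """Weighted sum of the individual grades for one entry."""
    return sum(weight * getattr(hypothesis, name) for name, weight in weights.items())


def rerank(n_best, weights=WEIGHTS):
    """Return the N-best list sorted from highest to lowest combined score."""
    return sorted(n_best, key=lambda h: combined_score(h, weights), reverse=True)


if __name__ == "__main__":
    n_best = [
        Hypothesis("wreck a nice beach", 0.2, 0.6, 0.7, 0.9),
        Hypothesis("recognize speech", 0.9, 0.9, 0.8, 0.85),
    ]
    for h in rerank(n_best):
        print(round(combined_score(h), 4), h.text)
```

In this made-up example the recognizer's first choice is acoustically strong but semantically weak, so the combined score moves the second entry to the top.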

Summary

Regrettably, a column of this length cannot fully convey how innovative and potentially revolutionary ICCT's research is. So let's just finish by saying that ICCT's technologies have the potential to change the way natural language applications are developed, greatly increasing the speed and ease of development while sharply reducing the cost.

Have any innovative research news from R&D? Please email Nancy Jamison at nsj@jamisons.com.
