Hands-On: An Interactive Display
At last summer’s SpeechTEK 2010, a number of speech technology vendors submitted products for review by attendees as part of the SpeechTEK Labs group of sessions. These hands-on, interactive demonstrations involved technologies in four categories: speech analytics, natural language processing and machine translation, voice application development tools, and innovative products and solutions. Judith Markowitz, president of J. Markowitz Consultants, Peter Leppik, founder and CEO of Vocal Laboratories, and Amy Neustein, editor-in-chief of the International Journal of Speech Technology, moderated the speech analytics session. Deborah Dahl, principal at speech and language consulting firm Conversational Technologies and chair of the World Wide Web Consortium’s Multimodal Interaction Working Group, led the natural language processing and machine translation session. Bill Scholz, president of the Applied Voice Input/Output Society (AVIOS), led the voice application development tools session. Thomas Schalk, vice president of voice technologies at ATX, moderated the session on innovative technologies. A total of 17 companies submitted products for review and conducted the demonstrations.
SPEECH ANALYTICS:
Approximately 40 attendees visited the nine vendors participating in the analytics lab and saw a spectrum of offerings. Some packages contain powerful statistical engines to mine mountains of data looking for hidden relationships; others filter transactions to find problem records. Some accept input from a variety of data sources, while others are integrated into other products or services and are mainly intended to help manage those products.
No matter which package you choose, you’ll need to invest time and training to interpret the output to gain the most benefit. We’re still a long way from having sophisticated software that can replace the judgment of a person who understands the underlying business or technical process, but in the hands of the right person, these tools can pave the way to faster and deeper insights.
Angel’s Caller First Analytics engine is integrated directly into Angel’s hosted IVR platform. It’s driven by MicroStrategy’s Business Intelligence software, and the platform is designed to allow clients to tune their applications for optimal performance. Getting started with Angel’s hosted platform and on-demand provisioning should be straightforward.
Autonomy Explore takes data from multiple sources (e.g., contact centers, client surveys, and Web server logs), tracks trends over time as they relate to specific events, and includes “sentiment analysis” to determine whether client feedback is positive or negative. Out of the box, it examines client interactions across multiple points of contact, reflecting how clients actually interact with companies in the real world.
AVOKE Call Browser, from BBN Technologies (a Raytheon company), follows a call from start to finish, using split-channel recordings (it separates the client’s voice from the agent’s) to provide an accurate assessment of CRM problems. It has the added value of giving CRM feedback on how partners are handling the client base. The demo worked well and was easy to use.
AVOKE’s network-based recording platform generates whole-call recordings of client interactions with no hardware or software on the client’s premises. It also lets clients go from high-level dashboards and summary statistics to individual call recordings and events. The end-to-end call recording should make a huge splash.
Calabrio Speech Analytics focuses on evaluating agent factors—notably, the script. The system’s versatility and flexibility support checking practically any relevant data category or subcategory in calls. The demo worked quite well.
Calabrio showed how its analytics helps verify agents are providing multiple mandatory pieces of information on every call. Using a phonetics-based engine, the system flags calls where the script was not followed, giving high-level compliance statistics, the ability to review individual recordings, and manual overrides of automated flags to correct instances where the software might have erred. Automated script-compliance verification is of immediate and practical value to many contact centers.
CallMiner’s Eureka 7 Platform uses large-vocabulary continuous speech (LVCS) recognition and four clustering algorithms to extract topics from a call. The user interface is easy to work with and shows state-of-the-art engineering principles at work.
CallMiner automatically categorizes recorded conversations with agents based on the subject of the call, the client’s emotional state, and actions the agents took. Dashboards show trends in client interactions over time, and let clients drill down into specific groups of calls or individual recordings. The categorizations show how particular business issues or policies are related to client satisfaction, making it clear where the company might need to make changes.
ClickFox Pulse is equivalent to having a team of seasoned statisticians study an enterprise to determine the reasons behind client churn. These “virtual” statisticians reveal the pulse of the enterprise. The demo showed clients having difficulty navigating IVR and online bill-payment systems; Pulse gave immediate business intelligence feedback. Any enterprise could easily learn how to use the system.
ClickFox feeds data from multiple sources (e.g., IVR, Web, client surveys, churn rates, and trouble tickets) into a single analytics engine. This lets companies see how specific issues throughout the enterprise could be related to high-level business outcomes, like churn or client satisfaction.
ClickFox’s interactive interface makes it easy for an analyst to explore relationships among disparate data on the fly, seeing which metrics are related to each other and how strong those relationships are.
Nexidia’s Enterprise Speech Intelligence suite uses phonetic indexing and can mine a prodigious amount of data in real time. Such immediate feedback is a necessity in today’s world.
The engine automatically identifies key phrases in call recordings and tracks relationships among them, allowing easy identification of trends and key drivers as well as the ability to drill down to individual audio recordings where particular phrases were found. Data can be viewed in a variety of intuitive ways, including trend graphs, word clouds, relationship maps, and dashboard dials.
Verint’s Impact 360 Speech Analytics measures many facets of client-agent interactions. It maps the paths leading to either call resolution or nonresolution. This program is an excellent example of how to gather data that enables an enterprise to increase its client base, bottom line, and market share. The demo was sophisticated yet easy to understand.
The analytics engine automatically discovers, identifies, and tracks client behavior indicators and key phrases. It shows which phrases are trending up or down, and searches the audio archives for specific words in individual call recordings. Full-text searching of recordings identifies those with specific words and identifies other common phrases that might be related (e.g., searching for “Web” suggests “Web site” and highlights “passwords” as a related term).
Voxeo’s VoiceObjects Analyzer is integrated into the VoiceObjects platform for speech applications. It automatically tracks key events during calls and generates statistics for application tuning, behavior analytics, and system administration. Navigation from high-level performance and behavior statistics to individual call records and details about particular interactions is very easy.
Analytics are tightly coupled into the speech-application platform, making it easy to see the effect changes to the application have on caller behavior and call outcomes.
NATURAL LANGUAGE PROCESSING AND MACHINE TRANSLATION:
Natural language processing is a relatively new area in the speech industry. While statistical language models are now common in call routing applications, systems that attempt to fully analyze users’ utterances, rather than simply classify them into categories, are only now becoming available. These systems have potential applications in a wide variety of areas, including traditional call center self-service, multimodal mobile applications, and speech-to-speech translation. The session was well-attended, reflecting the high level of interest in this topic among SpeechTEK attendees, who had the opportunity to glimpse two systems at the vanguard of a new generation of language understanding. The vendors demonstrated newer, more powerful paradigms for natural language processing based on linguistic analysis, used in multimodal environments, and supporting multiple languages.
Cambridge Mobile presented a natural language processing system focused on support for mobile conversational applications. Its technology is based on traditional linguistic levels of analysis, including a dictionary, grammar, and semantic analysis. Unlike the grammars used in current speech applications that are based on semantic concepts, such as address or destination city, the Cambridge Mobile grammar is based on syntactic concepts, like verbs and prepositional phrases. This allows it to be much more reusable across applications than current grammars, reducing development effort and allowing different applications to share grammars.
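For comparison, conventional speech applications typically express a semantic-concept grammar in the W3C Speech Recognition Grammar Specification (SRGS) format. A minimal sketch of such a grammar (the city names are placeholders for illustration, not taken from the demo) might look like:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative SRGS grammar built around a single semantic
     concept (destination city); city names are placeholders. -->
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" xml:lang="en-US" root="destination">
  <rule id="destination" scope="public">
    <item repeat="0-1">to</item>
    <one-of>
      <item>Boston</item>
      <item>Chicago</item>
      <item>Denver</item>
    </one-of>
  </rule>
</grammar>
```

A syntactically motivated grammar of the kind Cambridge Mobile described would instead define reusable rules for verbs, noun phrases, and prepositional phrases that many applications could share.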
Like Cambridge Mobile, GyrusLogic is based on a linguistically motivated foundation, and includes syntactic and semantic analysis steps. GyrusLogic supports multiple languages, including multiple languages in the same application. To support call center applications, the GyrusLogic platform is designed to work with VoiceXML. This is a promising way to keep the benefits of VoiceXML as a standard platform while leveraging more powerful emerging techniques, such as full natural language processing. Multimodal mobile applications are also possible. GyrusLogic supports natural language typed input as well as spoken input.
VOICE APPLICATION DEVELOPMENT TOOLS:
SpeechTEK attendees were offered the opportunity to test drive a collection of tools for developing and administering voice applications, including voice dialogues, grammar managers, prompt managers, voice application testing systems, and libraries of reusable components. Hands-on evaluations allowed attendees to gauge each tool’s ramp-up time and ease of use. The eight laboratory demonstration systems all showed a significant degree of sophistication in contrast to the development tools of just a few years ago. Both application development and postdeployment maintenance have traditionally been expensive, to the point of serving as a barrier to entry. Tools such as those described here directly address this concern and promise to increase the quantity and quality of speech applications at significantly lower costs.
The AjaxWeaver VoiceXML Orchestrator is a Web-based development platform for building speech-enabled IVR applications using any Internet browser. It uses simple drag-and-drop functionality designed to let anyone build a live voice application in minutes without programming, expert knowledge, or special software. The VoiceXML Orchestrator also provides a well-designed extensible plug-in framework for developing custom widgets. The tool’s video display appealed to attendees.
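For readers new to the underlying format, drag-and-drop flows in tools like the Orchestrator ultimately compile down to standard VoiceXML. A minimal hand-written document of the kind such tools generate (the menu wording here is invented for illustration) might look like:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Minimal illustrative VoiceXML document; the menu choices
     and prompt wording are invented for this example. -->
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <menu>
    <prompt>Say sales or support.</prompt>
    <choice next="#sales">sales</choice>
    <choice next="#support">support</choice>
  </menu>
  <form id="sales">
    <block><prompt>Transferring you to sales.</prompt></block>
  </form>
  <form id="support">
    <block><prompt>Transferring you to support.</prompt></block>
  </form>
</vxml>
```

Visual tools spare developers from writing this markup by hand, but the generated application remains portable to any VoiceXML-compliant platform.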
Angel demonstrated its Caller First Analytics (CFA) product, which provides visibility into how a voice application is being used, including the most frequent customer selections and most common caller path patterns, hang-up points, and in-depth task completion information. Enterprises using business intelligence can make appropriate changes to the call flow, contain common call issues, and increase customer satisfaction by addressing caller needs quickly. CFA combines call data, call volume data, application performance data, and VUI analysis data into a single interactive environment, and offers both search and data views plus the ability to drill down to specific data points to understand a caller’s intention. According to Angel, one customer saved more than $1 million annually by using CFA.
Cambridge Mobile observes that most mobile application developers don’t have the expertise to speech-enable their smartphone applications, so in response the company previewed World Bench, a development tool that allows users to integrate conversational speech into their applications. The developer does not need to create grammars or lexicons; they are all prewritten. With a set of sample text sentences and some drag-and-drop operations, a working app can be ready in a few hours. Most important, the user can interact conversationally with the app and every other conversationally enabled mobile app on the phone.
Loquendo demonstrated its new, full-featured Text-to-Speech Director. The range of features accessible via TTS Director includes emphasizing a keyword in a phrase, accepting input written in SAMPA or IPA, using standards (such as Speech Synthesis Markup Language and Pronunciation Lexicon Specification), adjusting speed and pitch, mixing music and sound effects, and adding emotions. Loquendo also demonstrated TTS in Arabic, including full support for typing right to left in Arabic characters. Attendees were shown how to use the TTS Director to generate and edit prompts for voice applications, both client-server and device-based, and were given the opportunity for hands-on experience tuning prompts to the particular requirements of a speech application.
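Several of these controls map directly onto standard SSML markup. As an illustration (the prompt wording and the pronunciation are invented for this example, not taken from Loquendo’s demo), a fragment exercising keyword emphasis, prosody control, and an IPA pronunciation might look like:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative SSML showing emphasis, rate/pitch adjustment,
     and an IPA pronunciation; all text here is invented. -->
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
  Your order <emphasis level="strong">number</emphasis> is
  <prosody rate="slow" pitch="+10%">1 2 3 4</prosody>.
  Please visit our <phoneme alphabet="ipa" ph="ˈkiːɒsk">kiosk</phoneme>.
</speak>
```

Tools like TTS Director expose these parameters through a graphical interface, so prompt designers need not author the markup by hand.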
Microsoft displayed its Unified Communications Managed API (UCMA), which contains a new namespace for developing VoiceXML hosting applications in managed code. When combined with the VoiceXML support in the Microsoft Speech Platform, this namespace enables VoiceXML host applications to launch a VoiceXML page and process it against a UCMA Audio/Video Call, as well as get exceptions and results from processing the page. Microsoft demonstrated how to write speech applications with UCMA and the Microsoft Speech Platform.
Openstream displayed a new application authoring tool, a World Wide Web Consortium standards-compliant (SCXML, InkML, SMIL, SRGS, MRCP) offering that emphasizes device-independent delivery-context interaction (e.g., the ability to leverage camera, GPS, etc., without rewriting code across all of the popular mobile platforms). The tool also offers an interoperable open data format that renders multimodal content with data from other tools, such as Microsoft Office, and contains support for both connected and disconnected operation as well as the ability to use local and remote speech. The tool even integrates with notification services, Twitter, and other applications.
Voxeo demonstrated the new VoiceObjects Analyzer, a tool for analysis and tuning of speech recognition performance based on reporting and analytic techniques employed in business intelligence environments, such as Cognos and MicroStrategy. The demonstration included analysis of caller navigation patterns, transaction completion rates, and in-depth customer behavior analysis.
Nearly three-fourths of all mobile devices used in warehouses and distribution centers run a terminal emulation application to connect the devices to host systems, and the majority of this terminal emulation software is from Wavelink. The company demonstrated how to easily voice-enable mobile devices in just minutes and have them up and running using Wavelink’s Speakeasy. The tool has the capability to add both text-to-speech and speech-to-text functionality to applications, resulting in not only increased productivity, but also improved accuracy of mobile applications and devices. It allows hands-free data collection without the need to modify host applications or to require additional hardware and software. Wavelink explained that the ability to combine voice with other types of data entry, such as barcode scanning, helps ensure tasks are accurately completed.
INNOVATIVE SOLUTIONS:
Six vendors showcased innovative solutions at the Innovative Solutions Lab, effectively illustrating that speech technology continues to evolve and that voice applications are reaching new dimensions. Product demonstrations included customized text-to-speech, manufacturing automation, natural language, advanced analytics, mental stress assessment, and multimodal solutions.
Autonomy etalk focused on understanding unstructured information, underscoring the key role sophisticated speech processing plays in customer interaction analytics. The company’s technology augments phoneme-based approaches with unique technologies that enable computers to form human-like understanding of unstructured information. This serves as a platform for the conceptual and contextual understanding of information. Another product, Autonomy Explore, executes an ingestion process that retrieves, analyzes, transcribes, and indexes recordings to allow word/phrase-based searches, conceptual searches, and script adherence matches.
BrainGauge demonstrated mental workload testing in real time, and attendees took mental stress tests to familiarize themselves with the solution. Two tests were conducted: Attendees were asked first to repeat digit strings right after hearing them, and then to identify colors that flashed on a screen, both of which came at faster and faster speeds until the subject could no longer keep up. These and other tests illustrated how BrainGauge’s solution can help employers identify the most capable job candidates in industries with high workload and pressure levels.
Cambridge Mobile exhibited a solution to help developers create conversational speech applications using predefined lexicons and grammars. This approach allows developers to focus on dialogue design elements. Cambridge Mobile did not demonstrate its solutions during the lab, but promised new mobile device applications in the near future.
Loquendo showcased TTS Director, which allows the creation of custom prompts. TTS Director features a user-friendly design and enables command selection from a detailed yet intuitive drop-down menu. The tool allows users to change the voice, adjust the reading mode, and modify acoustic and prosodic parameters, such as sampling frequency and coding, pitch, speaking rate, and volume. The final edited message can then be saved in text and/or audio formats. TTS Director supports many languages.
Openstream demonstrated SmartAssistant, a multimodal personal assistant that can adapt to a user’s context, such as motion, time of day, date, location, or device type, and then switches to the mode that is most appropriate for that context. SmartAssistant can understand natural speech and automatically handles various communication channels, such as voice calls, texting, and instant messages, through speech, touch, and gesture inputs. It further enables the user to share contexts across applications.
Wavelink demonstrated how easily Speakeasy adds TTS and speech-to-text functionality to enterprise applications. Speakeasy enables hands-free data operation without modification to host applications or requiring new hardware or servers, which cuts implementation time and mitigates equipment costs. Combining voice with other modes of data entry, such as barcode scanning, also provides a valued cross-check to ensure task accuracy, increase worker productivity, and improve workplace safety.