November 1, 2011
By Nava Shaked, Manager, CRM & Call Center - IBM Israel Ltd.
The View from AVIOS

Oh the Places You'll Go

Recently, I pondered the question: What would Dr. Seuss have said about speech scientists in their quest to achieve acceptability? What would he have written about "the places we have gone"—the ways in which speech applications have developed, and how far we have gone in trying to become mainstream?

In the spirit of Dr. Seuss, I suggest that mainstream is not the right place for us. Maybe speech technology's place is unique and designated for applications that are otherwise impossible. Maybe its strength is most apparent when it is placed side by side with other technologies.

Let's examine three nontraditional approaches in which speech technologies are being used creatively as part of a broader effort (as opposed to a separate module). I will illustrate the unique contribution of speech and the implementation of those three approaches in the industry:

Fusion: Speech enhances other, complementary technologies.
Multimodality: Speech is a means for a next-generation communication interface.
Personalization and Customization: Speech is used to maximize data and the customer experience.

Fusion

Fusion can be approached in two ways. The first is to use several speech analysis methods or algorithms together, so that their integration creates a synergy. Several areas of the industry show how this has helped take a project to the next level. By combining speech biometrics with speech recognition, you can verify the user while collecting information about his request. Another example is melding speech analytics with emotion detection to improve customer service.

The second approach merges speech with other technologies. For example, security and defense applications have been using speech for some time now, especially to help identify a speaker or to signal a security breach. Those applications use a sophisticated fusion of several biometric techniques to increase confidence levels. The algorithms behind these fusion processes take into consideration that speech can be a passive form of interaction in which the speaker is not aware that his voice is being sampled or monitored.
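As a rough illustration of how fusing several biometric techniques can raise confidence, here is a minimal sketch of one common approach, a weighted average of match scores. It is not taken from any particular security product: the class and function names, the modality labels, the weights, and the 0.7 threshold are all assumptions made for the example, and the sketch assumes each matcher already reports a score normalized to the range 0 to 1.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BiometricScore:
    """One matcher's output, assumed to be normalized to the range [0, 1]."""
    modality: str   # illustrative label, e.g. "voice" or "face"
    score: float    # similarity to the enrolled user: 0 = no match, 1 = match
    weight: float   # how much this matcher is trusted (hypothetical value)


def fuse_scores(scores: list[BiometricScore]) -> float:
    """Return the weighted average of the normalized match scores."""
    total_weight = sum(s.weight for s in scores)
    if total_weight <= 0:
        raise ValueError("at least one score with a positive weight is required")
    return sum(s.score * s.weight for s in scores) / total_weight


def is_accepted(scores: list[BiometricScore], threshold: float = 0.7) -> bool:
    """Accept the identity claim when the fused score reaches the threshold."""
    return fuse_scores(scores) >= threshold
```

With these assumed numbers, a voice score of 0.6 on its own falls short of the 0.7 threshold, while the same voice score combined with a face score of 0.9 at twice the weight fuses to 0.8 and is accepted.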

Multimodality

In speech recognition, as the complexity of an application grows, so does the need for fusion among technologies. On one hand, speech is intuitive; on the other hand, its performance depends on the environment in which it is used. The latter is a limiting factor in making speech the sole gateway to an application.

The multimodality approach uses several communication methods, of which speech is only one. Internet and smartphone development has created a race among technologies to compete for consumers' hearts and minds.

The challenge is to make navigation and search quick and comfortable by merging the touchscreen, typing, and spoken-word input capabilities of a smartphone. At the same time, the application must interpret the meaning of that input to identify the user's intent and direct him to the right place.
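The idea of merging touch, typed, and spoken input and then identifying intent can be pictured with a small sketch. This is only an illustration under simplifying assumptions: every input, whatever its modality, is reduced to text (a button label, typed words, or a recognizer transcript), and intent is found by simple keyword matching. The intent names, keywords, and the `UserInput` and `detect_intent` names are hypothetical; a real application would use far richer language understanding.

```python
from dataclasses import dataclass

# Hypothetical intents and trigger keywords, for illustration only.
INTENT_KEYWORDS = {
    "check_balance": {"balance", "owe"},
    "pay_bill": {"pay", "payment"},
}


@dataclass
class UserInput:
    """A single user action, reduced to text regardless of how it arrived."""
    modality: str  # "touch", "type", or "talk"
    content: str   # button label, typed text, or speech recognizer transcript


def detect_intent(user_input: UserInput) -> str:
    """Return the first intent whose keywords appear in the input text."""
    words = set(user_input.content.lower().split())
    for intent, keywords in INTENT_KEYWORDS.items():
        if words & keywords:
            return intent
    return "unknown"
```

Because all three modalities pass through the same `detect_intent` function, the user can choose whichever means of interaction fits best and still be directed to the same place.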

When we discuss concepts like touch, talk, and type, we let the user choose the means of interaction, and speech is chosen when it best fits. Avatar customer representatives and personal assistants are becoming more popular. They speak to us naturally and interact as part of Internet self-service. Again, speech is one more way to provide excellent service.

Personalization and Customization

Traditionally, speech has been used in call centers to identify people and provide self-service. Today we use it to provide call center optimization from another interesting angle: personalization. Customer experience has been considered crucial to success, and one conclusion is that customers would like to have their own personal customer experience. In fact, this is one of the best places for service providers to differentiate themselves and create branding.

When we personalize our IVRs, we use all the data we have to push information to customers during the dialogue and to ask subtly for preferences while interacting with them. Such sophisticated dialogue requires speech technology to be flexible, creative, and innovative. Those are critical qualities when you interface with vast databases and decision-making mechanisms because you need to consider legal and marketing requirements, as well as an organization's preferences.

Finally, where do we stand today with speech? Going back to Dr. Seuss, we have mountains to climb, mix-ups to solve, and fears to fight. But we have opened new avenues for speech technology to make a difference and influence man-machine interaction, and that's what it's all about.

Nava Shaked is the CEO of Brit Business Technologies, a call center optimization consulting company that specializes in speech technologies, and the chairperson of AVIOS Israel. She can be reached at nava@business-tech.co.il.
