Speech Technology with Impact

When unified messaging (UM) emerged in the early 1990s, it was touted as a productivity enhancer for the mobile worker. Suddenly, users could receive voicemails, faxes, and emails in a single inbox. More than a decade later, the concept of unified communications (UC) has taken over and garnered vast interest across enterprises. The promise of UC is that it delivers myriad forms of communication and productivity tools, including UM, instant messaging, presence, collaboration, transactional applications, and more, on any device of one’s choosing.

With interest in UC growing, it’s no wonder that speech technologies are being used to provide an even greater level of flexibility in how communications of all types are delivered to and handled by the end user. Surprisingly, it’s not just interactive speech recognition and text-to-speech that are getting all the attention. Now, a dozen or more companies have introduced productivity-enhancing products that are feeding the UC groundswell, using large vocabulary speech-to-text. Providing services such as voicemail transcription, voice-to-text messaging, or even speech-to-blog capabilities, start-up companies such as Jott Networks, Pinger, SpinVox, Talkr, and Yap are popping up everywhere. Established vendors such as CallWave are also refining existing offerings, using the latest developments in speech technologies to provide natural complements to UC offerings.

Features: Pros and Cons
The most prevalent function in this group is voicemail transcription, which turns voice messages into emails or SMS messages. Many services let a user text or email a person back, and converted voicemails are stored, allowing users to search and sort them the way they would with email. In addition to voicemail transcription, a growing number of companies are providing services that enable the user to take notes, speak information for conversion to text in blogs, or send text messages, all in hands-free mode from the phone.

There are, however, downsides. Accuracy of the speech-to-text conversion, especially in long voicemails, can sometimes be as low as 62 percent to 70 percent. This causes many services to use human transcribers to convert voicemails to text. This, in turn, raises questions about privacy and security, increases costs significantly, and puts a limit on the scalability of these services. Still, for the mobile professional, the downsides might be worth the price.

One well-established vendor, CallWave, is a prime example of how the richness of today’s ASR complements UC, while addressing issues of cost, accuracy, scalability, and privacy. A 9-year-old company based in Santa Barbara, Calif., CallWave delivers PC applications that complement the telephone. CallWave’s products include Visual Voicemail for Email; Visual Voicemail, and Text Widgets & Gadgets; and desktop applications for text and voicemail. Unlike other vendors in this area, however, CallWave has recently added a few unique capabilities to the mix.

CallWave’s Vtxt service uses proprietary speech recognition and Web 2.0 visual voicemail technology to send the gist of a voicemail to a handset. CallWave provides just the pertinent information to the user as an SMS message or email subject. The user can quickly determine whether to act, and if so, how urgently. The gist capability is entirely automatic, avoiding the human transcriber cost and privacy concerns. If a message is long or critical, the user can access the full content through the handset or a desktop computer.

CallWave’s service also enables the voicemail text to become searchable and storable. After getting the gist of the message via SMS or email, users can review Vtxt voicemail in text and audio form, and also perform desktop texting, callback, permanent archive, search, and other call management features, all from a personal Web page.

It is exciting to see new uses for speech-to-text that have broad applicability and appeal for business users. Companies like CallWave have addressed the immediate needs of the mobile worker by homing in quickly on what is important in a message and allowing a response. Even as speech conversion gets better, it is still great to be able to get to the gist of the voice message, immediately, without having to wade through the details.

Nancy Jamison is the principal analyst at Jamison Consulting. She can be reached at nsj@jamisons.com.
