Bringing Video to the Voice Arena

When the speech application development tools market emerged a few years back, it made sense that application vendors focused on call centers. With dialogue states costing thousands of dollars each to design and build, the vendor community placed its original focus on the largest call centers with a very narrow type of call involving relatively few dialogue states.
Things worked fine for a while, but enterprises soon looked to automate other types of calls, such as product support, and to do more with their call centers.
A new type of call center followed quickly, one that adds multimedia applications, namely video, to traditional interactive voice response (IVR) systems. This new call center application, called interactive voice and video response (IVVR), would allow enterprises to interact with their customers or potential customers in a whole new way. Call center agents could, for example, push instructional videos to help with a technical installation. Video menus, product demonstrations, account histories, location maps, flight information, or other relevant information could be sent to a phone’s display screen for customers to see while interacting with self-service systems or live agents.
“There are some interesting things you can do in a contact center now, like pushing a video to a potential customer to let them see how a product works,” says Rob Marchand, senior director of product management at Genesys Telecommunications Laboratories, which has built IVVR capabilities into the latest release of its VoiceGenie IVR solution.
Genesys’ VoiceGenie 7.1, which the company released March 23, is a voice self-service platform with built-in support for rich media applications, such as video voicemail, video conferencing, video-enabled music and games, video call recording, and video sharing. It comes with the tools needed to help enterprises, managed service providers, telecommunications service providers, system interface developers and integrators, and industry development partners add the video and voice elements of VoiceGenie to their offerings.
“Companies clearly want to improve customer satisfaction today. Many of them are doing so by embracing new technologies that support better customer service, such as speech services and multimedia, and creating a more dynamic environment to improve the overall customer experience,” says Wes Hayden, president and CEO of Genesys.
Similarly, when Envox Worldwide released the Envox 7 communications development platform in late February, it too incorporated comprehensive support for voice and video. Envox 7’s video capabilities enable developers to add IVVR to traditional call centers, but also to build in other multimedia applications, like video messaging and video call recording. The vision behind the release was “to create a flexible communications development platform that can handle all types of voice, video, and text communications,” says Mark Flanagan, president and CEO of Envox.
CosmoCom has included IVVR in its CosmoCall Universe all-IP contact center platform since November 2005, and supports its development and deployment with a drag-and-drop graphical user interface (GUI) service creation tool called CosmoDesigner that helps developers design the call flow. Creating a call flow is as simple as picking function block icons for things like IVR commands, automatic call distribution routing and queuing commands, chat, email, and other functions, dragging them into the GUI, and interconnecting them with arrows. The visual representation is automatically translated to an XML representation that can be posted on the Web for immediate execution by CosmoCall Universe, all without having to take the system offline or restart it.
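To make the idea of turning a diagram into XML concrete, here is a minimal Python sketch. It is not CosmoDesigner's actual output format, which the vendor does not describe here; the element names (`callflow`, `block`, `arrow`), the block types, and the labels are all hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical function blocks: id -> (block type, label shown on the icon)
blocks = {
    "greet": ("ivr", "Play greeting"),
    "menu": ("ivr", "Collect menu choice"),
    "queue": ("acd", "Queue for next agent"),
}

# Arrows drawn between blocks, as (from_block, to_block) pairs
arrows = [("greet", "menu"), ("menu", "queue")]


def call_flow_to_xml(blocks, arrows):
    """Serialize a block-and-arrow call flow to an XML string."""
    root = ET.Element("callflow")  # assumed root element name
    for block_id, (kind, label) in blocks.items():
        ET.SubElement(root, "block", id=block_id, type=kind, label=label)
    for source, target in arrows:
        # An arrow must connect two blocks that exist in the diagram
        if source not in blocks or target not in blocks:
            raise ValueError(f"arrow {source} -> {target} references an unknown block")
        # "from" is a Python keyword, so pass the attributes as a dict
        ET.SubElement(root, "arrow", attrib={"from": source, "to": target})
    return ET.tostring(root, encoding="unicode")


print(call_flow_to_xml(blocks, arrows))
```

Because the result is plain XML text, it could in principle be published to a Web server and fetched by a platform at call time, which is the kind of deployment the article describes.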
The latest version of Voxpilot’s Open Media Platform has also been significantly enhanced to support multimedia content delivery solutions, allowing developers to deliver multimedia content, live mobile TV, and video surveillance applications on existing advanced circuit-switched third-generation mobile networks. Loquendo’s VoxNauta platform also allows for the delivery of voice and video applications over advanced mobile networks.
A number of other speech application vendors, including Cisco Systems, are also adding video capabilities to their traditional voice and telephony applications. Cisco currently has a few video components connected to its unified communications and telepresence offerings in beta testing and plans to release them fully in the late summer or early fall. The company has even created a new business unit specifically dedicated to multimedia content, says Mike Bergelson, the company’s director of business development. “It’s a key area of strategic investment for the company.”
Those investments included Cisco’s acquisition last summer of speech application development tools vendors Audium and Metreos. Both tool sets included capabilities for integrating voice, video, data, and mobility services.
“The first implementations will be in an easy area, like adding video content in the customer care setting,” Bergelson says. “As people wake up and realize other implications for business processes, it will get more complicated.”

New Tech, Expanding Market

Though IVVR has been around for a few years now, it is still a relatively untapped technology in the industry and is just now starting to gain momentum as people realize other possible areas for its use. Among those areas is the ability to turn caller hold time into an advertising revenue opportunity with engaging and entertaining video messages. The technology can also enable advertising-supported directory assistance and similar mobile search applications.
Volt Delta Resources, for example, has been active in the latter, and is using Envox tools to build its service. “The capabilities provided by this cutting-edge communications development platform have enabled us to effectively handle more than 4.4 billion calls for our customers, and to deliver breakthrough applications that have helped our business grow,” says Rich Oldach, executive vice president of strategic marketing at Volt Delta.
Video technology could also be a key enabler of call center interactions with deaf and hearing-impaired customers. Communication Services for the Deaf (CSD), a provider of video relay services that use video interpreters to translate IVR dialogues into sign language, has been using CosmoCom’s IVVR for some time.
“For our hearing-impaired callers, IVVR is not a luxury, it’s a necessity,” explains Benjamin J. Soukup, CEO and president at CSD. “It’s the only way we can offer the kinds of greetings and caller options that most call centers provide with IVR.”

VoiceXML is a Web-based, open-standard markup language for creating audio dialogue states. But for all that VoiceXML has brought to the market, it is not without its limitations. Suited primarily for call center applications that require only basic input from the caller and that deliver only basic audio information to the caller, VoiceXML does not allow for multiparty conferencing or complex frequency analysis used for outbound call progress detection; nor does it allow users to pause, rewind, fast-forward, or control the speed and volume of audio files for playback or to begin playback of audio files at an arbitrary point. It also does not allow for frequent call transfers to other phone numbers or outbound calling.
To handle those kinds of functions, the World Wide Web Consortium (W3C) developed the Call Control eXtensible Markup Language (CCXML) to provide the call management, event processing, and conferencing capabilities that VoiceXML lacks. These include call routing, find me/follow me capabilities, call bridging, selective call answering, conferencing, dialogue execution, and coaching, whereby a third party can connect to a call but be heard by only one of the participants.

Tim Moynihan, vice president of global marketing at Envox, expects to see many more speech application vendors climb on board the video train. “We’ll see development kits that include telephony and VUI design, but there will be other technologies like video and SMS thrown into the mix,” he says. “We’re moving fast from a world that has been voice- and email-centric to one that includes video and everything else. Speech is an important component of most applications, but it’s not the only one.”

Daniel Hong, a lead analyst at Datamonitor, agrees. “As we roll out the tape over the next several years, multimedia communications will gain greater momentum in the enterprise and service provider markets. Companies that are considering video as part of their multimedia strategies should look to those vendors that have a clear technology and business differentiator on the platform and performance level, which will drive greater innovation and success on the application layer,” he said.
The advent of IVVR and other video applications tied to speech is largely intertwined with the increased amount of bandwidth available and the growth of 3G mobile phones and computer-based softphones, both of which now offer advanced video capture and playback functions. Advanced 3G mobile phone networks are more prevalent in Europe and Asia, but are starting to take off in the United States as well. Those phones are no longer just for talking; they are becoming mini entertainment centers, with mobile and Voice over IP providers offering TV shows, video clips, music video downloads, video mail, games, and much more to handsets.
“As a result of the rapid evolution of IP telephony, along with other innovations such as IP-based video messaging, enterprise and personal communications are quickly expanding beyond a collection of distinct communications methods or channels to a single communication infrastructure with multifaceted end points,” Envox’s Flanagan explains.

Brought on by VoiceXML

These new video-enabled technologies, and the development tools that go with them, are also closely linked to the advancement of the Voice eXtensible Markup Language (VoiceXML) and the subsequent Call Control XML (CCXML) development platforms.
Though originally designed for creating audio dialogues, VoiceXML can manage both audio and video content together, and because of that, speech application developers are becoming more motivated to adopt these open-standard, Web-based platforms to deploy multimedia applications.
Because those platforms rely on open-standard, Web-based architectures, developers can use one tool to connect to many different applications, networks, hardware devices, even back-end customer relationship management (CRM) and enterprise resource planning (ERP) software, data warehouses, and other collaborative systems. And they can do so without having to bring in different languages or modify existing languages to create separate voice and video paths. That’s because in VoiceXML, the video resources are no different from the audio resources from a language perspective; the same basic syntax and semantics apply, and the change usually just involves substituting the word “video” for “audio” in the code, according to the VoiceXML Forum.
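A rough illustration of that substitution follows, written as a short Python script that generates two small VoiceXML-style documents. The `<audio>` element is part of standard VoiceXML; the `<video>` element here is an assumption that follows the article's description, and the exact tag and attributes a given platform accepts may differ. The file names are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_prompt(media_tag, src):
    """Return a minimal VoiceXML-style document that plays one media file.

    media_tag is "audio" (standard VoiceXML) or "video" (assumed here to be
    a platform extension that mirrors the audio element).
    """
    vxml = ET.Element("vxml", version="2.1")
    form = ET.SubElement(vxml, "form")
    block = ET.SubElement(form, "block")
    prompt = ET.SubElement(block, "prompt")
    ET.SubElement(prompt, media_tag, src=src)
    return ET.tostring(vxml, encoding="unicode")


# Same structure both times; only the media element name and file change.
audio_doc = build_prompt("audio", "welcome.wav")  # hypothetical file name
video_doc = build_prompt("video", "welcome.3gp")  # hypothetical file name

print(audio_doc)
print(video_doc)
```

The surrounding form, block, and prompt structure is identical in both documents, which is the point the VoiceXML Forum is making: the dialogue logic does not need a separate code path for video.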

Tools Simplifying Videogame Development Too

An area that has seen tremendous interest in voice technology of late is the videogame arena. The speech industry has responded, and a number of application vendors have recently launched toolkits to help game developers incorporate new speech capabilities into their online and console games.
Among them are Vivox and Fonix, two of the leading suppliers of voice applications to the game development community. At the recent Game Developers Conference in San Francisco, both companies announced special speech application development toolkits specifically designed for the gaming industry.
The Fonix VoiceIn Game Edition allows developers to add voice recognition to their game programming. Also included are a version of its popular DECtalk text-to-speech (TTS) application that allows developers to use TTS voices as placeholders in the programming until recorded dialogue can be added; VoiceIn Phonetic, which aligns phonetics to audio data to allow animators to match a character’s facial expressions with the phonetic components of the speech recognition; and VoiceIn Karaoke, which compares the timing, pitch, and tone of a singer’s voice to the reference song.
To date, Fonix speech technology is featured in 20 games for Microsoft’s Xbox, Sony’s PlayStation, and the PC. “Fonix continually looks for ways to provide developers with cutting-edge tools to bring interesting, new interfaces and features to videogames,” says Tim Hong, vice president of games at Fonix.
Vivox used the conference to launch its Jumpstart Program, developed with IBM and Online Game Services Inc. (OGSI). Jumpstart promises to simplify the process of launching voice-enabled online games and offers a beta testing platform as well.
“Game developers have a lot on their plate and need efficient and effective ways to integrate voice communications without compromising their development cycle,” said Rick Frye, vice president of sales at Vivox.
“As voice becomes required in online games, the Jumpstart Program helps to accelerate the integration process and mitigate technology risk. We are looking forward to making it easier, faster, and cheaper for online game developers to implement voice with this dynamic alliance,” added James Hursthouse, CEO of OGSI. —LK

Once created, those elements can be stored, reused, and redeployed across virtually any network or device as needed. That will speed time to market for most products. Envox claims that its Envox 7 tools can reduce deployment time by as much as 50 percent, while Convergys claims savings of 35 percent to 40 percent with its ExpressWare toolkit, a VoiceXML-ready collection of tools for creating more than 30 reusable, drag-and-drop speech elements.
Among vendors, “new speech applications and development tools are increasingly making the move toward open standards,” observes Bill Meisel, president of speech application consulting firm TMA Associates. “VoiceXML is the way people are going for speech, as it continues to be developed very vigorously.”
That’s welcome news for system developers and the enterprises that deploy speech applications alike. Depending on the vendor, there are literally dozens of platforms currently available. There are so many, in fact, that “it’s getting harder to separate the tools from the platforms,” Meisel maintains.
The growing support for VoiceXML and CCXML as standard platforms for speech applications by speech application development tool providers is the start of a reversal in a long-standing tradition in the industry. For years, speech application vendors only developed tools that supported their own proprietary systems. “Tool vendors want to put together a suite of products that only work with their own systems,” says Jim Larson, an independent speech consultant and VoiceXML trainer, “primarily to expand their market and lock people into using their systems and tools. They also get locked into using a specific platform.
“There have been a lot of great tools, but most have had very limited interoperability,” he continues. “If the editing software and the graphics application are not working together, when you make a change in one, the code is not changed in the other portions of the application because the tools are not talking to one another.”
“Over the last couple of years, we’ve seen a bigger push toward unifying tooling platforms, moving away from monolithic solutions to disaggregated tool sets,” Cisco’s Bergelson says. “Standards are also encouraging independent innovation. With proprietary systems you had to rely on one vendor to do it all; now you can purchase from multiple vendors.”

There’s More to Speech

Despite the big emphasis on tools, though, Larson is quick to point out that enterprises and developers should not rely solely on them for the creation of speech applications.
“In reality, most of the work in designing a speech application involves writing down opinions and getting feedback.
