In the first issue of the cyberpunk comic Global Frequency, a Russian émigré wanders the streets of San Francisco with a bomb in his head. The bomb has the power to destroy the entire city. An agent is sent to confront the threat, maintaining contact with his superiors on his mobile phone. As he approaches the Russian, the agent’s handset spits out a diagram of the Russian’s brain, detailing exactly where the agent needs to shoot to neutralize the threat. With the information in his palm, he completes his mission successfully, with minor damage to the vicinity and slightly more damage to the Russian’s head. Yet the point of interest here, unless you live in San Francisco, isn’t the bomb; it’s the handset.
The ability of mobile devices to conduct phone calls and data transfers (videos, static images, and text) simultaneously has in recent years become more science fact than science fiction. It hasn’t saved a city yet, but it can certainly make life easier for the consumer and provide a significant source of revenue for enterprises that know how to harness the technology properly.
Vendors and analysts are excited about the potential. Combining video and voice gives enterprises greater branding opportunities and a new forum for up-selling products and services. Telecommunications companies and service providers will also discover greater earnings with the addition of value-added services, such as video ringback tones. And most important, consumers will see improved service with the integration of video into their handsets.
Though many of these applications remain conceptual in terms of live deployment, the interest in video telephony extends back decades. AT&T first presented a videophone during the 1964 World’s Fair in New York. It was not a rousing success. But earlier this year, AT&T introduced to the American marketplace 3G phones that are capable of harnessing the company’s Video Share program. The service allows users to swap live video over their handsets as they continue their voice calls. In a promotional skit, a woman uses a phone to send a live video of a workplace crush to her friend, who is on the other line. Together, they pass gleeful judgment. But unlike the 1964 videophone, AT&T isn’t the only company taking this technology seriously.
One factor that eases the integration of video and voice is VoiceXML. While initially designed for audio data, VoiceXML can now support video applications, and the advent of VoiceXML 3.0 promises greater functionality and support for multimodal features. Consequently, VoiceXML looks to become the standard that will drive interactive voice and video response (IVVR).
However, while a common programming language is crucial for interoperability, it’s only a fragment of what is necessary to maintain a robust network that can withstand a deluge of video from a variety of carriers. The success of IVVR is largely contingent on the success of the third generation of mobile phone standards and technology, or 3G. And the swift, reliable use of video requires 3G handsets on a 3G network.
"You have to have that road, that 3G network, built out," Datamonitor analyst Daniel Hong says. "You can, right now, with a 2.5G network spit out an image to the phone. But it’s going to be relatively small and you’re using [Wireless Application Protocol, or WAP] to do that. If you’re thinking about the Internet, you can’t do much with a 14.4 modem. You have to have full broadband to utilize video. It’s the same thing."
And callers must have the hardware to support a 3G network. "Having IVVR on a 2.5G phone is kind of pointless," Hong continues. "It’s like having a Ferrari on a congested street."
The FCC began its first auction of Advanced Wireless Service (AWS) spectrum licenses in August. Five weeks and 161 rounds later, T-Mobile, as expected, emerged as the top bidder. For $4.2 billion, the mobile networking company staked its claim in 120 3G mobile licenses. In terms of population, T-Mobile essentially purchased a 3G network covering 474 million people, including the superurban areas of New York, San Francisco, and Chicago. The completion of this rollout is expected in 2008.
Ultimately, T-Mobile moved in the interests of gaining the foothold necessary to stay competitive. Three other major carriers (Sprint/Nextel, Verizon, and AT&T, known as Cingular at the time) had already entrenched their 3G networks. According to data provided by the CTIA Wireless Association, 3G networks in America are highly functional. AT&T Broadband Connect serves 165 cities, including 73 of the top 100 markets.
Sprint/Nextel’s EV-DO service, a wireless high-speed data service launched in 2005, currently covers 201 million people, while Verizon’s EV-DO service covers 210 million people. Additionally, Sprint is looking beyond 3G by rolling out a trial of its 4G WiMax network.
But there’s a hitch: None of these networks is compatible with the others. "America is not a standards-based market," says Sanjeev Sawai, chief technology officer and vice president of research and development at Envox Worldwide. "In Europe and Asia, there are 3G network standards that allow you to do video transmissions. In America, most of the large carriers have proprietary technology that they buy from vendors." In other words, that smitten employee with AT&T Video Share can showcase her prospective beau to her friend only if that friend also has an AT&T 3G handset. If the friend has a 3G handset from Verizon, then the lonely employee is better off talking to the microwave.
"You have an application that works on a specific application on a specific device," says Michael Perry, director of voice self-service at Avaya. "That’s not really the kind of application our customers are looking for. I think that’s really what’s keeping the technology from being adopted in the United States."
So a large problem with mainstreaming IVVR in America is not so much the expansiveness of the network, nor the technological capabilities of the handsets; it’s American business behavior.
In Europe and Asia, telecom providers themselves don’t roll out applications—they willingly team up with partners. Since technology is largely proprietary among American telecommunications companies, vendors trying to sell their solutions must first go through a rigorous qualification process with each carrier. Each application must be specifically tailored. And because of this territoriality, it’s unlikely that the American 3G network will standardize in the near future. Telecommunications companies simply have too large an investment in their own equipment.
That’s not to say North American adoption of IVVR won’t happen; its progression will simply deviate from European or Asian models. "Each carrier will have their own camp of preferred providers," Sawai says. "And that’s how it will happen. It won’t happen as it does in Europe and Asia, where people build applications and servers and they can sell it to any carrier they want, where different carriers can run the same application. Because of the different business behavior and standards issue in the U.S., each carrier will develop its own set of partners who will provide their own set of applications only on their network."
Where Are We?
Another issue is getting American consumers to buy in. The price of handsets isn’t obscene and, according to an AT&T spokesperson, it will naturally drop as the company adds more handsets to its collection. In fact, one of its phones retails for $49.99 after a two-year contract and rebate.
The price of service, however, might cause some people to sit on their wallets. "Video, ultimately, involves downloading data," says Vijai Shankar, product marketing manager of the Genesys Voice Platform at Genesys Telecommunications Laboratories. "So for wireless subscribers, if they’re going to be downloading chunks of data, it’s going to be a cost component to their monthly plans." He adds that the quality of video clips sent through 3G networks is decent, "but it’s definitely not like a T1 sort of thing." In other words, American subscribers might want more bang for their buck.
And while many leading IVVR platform and software vendors (such as Avaya, Comverse, CosmoCom, Dialogic, and Hewlett-Packard) are based in North America, most of their IVVR applications are deployed in Europe, the Middle East, or Asia. Dialogic, for instance, is currently testing an IVVR system in Portugal that allows callers to see the customer service agent.
Still, Perry sees great American interest. "I think you’re seeing a lot of customer demand for it," he says, citing estimates that America is only six to 12 months behind Europe and Asia. "I think the market’s ready. We just need the carriers and providers to roll those services out."
Others are wary about the American market. Jim Machi, Dialogic’s vice president of marketing, surmises that America’s lag with IVVR is due to the time it has taken Americans to acclimate to a visual interface on their handsets. "It’s different not because we’re backward, but because our culture is a little different," he says. "We tend to get around in our cars"—which would inhibit video usage. "We’re not on a train for a long time."
"We haven’t seen as much take-up of true video services," adds Scot Harris, Intervoice’s director of global product marketing. "There’s a lot of interest in the U.S. in streaming video, but we haven’t seen quite as much interest in video mail and other types of video services as we have in international markets."
Datamonitor’s Hong adds that the technology has yet to mature. "Right now, there’s no killer app," he says. And because there’s nothing that end customers really want, device penetration is limited. "It works both ways," Sawai agrees. "On the supply side and demand side, both of them are lacking right now."
But Perry estimates that once video IVR transactions become more refined, customer demand will drive the market. "You’ll see the networks evolving, you’ll see the devices become more ubiquitous, and you’ll see some great applications rolling out," he says.
That American IVVR seems to ride the end of the caravan is expected when one considers that mobile technologies typically gain their following in Europe or Asia before reaching American shores. The advantage for Americans is that this allows vendors to study IVVR deployments in other countries and develop a long-term plan for an eventual penetration into the North American market.
Texting, for instance, was massively popular for a few years in other countries before it reached ubiquity in the U.S. In 2002, Europeans exchanged 30 billion SMS messages by phone per month, while that number reached only 176 million in the U.S., according to Scott Ellison, program director of wireless and mobile communications for analyst firm IDC. Ringback tones, which play music to callers in lieu of the traditional ringing sound, are widely popular in Europe and Asia, and, in fact, were predicted to become a $1.5 billion market in Europe in 2005. Those ringback tones are just starting to gain traction in America. To build off this popularity, Dialogic is currently working on video ringback tones.
"We’re actually seeing some interest in video SMS," Harris adds. "If I’m taking a video of something interesting or funny that’s going on, I can send that video to you with that notification. Or let’s say I’ve got a news service. CNN and a lot of the other news services are constantly looking for a person-on-the-street view of what’s happening in the world. Video SMS makes it really easy to send them a video."
Envox currently has two live deployments, both video messaging services that allow users to swiftly record and share video. The first is in Singapore. The second, released in mid-September, is in Europe.
Are We There Yet?
It would be a mistake, however, to assume that those happy citizens across the Atlantic live in a little matrix of mobile video. "I wouldn’t say [video telephony is] commonplace at all," Sawai says. "It’s just about coming, but only gadget-savvy folks are using it. But it’s not commonplace by a long shot. We’re [still] proving the technology. Once we prove it and make it reliable and good quality, then all the creative applications will come up."
While Europeans and Asians certainly leverage their 3G technology more than their American counterparts, IVVR remains in a phase-one capacity. In all, very few IVVR solutions are currently available. A 2006 Datamonitor survey noted 30 to 40 live deployments of IVVR solutions and 50 to 70 pilots. Most of these deployments and pilots are small, as vendors study how end customers react.
Still, this should not scare off developers, because the incentive to continue with IVVR is great. "The next generation is already video-centric," says Jim Larson, an independent consultant and VoiceXML trainer. "Having been raised with kid TV shows that present short, intensive video clips to keep them entertained, the fast-paced videos on MTV and the noisy video background of CNN’s Situation Room are preparing people to demand video."
The popularity of streaming media and online video sharing, through Web sites like YouTube, emphasizes the point. And it’s hard to deny that, regardless of past trends, the appeal of video on mobile devices is starting to grow. Recall the recent iPhone commercial, in which the handset played a YouTube clip of a skateboarding dog. It’s notable, however, that the iPhone’s WiFi device and phone aren’t integrated, and that one must be turned off before the other can be used.
But developers and analysts are excited about IVVR’s potential for self-help, particularly in buying tickets for sporting events and concerts. Avaya, for instance, is demonstrating an application for booking stadium seats. The system pipes out a video to callers showing the view from various spots in the arena.
Intervoice is in discussions with European clients on a billing-related application that would allow callers to access their financial reports and statements over their phones. Here, the IVVR would connect an enterprise directly to its end customer. For instance, if Bank of America were equipped with an IVVR and the caller had a 3G phone, the customer could dial the bank, request a copy of his bill and, while still maintaining the call, receive an image of that bill directly on his handset.
And in two months, Envox plans a European deployment of a voting application to coincide with reality shows like Pop Idol and Big Brother. The application would allow viewers to vote as they watch the shows on their handsets.
Genesys’ Shankar predicts that the next step for IVVR will be in streaming media—an area where he, like Intervoice’s Harris, has noticed high American demand. "Hopefully it’ll happen in the next six months or so," Shankar says. "More guys are looking initially to start out with video-on-hold and video prompting capabilities, which is our initial first step. I think in streaming, they plan to look at it more at a later stage. I really don’t know what’s holding it back besides them trying to see the market first."
A great deal of IVVR, and how best to market it, remains speculative. Harris wonders if a more effective plan would be to "push" information at callers. "A lot of people are seeing [IVVR] as a pull-type application," he continues. "I want to go some place, I want to get information, so I’ll pull it down."
And pushing information at callers allows for more branding and up-selling opportunities. For instance, if an individual is interested in a preview of next week’s "Gossip Girl," he can, after viewing the video and while hooked into the IVVR environment, access a menu of previous episodes. Or, he might be referred to a shopping environment where he’ll have the option of buying "Gossip Girl" merchandise. In this way, the customer experience becomes totally integrated within the handset.
"We think part of the reason there hasn’t been much success is because of the pull nature that most visual IVVRs have right now, as opposed to offering somebody an opportunity to talk about their preferences and subscribe to services," Harris says.
While vendors are particularly excited when detailing their concepts, the lack of technical standards currently in place could prove problematic. "The standards that are in place right now have a lot of gaps and need to be filled in as well," Sawai says.
Though Envox already has two live deployments, Sawai insists it will need at least 10 before it can start firming up standards. Until then, he says, "we really need to bring together best practices of the content world and the media world and mobile telephony."
However, developers recognize some hurdles. For instance, the logistics of operating a phone and watching a video at the same time can be complicated and would necessitate the use of headsets or Bluetooth technology.
Additionally, as Larson points out, developers still need a way to capture video so that it’s legible. While designers have plenty of experience working off the Internet, the small screen on a handset creates new problems, and information needs to be distilled and clarified. This makes sense when one considers the way in which individuals use their handsets to access the Internet. The space afforded doesn’t provide the same browsing experience that a desktop or laptop computer provides. And by virtue of being mobile, users rarely have time to sift through a wealth of information; they want their data quickly and efficiently.
"The broad conclusion I’ve come to is that content has to be tailored to the scenario," Sawai says. A lot of equipment will be needed, for instance, to translate TV- or DVD-quality video into something viewable on a small screen, and even then the result isn’t always clear. The image "can swim a little," says Robert Finan, principal solutions engineer at Genesys’ Europe, Middle East, and Asia unit. "So you have to make sure that none of the details in the page are too small—otherwise they may not be legible." And developers, according to Harris, absolutely must test their applications on as many different handsets as possible to determine what the end user’s experience might be.
To that end, Hong believes that analysis of customer behavior is crucial. Thus it’s prudent for businesses to continue deploying their applications in smaller stages to prevent enterprise overuse of IVVR—a common symptom of any new technology. After all, just because people have it doesn’t mean they’ll want to use it. Sawai mentions that even in America, it’s possible to watch television on a handset. "But how many people do that?" he asks. "Is the experience good enough?"
"Remember when fonts were first introduced?" Larson adds. "Documents looked like ransom notes with all types of fonts and colors. We’ll have some terrible applications where video is misused and overused, but those applications will fall by the wayside as developers create reasonable guidelines for using video."
The question is not whether IVVR will happen in America; analysts and developers unanimously agree that it will. The real question is when, and that elicits estimates ranging from one to five years. "We have it on our roadmap at this point," Intervoice’s Harris says. And customer interest remains high. "I want to accelerate it very quickly," he adds. "There are other things that are generating money today that have a greater interest level from network service providers, but [IVVR] is near the top."
His sentiment is popular, as the mobile environment is currently headed toward the integration of voice with all other forms of data. "You don’t want to be caught flat-footed when the marketplace really does catch onto it," he says.
Sawai takes it a step further. He anticipates that in two years, someone will have developed a killer application that everybody will want; in five years, it will become so commonplace that even Americans will take it for granted.
"I think it’s going to happen a lot faster," Avaya’s Perry says. He foresees practical applications rolled out en masse within 12 to 18 months. "This is something the industry has been talking about for years. Video will be explosive. Look at the explosive growth that’s going on in self-service right now. Just look at how these things are coming together. The companies are looking to differentiate themselves in several ways on the quality and value of the service they give their customers."