The Big Strike or Fool's Gold?

“The mobile streets are paved with gold,” some voices in the media seem to be clamoring. For example, in April, both The Telegraph in the United Kingdom and The New York Times ran headlines lush with 1849 California Gold Rush imagery: The Telegraph with its “Apple’s iPhone Is Gold Mine for Developers,” and the Times with “The iPhone Gold Rush.” The Times even began its article, “Is there a good way to nail down a steady income? In this economy? Try writing a successful program for the iPhone.” Reading some accounts of the mobile development market, you can almost see entire rivers of long-bearded and red-flanneled developers panning for gold in iPhone streams.

The Times Gold Rush piece was about Ethan Nicholas, maker of the popular iShoot game. In one day the game managed to net its creator around $35,000—a veritable fortune if it remains sustainable in the long term, which is rather dubious. While some applications, in general, seem to have high initial uptakes, they often tend to fall off in downloads after an initial burst of enthusiasm, says Datamonitor associate analyst Ryan Joe, speaking about no application in particular. That is to say, on the off chance that one does manage to strike it big, which is rare, the chances of it holding on to any semblance of permanence are even slimmer.

For all its implicit promises of gold in them thar hills, the mobile development space deserves a hard and skeptical look. For one, it’s easy to hype, given its tremendous growth of late. In a 2008 report, Nielsen speculated the mobile space had “reached a critical mass as an advertising medium,” with 40 million active U.S. users of mobile Internet technology in May of that year. This year, Nielsen reports nearly 60 percent of mobile Internet users in the U.S. and Western Europe intend to increase their use during the next two years, while more than 25 percent of nonusers intend to adopt during that same period of time. Moreover, in the next 12 months, 39.4 percent of U.S. users intend to increase their reliance on mobile software and applications.

All of the growth that has been generated is more or less brand new. With the advent of the iPhone, an entirely new segment of users beyond the workaday BlackBerry addicts is buying high-end smartphones. According to a 2008 Nielsen report, Apple’s iPhone, which was just slightly more than a year old at the time, had 4 percent of the overall U.S. mobile Internet market captured—second only to Motorola’s RAZR and RAZR2 phones, which had 10 percent market share. The iPhone, however, had the fastest growth, and was set to outstrip Motorola. In the third quarter of 2008, the iPhone also managed to edge out the BlackBerry as the most-used smartphone, according to The Industry Standard.

Out in Front

No matter which device manages to pull ahead in the ensuing race between the iPhone and BlackBerry Storm (or even Palm’s dark horse, the Pre), the fact is that phones are becoming more powerful, and more will be possible in the application space.

As they reach further into the consumer market, those applications might be a true opportunity for speech—which, at best, has faced an indifferent consumer market—to make real headway. Datamonitor predicts that revenue for automatic speech recognition in mobile handsets will grow to $32.7 million this year, up from $26.1 million in 2007, despite the current recession. The mobile app could prove to be the foot in the door that gets consumers thinking about other ways they might speech-enable tasks. If it’s of good quality, too, it might even turn the tide from what some see as a negative public perception with regard to recognition accuracy.

That is starting to happen already, as Nuance Communications recently discovered. The company commissioned a survey, “Moments of Need,” which, among other things, looked at how speech services, like voicemail-to-text, could penetrate further into the market on smartphones. The survey found that while placing calls and texting were the features most frequently used, some segments use their phones for social networking, games, music, and purchases. In fact, 70 percent said they plan to use those features more in the next two years.

“We were encouraged by that because it means that if we get people to start using messaging, we can get them to start thinking about voicemail being [received] visually,” says Eric Collins, vice president of marketing for Nuance Mobile. “They’re willing, often, to accelerate that adoption of new behavior. We think of all that as being very encouraging in terms of what we do and what we sell.”

That said, because this is a growth industry sector, it’s easy to extrapolate outrageous growth. As growth happens, though, some real challenges will face speech along with all of the opportunities.

One of the biggest and most immediate is monetization. While the media are quick to grab eyes with headlines promising untold riches, they also acknowledge the difficulty of really making it as one wades deeper into the bodies of their texts. The Times story, for instance, notes that Nicholas’ success is rather difficult to achieve, especially as more developers enter the space and glut the market, often with free offerings. If you need proof of how crowded the space has become, then look at the dozens of applications in Apple’s App Store (the largest such venue) dedicated entirely to just posting on Twitter. How can any one of them hope to be profitable?

One answer may be speech, suggests Daniel Hong, lead analyst of customer interaction technologies at Datamonitor, who notes speech might be a real differentiator in a sea of similar products.

“Speech improves the functionality of applications, especially in hands-free, eyes-up environments. I don’t think speech is a killer app for the iPhone; rather, it improves the overall [user interface] between user and iPhone and iPhone app,” he wrote in an email.

The industry has a number of approaches from which to choose as it tries to monetize some mobile applications: direct sales, software as a service (SaaS), ad-driven (though these have not really come into full fruition yet), and simply as a loss leader to another service or product.

“If it’s device-resident, people are willing to pay for it,” says Irv Shapiro, CEO of Ifbyphone, a hosted voice application and platform provider. “I have a couple of different calculators, and I’m willing to pay for those because they’re standalone on my device, and they do things I want them to do.”

For Shapiro, if an application just gives access to another service, like a FedEx package tracker or a newspaper reader, it doesn’t make sense to pay for the application. The application ought to be drawing customers to the service and generating revenue in some indirect way.

Shapiro’s company models its own pricing as a mix of SaaS and loss leader.

Ifbyphone offers a number of hosted telephony solutions that include voice broadcasting, which enables users to automatically deliver interactive phone calls to a given list of people or entities. The application is free, with Ifbyphone making its money from associated value-added services. With the voice broadcast application, for instance, users get 100 minutes for free. If they go over that time, they begin paying.

“The voice broadcast app is very popular among deejays in Manhattan,” Shapiro explains. “They’ll do gigs at various bars and…what they’ll do is use our iPhone-based ‘vodcast’ app to send out viral messages to 200 people saying, ‘Tell all your friends to be at this bar tonight.’ Well, we’re not making any money off those guys, but of the 200 people who go to listen, a handful of them work for a little larger enterprises, and they say to the guy, ‘How’d you send me that message?’ He says, ‘It’s this cool thing by Ifbyphone,’ and before you know it we have someone who is a second or third relation to the original little app who has become a customer paying us tens, hundreds, or thousands of dollars a month.”

How often that kind of scenario pays out for Ifbyphone is unclear, but the up-front investment is ostensibly low, so even if the company drummed up very little business that way, it might still make sense on a balance sheet. Software development kits (SDKs) are another way that vendors can capitalize on the mobile space. They also give vendors’ technologies a larger mindshare. Many vendors, including Ifbyphone and Vlingo, which offer solutions that voice-enable any text field on a phone, have looked to that model to generate extra revenue.

“I was with a customer yesterday,” says Troy Cross, director of sales for North America and Asia at Vlingo. “We were asking him if he thought speech was strategic. His answer was interesting, and one that I liked. He said, ‘Is a keyboard strategic? Maybe not, but you need one.’”

It’s an apt metaphor. Like speech, a keyboard is just a modality, and if speech becomes a central one in the mobile space, then it might be as important as a keyboard is to a mouse. However, whether “strategic” translates to “profit” is hard to say. Many of the startups writing applications for the iPhone or the BlackBerry Storm are not publicly traded companies, so there’s no way to know whether they’re making money. The same is true of Vlingo, which has its own suite of voice-enabling applications. The company is not public, and analysts will not readily speculate on whether it is profitable—though it has managed to snag some major partnerships, including last year’s $20 million contract with Yahoo! to speech-enable the oneVoice mobile search application.

Gotta Have Faith

Even if speech does provide a boost in sales for a third-party developer, that doesn’t necessarily mean that a company will start bringing in money hand over fist or even meet costs. The sale of SDKs speaks more to the fact that many people are willing to try to do something with speech than to its success. Still, faith is starting to build.

Another potential shake-up in the mobile space, and for speech at large, is the entrance of players like Google, Yahoo!, and Microsoft. Asked about what they might mean for speech, one analyst responded simply, “Let the games begin.”

In March, Google launched Google Voice, which, among its services, provides users with free voicemail-to-text transcription. Unlike competing services (which are often paid), the process is entirely automated, and Google makes no claims about its accuracy. Despite the fact that it’s free, most analysts seem to agree that the move will be good for even voicemail-to-text competitors.

“By having a company with the marketing muscle and presence of Google behind it, that’s going to call attention to this particular solution and make a lot of people try it,” says Bill Scholz, president of the Applied Voice Input/Output Society (AVIOS). “People will find themselves satisfied with the free product, and others will find—let’s say—that they got exactly what they paid for and would rather have significantly higher quality. They will start knocking on the door of SpinVox, PhoneTag, Yap, GotVoice, or MyVoice—there are so many of them out there.”

Google’s model is not high-touch, but relies instead on self-subscription, Shapiro explains. “There is a certain amount of support, but what it does is introduces concepts to millions of people very rapidly, and then those areas that can be commoditized, I would be concerned,” he says. “If I were providing directory assistance, I would be very nervous about Google moving into voice.”

Unofficial Confirmation

Many more see it as a confirmation of speech’s importance, a significant inroad and validation outside of the call center space or warehouse in which speech has historically been hemmed in. The more Googles, Yahoo!s, and Microsofts we see in the space, the better, they say. Others, particularly smaller players, feel that the technology giants just might be able to knock off some of the firms that have come to dominate the sector. But as speech makes strides with big names and looks to make bigger market penetration, there is one sector where it has notably stalled: among teenagers.

“Have you watched teenagers and seen how fast they can type? It’s amazing,” Scholz says. “They probably type with two thumbs as fast as you can with 10 fingers.”

Scholz suggests that the speed with which many teenagers can text using a mobile-sized QWERTY keypad might partially nullify speech’s wider use-value for much of the demographic segment. While he doesn’t have data to back his assessment, based on conversations he has had and anecdotal accounts, Scholz says it looks like the “antispeech sentiment is moderately strong” among teens.

“Accuracy, Speed,” a PowerPoint presentation of a 2008 Vlingo-commissioned usability test, shows results that seem to support Scholz’s thesis that full mobile keyboard typing might be as fast as speech, and—though Scholz doesn’t say this—more accurate to boot.

Vlingo found the median speed to enter a text message for a spoken interface was about five seconds, whereas typed on a full QWERTY keyboard on a BlackBerry it took closer to seven. When the same users were asked to rate the acceptability of modes from one to five (one being unacceptable, and five being very acceptable), voice came in at slightly higher than four, and typing on the full keyboard came in just below four. Thus, in terms of speed the two modalities seem fairly evenly matched.

In accuracy, typing on a QWERTY keyboard fares better. In the study, the percentage of times a text message was sent correctly on the first or second try was just below 100 percent for the full BlackBerry keyboard, while rates for spoken messages varied across devices from the high 80s to the mid-90s. Rates for spoken messages were lowest on a full keyboard BlackBerry.

But regardless of what that might suggest, Vlingo rejects Scholz’s analysis.

“What we’ve seen is that young people like to message,” Cross says. “How they message is not the issue. [Whether] they really like triple-tapping or they really like typing on QWERTY keyboards, they like to message. If there’s another mechanism that helps them do that faster or easier, we found that they’re as excited about it as anybody.”

Looking further into the commissioned findings, Vlingo’s data shows that more than 50 percent of full keyboard users in the study preferred speaking to typing on a BlackBerry.

Collins, for his part, is glad Nuance has not only hedged its bets but invested in both speech and predictive text. “We’re investing in the idea that there might be, for the sake of a demographic or the sake of privacy, for instance, the need to use at any particular moment a type of input modality,” he says.

Private Moments

Collins describes the lives of teenagers as being fraught with privacy perils—shared bedrooms with siblings, homes with watchful parents, schools with teachers and adolescent crushes—in short, a kind of Panopticon where one doesn’t know if he is being overheard or watched. For teenagers, voice clearly doesn’t make sense. The question for speech, then, is whether teens who learn to pound through QWERTY, or predictive texting, or even Pearl, will be less likely to use speech as adults.

“If you’re seeing young people on QWERTY keypads, then we’re still going to see them when they get to the business world,” Collins predicts.

He adds, however, that teens might end up using voice for email, to get directions, or to check on flights. Whether one uses speech or not is as much a function of age as it is of what the phone is being used for, the device’s form factor, or even the user’s rate plan.

Collins’ assessment is a cautious one, which seems typical of the company for which he works. Nuance seems to be hedging its bets all around with regard to developing third-party applications. It hasn’t entered the direct-to-customer market, preferring instead to work directly with carriers and original equipment manufacturers.

The way Nuance explains its strategy, the company’s partners (which include heavyweights like Verizon) already have a strong market presence and reputation. They can most effectively move products, which leaves Nuance to focus its efforts on building technology. How well that approach is paying out is unclear, though.

For now, the mobile space may actually be like the California Gold Rush. For certain, money is there for the taking. That seems to be borne out by any number of studies and statistical analyses. But just how and where that will apply to speech is only in the beginning stages of making itself apparent—and that process is slowed by the secrecy surrounding the players in the field.

With regard to larger companies—those beyond the one-man Ethan Nicholases of the world—we don’t know with great certainty just who is making a killing and who is getting murdered. It will likely take some time and a few bankruptcies before the field settles. Expect to see plenty of losers at the end of the day, but for the few winners, the gambit will just as likely seem entirely worth the risk.
