March 1, 2008
By Moshe Yudkowsky President - Disaggregate
Industry View

The Next Small Thing

Many years ago I purchased my first car. When I brought it home, my father took one look at it and said, "Toyota? Whoever heard of Toyota?"

Although we laugh today—and my father drives a Toyota now—at the time he was absolutely right. In the United States, Toyota was a relatively obscure company with a sparse dealer network. So how did Toyota reach the success it now enjoys? The company didn’t gamble on a single monstrous success—the next big thing—but instead pursued something it could definitely do—many small but important things. In incremental but significant steps it improved quality, managed prices, expanded its dealer network, and built stylish cars.

Every week I hear about the next big thing in speech technology; most of them are "visionary" new applications or "explosive" new market segments. To put it generously, I find that people promote their ideas a little too enthusiastically. As an antidote, let’s talk about the next small things instead. Small is easier than big. Big is out of your control; small can happen now.

A few slow trends will have enormous impact. Perhaps the most important in the U.S. is the aging population. The pool of young, low-cost workers will diminish and encourage adoption of cost-saving speech technology. An increasing elderly population will require geriatric care and spur the search for innovative solutions, one of which can be speech technology.

Teens and Bachelors
Another slow but increasing social change is the tendency of individuals to live by themselves until rather late in life. Many of these individuals satisfy their needs for human interaction through cell phones, instant messaging, Twitter, social networking Web sites, and other text-based services. Speech communication is in relative disfavor with the coming generation. Your teenager likely favors text over talk, but is that just a fad, a lack of social speech applications, or a fundamental limitation of speech technology?

Speaking of technology, we’re starting to see incremental improvements in the way speech cooperates with other methods of sending and receiving data. Mixing different methods to find the best and most convenient means to communicate is now second nature. How often do you follow up a phone call with email, or vice-versa? Most of my business telephone calls are now simultaneous instant messaging sessions as my callers and I trade files and snippets of text. I have yet to see a big thing from these multimodal interactions, but companies continue to release incremental improvements, demonstrations, and products.

The quality of speech technology continues to depend on small improvements and increased computer power. The accuracy of speech recognition and biometrics and the intelligibility of text-to-speech increase as vendors introduce slightly more refined technology on a regular basis. Not that I’m complaining; the cumulative effect over the past two decades is nothing less than magical. And instead of trying for big things, many vendors currently focus on incremental improvements in very specific areas, such as telematics.

Another interesting development is speech-to-text (STT). Just as we saw with early, less-accurate automated speech recognition systems, STT appears in low-stakes applications where accuracy doesn’t matter as much. For example, a service company that matches advertisements to online video uses STT of the video’s audio stream and other data to select which ad to run. If there’s a mistake, no real harm is done; in the meantime they can refine their ad-selection algorithms. STT isn’t setting the world on fire at the moment, but accuracy continues to improve and people continue to find new niches.

Finally, I’d like to close with a rather unusual but important small thing. Progress in our industry, like any other, depends in part on how well we interact with each other to trade ideas and brainstorm new ones. Until a few years ago our social networking revolved around national conferences that happened once or twice a year. Lately we’ve seen the creation of local speech technology professional associations that conduct small-scale meetings three or four times a year. For nothing more than some membership dues and the cost of a local commute, more and more of us can meet our peers on a regular basis. This leads to better professional development, to business opportunities, and ultimately to better speech technology.

Moshe Yudkowsky, Ph.D., is president of Disaggregate Consulting and author of The Pebble and the Avalanche: How Taking Things Apart Creates Revolutions. He can be reached at speech@pobox.com.
