Speech Technology Magazine

 

Where's the Stack?

Speech technology should take a page from Web site builders: freely available tools that work together
By Moshe Yudkowsky - Posted Aug 7, 2015

I found the secret Palm Pilot graveyard: a small cabinet in my office. I unearthed more than a half-dozen of them, including ones I'd disassembled in forlorn hope of resuscitation, along with a stack of a dozen PDAs from a different manufacturer.

Yes, I've been on a rampage, cleaning up my office and workroom. I dug up a treasure trove of obsolete speech recognition gear. I tossed the old analog headsets without hesitation, along with some specialized hardware. I had a bit of a struggle, but I got rid of CDs, books, and conference handouts that are all that remain of several attempts to create industry-standard APIs. These collectibles have gone to that great recycling bin in the sky.

I found a stack of CDs I myself made, quite professional-looking, with demo VoiceXML and CCXML software included. Not only is that demo 10 years out of date, but the entire concept of distribution on a CD or DVD has gone out of style. If colleagues ask for software, they get a URL; at a trade show, I'd distribute flash drives, and even those are starting to disappear.

What I can't find is a much more important stack. Not a physical stack, but what software engineers call a stack: a collection of standard tools that, working together, create a basis for providing a service. Consider the LAMP stack: Linux operating system, Apache Web server, MySQL database, and PHP Web page creation software. The LAMP stack powers a significant portion of the Internet, serving up Web sites around the world. And the basic cost is zero. They are all open-source software projects.

While LAMP alone would suffice to serve pages, you'd need more to produce a decent Web site. Your best bet would be a content management system; I've tried out some for my corporate Web site, and the breadth of each stack (the number of possible integrated add-ons) is nothing short of astonishing.

Once I've created and tested a Web site, I have to deploy it. This too can be done for free, at least for small test installations. Hosting services have mastered uploading a project, based on standard stacks, from your computer to theirs (and to their highly curated, highly reliable companion technology), and hosting these projects on the Internet in what amounts to your own virtual computer. The ease of use is stunning—an hour of my time can get a fully functioning Web site up and running.

I have yet to find anything similar for speech technology—there's no stack. I have found many interesting projects in speech technology available as open-source projects; no doubt some of them are excellent. And I do have a little speech project I'd like to test. Which free offerings provide the tools I need? How do I integrate these tools? Do they even work together?

The latest Web site–building tool I tried offered me a dozen choices of software language for writing Web pages and several choices of database, and asked other questions I barely understood (the tool is written around a stack I'm just starting to learn). These different tools from different groups, rallying around a few informal standards, work together seamlessly.

Speech doesn't have that, as far as I can tell. As I scroll through a list of speech technology projects on GitHub, only a few standard keywords appear ("X compatible" or "works with engine Y"), and usually in the context of connecting to a commercial service. I see no integrated stacks. I certainly do not see anything similar to the ecosystem I described above: a free, turnkey ecosystem with free, turnkey deployment, and no licensing fees.

Please don't get me wrong. As noted, I see excellent projects online. But I don't see anything like the LAMP ecosystem, or the highly integrated content management systems I mention here.

To be fair, some speech technology firms offer free accounts and phone numbers to test your applications, but these applications run on their servers, using their non-open software, with fees of all sorts waiting just around the corner. And of course free is not the important part: As with the current Internet, companies that offer a hosted, proprietary, carefully maintained speech technology environment will flourish no matter what, and ultimately even "free" products cost money once you start to use them on a larger scale. But until we have freely modifiable, freely available, freely deployable stacks—and a few competing ones at that—our industry will lack the main source of rapid, painless, low-cost innovation and experimentation that drives the wild creativity of the Internet. 

Moshe Yudkowsky, Ph.D., is the president of Disaggregate Consulting and author of The Pebble and the Avalanche: How Taking Things Apart Creates Revolutions. He can be reached at speech@pobox.com.
