Speech Business: Open Season

The most recent SpeechTEK in New York turned out to be a wake-up call for redefining what we mean by 'open' in the IVR and speech world. That opening sentence is itself something of an oxymoron, since IVRs (with a few exceptions such as Syntellect, which jumped on the Java bandwagon years ago) have been among the most closed computing-related platforms in existence until fairly recently. As for speech engines, complex algorithms that integrate with acoustic models are hardly the stuff of drag-and-drop simplicity.

So why is the 'O word' on everybody's lips these days?
The most obvious answer is building market momentum. After a period of 'hoarding' that is typical of any emerging technology, participants from all parts of the ecosystem have lately come to realize that 'if we don't hang together we will hang separately,' to paraphrase the revolutionist Franklin. This means interoperability and portability through the most visible open standards contributions: markup languages such as VoiceXML and SALT and, on the engine and media control side, MRCP. So promoting plug-and-play to buyers who can get warm and fuzzy feelings from not being locked in is a big part of the current 'open season' frenzy. While it's all to the good of the market, and allows IT buyers to map speech into the evolving service-oriented architecture middleware story that surrounds the contemporary enterprise, we would argue it is only baby steps. Put more provocatively, supporting open standards will not be a differentiator over the next two years.

Big 'O' vs Little 'o'
The larger open story that emerged from SpeechTEK is the open source story. Once again, we would note that speech recognition engine development is about as far from open source-compatible as one can imagine, although there are several open source engine projects out there. Here's how open source is impacting speech in the most visible ways: (1) at the browser level via the Vocalocity Open VXI browser program; (2) at the component level via IBM's Reusable Dialog Component (RDC) contributions to Apache; (2a) at the tools level via IBM's contribution of speech-related editors to the Eclipse Foundation; and (3) at the OS level with Linux.

All of these join VoiceXML (at some level) as free resources that can be baked into products. The Open VXI browser is the heart of voice browsers from Avaya, Aspect, and Nortel, among others. IBM RDCs and editors will be incorporated into development environments from the likes of Audium, Fluency, VoiceObjects, and Vicorp, among others. Furthermore, they will gain richness and relevance as these platform players and their customers enhance them. OK, this is all pretty obvious: a little 'o.'

The Big O here is not so obvious. That's because it is all around us; it's the air we breathe. Take Linux, for example. It's pretty easy to see that a free OS lowers the cost of the computer platform on which we run VoiceXML (dare we say SALT and Linux in the same sentence?) applications. But have we thought through how those savings within the enterprise allow a reallocation of dollars? If we knew that underneath a tightly controlled top-line spend there was money swishing around and Linux was involved, how would that affect our decisions about incorporating open source?

A better and more complex question arises at the component level. First of all, before we can even talk about the Java community and open source, we need to understand that from a development perspective, if you're not using the Eclipse framework for your development platform, you're missing out on a lot. Go there now. Once you're there, an interesting thing happens. Unlike Microsoft, which is still locked into the Big Release paradigm for its Visual Studio development platform, good things in the form of new components and capabilities happen every day. New functionality shows up for the Eclipse house you're living in. This is why Microsoft will always be working extra late, trying to play catch-up on its next big release and scrambling the plan of record every few months to retrofit a response into that release.

One positive manifestation of this open source style of development is the idea of continuous enhancement rather than quarterly or semi-annual releases. The other aspect of this style is that broad-based coalitions (a la the RDC phenomenon) can form rather quickly, again because everyone is riding the same bus. These may yield something useful, or they may not, but unless you're embracing the open source model, you'll be reacting to a steady stream of them rather than being in on the announcement.

As Michael Dickerson of Vocalocity is prone to repeat, the open source model allows one to support many more efficiently than many can support themselves. Open source is not anarchy; there is a rationalizing aspect to settling some issues for all participants. Hey, what do we teach our kids? Sharing is good.

Mark Plakias is principal researcher at OPUS Research. You can reach him at mplakias@opusresearch.net, or better yet, connect to the resources available at www.opusresearch.net.
