Introducing Miller's Law
Last issue we made the somewhat radical suggestion that if (perceived) hardware expense was slowing down your voice application sales cycle, just get rid of the hardware. This led into a discussion of hosted scenarios, including hybrid "borderless" cases where existing customer premises equipment (CPE) could interwork with in-network hosted speech resources using IP and VoiceXML.
In the interest of equal time and all that, we should address the opposite proposition—the argument for really, really dense CPE. In this context "dense" is a good thing: more speech processing per square foot, per slot, whatever metric you want. (And by the way, you can plug your "dense" media server into your "dumb" switch for a really "smart" solution!)
This issue of the best way to deliver the most speech for the smallest footprint has been creeping up on us for the past year now. It is driven generally by telcos and, more specifically, by vendors seeking the carrier dollar. Two camps have emerged: the "OTS CPU" school and the "dedicated DSP" school (Kelsey Group terminology). The OTS (off-the-shelf) CPU school is very much dependent on the software products of ASR vendors such as IBM, Nuance and SpeechWorks, which run on a classic client/server model. Voice browsers are clients of the ASR server. You add servers to meet demand. And server farms are populated with ASR running off CPU processing power and telephony blades on OTS servers.
The dedicated DSP school is an alternate model, based on the idea that CPU speech processing is inefficient. Instead of using OTS processors, the dedicated DSP solution uses powerful DSPs in blades that fit into commercially available CompactPCI or other varieties of slots. This is not a new idea; DTMF processing has always been handled at the DSP level and first-generation ASR was relegated to the blade along with touchtone processing.
Furthermore, blade manufacturers are slowly creeping up the value chain. They are adding mission-critical ASR functions, such as end-pointing and speech detection, at the blade level, even though doing so duplicates functions performed by ASR software using CPU resources.
You know all this anyway, right? So here's a wonderful insight from colleague Daniel Miller about the relationship between the basic four layers of silicon, hardware, platform and applications: "Commoditization of the lower layers (silicon and hardware) accelerates innovation at higher layers."
I like this so much I want to call it "Miller's Law". The OTS CPU school would argue that this profound observation supports its architecture—as Moore's Law delivers faster/cheaper commercial CPUs every 18 months, the price-performance ratio will just get better. But that same argument can equally be applied to DSPs, which consume less power overall. You also don't have the client/server complexity to manage, which adds a tax by requiring more servers to manage the servers. DSP evolution also has momentum from cellular handset markets, which are high-volume/low-power customers. Finally, these same DSPs will be running a whole lot of media besides speech, including video for MMS and MP3 for mobile music.
The practical consequences of the dedicated DSP school are potentially highly disruptive, however. While some practitioners can run Nuance or SpeechWorks onboard, there are two scenarios that leave an OTS CPU ASR solution out of the equation. The first is embedded speech running locally in a handset, PDA or other non-server environment where a DSP is available. The other is in scenarios where the dedicated DSP advocate has elected—again in the interest of density—to run its own ASR software algorithms instead.
Astute readers will remember that none other than Lucent is already in this camp, although it uses commercial RISC processors rather than true DSPs on its CompactPCI Speech Processing Board. Interestingly, it does not use per-port pricing on the product.
But back to Miller's Law. As DSP silicon and hardware packaging evolve, they get commoditized. This shifts innovation to the higher levels where ASR and speech processing live, just beneath applications. The logical outcome of Miller's Law in the speech arena is a proliferation of speech baked into silicon at the blade level. Early evidence (from Israel, among other places) demonstrates that this is in fact the case. As noted in the case of Lucent, this innovation may also disrupt pricing models, which, in the OTS CPU-dominated market of today, are mostly wedded to per-port pricing.
It also means some massive mental gear-shifting in the way you evaluate strategic partners and assemble and price solutions. For some time now, the ASR-on-CPU software model has been at the center of the voice ecosystem. As these speech-enabled blade-runners start to percolate in the market—along with the neo-Copernican idea that maybe the world doesn't revolve around CPU-based software—the whole platform issue gets more complex. Maybe that's an argument for in-network solutions after all.
Mark Plakias is a senior vice president and managing director of communications and infrastructure for The Kelsey Group. He directs research relating to speech, operator services and wireless media technologies. He may be reached at 212.366.0895.