Interoperability Benefits Everyone, So Everyone Should Get Behind It
Research firm Gartner has predicted that by 2025, now less than two years away, at least 75 percent of U.S. households will have at least one smart speaker. I’m proud to say that I’m ahead of the curve. For almost two years now, my living room has been home to an Amazon Echo Dot with the Alexa virtual assistant. I use it to make phone calls, set reminders, listen to music, place and track orders on Amazon, ask questions, make weekend plans, build shopping lists, and get sports, news, and weather updates.
I also have an iPhone with Apple’s Siri, which I use for too many tasks to list here.
And in my car, I have Honda’s voice assistant, which I access through a button on my steering wheel to make and receive phone calls, adjust the temperature in the car, and play the radio.
I could certainly do a lot more with these systems. Some tasks I’ve avoided because, out of an abundance of caution, I’ve been hesitant to hand over my most personal and financial information. The speech capabilities in each of them are very good, but ongoing recognition errors have kept me from other tasks. And then there are domain-specific constraints as well. In the car, for example, I prefer to keep my eyes and ears on the road, so I wouldn’t want to book an airline trip while driving. I would also be reluctant to conduct a complex financial transaction on my phone while riding a crowded city bus or subway car.
Yet another hindrance lies in the separate development environments and architectures behind these systems. In some contexts, I’d love to start a transaction on my iPhone when I’m out and about and then pick up where I left off on my Echo Dot when I get home, but those kinds of handoffs are largely impossible right now. Each system provider and speech application developer has created its own application programming interfaces (APIs) and erected walled gardens around them, limiting access to resources outside of their tightly controlled ecosystems.
As our cover story, “Industry-Standard Speech Application Building Blocks Take Shape,” points out, much more is possible, but only if the development process becomes easier and interoperability becomes more of a reality.
While the need to improve interoperability is clear, a lot of work must be done to forge a comprehensive set of standards, the article asserts. It also notes that despite the daunting challenge, vendors, ad hoc consortiums, and academic institutions are trying to fill the many voids.
Already we are starting to see quite a bit of progress. The article singles out Amazon for extending its ecosystem and enhancing its architecture so the same software can run on Alexa as well as other home assistants. Other organizations, like the World Wide Web Consortium and the Open Voice Network, have taken a lead role in these efforts, but where is the rest of the industry? Why haven’t other platform providers, speech tech application developers, mobile service providers, and others jumped on board yet? Interoperability would benefit everyone, and so everyone should get behind it.
The upsides for the industry are huge. They would include access for users to more sources, more ideas, and more content; expanded opportunities for internal and external innovators to develop new products and services; quicker time to market, as developers will be able to build once and deploy everywhere; and greater vendor choice for best-of-breed and partner components. Additionally, with interoperability, development costs will come down and the pace of innovation will go up.
Businesses, consumers, and third-party application developers have expressed the desire for more freedom and flexibility. We just need the rest of the industry to come together to make it happen.
Leonard Klie is the editor of Speech Technology magazine. He can be reached at firstname.lastname@example.org.