The State of Speech Developer Platforms
Speech is becoming a new way to interact with intelligent devices, whether they are traditional computer systems or new Internet of Things endpoints. The ecosystems needed to deliver such capabilities have rapidly matured. The basic application building blocks have been in place for some time, but using them has often required experts. Consequently, the leading speech application development vendors have been focusing on easing development tasks, extending their system capabilities, and courting third parties. The end result is that the technology is becoming more commonplace. “More individuals are being exposed to speech applications in their homes, so they are becoming more comfortable using it at work,” says Rakesh Tailor, director of product management for speech at Genesys.
The Year in Review
Because of speech’s tremendous potential, large, well-known, extremely deep-pocketed, and primarily consumer-focused technology suppliers such as Amazon Web Services, Apple, Google, and Microsoft have been driving speech development. The market has been in a fledgling state, so building applications has often required customers to tinker with obscure application programming interfaces (APIs). Vendors are taking a few steps to simplify such work.
In April, Amazon enhanced Amazon Transcribe, a managed speech recognition service. Now, enterprises can add new words, such as product names, domain-specific terminology, or names of individuals, to the base vocabulary. Also, to improve accuracy rates, the system recognizes when the speaker changes and attributes the transcribed text accordingly.
Vendors are also trying to make their systems easier to program. In August 2018, [24]7.ai improved its modeling workbench so that it automates more functions and offers more self-service options. A new graphical user interface enables non-data scientists and business analysts to build and test conversation models. Machine learning features suggest possible improvements to intent models so that businesses can better understand what a customer is trying to accomplish.
Third parties are also trying to address application development challenges. For instance, Google Actions require that developers create packages for Google Assistant manually and deploy them via an often cumbersome command-line interface. Dialogflow enables developers to work with a web interface when building such connections.
In addition, suppliers like Aspect Software, Avaya, Cisco Systems, Enghouse Interactive, Genesys, Nuance, and Plum Voice have created development platforms that focus on enterprise needs, mainly in contact centers, customer relationship management, and sales force automation.
In July 2018, Genesys integrated its solution with Google Cloud’s new Contact Center AI solution. The integration enables companies to rapidly deploy bots.
In March, Nuance added tools and starter packs based on common terminology by industry so customers could accelerate development time and more easily build cross-channel applications. Companies can use the solution with IVRs and web-based virtual assistants to improve automated customer dialogues.
While the plumbing work is helpful, the end goal is to integrate speech functionality into business applications. The speech development vendors are trying to encourage third parties to add speech capabilities to their core systems. The more capabilities third parties offer, the more uses there are for the vendors’ solutions. Currently, the race to gain third-party support is not close. “Amazon seems to be well ahead of the pack in terms of third-party support,” says Deborah Dahl, principal at Conversational Technologies and chair of the World Wide Web Consortium’s Multimodal Interaction Working Group. The vendor calls its software applications skills and has been adding them at a rate of about 1,000 per week. Other suppliers have been reluctant to disclose how many third parties they have.
As we've done now for the past few years, Speech Technology magazine is again dedicating its first issue of the new year to a preview of what's to come in our small corner of the world. We've highlighted the six technology areas where we see the most impact: speech engine, speech analytics, voice biometrics, virtual assistants, speech developer platforms, and assistive technologies.