Did you ever think that you could one day go to WalMart and add items to your shopping list by speaking into your phone? How about getting into your car and having a navigation assistant speak directions to you? Until very recently, such scenarios may have seemed futuristic at best, but these and thousands of other applications are now part of the commercial market, thanks to speech technology developer programs.
Several speech tech giants are sharing their technology, offering developers platforms on which to experiment, build, and ultimately sell profitable applications.
"The whole idea is to allow developers to build applications in speech where there's minimal or no knowledge of speech," says Mazin Gilbert, assistant vice president of technical research at AT&T Labs.
Angel recently unveiled Lexee, a software development kit (SDK) that uses Angel's Site Builder to create conversational, personalized voice solutions that can quickly adjust mobile solutions depending on customer needs or market changes. The Lexee SDK is a Web-based point-and-click application that does not require any coding background.
Lexee enables businesses to provide a voice-activated iOS or Android mobile solution to their customers, track the impact of their mobile solutions with analytics, and allow their users to be more productive and flexible by letting them have conversations via their mobile apps.
Prior to the official launch, Angel developed an app for Salesforce.com that lets users verbally request information and reports from their Salesforce.com accounts. Rather than requiring users to search for information manually, Lexee enables mobile applications to perform tasks and execute transactions, such as updating sales information or quickly pulling reports, all by voice command.
In July 2012, AT&T launched seven AT&T Watson-enabled speech APIs that developers can access to quickly create apps and services with voice recognition and transcription capabilities.
The first set of APIs focuses on seven areas: Web search, local business search, question and answer, voicemail to text, short message service (SMS), a U-verse electronic programming guide, and dictation for general use of speech recognition.
"This includes an open speech or generic API, and that is sort of the holy grail of being able to transcribe speech into text," Gilbert says. "That API is trained on a million-plus words and hundreds of thousands of speakers, and that's available to developers who want to do speech recognition and don't have a clear notion of what application they need."
Initially, the APIs will be available for Android and iOS, with more AT&T Watson Speech APIs coming for areas such as gaming, social media, speaker authentication, and language translation.
Developers at Verdatum, a provider of software focused on voice productivity, management, and workflow, used AT&T's APIs in-house for its Verbble solution, a voice input application for mobile workforces that provides a proprietary talk, type, and tap shell over the top of business applications and services, such as Salesforce.com, Oracle CRM On Demand, Word/DOCX, PDF, SQL, and Outlook/Exchange. Verbble enables users to employ a native device application to complete data input, validation, and editing. Once the data is input, a single tap routes it back to the original system programmatically, as if the user had been at her desk the entire time.
"Speech recognition is only one part of the Verbble platform, but it is certainly the most visible and impactful component," says Michael Fitzpatrick, Verdatum's chief technical officer. "Integrating the AT&T Speech API expanded our platform's technology reach and prepared us to be able to leverage some of the pending advanced functionality of the AT&T Speech API."
Nuance Communications has the largest speech developer program, NDEV, which has more than 12,000 subscribed members.
NDEV Mobile brings Nuance's Dragon speech platform to mobile developers via the Dragon Mobile SDK, and offers broad language coverage and support for the iOS, Android, and Windows Phone 7 platforms.
The program has yielded many voice-enabled apps, including Price Check by Amazon, Ask for iPhone, Merriam-Webster, Dictionary.com, RemoteLink from OnStar, SpeechTrans, Yellow Pages, AirYell from Avantar, iTranslate, Taskmind, SayHi Translate, Vocre, Bon'App, Dolphin Sonar, and Sonico iTranslate, among others.
Coupons.com's Grocery iQ is an iPhone and iPad app that integrates Nuance's Dragon voice technology through the NDEV Mobile developer program. Grocery iQ creates, manages, and shares shopping lists and helps users find and use coupons as well. Free to download, the app lets users add items to their shopping lists by simply speaking what they want. With Dragon voice recognition integrated into the iOS version of Grocery iQ, users can dictate multiple items in a continuous list, and the app recognizes them and adds them automatically. Users can also add items by typing or by scanning bar codes on product packaging with the camera on their mobile device.
Tearing Down the Walls
Rather than viewing program members as competitors, speech tech companies are seeing the benefits of extending their technologies to the developer community and are focused on lowering the barriers to entry.
At Voxeo, the welcome mat is out for developers. "We've always been open to developers," says Tobias Goebel, director of mobile strategy. "We're really tearing down the walls of getting in touch with our technology, and you can start building apps and trying them out free of charge."
Voxeo developers can get a free account in its hosted environment as well as a free download of its on-premises product. The company has a developer forum called Evolution, where users can sign up for a free account. From there, developers can start writing VoiceXML applications and upload VoiceXML scripts, and Voxeo will provide free phone numbers for testing. The company also offers APIs for SMS, so in addition to building IVR applications, developers can build SMS apps, such as confirmation messages.
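For readers who have not seen one, a VoiceXML script is a small XML document that tells the voice platform what to say and do on a call. The sketch below is an illustration only, not Voxeo sample code: it uses Python's standard library to build a minimal script that reads a confirmation message to the caller and hangs up. The function name and form id are hypothetical, and the steps for uploading the result to a hosted account are not shown because the article does not describe them.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C VoiceXML specification.
VXML_NS = "http://www.w3.org/2001/vxml"


def build_confirmation_vxml(message: str) -> str:
    """Return a minimal VoiceXML document that speaks `message`, then hangs up.

    The function name and the form id "confirm" are illustrative choices.
    """
    # Serialize VoiceXML elements without a namespace prefix.
    ET.register_namespace("", VXML_NS)

    root = ET.Element(f"{{{VXML_NS}}}vxml", {"version": "2.1"})
    form = ET.SubElement(root, f"{{{VXML_NS}}}form", {"id": "confirm"})
    block = ET.SubElement(form, f"{{{VXML_NS}}}block")

    # <prompt> holds the text the platform's text-to-speech engine will read.
    prompt = ET.SubElement(block, f"{{{VXML_NS}}}prompt")
    prompt.text = message

    # <disconnect/> ends the call once the prompt has played.
    ET.SubElement(block, f"{{{VXML_NS}}}disconnect")

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_confirmation_vxml("Your order has shipped. Goodbye."))
```

Building the document through an XML library rather than by string concatenation means characters such as "&" in the message are escaped automatically, so the script stays well-formed.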
Goebel says that this is a fully self-sufficient approach that lets developers get their own accounts, resources, and documentation.
"You don't even have to sign up to get an evaluation download of our software; you don't have to interact with Voxeo at all," he says. "If you like how it works and you're ready to go to production, then you would contract with us. We also offer deployments without a contract, which is a pay-as-you-go model."
VoiceVault has had a developer program in place for more than a year, and has roughly 400 developers. It also has a self-registration program and provides access to its APIs and its voice biometric engine, which can be accessed free of charge for 90 days. Documentation is free, and there is a self-help community developer forum.
"We have very few requirements. We've made it open, easy to use, and as frictionless as possible," says Nik Stanbridge, director of product marketing at VoiceVault. "If someone wants to join our program, they don't have to talk to a salesperson. We want to encourage developers to talk to each other and not feel as though they would get a hard sell from us, which tends to put people off."
After the 90-day trial period, a developer may reach out to the company and extend the trial. This is done on a case-by-case basis. Typically, at some point during this extended trial, project and pricing are discussed. Since each project is different, costs are custom tailored.
Powered by the AT&T Watson speech engine, AT&T's Speech API supports speech-enabled apps that run on virtually any cell network in the United States. Seven speech contexts are available, each built and maintained by AT&T and tuned by the company on an ongoing basis. Developers send audio, and AT&T sends back the text of what the end user said. Native and HTML5-based SDKs are also provided.
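The audio-in, text-out pattern described above can be sketched in a few lines. The example below is a hypothetical model of that exchange and does not reproduce AT&T's actual endpoints, parameters, or SDK: the class names, the context identifiers, and the `fake_transport` function are all assumptions made for illustration, and no network call is made. Only the seven context areas themselves come from AT&T's announcement.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class SpeechContext(Enum):
    """The seven areas named in the article; the string values are made up."""
    WEB_SEARCH = "web-search"
    BUSINESS_SEARCH = "business-search"
    QUESTION_AND_ANSWER = "question-and-answer"
    VOICEMAIL_TO_TEXT = "voicemail-to-text"
    SMS = "sms"
    TV_GUIDE = "tv-guide"
    GENERIC = "generic"


@dataclass(frozen=True)
class SpeechRequest:
    """Hypothetical request: raw audio plus the context that should decode it."""
    audio: bytes
    context: SpeechContext


# A transport takes a request and returns the recognized text.
Transport = Callable[[SpeechRequest], str]


def transcribe(audio: bytes, context: SpeechContext, send: Transport) -> str:
    """Send audio for recognition in the given context and return the text."""
    if not audio:
        raise ValueError("audio must not be empty")
    return send(SpeechRequest(audio=audio, context=context)).strip()


def fake_transport(request: SpeechRequest) -> str:
    """Stand-in for the real network call; it only echoes what it received."""
    return f"[{request.context.value}] {len(request.audio)} bytes received "


if __name__ == "__main__":
    print(transcribe(b"\x00\x01\x02", SpeechContext.GENERIC, fake_transport))
```

Passing the transport in as a function keeps the application code independent of any one provider, so a real HTTP client could replace the stand-in without changing the caller.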
"We're providing the software that goes into your application, and this software basically talks to our API that sends speech in real time and is able to recognize it," Gilbert says. "Some developers want to build their own APIs, they want to specialize in their platform; some of them don't have that expertise and they want to pull the software into their application. We're doing this so people don't have to reinvent the wheel."
Gilbert, stressing the openness of the program, says there are no prerequisites for developers. "There are no requirements," he says. "The whole idea is that we're trying to fuel innovation in the industry, we're not acting as a filter. Building speech applications takes anywhere between a minimum of three months to three years, mostly in IVR types of applications. The approach we've taken is to make it simple—they are just plug and play."
There is a registration or introductory fee of $99 for all of AT&T's APIs. In the coming year, there will be a monetization pricing model.
Nuance offers developers three service tiers: Silver, Gold, and Emerald. Each provides access to Nuance Mobile SDKs, training materials, partner APIs, and support services to help developers integrate Nuance speech and text input solutions into their applications.
"[Our mission] was to make it really easy for third-party developers to integrate speech, both dictation as well as text-to-speech, and integrate those core services into applications," says Kenneth Harper, director of product management and marketing at Nuance. "We've seen some developers who have been able to fully integrate Dragon Dictation and text-to-speech in their application in a matter of days. That is our goal, to simplify how third parties can get access to this technology."
The Silver program is free to develop and free to go live. It provides automatic speech recognition dictation and search models for more than 20 languages; network text-to-speech (TTS) for more than 45 languages; a speech kit SDK; support for the Android, iOS, and Windows Phone 7 platforms; Bluetooth; a customizable UI; and help with an application via a centralized speech resource.
"The Silver program is targeted toward those types of application developers where volume isn't high," Harper says. "Maybe they're going to be shipping their application to 10,000 consumers, in which case there aren't going to be a lot of transactions that we're handling on our side."
The Gold tier runs about $300 to develop and $3,000 to go live. It provides the same offerings as the Silver program, plus an HTTP interface, help via an online ticketing system, and Secure Sockets Layer.
"Here, we do have a fee. In exchange for that, we support a lot more volume," Harper says. "This is for developers who expect very high downloads, and this is where some of our more successful applications fall into."
The Emerald program offers everything in the Silver and Gold tiers, plus additional speech capabilities, dedicated tech support, the highest service-level agreements, and consulting services. Pricing in this model is customized, with more services offered depending on a developer's unique needs.
"Maybe they have a unique domain or they want us to build a custom language model to help improve accuracy of dictation or maybe they want us to do some consulting in the area of design," Harper says. "This is where there is a lot more customization, and we also expect that with these types of applications, volume is going to be highest. This is where we have very successful applications that are on hundreds of thousands of mobile phones."
In the coming year, Harper says Nuance has a goal of exposing more speech technology to third parties and expanding the Silver and Gold communities.
"Over time we're going to offer more abilities to customize dictation and text-to-speech for specific use cases that developers care about," he says, recognizing the growing need for customized and vertically focused solutions.
Harper expects a steep increase in developers and handset manufacturers alike using speech technology. "A big part of our business is selling our solutions to handset manufacturers, such as a personal assistant that's available out of the box. But that's really only half of the mobile phone ecosystem. Our vision with the NDEV program is [to] get lots of third parties also integrating speech into their applications for the downloadable market. We're starting to move in that direction, where there's going to be a pervasive voice ecosystem on the phone."
The future for speech developers looks bright. "Speech has become mainstream," Harper points out. "One reason for that is what Apple has done with Siri. Apple has done a tremendous job of using speech and natural language to create a good experience around speech-enabled interfaces. That's created a certain level of awareness in the market that we haven't seen before."
Check out the following Web sites for more information on the developer programs highlighted here.
- Angel Lexee SDK: http://www.angel.com/labs/lexee.php
- AT&T Developer Program: http://developer.att.com
- Nuance NDEV Mobile: http://dragonmobile.nuancemobiledeveloper.com/
- VoiceVault Developer Program: http://www.voicevault.com/developers/
- Voxeo Developer Program: http://evolution.voxeo.com/
Staff Writer Michele Masterson can be reached at firstname.lastname@example.org.