Put Speech Recognition in Its Place
The media's fascination with Apple's Siri personal assistant has spawned a tremendous amount of conversation about speech recognition, and the resulting positive interest is certainly welcome for most of us who work with speech technologies. However, Siri is about more than just speech recognition. Behind the speech recognition interface, Siri combines natural language processing and, it is claimed, a low level of artificial intelligence to adapt to the user over time. Yet, stripped of the hype, Siri doesn't offer any functionality via speech that a keyboard interface couldn't also provide.
Keep in mind that speech recognition is just one of several interfaces. It can be beneficial when used in the right places for the right tasks, but not in every situation. For example, using speech recognition to access your banking details in a private location is appropriate and an excellent alternative to a soft keyboard. Say you're running out the door of your home and want to confirm your checking balance: a speech recognition front-end to a banking application gets you the balance quickly while you juggle your keys, jacket, and umbrella. On a train or in other public places, however, speaking your account details aloud would not be appropriate; there, a soft keyboard, albeit much slower, is the better interface for keeping your information private.
My aim in raising these points is not to dissuade anyone from considering speech recognition as a useful interface but rather the opposite: I wish to highlight speech as a friendly front-end to useful services. Without useful back-end services, speech recognition is nothing more than a speech-to-text tool.
Companies with useful services provided via the Internet and/or smartphone applications should consider using speech recognition as an alternative interface. If customers find your services useful through a GUI front-end, the success of Siri provides a good example of how you can extend your service and make it useful to customers through another channel (voice instead of a GUI browser) or via a more natural interface (speech instead of a soft keyboard).
One very successful mobile offering that could be further enhanced by adding a speech recognizer as an additional interface is Evernote. Though the popular mobile application has attracted over 10 million users thanks to its usefulness, it relies heavily on soft keyboards for mobile input. The ability to capture information quickly for future use is an extremely compelling reason to use Evernote. A natural language interface would speed the capture and tagging of notes and their filing in the correct notebook, thereby increasing how often the application is used, especially when the user is "on the run," as many mobile users are when they quickly capture information for later use.
When a company considers providing its services via speech recognition, it should also keep in mind the new skilled resources and the considerable time required if the speech interface is to be built in-house. Outsourcing the development and delivery of the speech interface can immediately provide experienced speech recognition expertise while shortening time to market. Outsourcing should also allow companies to structure the cost of the new interface more profitably. And if the speech interface becomes a huge success, scaling up is much faster and usually less costly than with in-house delivery.
Given the flexibility of today's speech recognition platforms (see "Are There 31 Outsourcing Flavors?" in the November/December 2011 issue), companies can already outsource just the speech recognition portion of a speech-enabled solution. Note that even Apple is rumored to have gone down this route with Siri. Is it any wonder that Siri is so successful?
By adding a speech user interface with the input of skilled speech experts, you, too, can turn your useful service into a customer-pleasing superstar.
Kevin Brown is an architect at HP Enterprise Services, where he specializes in speech solutions design. He has 18 years of experience designing and delivering speech-enabled solutions. He can be reached at firstname.lastname@example.org.