Why Didn't You Say So?

There is no question that technology as a whole has brought sweeping and significant changes to the world over the past 10 or 15 years. From the den to the factory floor, people are connected to information and services in ways they once would never have thought possible. Now, wireless connectivity is giving companies even more options for serving up the data that drives business and for getting that information out into the field.

The business case for a wireless, mobile work force is simple and compelling. But what is the business case for enabling those wireless connections with speech? As a natural human interface, speech technology provides an expressive and convenient way to enter and send data, work with applications, and access information and services through just about any device. It is also the only user interface mode capable of providing a consistent user experience across all devices, large or small. As mobile devices, wireless networks and Web services become standard tools for mobile workers, speech technology will put more power behind the benefits that wireless devices inherently provide.

This added value can be quite significant. Take the case of a large North American appliance repair company that recently went wireless. The company employs 13,000 technicians who make an estimated 11 million repair calls annually. If a speech-enabled interface could save those technicians even one minute per call, the 11 million minutes saved would add up to more than 183,000 personnel hours per year. At a very conservative $25 per hour, that's a little more than 41 cents per call, or more than $4.5 million annually.

Eyes Busy, Hands Busy

In the world of business, such things are rarely so cut and dried, but repair workers and field sales personnel are prime examples of how a speech-enabled interface can enhance the value of wireless connectivity by streamlining everyday tasks for mobile employees. First, step back into the days of paper-based processes.
A driver for a beverage distributor that works from a paper-based system may leave the shop each morning with a raft of forms and invoices. During her route, she must log each delivery, order and customer interaction by pen, and somehow keep all those papers organized in her truck. At the end of the day, someone back at the office has to file it all or enter it into the computer system. If the driver needs any of those papers a week later, she has to call the office and have someone find the file and read her the necessary information.

Wireless connectivity has made things much easier for field workers. With a mobile device such as a personal digital assistant (PDA), Tablet PC or Pocket PC connected over the airwaves, the driver has immediate access to information on all her customers: inventory, order status, credit status and more. Information is entered once, organized automatically and synchronized with the back office. Updates, changes and alerts can be pushed out to the field in an instant.

But the driver might still have to spend a few minutes at every stop retrieving each customer's invoice on her device, or several minutes at the end of the day filling out sales reports, invoices or other forms. Often these forms are provided electronically through Web-enabled applications that let her fill them out in the field. Interacting with this kind of business application is another area where speech has the potential to make wireless workers' lives much more convenient. If the distributor's forms were Web-based and speech-enabled, the driver could fill them out while driving to the next customer's location, simply by speaking to the application. The application responds just as if she were typing in the data, displaying the information as it is filled in, or reading it back if she wishes.
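The speech-enabled form described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular product works: the DeliveryForm class and its fields are hypothetical, and the recognizer is assumed to have already turned the driver's words into named values ("slots"), as a grammar-based recognizer typically would. The sketch shows both halves of the interaction: spoken values are placed into the form as if typed, and the completed form is turned into text that could be read back to the driver.

```python
from dataclasses import dataclass, fields


@dataclass
class DeliveryForm:
    """Hypothetical Web form a beverage driver fills in at each stop."""
    customer: str = ""
    product: str = ""
    cases: int = 0

    def fill(self, slots: dict[str, str]) -> None:
        """Copy recognized slot values into the matching fields, as if typed."""
        names = {f.name for f in fields(self)}
        for name, value in slots.items():
            if name not in names:
                raise KeyError(f"form has no field named {name!r}")
            # Convert the recognized text to the field's type (e.g. "12" -> 12)
            setattr(self, name, type(getattr(self, name))(value))

    def read_back(self) -> str:
        """Confirmation text that a speech synthesizer could read aloud."""
        return f"Customer {self.customer}, product {self.product}, {self.cases} cases."


# Hypothetical slots for "Twelve cases of cola for Corner Market",
# assuming the grammar normalizes spoken numbers to digits
form = DeliveryForm()
form.fill({"customer": "Corner Market", "product": "cola", "cases": "12"})
print(form.read_back())  # Customer Corner Market, product cola, 12 cases.
```

Treating speech as just another source of field values, while leaving the form itself unchanged, is what allows the same form to accept keyboard or voice input.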
As another example, a field repair technician examining a malfunctioning refrigerator may want to see a diagram of the compressor. He may also want to know whether the company has a replacement in stock for that model. Traditionally, those questions would have required the technician to put down his wrench, climb out from behind the appliance, wipe off his hands, and thumb through a manual out in the truck, or call the shop and involve another employee in his query. With a speech-enabled Web application, the technician would simply ask the application to find what he wanted: "Show me the specifications for Whirlwash refrigerator model eight."

In both examples, the speech interface lets workers save time and work more efficiently by accessing information while their hands and eyes are occupied. This capability could be of significant value to many types of businesses and employee roles. The ability to browse for information and use applications by voice turns the laptop, Tablet PC, Pocket PC or other mobile device into a portable administrative assistant, allowing the worker to focus on getting the job done.

A Natural Interface

Another way speech can help improve the effectiveness of mobile employees is simply by providing a natural, easy and consistent way for them to interact with their information systems. The wireless arena spans a wide range of devices, most of which are too small to support a traditional keyboard. For these smaller devices, input has always been a problem: the stylus and the alphanumeric keypad are often impractical for data entry. Imagine trying to execute a complex business transaction by tapping with a stylus or using just your thumbs on a keypad. Without speech technology, many business applications and Web-enabled forms are effectively out of reach for workers in the field.
Letting these employees find applications and operate them through voice dialog can enable mobile workers to do more when they're on the road. For example, a real-estate agent on her way to show a house receives a call from her prospective buyers. They'd like to see some additional listings, but she's already well on her way and hasn't brought her laptop. If the agency's Web-based listing service is speech-enabled, she can access it through her cell phone. She may ask for all houses between $200,000 and $250,000 in a certain zip code. The listing service recognizes that she's using a cell phone with no visual display and triggers the speech interface, which asks her what she wants, passes her request to the listing service, and reads her the matching listings. If she is using a Pocket PC or other mobile device that has a screen, she might also pull up floor plans, images or neighborhood maps (to see whether a house is near a freeway, for instance), a list of the closest schools, or other visual aids, and show these graphics to her customers when she arrives at the appointment.

Meanwhile, Back at the Office

Wireless applications have also established themselves within the walls of many companies. Lower-bandwidth General Packet Radio Service (GPRS) and higher-bandwidth 802.11b ("Wi-Fi") networks are proliferating, bringing wireless connectivity to internal and external Web applications from portable devices such as laptops, Tablet PCs, PDAs and converged PDA/telephone devices. Again, given the limits on text input for smaller devices, adding a speech interface to business applications can add significant value. Speech-only or speech-plus-visual (known as "multimodal") interfaces can be used in a wide variety of applications on these wireless networks. Hand-held devices can be used for e-mail, Web access, project status updates and much of the paperwork related to financial transactions.
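The real-estate example turns on one decision: whether to answer by voice alone or to pair speech with visuals. The Python sketch below illustrates that decision under simplifying assumptions. The Listing and Device types, the sample addresses and zip codes, and the has_display flag are all hypothetical; how a real service detects a device's capabilities is not covered here, and the price range and zip code are assumed to have already been extracted from the agent's spoken request.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    address: str
    price: int
    zip_code: str


@dataclass
class Device:
    has_display: bool  # hypothetical capability flag supplied by the client


def search(listings: list[Listing], low: int, high: int, zip_code: str) -> list[Listing]:
    """Filter listings by an inclusive price range and a zip code."""
    return [item for item in listings
            if low <= item.price <= high and item.zip_code == zip_code]


def respond(results: list[Listing], device: Device) -> dict:
    """Choose voice-only or multimodal output depending on the device."""
    summary = f"Listings found: {len(results)}."
    if device.has_display:
        # Multimodal: a short spoken summary, with the details shown on screen
        return {"speech": summary, "screen": [item.address for item in results]}
    # Voice-only: every detail has to be spoken
    parts = [summary] + [f"{item.address}, {item.price:,} dollars." for item in results]
    return {"speech": " ".join(parts), "screen": None}


# Hypothetical sample data
listings = [
    Listing("12 Oak St", 225_000, "98052"),
    Listing("3 Elm Ave", 310_000, "98052"),
    Listing("8 Pine Rd", 205_000, "98033"),
]
matches = search(listings, 200_000, 250_000, "98052")
print(respond(matches, Device(has_display=False))["speech"])
# Listings found: 1. 12 Oak St, 225,000 dollars.
```

Both paths use the same search function; only the presentation differs, which mirrors the idea that one application can serve voice-only and multimodal clients alike.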
For example, a factory floor worker with a PDA can use a speech interface to query a SQL database and retrieve the status of an order. Using speech, he can update that order's status, access and update inventory and stocking records, check part numbers, place orders and do anything else already handled by Web-enabled applications, from anywhere on the manufacturing floor.

The SALT Approach

Building a speech-enabled Web application is easier than one might imagine. The Speech Application Language Tags (SALT) specification, which is royalty-free and was recently submitted to the World Wide Web Consortium, lets Web developers add a spoken dialog interface to any Web application or Web service in a way that's easy and intuitive for them. SALT is a small set of XML elements that add a speech interface to a markup document such as an HTML page. Because it relies on the same programming model as existing Web pages, Web application developers can pick up SALT quickly, and can use it equally effectively with HTML, XHTML, CHTML and WML. Using SALT, applications can handle traditional visual browsing, multimodal browsing and voice-only browsing. Furthermore, developers can serve the needs of all browser types through one integrated, unified Web application, giving each application greater reach to mobile users.

The benefit of SALT lies primarily in its simplicity and flexibility. Rather than requiring a separate speech infrastructure, SALT can be added to a company's existing Web infrastructure and applications, extending those applications to incorporate speech at the code level. Any Web-based application, form, search engine, e-mail client or other Web service can be enabled to take input in the form of spoken dialog. And because SALT works with existing scripting languages, it gives developers greater flexibility to build more powerful applications.
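To give a feel for what "a small set of XML elements" means in practice, the Python sketch below uses the standard library's xml.etree.ElementTree to attach SALT markup to an existing HTML text field. It is a hedged illustration: the element and attribute names (salt:prompt, salt:listen, salt:grammar, and salt:bind with targetelement and value) and the namespace URI are believed to match SALT 1.0 but should be checked against the published specification, while the partNumber field, the parts.grxml grammar file, the ask_/reco_ ID prefixes and the XPath into the recognition result are hypothetical. In a real page, script would also start the prompt and the recognizer; that wiring is omitted.

```python
import xml.etree.ElementTree as ET

# SALT namespace URI (assumed from SALT 1.0; verify against the specification)
SALT_NS = "http://www.saltforum.org/2002/SALT"
ET.register_namespace("salt", SALT_NS)


def salt(tag: str) -> str:
    """Qualify a tag name with the SALT namespace in ElementTree's {uri}tag form."""
    return f"{{{SALT_NS}}}{tag}"


def speech_enable_field(body: ET.Element, field_id: str, question: str, grammar_src: str) -> None:
    """Add a spoken prompt and a recognizer for an existing text input.

    The <input> element itself is left untouched; speech markup is added alongside it.
    """
    prompt = ET.SubElement(body, salt("prompt"), {"id": f"ask_{field_id}"})
    prompt.text = question
    listen = ET.SubElement(body, salt("listen"), {"id": f"reco_{field_id}"})
    ET.SubElement(listen, salt("grammar"), {"src": grammar_src})
    # bind copies the value selected by an XPath over the recognition result
    # into the target input element
    ET.SubElement(listen, salt("bind"), {"targetelement": field_id, "value": f"//{field_id}"})


# An existing, ordinary HTML form field (hypothetical)
body = ET.Element("body")
ET.SubElement(body, "input", {"id": "partNumber", "type": "text"})

speech_enable_field(body, "partNumber", "Which part number?", "parts.grxml")
print(ET.tostring(body, encoding="unicode"))
```

The page's existing input element is not modified; the speech elements are layered on top of it, which is the sense in which SALT extends an existing Web application rather than replacing it.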
Making Speech Mainstream

Considering that companies across the globe are moving many traditional business functions to Web-based applications, speech-enabled Web applications and services are poised to become mainstream in the very near future. Speech-enabled voice-response applications, auto-attendants, directory services, customer relationship management systems, Web portals, and calendaring and messaging applications can already be accessed over existing wireless networks.

To tap into this burgeoning market and make SALT application development more accessible to Web developers, Microsoft is working with a number of industry leaders to build the partnerships and develop the technology needed to make speech mainstream; the recently released Microsoft .NET Speech software development kit (SDK) is one example. In addition, SALT has the support of more than 50 industry-leading companies, many of which are also developing SALT-based products. This will help accelerate the development of a wide range of new speech-enabled applications, giving businesses and their mobile workers (even those without access to a PC) more convenient ways to access online goods, services and business applications.

The scenarios outlined in this article illustrate very practical uses for SALT-based, speech-enabled applications that will add value to enterprises with existing Web investments. The forms these applications ultimately take are anybody's guess, but at the very least, our refrigerator technician can get the information he needs without putting down his wrench - a minute saved.
James Mastan is the director of marketing for the .NET Speech Technologies Group at Microsoft Corp. and a member of the SALT Forum. He can be reached at jmastan@microsoft.com.