Speech Technology Magazine

 

HTML5 Is Live

The markup language will become the platform of choice for designing and deploying multimodal applications.
By Leonard Klie - Posted Nov 10, 2014

With more than 63 percent of adults who own mobile phones using their devices to go online, according to the Pew Research Internet Project, it's no wonder that companies are devoting increasing attention to mobile app development.

Those looking to incorporate speech elements into their mobile Web sites and applications will have an easier time of it as HTML5, the latest version of the HyperText Markup Language for describing the contents and appearance of Web pages, makes its way through the standards adoption process.

The World Wide Web Consortium (W3C) released HTML5 as a candidate recommendation in late July of this year and as an editor's draft in September. And while W3C expects HTML5 to continue to evolve significantly before it becomes a formal recommendation, many elements of it are already available. "You don't need to wait for them all to be done," advises Jim Larson, an independent speech consultant. "You can start using them now."

HTML5 by itself isn't expected to turn the speech technology world upside-down, but it has generated a good deal of industry buzz for its flexible audio and video features. "HTML5 makes speech more accessible to Web developers," Larson says. "A lot more elements, including speech, are embedded directly into HTML5."

HTML5 gives developers direct access to audio, including recorded speech, through the audio and video tags. A third element, the canvas tag, handles graphics.
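For instance, a page that plays a recorded spoken prompt can list the same prompt in several formats and let the audio element play the first one the browser supports. The TypeScript sketch below approximates that choice, assuming only that HTMLMediaElement.canPlayType answers "", "maybe", or "probably"; the function name pickPlayableSource and the prompt file names are hypothetical, and a real browser's source selection also falls back to the next source if loading fails.

```typescript
// canPlayType answers "" (cannot play), "maybe", or "probably".
type CanPlayAnswer = "" | "maybe" | "probably";

// One encoding of a recorded speech prompt; URLs are hypothetical.
interface PromptSource {
  url: string;
  mimeType: string; // e.g. "audio/ogg" or "audio/mpeg"
}

// Return the first source whose MIME type the browser does not rule out,
// or undefined if every format is unsupported. canPlayType is passed in
// (hypothetical design choice) so the logic can run outside a browser.
function pickPlayableSource(
  sources: PromptSource[],
  canPlayType: (mimeType: string) => CanPlayAnswer
): PromptSource | undefined {
  for (const source of sources) {
    if (canPlayType(source.mimeType) !== "") return source;
  }
  return undefined;
}

// Browser usage (not executed here):
//   const audio = new Audio();
//   const chosen = pickPlayableSource(prompts, (t) => audio.canPlayType(t));
//   if (chosen) { audio.src = chosen.url; void audio.play(); }
```

Listing formats in order of preference mirrors how multiple source children inside an audio tag are tried in document order.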

Speech capabilities could be added to Web pages under previous versions of HTML, "but you have to stand on your head to make them work," Larson says. HTML5 "makes the process of adding speech [to Web applications] so much more streamlined."

Web standards expert Dan Burnett, president of Burnett Consulting Services and standardsplay.com, points out that one of the biggest holes in previous versions of HTML was that "there was no way to get access to media and send it anywhere."

HTML4 notoriously relied on proprietary plug-ins and application programming interfaces (APIs) to load elements such as speech. Those APIs often kept applications built for one browser from loading or working properly in another.

HTML5, by contrast, provides a common interface that makes loading elements easier. "With HTML5, there is much less of an emphasis on writing code and more on incorporating programs right into the browser," says Deborah Dahl, chair of W3C's Multimodal Interaction Working Group.

"HTML5 is an open Web platform," Dahl explains. "The idea was to create a universal, open platform for interoperable applications that can work on all [browsers and operating systems]." This applies to both static and mobile Web pages, she says.

That's the appeal of HTML5, according to many experts. Application developers can write code once and have their applications work across browsers, operating systems, and devices. "With HTML5, you design once and you can deploy anywhere," Larson says.

Another stated design goal for HTML5 was support for multimedia. Because of that, Dahl expects HTML5 to become "the platform of choice for designing and deploying multimodal applications."

HTML4 was created for displaying documents and forms online and didn't really consider mobile devices, Dahl explains. "HTML5 is so much more robust. It makes mobile a better fit for the Web and makes sure that the Web supports multiple devices and vice versa."

The Role of APIs

While speech is more deeply ingrained in the basic DNA of HTML5 than in earlier versions of HTML, the greatest impact HTML5 has on speech-based application development is likely to come from several companion specifications.

Chief among these is the new JavaScript-based Web Speech API, which makes it easy to add speech recognition to Web pages and to create voice-driven Web applications. The Web Speech API lets developers generate text-to-speech output and use speech recognition as input for forms, dictation, and command and control of applications and devices, all from within a page's scripts. The API also allows Web pages to control the activation and timing of speech-related events and to handle recognition results and their alternatives.
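The recognition side of the API can be sketched roughly as follows. This is a minimal TypeScript sketch under stated assumptions, not a reference implementation: the interfaces are local stand-ins that model only the members used here (lang, continuous, interimResults, maxAlternatives, onresult, start, stop, and each result's isFinal flag and transcript/confidence alternatives), and the helper names finalAlternatives, listenForCommand, and onCommand are hypothetical. It shows the page deciding when recognition starts, then reading each final result together with its alternative transcripts.

```typescript
// Local stand-ins for the subset of the Web Speech API used below. In a
// browser the real objects come from SpeechRecognition or webkitSpeechRecognition.
interface Alternative {
  transcript: string;
  confidence: number;
}

interface RecognitionResult {
  readonly isFinal: boolean;
  readonly length: number;
  readonly [index: number]: Alternative;
}

interface RecognitionEventLike {
  readonly resultIndex: number;
  readonly results: {
    readonly length: number;
    readonly [index: number]: RecognitionResult;
  };
}

interface RecognizerLike {
  lang: string;
  continuous: boolean;
  interimResults: boolean;
  maxAlternatives: number;
  onresult: ((event: RecognitionEventLike) => void) | null;
  start(): void;
  stop(): void;
}

// Collect the alternatives of every final result in this event, starting at
// resultIndex (entries below it have not changed since earlier events).
function finalAlternatives(event: RecognitionEventLike): Alternative[][] {
  const finals: Alternative[][] = [];
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    if (!result.isFinal) continue; // skip interim hypotheses
    const alts: Alternative[] = [];
    for (let j = 0; j < result.length; j++) alts.push(result[j]);
    finals.push(alts);
  }
  return finals;
}

// Configure the recognizer for a single spoken command and hand the best
// transcript plus the remaining alternatives to a (hypothetical) callback.
function listenForCommand(
  recognizer: RecognizerLike,
  onCommand: (best: string, alternatives: string[]) => void
): void {
  recognizer.lang = "en-US";
  recognizer.continuous = false;     // end after a single result
  recognizer.interimResults = false; // only report final results
  recognizer.maxAlternatives = 3;    // request up to three hypotheses
  recognizer.onresult = (event) => {
    for (const alts of finalAlternatives(event)) {
      if (alts.length === 0) continue;
      onCommand(alts[0].transcript, alts.slice(1).map((a) => a.transcript));
    }
  };
  recognizer.start(); // the page decides when listening begins
}

// Browser usage (not executed here):
//   const Ctor = (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;
//   if (Ctor) listenForCommand(new Ctor(), (best) => console.log(best));
// Text-to-speech output uses the synthesis half of the API:
//   window.speechSynthesis.speak(new SpeechSynthesisUtterance("Your order is confirmed"));
```

Passing the recognizer in, rather than constructing it inside the function, keeps the example runnable outside a browser; in a page it would be created from SpeechRecognition, or from the prefixed webkitSpeechRecognition in browsers that expose only that name.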

The Web Speech API "allows Web developers to add speech in many innovative ways to both their mobile and desktop Web sites," 
