September 12, 2006
By James A. Larson, program co-chair, SpeechTEK 2021
Forward Thinking

Speaking and Listening to the World Wide Web

Developing and sharing content is a growing activity on the Internet. Rather than only passively observing Internet content, users are actively adding to it by uploading their pictures to flickr.com and sharing their thoughts in blogs and wikis. Readers rate books on amazon.com, and teens post real and fantasy personas on the extremely popular myspace.com, hoping to attract the attention of other teens.

Many people contribute content to the same Web page over a period of time. One example is the widely used wikipedia.org, a handy online reference that is replacing traditional bulky and expensive encyclopedias. Hundreds of volunteers interactively create, review, and update its expanding encyclopedic content. It is no surprise that readers enjoy authoring interactive Internet content: people genuinely enjoy interacting.

Interactive voice dialogs have created active users who speak and perform actions rather than passively listen. Content authors have become more like playwrights and less like reporters, while users have become more like actors and less like observers. With interactive content, the boundaries between audiences and creators blur, as lectures become conversations, reports morph into discussions, and stories transform into activities.

Because people vary the speed, volume, and pitch of their voices, speech is more expressive than text. Speech is also faster and more convenient than typing. We use our voices every day to provide content as we interact with others. Just imagine how voice content could enhance our current Web sites:

Goods and services ratings: Web site visitors could add comments and critiques for the benefit of other users. A speaker's tone would better convey her opinions and feelings about a product or service. The resulting Web site experience would be similar to shopping with several friends.
Audio annotation: Web sites could offer spoken commentary and individualized audio tours, and could suggest alternatives to, and offer advice about, the site's content.
Commentary: Web site visitors could express their views in town hall discussions. They could contribute short stories and anecdotes to a comedy Web site and become an online version of Jay Leno or David Letterman without hosting their own late-night TV show. A verbal wiki can be more interesting than a text-based wiki because of the emotion expressed in contributors' voices.
Traffic and news reports: People could use their cell phones to call in and report traffic conditions at various locations. Radio and Internet listeners could hear the most recent messages and adjust their routes accordingly. Anyone could become a reporter by phoning in eyewitness accounts of emerging news events, along with pictures captured by their cell phones.
Celebrations: Families and other groups could capture and collect audio comments from their members about topics, events, or holidays. These audio memories could be fondly reviewed years later.
National landmarks, museum exhibits, and other frequently visited monuments: Web site visitors could tell listeners that they just missed seeing the bears in Yellowstone Park, explain where to locate the secret symbols at a tourist site referenced in a popular book or movie, and reminisce about how they used to play in a grassy meadow that has since been turned into a parking lot.

Users repeatedly return to Web sites after contributing content because they want to learn how others have responded to their contributions. Generally, the aggregate of many individual contributions is more informative and useful than any single contribution, which is why wikis are so popular today. Voice content contributed by Internet users will have a similar effect, making Web sites more interesting and compelling.

We have already seen an explosion of person-to-person voice communication on the Internet thanks to Voice over IP technologies. Expect to see a dramatic increase in voice interaction with Web content as well. While we only listen to radio and TV content, we will both talk to and listen to Internet content.

James A. Larson is manager of advanced human input/output at Intel Corporation and is the author of the home study guide and reference "The VXMLGuide" (www.vxmlguide.com). His Web site is www.larson-tech.com.
