History Calls: Delivering Automated Audio Tours to Mobile Phones

Automated audio tours are a popular resource at many cultural heritage sites around the world.  The application was first introduced more than two decades ago using personal audio cassette players.  These "personal" tours let museum visitors enjoy exhibits more privately, without a human guide or docent, and move along at their own pace.  Typically a patron would rent a player for a fee and then follow a prescribed route through the museum, accessing the audio information at predetermined locations.  These audio tours were successful and popular, and over the years they became regular features at museums around the world.

The 1990s brought a new wrinkle to the business when museums began to adopt digital technology.  Automated tours using digital sound were no longer held captive by the sequential nature of analog tape.  Patrons could wander through exhibits, accessing information in any order they pleased.  Digital players dramatically increased the private and personal nature of automated tours, and the popularity of the service continues to rise.

Though automated tours are popular with many museum patrons, managing them is a major undertaking.  The digital players used for audio tours require constant maintenance: repairs, recharging, replacement, and so on.  Whether the service is provided solely by the institution or shared with or subcontracted to a private provider, the upkeep of automated audio tour machines is time-consuming and expensive.  In a 2001 survey of museums (reported in Museum News, July/August 2001), respondents indicated that the largest disadvantages of automated tours were installation and equipment malfunction.

The main source of cost and frustration with museum audio tours is the rental and continuous upkeep of the audio player hardware.  Large exhibits can require hundreds of these machines.  Every week, thousands of patrons of all ages, interests, and levels of technological know-how rent the players and carry them throughout the museum.  Commercial vendors have developed heavy-duty players, but accidents, careless handling, and absent-minded patrons still result in damage and loss.  Because exhibit information is stored directly on each machine, every player must be updated individually.  Sophisticated storage racks can facilitate updates and recharging, but maintenance remains time-consuming.

An obvious way to alleviate much of the time, cost, and frustration of hosting this important and popular museum service would be to get museums out of the player rental business.  One way to accomplish this is to deliver audio tours directly to visitors' own hardware: their personal mobile phones.

Given the widespread availability and popularity of wireless telephones, researchers at Southern Utah University chose this technology as a logical candidate for testing a new method of audio tour delivery.  Using patrons' mobile phones as a robust and widely available "player," they created an interactive audio tour with VoiceXML.  As the project developed, we dubbed the experiment "History Calls."

History Calls

VoiceXML was used to create an automated audio tour for an exhibit of historic photographs in the gallery space of our university library.  The 20 photographs in the exhibit were hung in traditional museum fashion.  Creating the automated audio tour followed a four-step process: 1) research, 2) preparing the audio, 3) creating the VoiceXML document, and 4) linking to the voice server.

Research

Substantial research was conducted to learn about the people and events depicted in the exhibit's 20 photographs, which covered 100 years of local performing arts history, from 1901 to 2001.  Significant information was found in the university library's theater archives, and, because of the exhibit's local focus, community members with first-hand knowledge of the people and events depicted also contributed.

The information gathered through research was used to create a description placard for each photograph, a tri-fold exhibit brochure, and the script for the audio tour.  The tour script consisted of one or two paragraphs per photograph, each of which would translate into a 30- to 60-second recording when narrated.

Research also turned up important primary documents within the library's extensive oral history collection.  A review of the recordings uncovered several interviews relating directly to the people and events illustrated in the photo exhibit.  These audio "documents" opened up a new avenue of programming for the VoiceXML audio tour: first-person narrative from the historical recordings could be incorporated into the tour, providing a personal, human touch not present in the typical narrated form.

In addition to the existing interviews, several individuals depicted in the photos were still living in the area; they were contacted and interviewed about their memories of the pictured events.  These interviews were recorded, and the resulting digital sound files, along with the existing historical recordings, became an integral part of the final product.

Preparing the Audio

Because sound was the crux of the project, special care was taken to capture the best possible audio in terms of both sound quality and content.  Three types of audio were designed into the tour and delivered via the VoiceXML document and the voice server.  First, the script prepared for each photograph was read and recorded as individual sound files.  To achieve the highest sound quality, a professional actor narrated the texts and the recordings were made in a professional studio.  In addition to the narration for each photo, the script included introductory remarks and brief instructions on how to control and navigate the automated tour.

The second type of sound file used in the tour consisted of excerpts from oral history interviews.  Short clips of 30 to 60 seconds, each directly related to an individual photo in the exhibit, were cut from the longer interviews and saved as individual sound files.  The new interviews, conducted specifically for this project, were recorded in the studio to ensure high sound quality.
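The article does not say which editing tools were used.  As a rough illustration only, cutting a fixed excerpt out of a longer interview and saving it as its own file can be sketched with Python's standard wave module, assuming the interview has been digitized as an uncompressed PCM WAV file; the function name, file names, and timings are hypothetical.

```python
import wave


def extract_clip(src_path: str, dst_path: str, start_s: float, end_s: float) -> float:
    """Copy the audio between start_s and end_s (in seconds) into a new WAV file.

    Assumes an uncompressed PCM WAV source. Returns the clip length in seconds.
    """
    if not 0 <= start_s < end_s:
        raise ValueError("need 0 <= start_s < end_s")
    with wave.open(src_path, "rb") as src:
        rate = src.getframerate()
        start_frame = int(start_s * rate)
        # Never read past the end of the recording.
        end_frame = min(int(end_s * rate), src.getnframes())
        if start_frame >= end_frame:
            raise ValueError("clip lies outside the recording")
        src.setpos(start_frame)
        frames = src.readframes(end_frame - start_frame)
        # Write the excerpt with the same channel count, sample width, and rate.
        with wave.open(dst_path, "wb") as dst:
            dst.setnchannels(src.getnchannels())
            dst.setsampwidth(src.getsampwidth())
            dst.setframerate(rate)
            dst.writeframes(frames)
    return (end_frame - start_frame) / rate
```

For example, extract_clip("interview_1948.wav", "photo10_interview.wav", 125.0, 170.0) would save a 45-second excerpt.  Cleaning up an older recording, such as reducing hiss, would be a separate step in an audio editor.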

The older recordings, some dating back to the 1940s, were also edited into short clips relating to the exhibit.  The age and recording medium of these older interviews often made their sound quality unacceptable; in those instances, digital sound editing software was used to "clean" and improve the selected cuts for use in the tour.  The inclusion of these first-person narratives turned out to be the key to the tour's success and gave the project its name: History Calls.

The third type of audio used in the tour was not pre-recorded; instead, the voice server produced it as a computer-synthesized voice.  The text for these computer-generated comments was written into the VoiceXML code and converted to speech by the voice server.

Creating the VoiceXML Document

The VoiceXML document controlling the tour was created and stored on a local Web server along with the sound files.  Before writing any VoiceXML code, it proved very helpful to diagram the options and paths that would be available to the user.  This flow chart of the tour incorporated not only the introductory text and the information on each photograph, but also the paths and uses for several support sound files, including error/repeat warnings, redundant help messages, a looping sound file to indicate a "waiting" mode, and the introduction and instructions for the online user survey.

The help and instructional messages were included as text within the VoiceXML document and were rendered by the voice server as computer-synthesized speech.  This convention helped listeners distinguish aurally between the mechanics of the tour and the narrated exhibit: the synthesized voice immediately indicated that they had left the tour narration and were now interacting with the navigation/help module.
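The split between recorded narration and synthesized help can be seen in a single VoiceXML prompt.  The following sketch, which is not the project's actual code, uses Python's xml.etree.ElementTree to build a <prompt> in which an <audio> element plays the recorded narration and the plain text after it is rendered by the voice server's speech synthesizer.  Text placed inside <audio> is VoiceXML's fallback, spoken only if the sound file cannot be played.  The function name and file path are hypothetical.

```python
import xml.etree.ElementTree as ET


def photo_prompt(photo_number: int, narration_url: str) -> ET.Element:
    """Build a VoiceXML <prompt>: recorded narration, then a synthesized cue."""
    prompt = ET.Element("prompt")
    # Recorded narration. Text inside <audio> is only a fallback, spoken by the
    # synthesizer if the sound file cannot be fetched or played.
    audio = ET.SubElement(prompt, "audio", src=narration_url)
    audio.text = f"The narration for photograph {photo_number} is unavailable."
    # Plain text after the recording is rendered by the voice server's
    # text-to-speech engine, so navigation cues sound unlike the narrator.
    audio.tail = "Enter another photograph number, or say help."
    return prompt


if __name__ == "__main__":
    print(ET.tostring(photo_prompt(10, "audio/photo10.wav"), encoding="unicode"))
```

Run as a script, this prints <prompt><audio src="audio/photo10.wav">The narration for photograph 10 is unavailable.</audio>Enter another photograph number, or say help.</prompt>.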

The numbered sound files, which corresponded to the numbered photos in the exhibit, and the accompanying navigational text messages served as the foundation for the flow chart.  Once this main trunk was complete, the ancillary branches were added, including connections to help information, error warnings, prompt messages, and the online survey.
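A flow chart of this kind maps naturally onto a lookup table: the numbered photographs form the main trunk, and the help, survey, and error paths hang off it as branches.  The sketch below models that structure in Python, assuming callers reach a photo by its number and a branch by a keyword; the node names, keywords, and sound file paths are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Node:
    name: str
    audio: Optional[str]  # recorded sound file, or None for synthesized speech only
    next: str = "main"    # where the caller goes when this node finishes


# Main trunk: one node per numbered photograph (file names are hypothetical).
TRUNK: Dict[str, Node] = {
    str(n): Node(f"photo{n}", f"audio/photo{n}.wav") for n in range(1, 21)
}

# Ancillary branches hung off the trunk.
BRANCHES: Dict[str, Node] = {
    "help": Node("help", None),
    "survey": Node("survey", None, next="goodbye"),
}

# Unrecognized choice: error/repeat warning, then back to the main menu.
ERROR = Node("error", None)


def route(choice: str) -> Node:
    """Return the flow-chart node for a caller's choice (digits or keyword)."""
    key = choice.strip().lower()
    if key in TRUNK:
        return TRUNK[key]
    if key in BRANCHES:
        return BRANCHES[key]
    return ERROR
```

For instance, route("10") returns the node for photograph 10, route("help") the help branch, and any unknown entry the error node.  Keeping the whole chart in one table makes it easy to check that every path leads back somewhere before any VoiceXML is written.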

Like a movie director's storyboard, the flow chart served as a step-by-step outline of the entire program, and the programmer's job was to create in VoiceXML the paths and links it indicated.  As in most projects, not every eventuality was covered in the initial flow chart, and the designer and programmer worked together to iron out details as the project came together.  Because VoiceXML is designed for voice applications, its tags facilitate the creation of voice menu systems and their supporting structure.  This project used many of the elements VoiceXML makes possible, for example:

  • The system signaled that the caller was connected by playing music appropriate to the exhibit.  Main menu choices or the help option could be selected over the music at any time.
  • Menu and other navigational choices could be entered either by voice or by DTMF (the phone's touch pad).
  • If the user did not respond within a predetermined waiting period, the program prompted him or her.  After three different prompts, each more pointed than the last, the program returned the user to the main menu.
  • Unrecognized voice input triggered another set of nested help messages, each asking the user to repeat the choice and concluding with the suggestion to try the phone's touch pad instead.
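Several of these behaviors correspond directly to standard VoiceXML elements.  The sketch below is a hypothetical reconstruction, not the History Calls document: it uses Python's xml.etree.ElementTree to generate a one-form VoiceXML 2.1 document whose field accepts a photo number by voice or keypad through the builtin digits type (assuming the platform supports that builtin for both modes), with three escalating <noinput> handlers that end by returning to the main form and three <nomatch> handlers that end with the touch-pad suggestion.  The prompt wording, form id, and file naming are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical wording: each noinput prompt is more pointed than the last.
NOINPUT_PROMPTS = [
    "Please say or key in a photograph number.",
    "I did not hear anything. Enter the number printed beside the photograph.",
    "I still did not hear anything. Returning to the main menu.",
]
# Hypothetical nested help for unrecognized speech, ending with the keypad hint.
NOMATCH_PROMPTS = [
    "Sorry, please repeat the photograph number.",
    "Sorry, I still did not understand. Please say the number again.",
    "Please try entering the number on your phone's keypad.",
]


def build_document() -> ET.Element:
    """Build a one-form VoiceXML 2.1 document with escalating error handlers."""
    vxml = ET.Element("vxml", version="2.1", xmlns="http://www.w3.org/2001/vxml")
    form = ET.SubElement(vxml, "form", id="main")  # hypothetical main menu form
    # The builtin "digits" type accepts spoken or keyed digits, if the platform
    # supports that builtin for both voice and DTMF.
    field = ET.SubElement(form, "field", name="photo", type="digits")
    ET.SubElement(field, "prompt").text = "Say or key in a photograph number."

    for count, text in enumerate(NOINPUT_PROMPTS, start=1):
        handler = ET.SubElement(field, "noinput", count=str(count))
        ET.SubElement(handler, "prompt").text = text
        if count == len(NOINPUT_PROMPTS):
            ET.SubElement(handler, "goto", next="#main")  # back to the main menu
        else:
            ET.SubElement(handler, "reprompt")  # replay the field's prompt

    for count, text in enumerate(NOMATCH_PROMPTS, start=1):
        handler = ET.SubElement(field, "nomatch", count=str(count))
        ET.SubElement(handler, "prompt").text = text
        ET.SubElement(handler, "reprompt")

    # On a valid number, play that photo's recording and clear the field so
    # the form asks for another number.
    filled = ET.SubElement(field, "filled")
    ET.SubElement(filled, "audio", expr="'audio/photo' + photo + '.wav'")
    ET.SubElement(filled, "clear", namelist="photo")
    return vxml


if __name__ == "__main__":
    print(ET.tostring(build_document(), encoding="unicode"))
```

Because VoiceXML selects the catch handler with the highest count that does not exceed the number of times the event has occurred, the third <nomatch> handler also answers every later recognition failure, so the caller keeps hearing the keypad suggestion.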

BeVocal provided a toll-free phone number for callers to experience the History Calls project.  The number will be available through July.  To hear about two of the pictures from the exhibit, call 1-877-454-0795 and enter either 10 or 17.

Visit http://www.li.suu.edu/onstage/ to see the entire set of pictures from the museum exhibit.

Linking to the Voice Server

The voice server for this project was provided by BeVocal, a VoiceXML hosting service.  Like other commercial VoiceXML providers, BeVocal offers various plans according to the size and complexity of the project.  Compared with commercial applications, History Calls was a small operation and required only a minimal hosting package: 24/7 access to a voice server via a single toll-free number, with a limit of five simultaneous callers.  BeVocal also provided limited technical support and several online tools for testing and debugging the History Calls VoiceXML document.
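A limit of five simultaneous callers raises the question of how often a patron would hear a busy signal.  One standard way to estimate this is the Erlang B formula, which assumes random (Poisson) call arrivals and that blocked callers simply hang up.  The sketch below is not part of the project's evaluation; the traffic figures in it are hypothetical.

```python
def erlang_b(offered_load: float, lines: int) -> float:
    """Probability that a new caller finds every line busy (Erlang B).

    offered_load is in erlangs: call arrival rate times mean call length, in
    consistent time units. Uses the numerically stable recurrence
    B(0) = 1, B(k) = A * B(k-1) / (k + A * B(k-1)).
    """
    if offered_load < 0 or lines < 0:
        raise ValueError("load and line count must be non-negative")
    blocking = 1.0
    for k in range(1, lines + 1):
        blocking = offered_load * blocking / (k + offered_load * blocking)
    return blocking


if __name__ == "__main__":
    # Hypothetical gallery traffic: 12 calls per hour, 10 minutes per call.
    load = 12 / 60 * 10  # = 2 erlangs
    print(f"{erlang_b(load, 5):.1%} of callers would find all 5 lines busy")
```

With the assumed 2 erlangs of traffic, about 3.7 percent of callers would find all five lines busy; at 3 erlangs the figure rises to about 11 percent.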

Once the VoiceXML document was finished and saved on the local Web server, a BeVocal account was created and linked to it.  As soon as this link was established, the BeVocal gateway was available to accept incoming calls from patrons at the photograph exhibit.
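Linking a hosted gateway to a local document presumably amounts to the gateway fetching the VoiceXML file and its sound files over HTTP from the library's Web server.  The sketch below, an assumption rather than the project's actual setup, serves a directory with Python's standard http.server and registers the application/voicexml+xml media type (RFC 4267) for .vxml files; the directory name and port are hypothetical.

```python
import mimetypes
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

VXML_TYPE = "application/voicexml+xml"  # media type registered in RFC 4267
mimetypes.add_type(VXML_TYPE, ".vxml")


class TourHandler(SimpleHTTPRequestHandler):
    # Older Python versions consult this map instead of the mimetypes module,
    # so register the VoiceXML type here as well.
    extensions_map = {**SimpleHTTPRequestHandler.extensions_map, ".vxml": VXML_TYPE}


def make_server(directory: str, port: int = 8080) -> ThreadingHTTPServer:
    """Serve tour.vxml and its sound files from one directory (hypothetical layout)."""
    handler = partial(TourHandler, directory=directory)
    return ThreadingHTTPServer(("", port), handler)


if __name__ == "__main__":
    make_server("tour").serve_forever()
```

The hosting account would then point at a URL on this server, for example a hypothetical http://library.example.edu:8080/tour.vxml.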

At the conclusion of their visit, patrons were offered both a print survey and an online survey.  Like the tour, the online survey was an automated audio experience administered through the user's mobile phone.  Prompts inviting patrons to take the survey as part of their tour experience appeared at the beginning, middle, and end of the tour dialog.  Evaluation questions were delivered in a computer-synthesized voice, and patrons responded through either voice or DTMF input.  Responses were entered automatically into a database for evaluation.  A unique feature of the online survey was the opportunity for patrons to leave an oral comment on the exhibit; this voice message was recorded and saved to the database as a digital sound file.
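The article does not describe the database itself.  A minimal sketch, assuming one row per answered question and storing each recorded comment as a path to its sound file, might use Python's built-in sqlite3 module; the table, column, and function names are hypothetical.

```python
import sqlite3
from typing import Optional

# Hypothetical schema: one row per answered question; an oral comment is
# stored as the path of its recorded sound file.
SCHEMA = """
CREATE TABLE IF NOT EXISTS survey_response (
    id          INTEGER PRIMARY KEY,
    call_id     TEXT NOT NULL,
    question    TEXT NOT NULL,
    answer      TEXT,
    input_mode  TEXT CHECK (input_mode IN ('voice', 'dtmf')),
    comment_wav TEXT
)
"""


def open_db(path: str = "survey.db") -> sqlite3.Connection:
    """Open (or create) the survey database."""
    db = sqlite3.connect(path)
    db.execute(SCHEMA)
    return db


def record_answer(db: sqlite3.Connection, call_id: str, question: str,
                  answer: Optional[str] = None, input_mode: Optional[str] = None,
                  comment_wav: Optional[str] = None) -> None:
    """Store one survey answer, or one oral comment via its sound file path."""
    with db:  # commits the insert, or rolls it back on error
        db.execute(
            "INSERT INTO survey_response"
            " (call_id, question, answer, input_mode, comment_wav)"
            " VALUES (?, ?, ?, ?, ?)",
            (call_id, question, answer, input_mode, comment_wav),
        )
```

Keeping recordings on disk and storing only their paths keeps the table small; a BLOB column holding the audio bytes would be an equally valid choice.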

The print survey was available to all visitors to the exhibit.  Its advantage was that it reached patrons whether or not they took advantage of the automated phone tour.  In this way, general data on patron reaction to the exhibit were gathered, along with important data on why visitors did or did not choose to try the automated audio tour.

Evaluation of the History Calls project revealed several strengths and weaknesses.  As predicted, the tour was simple to maintain: changes could be made on the server quickly and efficiently, and they took effect for callers immediately.

The first-person narratives within the audio tour were a big hit with patrons and received a great deal of positive feedback in the exhibit surveys.  Admittedly, this type of audio could be used in any audio tour system, but hearing first-person narration through their own telephones added to the reality and immediacy of the patrons' experience in a way that would not be possible with the digital players currently in use.

Patrons tried both voice and DTMF input.  In the "quiet" environment of the exhibit space, most patrons were reluctant to use voice input and relied primarily on their phone keypads.  Of those patrons who used the phone tour and responded to the survey, half relied solely on DTMF to interact with the system and never used the voice interface.  Though these patrons shunned it in the gallery, voice interaction may prove more popular in outdoor venues such as zoos and historic sites.  Voice interaction is also a powerful accessibility tool and an important benefit to patrons with limited mobility.
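If the voice platform logged the input mode of every caller action, a share like the one reported above could be computed directly from the logs rather than from survey answers.  The hypothetical tally below is written in Python and checked only against invented toy data, not the project's results.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def dtmf_only_share(events: Iterable[Tuple[str, str]]) -> float:
    """Fraction of calls in which every input was DTMF.

    events: (call_id, input_mode) pairs, where input_mode is "voice" or "dtmf".
    """
    modes: Dict[str, Set[str]] = defaultdict(set)
    for call_id, mode in events:
        modes[call_id].add(mode)
    if not modes:
        return 0.0
    dtmf_only = sum(1 for used in modes.values() if used == {"dtmf"})
    return dtmf_only / len(modes)
```

A call that mixes voice and keypad input counts as not DTMF-only, matching the "relied solely on DTMF" wording.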

Although it was not a problem in this test, some structures are not conducive to mobile phone reception.  The difficulty of using mobile phones inside some buildings is another factor favoring outdoor venues for mobile phone tours.

A valid concern, recognized from the outset of the project, was the additional cost some patrons might incur by using their mobile phones as the audio tour playback device.  Only one survey respondent indicated that concern about cost prevented him from using the phone tour; three others indicated that it limited their use of the tour.  Patrons had widely different service plans, and overall concern over cost and roaming was limited.  At present, the expanding market and strong competition are driving costs down and included minutes up, so it is reasonable to expect concern over the cost of these phone services to diminish as service plans become even more generous.

Evaluation of the VoiceXML technology, and of phone tours specifically, was very positive.  All respondents to the automated survey indicated that the tour had improved their experience at the gallery and that they would use similar mobile phone tours if offered at other cultural heritage locations.

Mobile phones, PDAs, MP3 players, and other wireless handheld devices, as well as new hybrids combining these technologies, are radically changing how many people communicate, interact, and participate in the world around them.

Although this experiment was conducted in a museum/gallery setting, the VoiceXML system used in the project could easily be adapted to a host of other cultural heritage sites and applications.  The project's primary goal, using patrons' own hardware to deliver an automated audio tour, was achieved, and the results invite further research and trials of VoiceXML and mobile phone-based tour systems.

Matthew Nickerson serves on the library faculty at Southern Utah University where he works as the special projects librarian and director of the university Honors Program.  His research interests include digital information systems, distance education through streaming media, and Victorian book design.
