How Do You Say That?
If you’ve ever listened to a text-to-speech (TTS) system try to read documents out loud, you’ve probably heard some terrible, even laughable, mispronunciations when the system encounters a word or acronym it doesn’t know. This is funny when you’re sitting at your desktop, but not so funny in the middle of traffic when your GPS tells you to turn right on some unintelligible street name.
To address this, most TTS systems include a dictionary of words whose pronunciations aren’t predictable from their spellings. Speech recognition systems also include dictionaries, but when they mispronounce words, it’s not so obvious—they just don’t recognize the word. Unfortunately, the formats of these dictionaries and the ways they describe pronunciations aren’t standard from vendor to vendor, so compiling dictionaries is a huge amount of work.
The Pronunciation Lexicon Specification (PLS) is a new standard from the World Wide Web Consortium designed to address this problem. I talked to Paolo Baggia, director of standardization at Loquendo and author of the PLS specification, about the goals of PLS and its features.
Dahl: Why do people in the speech community need to know about PLS?
Baggia: PLS can improve speech recognition and text-to-speech, especially for acronyms and difficult-to-pronounce words, such as proper names and locations, which often are pronounced in unpredictable ways.
Dahl: How will PLS make the development of voice applications easier?
Baggia: The PLS 1.0 specification defines the format of an XML document used to store pronunciation transcriptions. The PLS document can then be included in both a grammar and a prompt to customize pronunciations when needed.
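To make this concrete, here is a minimal sketch of a PLS 1.0 document; the entry itself (the place name "Worcester" and its transcription) is an illustrative assumption, not taken from the specification:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- A minimal PLS 1.0 lexicon with a single entry (example content) -->
<lexicon version="1.0"
         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <!-- The spelling as it appears in text -->
    <grapheme>Worcester</grapheme>
    <!-- The intended pronunciation, written in the IPA alphabet -->
    <phoneme>ˈwʊstɚ</phoneme>
  </lexeme>
</lexicon>
```

Each lexeme pairs one or more graphemes (spellings) with the pronunciations the engine should use for them.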
Dahl: Can PLS make internationalization/localization easier?
Baggia: I see many benefits that PLS can bring to internationalization/localization tasks. First, PLS resources might be available to simplify customization in a new language, especially because PLS is standard. Second, PLS can factor out the customization in a single document to be applied across different applications. Finally, PLS documents can be transparently used both in speech recognition and TTS without implementing different noninteroperable customizations.
Dahl: Does PLS work with any language?
Baggia: Yes, the PLS specification is for any language, including European-derived languages, Asian languages, Indian languages, African languages, and Oceanic ones. I don’t think an international use of PLS will raise big issues. At least I hope not.
Dahl: Are there particular languages for which PLS is especially useful?
Baggia: Yes, there are. For instance, languages like English might benefit from PLS more than languages like Italian or Spanish. This is because English has so many irregular pronunciations. Also, English is always borrowing words or creating new words. In Italian, on the other hand, the rules for pronunciation are much simpler, and it is rare to find a word that you don’t know how to pronounce.
Dahl: How can a PLS lexicon be used with ASR/TTS/VoiceXML software?
Baggia: PLS documents are loaded in Speech Recognition Grammar Specification (SRGS) grammars and Speech Synthesis Markup Language (SSML) prompts by means of the lexicon element that is defined in both specifications.
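As a sketch of how this looks on the SSML side, an author references the lexicon at the top of a prompt; the URI below is hypothetical:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- An SSML 1.0 prompt that pulls in an external PLS lexicon
     (the lexicon URI is a placeholder for illustration) -->
<speak version="1.0"
       xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
  <lexicon uri="http://www.example.com/names.pls"/>
  Turn right on Worcester Street.
</speak>
```

An SRGS grammar references a PLS document the same way, with a lexicon element in the grammar header, so a single lexicon can serve both recognition and synthesis.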
Dahl: Who can create a PLS lexicon?
Baggia: Creating a PLS document is very easy. The language includes only six elements and a few more attributes. It is a simple language to use, and it’s very easy to expand an acronym into the words to be pronounced. For instance, U.S. can be spoken as is or recognized as the United States. Adding pronunciations is more complex because the author of the PLS document has to be familiar with the phonetic alphabet of the target language.
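The acronym expansion Baggia describes uses the alias element, which requires no phonetic alphabet at all. A minimal sketch of the U.S. example:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Expanding an acronym with alias: no phonetic transcription needed,
     because the engine already knows how to say the ordinary words -->
<lexicon version="1.0"
         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>U.S.</grapheme>
    <alias>United States</alias>
  </lexeme>
</lexicon>
```

This is why acronym handling is the easy case: an alias maps spelling to spelling, while a phoneme element demands a correct transcription in the target language’s phonetic alphabet.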
Dahl: How can people learn more about PLS?
Baggia: My first advice is to read the PLS specification. It is very compact and full of examples. A book that describes PLS, Speech Processing for IP Networks: Media Resource Control Protocol (MRCP), by David Burke, was published in 2007. Also, France Telecom Orange Labs has released an implementation of PLS 1.0 under the GNU General Public License version 3.
The PLS specification can be found at www.w3.org/TR/pronunciation-lexicon. To test the PLS or provide feedback, send comments to the Voice Browser mailing list at email@example.com.
Deborah Dahl, Ph.D., is the principal at speech and language technology consulting firm Conversational Technologies and chair of the World Wide Web Consortium’s Multimodal Interaction Working Group. She can be reached at firstname.lastname@example.org.