Speech Technology Magazine

 

The Name Game

The growth in applications for text-to-speech (TTS) in voice-automated directory assistance and information, and in spoken directions in vehicles, is challenging developers to produce solutions that deliver accurate and unambiguous responses. Dr. Caroline Henton breaks down the “pronunciation puzzles” in TTS and offers five steps for achieving working solutions.
By Caroline Henton - Posted Aug 25, 2003

Two growing applications for text-to-speech (TTS) are voice-automated directory assistance and information, and spoken directions in vehicles. As part of these interactions, it’s vital that TTS delivers accurate telephone directory listings and gives clear, unambiguous directions. These scenarios present particular challenges for TTS, especially for the text analysis component, where non-standard spellings, foreign words and proper names can wreak havoc. Speaking the potentially infinite class of personal, company and product names correctly is difficult for any TTS system.

Here, the focus is on grapheme-to-phoneme (G2P) rules, as well as the contributions of language identification algorithms and productive morphological patterns to the generation of the most likely pronunciation(s) of these proper nouns. Differences according to perceived language origin, dialects and accents are also explored. By using such linguistic criteria, application developers should be able to evaluate more closely the coverage and accuracy of competing TTS systems.

TELEPHONE DIRECTORY ASSISTANCE AND INFORMATION SERVICES

Current data from North American telephone directory database service providers (VoltDelta and LSSi) show the number of national residential, business and government listings to be between 150 and 170 million. Such databases can provide listings by name, address, telephone number and/or area code. Callers might request ten "Chinese restaurants" in San Francisco, Calif., or all the "Jones" in Cincinnati, Ohio. Concierge services answer inquiries for weather forecasts, movie listings, theatre locations and show times, and search for menu items. It’s clear that the TTS-supplied answers need to speak personal names, proper names, foreign names and words, and city/place names (toponyms), both national and international.
Whichever is the more accurate estimate of unique last names and business listings, it is a daunting number for human beings to pronounce correctly. For a TTS engine, with little if any world knowledge, it can lead to paralyzed silence or incomprehensible mispronunciation.

FOR MY NAME AND MEMORY

In a test conducted on a sample of 10,000 last names, approximately 50 percent of Anglo-Saxon and Spanish names were pronounced intelligibly using relatively clean G2P correspondences (Henton, 2003). About the same proportion required hand-checking and tuning. Examples 1-10 show particular cases where G2P rules honed for pronunciation of ‘regular’ English words fail miserably for names. The difficult sequences are in bold. Pronunciations appear in standard International Phonetic Alphabet (IPA) transcription.
The observations in points 8-10 extend to many other languages, including Arabic, French, Cambodian, Chinese, Hebrew and Vietnamese. Space precludes a comprehensive examination of the tendencies for these ‘imports’ into U.S. English.

PRODUCT NAMES

As if the pronunciation of personal names were not difficult enough for any hapless TTS system, company and product names are equally unpredictable and linguistically dangerous territory. Product names are an infinite class fed by productive rules based in the morphologies of several languages. For legal purposes, names and trademarks must be individuated orthographically; however, it’s not possible to legally dictate how they’re pronounced. Consider the list of car model names, where the primary stressed syllable appears in bold face. The second pronunciation occurs mostly, but not exclusively, in U.K. English.

Linguistic speculation accounts for these varying pronunciations by assuming that native speakers of English draw different analogies according to their perception of the morphological origins, and by regularizing with the stress patterns preferred in their dialect. U.S. speakers of English have a tendency to place the stress as far forward as possible, while U.K. English favors ante-penultimate stress. In Spanish ‘célica’ means ‘celestial, heavenly.’ This association escapes British owners of this model. Similarly, those who have the first pronunciation for ‘Infiniti’ compare it with the word ‘infinity,’ while those who pronounce it in the second manner are drawing an analogy with the spelling pattern found in ‘graffiti.’ As for the pronunciation of the final car in the list, owners of the vehicle don’t seem to be in consensus, nor does this native speaker of English have an intuition, because it is unclear whether the ‘rav’ is a shortened version of ‘travel,’ ‘ravishing,’ or ‘rave.’ Its origin may have been an acronym (Recreational Access Vehicle, perhaps?) that the manufacturer’s marketing department morphologized internally, and so a new name was born.

The realm of automobile manufacturers is a quagmire for unsuspecting speakers and TTS systems alike. The names of several makers contain unique (non-cognate) spellings, e.g. Buick, Cadillac, Chrysler, Chevrolet and Oldsmobile, and their models get more inventive each year. The following selection is most noteworthy. The final three names have no meaning in English. Even the first is confusing: is it intended to be associated with the large curved sword (though spelled ‘saber’ in U.S. English), bestowing a swashbuckling personality on its driver? Owners of the Sebring may remain ignorant of its association with the city of the same name in Florida, and also of its pronunciation, unless they look it up in a pronunciation dictionary (as the author had to do) or simply take the car salesperson’s pronunciation to be the authoritative version. Names containing a dieresis (or umlaut), such as Citroën, present their own set of pronunciation problems (see below). ‘Xsara’ contains an initial sequence that does not occur in any Western language, and was created presumably for visual typographic appeal on the back of the car. The ‘Escalade’ may have been derived from an attempted blend of ‘escape’ and ‘esplanade’ (two rather odd nouns to combine semantically). ‘Escalator’ and ‘marmalade’ are equally possible! The intended semantics of ‘Starion’ are even harder to find: ‘star’ + ‘stallion?’

Consider further the following list of automotive and other manufacturers’ names. Regarding possible pronunciations of the Korean car manufacturer Hyundai, Wells (2000, p. 375) states: "TV advertising has used different anglicizations at different times." If the marketers cannot work out which pronunciation is most appropriate to promote the cars to non-Korean speakers, the chances of TTS providers getting it right are slim.
In the case of Ricoh, the Japanese conglomerate’s attempt to look like a European word has simply led to confusion for native speakers of English.

A TEXT-ANALYZER’S WORST NIGHTMARES

Most TTS engines require that all non-alphabetic characters be stripped away or ignored, and that differences between upper and lower case be eliminated. In the inventive world of company and product naming, this practice causes more problems for any struggling TTS system. In naming a new company or product, it’s almost mandatory to combine upper- and lower-case characters in one alphabetic string with no white space between them, to alter the spelling for "eye appeal," or even to reverse characters, as shown in the following commercial enterprise and product brand names. An asterisk indicates that the TTS text analyzer may omit this character, or produce an unpredictable or weird interpretation. Intermixing alphabetic characters with digits is likely to cause garbled pronunciation, seizure or silence. Alternative spellings, and strings that break normal English phonological rules as a result of upper-case usage being ignored (marked with # in the list), need to be entered into an ‘exception’ dictionary, together with their handcrafted pronunciation.

The practice of gluing names together is most prevalent in the high-tech world, where novelty presides over morphological normalcy. Ironically, these companies and their products are common customers for TTS solutions and demand the highest quality and unfailing accuracy. The last item is a recent naming for a fashion accessory division created by Siemens. This native speaker of English doesn’t know whether the division is intended to be pronounced (refer to printed publication), where internal analogy for the latter presumably arises from graphic association with the English word ‘celebrity.’ From a marketing perspective, the name is not an immediate and transparent hit; from a TTS perspective it is an assured failure.
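The loss of information described in this section can be made concrete with a minimal Python sketch. It assumes a naive text analyzer that strips accents, folds case and discards every non-alphabetic character, plus an ‘exception’ dictionary that is consulted on the raw spelling before any of that happens. The function names, the brand "Brand4U" and the respelling stored for ‘AdvantEdge’ are illustrative assumptions, not taken from any particular TTS engine or from the print version of this article.

```python
import re
import unicodedata

# Hypothetical 'exception' dictionary keyed on the ORIGINAL spelling.
# The respelling is a placeholder, not a verified transcription.
EXCEPTIONS = {
    "AdvantEdge": "ad-VAN-tij",
}


def naive_normalize(token: str) -> str:
    """Strip accents, fold case and drop every non-alphabetic character."""
    # Decompose accented letters, then discard the combining marks.
    decomposed = unicodedata.normalize("NFKD", token)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Keep only the letters a-z; digits, hyphens and symbols vanish.
    return re.sub(r"[^a-z]", "", without_marks.lower())


def text_for_g2p(token: str) -> tuple:
    """Consult the exception dictionary first, on the raw spelling."""
    if token in EXCEPTIONS:
        return ("exception", EXCEPTIONS[token])
    return ("normalized", naive_normalize(token))


if __name__ == "__main__":
    for name in ["Citroën", "AdvantEdge", "Brand4U", "Xsara"]:
        print(name, "->", text_for_g2p(name))
```

Because the lookup happens before normalization, mixed case, digits and diacritics are still available to distinguish one name from another; anything not listed falls through to the stripped form, with the dieresis and the digit already gone.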
Further TTS headaches come from the ‘upscale’ inclusion of accents and umlauts in names, and from the removal of hyphens. Once accents have been removed it is impossible to tell which syllable is stressed. Also, removing hyphens results in consonant sequences (‘tngl’ and ‘pnsh’) that would otherwise never occur in English. Portmanteau blends are common in product names, for example: AdvantEdge, Aristokraft, Cuisinart, Electrolux and Magiwipe. While native readers of English can easily parse the components of these names, hapless TTS systems do not have this knowledge at their disposal, and will gasp at such names.

STEPS TO WORKING SOLUTIONS

Here are some recommended steps to achieving a working, accurate and intelligible TTS system that has to cope with a large number of proper nouns and personal names:

• Feed many thousands of names from a large (correct) database through G2P rules for English.
• Hand check for accuracy (<50 percent anticipated).
• Create a lexicon of names with phonemic transcriptions.
• Design the TTS engine to access the lexicon first, then fall back to G2P rules.
• Check pronunciation accuracy of names again.

Steps involving language identification have been omitted. The reason is simple: it may be useful to know that a word has, for example, a Portuguese origin, but that knowledge and labeling will give little-to-no indication of how U.S. English speakers (with varying degrees of fluency in English as a native language) have chosen to pronounce that word. Many socio-linguistic variables come into play. Phonemes are anglicized to varying degrees; they may be added, or omitted. Word stress patterns differ according to speakers’ subconscious analogies, and according to their dialects/accents, their own native language, their experience with other languages and their differing perceptions of the origin of the word.

CONCLUSION

Commercial TTS systems must pronounce names containing sequences from their origins in, inter alia, Spanish, Chinese, Danish, French, Scottish Gaelic, German, Icelandic, Irish, Japanese, Korean, Norwegian, Russian, Swedish, Vietnamese and Welsh. TTS designers have to predict and provide alternatives for many names that are bifurcated in the way owners pronounce them. It is essential to have 100 percent accuracy in saying personal, place, manufacturers’ and product names, even if native speakers and advertisers are ambiguous about the correct pronunciation. In a move toward linguistic prescription, it is the creators and owners of those names who must determine the accuracy of the pronunciation, but only after linguists have advised them on the probability of the name taking root in native speakers’ analogy-driven lexicons.

Footnotes

1. Taboo associations account for the respelling and marked pronunciation of the English name ‘De’ath.’ A similar spirit, founded no doubt in persistent Puritan values, applies to the U.S. English pronunciation of the German name ‘Koch’ as (refer to print version), to avoid any auditory resemblance to the word ‘cock.’
2. Natives of Sacramento, California, routinely pronounce the name of their local park in this way.

References

1. HENTON, C. (2003). Accuracy of personal names’ pronunciation in a limited TTS system. Unpublished.
2. WELLS, J.C. (2000). Longman Pronunciation Dictionary. 2nd edition. London: Pearson Education.
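As a companion to the steps to working solutions, the lexicon-first design with a G2P fallback, followed by a second accuracy check, might be sketched in Python as follows. This is only an illustration under stated assumptions: toy_g2p is a stand-in that spells a name letter by letter rather than applying real G2P rules, the lexicon holds several variants per name to allow for bifurcated pronunciations, and the respellings for ‘Infiniti’ are informal placeholders rather than IPA transcriptions from the print version.

```python
from typing import Callable, Dict, List, Tuple

Lexicon = Dict[str, List[str]]

# Hypothetical hand-checked lexicon; each name may carry several variants.
LEXICON: Lexicon = {
    "Infiniti": ["in-FIN-i-tee", "in-fi-NEE-tee"],
}

# Hypothetical hand-checked reference pronunciations for re-testing.
REFERENCE: Dict[str, str] = {
    "Infiniti": "in-fi-NEE-tee",
    "Smith": "smith",
}


def toy_g2p(name: str) -> str:
    """Stand-in for real G2P rules: spell the name letter by letter."""
    return " ".join(name.lower())


def pronounce(
    name: str,
    lexicon: Lexicon,
    g2p: Callable[[str], str] = toy_g2p,
) -> Tuple[List[str], str]:
    """Return the candidate pronunciations and the source that produced them."""
    if name in lexicon:
        # The lexicon is consulted first.
        return lexicon[name], "lexicon"
    # Otherwise fall back to the G2P rules.
    return [g2p(name)], "g2p"


def accuracy(
    reference: Dict[str, str],
    lexicon: Lexicon,
    g2p: Callable[[str], str] = toy_g2p,
) -> float:
    """Share of names whose reference pronunciation is among the candidates."""
    if not reference:
        return 0.0
    hits = sum(
        1
        for name, expected in reference.items()
        if expected in pronounce(name, lexicon, g2p)[0]
    )
    return hits / len(reference)


if __name__ == "__main__":
    print(pronounce("Infiniti", LEXICON))
    print(pronounce("Smith", LEXICON))
    print(accuracy(REFERENCE, LEXICON))
```

Names that fail the re-check are candidates for new lexicon entries, which is how the hand-checking step feeds back into the lexicon over time.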


Dr. Caroline Henton is Founder and CTO of Talknowledgy.com. Dr. Henton can be reached at carolinehenton@hotmail.com or (831) 457-0402.
