Text-To-Speech: E-Mail Readers--The Speech Connection
E-mail has been used in the U.S. for several years. However, much of Europe has been late in exploiting this efficient communication method, and as such, Europe can now be considered a new member of the Internet community.
Also, the explosion of the mobile phone worldwide has created a "mobile mania," demonstrated by an increasing number of mobile workers, telecommuters and virtual offices. Europe has a powerful GSM 900 mobile phone network with roaming in all countries. Mobile phone companies are now offering a multi-protocol subscription to enable users to keep their phone number when traveling to countries using other standards such as CDMA, GSM 1800 or DECT.
Those two significant changes impact the U.S. market. Many products used in Europe for unified messaging, data transmission and voice mail systems are integrated and distributed by American companies, those products having been used for years in the U.S. However, this new European market poses fresh challenges, because products must be adapted to European requirements such as multiple languages and country-specific telephone networks.
The increase of exchanged e-mail is highlighting the lack of interoperability of e-mail and other media. You still need a computer and an internet connection to reach e-mail.
As increasing numbers of people are using e-mail, the market is demanding more and more from this medium. The mobile phone is becoming the universal front-end, and linking classical information media such as fax or letter to e-mail and telephone is becoming obligatory.
In the fast-paced business environment today, there is a need for quick access and response to received e-mail characterized by easy access to e-mail by telephone, wherever it is stored, whatever it contains and wherever the user may be.
A merging system, which provides access to e-mail by telephone, might be a solution and as a result, companies are hastily jumping into the e-mail reader market with little overall analysis. The result is a long list of diverse products, all referred to as e-mail readers.
Two Approaches
An e-mail by telephone system requires three major components: first, a way to retrieve e-mail from a mail server or a mailbox; second, a system that parses and cleans the e-mail in order to send readable text to the last component; and third, a Text-To-Speech converter.
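The three components can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (extract_text, clean_for_speech, speak, read_aloud) are hypothetical, the cleaning rules are deliberately simple, and speak is a placeholder that returns a string where a real Text-To-Speech converter would return audio.

```python
from email import message_from_string


def extract_text(raw_message: str) -> tuple[str, str, str]:
    """Return (sender, subject, body) from a raw message as fetched from a mailbox."""
    msg = message_from_string(raw_message)
    if msg.is_multipart():
        # Keep only the first plain-text part; attachments are skipped.
        body = ""
        for part in msg.walk():
            if part.get_content_type() == "text/plain":
                body = part.get_payload()
                break
    else:
        body = msg.get_payload()
    return msg.get("From", ""), msg.get("Subject", ""), body


def clean_for_speech(body: str) -> str:
    """Drop quoted reply lines and the signature, then collapse whitespace."""
    kept = []
    for line in body.splitlines():
        if line.rstrip() == "--":  # conventional signature delimiter
            break
        if line.lstrip().startswith(">"):  # text quoted from an earlier message
            continue
        kept.append(line)
    return " ".join(word for line in kept for word in line.split())


def speak(text: str) -> str:
    """Placeholder for the TTS converter (assumption: returns text, not audio)."""
    return f"[speech] {text}"


def read_aloud(raw_message: str) -> str:
    """Chain the three stages: retrieve/parse, clean, convert."""
    sender, subject, body = extract_text(raw_message)
    return speak(f"Message from {sender}. Subject: {subject}. {clean_for_speech(body)}")
```

Keeping the stages separate means the retrieval step can later be swapped for a different mail protocol, or the placeholder for a real speech engine, without touching the cleaning logic.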
There are two approaches to providing e-mail by phone: a personal solution, generally provided with a voice-modem telephone interface, and a centralized or server solution for telcos, ISPs or corporations.
A personal solution consists of a piece of software running on the user's PC, which must be left switched on, and a voice modem waiting for incoming calls. The system checks the user's password, retrieves e-mail from the inbox and reads it aloud using Text-To-Speech conversion.
Consequently, the personal solution system is responsible for regularly checking new messages on user e-mail accounts hosted by their Internet provider.
This kind of solution is pretty simple and cost effective, but has many limitations. All solutions on the market in this category are single language based - primarily English - and feature low-end TTS quality. Pre-processing is at a minimum and is generally unable to handle much in the way of exceptions, homographs, phone number formats, address formats or attachments.
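To make the pre-processing problem concrete, here is a minimal Python sketch of one such step: rewriting phone numbers so that a speech engine reads them digit by digit instead of as one huge number. The pattern and the names PHONE_PATTERN, spell_digits and expand_phone_numbers are assumptions made for this example; real products must cope with many more national formats.

```python
import re

DIGIT_NAMES = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]

# Assumed pattern: optional "+", then at least seven digits, with an optional
# single space, dot or hyphen between consecutive digits.
PHONE_PATTERN = re.compile(r"\+?\d(?:[ .-]?\d){6,}")


def spell_digits(match: re.Match) -> str:
    """Spell a matched phone number digit by digit, pausing at separators."""
    words = []
    for char in match.group(0):
        if char == "+":
            words.append("plus")
        elif char.isdigit():
            words.append(DIGIT_NAMES[int(char)])
        elif words and not words[-1].endswith(","):
            # A separator becomes a comma, which most engines render as a pause.
            words[-1] += ","
    return " ".join(words)


def expand_phone_numbers(text: str) -> str:
    """Replace every phone-like number in the text with its spoken form."""
    return PHONE_PATTERN.sub(spell_digits, text)
```

For instance, expand_phone_numbers("Call 516 939-6655 today") yields "Call five one six, nine three nine, six six five five today", while a short number such as a year is left untouched because it has fewer than seven digits.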
The limitations of the personal solution mean that it is best suited to small office, home office or personal use.
On the other hand, some ISPs and telcos are starting to offer centralized e-mail by telephone as part of their service. A centralized or server solution faces far greater demands. The system must be able to serve multiple callers simultaneously, reliably and with fine text-to-speech quality.
Two technical approaches have been used in setting up centralized or server solutions.
Batch conversion: Each new e-mail is automatically converted into voice mail and forwarded to a classical voice mail system. Of course, users might experience some delay before having the e-mail accessible as voice mail with this approach.
This type of solution avoids using massive text-to-speech conversion resources and utilizes the existing voice mail system. This off-line approach is mainly used in unified messaging systems with pre-existing voice mail systems.
Real-time conversion: This approach provides better handling of e-mail, as the system works directly on a text version of the message, making it possible to navigate between sentences or paragraphs. It is also possible to forward or reply to an e-mail since the system keeps track of the sender. The user may even be able to make changes in the synthesized voice by changing speed, pitch, or gender of the voice in real time.
Real-time synthesis avoids storing a large number of voice files and gives the user true real-time access to the mailbox.
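Because a real-time system holds the message as text rather than as a recorded voice file, sentence navigation reduces to moving an index. The Python sketch below illustrates the idea; the MessageReader class, the naive sentence splitter and the keypad layout (4 = previous, 5 = repeat, 6 = next) are all assumptions made for the example, not a description of any shipping product.

```python
import re


class MessageReader:
    """Keeps the message as text so the caller can move sentence by sentence."""

    def __init__(self, text: str):
        # Naive split: a sentence ends at '.', '!' or '?' followed by whitespace.
        parts = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
        self.sentences = parts or [""]
        self.position = 0

    def current(self) -> str:
        return self.sentences[self.position]

    def next(self) -> str:
        # Stay on the last sentence instead of running past the end.
        self.position = min(self.position + 1, len(self.sentences) - 1)
        return self.current()

    def previous(self) -> str:
        self.position = max(self.position - 1, 0)
        return self.current()


def handle_key(reader: MessageReader, key: str) -> str:
    """Map a touchtone key to a navigation action and return the text to speak."""
    actions = {"4": reader.previous, "5": reader.current, "6": reader.next}
    return actions.get(key, reader.current)()
```

Each returned sentence would be handed to the synthesizer on demand, which is why no voice files need to be stored.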
Enhancements are Keys to Success
There are several enhancements that represent the keys to success in this market. Among them:
An open system
To ensure the market success of real-time solutions, the system must be open and support multiple protocols such as POP3, IMAP4 and X.400. SMTP support provides a way to forward or reply to messages remotely by telephone.
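As a rough illustration of the reply path, the following Python sketch builds a reply from the headers of the message that was just read aloud. The build_reply function and its parameters are hypothetical, and the reply text is assumed to have been obtained already, for example from a menu of canned answers.

```python
from email import message_from_string
from email.message import EmailMessage


def build_reply(raw_original: str, reply_from: str, reply_text: str) -> EmailMessage:
    """Create a reply addressed to the sender of the original message."""
    original = message_from_string(raw_original)
    reply = EmailMessage()
    reply["From"] = reply_from
    # Reply-To takes precedence over From when the sender supplied it.
    reply["To"] = original.get("Reply-To") or original.get("From", "")
    subject = original.get("Subject", "")
    if not subject.lower().startswith("re:"):
        subject = f"Re: {subject}"
    reply["Subject"] = subject
    if original.get("Message-ID"):
        # Let mail clients thread the reply with the original message.
        reply["In-Reply-To"] = original["Message-ID"]
        reply["References"] = original["Message-ID"]
    reply.set_content(reply_text)
    return reply
```

The resulting message could then be handed to an SMTP server, for instance through the send_message method of Python's smtplib.SMTP class; the server name and credentials depend entirely on the installation.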
A multi-lingual system
Although the U.S. market is strongly dominated by the English language, there are pockets where the use of Spanish or French is significant. The situation is quite different in Europe, however, where an e-mail reader system must be able to read messages in numerous languages and to detect automatically which language is used. In Europe, the six dominant languages are: English, German, French, Spanish, Italian and Dutch.
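One simple way to approach automatic language detection is to count common function words from each language. The Python sketch below does this for the six languages named above. It is an illustration only: the word lists are tiny and hand-picked, the names STOPWORDS and detect_language are assumptions, and short or mixed-language messages can easily fool this kind of detector.

```python
import re

# Tiny, illustrative word lists; a production detector would rely on much
# larger lists or on character n-gram statistics.
STOPWORDS = {
    "English": {"the", "and", "is", "of", "to", "you", "for", "with"},
    "German": {"der", "die", "und", "ist", "nicht", "das", "mit", "für"},
    "French": {"le", "la", "les", "et", "est", "vous", "pour", "dans"},
    "Spanish": {"el", "los", "y", "es", "que", "para", "con", "una"},
    "Italian": {"il", "gli", "e", "che", "per", "non", "sono", "della"},
    "Dutch": {"het", "een", "en", "van", "niet", "voor", "met", "zijn"},
}


def detect_language(text: str, default: str = "English") -> str:
    """Return the language whose common words occur most often in the text."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    scores = {
        language: sum(word in stopwords for word in words)
        for language, stopwords in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    # Fall back to the default when no known word was found at all.
    return best if scores[best] > 0 else default
```

The detected language would then select the matching text-to-speech voice before the message is read.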
A parsing system
For maximum customer satisfaction, an e-mail reader must be good at parsing e-mail. The user must be able to easily access what he wants to listen to. As a result, some work has already been done on "intelligent pre-processing" to keep the meaning of a message when transforming it into voice by choosing the proper transcription.
A system with playback
Although many systems are currently unable to play back attachments, this feature could add tremendous value to an e-mail reader product.
A system which can respond
In any case, it is crucial that once the message has been read aloud, the user can easily respond or forward the message. Without this feature, the e-mail reader service is meaningless.
Future Challenges
Even if telcos are currently satisfied with today's somewhat clumsy e-mail reader products, which do not yet fulfill all customer needs, users will ultimately demand more and better service, such as virtual assistants or intelligent agents which sort user messages providing an overview of messages. This can be achieved with natural language based systems which are able to understand a spoken request and a search engine which is able to serve the request, such as "Do I have a new message regarding my meeting in London?" or "Did I receive a reply from Bob Martin?"
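A very reduced version of such a request can be sketched in Python. The sketch assumes that a speech recognizer has already turned the spoken request into text, and it replaces real natural language understanding with plain keyword matching; the IGNORED word list, the inbox structure and the function names are all invented for the example.

```python
import re

# Words that carry no search value in a request (illustrative list).
IGNORED = {"do", "did", "i", "have", "a", "an", "the", "new", "any", "my", "me",
           "message", "messages", "reply", "receive", "received", "from",
           "regarding", "about", "in", "is", "there"}


def keywords(request: str) -> set[str]:
    """Keep only the words of the request that are worth searching for."""
    return {w for w in re.findall(r"[a-z]+", request.lower()) if w not in IGNORED}


def search(request: str, inbox: list[dict]) -> list[dict]:
    """Return messages whose sender or subject contains every keyword."""
    wanted = keywords(request)
    hits = []
    for message in inbox:
        text = f"{message['sender']} {message['subject']}".lower()
        haystack = set(re.findall(r"[a-z]+", text))
        if wanted and wanted <= haystack:
            hits.append(message)
    return hits
```

With an inbox holding one message from "Bob Martin" and another with the subject "Meeting in London on Friday", the two requests quoted above would each select the matching message. A real assistant would also need to handle synonyms, dates and message bodies.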
Inundated with information as we are today, the need to rapidly capture the essence of communications is becoming ever more important, and a summary feature will rapidly prove to be indispensable. Combined with natural dialogue, you might ask your e-mail reader: "Hey, give me an overview of the message about e-mail readers!"
Obviously, this will require much more technology than is included in today's e-mail readers, but it should be appearing in a year or two.
As a final enhancement of the system, automatic translation will be an asset in the increasingly global market, and would spare the text-to-speech converter from having to support multiple languages.
In any event, it is crucial that the system and its add-ons are highly reliable. Otherwise, the user could quickly become lost when the wrong e-mail is read aloud in a poor translation!
To be a major player in tomorrow's market, companies will have to team up with linguistic specialists, speech processing companies and experts in intelligent agents. The first step has been taken: e-mail by telephone is a technical success. But to become a market success we have to think in the way the user thinks.
Etienne Lamort de Gail is the Director of Marketing for ELAN INFORMATIQUE. Mr. Lamort de Gail has been with Elan Informatique for the past seven years, where he has held technical and marketing positions. His current responsibilities include identifying customer needs, developing innovative Text-To-Speech technologies and applications, and implementing strategic partnerships in the field of natural man-machine interface. He can be reached at +33 561 36 07 89 or by email: elg@elan.fr - http://www.elan.fr
TTS Products
TexTalker from Real Time Strategies
Real Time Strategies, Inc., has recently announced a way for users to retrieve e-mail as voice over any cell phone, voice pager, voice mailbox, answering service or standard touchtone phone.
TexTalker is designed to interface with a variety of wireless and/or wire networks, and offers the end-user several methods of e-mail access. Through the product's text-to-speech translation technology, subscribers can have all of their e-mail forwarded to a voice mail box.
"With more than 80 million wireless subscribers, there is a great demand for products that help providers stay one step ahead of their competition," said Ring. "Designed to easily add on to any device (including an existing analog, PCS, cellular or paging network), TexTalker will help service providers control costs and still offer the cutting edge technology that brings in new business and keeps existing customers satisfied."
For more information, contact Real Time Strategies at 516 939-6655 or on the web at www.rtswireless.com.
PrimoVox from First Byte
PrimoVox is a text-to-speech synthesis system from First Byte that is particularly attractive when there is a need to speak variable data.
Typically, TTS is used for reading e-mail, proofreading documents, alert and alarm messages, helping with files and speaking any kind of database information.
PrimoVOX is a complementary product to First Byte's other TTS system, ProVoice, which worked well in a desktop environment.
Some application sets, such as telephony, are constrained to low sampling rates and 8-bit output. These applications can be best served by PrimoVox, which is designed from the outset for clarity with low output bandwidth.
With PrimoVOX, First Byte is now shipping its third-generation Text-to-Phonetics front end with many enhancements that make the reading of text data more efficient and accurate.
A part-of-speech tagger has been added to permit disambiguation of words like "record" and "read," whose correct pronunciation can only be determined from the sentence context.
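The idea behind this kind of disambiguation can be shown with a toy Python example. It is not a description of First Byte's tagger: the rules below look only at the preceding word, the cue lists are invented for the illustration, and the pronunciations are written as informal respellings rather than a real phonetic alphabet.

```python
# Hypothetical cue words; a real tagger would use a lexicon and wider context.
VERB_CUES = {"to", "will", "would", "can", "could", "must", "should",
             "please", "i", "we", "you", "they"}
PAST_CUES = {"have", "has", "had", "was", "were", "been", "already"}


def pronounce_homographs(sentence: str) -> list[str]:
    """Replace 'record' and 'read' with a respelling chosen from context."""
    words = sentence.lower().replace(",", " ").rstrip(".!?").split()
    output = []
    for index, word in enumerate(words):
        previous = words[index - 1] if index > 0 else ""
        if word == "record":
            # Verb after a verb cue ("please record"), otherwise assume the noun.
            output.append("ri-KORD" if previous in VERB_CUES else "REK-ord")
        elif word == "read":
            # Past form after an auxiliary ("have read"), otherwise present.
            output.append("red" if previous in PAST_CUES else "reed")
        else:
            output.append(word)
    return output
```

Under these rules, "Please record the message." gives the verb form and "The record was broken." gives the noun form, which is the distinction a part-of-speech tagger makes far more reliably by analyzing the whole sentence.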
For more information, contact First Byte, 19840 Pioneer Ave., Torrance, CA 90503, or call 310 793-0611.