Speech Recognition and Telegraphic Speech

What Is Telegraphic Speech?
Telegraphic speech is typically observed in language-learning toddlers and in people who are relearning to speak after a neurological trauma such as a stroke. It is characterized by minimal utterances that are often no more than noun-verb combinations. For instance, a toddler might say, “give juice,” rather than a more grammatically complete and socially appropriate utterance such as, “Can you give me some juice, please?”

The phenomenon of telegraphic speech got its name in the 20th century, from the days when the telegram was the principal way to communicate quickly over great distances. Sending a telegram was expensive; in fact, its cost was determined by how many words it contained. Senders were thus motivated to eliminate any words that were not semantically critical, and “telegraphic” writing was born.

Behavioral Efficiency
There is a great deal of trial and error in learning how to do something for the first time. Consider the novice user of a speech application. Everything that the system says and does is new to him. He hears the system’s prompts and responds tentatively, not being certain of the consequences of his responses. Response latency (the amount of time that passes between the offset of an application prompt and the onset of the user’s response) is a measure of the user’s uncertainty. Response latencies for novice users are almost invariably longer than those of veteran users. With repeated use, the system appears more predictable to the user, latencies decrease, and the user becomes more efficient at using the system to complete his tasks.
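The latency measure defined above is simple to compute from call logs. The following is a minimal Python sketch, assuming each dialog turn is logged as a pair of timestamps in seconds (prompt offset, response onset). The function names and the sample timestamps are hypothetical and serve only to illustrate the arithmetic.

```python
from statistics import mean

def response_latency(prompt_offset_s: float, response_onset_s: float) -> float:
    """Seconds between the end of the prompt and the start of the caller's reply.

    A negative value would mean the caller began speaking before the
    prompt ended, that is, a barge-in.
    """
    return response_onset_s - prompt_offset_s

def mean_latency(turns):
    """Average latency over a list of (prompt_offset, response_onset) pairs."""
    return mean(response_latency(offset, onset) for offset, onset in turns)

# Hypothetical timestamps, invented for illustration only.
first_call = [(2.0, 4.5), (9.0, 11.0), (15.0, 17.5)]   # novice: long pauses
later_call = [(2.0, 2.75), (6.0, 6.5), (9.0, 9.25)]    # veteran: short pauses

print(f"first call: {mean_latency(first_call):.2f} s")
print(f"later call: {mean_latency(later_call):.2f} s")
```

Tracking this average per caller across repeated calls is one way to observe the decrease in latency described above.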

But the user’s learning does not end here. Over repeated experiences, people tend to discover more efficient (if not the most efficient) ways to complete a task. In a speech application, for example, users eventually learn when and where they can barge in, and the behavior is reinforced because barging in gets them where they want to go more quickly.

Relevance to ASR Applications
What does all this have to do with speech recognition applications?   The answer concerns speech recognition errors. 

The repeated experience of speech recognition errors can have a number of unfortunate effects on a user. This is hardly news. However, one subtle, poorly understood and possibly beneficial effect of repeated errors is that they can induce a type of telegraphic speech in speech application users. The process is relatively simple and all but inevitable, given the tendency of users to discover ever more efficient ways to perform repeated tasks. An example should illustrate the phenomenon:

Fooled Me Once…
Let’s say that someone calls a particular employee at a particular company for the first time and has the following interaction with the company’s human-sounding Virtual Operator application:

System: Acme Widget Company. How may I direct your call?
Caller: Good morning. You could connect me with Fred Miller in the marketing department please.
System (Error 1): I’m sorry. Let’s try that again. How may I direct your call?
Caller: May I speak with Fred Miller in marketing please?
System (Error 2): I didn’t quite get that. Once more please. How may I direct your call?
Caller: I want to speak to Fred Miller in marketing.
System (Error 3): I seem to be having a bad day. Please say the employee’s first and last name. Otherwise say, “department names” for a list of corporate divisions.
Caller: Fred Miller.
System: Fred Miller. Is that correct?
Caller: Yes.
System: Your call is being transferred…

Now, let’s fast forward to the next time the caller needs to reach Mr. Miller:

System: Acme Widget Company. How may I direct your call?
Caller: Fred Miller.
System: Fred Miller. Is that correct?
Caller: Yes.
System: Your call is being transferred…

The Silver Lining
While the example may seem contrived, the fact is that people tend to do whatever they are reinforced for doing. If the virtual operator infallibly understood utterances like, “Good morning.  You could connect me with Fred Miller in the marketing department please,” users would continue to respond that way. But the experience of speech recognition errors tends to punish users for saying such complicated things while simultaneously reinforcing them for saying whatever is minimally required to get them where they want to go.

This tendency to induce telegraphic speech must not be thought of as a “bad” thing.   On the contrary, the phenomenon should be better appreciated and better accommodated in VUI designs.
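As one illustration of what accommodating telegraphic speech might look like, the Python sketch below treats the bare employee name as the core of the request and ignores whatever surrounds it. It is only a sketch under stated assumptions: it works on already-recognized text rather than audio, and the directory contents and the `spot_name` function are hypothetical, not part of any particular ASR product.

```python
import re

# Hypothetical employee directory; a real application would load its own.
DIRECTORY = ["Fred Miller", "Ann Lee"]

def spot_name(utterance: str):
    """Return the first directory name found anywhere in the text, else None."""
    # Lowercase, replace punctuation with spaces, and collapse whitespace.
    text = re.sub(r"[^a-z ]", " ", utterance.lower())
    text = " ".join(text.split())
    for name in DIRECTORY:
        # Match the full name on word boundaries, wherever it appears.
        if re.search(rf"\b{re.escape(name.lower())}\b", text):
            return name
    return None

# The telegraphic and the verbose request resolve to the same employee.
print(spot_name("Fred Miller."))
print(spot_name("May I speak with Fred Miller in marketing please?"))
```

The design choice here is that the minimal utterance is the primary case and any polite framing is optional, which mirrors the behavior callers in the example settle into on their own.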

Walter Rolandi is the founder and owner of The Voice User Interface Company in Columbia, S.C. Rolandi provides consultative services in the design, development and evaluation of telephony-based voice user interfaces (VUI) and evaluates ASR, TTS, and conversational dialog technologies. He can be reached at wrolandi@wrolandi.com.
