
Advanced Interfaces: Handwriting Recognition and the Human-Computer Interface

One of the challenges now facing designers of hand-held devices is to make them easier and more natural to use while shrinking their physical size. Both speech and pen input offer the means to eliminate the keyboard. In addition to allowing a physically smaller form factor, handwriting recognition is generally more congenial than a miniature keyboard. There are three major types of handwriting worth considering for a portable device:
  • Constrained character sets in which predefined character strokes represent each letter.
  • Natural printing, with white space separating each of the letters.
  • Cursive handwriting with some or all of the letters joining together.
Constrained Characters
Predefined character strokes are designed to reduce the ambiguities in normal handwriting and allow the user to print rapidly with nearly 100% recognition. The characters are usually a single pen stroke, allowing the recognizer to do its work as soon as the user raises the pen. With a set of carefully designed character shapes, techniques can be applied to resolve any possible confusion between letters. Pen gestures or on-screen buttons make it easy to switch to upper case, special symbols, numeric input, accented characters and so on. Some examples of this approach are Allegro from fonix Corporation and Graffiti from 3Com. Consider the string 'abc' written using Graffiti and Allegro.
[Figure 1: 'abc' written in Graffiti and in Allegro]
Allegro is an example of a constrained character set recognizer. It features a character set that users find easy to learn and to work with, primarily because the letters are natural in appearance. Most users find that they must learn a new way of making just a few of the letters. There are other character sets available, and you need to consider how easy and natural they will be for your users to learn. The effort to learn printing with a constrained character set will be minimal if the character set is designed to be as natural as possible. Once the user has learned the character set, the payback in input efficiency is very large.
In developing the Allegro technology, fonix learned that the minimum accuracy for single-character recognition needed to be above 97% for users to feel "comfortable" using the technology. This is comparable to the typical accuracy of a trained typist. With this type of recognizer, it is possible to do input "heads-up": for example, a doctor might use such a system to take notes while maintaining eye contact with a patient. Constrained character set recognition is accurate, fast and small.
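To make the single-stroke idea concrete, here is a minimal Python sketch of one simple way such a recognizer could work. It is not the method used by Allegro or Graffiti. It assumes the digitizer delivers a list of (x, y) points with y increasing down the screen, reduces the stroke to a short string of direction codes, and looks that string up in a template table. The templates, the `min_step` jitter threshold and the function names are all invented for illustration.

```python
import math

# Hypothetical single-stroke templates written as direction strings
# (R = right, L = left, U = up, D = down; y grows downward on screen).
# These shapes are invented for this sketch; they are NOT the real
# Graffiti or Allegro alphabets.
TEMPLATES = {
    "D": "I",    # straight down
    "DR": "L",   # down, then right
    "LDR": "C",  # left, down, then right
    "DRU": "U",  # down, right, then up
}


def direction_codes(points, min_step=2.0):
    """Reduce a stroke to its sequence of dominant directions."""
    if not points:
        return ""
    codes = []
    last = points[0]
    for point in points[1:]:
        dx, dy = point[0] - last[0], point[1] - last[1]
        if math.hypot(dx, dy) < min_step:
            continue  # ignore pen jitter smaller than min_step
        if abs(dx) >= abs(dy):
            code = "R" if dx > 0 else "L"
        else:
            code = "D" if dy > 0 else "U"
        if not codes or codes[-1] != code:
            codes.append(code)  # record only changes of direction
        last = point
    return "".join(codes)


def recognize_stroke(points):
    """Called when the pen lifts; returns a letter, or None if no match."""
    return TEMPLATES.get(direction_codes(points))
```

Because each character is one stroke, `recognize_stroke` can run the moment the pen is raised. For example, `recognize_stroke([(0, 0), (0, 10), (0, 20), (10, 20), (20, 20)])` reduces to the code string "DR" and returns "L". A production recognizer would need far more tolerant matching than an exact table lookup.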
On low-power microprocessors, character recognizers can easily keep up with an experienced user's handwriting, and the Allegro recognizer has been implemented in as little as 48KB of memory.
Natural Printing
With natural printing the user enters individual characters with sufficient white space between the letters to show where each letter starts and ends. Sometimes the user is also given a set of combs or boxes in which to write. Unlike constrained character sets, the user does not have to learn a form of writing prescribed by the recognizer designers. A wide variety of commonly accepted letter forms are recognized, so most users will not have to learn any new letter shapes and can form the letters in their usual way. While the letters may be made up of multiple strokes, the pen must be lifted between letters; the characters cannot have "ink" joining them. Because of this, it is not too hard for the software to find the boundaries between letters, but many ambiguities can still occur with the letter shapes. For example, the letter "o" and the digit "0" can be identical in shape, and "c" and "e" are easily confused, especially if the user is not being very careful. With the use of a dictionary, or other knowledge of language structure or of what input is expected, the rate of such errors can be reduced.
Cursive Handwriting
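Both natural printing and cursive recognizers lean on a dictionary to settle ambiguous shapes such as "o" versus "0" or "c" versus "e". The following Python sketch shows the idea in its simplest form and is not taken from any particular product. It assumes the recognizer has already produced a raw character string, and that the confusion table and word list (both invented here) are supplied by the designer.

```python
from itertools import product

# Hypothetical confusion sets: for each character the recognizer may emit,
# the characters it could really have been. The raw reading is listed first
# so it wins whenever it is already a valid word.
CONFUSABLE = {"o": "o0", "0": "0o", "c": "ce", "e": "ec"}


def candidates(raw):
    """Expand a raw reading into every spelling the confusion sets allow."""
    options = [CONFUSABLE.get(ch, ch) for ch in raw]
    return ["".join(combo) for combo in product(*options)]


def resolve(raw, dictionary):
    """Return the first candidate found in the dictionary, else the raw text."""
    for word in candidates(raw):
        if word in dictionary:
            return word
    return raw
```

With the dictionary `{"coat", "cot"}`, `resolve("c0at", ...)` returns "coat", while an unknown string such as "xyz" is passed through unchanged. The candidate list grows exponentially with the number of ambiguous characters, so a real system would score alternatives rather than enumerate them.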
With cursive handwriting the user can write in a natural style, with some or all of the letters in a word joining together. Cursive includes "mixed" writing, which is probably the most prevalent form of natural handwriting in the United States. In mixed writing, a word may be entirely printed, partially printed, or completely joined-up. While cursive handwriting places the fewest restrictions on the user, it is the most demanding on the system. Determining where letters start and end becomes a major source of error and ambiguity. To make matters even more difficult, cursive handwriting styles are more individualistic and as a result pose greater challenges for recognizer accuracy. Cursive recognizers work from the "ink" inputs and use a dictionary to construct a reasonable guess at what is being written. While recognition rates are the lowest with this type of recognizer, today's cursive recognition systems are nevertheless amazingly accurate, especially when users are prepared to write somewhat more neatly than they normally would. Even with a dictionary of twenty thousand words, word recognition rates well above 90% are quite common.
Choices
The work to implement a product with good handwriting input technology is now fairly straightforward. The appropriate choices of processing power, screen display, digitizer and software need to be considered. The current generation of character recognition technology can achieve high recognition speeds with a 10 MIPS or greater processor and can be implemented in a small amount of memory. For cursive handwriting, a 30 MIPS processor would be more appropriate and memory must be allocated for a dictionary. For example, one cursive recognizer requires 200KB, which includes a 6000-word dictionary. Many current microprocessors can accommodate these performance needs.
In selecting a display, the designer needs to consider the tradeoffs between the application information to be displayed, the area to be used for handwriting, and the overall brightness levels that will be desired. Current experience with handheld devices indicates that users are comfortable with display areas of at least 2.5" x 3".
One of the most critical components of a successful handwriting implementation is the digitizer that generates the electronic "ink" for the system. As a lower bound for performance and accuracy, the digitizer should have a resolution on the order of 100 pixels per inch, with a sampling rate of 50 points per second, and the data needs to be free of spurious spikes and electronic noise. Below these values, important stroke details may be lost. Increasing the sampling rate above 100 points per second will not improve accuracy very much. The "feel" of the digitizer surface is also extremely important. Some digitizer surfaces require users to press a little harder than they would with pen and paper. If the system requires too much pressure, users will make slight breaks in their strokes that confuse the recognizer software.
Next Steps
For many years the keyboard has been a desirable way to enter large amounts of information into desktop systems, and this is likely to continue. However, as computer-enabled devices and handheld computers become more powerful and available everywhere, users will take advantage of much more flexible ways of entering information and interacting with systems, including speech and handwriting recognition. The technologies are available now. The challenge ahead is how to creatively blend these technologies into easy-to-use products. Doing this requires a strong partnership among internal staff, suppliers and outside consultants. In these times of rapid change and fast product introductions, having a team of people who have the experience and an understanding of the implementation challenges results in a much faster time to market, with fewer "first timer" mistakes. Pen input, as well as speech recognition, gives us the opportunity to build exciting, interactive products that are extremely easy and natural to use.
John Giudice is Vice President Marketing, Interactive Technologies and Brian Mottershead is Vice President Systems Integration at fonix Corporation. They can be reached at 781-935-5656 or jgiudice@fonix.com.