Linda Drake, President, National Verbatim Reporters Association
Please give us some background on your work in the court reporting business.
Linda Drake I've been a court reporter since 1982, when I obtained certification in Georgia. I now co-own a freelance court reporting firm in Savannah, Ga. We report and transcribe depositions, hearings, grand jury proceedings, municipal and superior court proceedings, and various public hearings for governmental entities such as the Department of Transportation, Department of Natural Resources and the Small Business Administration. I became nationally certified by the National Verbatim Reporters Association (NVRA) in 1994. That means that I have dictated and transcribed, with at least 97 percent accuracy, three five-minute tests: a 200 word per minute literary selection, a 225 word per minute jury charge, and a 250 word per minute two-voice question and answer. Having served on the Board of Directors of NVRA since 1999, I became president of the association in August of this year. NVRA is a nonprofit, professional membership organization representing voice writing verbatim reporters. Members include official court reporters, deposition reporters, broadcast captioners, and providers of realtime communication services for the hearing-impaired. Voice writing verbatim reporters make realtime records of spoken words and actions using speech recognition and other related technologies. Additional information about NVRA and voice writer certification can be obtained by calling (601) 582-4345 or visiting the NVRA Web site at www.nvra.org.
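Accuracy figures like the 97 percent threshold above are word-level scores against the source text. A minimal sketch of such scoring, using Python's standard-library sequence matcher; the procedure here is an illustrative assumption, not NVRA's official grading method:

```python
from difflib import SequenceMatcher

def word_accuracy(reference: str, transcript: str) -> float:
    """Word-level accuracy: matched words divided by reference words.

    Illustrative scoring only -- not NVRA's official test-grading rules.
    """
    ref = reference.split()
    hyp = transcript.split()
    matcher = SequenceMatcher(a=ref, b=hyp)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref)

ref = "the witness stated that she arrived at nine in the morning"
hyp = "the witness stated that she arrived at nine in morning"
print(f"{word_accuracy(ref, hyp):.1%}")  # 10 of 11 reference words matched
```

A dropped word, as in the example, counts against the reporter just as a substituted or inserted one would.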
How long has speech technology been used in court reporting?
LD I first became aware of the possible use of a speech recognition engine (SRE) with my profession in the mid-90s and purchased my system in 1997. At that time, my computer didn't have a fast CPU, yet I was able to see words appear on the screen as I reported depositions. It immediately became an invaluable tool for my occupation, as it substantially reduced the total volume of typing and editing required to produce court and deposition transcripts.
How do the accuracy rates with speech technology compare to other types of recording?
LD Voice writers have always enjoyed higher accuracy rates than their stenotype- and pen-based cousins in our field, based upon pure physiology. We must first understand that court reporting is a very high-volume and high-throughput task where delay between identification of sound waves' meaning and the production of their English language equivalents must remain as small as possible. The route taken by an attorney's cross-examination goes from his or her mouth, to my ear, through my brain, then to my "inner" voice. This form of repetition is naturally effortless; it's what we all do in our daily conversation. So the most natural extension of this process is to psychologically switch the repetition mechanism from "inner voice" to "spoken voice." Therefore, we minimize the introduction of cognitive overhead in our task of routing the spoken word to its permanent destination as printed English. This streamlined process means that we can achieve greater than 98 percent accuracy at speeds as high as 350 words per minute, sustained for five minutes. The other forms of reporting add an additional mental and physical layer pertaining to the correct representation, placement and order of material printed by hand, which requires yet another post-production layer of translation to English. The example above, five minutes at 350 words per minute, which is NVRA's annual National Speed Champion test, obviously illustrates non-speech recognition engine production. Court reporters have their own definition of "realtime," which simply means that the reporter's production of English is simultaneously transmitted to the reporter's computer screen, the judge's bench and the attorneys' tables. In this mode, using ScanSoft's Dragon NaturallySpeaking or IBM's ViaVoice, a voice writer produces English text scrolling on screens throughout the courtroom at sustained speeds varying between 180 and 200 words per minute, with at least 96 percent accuracy.
This defines the requirement for our Realtime Verbatim Reporter (RVR) certification.
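The realtime arrangement described above is essentially a publish-subscribe fan-out: the voice writer's software publishes each recognized fragment, and every courtroom screen subscribes to the same feed. A minimal sketch of that pattern; the class and method names are illustrative, not any CAT vendor's actual API:

```python
from typing import Callable, List

class RealtimeFeed:
    """Fan out each recognized text fragment to every subscribed display.

    Stand-in for the courtroom setup: the reporter's software publishes
    text, and the judge's bench and attorneys' tables subscribe.
    Names here are illustrative, not a vendor's real interface.
    """

    def __init__(self) -> None:
        self._displays: List[Callable[[str], None]] = []

    def subscribe(self, display: Callable[[str], None]) -> None:
        self._displays.append(display)

    def publish(self, fragment: str) -> None:
        # Every screen receives the same scrolling text, in order.
        for display in self._displays:
            display(fragment)

# Three "screens" modeled as simple lists of received fragments.
bench, counsel, reporter = [], [], []
feed = RealtimeFeed()
for screen in (bench, counsel, reporter):
    feed.subscribe(screen.append)

feed.publish("Q. Where were you on the night in question?")
feed.publish("A. At home.")
```

In a real deployment the displays would sit behind serial or network connections, but the fan-out logic is the same.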
Has speech recognition improved your performance, and how?
LD My dictation style has become much more clearly enunciated and, by incorporating more punctuation as I dictate rapidly, my accuracy has improved. This is the case with all realtime court reporters I know. The proofreading time for transcript production has been significantly reduced, allowing me more time for additional court or deposition work, which has positively impacted my business' bottom line. My production volume has increased at least 50 percent since I started using an SRE-equipped, computer-assisted transcription (CAT) program.
What is the level of acceptance of speech technology among court reporting professionals?
LD Court reporters have been searching for a controlled means of automating the process, going all the way back to the turn of the 20th century. For years, NVRA's voice writers have known that voice recognition's viability was merely a matter of applying sufficient computing muscle. Shorthand machine reporters have enjoyed realtime-like automation for almost 20 years, and the majority of voice reporters are eager to assume their role as state-of-the-art players. Those who shun "new" technologies will always be with us, and our profession has a few. But knowing that today's judges and attorneys, who were yesteryear's Commodore 64 and Super Mario users, are comfortable with technology, reminds us that our entire field is moving forward in harmony. The minimum standard in the courtroom is becoming realtime and, in the freelance world, a reporter daily hears, "How quickly can we have that transcript?" Many experienced reporters are interested in or have purchased speech recognition programs designed for court reporters, and students are being trained to report using realtime-oriented (for simultaneous display in the courtroom or deposition suite) speech recognition at the outset. This generation of court reporters will be full participants in the digital streamlining of the judicial process.
What are the barriers for using speech recognition in this process and what can be done to improve usage among court reporting professionals?
LD The greatest barrier I can see is, interestingly, generational, in that the youngsters are considerably more comfortable than their elders with the immediate display of their words shared across far-flung video screens. Seeing one's words appear on screen in realtime can be a fascinating and captivating experience for those new to the realtime, "instant messaging" world. It can also add to the stress of a reporter who desires perfection, yet knows that the trial or live television captioning event takes place so quickly there's literally no time to make corrections - and it's happening in front of a "live audience." The ubiquity of voice-based and realtime consumer products and services, such as Sony's Aibo, Sprint PCS phones, Honda/Acura's VR-enabled systems, instant messaging and video, has already increased younger reporters' comfort level with realtime, so we expect a natural shift to VR usage. Our association is full of technology enthusiasts and they are adopting VR at a very good pace, leading to the creation of new educational programs across the 50 states. A vacuum has just been created by the expected funding of the Telecommunications Act of 1996, which requires 75 percent of all new TV programming to be captioned by 2006, and 100 percent by 2008. Captioners and court reporters do exactly the same thing, which has led to a stampede to fill this new market with realtime-based technologies. We know that people will go where the money is, which has led us to begin certification programs for this new area of reporting.
Please provide your general thoughts on the future of court reporting and the role speech technology will play.
LD Court reporting has been in a state of flux throughout its existence. We fully understand that multimodal biometrics will define the new human-to-machine interface. In fact, we are living examples of the commercial application of this evolution in today's (not "some time in the near future," but today's) marketplace. We live in a "realtime" world where instantaneous translation, e-transcripts, streaming text and video, and instant messaging technology are in constant demand, and where untold numbers of new applications are now forming in the minds of our students. The emergence of multimodal communications confirms that human interaction is carried over many distinct channels, or wavelengths. Life-and-death situations, contentions over millions of dollars, interpersonal disputes which spill over into litigation are matters which were born of multiple-wavelength, human-to-human interaction, and over which humans will try to convince other humans who was right and who was wronged. The English language's complexities notwithstanding, we know it will take many decades to reach Star Trek capabilities. Humans will always be required to determine the meaning of what they try to communicate, and they will always seek another human to mediate and ferret out meaning. While speech recognition may be rapidly nearing levels of accuracy amenable to general consumer acceptance, the legal world demands perfect understanding of communications where real capital is on the line. Any recording system which processes only one aspect of human communication is insufficient to determine the true meaning of what was communicated. Thus, we believe the judiciary will always seek to place a competent human as the responsible guardian of a true and accurate record of human communication.
Now that speech recognition is a reality and high accuracy rates can be achieved, it is rapidly being applied to meet the nationwide shortage of professional court reporters and to train captioners and computer access realtime translation (CART) providers for the hearing-impaired population. It is generally accepted that demand exceeds supply. In this regard, we see applications where an SRE solution may be deployed in lieu of an absent human reporter. But in every case where reporter-less recording is being done, state judiciaries still place a human in the work process to certify the final record, ensuring that some person is held responsible. We see speech technology eliminating the national shortage of court reporters in well under 10 years. We also see it as an enabling force for the rapid expansion of the newly created CART and captioning fields, manned by court reporters or individuals certified to produce these services.
What technology providers are used for court reporting?
LD There are three vendors who have designed computer-aided transcription systems around either ScanSoft's or IBM's SREs for the court reporting profession: AudioScribe, StenoScribe, and Voice-to-Text. Two vendors of stenotype machines have now incorporated speech recognition in their software, Eclipse and ProCat. They are anticipating the needs of non-realtime stenotype reporters who view voice as their avenue to achieving realtime-level incomes. There are also vendors who provide separate direct and Web-based streaming text applications, and the standard cadre of providers who service computer-based occupations.
What can speech vendors do to increase the use of speech recognition among court reporting professionals?
LD Provide speed and duration! Court reporters sometimes dictate at speeds which presently exceed the software's capabilities, and for long periods of time - hours of depositions or hearings without a break (or lunch). Some reporters say that the accuracy deteriorates over time. My experience has been the converse. I find that my computer becomes more and more "compatible" as a long day progresses. But speed is definitely an issue when you're trying to repeat someone's words very, very rapidly with hardly a breath in between. Specializing the dictionary for use by our profession would also be a change I'd recommend. Since we repeat every word verbatim, we don't use abbreviations. ScanSoft, we hope, is working on an option to disable the use of abbreviations and retain the use of contractions and numbers. We note that 64-bit CPUs are just starting to hit the streets, although they're not quite running in full 64-bit mode. The delay is probably Wintel-inspired, as Microsoft's 64-bit consumer operating system is not yet available. However, 64-bit Linux is on the scene, and so we believe speech systems can reap the benefits of this computing power before other custom applications. IBM's Linux port does not get good reviews, probably because they view speech applications as extensions of the consumer electronics space. Speech is tailor-made for a 64-bit environment, and we'd like to see it happen sooner rather than later. We're engaged in Transcript-XML and, with Linux's huge lead in internationalization and the overwhelming world trend, we believe it's not unreasonable to expect serious Linux ports.
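Until engines offer the abbreviation-free option described above, the same effect can be approximated with a post-processing pass that expands any abbreviations the engine emits back to the spoken words, while leaving contractions and numerals untouched. A minimal sketch; the expansion table is a small illustrative assumption, not ScanSoft's or IBM's actual behavior:

```python
# Illustrative expansion table; a production list would be far larger.
EXPANSIONS = {
    "Dr.": "Doctor",
    "St.": "Street",
    "approx.": "approximately",
}

def expand_abbreviations(text: str) -> str:
    """Replace engine-emitted abbreviations with the word as spoken,
    leaving contractions (doesn't, it's) and numerals untouched."""
    for abbr, full in EXPANSIONS.items():
        text = text.replace(abbr, full)
    return text

print(expand_abbreviations("Dr. Evans lives on Oak St. and doesn't drive."))
# → Doctor Evans lives on Oak Street and doesn't drive.
```

A verbatim transcript must show what was said, not a shortened form, which is why the pass expands "Dr." but never touches "doesn't."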