The Transcription Prescription
Speech Technology Gives Doctors a Shot in the Arm

The medical profession has always found ways to use new technology to make great strides. In the early 1800s, external physical exams were the only diagnostic tools used to prevent and treat diseases. In the early 1900s, doctors began doing blood tests and internal exams to view organs and tissue. Now, physicians have begun to examine human genetic makeup to determine appropriate treatments. And throughout this whole process, the world's population has grown to the point where hospitals are overcrowded and doctors are seeing scores of patients a day.

Dictation has been part of medical practice for years in its simplest form - a doctor dictates notes to an assistant. With the advent of the PC, desktop dictation soon became popular among general practitioners. Today, hospitals, clinics and private practices are starting to see the need for new dictation technologies, including server-based transcription. So the same doctors who have relied on the pen-and-paper approach for years are starting to use server-based systems to transcribe patient records, insurance letters and other correspondence.

Austin, Texas-based Expresiv Technologies (formerly known as MD Productivity) has 70 employees and more than 3,500 customers, including Yale University Medical Center and the Baylor College of Medicine. Expresiv Technologies' MD One allows physicians to dictate their records or reports over practically any device, including a simple telephone. The voice data is then sent to the server and transcribed electronically, after which it is edited by a professional transcriptionist and sent back to the physician via e-mail. The system is 98% accurate, and transcriptionists are seeing productivity gains of 8 to 10 times, since editing and formatting are the only tasks that remain.

"With MD One, physicians do not have to learn anything new or change their habits. They simply dictate as they have for the past 20 years - our technology takes care of everything else," said Newt Hamlin, chief executive officer of Expresiv Technologies. "By incorporating voice technology, we are able to make more efficient use of a limited resource - transcriptionists - while enabling physicians to work more productively."

But let's take a step back for a minute. The first speech systems ran on large UNIX servers, and their processing requirements were high. They required special digital signal processing (DSP) hardware to assist in the speech recognition process, as most systems in those days simply were not powerful enough to support computationally intensive speech recognition. They also required several hours of "training," and users had to talk in a style known as discrete speech, inserting short pauses between words ("you...had...to...talk...like...this"). Even though it was not the most natural way to speak, for those people who routinely dealt with large volumes of text (in particular, doctors and lawyers), it was a significant breakthrough.

Improved speech recognition technologies and the development of specialized acoustic models to handle multiple input sources (PDA, phone, desktop, etc.) were the biggest breakthroughs in developing this capability for the medical field. Building upon the success of speech-enabled contact centers, the medical community has begun to implement speech recognition systems using simple interfaces such as the telephone. Transcription is simply an extension of that, and server-based transcription has grown in popularity among medical professionals.

Here's how it works. In hospitals, doctors finish a clinical procedure and immediately reach for a phone and dictate a patient report. A server-based system then automatically transcribes the doctor's words. A medical transcriptionist still must listen to the doctor's voice to correct any errors, but this takes less time than typing the average medical report from scratch. This is an important development since skilled transcription resources (specifically in the medical field) are expensive and hard to find.
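The workflow described above can be sketched as a report moving through a few stages. This is a minimal illustration, not any vendor's implementation: the names Stage, Report, recognize, draft and edit are all hypothetical, and the recognizer is a stand-in that returns a canned draft.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    DICTATED = "dictated"  # audio captured, no text yet
    DRAFTED = "drafted"    # machine transcript produced by the server
    EDITED = "edited"      # draft corrected by a transcriptionist


@dataclass
class Report:
    audio: bytes
    text: str = ""
    stage: Stage = Stage.DICTATED


def recognize(audio: bytes) -> str:
    """Hypothetical stand-in for the speech recognizer; returns a canned draft."""
    return "patient tolerated the procedure well"


def draft(report: Report) -> Report:
    """Server step: turn the dictated audio into a machine draft."""
    report.text = recognize(report.audio)
    report.stage = Stage.DRAFTED
    return report


def edit(report: Report, corrected_text: str) -> Report:
    """Transcriptionist step: correct the draft instead of typing from scratch."""
    if report.stage is not Stage.DRAFTED:
        raise ValueError("only a machine draft can be edited")
    report.text = corrected_text
    report.stage = Stage.EDITED
    return report


report = edit(draft(Report(audio=b"...")), "Patient tolerated the procedure well.")
```

The point of the sketch is the division of labor: the server produces the draft, and the transcriptionist's step only corrects it.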

Server-based transcription is poised to have a major impact on the medical industry. An automated transcription server can produce transcribed documents at a 20-40% lower cost than a human transcriptionist and can serve more specialized fields such as radiology (an area where speech recognition technology is being used extensively). It also enables transcriptionists, still a vital part of the transcription process, to increase productivity by using their time to edit a higher volume of documents. Transcription companies that service the medical field know that speech technology is the future. Previously, they would send data overseas (particularly to India), where a large transcription market offers lower costs. However, turnaround time was usually 3-4 days since the original voice data was recorded on tape, not computers. With the advent of digital voice recorders and a growing need for automated transcription in the U.S., American companies are quickly adopting this technology.

The medical profession produces a lot of time-consuming paperwork, and reducing the amount of time spent in this area is critical. The largest uses of automated transcription right now are form-fill for insurance companies, office visit records and hospital patient status reports. The primary advantage of server-based transcription is that multiple users can employ multiple methods of inputting data. Doctors can use a digital recorder, call in over the phone, use a microphone with a desktop or laptop computer, etc. Soon, systems will be developed that will allow doctors to create audio files on handheld devices, which would then be fed into a transcription server and the text returned to the handheld over a wireless connection.

Doctors aren't the only medical professionals who could benefit from this technology. For example, emergency medical technicians (EMTs) could dictate an account of a situation right at the scene of an accident - a benefit for organizing patient records and for legal purposes. This would help minimize the errors that arise when EMTs write notes after an incident, as certain occurrences may be forgotten if the documentation is completed later.

Certain inhibitors, such as language barriers, the enrollment process and doctors who are unfamiliar with today's technology, are being addressed as well. Language barriers are less of an issue now that most server-based transcription systems support multiple languages, including French, Italian, German and Spanish, as well as both US and UK English. The enrollment process for speech recognition systems has traditionally been time-intensive. However, as speech recognition technology improves, the need for enrollment is being significantly reduced. Expresiv Technologies doesn't require its doctors to enroll, but instead uses a sample of their voice. This makes it easy for doctors to use the system, even if they have never used a computer before.

The market for speech recognition in the medical industry is in its infancy, yet it is already booming. According to statistics provided by Tern Consulting, the total potential market in automated and manual transcription (medical professionals using dictation) was estimated at $300 million in 2002. The actual software license market today (automated server-based transcription) is $41 million. Between now and 2006, adoption of speech recognition technology in the medical profession is expected to grow 25% per year.
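As a rough illustration of what 25% annual growth implies, the sketch below compounds the $41 million software license figure from 2002 through 2006. Applying the adoption growth rate to license revenue is an assumption made here for illustration, not a projection from Tern Consulting, and the function name project_market is hypothetical.

```python
def project_market(base_millions: float, annual_growth: float, years: int) -> list[float]:
    """Return the projected size for the base year and each following year."""
    return [round(base_millions * (1 + annual_growth) ** n, 2) for n in range(years + 1)]


# Assumption: the 25% adoption growth rate applies directly to the $41M license market.
projection = project_market(41.0, 0.25, 4)

for year, size in zip(range(2002, 2007), projection):
    print(f"{year}: ${size} million")
```

Under that assumption, the license market would reach roughly $100 million by 2006.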

So what does the future hold? The integration of other voice technologies like speaker identification (when there are multiple speakers) promises to be a key application for doctors. Systems will be able to identify when the speaker changes, which occurs most often during conference calls. Another future application is data mining of medical records. For example, let's say one doctor prescribes one medication and another doctor prescribes a different medication. Since voice data is recorded, entered immediately onto a server and converted to text, doctors could use a data mining function to quickly recognize potentially harmful drug interactions and even have the information sent to a handheld device. Another use is addressing insurance requirements. Medical professionals are meticulous about documenting records for insurance purposes, and data mining is a way of retrieving documents that can be hard to find in a system if they are not catalogued properly. Yet another application is secure clinical messaging, which can provide system-generated automatic reminders to clinicians as well as physician-generated notes to colleagues and support staff. It can flag a record as a personal reminder or alert another physician to pay attention to a patient's history.
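The drug interaction scenario above can be sketched as a simple scan over a patient's transcribed notes. This is a minimal sketch under stated assumptions: the drug names are placeholders, the KNOWN_INTERACTIONS table and the find_interactions function are hypothetical, and a real system would rely on a maintained clinical database rather than a hard-coded set.

```python
from itertools import combinations

# Hypothetical interaction table with placeholder drug names.
KNOWN_INTERACTIONS = {frozenset({"drug_a", "drug_b"})}


def find_interactions(notes: list[str], drug_names: set[str]) -> set[frozenset[str]]:
    """Return known interacting pairs among drugs mentioned across all notes."""
    mentioned: set[str] = set()
    for note in notes:
        # Normalize each word so "drug_b," matches "drug_b".
        words = {word.strip(".,;:").lower() for word in note.split()}
        mentioned |= words & drug_names
    return {
        frozenset(pair)
        for pair in combinations(sorted(mentioned), 2)
        if frozenset(pair) in KNOWN_INTERACTIONS
    }


# Two notes dictated by different doctors for the same patient.
notes = [
    "Dr. Smith: patient started on drug_a for two weeks.",
    "Dr. Jones: prescribed drug_b, follow up in a month.",
]
flagged = find_interactions(notes, {"drug_a", "drug_b", "drug_c"})
```

Because both notes sit on the same server as text, the pair is flagged even though neither doctor mentioned the other's prescription.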

As speech recognition technology continues to advance, automated transcription should move further into the mainstream. As more medical professionals begin seeing the benefits of speech technology, they will realize the time and cost savings such services offer. These services can give physicians more time to spend on patient care, while off-loading the cumbersome task of data collection.

Toby Maners is the program director of voice solutions for IBM Pervasive Computing Division. She can be reached at maners@us.ibm.com.
