Captioning and COVID-19 – Who’s Not Getting the Message?
With the COVID-19 pandemic forcing so many people to stay indoors, there is a huge surge in online and streaming media activity. Countless people are working from home or tuning in to get the latest health and government updates. However, not everyone is getting the vital news and information they need. The deaf, hard-of-hearing, elderly, and those dealing with language barriers are currently at even more of a disadvantage than usual.
The dangers associated with COVID-19 compel equal access to information, some of it needed on an emergency basis. That many platforms lack closed captioning, an especially vital bridge to large segments of the public, creates a significant communication gap. According to the National Institutes of Health, nearly 15 percent of Americans (37.5 million people) are deaf or hard-of-hearing and need access to critical content during this time.
In the current long-term crisis, we're all relying on myriad channels for remote work, online medical appointments, critical news and updates, religious service broadcasts, or just stress-relieving entertainment. Data released in early April by business technology review site TrustRadius showed that interest in telemedicine has increased 616 percent since the pandemic began; interest in web conferencing is up 445 percent; and interest in video platforms has increased 327 percent.
These channels should be prepared to support their entire audiences, including deaf and hard-of-hearing individuals. Yet the rapidly expanding volume of content, driven especially by smaller video conferences and one-on-one meetings, cannot all be served by human captioning services. There simply aren't enough human captioners to keep up with the amount of communication that is taking place.
Speech Recognition Technology Offers Solutions
That situation isn't going to change any time soon. Training for human captioners ranges from three months for re-speakers (professionals who repeat spoken content into voice recognition software) up to two years for stenographers. It is simply not possible to ramp up an adequate supply of trained workers to respond to the current volume of need. Consequently, there is an urgent requirement for an alternative that can scale quickly.
Automatic speech recognition (ASR) offers a viable and immediate opportunity. Given the ongoing closing of the divide between human and ASR caption quality, ASR is now the only workable option to offset the shortage we're experiencing.
ASR uses advanced computing technology and machine learning models to convert the human voice into computer text. While it's still an evolving science, continued advances in artificial intelligence algorithms and computer processing technologies now make it possible for ASR to be significantly more powerful and accurate. Because it works by being trained on a specific vocabulary and continually learning based on new inputs, the more an ASR system is in the field working, the more its capabilities improve.
Of course, human captioning and ASR need to work hand in hand to enable the necessary accessibility. But ASR can readily take the lead for audio and video content that is less encumbered by issues like loud background noise, cross-talk, and poor audio quality, all of which can degrade transcription output.
It's important to note, however, that not all ASRs are created equal. Many factors can affect the efficacy of ASR solutions. For one, the scientists developing and continually improving ASR need to be attentive to the many nuances of both speech and its textual representation, including punctuation, capitalization, speaker changes, natural breaks in speaking, and latency.
Another critical factor is the data corpora (customized sets of text) that are used to train the ASR system. For specific domains like weather, sports, religions, technologies, dialects and the like, high-quality and extensive corpora will yield higher quality and more accurate ASR transcriptions.
A third and equally important factor is the method and frequency of testing and refreshing the ASR's performance against current vocabulary. In the context of the current situation, we are all regularly using words that either didn't exist or were rarely used in common discourse even a few months ago. For captioning purposes, the ASR needs to be able to deliver key content that's relevant to what's in conversations today. For example, does a given ASR system know COVID-19, coronavirus, or the name of the latest drug? Does it recognize public leaders' names like Cuomo or Fauci? Can it differentiate between masques and masks?
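One simple way to run this kind of vocabulary check is to feed the ASR audio in which the terms of interest are known to be spoken, then see which of them never show up in the output. The Python sketch below illustrates the idea under stated assumptions: the function name `missing_terms` and the sample transcript are hypothetical, not part of any particular ASR product, and a real test would use many recordings rather than one sentence.

```python
import re

def missing_terms(transcript, terms):
    """Return the terms that never appear as whole words in the transcript.

    Hypothetical helper for illustration: matching is case-insensitive and
    requires a non-word character (or the text boundary) on both sides.
    """
    text = transcript.lower()
    missing = []
    for term in terms:
        # re.escape keeps characters such as the hyphen in "COVID-19" literal.
        pattern = r"(?<!\w)" + re.escape(term.lower()) + r"(?!\w)"
        if not re.search(pattern, text):
            missing.append(term)
    return missing

# Invented ASR output for audio in which all four terms were spoken.
asr_output = "Dr. Fauci said covid-19 cases are rising; wear masques."
print(missing_terms(asr_output, ["COVID-19", "Fauci", "Cuomo", "masks"]))
```

Terms that come back as missing are candidates for the next refresh of the system's vocabulary and training text.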
With so many attributes and variables in play, there are several key indicators integral to measuring an ASR's performance. Look for the timing and formatting of word display, the accuracy of the words transcribed, correct punctuation, and recognition that the speaker has changed.
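For the accuracy of the words transcribed, a widely used yardstick is word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the ASR output into a trusted reference transcript, divided by the number of words in the reference. The article does not name a specific metric, so treat the Python sketch below as one common way to quantify that indicator, not as the method any given vendor uses. The function name and sample sentences are illustrative, and the sketch ignores punctuation, timing, and speaker changes, which would need their own measures.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between the texts, divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words (Levenshtein, row by row).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)

# One wrong word out of four gives a WER of 0.25.
print(word_error_rate("wear masks in public", "wear masques in public"))
```

Lower is better, and a score of 0.0 means the output matched the reference word for word. Because insertions count as errors, WER can exceed 1.0 on very poor output.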
Fortunately, ASR-based tools available right now can work alongside any communication platform, including popular services like Zoom or GoToMeeting, to produce captions immediately. Even if a full captioning capability isn't yet integrated into the platform itself, opening and using these ASR-based tools in a separate browser means that with minimal effort captions can be provided for all sorts of content that is currently neglected.
While no ASR is going to be foolproof, it is the best solution for scaling captioning to where human capacity just can't reach in a situation like the current one. At this unusual moment of fear, uncertainty, and doubt, it's imperative that everyone has equal access to the information that's critical to keeping us all safe. Powerful tools exist to start remediating the shortcomings today. It's time for the companies providing the communication platforms on which we're extra-dependent right now to step up to the challenge and bring underserved audiences fully into the information flow and cultural dynamic.
Juan Mario Agudelo is a captioning strategist and vice president of sales, entertainment and media, at AppTek.
AppTek and Gallaudet University are working together to bring better transcription and captioning to web and videoconferencing applications.