2023 Speech Industry Award Winner: Speechmatics Inches Closer Toward a Universal Translator
Speechmatics, a provider of automatic speech recognition software based on recurrent neural networks and statistical language modeling, is on a mission to make its speech-to-text technology usable by 70 percent of the world’s population in the next three years.
The company, which is based in Cambridge, England, added 14 languages to its speech-to-text engine, bringing the total number of supported languages from 34 to 48. It says its technology now covers languages spoken by more than half of the world’s population.
The 14 new languages were Bashkir, Basque, Belarusian, Esperanto, Estonian, Galician, Interlingua, Marathi, Mongolian, Tamil, Thai, Uyghur, Vietnamese, and Welsh.
“At Speechmatics, we are continually developing our speech recognition technology to ensure it can be used by as many people as possible, as easily as possible, and it understands everyone as accurately as possible,” said Ricardo Herreros-Symons, a Speechmatics cofounder and vice president of sales, in a statement.
The additional languages mark the largest expansion in Speechmatics’ history, which dates back to its founding in 2006 as Cantab Research.
“While we’re responding to customer demand by adding highly requested languages to our speech-to-text API, we’re also identifying other popular languages with fewer speakers that have yet to be included. Our aim has always been to understand every voice, and so it’s vital that we also capture languages that may not be as well recognized,” said John Hughes, accuracy team lead at Speechmatics, in a statement at the time.
To that end, the company has developed a special pipeline for releasing new languages rapidly, according to Hughes.
A big part of that new pipeline comes from a partnership that Speechmatics forged this year with NVIDIA to supercharge its machine learning capabilities and to train larger artificial intelligence models. Its speech-to-text API is built using self-supervised machine learning, which requires huge training datasets and large amounts of computing power.
The company broke ground this year by launching real-time speech translation and transcription together in a single application.
Speechmatics’ real-time translation offers 69 language pairs to and from English, covering languages that include German, Spanish, and Vietnamese. The all-in-one API can also translate into multiple languages in one request. For example, a single audio stream can yield a real-time English transcript along with simultaneous translations from English into Japanese, French, Hindi, Mandarin, and Korean.
Speechmatics’ real-time translation provides a sliding scale that lets customers trade speed against accuracy to meet their needs. Because real-time transcription and translation are combined in one API, processes and workflows are streamlined and sped up.
Translating speech within a single API lets users caption live-streamed content and news for viewers around the world. Similarly, contact centers can use automation to scale operations across multiple languages and offer improved customer experiences in callers’ native languages.
“This is a landmark development for speech recognition technology, and we are proud to remain at the forefront of innovation, demonstrating the commitment to our mission to understand every voice,” said Damir Derd, head of sales engineering at Speechmatics, in a statement. “This new offering opens up a truly global market for our customers with almost instant translation from the spoken word. As demand from viewers in different regions increases for TV shows and broadcast, sports, events, podcasts, game streaming, YouTube, and social media videos, the need for captioned videos in multiple languages has too.”
Ken Frommert, president of ENCO, a provider of software for radio and TV broadcasters, says Speechmatics “provides the most accurate speech-to-text on the market for prerecorded files and live streams. Adding real-time translation to its all-in-one API is game changing for live broadcast captions.”
SyncWords, a provider of live captioning for virtual and hybrid events, this year began offering AI-powered live captions and subtitles in 48 languages, backed by Speechmatics’ automatic speech recognition. The technology will be available on more than 40 virtual event platforms for virtual, hybrid, and in-person town halls, classrooms, webinars, commencements, Zoom sessions, Webex meetings, social media streams, podcasts, keynotes, product launches, and other events.
SyncWords, in a statement, called Speechmatics’ live ASR capabilities “best-in-breed.”