2022 Speech Industry Award Winner: Speechmatics Makes Speech Recognition Truly Automatic
Speechmatics, based in Cambridge, England, develops speech recognition software based on recurrent neural networks and statistical language modeling. The company, which was founded in 2006, recently collected $62 million in funding to advance its global expansion and infrastructure improvement plans.
Ed Stacey, managing partner at IQ Capital, one of the investors in this latest funding round, called Speechmatics “a world-leading company that is revolutionizing how speech technology is being used, continuously driving [artificial intelligence] and machine learning breakthroughs while rolling out a product that simply works in every industry use case in which it’s been tried.”
Fellow investor Robert Whitby-Smith, a partner at AlbionVC, called the company a “category leader in applying deep learning to speech, with category-defining accuracy and understanding across industry use cases and requirements.”
Even before the additional funding flowed into its coffers, the company had been doing cutting-edge work to improve speech recognition. In April, it tackled one of machine learning’s biggest challenges, adding formatting of numbers, dates, addresses, and more to the Autonomous Speech Recognition (ASR) engine it released last October.
Using inverse text normalization, Speechmatics’ ASR software can interpret how numbers, currencies, percentages, addresses, dates, and times should appear in written form. In so doing, Speechmatics made transcripts containing such entities easier to read, requiring less human editing.
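To give a sense of what inverse text normalization involves, here is a minimal rule-based sketch in Python. It is not Speechmatics’ method, which the article does not describe; the function names, the word tables, and the handling of only plain numbers, percentages, and dollar amounts are all assumptions made for illustration. Production systems also have to deal with punctuation, hyphenated number words, ambiguous words such as the pronoun “one,” dates, and addresses, none of which this toy covers.

```python
# Toy inverse text normalization: spoken-form number words -> written form.
# Illustrative only; all names and rules here are hypothetical.

UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = {"hundred": 100, "thousand": 1_000, "million": 1_000_000}
NUMBER_WORDS = set(UNITS) | set(TENS) | set(SCALES)


def words_to_int(words):
    """Convert a run of lowercase number words to an integer."""
    total, current = 0, 0
    for word in words:
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = max(current, 1) * 100
        else:  # "thousand" or "million" closes the current group
            total += max(current, 1) * SCALES[word]
            current = 0
    return total + current


def inverse_normalize(text):
    """Rewrite runs of number words as digits, with % and $ formatting."""
    tokens = text.split()
    out = []
    i = 0
    while i < len(tokens):
        # Find the end of a run of consecutive number words.
        j = i
        while j < len(tokens) and tokens[j].lower() in NUMBER_WORDS:
            j += 1
        if j == i:  # not a number word; copy the token through unchanged
            out.append(tokens[i])
            i += 1
            continue
        value = words_to_int([t.lower() for t in tokens[i:j]])
        following = tokens[j].lower() if j < len(tokens) else ""
        if following == "percent":
            out.append(f"{value}%")
            j += 1  # consume the unit word
        elif following in ("dollar", "dollars"):
            out.append(f"${value:,}")
            j += 1
        else:
            out.append(str(value))
        i = j
    return " ".join(out)


print(inverse_normalize("revenue grew twenty five percent to sixty two million dollars"))
```

Even in this small form, the sketch shows why the task is hard to get right: the written form of a number depends on the words around it, so the formatter has to look past the number itself before deciding how to render it.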
In the same way, Speechmatics is working on language packs for specific industries so its speech-to-text engine can recognize terminology, including abbreviations, acronyms, and jargon, unique to those industries. The first domain-specific language pack was released in July for financial services. Speechmatics is reportedly working on similar products for other industries, like medicine and law.
The company is also working on other languages and dialects and recently added support for French Canadian and Brazilian Portuguese in its global speech recognition language packs.
And on Aug. 31, the company launched Language Identification, allowing users to automatically determine the predominant language in any media file. Language ID removes the manual step of selecting which language pack should be used when the language is not explicitly stated on the file and adds useful metadata about the language. It works with up to 12 languages (English, German, Spanish, French, Hindi, Italian, Japanese, Korean, Mandarin Chinese, Dutch, Portuguese, and Russian) and adds a confidence score to show the certainty of the predominant language.
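As a rough illustration of what “predominant language” and “confidence score” could mean in practice, the Python sketch below takes per-segment language guesses and reports the language that covers the most audio, along with its share of the total duration. This is an assumption about one plausible approach, not a description of how Speechmatics’ Language ID works; the function name, the input format, and the use of duration share as a confidence value are all hypothetical.

```python
from collections import defaultdict


def predominant_language(segments):
    """Return (language_code, confidence) for a list of segments.

    `segments` is a list of (language_code, duration_seconds) pairs,
    imagined here as the output of some per-segment classifier.
    Confidence is simply the winning language's share of total duration.
    """
    totals = defaultdict(float)
    for language, duration in segments:
        totals[language] += duration

    overall = sum(totals.values())
    if overall <= 0:
        raise ValueError("segments must contain some audio duration")

    best = max(totals, key=totals.get)
    return best, totals[best] / overall


# Hypothetical media file: mostly English with a short French passage.
language, confidence = predominant_language(
    [("en", 30.0), ("fr", 10.0), ("en", 20.0)]
)
print(language, round(confidence, 2))
```

A caller could then pick the matching language pack automatically when the confidence is high and fall back to manual selection when it is low, which is the manual step the feature is meant to remove.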
Further improvements to Speechmatics’ technology have come by way of partnerships. The first partnership involved Personal.ai and resulted in technology that allows users to capture conversations, spoken notes, reminders, meetings, and more, regardless of accent or dialect. Personal.ai captures and transcribes the voice of its creators through Sync Speech. Wrapped in a messaging interface, this personal AI, secured by blockchain technology, can recall any piece of spoken information. Speechmatics’ ASR is expected to improve its accuracy.
The second partnership involved ENCO, a provider of automated broadcast and audio-visual production workflow solutions, which is incorporating Speechmatics’ ASR technology into its enCaption5 automated captioning and transcription solution.
“Speechmatics’ core technology is without a doubt the most accurate of the AI speech-to-text engines available,” said Ken Frommert, president of ENCO, in a statement.