Sensory Releases TrulyNatural Deep Neural Net Speech Recognition
Sensory today released TrulyNatural, a natural language processor and large-vocabulary speech recognition platform that also incorporates deep neural networking for greater speech-recognition accuracy and flexibility.
"Over the years, Sensory has pioneered many speech technology trends, most recently with our always-on, always-listening speech solution TrulyHandsfree, a technology that many in the industry thought was impossible to achieve because of the difficult requirements of both high accuracy and low power consumption. We are confident that we have another groundbreaker on our hands with TrulyNatural," said Todd Mozer, CEO of Sensory, during a press conference.
Mozer called TrulyNatural "the state of the art in deep neural network recognition." It's the result of more than 50 man-years of technology development, he added.
In designing the technology, the real challenge was to make it robust enough while keeping it small enough to run on chips inside the most basic consumer electronics, Mozer said. "TrulyNatural is very flexible, both in size and features," he added. It can run on anything from small devices that need vocabularies of a few hundred words and fit in less than one megabyte of memory to smartphones, cars, and robots that require more natural language interfaces capable of recognizing a million words and phrases.
Sensory's neural networks also employ the most recent breakthroughs in speech feature extraction to produce superior accuracy in real-world, noisy environments. Combining this with a finite state transducer (FST) enables accurate processing of multiple large search domains, according to Mozer.
TrulyNatural, he said, offers an error rate of less than 8 percent.
And because all of the speech recognition and processing takes place in devices on which TrulyNatural has either been installed or embedded, requests are processed faster, with less power, more accuracy, and at a lower cost, according to Mozer. And they can be processed with or without an Internet connection.
TrulyNatural currently supports U.S. and U.K. English, Mandarin Chinese, Korean, Japanese, French, Spanish, and German; Italian, Portuguese, and Russian will follow later this year. The application also supports Android, iOS, Windows, Linux, and other leading platforms.
Sensory's technology has shipped in more than 1 billion consumer electronics devices, according to Mozer. Of those, mobile and wearable devices are a big market for the company, as are the home security and automotive industries.
Sensory also announced today that it has signed Jibo, creators of the world's first social robot for the home, as its first licensee of TrulyNatural. Jibo robots are planned for release in 2016.
Jibo will also use Sensory's TrulyHandsfree technology for voice triggering and speaker identification and verification.
"Sensory has a proven reputation for quality speech recognition," said Roberto Pieraccini, head of advanced conversational technologies at Jibo.
"Working with Jibo as our premier customer for TrulyNatural is particularly exciting because they have some of the world's leading speech technologists on their team. We know TrulyNatural will be in the hands of experts who can fully utilize the many advantages of our breakthrough approach," Mozer said in a statement.
Jibo will also use some speech technologies developed in house. Its text-to-speech capabilities, for example, are proprietary because the company wanted to equip the robot with its own unique voice and personality, according to Pieraccini.
The partnership with Sensory, though, "brings together two great teams working together for a first-of-its-kind product," he added.