Speech Technology Magazine

 

Sensory Adds Deep Learning to TrulyHandsfree Platform

New speech extraction techniques in TrulyHandsfree 4.0 allow spoken commands to cut through real-world noise.
By Leonard Klie - Posted Aug 6, 2015

Sensory today released TrulyHandsfree 4.0, the latest version of its embedded small-footprint voice user interface platform for mobile devices and consumer electronics. TrulyHandsfree 4.0 features deep learning that, according to Sensory's internal testing, offers improved performance and a 60 percent to 80 percent decrease in word error rates. 
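For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance between what was said and what the recognizer returned, divided by the number of words spoken. The article does not say how Sensory ran its internal tests, so the following is only a minimal sketch of the standard calculation; the function names and the sample figures are illustrative assumptions, not Sensory's data. Assuming the reported decrease is relative, a hypothetical baseline of 10 percent falling to 3 percent would be a 70 percent reduction, inside the 60 percent to 80 percent range cited.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only one previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(old_wer: float, new_wer: float) -> float:
    """Fraction by which the error rate fell, relative to the old rate."""
    if old_wer <= 0:
        raise ValueError("old_wer must be positive")
    return (old_wer - new_wer) / old_wer


if __name__ == "__main__":
    # Hypothetical utterance and recognizer output, for illustration only.
    print(word_error_rate("turn on the kitchen lights",
                          "turn off the kitchen light"))
    # Hypothetical before/after error rates, not figures from Sensory.
    print(relative_reduction(0.10, 0.03))
```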

"This is an accuracy-focused release," says Todd Mozer, CEO of Sensory. "We've made a lot of accuracy improvements."

TrulyHandsfree 4.0 offers new phrase-spotting techniques and a neural networking engine that supports deep learning acoustic models, dramatically improving speech recognition accuracy in real-world noise. Sensory uses a unique form of neural network with deep learning to achieve acoustic models that are much smaller than other offerings.

"We had resisted using deep learning in the past because it would have made our solutions bigger," but new advancements in the technology have removed that concern, Mozer says.

TrulyHandsfree 4.0 was redesigned to improve the platform's overall performance and accuracy without affecting its ultra-low power consumption.

The latest neural networks also employ the most recent breakthroughs in speech feature extraction—including advancements in the filterbank and mel-frequency cepstral coefficients—to produce superior accuracy in noisy environments, ensuring that the core part of the user's spoken request can be recognized in the middle of speech or when surrounded by ambient noise.
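Sensory has not published the details of its front end, so the code below is not the company's method. It is a minimal textbook sketch of how a mel filterbank and mel-frequency cepstral coefficients are typically computed for a single audio frame, using only the Python standard library. The function names, the 20-filter bank, the 13 coefficients, and the Hamming window are all assumptions chosen for illustration.

```python
import math


def hz_to_mel(hz: float) -> float:
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel: float) -> float:
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Hamming-window the frame, then take a naive DFT power spectrum."""
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2.0 * math.pi * k * i / n)
                 for i, x in enumerate(windowed))
        im = -sum(x * math.sin(2.0 * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append((re * re + im * im) / n)
    return spectrum


def mel_filterbank(num_filters: int, frame_len: int, sample_rate: int):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist."""
    num_bins = frame_len // 2 + 1
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mel_points = [low + (high - low) * i / (num_filters + 1)
                  for i in range(num_filters + 2)]
    # Filter edges expressed as (fractional) DFT bin positions.
    edges = [mel_to_hz(m) * frame_len / sample_rate for m in mel_points]
    filters = []
    for m in range(1, num_filters + 1):
        left, center, right = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for k in range(num_bins):
            if left < k <= center:
                row.append((k - left) / (center - left))
            elif center < k < right:
                row.append((right - k) / (right - center))
            else:
                row.append(0.0)
        filters.append(row)
    return filters


def mfcc(frame, sample_rate: int, num_filters: int = 20, num_coeffs: int = 13):
    """Log mel filterbank energies followed by a DCT-II."""
    spectrum = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    log_energies = [math.log(sum(w * p for w, p in zip(row, spectrum)) + 1e-10)
                    for row in bank]
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / num_filters)
                for j, e in enumerate(log_energies))
            for c in range(num_coeffs)]


if __name__ == "__main__":
    # Synthetic 1 kHz tone sampled at 8 kHz, standing in for real speech.
    rate = 8000
    tone = [math.sin(2.0 * math.pi * 1000.0 * i / rate) for i in range(256)]
    print(mfcc(tone, rate))
```

The log filterbank energies computed midway through `mfcc` are themselves a common input to neural acoustic models; the final cosine transform simply decorrelates and compresses them.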

To achieve this, Sensory's engineers did a lot of analysis of failure rates related to reverberation, echoes, and background noise, according to Mozer. "We figured out how to overcome them without harming the speech recognition in normal environments," he says.

TrulyHandsfree 4.0's performance enhancements include:

  • the addition of smaller deep learning acoustic models;
  • new algorithms that overcome reverb and harsh acoustic environments;
  • advanced filterbank features that improve front-end speech feature extraction;
  • compatibility with TrulyNatural that enables seamless hand-off from TrulyHandsfree to TrulyNatural processors; and
  • enhanced architectural scalability, which allows for low-power digital signal processor (DSP) implementations with secondary accuracy improvements at the operating system level. 

This fourth iteration of TrulyHandsfree, which made its debut in 2011, represents a substantial improvement, Mozer says. "It's a really big deal for us."

"TrulyHandsfree 4.0 takes performance to a whole new level with an accuracy, footprint, and power consumption that others just can't touch," he says.

More than a billion products using Sensory's TrulyHandsfree have shipped in the past several years from manufacturers such as BlueAnt, Hallmark, Huawei, LG, Mattel, Motorola, Plantronics, Pantech, and Samsung.

TrulyHandsfree 4.0 supports U.S. and U.K. English, French, German, Italian, Japanese, Korean, Mandarin Chinese, Portuguese, Russian, and Spanish. The TrulyHandsfree software development kit is available for the Android, iOS, Linux, QNX, and Windows operating systems. Additionally, TrulyHandsfree is available for DSP and microcontroller unit IP cores from ARM, Cadence, CEVA, NXP CoolFlux, Synopsys, and VeriSilicon, as well as for integrated circuits from Avnera, Cirrus Logic, Conexant, DSPG, Fortemedia, Intel, InvenSense, NXP, Qualcomm, QuickLogic, Realtek, STMicroelectronics, TI, and Yamaha.

And even with version 4.0 just out, Mozer is already excited about what Sensory has on tap for version 5.0. "We have some really neat stuff planned around speaker verification," he says.

Speaker verification is at the heart of Sensory’s TrulySecure solution.
