Fujitsu Brings Emotion to Text-to-Speech

Imagine a voice assistant that soothes panicked listeners, or, conversely, a voice that rises in intensity when there is danger nearby.

Scientists at Fujitsu Laboratories have developed speech synthesis technology that can accurately convey a tone of voice suited to a particular scenario, rather than the flat, robotic delivery that sounds the same regardless of the circumstance.

The scientists used a machine learning algorithm to extract characteristics of the voice and express them as parameters, but would not give specifics on how this was done.

"The conventional synthesis methods used thus far involved connecting large volumes of prerecorded speech waveforms," the company said in a statement. "To make the synthesis process more flexible, Fujitsu Laboratories used a method to synthesize speech in which multiple characteristics, such as voice quality, intonation, and pauses, are skillfully captured and converted into parameters."

Since this technology is about speech synthesis, automatic speech recognition (ASR) is not used. "However, it is possible to combine this speech synthesis technology with ASR to make a speech dialogue system as an application," stated Kazuhiro Watanabe, research manager at Fujitsu's speech and language technologies lab, in an email.

The Kawasaki, Japan–based company said the technology needs just a small amount of recorded speech to be synthesized and can do so in roughly one-thirtieth of the time previously needed.

Watanabe says scientists can match emotions with situations by "predetermining the vocal level beforehand in sync with the various scenarios."

"[The technology] can vocalize the message in accordance with the given situation," he says. "In addition, by linking the data from sound sensors with predetermined vocal tone levels, it will be able to vocalize a message with an optimum level of volume [for] the particular noise environment."

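Fujitsu did not disclose how this matching is implemented. As a rough illustration only, the idea of predetermined vocal levels tied to scenarios and sound-sensor readings could be sketched as a pair of lookup tables. Every name, scenario label, threshold, and volume value below is a hypothetical assumption made for the sake of the example, not Fujitsu's actual parameters or API.

```python
from dataclasses import dataclass

# Hypothetical scenario-to-tone table (illustrative assumption only).
SCENARIO_TONE = {
    "routine": "calm",
    "warning": "firm",
    "emergency": "urgent",
}

# Hypothetical ambient-noise upper bounds (in dB) paired with an output
# volume between 0.0 and 1.0. Values are invented for illustration.
NOISE_TO_VOLUME = [
    (50.0, 0.4),
    (70.0, 0.6),
    (85.0, 0.8),
]
MAX_VOLUME = 1.0
DEFAULT_TONE = "calm"


@dataclass(frozen=True)
class VoiceSettings:
    tone: str
    volume: float


def choose_voice_settings(scenario: str, ambient_noise_db: float) -> VoiceSettings:
    """Pick a predetermined tone for the scenario and a volume for the noise level."""
    # Unknown scenarios fall back to the default tone.
    tone = SCENARIO_TONE.get(scenario, DEFAULT_TONE)

    # Use the volume of the first noise band the reading falls below;
    # readings at or above the last bound get the maximum volume.
    volume = MAX_VOLUME
    for upper_bound_db, level in NOISE_TO_VOLUME:
        if ambient_noise_db < upper_bound_db:
            volume = level
            break

    return VoiceSettings(tone=tone, volume=volume)
```
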
Some potential applications for this technology are voice-based work support solutions in factories and other work environments, natural disaster–related broadcast solutions, car navigation systems, text-to-speech services for online content, text-to-speech email solutions, and automated-messaging services.

While the technology is not commercially available, Fujitsu spokesperson Rishad Marquardt said the company will continue to improve speech synthesis with the aim of bringing it into application later this year.

In another announcement, Fujitsu said it has teamed up with ImageWare Systems to offer multimodal biometric cloud identity management and authentication services.

"We are seeing a growing need within the mobility, retail, healthcare, financial services, and banking industries to securely identify and authenticate mission-critical operations, data, and transaction information, while also addressing IT demands for utilizing easily deployed, scalable, cloud-based solutions," said David Berry, executive vice president of infrastructure services at Fujitsu America, in a statement.

Fujitsu's cloud-based offering uses ImageWare Systems' GoCloudID, a biometric identity management platform that provides access to biometric enrollment and verification.

The system is hardware- and algorithm-agnostic. It allows plug-and-play connectivity and can be deployed globally as an end-to-end or modular solution in just a few hours. GoCloudID also features a software development kit for integration into existing company programs and applications. In addition, GoCloudID uses ImageWare's GoMobile Interactive, a biometric-enabled mobile platform that offers multifactor authentication.

"With more and more mobile devices being designed with biometric features and worldwide mobile device payments expected to reach $400 billion by 2015, it is critical we work with leading technology companies like Fujitsu to effectively and holistically address legacy identity and access management security challenges facing businesses and consumers in a big-data, smart mobile world," said Jim Miller, CEO of ImageWare Systems, in a statement.
