Facial Analysis Unites With Speech Technology to Gauge Human Emotion

A combination of video streaming, facial analysis, and speech technology is producing results that corporations are likely to clamor for. Upstart companies like Florida-based Satya Media Group are harnessing these three technologies to produce an application no one knew they needed: a streaming media combo, soon to be available as an API for other developers, that analyzes how people react, in major languages, to the quality of customer service, questions on marketing surveys, content of corporate training, conference speeches, and other events that use camera and speech.

Facial analysis software can determine with surprising accuracy the emotions expressed by a human face—or multiple faces—appearing on camera. Facial analysis software from Affectiva, integrated into Satya’s media streaming, measures participants’ happiness, confusion, anger, engagement, and disengagement, among other states.

Satya builds WebRTC videoconference streams that include facial analysis, voice translation, and automatic text storage. When people join a conference stream by turning on their camera, their faces are “mapped” for analysis by Affectiva. The software analyzes each opt-in attendee’s reaction frame by frame, that is, 30 times per second, then saves the reactions to a database. As words are spoken, they are transcribed and saved to a database in groups of 5 to 20 words, rather like subtitles, each with a timestamp. A translation can also be saved.
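To make the subtitle-style storage concrete, here is a minimal Python sketch of how timestamped words might be grouped into chunks of 5 to 20 words. It is an illustration only, not Satya's implementation: the names `SubtitleChunk` and `chunk_words`, the pause threshold, and the rule of breaking on a pause are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SubtitleChunk:
    start: float  # seconds from stream start when the first word was spoken
    end: float    # seconds from stream start when the last word was spoken
    text: str


def _close(words: List[Tuple[float, str]]) -> SubtitleChunk:
    """Turn a run of (timestamp, word) pairs into one chunk."""
    return SubtitleChunk(
        start=words[0][0],
        end=words[-1][0],
        text=" ".join(w for _, w in words),
    )


def chunk_words(
    words: List[Tuple[float, str]],
    min_words: int = 5,    # assumed lower bound, taken from "5 to 20 words"
    max_words: int = 20,   # assumed upper bound
    pause: float = 1.0,    # hypothetical: a silence this long may end a chunk
) -> List[SubtitleChunk]:
    """Group timestamped words into subtitle-like chunks.

    A chunk is closed when it reaches max_words, or when the speaker
    pauses for longer than `pause` seconds and the chunk already holds
    at least min_words.
    """
    chunks: List[SubtitleChunk] = []
    current: List[Tuple[float, str]] = []
    for t, w in words:
        if current:
            gap = t - current[-1][0]
            if len(current) >= max_words or (len(current) >= min_words and gap > pause):
                chunks.append(_close(current))
                current = []
        current.append((t, w))
    if current:
        chunks.append(_close(current))
    return chunks
```

Each resulting chunk carries the timestamp of its first word, which is the piece of information a database row would need in order to be matched later against the per-frame facial readings.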

For businesses, the value is clear: Reactions to a message can be tracked. As customer service, market research, and sales move into a face-to-face video environment, the reactions of the people being addressed can be analyzed together in one place. In corporate training, instructors will know whether trainees understand the message or are confused, and whether they are listening or typing emails.

While the subjects’ faces are being analyzed, the words that elicit their facial reactions are recorded and saved in a database through speech-to-subtitles technology. After an event, managers can export reports that clarify the exact words that generated each shift in emotional reaction. At a click, the manager can view video of the event itself, along with a translation if the event was in another language.

This multilingual reporting (the ability to receive study results in 25 languages) is possible because speech-to-text has reached a level of quality at which words spoken by participants in all major languages can be understood and transcribed reasonably well. The software logs the results into a database, and these subtitles can be coordinated with the facial analysis timestamps.
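One way to picture that coordination is a timestamp join: each facial reading is assigned to the subtitle chunk that was being spoken at that moment. The Python sketch below assumes a simple data layout (chunks as start time plus text, frames as timestamp plus per-emotion scores) and a hypothetical function name, `dominant_emotion_per_chunk`; the actual schema used by Satya or Affectiva is not described in this article.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def dominant_emotion_per_chunk(
    chunks: List[Tuple[float, str]],
    frames: List[Tuple[float, Dict[str, float]]],
) -> List[Tuple[float, str, Optional[str]]]:
    """Match per-frame emotion scores to subtitle chunks by timestamp.

    chunks: (start_seconds, text) pairs, sorted by start time.
    frames: (timestamp_seconds, {emotion: score}) pairs.
    Returns (start_seconds, text, strongest_emotion) for every chunk;
    the emotion is None when no frame fell inside the chunk.
    """
    starts = [start for start, _ in chunks]
    totals = [defaultdict(float) for _ in chunks]
    for t, scores in frames:
        # Index of the last chunk that started at or before this frame.
        i = bisect_right(starts, t) - 1
        if i < 0:
            continue  # frame was captured before anyone spoke
        for emotion, score in scores.items():
            totals[i][emotion] += score
    report = []
    for (start, text), total in zip(chunks, totals):
        top = max(total, key=total.get) if total else None
        report.append((start, text, top))
    return report
```

Summing scores per chunk is only one possible choice; a real report might instead track the change between consecutive chunks to highlight the words at which a reaction shifted.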

“With this enhanced media streaming, companies can host a live event within their own Web site, communicate with buyers, reach a worldwide network of offices and suppliers and merchandisers, and communicate globally with their customer base in multiple languages,” says Mary Lee Weir, CEO of Satya Media Group. “Facial analytics will add new dimensions when they conduct market research, perform customer service, [do] job interviews, and train employees.”

Gabi Zijderveld, Affectiva’s vice president of marketing and product strategy, notes that the company’s facial analysis software has accumulated a rich trove of data. “We’ve analyzed more than 3.8 million faces from 75 countries, analyzed frame by frame for various emotion metrics, and now have more than 40 billion emotion datapoints. A subset of that data is being used to train our algorithms or our classifiers and is being made available to developers as SDKs and APIs.

“This is the next generation of AI that is ‘emotionally aware’: artificial intelligence technology having the ability to sense human emotion,” she continues. “We see human biometrics as being all the things that we express as humans. Not just the cognitive and the spoken word in terms of what we believe we say, but also the subconscious emotional aspects, the understanding of which adds value to the mix. Combining the two—emotions and spoken words—is very powerful.”

Sue Reager is CEO of @International Services, a company specializing in translation and localization for technology and marketing, and president of Translate Your World, developers of software for across-language speech communication. She can be reached at sreager@internationalservices.com.