Facebook Continues Voice Experimentation

Facebook yesterday gave a sneak peek at a new research initiative called Visual Q&A, a voice-enabled system that uses artificial intelligence to let people ask spoken questions about photos on the social media site and receive audible answers.

Yann LeCun, director of Facebook's artificial intelligence research group, demonstrated the app at MIT Technology Review's EmTech conference in Cambridge, Mass.

Visual Q&A pairs image recognition with Facebook's Memory Networks (MemNets) technology, a type of deep learning that combines natural language understanding and short-term memory.

The app, Facebook executives say, will be particularly useful for the blind and visually impaired. "We're able to give people the ability to ask questions about what's in a photo. Think of what this might mean to the hundreds of millions of people around the world who are visually impaired in some way. Instead of being left out of the experience when friends share photos, they'll be able to participate," Facebook chief technology officer Mike Schroepfer wrote in a blog post.

Facebook isn't releasing the Visual Q&A app to the public just yet. "This is still very early in its development, but the promise of this technology is clear," Schroepfer wrote.

LeCun's research group is mostly focused on using deep learning to help machines better understand speech and recognize objects in images. LeCun believes that deep learning will soon allow computers to grasp the many nuances of language and carry on basic conversations with humans.

Facebook is also running a small test of a mobile artificial intelligence assistant called M that can purchase items, arrange for gifts to be delivered, and book restaurant reservations, travel arrangements, appointments, and more.
