With ToyTalk, Speech Is Child's Play
What do a chief technology officer and an electrical engineer know about kids' likes and dislikes? If they're Oren Jacob and Martin Reddy, plenty. Veterans of Disney's Pixar Animation Studios, the duo are the founders of ToyTalk, which develops technology and professional tools that run kids' apps.
Jacob worked at Pixar for more than 20 years as the chief technology officer and director of the studio tools group, which worked on films such as Toy Story, A Bug's Life, and Finding Nemo. Reddy was responsible for Pixar's film-making software for movies such as Finding Nemo, The Incredibles, Cars, and Ratatouille.
The two left Pixar in 2011 to start ToyTalk, which shortly thereafter released two interactive, animated conversational character products: an iPad app dubbed The Winston Show and SpeakaZoo.
ToyTalk's most recent release, SpeakaLegend, has created quite a buzz. Launched in September, it is, according to the company, the first talk-and-touch speech recognition app. SpeakaLegend has an average engagement time of 35 to 40 minutes a week.
"SpeakaLegend is about the combination of conversation and touch," Jacob says. "We previously released products that were about conversation only. We wanted to create a play experience where children can both touch and talk to characters."
The company does not have its own automatic speech recognition (ASR) solution; it relies instead on technology from partners. "Everything else, such as the runtime infrastructure in the cloud, natural language processing, the artificial intelligence engine that decides what to say back, and the content authoring tools themselves, is from us," Jacob says.
Jacob says creating the technology was not without its challenges. One issue is interruption. For example, an 8-year-old playing with SpeakaLegend might be talking to a character when someone grabs his hand. That could affect the conversation. "How do you make that [interruption] feel natural and entertaining at the same time and in the right way?" Jacob says.
"The ASR is responsible for giving us the text translation of what was said vocally. But that has its own challenges," Jacob says, since children speak at different pitches, cadences, and frequencies. Context poses a further recognition problem.
"You're not in Google Maps talking to geography. You're not sending text messages. You're not looking for a restaurant recommendation from Yelp," Jacob says. "You're talking to dragons, unicorns, and fairies. The topics that we bring out are broad by design."
Another obstacle is that the spoken language of children is fundamentally different from that of adults. Synonyms, homophones, antonyms, idioms, grammatical constructions, and verb tenses are different in children's dialogue, which adds complexity when designing a natural language interface for children's apps.
"When we started the company, no one offered a children's recognition solution, probably because they didn't have enough data to start building speech language models for kids," Jacob explains. "We came to market a year ago and we have pulled together a large—if not the largest—collection of children's audio recordings. That data allows us to begin to build custom-tuned recognition for children now for the first time."
The technology is also important since young children often can't read or write, making typing and a text-based back-up impossible. "This is a large advance for all party-goers in speech," Jacob says. The technology also has significant educational and search implications, he adds.
Jacob calls ToyTalk's primary technology platform, PullString, "the first authoring environment for creating and crafting two-way conversations with the audience." He sees PullString having uses on the enterprise side, where it could be integrated into third-party products and services to enable collaboration using funny and entertaining characters.
"Many companies that people know and love have reached out to us so that they can use PullString to build their own conversational characters for their own audiences," he says.