Speech Technology Magazine

 

Growing the Pool of Speech Developers

College students combine creativity with hands-on learning.
By Marie Meteer - Posted Feb 2, 2015

The explosion of mobile voice applications leaves our industry with a need for speech programmers. Universities need to go beyond the standard computer science courses to give students the skills to navigate the tools, understand the architectural and design options, and address the challenges of natural language processing (NLP) and dialogue. At Brandeis, we're doing this with an "experience is the best teacher" approach.

YoWakeUp!, FridgeBay, and Jeeves are just three of the mobile voice applications developed by students in Brandeis' intensive summer "experiential learning" program. Seventeen students, mostly undergraduates, mastered the material, formed teams, and embraced agile programming methodology to create minimum viable products (MVPs).

Brandeis students aren't new to building speech apps. WorkoutLog won the 2014 AVIOS Student Contest, and past awards have gone to others in the Computational Linguistics master's program. However, this summer was the first time we attempted to apply experiential learning in this field for undergraduates.

The first five weeks were "boot camp": classes four days a week and a lab on the fifth. The Application Development course included JavaScript, AngularJS, Node.js, JSON, CSS, MongoDB, and PhoneGap. Spoken Dialog Design covered speech recognition, synthesis, dictionaries, grammars, statistical language models, NLP, discourse structure, user interface design, and error recovery.

In the second five weeks, students formed "start-ups," with weekly agile programming cycles and daily stand-ups. Three professors, myself included, helped keep students on the right path. One student summed up the effects of the program best when she said it taught her how to learn on her own.

The results showed the students' creativity and dedication. After 10 weeks, all five groups had working MVPs that included speech recognition:

  • Jeeves is a voice-powered personal assistant that lets users control the delivery of their morning information flow, from news and weather to their latest email.
  • RAMA is an interactive tour guide to the Brandeis Rose Art Museum that enables users to ask questions about individual artists and pieces.
  • FridgeBay provides a platform for students to sell their dorm refrigerators or buy textbooks while staying within the campus community for ease of transactions and safety.
  • B-improved aims to improve the campus by letting students file a problem report via voice and photos, which can be reviewed by Facilities.
  • YoWakeUp! marries social media with a user's alarm clock by texting or calling a designated person to wake them up. And no faking: the user has to solve a math problem to turn off the alarm.

To introduce students to the industry, we took field trips to nearby Nuance Communications and BBN Technologies. Students also heard from CEOs and CTOs of start-ups Talker, Viobi, and YouVisit. A high point was Siri founder and Brandeis alum Adam Cheyer's presentation on the development of Siri, from the SRI research lab to students' iPhones.

The course gave these students a sense of the scope of what they need to know to work in the mobile voice industry. From a pedagogical view, however, it was clear that getting speech working on a phone is easier than getting it to do what you want. As the industry transitions from a grammar-based VXML approach, where the "semantics" is embedded in the grammar, to statistical language models that just produce the words, there is a big gap that we blithely label NLP. While the students could run recognition in the cloud with AT&T's Mashup and on their phones with CMU's Pocketsphinx, then score those recognizers' output with NIST's sclite, there were no easy packages for NLP, and even less is available for dialogue management. Making these underlying technologies accessible to undergraduates in 10 weeks proved difficult, and what we could cover was limited. Our challenge for summer 2015 is finding accessible paths to these capabilities.
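The gap described above can be made concrete with a toy sketch (the phrases and intents here are hypothetical examples, not taken from the course). In a VXML-style grammar, each rule carries its own interpretation, so recognition and understanding happen in one step; a statistical language model returns only a word string, and a separate NLP step must recover the intent:

```python
# Toy illustration of the "semantics gap" between grammar-based and
# statistical speech recognition. All phrases and intents are made up.

# Grammar-based (VXML-style): the grammar rule itself embeds the
# semantics, so a matched phrase maps directly to an interpretation.
GRAMMAR = {
    "what's the weather": {"intent": "get_weather"},
    "read my email": {"intent": "read_email"},
}

def interpret_with_grammar(utterance):
    """Exact-match lookup: recognition and understanding in one step.
    Anything outside the grammar is simply not recognized."""
    return GRAMMAR.get(utterance.lower())

# Statistical-LM approach: the recognizer can transcribe almost
# anything, but returns only words; a separate NLP step is needed.
def interpret_transcript(words):
    """Crude keyword-based intent extraction over a raw transcript."""
    tokens = set(words.lower().split())
    if "weather" in tokens:
        return {"intent": "get_weather"}
    if "email" in tokens:
        return {"intent": "read_email"}
    return {"intent": "unknown"}

print(interpret_with_grammar("what's the weather"))
print(interpret_transcript("um what is the weather like today"))
```

The first function fails on any paraphrase outside its grammar; the second handles free-form input but only because the keyword rules stand in for the NLP component that, as noted above, had no easy off-the-shelf package.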

The popularity of Siri and the explosion of mobile applications are both creating a need for people trained in the art and attracting students to the field who need hands-on training. The Brandeis summer program is one place where both needs are met. You might see some of these apps in the AVIOS student contest at this spring's MobileVoice conference. And when they graduate, these students might be coming to find you, with "built a really cool voice app" on their resumes.


Marie Meteer, Ph.D., is a consultant with expertise in speech recognition, natural language processing, unstructured data analysis, and search. She is an adjunct professor in Brandeis University's Computational Linguistics master's program and executive director of the Speech Technology Consortium.
