Speech Tech University

Behind every voice user interface, every grammar, every text-to-speech engine, and every voice biometrics system are highly skilled, carefully trained, and well-educated human beings who make their livings creating them.

But who are these men and women—these engineers, programmers, designers, developers, and linguists? And more importantly, how did their career paths lead them to the world of speech technology? Forging a career in the speech industry—while often a circuitous and accidental process—begins where one might expect: in the classrooms and halls of some of the best universities and colleges around the world.

And while no defined course of study has been created for individuals looking to work in the speech industry—no one majors in speech technology or attends speech school—some very good programs and professors are out there with a surprisingly wide variety of academic disciplines that can ultimately deliver students a career in speech.

According to Jim Larson, co-chair of the World Wide Web Consortium’s Voice Browser Working Group and an adjunct professor at Portland State University in Oregon, developers of speech applications should be well-versed in three key fields of study: computer science, linguistics, and psychology.

“I think those three things are essential,” he says. “A speech application developer, especially the voice user interface developer for speech applications, needs to understand how people think, how they go about doing things, how they speak, and how they respond.”

However, Larson is quick to point out that someone pursuing a career in speech shouldn’t limit himself only to those three fields. He suggests students take a variety of courses—including classes in communications theory and even drama.

“If you take a drama class, you learn how to deliver and present verbal text in a way that’s exciting and interesting, gets people’s attention, and is meaningful,” Larson says. “A lot of information that we convey when we speak to each other is in how we say what we say. And so I think some background in drama and communications theory is useful for people who develop voice applications.”

Roberto Pieraccini, chief technology officer at SpeechCycle, has a background in electrical engineering. He says most people working in the industry today can be divided into two camps:

VUI designers—people with what he calls a “user-centric design background” and training in general design, user interface design, and cognitive psychology; or
speech scientists—people with advanced degrees in electrical engineering or computer science, and significant experience using algorithms, statistical and probabilistic theory, and machine learning.

Pieraccini, who says he hires people primarily from the second group, notes there is not always a lot of “speech science in speech science.”

“The fact that [these people] work on speech, I believe, is quite inessential. Today’s technology does not require [one] to understand a lot about speech to work on speech because it’s mostly data-driven,” he says. “It is more like machine learning, data processing, algorithms, and computer science.”

Within the guts of a speech recognizer today resides little or no linguistic knowledge, he adds, noting that instead one will find a lot of statistical, machine-learning, and pattern-recognition knowledge.

Juan Gilbert, an associate professor of computer science and software engineering at Auburn University in Alabama, also sees a variety of academic disciplines as pertinent to speech, noting that, unlike the path to a career in psychology, for example, the path to a career in speech is not always clear.

“We don’t actually have a major,” says Gilbert, who directs the Human Centered Computing Lab at Auburn and will soon be starting in a new position at Clemson University in South Carolina.

This lack of a defined path is something to which Antonio Rico-Sulayes—a doctoral candidate in the linguistics department at Washington’s Georgetown University—can attest.

Rico-Sulayes, who won this year’s AVIOS Student Voice Application Contest for a clinical appointment management application he created, found speech late in his academic career. Prior to attending Georgetown, he earned undergraduate degrees in linguistics and literature and graduate degrees in applied linguistics and lexicography.

Since arriving at Georgetown, Rico-Sulayes—who wants to ultimately teach and provide students with the kind of hybrid background they will need for careers in speech—has taken courses in programming, natural language processing, and spoken language interfaces. Studying spoken language interfaces really galvanized his professional interest in speech technologies.

“As a linguist it’s kind of complicated because if you are lacking in the computational side, the beginning is really rough,” he says, noting that he had to devote a year to studying computer science. “It’s a longer path. The problem, I guess, is that no one’s complete. There is no one field or one major that prepares you. It’s a very hybrid area.”

Perhaps due to the hybrid nature of the field, Gilbert views speech education and training on a spectrum, identifying:

people working with hardware or algorithms as having backgrounds in electrical engineering, computer engineering, and signal processing;
people writing software applications with voice as having backgrounds in computer science and information technology; and
people designing voice user interfaces (VUIs) as having the most difficult educational paths.

It’s Not Linguistics

“There is no designation or program for [VUI design],” Gilbert says, adding that most universities offer little in that area. He also maintains that despite what seems to be a common misconception, the field has little to do with linguistics.

“It doesn’t really apply,” Gilbert says. “Linguistics has nothing to do with voice user interface design.”

This is a statement with which Blade Kotelly, a visiting lecturer at MIT and author of The Art and Business of Speech Recognition: Creating the Noble Voice, agrees. Kotelly—who says he takes a different approach than most in the industry—thinks many students come to speech by accident, starting in computer science or linguistics. And this, he says, creates problems for many VUI designers.

“The linguists don’t know anything about design,” he says. “They know about the fact that people have aspirated Ts or nonaspirated Ts. They don’t know anything about designing a system. It’s the worst way of getting in because you’re at an immediate disadvantage.”

A background in human factors (some sort of design where you create things that people have to use) is the most desirable, he adds. “And to be a good designer you have to understand that design is design no matter what it is. The same principles of good design apply to speech as they do to cars as they do to Blu-ray players as they do to sliding glass doors.”

Kotelly, who has a background in human factors and music, argues that VUI designers don’t have to limit themselves academically.

“Good speech designers are always musical,” Kotelly says. “With that kind of background, you realize [that] if you ask a question and people have to respond, that means you have had to hold their attention the whole time. We’ve had to hold their attention by the techniques we [use] in real life, which is varying the way we speak, the pauses and tones, so it sounds like it’s live and not like it’s recorded.”

Rather than imposing a rigid course of study on students, Kotelly—who notes that one of his best designers initially studied biology—identifies four abilities that are key to VUI design: One must be communicative, be analytical, have a business bent, and be able to break down a problem and challenge underlying suppositions.

“That’s where some computer science experience is useful,” he says. “But that could be done from a logic background or from a good philosophical discussion background. That could be someone who was in a seminary studying religion and tried to figure out all the aspects of an argument. It doesn’t have to be computer science, but computer science kind of forces you to do that.”

A Philosophical Debate

Kotelly says VUI designers must think about things differently and ask questions—something he doesn’t see happening today in the speech industry.

“Most people who are doing this kind of work were never trained to challenge the basic suppositions, and you don’t get trained as a computer scientist to challenge basic suppositions,” he says.

Unfortunately, Kotelly thinks VUI designers are going to get worse, and an over-reliance on technology isn’t going to help, either. But, despite this trend, he does point to some designers who are doing good work.

One such person is Joe Cerra, a user experience architect at Vlingo who received an undergraduate degree in computer engineering from Tufts University in Massachusetts and a master’s degree in computer science with a focus on human-computer interaction. Cerra says he has more of an engineering background than the typical VUI designer, which has thus far worked to his advantage.

“I think that’s a really great marriage, because it basically lets you understand technology but use your design background to make it useable by everybody,” says Cerra, who agrees with Kotelly about the importance of a diverse course of study.

Cerra’s studies at Tufts, which included a few classes taught by Kotelly, focused not only on VUI design, but also on design marketing and branding.

“I think Blade’s class was really just an inspiring class,” he says. “It was really just a great learning experience, and that’s where I kind of formulated my beliefs that great designers don’t necessarily have to be purists, and they could have a technology background.”

With so many options and no strictly defined curriculum or course of study—particularly for VUI design—young people entering college or graduate school today might have a difficult time deciding on what or where to study as they pursue a career in speech.

Where to Go

Gilbert says there are only a handful of schools with real VUI design programs. Among them are Auburn, Clemson, Portland State, Tufts, MIT, the University of Texas at Dallas, and the Georgia Institute of Technology.

Larson, who agrees about the lack of true VUI programs, adds the University of Ulster in Northern Ireland to the list.

“The topic is not taught very much in schools,” he says, noting that for the engineering and science side of speech a lot more options exist in terms of schools and programs.

Among the best schools for speech, according to Pieraccini, are New York’s Columbia University and New York University, Rutgers University in New Jersey, Johns Hopkins University in Baltimore, MIT, Carnegie Mellon University in Pittsburgh, and the Oregon Graduate Institute of Science and Technology. He also cites many good programs in Europe and Asia, including a trio of universities in Italy (Torino, Rome, and Trento), as well as England’s University of Cambridge and many schools in France, Japan, and Hong Kong.

And while it might be difficult for prospective students to select the right college or university and the most useful course of study, there is no shortage of speech-related hot topics and challenges being addressed in classrooms around the world.

For scientists and engineers, Larson cites natural language understanding and the depiction of emotions in speech applications as important topics—now and in the future. For developers, he thinks more emphasis needs to be placed on designing user interfaces.

“Too many of our graduates are great computer scientists but know zilch about how user interfaces work or how they should work,” Larson says.

In looking to the future, Pieraccini anticipates an increased emphasis on Web programming, and he urges people to pay attention to infrastructure and implementation.

“Today most of the things that we build run on the Web—Web services, Web programming, cloud computing,” he says. “Building a speech application is about voice interface, about speech science, but it’s also about implementation. And the speech applications are Web applications today.”

Gilbert also sees the future of speech education changing as everyone looks for the next great killer application.

“Everyone’s in search of the killer app,” he says. “We have not seen a killer app for voice user interface yet.”

According to Gilbert, many people thought the IVRs and VUIs that were created to replace touch-tone would be that killer application. But because those early interfaces were often badly designed, Gilbert says they failed to gain significant traction with the public.

“The killer app of the past never came to be,” he says. “People are still searching for the next killer app that will give voice user interface visibility.”

When offering advice to today’s students, Gilbert recommends they diversify their backgrounds and skill sets as much as possible. “The job market is extremely tight, but at the same time companies are trying to save a buck, so by implementing VUIs you can save tremendously,” he says. “Learn design, implementation, and evaluation as much as you can. And then be able to market yourself to do all that stuff in that domain.”

Kotelly offers students a different perspective, stressing that an understanding of core design principles will allow them to then move toward a focus on speech.

“To get into the speech industry, make sure you are armed to get out of the speech industry,” Kotelly says. “That is to say, you should understand design at its core. Because it is a small industry…make sure you’re armed to design anything.”

Kotelly urges students to diversify their backgrounds and pay attention to the world around them and how speech and sound function within that world.

“And for speech, take a music course, understand about sound,” he says. “Play around with the tools people use in recording studios so you know how sound is manipulated. Learn about how people listen to other people, how meaning is communicated. Listen to political speeches, particularly from people you don’t like, and understand how the argument was conveyed, why the audience responded to it, and when the audience responded to it, and did [the speaker] do a good job or not.”

Right for the Job

And while it’s important for students to learn the basics, diversify their backgrounds, and ask questions, Kotelly suggests that employers have a responsibility to hire the right people.

“[Employers] have to look for people who are creative and not think that the words ‘creative’ and ‘design’ mean ‘flighty’ and ‘touchy-feely,’” he says. “If they feel that the words creative and design belong in the same category as flighty and touchy-feely, they shouldn’t be hiring a speech designer. It’s capital-D design, not pretty-pictures design. It’s about looking for someone who understands the business, the technology, and people well enough.”

Larson also takes issue with the hiring process at many companies, noting that too much emphasis is often placed on a knowledge of programming languages and computer systems.

“That’s not the real useful thing because those skills will become obsolete in three or four years to be replaced by new systems, new programming languages, and new techniques,” Larson says. “A basic understanding and the ability to bootstrap into new areas as they become available—that’s what I think employers should be looking for.”

But despite the shortsightedness of employers, the many challenges facing students, and the failings of so many designers, Kotelly thinks it all comes back to education and training.

“I believe you can teach people to become better designers,” he says. “I do believe it’s possible. It is such a hard thing to find people who are in the space of understanding technology and humanity, which is why we have so few great designs in life.”
