Articles by Judith Markowitz
Hands-On: An Interactive Display
Lab sessions gave companies the opportunity to showcase their latest products.
Sharpening a Cloudy Vision
Clarifying definitions depends on users' perspectives.
Getting to an SIV Module for VoiceXML
The draft standard is out there; now let's provide the feedback.
Sakhr's Arabic Language Buddy
This is an iPhone application that gets it right.
Crossing Species Boundaries
Identifying speech patterns in dairy cows challenges researchers.
You Be the Judge
New sessions at SpeechTEK will offer hands-on training.
Speech for Distance Learning
Part II: Speech recognition develops a foreign tongue.
Speech for Distance Learning
Part I: Speaker verification ensures the identities of students.
Make Your Life Easier
Do your part to develop, certify, and support industry standards.
Lines of Communication
Effective communication with end users can stop problems before they arise.
Speech Solutions in China, Part II: Network Solutions
China is fertile ground for speech, but it needs proper cultivation.
Embedded Speech in China
The Chinese market for embedded speech is no longer wishful thinking.
Speech Recognition for the Warehouse Comes of Age
SR is seen as a critical tool for enhancing worker safety and productivity and improving the bottom line.
Speaker Authentication: Exploding Some of the Myths
Speaker authentication is still so new that many misunderstandings and myths abound.
Biological and language factors should be considered to improve the performance of speech processing systems.
Passwords and PINs are not enough as a single layer of defense.
Hold the Pickle II
Array microphones provide cleaner audio signals.
Hold the Pickle
Over the years, I'd heard about planned deployments of automatic speech recognition (ASR) and/or text-to-speech (TTS) in drive-through facilities of fast-food restaurants.
Morphing for Online Games
"Online communities that form around these imaginative activities are some of the most vibrant on the Web. For these players, games are not just entertainment but a vehicle for self expression," stated Will Wright in the article "Dream Machines" (Wired magazine, April 2006).
Review of SpeechTEK 2006
According to Judith Markowitz, technology editor, Speech Technology Magazine, "The core theme that wove itself into the fabric of SpeechTEK 2006 was improving customer service."
You Want a Soda With That?
TTS and Personalities: Expressing True Attitude
In order for expressive TTS to be effective, the voice, script, and affective tone all must support the intent and match the customer.
Voice Ideas - Behavioral vs. Physical
Show Some Emotion
Humans are emotional beings. We are highly skilled at crafting voice output - on the fly - that expresses a broad spectrum of emotions and intensities that range from minor irritation and mild amusement to tumultuous outbursts. Managing the expression of emotion in the speech applications that our customers build is also part of our work as speech-technology professionals. Today, we rely on trained actors (voice talent) to craft emotion-laden speech, but some of our
Gaining Acceptance Through Impressive Results
There are many reasons why speech recognition (ASR) and speech synthesis (TTS) are moving strongly into the mainstream. While they are not perfect, the quality of these technologies is impressive.
Listening to the Court of Appeals
My recent article on transcription (STM March/April 2005) prompted appeals for more information about voice writing. Here are answers to some of your questions.
An Interview with James L. Flanagan
Dr. James Flanagan's groundbreaking research in all facets of speech processing and related areas earned him the prestigious IEEE Medal of Honor. This year, he's retiring from Rutgers University. I had an opportunity to interview him about his work. JM: Of all the things you've worked on, what was the most exciting for you?
Speech in the Warehouse
ASR solutions for the warehouse have been available since the 1980s. Yet, when I attended ProMat, the premier warehousing conference, I found that only two of the 700-plus ProMat exhibitors specialize in automatic speech recognition (ASR) solutions: Vocollect and Voxware.
James L. Flanagan, Recipient of the 2005 IEEE Medal of Honor
James Flanagan's idea of the important research frontier is the challenge of creating a multi-modal interface that will enable people in remote locations to collaborate and communicate as naturally as possible. It would be an interface where you can not only exchange information by voice, but where the spoken word is supplemented by gesture, eye movement, facial expression, etc.
Transcription: What's It Good For Anyway?
Human-computer dialogue systems have claimed center stage in the speech industry, bowing happily to the kudos of an accepting marketplace. This is the inverse of the state of affairs that existed in the mid-1990s when PC-based dictation using statistical processing was clearly the industry's star. Shrink-wrapped general dictation products were awarded extensive shelf space in computer stores and the speech industry had visions of a speech revolution built atop boxes of Dragon NaturallySpeaking® (DNS) and ViaVoice.
I have for a long time considered customers amazingly creative in their approaches to speech deployments. Maybe it's because early in my career I was a customer myself. I, therefore, know from personal experience that creativity comes from the need to tackle demanding problems whose solutions are constrained by set requirements. It's a survival technique that I like to call pragmatic creativity.
Automating the Tower of Babel
When we think about how speech recognition is used we generally envision it operating in a human-machine interaction. That's not surprising because that's the context in which speech recognition operates today. This human-machine paradigm is not, however, how speech technology will be used in the future; one of the most exciting areas of automated human-human communication is speech-to-speech machine translation.
Crossing Species Boundaries
Not long ago I wrote a column about a Japanese researcher whose technology was used to develop Bow-Lingual, a product that was advertised as being able to translate dog barks into English. Although that product (along with its correlate for cat meows) could offer pet owners some assistance, it is primarily intended to provide fun.
Pluses and Minuses
Going to the Dogs
Dr. Matsumi Suzuki, president of Japan Acoustic Lab and a developer of the science incorporated into Bow-Lingual, was kind enough to grant an interview to Speech Technology Magazine.
Voice Biometrics - Are You Who You Say You Are?
Driven in part by the 9-11 attack, a growing number of companies are incorporating speaker authentication and, in some cases, speaker identification into their offerings. Leading independent analyst Dr. Judith Markowitz looks at security applications and the increasing awareness of speaker authentication.
Speaker Verification for Community Release
In a case study, Dr. Judith Markowitz interviews City Judge William Dupont of Iberville Parish in Plaquemine, La. Judge Dupont uses speaker verification to track juvenile offenders in the community release program.
I See What You Are Saying
There's no doubt that speech recognition is an assistive technology. Most of us are familiar with the use of dictation and voice-controlled desktop navigation tools by people with repetitive stress injuries (RSI). I've also seen a myriad of voice-activated implementations for people with limb paralysis and weakness that have included hospital beds, wheelchairs, environmental control systems and a complete feeding system (it was experimental and hadn't resolved problems related to the administration of liquids). There are also command-and-control systems for people with severe visual impairments, such as a voice-activated photocopier developed at Pitney Bowes.
Ergonomics of the Voice
Ergonomics is not simply good computing practice for individuals, it's also good business.
Toys That Have a Voice
One of the most interesting and informative sessions at SpeechTEK 2002 was the Real Solutions Seminar presented by Dave Peterson, vice president of Technology Acquisition at Hasbro. Peterson described two toys that Hasbro released this year: Aloha Stitch and R2D2. Both toys are based on popular movie characters and both interact with a child using speech recognition.
Bin Laden Speaking
In November, an international stir followed Al-Jazeera's release of a tape purported to be of Osama bin Laden. Al-Jazeera is the satellite television news network based in the Persian Gulf nation of Qatar.
The For-Real Story
Although September 11th focused attention on biometrics, people still ask me whether speaker authentication is "for real." I decided that one of the best ways to answer that question was to provide a sample of the variety of ways speaker authentication is being used in real, everyday operations.
To Tell the Truth
Did you ever want to ask: Does my car really need to have its engine rebuilt? Is the check actually in the mail? Can you really sell me that bridge?
This is the second in a series of columns dealing with misunderstanding and misrepresentations of speaker authentication. In the first column (March/April 2002) I discussed the confusion regarding the difference between speech recognition and speaker verification and I decried the practice of marketing slightly-modified speech recognition as speaker authentication.
Speaking in Tongues
We live in a global world where it is no longer unusual for even a small business to market its goods and services globally. Wireless networks are expanding the global reach of business by making it possible to provide telecommunications services to areas that could not have wired services.
Truth in Advertising
Biometrics are hot—including voice-based biometrics. For those of us who have been in the industry for a while, it is like the beginning of spring after a long, hard winter.
The Meaning Should Be Clear When Choosing the Words
When you examine the individual words that make up the phrase you are likely to decide that it refers to the ability to hear a voice and say, "That sounds like Susan!" When machines can do this it is called speaker identification or speaker recognition. These phrases focus on the person, but what is being examined and recognized is, indeed, that person's voice.
Listen Up: How Natural-sounding Speech Engines are Changing the Industry
After a 15-year adolescence, text-to-speech technology is coming of age. Every TTS vendor's goal - a truly natural-sounding, voice-activated computer interface that can read text aloud like a human being - is now within reach of the development community. Industry observers all along have said TTS would have to make a quantum leap before it could achieve anything near the natural-sounding speech necessary for broad market acceptance. Today's synthesizers make that leap possible by using new processing and linguistic models to convert computer text into speech that is nearly indistinguishable from actual recorded human speech. TTS is speaking and the market is finally taking note.
Combining Technologies I
The idea that technologies can be used together to produce more flexible, accurate, powerful and friendly systems is certainly not new. It has been receiving increasing attention in the computing industry as a result of the availability of multimedia devices and fast, powerful processors.
What Is New at IBM Voice Systems?
Approximately one year ago, IBM Voice Systems assumed a new identity. It became part of Application & Integration middleware in the Software Solutions division of IBM. This move was a clear indication that IBM viewed speech technologies, in particular speech recognition, as viable for mainstream application.
Elixirs and Potions
Not long ago, a speech-recognition consultant sent me a sample packet of yellow salve with accompanying documents that proclaimed the salve to be an antidote for vocal strain. I accepted the packet of salve and the promotional literature as testimonials to the growing commercial success of speech recognition.
AVIOS Bounces Back
Defying reports of its demise, AVIOS held its 18th annual meeting in San Jose, Calif., May 24-26. Rumors of the end of the AVIOS conference were further put to rest by the size and quality of this year's conference. There were 249 paid conference attendees in addition to 150 others who visited the exhibit hall only.
The Outlook for Voice Biometrics
Having just completed an industry report on voice biometrics, I decided to take this opportunity to talk about some things that came out of my research.
Intel Previews Pentium III
Intel Corp. (Santa Clara, CA) assembled more than 250 exhibitors and as many members of the media in San Jose recently to preview the release of its next-generation Pentium III chip. The new chip, code-named Katmai, is designed specifically to support the transmission of advanced multimedia technology over the Internet. It contains 70 new instructions, new architectures for floating-point and streaming operations, a unique ID code, and an enhanced cache layout that supports faster processing of large quantities of data. All of this runs at 450 to 550 MHz.
Life After the Killer App.
I never fully understood the term "killer app." It sounds ominous. Does it roam the countryside attacking minor characters in science fiction movies? Does it lurk in dark alleys?
Speech at CES; Test Driving Dragon's Naturally Mobile
The 1999 Consumer Electronics Show (CES) was held in Las Vegas in January. For the first time in its long history CES offered a session on speech recognition and had a speech pavilion in its exhibition.
Who Is Talking About Speech?
Now that my responsibilities with SpeechTEK have ended, I want to take the opportunity to talk about four other conferences I attended in the September/October time frame, and one conference scheduled for January. These conferences only begin to suggest how widespread interest in speech and voice technology has become.
To Tell the Truth
Does it seem that we are immersed in an atmosphere of half-truths, boldfaced lies, deception, and willful manipulation?
Surveying the Territory
Are organizations combining speaker verification with other biometrics? Are they combining it with speech recognition? Where is the industry headed?
A Cornucopia of Standards
Whoa! How many standards does this industry need? Seems like I receive e-mail on a weekly basis about a new standard. -- An application developer responding to the recent announcements of biometric API standards.
It All Depends
I have always been fascinated by the ability of purely descriptive terms to assume strong positive or negative connotations. In the mid-1980s, I was intrigued when the term "isolated" (as in "isolated-word recognition") acquired negative overtones and was replaced by the less pejorative and more oblique (dare I say more discreet?) word "discrete" as the way to describe systems that require users to pause between words.
May We Speak Privately?
Privacy is a large and complex concept that encompasses a broad spectrum of freedoms and protections of person, property, and information. When privacy is applied to the use of computing technologies, concerns generally involve the collection, use, and dissemination of personal information.
Biometric Standards: Why We Need Them
Biometric-based technologies, such as speaker verification and live-scan fingerprinting, are the only fully automated methods available for verifying that a person is who she or he claims to be. As such, they are powerful tools for security and other operations that require authentication or identification.
A Rose by Any Other Name
One of the most bewildering aspects of learning about an industry or technology is navigating through a jungle of unfamiliar terminology. If you are entering the world of voice ID with existing knowledge about speech recognition, the terminology pitfalls become even more treacherous because seemingly identical labels can have incompatible meanings.
Banking on Voice ID
This is the first of a series of columns that Judith Markowitz will be writing on voice ID applications, markets, technology and products.
Industry Reports: Are They Your Best Information Source?
Staying ahead of the pack in the rapidly emerging speech recognition industry requires up-to-date, reliable information.