Biographical Information
Judith Markowitz
Principal
J. Markowitz, Consultants
judith@jmarkowitz.com
773-769-9243
Dr. Judith A. Markowitz, Ph.D., is recognized internationally as the leading independent analyst in voice-based biometrics (speaker verification and identification) and as one of the leaders in speech processing. For more than 20 years, she has provided technical and strategic consulting to organizations ranging from two-person startups to 100,000-employee corporations. Dr. Markowitz is the president of J. Markowitz, Consultants and technology editor of Speech Technology Magazine. She’s a member of the editorial review board of the International Journal of Speech Technology, co-chair of the VoiceXML Speaker Biometrics Committee, an invited expert to the W3C Voice Browser Working Group’s Speaker Identification and Verification working group, and liaison to the ANSI M1 biometrics committee.
Articles By Judith Markowitz
Lab sessions gave companies the opportunity to showcase their latest products.
Clarifying definitions depends on users' perspectives.
The draft standard is out there; now let's provide the feedback.
This is an iPhone application that gets it right.
Identifying speech patterns in dairy cows challenges researchers.
New sessions at SpeechTEK will offer hands-on training.
Part I: Speaker verification ensures the identities of students.
Do your part to develop, certify, and support industry standards.
Effective communication with end users can stop problems before they arise.
China is fertile ground for speech, but it needs proper cultivation.
Posted 01 May 2008 (May 2008) by Judith Markowitz
SR is seen as a critical tool for enhancing worker safety and productivity and improving the bottom line.
Speaker authentication is still so new that many misunderstandings and myths abound.
Biological and language factors should be considered to improve the performance of speech processing systems.
Posted 01 Jun 2007 (June 2007) by Judith Markowitz
Array microphones provide cleaner audio signals
Over the years, I'd heard about planned deployments of automatic speech recognition (ASR) and/or text-to-speech (TTS) in drive-through facilities of fast-food restaurants.
"Online communities that form around these imaginative activities are some of the most vibrant on the Web. For these players, games are not just entertainment but a vehicle for self-expression," stated Will Wright in the article "Dream Machines" (Wired magazine, April 2006).
According to Judith Markowitz, technology editor, Speech Technology Magazine, "The core theme that wove itself into the fabric of SpeechTEK 2006 was improving customer service."
Posted 15 Aug 2006 by Judith Markowitz
In order for expressive TTS to be effective, the voice, script, and affective tone all must support the intent and match the customer.
Humans are emotional beings. We are highly skilled at crafting voice output - on the fly - that expresses a broad spectrum of emotions and intensities that range from minor irritation and mild amusement to tumultuous outbursts. Managing the expression of emotion in the speech applications that our customers build is also part of our work as speech-technology professionals. Today, we rely on trained actors (voice talent) to craft emotion-laden speech, but some of our
There are many reasons why speech recognition (ASR) and speech synthesis (TTS) are moving strongly into the mainstream. While they are not perfect, the quality of these technologies is impressive.
My recent article on transcription (STM March/April 2005) prompted appeals for more information about voice writing. Here are answers to some of your questions.
Dr. James Flanagan's groundbreaking research in all facets of speech processing and related areas earned him the prestigious IEEE Medal of Honor. This year, he's retiring from Rutgers University. I had an opportunity to interview him about his work. JM: Of all the things you've worked on, what was the most exciting for you?
ASR solutions for the warehouse have been available since the 1980s. Yet, when I attended ProMat, the premier warehousing conference, I found that only two of the 700-plus ProMat exhibitors specialize in automatic speech recognition (ASR) solutions: Vocollect and Voxware.
James Flanagan's idea of the important research frontier is the challenge of creating a multi-modal interface that will enable people in remote locations to collaborate and communicate as naturally as possible. It would be an interface where you can not only exchange information by voice, but where the spoken word is supplemented by gesture, eye movement, facial expression, etc.
Posted 01 Apr 2005 by Judith Markowitz
Human-computer dialogue systems have claimed center stage in the speech industry, bowing happily to the kudos of an accepting marketplace. This is the inverse of the state of affairs that existed in the mid-1990s when PC-based dictation using statistical processing was clearly the industry's star. Shrink-wrapped general dictation products were awarded extensive shelf space in computer stores and the speech industry had visions of a speech revolution built atop boxes of Dragon NaturallySpeaking® (DNS) and ViaVoice.
I have for a long time considered customers amazingly creative in their approaches to speech deployments. Maybe it's because early in my career I was a customer myself. I, therefore, know from personal experience that creativity comes from the need to tackle demanding problems whose solutions are constrained by set requirements. It's a survival technique that I like to call pragmatic creativity.
When we think about how speech recognition is used we generally envision it operating in a human-machine interaction. That's not surprising because that's the context in which speech recognition operates today. This human-machine paradigm is not, however, how speech technology will be used in the future; one of the most exciting areas of automated human-human communication is speech-to-speech machine translation.
Not long ago I wrote a column about a Japanese researcher whose technology was used to develop Bow-Lingual, a product that was advertised as being able to translate dog barks into English. Although that product (along with its correlate for cat meows) could offer pet owners some assistance, it is primarily intended to provide fun.
Dr. Matsumi Suzuki, president of Japan Acoustic Lab and a developer of the science incorporated into Bow-Lingual, was kind enough to grant an interview to Speech Technology Magazine.
In a case study, Dr. Judith Markowitz interviews City Judge William Dupont of Iberville Parish in Plaquemine, La. Judge Dupont uses speaker verification to track juvenile offenders in the community release program.
Driven in part by the 9-11 attack, a growing number of companies are incorporating speaker authentication and, in some cases, speaker identification into their offerings. Leading independent analyst Dr. Judith Markowitz looks at security applications and the increasing awareness of speaker authentication.
There's no doubt that speech recognition is an assistive technology. Most of us are familiar with the use of dictation and voice-controlled desktop navigation tools by people with repetitive stress injuries (RSI). I've also seen a myriad of voice-activated implementations for people with limb paralysis and weakness that have included hospital beds, wheelchairs, environmental control systems and a complete feeding system (it was experimental and hadn't resolved problems related to the administration of liquids). There are also command-and-control systems for people with severe visual impairments, such as a voice-activated photocopier developed at Pitney Bowes.
Ergonomics is not simply good computing practice for individuals; it's also good business.
Posted 01 Aug 2003 by Judith Markowitz
One of the most interesting and informative sessions at SpeechTEK 2002 was the Real Solutions Seminar presented by Dave Peterson, vice president of Technology Acquisition at Hasbro. Peterson described two toys that Hasbro released this year: Aloha Stitch and R2D2. Both toys are based on popular movie characters and both interact with a child using speech recognition.
In November, an international stir followed Al-Jazeera's release of a tape purported to be of Osama bin Laden. Al-Jazeera is the satellite television news network based in the Persian Gulf nation of Qatar.
Although September 11th focused attention on biometrics, people still ask me whether speaker authentication is "for real." I decided that one of the best ways to answer that question was to provide a sample of the variety of ways speaker authentication is being used in real, everyday operations.
Did you ever want to ask: Does my car really need to have its engine rebuilt? Is the check actually in the mail? Can you really sell me that bridge?
This is the second in a series of columns dealing with misunderstanding and misrepresentations of speaker authentication. In the first column (March/April 2002) I discussed the confusion regarding the difference between speech recognition and speaker verification and I decried the practice of marketing slightly-modified speech recognition as speaker authentication.
We live in a global world where it is no longer unusual for even a small business to market its goods and services globally. Wireless networks are expanding the global reach of business by making it possible to provide telecommunications services to areas that could not have wired services.
Biometrics are hot—including voice-based biometrics. For those of us who have been in the industry for a while, it is like the beginning of spring after a long, hard winter.
When you examine the individual words that make up the phrase you are likely to decide that it refers to the ability to hear a voice and say, "That sounds like Susan!" When machines can do this it is called speaker identification or speaker recognition. These phrases focus not on the person but what is being examined and recognized is, indeed, that person's voice.
After a 15-year adolescence, text-to-speech technology is coming of age. Every TTS vendor's goal - a truly natural-sounding, voice-activated computer interface that can read text aloud like a human being - is now within reach of the development community. Industry observers all along have said TTS would have to make a quantum leap before it could achieve anything near the natural-sounding speech necessary for broad market acceptance. Today's synthesizers make that leap possible by using new processing and linguistic models to convert computer text into speech that is nearly indistinguishable from actual recorded human speech. TTS is speaking and the market is finally taking note.
The idea that technologies can be used together to produce more flexible, accurate, powerful and friendly systems is certainly not new. It has been receiving increasing attention in the computing industry as a result of the availability of multimedia devices and fast, powerful processors.
Approximately one year ago, IBM Voice Systems assumed a new identity. It became part of Application & Integration middleware in the Software Solutions division of IBM. This move was a clear indication that IBM viewed speech technologies, in particular speech recognition, as viable for mainstream application.
Not long ago, a speech-recognition consultant sent me a sample packet of yellow salve with accompanying documents that proclaimed the salve to be an antidote for vocal strain. I accepted the packet of salve and the promotional literature as testimonials to the growing commercial success of speech recognition.
Defying reports of its demise, AVIOS held its 18th annual meeting in San Jose, Calif., May 24-26. Rumors of the end of the AVIOS conference were further put to rest by the size and quality of this year's conference. There were 249 paid conference attendees in addition to 150 others who visited the exhibit hall only.
Having just completed an industry report on voice biometrics, I decided to take this opportunity to talk about some things that came out of my research.
Intel Corp. (Santa Clara, CA) assembled more than 250 exhibitors and as many members of the media in San Jose recently to preview the release of its next-generation, Pentium III chip. The new chip, code-named Katmai, is designed specifically to support the transmission of advanced multimedia technology over the Internet. It contains 70 new instructions, new architectures for floating-point and streaming operations, a unique ID code, and an enhanced cache layout that supports faster processing of large quantities of data. All of this runs at 450 to 550MHz.
I never fully understood the term "killer app." It sounds ominous. Does it roam the countryside attacking minor characters in science fiction movies? Does it lurk in dark alleys?
The 1999 Consumer Electronics Show (CES) was held in Las Vegas in January. For the first time in its long history CES offered a session on speech recognition and had a speech pavilion in its exhibition.
Now that my responsibilities with SpeechTEK have ended, I want to take the opportunity to talk about four other conferences I attended in the September/October time frame, and one conference scheduled for January. These conferences only begin to suggest how widespread interest in speech and voice technology has become.
Does it seem that we are immersed in an atmosphere of half-truths, boldfaced lies, deception, and willful manipulation?
Are organizations combining speaker verification with other biometrics? Are they combining it with speech recognition? Where is the industry headed?
Whoa! How many standards does this industry need? Seems like I receive e-mail on a weekly basis about a new standard. -- An application developer responding to the recent announcements of biometric API standards.
I have always been fascinated by the ability of purely descriptive terms to assume strong positive or negative connotations. In the mid-1980s, I was intrigued when the term "isolated" (as in "isolated-word recognition") acquired negative overtones and was replaced by the less pejorative and more oblique (dare I say more discreet?) word "discrete" as the way to describe systems that require users to pause between words.
Privacy is a large and complex concept that encompasses a broad spectrum of freedoms and protections of person, property, and information. When privacy is applied to the use of computing technologies, concerns generally involve the collection, use, and dissemination of personal information.
Biometric-based technologies, such as speaker verification and live-scan fingerprinting, are the only fully automated methods available for verifying that a person is who she or he claims to be. As such, they are powerful tools for security and other operations that require authentication or identification.
One of the most bewildering aspects of learning about an industry or technology is navigating through a jungle of unfamiliar terminology. If you are entering the world of voice ID with existing knowledge about speech recognition, the terminology pitfalls become even more treacherous because seemingly identical labels can have incompatible meanings.
This is the first of a series of columns that Judith Markowitz will be writing on voice ID applications, markets, technology and products.
Staying ahead of the pack in the rapidly emerging speech recognition industry requires up-to-date, reliable information.