Crossing Species Boundaries
Language, defined as the systematic use of vocalization to express meaning, is one way we humans have traditionally distinguished our species from “lesser” creatures. It has been a source of species-linked pride. Unfortunately for us, pride goeth before a fall: research in bioacoustics is demonstrating that we are not unique in our ability to use vocalizations to communicate. We just haven’t been able to understand what other creatures have been trying to tell us all along.
Effective bioacoustic analysis, especially on a large scale, requires multiple continuous recordings of animals producing a range of vocalizations. This is difficult and time-consuming, unless it is automated. Developing an automated bioacoustic system is not trivial, either: There is a virtual absence of bioacoustic corpora; microphones and acoustic analyzers designed for humans might not provide proper coverage; signal-to-noise ratios are poor or highly variable; the communication medium might be different (e.g., water); equipment must be unobtrusive, durable, and weather resistant; unlike humans, animals vocalize only when they feel like it; and so far, nobody understands the language of animals.
Bioacoustic researchers have been working to overcome these challenges. They are using animal vocalizations and noise production to differentiate the creatures in one or more locations, quantify population sizes, identify and track individual animals, and interpret animal vocalizations. The Dr. Dolittle project (speechlab.eece.mu.edu/dolittle) developed a hidden Markov model (HMM) algorithm for African elephant vocalizations that performs small-set speaker identification with 82.5 percent accuracy and distinguishes among five call types (croaks, rumbles, revs, snorts, and trumpets) with 94.3 percent accuracy. Researchers are now working on assigning meaning to the vocalizations by correlating them with accompanying behavior.
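The classification step in systems like this can be pictured as training one HMM per call type and labeling a new vocalization with whichever model scores it highest. A minimal sketch of that idea follows; the two-state models, the three quantized acoustic symbols, and the toy sequence are all invented for illustration and are not parameters from the Dr. Dolittle project.

```python
import math

def forward_log_likelihood(obs, pi, A, B):
    """Log-likelihood of an observation sequence under a discrete HMM,
    computed with the scaled forward algorithm."""
    n_states = len(pi)
    alpha = [pi[s] * B[s][obs[0]] for s in range(n_states)]  # t = 1
    log_lik = 0.0
    for t in range(1, len(obs) + 1):
        scale = sum(alpha)          # scaling factor c_t
        log_lik += math.log(scale)  # accumulate log P(obs)
        alpha = [a / scale for a in alpha]
        if t == len(obs):
            break
        # propagate one step: (alpha @ A) weighted by emission of next symbol
        alpha = [
            sum(alpha[s] * A[s][s2] for s in range(n_states)) * B[s2][obs[t]]
            for s2 in range(n_states)
        ]
    return log_lik

def classify(obs, models):
    """Label obs with the call type whose HMM assigns the highest likelihood."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))

# Hypothetical two-state models over three quantized acoustic symbols:
# symbol 0 ~ low-band energy, symbol 2 ~ high-band energy.
# Each model is (initial probs pi, transition matrix A, emission matrix B).
models = {
    "rumble": ([1.0, 0.0],
               [[0.9, 0.1], [0.1, 0.9]],
               [[0.80, 0.15, 0.05], [0.60, 0.30, 0.10]]),
    "trumpet": ([1.0, 0.0],
                [[0.9, 0.1], [0.1, 0.9]],
                [[0.05, 0.15, 0.80], [0.10, 0.30, 0.60]]),
}

call = [2, 2, 1, 2, 2]          # frames dominated by high-band energy
label = classify(call, models)  # → "trumpet"
```

In a real system each class model would be trained on hundreds of labeled calls, and the observation symbols would come from quantized acoustic features rather than being hand-picked.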
A few years ago, I wrote about a fun product called Bow Lingual, which purported to translate dog barks into English. A consortium of researchers from Germany, Japan, and the Netherlands is developing a far more serious speaker identification, vocalization recognition, and speech-to-text translation system for dairy cows. The goal is to develop an inexpensive “call recognizer” capable of monitoring the health and needs of each animal, especially on farms with large animal populations.
The initial challenge was to develop a corpus of utterances and their meanings. This required bringing animals to well-defined physical conditions and then recording their utterances. The five conditions captured, to date, are hunger, giving birth, claw/foot trimming, heat, and delayed milking.
Analysis of the utterances revealed that the anatomical similarities between dairy cows and humans were more significant than the differences. For example, the frequency spectrum of the cows’ vocalizations fell below 5,000 hertz, within the range of human speech. These similarities made it possible to use standard recording equipment, a source-filter model developed for human speech, and HMM representations of the vocalizations.
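The point about spectral range can be checked directly: if a call’s energy peaks well below 5,000 hertz, the sampling rates used by standard speech-recording equipment capture it faithfully. A minimal sketch, with a synthetic 300-hertz tone standing in for a recorded call (the signal and rates here are assumptions for illustration, not project data):

```python
import math

def dominant_frequency(signal, sample_rate):
    """Frequency (Hz) of the strongest bin in a naive DFT magnitude spectrum."""
    n = len(signal)
    best_k, best_mag = 0, 0.0
    for k in range(1, n // 2):  # skip DC, stop at Nyquist
        re = sum(signal[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = sum(signal[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        mag = re * re + im * im
        if mag > best_mag:
            best_k, best_mag = k, mag
    return best_k * sample_rate / n

# Synthetic stand-in for a low-pitched call: a 300 Hz tone sampled at 16 kHz,
# a rate typical of standard speech-recording equipment.
sr = 16000
tone = [math.sin(2 * math.pi * 300 * t / sr) for t in range(512)]

f0 = dominant_frequency(tone, sr)
# f0 lands near 300 Hz (within one DFT bin), comfortably below 5,000 Hz.
```

In practice `numpy.fft.rfft` would replace the quadratic-time loop; the pure-Python version above just keeps the example dependency-free.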
Testing was done in a farm environment with a noisy background. The call recognizer had no problem identifying which of four cows produced each of 31 vocalizations. It also performed well in classifying three kinds of utterances (delayed milking, hunger, and giving birth) for one of the cows but stumbled when asked to classify utterances from the others.
This research demonstrates that humans are not the only creatures that can be identified based on their voices, nor are we alone in our ability to use vocalizations in a meaningful way.
This work also reveals that the most significant barrier to advancing bioacoustics is the absence of large corpora containing thousands of samples rather than a few hundred. Creating decent speaker-independent HMMs, for example, requires hundreds (preferably thousands) of samples from hundreds (preferably thousands) of animals.
This hurdle is identical to the one faced by developers of speech recognition technology prior to the mid-1990s. That problem was resolved through extensive funding of corpus development and the establishment of organizations, such as the Linguistic Data Consortium and the European Language Resources Association, that clean, annotate, and share those corpora with researchers around the globe. Unfortunately, it isn’t clear whether a strong impetus to fund corpus development of animal vocalizations exists to move this research forward and support more generalized results.
Judith Markowitz, Ph.D., is president of J. Markowitz Consultants and a leading independent analyst in the speech and voice biometrics field. She can be reached at judith@jmarkowitz.com.