July 8, 2004
By Judith Markowitz, Principal, J. Markowitz, Consultants
Forward Thinking

Crossing Species Boundaries

Not long ago I wrote a column about a Japanese researcher whose technology was used to develop Bow-Lingual, a product advertised as being able to translate dog barks into English. That product, along with its counterpart for cat meows, could offer pet owners some assistance, but it is primarily intended to provide fun. Unlike Bow-Lingual, most efforts to apply speech-processing algorithms to animal vocalizations are part of serious research. The two most notable application areas are environmental research, including monitoring of animals in the wild, and health management of domestic animals.

Environmental Monitoring

Environmental monitoring of animal migration and population sizes, along with other areas of biological research, relies heavily on bioacoustic analysis. The data come from continuous recordings made by devices distributed throughout a target habitat. Those devices capture a tremendous amount of field data that human experts need to analyze. For example, a recent wildlife survey of a 20-mile segment of Louisiana's Pearl River Wildlife Management Area used 12 recording units that operated for 53 days and produced more than 15,000 hours of data. Even larger studies track whale movements and other animal populations.

Traditionally, analyzing those recordings has required multiple experts to carry out a labor-intensive and error-prone examination of spectrograms. Increasingly, researchers have turned to automated analysis techniques because, according to Christine Erbe of Australia's Institute of Ocean Sciences, "Only acoustics has the potential for long-term automatic and objective censusing without recurring and ongoing costs of personnel and ship or plane time."¹

In the early 1990s, researchers in bioacoustics began to examine whether speech-processing technologies could automate the analysis of field data. They focused notably on hidden Markov models (HMMs), dynamic time warping (DTW) and neural networks.
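To make one of those techniques concrete, here is a minimal Python sketch of dynamic time warping used as a nearest-template classifier. DTW stretches and compresses the time axis so that a call delivered faster or slower than a stored example can still be matched frame by frame. The sketch assumes that each call has already been reduced to a sequence of feature frames, such as one pitch value per frame. The function names, the labels "rising" and "falling," and the toy contours are hypothetical and do not come from any study cited in this column.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical representation: a call is a sequence of feature frames,
# each frame a vector of floats (real systems usually use spectral features).
Frames = Sequence[Sequence[float]]


def dtw_distance(a: Frames, b: Frames) -> float:
    """Cost of the cheapest monotonic alignment between two frame sequences."""
    n, m = len(a), len(b)
    # cost[i][j] is the best cumulative cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances while b holds a frame
                cost[i][j - 1],      # b advances while a holds a frame
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


def classify(call: Frames, templates: Dict[str, List[Frames]]) -> str:
    """Return the label whose closest stored template has the lowest DTW cost."""
    return min(
        templates,
        key=lambda label: min(dtw_distance(call, t) for t in templates[label]),
    )


# Toy pitch contours (one value per frame) for two hypothetical call types.
templates = {
    "rising": [[(1.0,), (2.0,), (3.0,), (4.0,)]],
    "falling": [[(4.0,), (3.0,), (2.0,), (1.0,)]],
}
# A slower rendition of the rising call: some frames are held longer.
query = [(1.0,), (1.0,), (2.0,), (2.0,), (3.0,), (4.0,), (4.0,)]

print(dtw_distance(query, templates["rising"][0]))  # 0.0
print(classify(query, templates))  # rising
```

A fixed template like this is simple, but it captures only the examples it stores. HMMs, the other speech technique named above, instead learn a statistical model of each call type from training examples.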
Those researchers reasoned that, like human communication, animal vocalizations contain consistent patterns that can be used to identify species. Furthermore, animal communication is generally simpler than human language, which makes it easier to create dictionaries and word models for those communication systems.

In 1998, Kogan and Margoliash applied both DTW and HMMs to field recordings of two species of birds. They determined that "excellent performance was achieved under some conditions. Nevertheless, further development is needed to improve the automated recognition of short duration bird song sounds, and to improve performance under adverse conditions."² Mellinger and Clark had more consistent success. Working in a noisy Arctic environment, they identified blue whale calls by combining HMMs with acoustic filters.³ In 2003, Clemins and Johnson reported on a study that used HMMs to analyze vocalizations of African elephants. They achieved 83 percent accuracy in classifying vocalizations and 88 percent accuracy in speaker identification. They concluded that "because of their flexibility, speech systems provide an adaptable standard framework that can be applied to other animals."⁴

Other researchers have obtained excellent results by training neural networks to differentiate the vocalization patterns of different species. Grigg et al. used a Kohonen self-organizing neural network to separate the vocalizations of false killer whales (Pseudorca crassidens) from other patterns.⁵ Schwenker et al. employed radial basis function neural networks to analyze field recordings of 35 different species of crickets.⁶ Deecke et al. found that neural networks could differentiate "dialects" in killer whale calls.⁷

Animal Husbandry

Acoustic patterns are also important for monitoring the health of individual farm and herd animals. Experts in animal husbandry and farmers alike support this assertion.
According to researchers at Germany's Federal Agricultural Research Centre (FAL), "Ethologists and farmers are convinced that sounds produced by animals allow them to recognise single individuals and to characterise their state."⁸ That information can also help improve animal welfare. When a herd is small, a farmer can personally monitor those acoustic cues. As the herd grows, automated tools are needed to capture the information. The FAL researchers are working on a project that applies speech recognition and speaker identification technology to provide acoustic information about the health of individual animals. Such techniques can make animal husbandry more efficient. To date, the results of their research "demonstrate that the hidden Markov models are, in general, able to distinguish different animal conditions based on their utterances."⁹

Outlook

The work I've just described is very encouraging, but research on applying speech and speaker recognition to animal vocalizations is still in its infancy. Field data, whether drawn from the wild or from farms, are noisy. They contain overlapping voices that may need to be separated, and the HMMs and neural networks must be carefully trained. All of that parallels the challenges facing speech and speaker recognition for humans. A number of the researchers also admit that work on animal vocalizations faces one additional challenge: "Nobody understands the language of animals so far."¹⁰

References

1. Erbe, Christine. "Census of Marine Mammals." 2000. http://www.coml.org/scor/2000/erbe/erbe.html
2. Kogan, Joseph A. and Daniel Margoliash. "Automated recognition of bird song elements from continuous recordings using dynamic time warping and hidden Markov models: A comparative study." Journal of the Acoustical Society of America, Vol. 103, No. 4, April 1998, p. 2185.
3. Mellinger, David K. and Christopher W. Clark. "Bioacoustic transient detection by image convolution." Paper presented at the 125th meeting of the Acoustical Society of America, May 1993.
4. Clemins, Patrick J. and Michael T. Johnson. "Application of speech recognition to African elephant (Loxodonta africana) vocalizations." International Conference on Acoustics, Speech, and Signal Processing, 2003, Vol. I, p. 487.
5. Grigg, G., A. Taylor, H. McCallum, and G. Watson. "Monitoring frog communities: an application of machine learning." Proceedings of the Eighth Innovative Applications of Artificial Intelligence Conference, Portland, OR, 1996, pp. 1564–1569.
6. Schwenker, Friedhelm, Christian Dietrich, Hans A. Kestler, Klaus Riede, and Günther Palm. "Radial basis function neural networks and temporal fusion for the classification of bioacoustic time series." Neurocomputing, Vol. 51, 2003, pp. 265–275.
7. Deecke, V.B., J.K.B. Ford, and P. Spong. "Quantifying complex patterns of bioacoustic variation: Use of a neural network to compare killer whale (Orcinus orca) dialects." Journal of the Acoustical Society of America, Vol. 105, No. 4, 1999, pp. 2499–2507.
8. Ikeda, Y., G. Jahns, W. Kowalczyk, and K. Walter. "Acoustic analysis to recognize individuals and animal conditions." Proceedings of the XIV Memorial CIGR World Congress, November–December 2000, pp. 6 and 8206.
9. Ibid., p. 6.
10. Ibid., p. 8206.

Dr. Judith Markowitz is the associate editor of Speech Technology Magazine and a leading independent analyst in the speech technology and voice biometric fields. She can be reached at (773) 769-9243 or jmarkowitz@pobox.com.
