IBM Eyes Speech to Spot Mental Maladies
As part of its “5 in 5,” an annual list of five groundbreaking scientific innovations that could change the way people work, live, and interact during the next five years, IBM Research advanced the theory that speech technology, combined with artificial intelligence, will soon be able to open a window into people’s mental health.
IBM Research predicts that in five years, patterns in speech and writing analyzed by cognitive systems will provide clinicians with tell-tale signs of early-stage mental and neurological conditions.
At IBM, scientists are using transcripts and audio inputs from psychiatric interviews, coupled with machine learning techniques, to find patterns in speech to help clinicians accurately predict and monitor psychosis, schizophrenia, mania, and depression. It takes only about 300 words to help clinicians predict the probability of psychosis in a patient.
Today, analysis of language is a labor-intensive process of manually interviewing and recording multiple lengthy sessions with a patient. There is currently no practical way to quantify or codify these sessions, leaving clinicians with a massive Big Data problem, according to Guillermo Cecchi, research staff member in biometaphorical computing at IBM Research.
A tool that could in near real time analyze and codify a sample of the patient’s speech would drastically shorten the time it takes for doctors and caregivers to predict and diagnose patients, he explained.
To help in that regard, IBM is building an automated speech analysis application that runs on a mobile device. By taking approximately one minute of speech input, the system uses speech-to-text, advanced analytics, machine learning, natural language processing, and computational biology to provide a real-time overview of the patient’s mental health.
IBM has already been working with Columbia University psychiatrists to predict who among a population of at-risk adolescents would develop their first episodes of psychosis within two years. The analysis identified those patients with 100 percent accuracy.
In other research with Pfizer, Cecchi and his team are using only about one minute of speech from Parkinson’s patients to better track, predict, and monitor the disease. The research is seeing nearly 80 percent accuracy.
IBM Research anticipates that in the future, similar techniques could be used to help patients with Parkinson’s, Alzheimer’s, Huntington’s disease, post-traumatic stress disorder (PTSD), and even neurodevelopmental conditions such as autism and ADHD. Cognitive computers can analyze a patient’s speech or written words to look for tell-tale indicators found in language, including meaning, syntax, and intonation. Combining the results of these measurements with those from wearable devices and imaging systems and collecting them in a secure network can paint a more complete picture of the individual for health professionals to better identify, understand, and treat the underlying disease.
Such research is already under way in some other circles.
SRI International and New York University’s Langone Medical Center recently collaborated on a study that showed, with 73 percent accuracy, that speech could be used to provide some tentative indicators of PTSD, which affects more than 30 percent of veterans who have spent time in war zones and about 8 percent of the civilian population. The study also showed speech could be used to diagnose other mental health issues, including depression, suicide risk, and childhood trauma.
According to SRI, prior research suggests speech may be influenced by emotional state and mental health. Depressed people, for example, typically speak with a flat affect or monotone. In this clinical study, SRI focused on the prosodic characteristics of speech, such as speaking rate, pitch, energy or intensity, and pause duration, as well as other acoustic features.
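The prosodic measurements described above can be sketched in a few lines of signal processing. The following is a minimal, illustrative example (not SRI's actual system): it frames an audio signal, measures per-frame energy, estimates the fraction of time spent in pauses, and derives a crude pitch estimate from the autocorrelation of voiced frames. The function name, thresholds, and frame sizes are all assumptions chosen for illustration.

```python
import numpy as np

def prosodic_features(signal, sr, frame_ms=25, hop_ms=10, silence_db=-40.0):
    """Illustrative prosody sketch: per-frame RMS energy, pause ratio
    (fraction of frames below a silence threshold), and a crude
    autocorrelation-based pitch estimate over voiced frames."""
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    n_frames = 1 + (len(signal) - frame) // hop
    energies, pitches = [], []
    for i in range(n_frames):
        x = signal[i * hop : i * hop + frame]
        rms = np.sqrt(np.mean(x ** 2))
        energies.append(rms)
        if rms > 0:
            # Autocorrelation pitch: search lags covering roughly 60-400 Hz.
            ac = np.correlate(x, x, mode="full")[frame - 1:]
            lo, hi = int(sr / 400), int(sr / 60)
            if hi < len(ac):
                lag = lo + int(np.argmax(ac[lo:hi]))
                pitches.append(sr / lag)
    energies = np.array(energies)
    ref = energies.max() if energies.max() > 0 else 1.0
    db = 20 * np.log10(np.maximum(energies / ref, 1e-10))
    return {
        "mean_energy": float(energies.mean()),
        "pause_ratio": float(np.mean(db < silence_db)),
        "mean_pitch_hz": float(np.mean(pitches)) if pitches else 0.0,
    }

# Synthetic one-second "utterance": a 120 Hz tone followed by silence.
sr = 16000
t = np.arange(sr // 2) / sr
clip = np.concatenate([np.sin(2 * np.pi * 120 * t), np.zeros(sr // 2)])
feats = prosodic_features(clip, sr)
```

On the synthetic clip, the pitch estimate lands near the 120 Hz tone and the pause ratio near one half, matching the half-silent input. A clinical system would, of course, add far more robust voicing detection and feature sets.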
Bruce Knoth, senior software engineer at SRI, points out that assessing PTSD through speech has several advantages. Speech, he explains, is natural, noninvasive, inexpensive, and can be obtained via phone for remote analysis. It can also be used for triage purposes or to monitor treatment progress.
In a separate effort, Waveform Communication and Methodist Sports Medicine have been compiling a large database of impaired speech and acoustically processing it to help diagnose concussions.
The two firms have collected speech from 45 people diagnosed with concussions. Each subject was recorded at the initial visit, when the concussion was clinically diagnosed, and at each subsequent office visit until medically released from care. The Waveform Communication-Methodist Sports Medicine database contains more than 3,800 WAV files. Each recording was acoustically analyzed every six milliseconds, providing a data set consisting of thousands of rows of acoustic data.
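Analyzing a recording "every six milliseconds" amounts to stepping through the audio in short hops and emitting one row of measurements per step, which is how a single recording yields thousands of rows. Here is a minimal sketch of that framing step, assuming simple stand-in features (RMS energy and zero-crossing rate); the actual Waveform analysis is not public, so the feature choices here are purely illustrative.

```python
import numpy as np

def acoustic_rows(signal, sr, hop_ms=6):
    """Step through a recording in consecutive 6 ms hops, emitting one row
    of acoustic measurements (time, RMS energy, zero-crossing rate) per hop."""
    hop = int(sr * hop_ms / 1000)
    rows = []
    for start in range(0, len(signal) - hop + 1, hop):
        x = signal[start:start + hop]
        rms = float(np.sqrt(np.mean(x ** 2)))
        # Zero-crossing rate: fraction of adjacent samples that change sign.
        zcr = float(np.mean(np.abs(np.diff(np.sign(x))) > 0))
        rows.append((start / sr, rms, zcr))
    return np.array(rows)  # columns: time_s, rms, zcr

# A two-second 200 Hz tone at 16 kHz yields a few hundred 6 ms rows.
sr = 16000
tone = np.sin(2 * np.pi * 200 * np.arange(2 * sr) / sr)
rows = acoustic_rows(tone, sr)
```

At a real-world sampling rate and session length, the same loop over a multi-minute interview produces tens of thousands of rows, consistent with the scale of the database described above.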
The Waveform Model of Vowel Perception and Production has been developed into an automatic speech recognition algorithm achieving greater than 99 percent accuracy. Waveform has used that data to develop a concussion identification tool named Cobweb.
Additional reporting by Phillip Britt