Voice Privacy and Security Need Greater Attention
Most of the time when we talk about standards, especially technical standards, we think of the goal as making disparate systems interoperable. But another, more important role of standards is safety. Building codes, traffic rules, and manufacturing standards keep us safe by ensuring that the physical objects that we use don’t harm us or make it possible for other people to harm us.
Standards for safety in our digital lives are vital, too. The most common concerns are the security and privacy of digital data, such as keeping financial data secure or ensuring our online activities are not available to anyone without our permission. Unlike technical standards, security and privacy standards are frequently enforced by legislation, such as the General Data Protection Regulation (GDPR) in Europe, the California Consumer Privacy Act (CCPA), and the Health Insurance Portability and Accountability Act of 1996 (HIPAA), which focuses on protecting health information.
General privacy laws apply to all data, including voice and natural language data. Yet, possibly because voice interaction is less common than graphical interaction, voice data is sometimes overlooked in digital privacy discussions. For example, the World Wide Web Consortium has recently published “Privacy Principles for the Web,” and while it’s a thoughtful document, it fails to mention voice data.
The many differences between traditional graphical applications and voice applications make it important to consider the special case of voice data. Tools like smart speakers and other voice applications bring incredible convenience to our lives, but by their nature, they involve recording our voices. In addition, hands-free, always-on voice access means the device is always listening, even when you’re not talking to it. And the prevalence of cloud-based speech recognition means that in most cases, speech goes to cloud servers for processing. Local processing of wake words doesn’t necessarily mean that speech isn’t going to the cloud; sometimes the device needs cloud-based processing to figure out whether it has in fact heard the wake word.
In contrast to text data, browsing history, or graphical user interface (GUI) data collected when you interact with a website, your voice is uniquely associated with you. Consent is another open question: How is consent to the recording and storage of voice data obtained in voice-only applications? Reading lengthy privacy policies to users of voice-only applications isn’t practical.
In addition, speech is invisible and transient and in many cases leaves no record of when it was used. Your voice also conveys information about you that is for the most part unrelated to your goals. While your goal in talking to a smart speaker might be to turn on the lights, your voice also contains information about your gender, age, ethnicity, health, emotional state, and native language. When you talk to other people, they notice these things, too. But speech systems can identify things that people can’t detect, such as illnesses. It might seem useful for your smart speaker to be able to tell you that you’re sick, but it takes little imagination to see the kind of advertising that would follow if a smart speaker knew the state of your health.
Simple recording of people’s voices without their consent has long been covered by legislation. Laws about wiretapping have existed for pretty much as long as information has been sent over wires. But these concerns were codified many years ago, when only raw, unanalyzed recordings were possible. Now we have extremely powerful voice analytics that can transcribe and detect patterns in recordings on a vastly larger scale than human analysis ever could.
Privacy and security standards for voice technology and applications will require a lot of work. Here are four misuse cases that such standards need to address:
- unauthorized analysis of speech data that reveals users’ identities, demographics, health, or emotional state;
- unauthorized voice activation of IoT devices—unlocking doors, opening windows, turning off security lights;
- unauthorized speech capture by third parties while a user is interacting with a speech app; and
- unauthorized “eavesdropping” on conversations between people.
There are a few current efforts aimed at voice security and privacy. The Open Voice Network (OVON), a directed fund of the Linux Foundation, has a strong emphasis on voice privacy and security. OVON has published white papers titled “Ethical Guidelines for Voice Experiences” and “Privacy Principles and Capabilities Unique to Voice.” Another effort toward voice privacy and security is spearheaded by the Stanford Open Virtual Assistant Lab, which has created Genie, the “open, privacy-preserving virtual assistant.”
It’s essential for the speech industry to scrutinize voice use cases, clearly understand the potential for misuse of speech technology, determine how existing privacy and security standards apply, and support these ongoing efforts toward voice privacy standards.
Deborah Dahl, Ph.D., is principal at speech and language consulting firm Conversational Technologies and chair of the World Wide Web Consortium’s Multimodal Interaction Working Group.