April 7, 2022
By Deborah Dahl, Principal, Conversational Technologies
Standards

Natural Language Interfaces That Everyone Can Use

Natural language interfaces, in the form of chatbots, interactive voice response (IVR) systems, and intelligent virtual assistants (IVAs), are becoming increasingly popular. In fact, SmartCustomerService.com reports that the conversational AI market is predicted to reach $18.4 billion by 2026.

For most people, interfaces that employ natural language make self-service much simpler than traditional user interfaces. But this isn’t true for everyone. Successful natural language interaction can only happen if both the user and the system can perceive and understand each other’s utterances. That sounds obvious, and it works out just fine for most users, most of the time. But people differ widely in their ability to express themselves and understand what others are saying.

Designers need to take these differences into account. Because of the prevalence of other channels—websites, screen readers, human agents—it’s easy to think that every user can get the information they need through one channel or another, and so designers don’t have to worry too much about ensuring that natural language interfaces are accessible. But there’s no guarantee the same information is available through all these channels, and some important alternative channels, like human agents, may not be available all the time.

The Accessible Platform Architectures Working Group of the World Wide Web Consortium has published important standards for improving web page accessibility. It has also recently considered the accessibility of conversational natural language interfaces, and it has published a draft requirements document, “Natural Language Interface Accessibility User Requirements” (NAUR). This work is at an early stage and isn’t an official standard at this point, but it’s full of insight on how to ensure that natural language interfaces, including chatbots, IVAs, and IVRs, are accessible.

Conversational interfaces need to be usable for people with visual or hearing disabilities, but some of the most valuable advice in NAUR addresses accessible conversational interfaces for people with cognitive disabilities. Cognitive disabilities include attention deficit disorders, autism, intellectual disabilities, and memory loss, among others. Cognitive disabilities are very common—the CDC estimates that 16 million adults in the United States have some cognitive impairment.

NAUR’s guidance regarding accessibility for users with cognitive disabilities includes these points:

Some users need a system’s speech to be slower or louder so that they can understand it. Users should be able to ask the system to speak more slowly or more loudly. If there’s a visual modality available, onscreen controls should let users easily change the speech’s volume or rate.
If users have difficulty understanding how the conversational interface works, they should be able to ask for help. If there’s a visual component to the application, commands and menus can be displayed.
Users should not have to learn specific commands to operate the system; there should be a variety of ways to ask the same question.
Some users might need to hear or see information more than once to understand it, so systems should make it possible to request repetitions, or even better, to request repetitions in simplified language.
In a text-based chatbot, users with memory limitations need to be able to scroll through the dialogue’s entire history.
Sometimes users need extra time to decide how to respond to a system’s utterances. NAUR recommends allowing timeouts to be extended, either when the user asks for more time or when the user isn’t responding.
Clear language, including terminology, is also important to users with cognitive disabilities. If it’s necessary to use language that could be unfamiliar to some users, make definitions available. Ensure that units of currency and measurement and formats for dates and times are localized.
If words in a speech interface are mispronounced, they can be misunderstood. Text-to-speech systems sometimes mangle unusual proper names or confuse words that share a spelling but are pronounced differently. Many speech interface platforms support pronunciation standards such as the W3C’s Speech Synthesis Markup Language (SSML). Developers should use these tools to ensure that text-to-speech systems pronounce words correctly.
If both voice and visual interfaces are available, some users will find information easier to understand if it’s presented both ways simultaneously.
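
Two of the points above, user-adjustable speech rate and volume and correct pronunciation, can be expressed in the W3C’s Speech Synthesis Markup Language (SSML). What follows is a minimal sketch in Python, not something taken from NAUR. The helper name build_ssml and its parameters are hypothetical. The prosody and sub elements and their keyword values come from SSML 1.1, but how fully a given text-to-speech platform honors them is an assumption to check against that platform’s documentation.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Keyword values that SSML 1.1 defines for the <prosody> rate and volume attributes.
RATES = {"x-slow", "slow", "medium", "fast", "x-fast"}
VOLUMES = {"silent", "x-soft", "soft", "medium", "loud", "x-loud"}


def build_ssml(text, rate="medium", volume="medium", aliases=None, lang="en-US"):
    """Hypothetical helper: wrap plain text in SSML for a text-to-speech engine."""
    if rate not in RATES:
        raise ValueError(f"unsupported rate: {rate!r}")
    if volume not in VOLUMES:
        raise ValueError(f"unsupported volume: {volume!r}")

    body = escape(text)  # escape &, <, > so the text is safe inside XML
    if aliases:
        # Map each escaped written form to the spoken form it should be read as.
        lookup = {escape(written): spoken for written, spoken in aliases.items()}
        # Longest match first, in a single pass, so inserted markup is never re-scanned.
        pattern = re.compile(
            "|".join(re.escape(key) for key in sorted(lookup, key=len, reverse=True))
        )
        body = pattern.sub(
            lambda m: f"<sub alias={quoteattr(lookup[m.group(0)])}>{m.group(0)}</sub>",
            body,
        )

    return (
        '<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        f'<prosody rate="{rate}" volume="{volume}">{body}</prosody>'
        "</speak>"
    )


# A user has asked the assistant to speak more slowly and more loudly.
print(build_ssml("Dr. Nguyen will call you.", rate="slow", volume="loud",
                 aliases={"Dr.": "Doctor"}))
```

In a real application the rate and volume would come from a stored user preference or from a spoken request such as “please speak more slowly.” The sub element only substitutes a spoken form for a written one. For words whose pronunciation has to be spelled out phonetically, SSML also defines a phoneme element, which this sketch does not cover.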

These accommodations are also clearly useful for people without cognitive disabilities. Everyone experiences temporary cognitive limitations from time to time, whether from being tired, distracted, or on medication. In those cases, any user will benefit from designs tailored for people with cognitive disabilities.

These are just some examples of the excellent suggestions provided in the NAUR document. I encourage all conversation designers to review it (https://www.w3.org/TR/naur/). The document also explains how the public can submit comments. Your thoughts and suggestions are very welcome.

Deborah Dahl, Ph.D., is principal at speech and language consulting firm Conversational Technologies and chair of the World Wide Web Consortium’s Multimodal Interaction Working Group. She can be reached at dahl@conversational-technologies.com.
