Will Speech Technology Keep Speech Free?

The editors of Speech Technology asked me and other pundits to offer predictions for 2019, which I did. My prediction about censorship in speech may seem gloomy; let me expand on it a bit.

I’d like to offer two scales for measuring speech technology: the first runs from mundane to magic; the second, from utopia to dystopia. The two combine in somewhat unpredictable ways.

First, let’s consider mundane vs. magic. Speech technology has not yet hit a wall; whether through improved recognition engines or more complete integration of ancillary information, speech recognition grows better each year.

Mundane vs. magic, as a scale, changes over time. After all, smartphones provide a magical experience—I can pick up my phone here in Chicago and place a video call to a friend in Australia, and if that doesn’t amaze you when you think about it, you’ve become too jaded. With the slow abrasion of time, telephony has lost its luster and is now mundane, along with video; I am not certain what magical communication will replace them. In the long term, we all want speech recognition to fade into a (profitable) background expectation, but for now we need to provide magical experiences to lure money out of its hiding places.

Utopia vs. dystopia almost explains itself. In 2019 China will be well on the way to becoming the first fully automated police state, with universal surveillance not only of its citizen-prisoners’ finances, associates, and daily peregrinations, but also of anything they say in any forum. A few algorithmic steps beyond that lie calculations of political reliability based on what food they eat and what clothes they wear.

The U.S. is not immune to this disease; the Transportation Security Administration will likely be the first agency to spy on speech as it continues its push towards complete biometric tracking of travelers in airports and beyond. Nor are the U.K. and the countries of Western Europe immune, all of which suffer from weak protections for free speech.

Where does this leave us? Let’s consider the four possible scenarios.

First, the worst case of all: magical technology in a dystopia. Highly accurate recognition with the ability to discern lies, indirect references, intents, and coded phrases allows the government to crush opposition because universal surveillance becomes affordable. Conspirators can speak plainly only in the most limited of circumstances and must assume everything they do is under the watchful eye of the government.

Second, mundane technology in a dystopia. This is China’s current path; today’s speech recognition and other biometrics do not provide sufficient accuracy. Their secret police will drown in a sea of false positives—while their leaders criticize them from shore and urge them to swim harder. This blundering should not cheer up China’s citizen-prisoners: The easiest way for the police to show progress will be to ignore the possibility of error and treat all output from their surveillance devices as definitively correct.

Third, mundane technology in a utopia. This is the current state of affairs in the democracies of the developed world. My phone and tablet have speech recognition and do a tolerable job of actually recognizing what I say. The companies I call have speech recognition with good enough accuracy and, as they revamp their systems, better voice user interfaces. The in-home devices from Google, Amazon, and others (the ones I’ve tried, at least) work reasonably well. I expect them to improve, but without my help, since I value my privacy too highly.

Fourth and finally, magic technology in a utopia. This future is easy to imagine: Watch any movie or TV show that portrays the life of the well-to-do at the beginning of the 20th century. The upper class have servants to handle details and trusted employees to handle tactical decisions; the master and mistress of the household focus on strategic decisions. Dinner party tonight? Make the decision and hand off the details. In this future, one may have to wash one’s own laundry, but fairly soon a reasonably intelligent virtual butler will screen phone calls and visitors at the front door.

As for the next few years, where you live will determine your experience. China will continue to impose authoritarian censorship. Western Europe and parts of the British Commonwealth will use speech recognition to enforce speech codes. In the U.S., social media platforms will continue to automate and expand their censorship, government agencies will implement biometric tracking programs, and defenders of free speech will continue to fight back. Here’s hoping that the defenders, and therefore all of us, win that crucial battle.

Moshe Yudkowsky, Ph.D., is the president of Disaggregate Consulting and author of The Pebble and the Avalanche: How Taking Things Apart Creates Revolutions. He can be reached at speech@pobox.com.
