Post-Retirement, Speech Tech Projects Await

As of the end of May, my wife is a professor emerita. My own retirement is not currently imminent, but I certainly can hear the footsteps behind me getting louder all the time.

What will I do if I stop work? I mean, aside from a multiyear project to clean out my office and storage shelves? (And how many oscilloscopes do I really need?)

Well, there’s a slew of interesting projects that will no doubt keep me busy; in fact, some are in progress already. I’m going to list a few of them here, and perhaps a friend, acquaintance, or enemy will come up with some even more enticing ones and let me know about them.

As I type this, there’s a handheld two-way radio on my desk. I have it on during all my waking hours. My local Jewish community—in fact Jewish communities all across the formerly safe haven of the United States—continues to experience anti-Semitic speech, actions, and even deadly attacks. I’m part of a local group of volunteers that patrols, responds to problems, and guards local synagogues and events.

The more I work with this group, the more I wonder how to integrate automatic speech recognition into this effort. Dedicated radios are better than smartphone apps: With only one button to push, there’s little confusion under stress. The problem is that voice on the radio is ephemeral: Sometimes I don’t hear the original or subsequent radio calls, or I simply can’t memorize the entire report. A text version would be great, if it’s accurate enough. Again, the key issue here is simplicity. If the radio becomes modal, it’s no longer simple enough for emergency use. A real-time display of the text would be wonderful, and perhaps dire threats would keep the phone designers from using the screen to add text-based features.

Other projects beckon. My very favorite text-to-speech voice disappeared years ago, and I’ve got a plan to use an API-driven collection of systems to revive it. It won’t be commercial—probably—but it will be immense fun.

On that same topic, I believe I’ve mentioned before that with just a minute of training on my voice, I generated an online TTS version that sounds just like me—according to my wife, who is an expert in that specific topic. When I created a video on how to use the radios mentioned above, I quickly realized that it was far easier to use TTS of my voice dubbed into the video instead of trying to narrate live.

I also was able to use a short video of myself speaking to generate a new, lip-synced version of myself saying something else entirely. I don’t have any current uses for that particular technology; my main worry is that someone else will.

Last year, as part of an experimental upgrade project, I attempted to transform a classical interactive voice response system into one that used artificial intelligence to manage telephony interactions. I tried twice: With one vendor, the AI tended to drift away from instructions—hallucinations, perhaps, but in the middle of a phone call! Another vendor’s offering required so much work—all entered into online forms and with no overview available—that I abandoned it as well. But that was last year, and every few months AI continues to improve—and I do have a few other personal projects in mind, transitioning some older technology to a better and easier-to-maintain version.

I realize that an “AI” “personal assistant” is available on smartphones and tablets. Google persistently nudges me about enabling their assistant; Samsung has their Bixby; and so far my Apple laptop has been mercifully silent, for the most part, about Apple’s incarnation. But I want my own personal assistant—one that works for me and not for the benefit of Google or Apple. Or for that matter for the benefit of LG, who manufactured my TV and asked for permission to overlay content-based advertisements (permission refused). I expect that we’ll start seeing individual, open-source AI assistants sometime in the near future. Now that sounds like a useful hobby!

Moshe Yudkowsky, Ph.D., is president of Disaggregate Consulting and author of The Pebble and the Avalanche: How Taking Things Apart Creates Revolutions. He can be reached at speech@pobox.com.
