July 20, 2021
By Adam Sypniewski, chief technology officer of Deepgram
Industry Voices

Four Pitfalls to Avoid When Building Compelling Voice Experiences

Voice experiences are all around us, from smart technology in our homes to commands in our cars. Voice is an emerging and more human way to interact with the environment around us, and it is powering the next wave of consumer and employee experiences. In fact, Opus Research found that 80 percent of respondents believe the pandemic accelerated the adoption of automated speech recognition (ASR).

As developers take advantage of this new mode of communication, it can be tricky to navigate all of the different components that go into creating a robust user experience. In my role as a chief technology officer immersed in the artificial intelligence communications space, I speak to many customers and understand what they need to make their voice experiences successful. Here are four key things to avoid when creating robust voice experiences:

Ignoring the quality of your audio source. Voice-based experiences depend heavily on the quality of the audio input and on the transcription capabilities built into the experience. As a developer, it's important to advocate for high-quality voice data, because low-quality audio is not only difficult to hear but also makes developing your voice experience that much harder. Be sure you're using good speech formats and embracing standards like FLAC, which compresses losslessly, and Opus, a lossy codec designed to stay efficient at low bitrates. Good ASR providers will be able to work with whatever you have, but your data will be much cleaner if you avoid low sample rates like 8 kHz and instead use higher sample rates, such as 16 kHz. At the end of the day, it's important to engage with your team to make sure your audio recording requirements are heard.
Relying on an ASR with rigid architecture. Too often, I see developers get locked into off-the-shelf solutions that provide minimal flexibility. Finding a provider that offers low-cost solutions and easy-to-navigate real-time features is a realistic goal, but there's no one-size-fits-all solution. Developers need to know exactly what they want to get out of their audio data so they can choose a technology that will pull the most relevant insights to analyze. It's important to look for providers that offer deployment flexibility and speed, high accuracy, real-time capabilities, scalability, and custom training. The ranked importance of these features will vary based on your use case, but picking an ASR technology that does a great job with each of them will make your voice experiences much better down the line as they change and grow with end users' needs.
Ignoring the context in which your app will be used. If you intend your voice experience to run on a computer, you won't need to worry as much about connectivity and bandwidth. On the flip side, if your team members primarily use mobile devices, where connectivity issues can occur, you should pick an audio codec that is optimized for low bandwidth so you're not hogging users' network connections. It's also good to be wary of closed-source audio codecs, since they don't need to be standardized. Whenever you can, try to work with open-source audio codecs.
Not leaving room to experiment and fail. All companies are going to have different needs and wants for their voice experiences, sometimes without a clear idea of how those will work in real time and under real-life conditions. Many things can go wrong as you're building your application or API, so it's critical to build your system with robustness and flexibility in mind as you get closer to finding something that meets the needs of your enterprise.
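
To make the sample-rate and bandwidth advice concrete, here is a minimal Python sketch that inspects a WAV file's header and reports whether it meets the 16 kHz suggestion, along with the bitrate the same audio would need if sent uncompressed. The function names, the threshold constant, and the silent test audio are all illustrative assumptions rather than part of any ASR provider's tooling, and only Python's standard wave module is used.

```python
import io
import wave

# Assumption: threshold taken from the 16 kHz suggestion in the text above.
MIN_SAMPLE_RATE_HZ = 16_000


def describe_wav(source):
    """Return basic properties of a WAV file (a path or a file-like object)."""
    with wave.open(source, "rb") as wav:
        sample_rate = wav.getframerate()
        channels = wav.getnchannels()
        bits_per_sample = wav.getsampwidth() * 8
    # Uncompressed PCM bitrate = samples/sec * bits/sample * channels.
    pcm_kbps = sample_rate * bits_per_sample * channels / 1000
    return {
        "sample_rate_hz": sample_rate,
        "channels": channels,
        "bits_per_sample": bits_per_sample,
        "pcm_kbps": pcm_kbps,
        "meets_minimum": sample_rate >= MIN_SAMPLE_RATE_HZ,
    }


def make_silent_wav(sample_rate, seconds=1):
    """Build an in-memory mono 16-bit WAV of silence, for demonstration only."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * sample_rate * seconds)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for rate in (8_000, 16_000):
        info = describe_wav(make_silent_wav(rate))
        print(rate, info["pcm_kbps"], info["meets_minimum"])
    # prints:
    # 8000 128.0 False
    # 16000 256.0 True
```

Mono 16-bit audio at 16 kHz needs 256 kbps uncompressed, which is the kind of figure that makes a bandwidth-efficient codec worth considering for mobile users. In a real pipeline, describe_wav would be pointed at actual recordings instead of generated silence.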

As a developer, you want to create the best voice experiences possible for whichever audience you're serving. It's more important than ever to make sure your voice data is high quality, that you understand what your underlying ASR technology can do, and that you create an agile back-end experience that can handle exactly what you need. The time for voice experiences is now, and by putting the proper APIs in place early on, you'll enable your voice experience to flourish and adapt to your customers' needs.
