Software Inside the Hardware: Unlocking the Skills in a VUI


Voice assistants are increasingly ubiquitous in American households. More than 50 million households already own some form of smart speaker, and that figure is predicted to reach as many as 100 million within the next one to two years. Part of the appeal is the ease with which users can ask questions of these devices and get accurate responses.

Like the best technologies, voice assistants integrate seamlessly into daily life. But these systems only know what they are programmed to know, specifically through Skills (Alexa) and Actions (Google). Developers create these skills to perform specific tasks, interact with branded databases, or otherwise capture and share data through a voice assistant.

The Role of Skills in the Voice User Interface

If an Amazon Echo or Google Home is the personal computer, skills are the software within the operating system—the tools that let you harness the power of that computer to get things done. And to date, there are tens of thousands of these skills, designed by brands, startups, and individual developers to handle unique voice queries.

Capital One jumped into the fray early with the first banking skill for Amazon Alexa back in 2016. It allowed customers to check their balances, pay their credit card bills, and more. Others have built on this to allow users to directly pay for goods and services through skills on their devices.

Other skills are more generic. Simple questions that users might ask about the weather, the news, recent sporting events, or even a basic Google search are all far more commonly put to voice assistants. At the same time, while Amazon has programmed a substantial baseline of skills into the system, hundreds of new skills are added every month, each identifying a new potential query that can subsequently be picked up by big brands.

In fact, generic skills are far more widely used by the owners of these devices than branded tools. A recent ComScore survey found that while more than 50% of smart speaker owners use their devices to ask general questions, check the weather, and stream music, only 11% order products, only 16% interact with local businesses, and only 22% stream news or content published through these skills.

Much as developers rushed to put new and exciting apps onto Apple’s App Store in 2009, we’re seeing a bit of a gold rush to develop new skills for voice assistants. But the rules have changed. Developing for a VUI is an entirely different challenge from traditional GUI development, and the wide range of quality in skills shows it.

How Skills Unlock the VUI

For those who want to develop new skills for voice assistants, it’s important to step back and prepare a plan that clearly identifies what the user hopes to achieve. Voice is becoming an important player in web search, and soon in commerce, but it’s still just one player, so it’s important to recognize when someone would use a voice interface over a more traditional mobile or desktop GUI. Your designs should make the process easier. Skills that exist for the sake of existing are rarely used and are often more confusing than they are helpful.

On the flip side, voice skills that remove friction and make it easier to obtain information in a diverse array of locations are immensely useful. Because they sit atop a vast trove of databases, including web search, a company’s own resources, and the Alexa platform, skills can rapidly pull and share information with a user anywhere with a wireless connection. That alone makes them a powerful resource for customer service, account lookups, news queries, and more.

Developing a skill is different from developing a mobile app for several reasons. Not only is the technology still new and in many ways untested, but there is no visual component. So it needs to be very clear what a skill will do, why it will do it, and when it will be used. At the same time, the skill needs to be flexible enough to understand voice commands that may not be phrased exactly the same way each time they are spoken. Specifically, there are three aspects of a voice command that need to be accounted for:

  • Intent – What is the user trying to do with this command? Are they playing music, setting a timer, adding something to a calendar, looking up information on the internet, or buying something? The words that trigger intent (e.g. play, buy, lookup, remind) are vital to understanding how a skill will be used.
  • Utterance – Those triggers to intent are important, but they aren’t universal. Colloquialisms, turns of phrase, and even children who may not speak clearly all have different ways of asking the same question. Some might ask. Most will demand. A skill developer must understand what these variances look like and how to account for them. Nothing is more frustrating than a skill that will only work with a specific phrase.
  • Slots – These are optional elements of the request that nonetheless influence what actually happens. Consider your music playlist. If you told Alexa to “play music,” it would, but probably not from the app or the playlist you have in mind. A slot is needed to define what will be played. The same is true of calendar entries, reminders, or locations. A slot can also be made required, in which case the device asks a follow-up question to fill it.
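To make these three concepts concrete, here is a minimal sketch in plain Python (not the actual Alexa Skills Kit, whose APIs differ): hypothetical intent names map to several sample-utterance templates, and `{slot}` placeholders capture the variable parts of a command. Listing several templates per intent is what absorbs the different ways people phrase the same request.

```python
import re

# Hypothetical intents, each with multiple sample utterances.
# "{playlist}" and "{duration}" mark slots; more specific templates come first.
INTENTS = {
    "PlayMusic": [
        "play my {playlist} playlist",
        "put on {playlist}",
        "play {playlist}",
    ],
    "SetTimer": [
        "set a timer for {duration}",
        "remind me in {duration}",
    ],
}

def compile_template(template):
    """Turn '{slot}' placeholders into named regex capture groups."""
    pattern = re.sub(r"\{(\w+)\}", r"(?P<\1>.+?)", template)
    return re.compile(rf"^{pattern}$", re.IGNORECASE)

def match_intent(text, intents):
    """Return (intent_name, slot_values) for the first matching utterance."""
    for name, templates in intents.items():
        for template in templates:
            m = compile_template(template).match(text.strip())
            if m:
                return name, m.groupdict()
    return None, {}  # no intent recognized
```

Given that setup, `match_intent("Play my road trip playlist", INTENTS)` resolves the intent to `PlayMusic` with the slot `playlist` set to `"road trip"`, while an unrecognized phrase returns no intent at all. Production platforms train statistical models on the sample utterances rather than matching them literally, but the intent/utterance/slot structure is the same.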

An Effective Skill is Intuitive but Simple

Skills are designed to solve problems in elegant ways. Using a skill shouldn’t be more complicated than pulling out a phone and tapping in the query; it should be faster, easier, and more compelling.

For that reason, VUI design is a significant challenge. Developers need to do more with less.

Part of that is keeping communication simple and requiring the bare minimum from the user to make a request. It also means building a robust workflow to anticipate, identify, and manage errors.
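A minimal sketch of such an error-handling workflow, using hypothetical intent and slot names rather than any real platform API: instead of failing outright, the handler either tells the user what the skill can do or asks a follow-up question to fill a missing required slot.

```python
# Hypothetical configuration: which slots each intent cannot proceed without.
REQUIRED_SLOTS = {"PlayMusic": ["playlist"]}

def handle_request(intent_name, slots):
    """Decide whether to fulfill, re-prompt for a missing slot, or fall back."""
    if intent_name is None:
        # Unrecognized command: don't just say "I'm not sure" --
        # steer the user toward something the skill actually supports.
        return "Sorry, I can't help with that. Try asking me to play a playlist."
    missing = [s for s in REQUIRED_SLOTS.get(intent_name, []) if not slots.get(s)]
    if missing:
        # Elicit the missing slot instead of failing the whole request.
        return f"Which {missing[0]} did you mean?"
    return f"OK, handling {intent_name}."
```

So `handle_request("PlayMusic", {})` asks “Which playlist did you mean?” rather than giving up, and only a fully specified request proceeds to fulfillment. The design choice worth noting is that every failure path ends in a prompt that moves the conversation forward.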

Nothing is more infuriating for a user than listening to Alexa repeatedly say “I’m not sure” in response to what they feel is a simple question. With a full understanding of what Alexa, Home, and Siri can support as platforms, developers are creating compelling new voice experiences, but we are still in the early stages. Just as with mobile apps in the early 2010s, voice skills are learning how to make lives easier, brands are scrambling for a foothold in a quickly growing landscape of user resources, and developers are finding creative new ways to do things. With the explosive growth of VUI-enabled devices expected in the coming years, skills are the next wave of the computing interface, and exciting things are likely to come from them.
