• December 15, 2022
  • Q & A

Q&A: Nigel Cannings, founder and chief technology officer of Intelligent Voice


In your opinion, what has been the best/most innovative technology in the speech tech space in the past year?

OpenAI continues to astonish the market with a range of advances to existing technologies. While GPT-3 and GPT-3.5 have hit the headlines (the latter, also known as ChatGPT, being capable of generating article-length content), the release of Whisper has caught the attention of the speech community. Whisper is probably the most capable speech recognition algorithm available today, judging by its breadth of language coverage and its accuracy. While not state-of-the-art in some languages, it provides a really interesting benchmark of where current transformer technologies can go. It is certainly not ready for a production-type deployment, but it is of enormous interest to researchers.

Have any changes in regulations (such as the online safety bill) impacted how companies can use speech tech or AI?

The U.K. Information Commissioner's Office came in a bit from left field in late October with its analysis of biometric technology, into which it has lumped a wide variety of technology from gaze tracking to wearables to sentiment analysis. The analysis was scathing, stating, "Developments in the biometrics and emotion AI market are immature. They may not work yet, or indeed ever." That damns a huge swath of technologies, including the widely used my-voice-is-my-password technologies used by banks and even, by extension, the entire U.S. immigration system, which relies on facial biometrics to streamline entry at U.S. airports and dramatically reduce queues. The ICO promises further guidance in the spring, but notwithstanding its stated position as an advocate for genuine innovation and business growth, the language so far is hostile toward an industry that has demonstrated genuine advances.

What do you see as the biggest potential issue that speech tech can solve next year?

Consumer speech tech is in turmoil at the moment, with Google and Amazon laying off large swaths of employees, and an analysis of Amazon's Alexa business showing that it will lose $10 billion this year. However, one bright spot might be the ability to reduce Zoom fatigue. Innovations in the past 12 months have allowed us, for the first time, to accurately summarize whole Zoom and Teams meetings. That makes it easier for people either to engage properly in long meetings without needing to keep taking notes, or to not engage at all and just read the summary afterward.

What are your predictions for AI innovations in the next year?

There will be a lot more of the same, that is for certain. There is something of an arms race going on in the large language model space, the latest iteration of this being the release of ChatGPT (or GPT-3.5). However, this is all more of the same. True innovation means we have to look at different types of networks, and I see the next stage of evolution in biologically inspired networks, like spiking neural networks. These are a form of self-organizing network that solves problems in an organic fashion. The problems could range from something as simple as noise reduction in an audio signal to solving the infamous travelling salesman problem. By using fewer but more complex neurons in an artificial network, we can break away from needing networks trained on vast amounts of data and maybe pave the way toward a more general form of artificial intelligence. If we combine this with current technologies, like transformer networks, we can take the best of the current range of knowledge networks but give them a little more actual intelligence.
