Where Speech Is Going and Why Security Will Be Integral in the Coming Years
Ten years ago, the most exciting thing about speech technology was the ability to dictate text without needing a secretary to transcribe it. But even then, the results were mixed. You undertook the task fully aware that you'd spend as much time proofing and editing as if you'd simply typed the document in the first place. And if you happened to have any kind of accent, you really hadn't a hope without significant training.
Since then, voice technology has become more mature and more mainstream, but it is still relatively limited in its application and its availability to ordinary developers. We have seen vast improvements in accuracy coming out of research done by the likes of Facebook and Google, but the cost for a developer to access more than small amounts of transcription remains very high.
And, crucially, it is not always possible to manage the security of speech that is being processed in the cloud. This has even led some companies, like Google, to offer differential pricing if you allow them to use your data for training. But as a consumer, how much protection would this offer you if a cash-strapped start-up used the cheapest option without your knowledge?
So, what can we expect to see from speech tech in the short to medium term? And why does security matter?
The potential for the application of speech technology has broadened significantly in the past few years, particularly in the wake of the COVID-19 pandemic. Here are three key areas likely to experience the most significant uptake of the tech in the short to medium term:
VR/AR meeting rooms and gamification
Driven partly by COVID-19, virtual reality, augmented reality, and workplace gamification have evolved rapidly in the past year. No longer hypothetical scenarios, they have all been adopted as a working reality to varying degrees. With homeworking becoming the norm, businesses have sought new ways to enhance employee engagement, host meetings, and bring virtual offices to life. Speech tech plays a role here, primarily as a note-taking tool that removes a potential pressure point from members of staff. It also allows companies to monitor their staff: this can be benign, ensuring that employees are healthy and not suffering from work-related stress, but in the wrong hands it can equally become a Big Brother monitor of performance.
Insurance fraud detection
Fraud costs the insurance industry around 1.3 billion pounds in the United Kingdom alone, according to ABI.org. And it's a figure that is believed to have increased throughout the pandemic. Insurers are employing a whole range of strategies to reduce and prevent fraud, including in-depth risk assessment, data sharing, and analytics. But one of the most exciting areas is speech recognition technology. Fraudsters often share a distinct set of verbal characteristics that are really difficult for busy call centre operators to detect, especially when callers are handled by multiple operators throughout their claim. Speech recognition and behavioural analytics can take over the burden of examining what is being said both after the event, and increasingly in real time, to enable genuine claims to be processed quickly and riskier claims to be flagged reliably and early in the process.
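To make the idea concrete, here is a deliberately simplified sketch of how a behavioural-analytics layer might sit on top of transcribed calls. The indicator phrases, weights, and threshold are all invented for illustration; a real insurer's system would use trained acoustic and linguistic models, not a keyword list.

```python
# Toy sketch of a claims-transcript risk scorer. Phrases and weights are
# illustrative only; production systems learn these signals from data.
RISK_INDICATORS = {
    "i can't remember": 1.0,   # vagueness about key details
    "round about": 0.5,        # hedged figures
    "as i said before": 0.5,   # scripted-sounding repetition
    "total loss": 1.5,         # maximal-claim language
}

def risk_score(transcript: str) -> float:
    """Sum the weights of indicator phrases found in the transcript."""
    text = transcript.lower()
    return sum(w for phrase, w in RISK_INDICATORS.items() if phrase in text)

def flag_claim(transcript: str, threshold: float = 2.0) -> bool:
    """Flag the claim for early human review if the score crosses the threshold."""
    return risk_score(transcript) >= threshold
```

Because the scorer runs over text rather than audio, the same logic works after the event on stored transcripts or in real time on a streaming transcription feed.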
Telehealth and healthcare provision
Perhaps in a few years we will have fully automated treatment provided by speech-driven avatars, but that is probably still in the realms of science fiction. However, in the near term, the massive increase in telehealth driven by COVID-19 means that there are real opportunities to streamline healthcare provision. One simple approach is to capture all of the data that comes out of a consultation, for example, treatments and symptoms. Not only can these be used to populate patient information platforms, they can also be used to speed up the delivery of treatment and (an area of great concern to insurance companies) billing. However, over time, the increased sophistication of behavioural systems will help doctors identify vulnerable patients (e.g., dementia sufferers who might not be able to articulate their symptoms or understand their treatments), or those suffering with undiagnosed mental health conditions.
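The "capture everything from a consultation" step can be sketched as pulling structured fields out of a transcript. The vocabularies below are illustrative stand-ins; a real system would use clinical NLP models and coding schemes such as SNOMED CT rather than keyword matching.

```python
# Toy sketch: extract structured fields from a consultation transcript
# to populate a patient record. Vocabularies are illustrative only.
SYMPTOMS = {"headache", "fever", "fatigue", "cough"}
TREATMENTS = {"ibuprofen", "paracetamol", "rest", "antibiotics"}

def extract_consultation(transcript: str) -> dict:
    """Return symptoms and treatments mentioned in the transcript."""
    words = set(transcript.lower().replace(",", " ").replace(".", " ").split())
    return {
        "symptoms": sorted(SYMPTOMS & words),
        "treatments": sorted(TREATMENTS & words),
    }
```

The same extracted fields can then feed downstream systems, whether that is a patient information platform or a billing pipeline.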
Why Security Must Play an Integral Role in the Future of Speech
We're in the midst of a security revolution. The public is becoming more savvy about what they should and shouldn't put online. But while most of us understand that very personal information, like bank details and health records, shouldn't be given away freely, there is still a lack of awareness about how data is gathered and used.
We have already seen scandals involving the (mis)use of voice information, such as Alexa voice data being retained even when it was supposed to be deleted. And, as pointed out above, the high cost of speech recognition could push cost-conscious developers towards cheaper, less secure cloud providers.
And let's face it, for an industry already worth billions of dollars annually, dedicated to compiling personal data, voice data could prove the richest seam yet, with a whole new layer of nuance that can be gathered from it.
This means we have to design privacy into speech recognition from the ground up. Edge-based recognition takes us some way down that road, but that is impractical for a lot of IoT applications that do not have the computational power (or storage space) to handle this. So we have to look to the next generation of confidential computing (such as AMD Secure Encrypted Virtualization) as a starting point to allow us to take highly confidential data and process it in the cloud, while at the same time using other encryption schemes or techniques, such as cryptonets, to ensure we build a completely encrypted end-to-end pipeline for sensitive data processing.
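The shape of such a pipeline can be sketched in a few lines: audio is encrypted on the edge device before it leaves, and only decrypted inside the trusted environment that performs recognition. The cipher below is a toy SHA-256 counter-mode keystream standing in for a vetted scheme like AES-GCM, and the key-handling is elided; in practice you would also attest the enclave before releasing any key to it.

```python
# Sketch of the "encrypt at the edge, process in a trusted enclave" shape.
# The toy stream cipher here stands in for AES-GCM; do not use it as-is.
import hashlib
import secrets

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive a keystream by hashing key + nonce + a running counter."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def encrypt(key: bytes, audio: bytes) -> tuple[bytes, bytes]:
    """Encrypt audio on-device; only the nonce and ciphertext leave the edge."""
    nonce = secrets.token_bytes(16)
    ks = keystream(key, nonce, len(audio))
    return nonce, bytes(a ^ b for a, b in zip(audio, ks))

def decrypt(key: bytes, nonce: bytes, ciphertext: bytes) -> bytes:
    """Runs inside the confidential-computing enclave, never on the host."""
    ks = keystream(key, nonce, len(ciphertext))
    return bytes(a ^ b for a, b in zip(ciphertext, ks))
```

The point of the structure is that the cloud host only ever handles ciphertext; decryption and recognition happen inside hardware-protected memory, which is what technologies like AMD Secure Encrypted Virtualization are designed to provide.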
Nigel Cannings is founder and chief technology officer of Intelligent Voice. He has more than 25 years' experience in both law and technology and is a regular speaker at industry events.