Speech Applications Security: Protecting Your Business and Your Customers From Hackers
As speech applications become more complex, they will become increasingly attractive targets for hackers. Yet when the topic of security comes up today with regard to speech applications, the conversation generally gravitates toward speaker verification, as if voice access to an application were the only possible point of compromise. Wrong!
Speech and text-based systems are not identical when it comes to security. When personal information is collected (e.g., names, account numbers, Social Security numbers) and transactions are completed through speech automation, new opportunities for security breaches arise within a speech recognition system's infrastructure. Anyone considering or currently managing a speech system must be aware of the security nuances associated with speech. VPNs for remote connectivity, database access protocols, and other fairly standard IT security technologies and procedures simply will not address a number of speech-specific points of risk. Three common risk points are:
Speech introduces a new paradigm for IT professionals, who have generally not been exposed to concepts such as tuning, grammar building, and acoustic model modification. Much of the data underlying these efforts is recorded or transcribed into log files that may contain sensitive information. In many cases, companies rely on third-party voice user interface designers and speech scientists for expertise in optimizing a system. Yet many of these companies are not aware of what is being logged in their systems, or of how someone outside, or even within, their organization might access it.
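A practical first step toward knowing what is being logged is to scan existing tuning and transcription logs for data that looks sensitive. The Python sketch below is illustrative only and is not a feature of any particular speech platform: the log line format, the function name `scan_log_lines`, and both regular expressions (an SSN-like `123-45-6789` pattern and a run of 9 to 19 digits that could be an account or card number) are assumptions, and patterns this simple will produce both false positives and misses.

```python
import re
from typing import Dict, Iterable, List

# Illustrative patterns only (assumptions, not a standard): adjust them to the
# data your recognizer and application actually write to their logs.
SENSITIVE_PATTERNS = {
    "ssn_like": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # 9 to 19 consecutive digits: possibly an account or card number.
    "long_digit_run": re.compile(r"\b\d{9,19}\b"),
}


def scan_log_lines(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Return, for each pattern name, the 1-based numbers of lines that match it."""
    hits: Dict[str, List[int]] = {name: [] for name in SENSITIVE_PATTERNS}
    for line_no, line in enumerate(lines, start=1):
        for name, pattern in SENSITIVE_PATTERNS.items():
            if pattern.search(line):
                hits[name].append(line_no)
    return hits


if __name__ == "__main__":
    # Hypothetical tuning-log lines; real log formats vary by platform.
    sample = [
        "utt=0001 grammar=main result='check my balance'",
        "utt=0002 grammar=ssn result='123-45-6789'",
        "utt=0003 grammar=acct result='4111111111111111'",
    ]
    print(scan_log_lines(sample))
```

A scan like this only shows where sensitive-looking values appear; restricting who may read those logs, and suppressing the values at the source, are separate controls.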
Security Questions to Ask When Choosing a Speech Vendor
Are background checks done on employees?
Are employees cleared?
Is separation of duties employed where warranted?
Do employees sign a confidentiality agreement?
Is a dedicated security team in place?
Are the security policies of the provider documented?
How often are their policies updated?
Are international standards frameworks used to guide the evolution of the policies?
What governance procedures are in place that control policy updates?
Asset Classification, Inventory, Patch Management, and Vulnerability Scans
Does the provider have documented asset inventory and patch management procedures?
How are patches tested before they are deployed to production?
How quickly are security-relevant patches tested and applied?
Are audit procedures and tools in place to prove historical compliance with policy?
How often does the provider conduct vulnerability scans and with what tools?
Does the provider have secure baseline configuration standards?
Does the provider classify assets according to the value of the information they process or store?
Disaster Recovery and Business Continuity Planning
Does the provider have a formally documented change management procedure?
Are changes peer-reviewed?
How much advance notice is given for changes?
How are emergency break fixes handled?
What governance is applied to changes?
What cryptographic controls are in place?
Are message authentication codes (MACs) used?
How is key management handled?
Does the provider submit to penetration tests and ethical hacks by reputable third-party security experts?
How quickly does the provider correct deficiencies found during penetration tests?
Are the tests conducted in an environment that mirrors production?
Does the provider have sufficient operational controls in place for the rapid detection and prevention of denial-of-service attacks?
Does the provider have network Intrusion Detection System (IDS) and Intrusion Prevention System (IPS) solutions?
Does the provider have host IPS protections?
There are other, more insidious areas where information such as PINs or Social Security numbers could be retrieved if secure access to a speech system were compromised. DTMF entry of a PIN might seem to be a safe alternative to speaking it, but each key press still has to be "recognized" by the system. In most systems, log files record the keyed digits, leaving tracks that create a potential path of vulnerability.
Another potential problem along the same lines can arise when concatenated audio is used to confirm dynamic data spoken by a caller. As with DTMF recognition, the data passes through the system in discrete pieces: concatenated audio is stored as units of recorded speech, and during playback these units are requested, assembled, and output over the telephone. Not only is the information vulnerable on the network, but the speech system itself also creates a log of which audio units were requested, presenting an additional security risk.
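One way to keep DTMF digits and concatenated-audio unit names out of log files is to redact them before any record is written. The sketch below uses the Python standard library's `logging.Filter` hook; the `dtmf=` and `audio_unit=` tokens are a hypothetical log format, and most speech platforms have their own logging configuration (including any logging-suppression settings), which should be the first place to look.

```python
import logging
import re

# Hypothetical log tokens: "dtmf=<keys>" for keyed input and
# "audio_unit=<name>" for each concatenated-audio unit requested.
# Adapt the patterns to whatever your platform actually writes.
_DTMF = re.compile(r"dtmf=(?P<keys>[0-9*#]+)")
_AUDIO_UNIT = re.compile(r"audio_unit=\S+")


class SensitiveInputFilter(logging.Filter):
    """Rewrite each record so keyed digits and audio unit names are masked."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # merge msg and args first
        # Keep only how many keys were pressed, never the keys themselves.
        message = _DTMF.sub(lambda m: f"dtmf=<{len(m.group('keys'))} keys>", message)
        # Concatenated-audio units can spell out the confirmed data, so hide them.
        message = _AUDIO_UNIT.sub("audio_unit=<redacted>", message)
        record.msg, record.args = message, ()
        return True  # keep the record, now redacted


if __name__ == "__main__":
    logger = logging.getLogger("ivr")
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler()
    handler.addFilter(SensitiveInputFilter())
    logger.addHandler(handler)
    # Writes to stderr: call=42 dtmf=<4 keys> audio_unit=<redacted>
    logger.info("call=%s dtmf=%s audio_unit=%s", 42, "1234", "digits/4.wav")
```

Recording only the number of keys pressed preserves some value for tuning (for example, spotting callers who enter too few digits) without storing the PIN itself. Attaching the filter to the handler rather than to a single logger means records propagated from child loggers are redacted as well.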
- Distributed Components
One of the benefits of a VoiceXML system is the ability to distribute components of the solution for performance, manageability, or departmental control. There is always a chance, however, that sensitive data in audio, text, or log files is traversing local area networks unencrypted. In many cases the answer is not simply to apply encryption, because the performance cost could adversely affect what a caller hears.
Callers start to notice delays when unexpected silence exceeds roughly 250 milliseconds, depending on context. Users of text-based systems are more tolerant of delays because screens can be filled gradually with text and graphics without the customer losing patience. Dead air in a speech system, however, means a hang-up. Awareness of this issue may be low among IT or security personnel, who might simply insist on applying to speech the same solutions they have used for text-based systems in the past, without considering the impact on the caller.
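Whether encryption or any other security step fits within the caller's tolerance for silence can be checked by measuring it rather than guessing. This hedged sketch times an added step and subtracts it, together with the latency the call flow already has, from the 250 ms figure cited above. Python's standard library has no symmetric cipher, so an HMAC-SHA256 computation over roughly one second of 8 kHz, 8-bit audio stands in for the security step; the 180 ms of existing latency and the helper names are hypothetical.

```python
import hashlib
import hmac
import os
import time
from typing import Callable

SILENCE_BUDGET_MS = 250.0  # threshold cited in the text; context-dependent


def average_ms(step: Callable[[], object], repeats: int = 1000) -> float:
    """Average wall-clock time of one call to `step`, in milliseconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        step()
    return (time.perf_counter() - start) * 1000.0 / repeats


def remaining_budget_ms(other_latency_ms: float, added_step_ms: float,
                        budget_ms: float = SILENCE_BUDGET_MS) -> float:
    """Budget left after existing latency plus the new step (negative means over budget)."""
    return budget_ms - (other_latency_ms + added_step_ms)


if __name__ == "__main__":
    key = os.urandom(32)
    payload = os.urandom(8000)  # about 1 s of 8 kHz, 8-bit telephone audio
    cost = average_ms(lambda: hmac.new(key, payload, hashlib.sha256).digest())
    existing = 180.0  # hypothetical recognition + back-end lookup latency
    print(f"added step: {cost:.3f} ms, remaining: {remaining_budget_ms(existing, cost):.1f} ms")
```

Measurements should be taken on production-like hardware and under load, since an average from a single idle process understates the delay a caller actually hears.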
Some Protection Guidelines
Computer telephony integration (CTI) can make a transfer from the speech system to an agent more seamless by passing information collected from the caller to the agent. However, the transition points between the speech system and the agent desktop present opportunities for unauthorized access to sensitive information. Technology alone neither reveals nor resolves this risk. Transition points raise the question of who is responsible for security: the managers of the speech system or the managers of the agent pool? Encrypting data transfers between the speech solution and the agent desktop is a good solution, but it is irrelevant if finger-pointing between the two groups means it never gets done until a breach is discovered. A comprehensive approach to securing the entire caller experience, not just a single technology component, is required.
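As a sketch of protecting the speech-to-agent hand-off, caller data passed in a CTI screen pop could carry a timestamp and an HMAC-SHA256 tag that the agent desktop verifies before displaying anything. This is an assumption-laden illustration: the message layout, the 30-second freshness window, and the function names are hypothetical, and distributing the shared key is left to the organization's key management process. A MAC proves integrity and origin but does not hide the data, so the transfer itself still needs encryption (for example, TLS).

```python
import hashlib
import hmac
import json
import time
from typing import Optional

MAX_AGE_SECONDS = 30  # hypothetical: reject screen pops older than this


def _canonical(body: dict) -> bytes:
    # Stable serialization so sender and receiver MAC exactly the same bytes.
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()


def sign_screen_pop(data: dict, key: bytes, now: Optional[float] = None) -> dict:
    """Wrap caller data for the agent desktop with a timestamp and HMAC-SHA256 tag."""
    body = {"issued_at": time.time() if now is None else now, "data": data}
    tag = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    return {"body": body, "tag": tag}


def verify_screen_pop(message: dict, key: bytes, now: Optional[float] = None) -> Optional[dict]:
    """Return the caller data if the tag is valid and fresh, otherwise None."""
    expected = hmac.new(key, _canonical(message["body"]), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, message["tag"]):
        return None  # tampered with, or signed with a different key
    current = time.time() if now is None else now
    if current - message["body"]["issued_at"] > MAX_AGE_SECONDS:
        return None  # stale message: possible replay
    return message["body"]["data"]
```

Comparing tags with `hmac.compare_digest` rather than `==` avoids leaking timing information about how much of a forged tag matched.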
Government and Business Regulations
It certainly makes sense to be aware of areas where a speech recognition solution might present a security risk. Working with a speech hosting provider that has known security expertise, or with in-house security experts managing a premises-based solution, will help to protect customer data. However, in a growing number of cases this is not simply good business practice; it is becoming the law.
Examples include the Gramm-Leach-Bliley Act, which applies to financial institutions and requires a comprehensive information security program. The internal control requirements of Sarbanes-Oxley are also being applied to security issues. In the private sector, the Payment Card Industry (PCI) Data Security Standard applies directly to organizations that accept credit card information. A number of other regulations for specific industry segments mandate security safeguards and could certainly be interpreted to apply to a speech system.
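For organizations subject to PCI DSS, one concrete control is to make sure full card numbers never reach tuning logs or agent screens. PCI DSS limits how many digits of a card number may be displayed; the sketch below conservatively keeps only the last four. It uses the Luhn checksum, which payment card numbers satisfy, to avoid masking unrelated digit strings. The regular expression and function names are assumptions, and a sketch like this supplements, rather than replaces, a formal compliance review.

```python
import re

# Candidate card numbers: 13 to 19 digits, optionally separated by single
# spaces or hyphens (an assumption about how numbers appear in transcripts).
_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn (mod 10) check."""
    total = 0
    for position, ch in enumerate(reversed(digits)):
        d = int(ch)
        if position % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def mask_card_numbers(text: str) -> str:
    """Mask Luhn-valid card-number candidates, keeping only the last four digits."""

    def _mask(match: re.Match) -> str:
        digits = re.sub(r"[ -]", "", match.group(0))
        if not luhn_valid(digits):
            return match.group(0)  # probably some other number; leave it alone
        return "*" * (len(digits) - 4) + digits[-4:]

    return _CANDIDATE.sub(_mask, text)


if __name__ == "__main__":
    # 4111 1111 1111 1111 is a widely used test number, not a real account.
    print(mask_card_numbers("caller said 4111 1111 1111 1111 ref 1234567890123"))
```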
Securing Sensitive Information Within a Speech System
Securing a speech system involves much more than simply being concerned with network connections. For IT and security managers who have not previously been exposed to DTMF input, concatenated audio confirmations, and data collected for "tuning," awareness of these key components of a speech solution is the first step. Organizations deploying a speech system, either on premises or as a hosted speech solution, must be aware of the risks that speech introduces.
Securing sensitive data within a speech system involves considering the management team, the established processes, and the audio and text data being stored, logged, or transmitted. At the highest level there are three broad categories to consider:
People - Is there a dedicated security manager in place who is familiar with speech recognition? Is there a process for background checks on all necessary personnel? Is there separation of duties according to security best practices? Etc.
Process - Is a written security policy in place? Are plans established in the event of a security breach? Are security audit and patch standards established? Etc.
Technology - Is speech logging suppression available? Is LAN and WAN traffic encrypted and mutually authenticated? Do security standards consider the link between the speech system and agent desktops? Etc.
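The Technology questions ask whether LAN and WAN links are encrypted and mutually authenticated. Between distributed components that communicate over TCP, a common way to get both is TLS with certificates on each side. The Python sketch below builds the two `ssl.SSLContext` objects involved; the certificate and CA file names are hypothetical, and a private CA is assumed so that only internal components can connect.

```python
import ssl
from typing import Optional


def make_mutual_tls_server_context(certfile: Optional[str] = None,
                                   keyfile: Optional[str] = None,
                                   client_ca_file: Optional[str] = None) -> ssl.SSLContext:
    """TLS context for a listening component that requires a client certificate."""
    # Built directly from SSLContext, so no system CA store is loaded; only the
    # CA loaded below is trusted to vouch for connecting components.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED  # refuse peers without a valid certificate
    if certfile:
        ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)  # this component's identity
    if client_ca_file:
        ctx.load_verify_locations(cafile=client_ca_file)  # hypothetical private CA
    return ctx


def make_mutual_tls_client_context(certfile: Optional[str] = None,
                                   keyfile: Optional[str] = None,
                                   server_ca_file: Optional[str] = None) -> ssl.SSLContext:
    """TLS context for a connecting component that also presents its own certificate."""
    # PROTOCOL_TLS_CLIENT verifies the server certificate and hostname by default.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if certfile:
        ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)  # sent to the server
    if server_ca_file:
        ctx.load_verify_locations(cafile=server_ca_file)
    return ctx
```

Each side then wraps its socket, for example `server_ctx.wrap_socket(sock, server_side=True)` on the listening component and `client_ctx.wrap_socket(sock, server_hostname="recognizer.internal")` on the connecting one, where the hostname is a placeholder. Because neither context loads the system's default CA store, only certificates issued by the explicitly loaded CA are trusted. The handshake adds latency of its own, so it is worth measuring against the caller's tolerance for silence.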
Although there are many similarities between speech and text-only systems from the perspective of network security, there are also significant differences. An IT or security professional may be quite familiar with the physical connections of the network and telephony hardware, yet not know how audio is stored, converted, or transcribed to text within a speech system. Overlooking components that are fundamental to speech can create opportunities for hackers who suspect that the speech system is vulnerable.
A combination of technology and procedures must be employed to support speech solutions within a secure environment. These measures must be balanced with applicable security regulations and best practices to ensure a secure speech application environment.
Securing a speech system requires knowledge of where the potential for a security breach exists and implementing solutions to thwart unauthorized access by either external or internal threats. Organizations considering a speech solution must evaluate security as a top priority with the understanding that there are issues specific to speech alone that must be addressed.