Limitations of Data Loss Prevention Solutions

Data Loss Prevention (DLP) solutions and their Limitations

Have you ever attempted to balance rocks?

Stacking rocks on top of each other requires a great deal of skill and patience, and even then it is only a matter of time before the pile collapses. The slightest gust of wind or a careless movement of the hand while placing one rock on another, and you have to start all over again.

You are probably wondering what stacking rocks has to do with cybersecurity, and more specifically with Data Loss Prevention. Well, the two are very similar in nature!
Building DLP solutions requires specialized computer science knowledge (such as kernel-level reverse engineering), in-depth knowledge of potential exfiltration vectors, and a disciplined software development process. Even with all of the above, you know that the next version of the Operating System (OS) or of an application will likely break the solution.

Imagine an endpoint agent that needs to guard against data leaks through many different egress channels, each of which presents its own way to leak data.

Endpoint Agent guarding against potential data leaks!

Egress channels can take many forms, as shown in the diagram below. Over time, OS vendors have also introduced new channels, such as AirDrop and Bluetooth. If you are using a legacy DLP product, check its list of supported channels. Does it protect against data exfiltration over AirDrop or Bluetooth?

Egress channels are not just the physical outlets from the machine, but also the application protocols used to transfer data over them. For example, the network interfaces (the Wi-Fi and Ethernet ports) essentially represent two egress channels, but the higher-level application protocols that run over these ports (HTTP(S), SMTP, FTP, and many others) can multiply that number tenfold. Each of these channels needs to be watched around the clock to prevent data from escaping.
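To make the multiplication concrete, here is a small Python sketch that enumerates interface/protocol pairs. The protocol and device lists are illustrative assumptions, not a complete inventory and not any product's support matrix.

```python
from itertools import product

# Illustrative lists only (assumptions), not a complete inventory of egress channels.
NETWORK_INTERFACES = ["Wi-Fi", "Ethernet"]
APP_PROTOCOLS = [
    "HTTP", "HTTPS", "WebSocket", "QUIC", "SMTP",
    "IMAP", "FTP", "SFTP", "SMB", "DNS",
]
OTHER_CHANNELS = ["USB storage", "Bluetooth", "AirDrop", "Printer"]


def egress_channels() -> list[str]:
    """List every channel an agent would have to watch in this toy model."""
    # Every application protocol can run over every network interface.
    networked = [
        f"{iface}/{proto}"
        for iface, proto in product(NETWORK_INTERFACES, APP_PROTOCOLS)
    ]
    return networked + OTHER_CHANNELS


if __name__ == "__main__":
    channels = egress_channels()
    # prints: 2 network ports -> 24 channels to watch
    print(f"{len(NETWORK_INTERFACES)} network ports -> {len(channels)} channels to watch")
```

Two physical ports turn into twenty channels once ten protocols are layered on top, before any of the non-network outlets are counted.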

While one can try to normalize the data from these channels to detect data theft, the protocols often make it impossible to have full visibility into all the data.
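As a rough illustration of what normalization involves, the Python sketch below undoes a few common encodings (base64 transfer encoding, gzip compression, URL-encoded form bodies) before the text is handed to a rule. The helper `normalize_payload` and its parameters are hypothetical, and real traffic involves many more formats. Anything the helper does not understand, such as end-to-end encrypted or proprietary binary payloads, comes out as unreadable text, which is exactly the visibility gap described above.

```python
import base64
import gzip
from urllib.parse import parse_qs


def normalize_payload(
    raw: bytes,
    content_type: str = "",
    content_encoding: str = "",
    transfer_encoding: str = "",
) -> str:
    """Best-effort conversion of a captured payload into plain text for rule matching.

    Hypothetical helper: which headers are consulted, and in what order, are
    illustrative assumptions rather than any product's actual pipeline.
    """
    data = raw
    # 1. Undo a MIME transfer encoding (e.g. "Content-Transfer-Encoding: base64" on attachments).
    if transfer_encoding.lower() == "base64":
        data = base64.b64decode(data)
    # 2. Undo HTTP compression (e.g. "Content-Encoding: gzip").
    if content_encoding.lower() == "gzip":
        data = gzip.decompress(data)
    # 3. Decode bytes to text; undecodable bytes (encrypted or binary data) become U+FFFD.
    text = data.decode("utf-8", errors="replace")
    # 4. Flatten URL-encoded form bodies into their field values.
    if content_type.lower().startswith("application/x-www-form-urlencoded"):
        fields = parse_qs(text)
        text = " ".join(value for values in fields.values() for value in values)
    return text


if __name__ == "__main__":
    body = gzip.compress(b"to=bob%40example.com&note=keepmesafe")
    # prints: bob@example.com keepmesafe
    print(normalize_payload(body, "application/x-www-form-urlencoded", content_encoding="gzip"))
```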

Data Rules

Let us discuss data rules. DLP products allow you to define rules around the specifics of the data you are trying to protect. This can range from Personally Identifiable Information (PII) and Protected Health Information (PHI) to payment card data covered by the Payment Card Industry Data Security Standard (PCI DSS), and custom data. The essence of DLP is to know the data you are trying to protect and to detect when it is being leaked. A rule can match a string, such as a credit card number or Social Security number, or a hash value of the data.
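The two kinds of rules mentioned above can be sketched in Python: a pattern rule that finds candidate credit card numbers and confirms them with the Luhn checksum (the standard check-digit scheme used by payment cards), and a fingerprint rule that compares a SHA-256 hash of the content against hashes of protected documents. The regular expression, the helper names, and the sample fingerprint are assumptions for illustration; commercial DLP engines use their own, more elaborate detectors.

```python
import hashlib
import re

# Candidate card numbers: 13-19 digits, optionally separated by single spaces or dashes.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")

# Hypothetical fingerprint set: SHA-256 digests of documents marked as protected.
PROTECTED_HASHES = {hashlib.sha256(b"Q3 board deck: keepmesafe").hexdigest()}


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:  # double every second digit, counting from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Pattern rule: report candidates that also pass the Luhn check."""
    hits = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


def matches_fingerprint(content: bytes) -> bool:
    """Fingerprint rule: exact match of the content's SHA-256 against protected hashes."""
    return hashlib.sha256(content).hexdigest() in PROTECTED_HASHES


if __name__ == "__main__":
    print(find_card_numbers("Card: 4111 1111 1111 1111, ref 4111111111111112"))  # ['4111111111111111']
    print(matches_fingerprint(b"Q3 board deck: keepmesafe"))   # True: exact copy
    print(matches_fingerprint(b"Q3 board deck: keepmesafe."))  # False: one extra byte
```

Note that the exact-hash rule stops matching as soon as a single byte changes, which is the root of the evasion problem discussed next.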

Say you define a custom rule to protect any document containing the word “keepmesafe”. When the DLP agent sees this word, it treats it as an indicator that the document must not leave its protected perimeter. However, there are many subtle ways to modify this data before sending it out and then to reconstruct it in the desired form outside.

How would one change the data so that it is undetectable by a DLP agent? I will leave this to your imagination, but it does not take much to change “keepmesafe” to “keep*me*safe” or “keep-me-safe”, possibly even breaking it up into separate words before sending it.
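The following Python sketch shows why such edits work against a simple keyword rule, and why even a stricter rule only partly helps. `naive_match` does an exact substring check; `normalized_match` (a hypothetical hardening, not any vendor's method) lower-cases the text and strips punctuation first. Each message is scanned independently, as a per-message check would do.

```python
import re
from collections.abc import Callable

KEYWORD = "keepmesafe"


def naive_match(text: str) -> bool:
    """Custom rule as literally defined: exact substring match."""
    return KEYWORD in text


def normalized_match(text: str) -> bool:
    """Hypothetical hardening: lower-case and drop everything but letters and digits first."""
    return KEYWORD in re.sub(r"[^a-z0-9]", "", text.lower())


def scan(messages: list[str], rule: Callable[[str], bool]) -> bool:
    """Check each outgoing message on its own, as a per-message DLP check would."""
    return any(rule(message) for message in messages)


SAMPLES = {
    "plain": ["Project keepmesafe roadmap"],
    "starred": ["Project keep*me*safe roadmap"],
    "hyphenated": ["Project keep-me-safe roadmap"],
    "split": ["Project keep", "me", "safe roadmap"],  # sent as three separate messages
}

if __name__ == "__main__":
    for name, messages in SAMPLES.items():
        print(name, scan(messages, naive_match), scan(messages, normalized_match))
    # plain True True / starred False True / hyphenated False True / split False False
```

The normalized rule catches “keep*me*safe” and “keep-me-safe”, but the split message still slips through, and the looser rule now also flags an innocent phrase such as “please keep me safe”, the kind of false positive that causes trouble later in this article.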

OS/Application Impact

With every release of macOS, Apple has introduced changes to the operating system and to its applications, such as Apple Mail and Safari, and these changes have repeatedly broken DLP functionality. The core technique used by major DLP products is to inject code into an application, which then allows the endpoint agent to monitor its traffic. Apple introduced System Integrity Protection (SIP), which prevents code injection into protected system applications.

Both Microsoft and Apple are sandboxing their applications to prevent code injection and any other form of code modification. Most recently, Microsoft announced that it will disable injection-based Outlook Mail plugins (Disable MS Outlook Injection). This impacts DLP vendors, as they will lose visibility into the Outlook email channel.

What are the security vendors doing to get around such limitations?

They try to find other ways to gain visibility into the egress channels. When Apple blocked code injection into Safari in the Mojave release, it was a major blow to security vendors, as they lost visibility into all egress data from the browser. There are a number of ways to address this, but only partially, for example (a) creating a browser extension that can capture POST requests, or (b) performing a Man-In-The-Middle (MITM) interception of the traffic.

(a) Browser extensions present significant limitations. Not all POST data is visible to an extension, and what it can see depends on the APIs the browser supports. Additionally, browser extensions function only under certain conditions and often break because of conflicts with other extensions. Managing browser extensions becomes a challenge if you are using an enterprise product such as GSuite, because the security vendor installs its extension outside the control of GSuite. Most importantly, browser extensions are typically rendered useless in private/incognito mode. There are other limitations as well; for example, WebSocket traffic is not visible to the extension due to API limitations.
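(b) The following diagram depicts a MITM setup. The agent breaks the original connection and acts as a middleman, relaying traffic between the endpoint application and the destination. For encrypted (SSL/TLS) traffic, this also means the agent must install its own certificate authority as trusted on the endpoint so that it can decrypt and re-encrypt the traffic.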

While this provides visibility into all SSL/TLS egress traffic, it also has a significant downside: your data privacy is compromised, because the DLP agent is potentially listening in on your “private” connections.

What about cloud storage applications?

There are two forms of egress channels for cloud storage applications (e.g., OneDrive, Google Drive, Dropbox, Box, and others): 1) access via the browser and 2) access via a native application. Both present challenges for legacy DLP solutions. We have already discussed some of the browser challenges, but native apps present a different challenge: they use protocols and features that often completely sidestep the DLP agent, such as document collaboration.

What about cloud email services such as Gmail or Yahoo Mail?

Data can be leaked in the email body, as an attachment, or both. Since this channel uses a browser, the DLP vendor must support that browser and its specific version. Often, even when a DLP vendor claims support for a given browser, a new version of the browser may break the APIs the vendor uses to intercept traffic. Browser vendors such as Google, Mozilla, and Apple release updates as often as every 4-6 weeks, so the rate of potential breakage is high.

The DLP agent can often handle content within the email body well, but attachments present a different problem. If the DLP agent flags a legitimate email with attachments (a false positive), the user will keep seeing popups from the DLP agent blocking the email, because email providers such as Gmail continually retry syncing it. Imagine the impact on productivity in a Fortune 500 enterprise environment.

Conclusions

Until now, DLP vendors have relied on reverse engineering and proprietary API access to gain visibility into data traffic. OS vendors need to start playing a significant part by providing documented APIs that do not break DLP products with every new OS release. Apple has already begun this journey with the Endpoint Security framework and System Extensions, introduced in macOS Catalina. Application vendors will also need to play a bigger role in supporting documented APIs. As you can imagine, full collaboration between OS vendors, application vendors, and cybersecurity vendors is still a long way off.

In the near term, instead of depending solely on the DLP vendor to provide an audit trail for data leaks, enterprises can deploy other forms of security to gain visibility into their endpoints. There are some serious open source contenders, such as osquery, that allow you to get full visibility into your endpoints and, specifically, an audit trail. For example, osquery supports File Integrity Monitoring (FIM), which, combined with its hardware monitoring capability, can be used to capture USB file transfers. Many large enterprises (Netflix, Facebook, Airbnb, and Akamai) are using osquery to monitor their fleets. It is also at the core of several security products offered by major vendors in the Endpoint Detection and Response (EDR) space.
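As a starting point, the Python sketch below generates a minimal osquery FIM configuration that watches a mount point for removable media and schedules a query against osquery's `file_events` table. The `/Volumes/%%` path (the usual macOS mount point, with `%%` requesting recursive monitoring), the `removable_media` category name, and the interval are assumptions. Consult the osquery FIM documentation for the settings needed to enable file events on your platform and to narrow the monitored paths, since `/Volumes` on macOS also contains a link to the boot volume.

```python
import json


def build_fim_config(interval_seconds: int = 300) -> dict:
    """Build a minimal osquery FIM configuration (a sketch; adjust paths per platform)."""
    return {
        "schedule": {
            "file_events": {
                # Periodically collect buffered file events from the paths listed below.
                "query": "SELECT target_path, category, action, time FROM file_events;",
                "interval": interval_seconds,
            }
        },
        "file_paths": {
            # Assumed category name and path: removable media mounted under /Volumes,
            # where "%%" asks osquery to watch the directory recursively.
            "removable_media": ["/Volumes/%%"],
        },
    }


if __name__ == "__main__":
    # Emit JSON suitable for an osquery configuration file.
    print(json.dumps(build_fim_config(), indent=2))
```

Because this trail comes from the file system rather than from a specific egress channel, it keeps working even where the DLP agent has lost visibility.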

In conclusion, know the limitations of the product you are using and carefully inspect the support matrix of the product. There is nothing worse than a false sense of security.