DLP
This chapter covers the basics of implementing our Data Loss Prevention (DLP) engine in your organization so that you can comply with your DLP policy requirements. The engine operates on what is called "data in motion", that is, data (e-mail) in transit between two endpoints (clients and/or mail servers). It uses several techniques to detect policy violations, all covered below. Once a violation is detected, the administrator may choose an appropriate action, such as to quarantine, log or reject the message.
Implementation
The Data Loss Prevention (DLP) engine is implemented by a process called maildlpd, which is used from within the Content Flow. It behaves much like an anti-virus engine: it operates on (user-defined) patterns, unpacks compressed archives, searches for violations, and returns them to the Content Flow so that an action may be taken.
- Different parts of the organization may have different policies.
- There is a basic module in the Content Flow that may be used, and advanced users may choose to call the HSL function ScanDLP() themselves.
- It should primarily be used to detect outbound violations.
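The unpack-then-scan behaviour described above can be illustrated with a minimal Python sketch. It is not how maildlpd is implemented: the policy table, the function names scan_attachment and build_zip, and the restriction to ZIP archives and file name rules are all assumptions made for illustration.

```python
import io
import re
import zipfile

# Hypothetical policy table: policy name -> file name rules (not Halon's format).
POLICIES = {"SOURCECODE": [r"\.cpp$", r"\.c$"]}


def scan_attachment(name, data, policies=POLICIES):
    """Return (policy, file name) pairs for every rule that matches."""
    # Check the attachment itself and, if it is a ZIP archive, its members.
    names = [name]
    if zipfile.is_zipfile(io.BytesIO(data)):
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            names.extend(archive.namelist())

    violations = []
    for candidate in names:
        for policy, rules in policies.items():
            if any(re.search(rule, candidate, re.IGNORECASE) for rule in rules):
                violations.append((policy, candidate))
    return violations


def build_zip(members):
    """Build an in-memory ZIP archive from a {name: content} mapping."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for member_name, content in members.items():
            archive.writestr(member_name, content)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = build_zip({"src/main.cpp": "int main() {}", "README.txt": "hi"})
    print(scan_attachment("project.zip", sample))
```

The caller receives a list of violations and decides what to do with them, which mirrors the way the engine hands its findings back to the Content Flow.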
Content Scanning
"Content" scanning uses user-defined rules (regular expressions) to detect well-known patterns such as credit card numbers or "secret" project names. This is useful when you know that no such information should leave the organization. Matching is case-insensitive.
Example
This example may detect credit card numbers.
\b4\s?-?\s?(?:\d{4}\s?-?\s?){3}\b
\b6011\s?-?\s?(?:\d{4}\s?-?\s?){3}\b
\b4\d{3}\s?-?\s?(?:\d{4}\s?-?\s?){3}\b
\b3(?:0[0-5]|6\d|8\d)\d\s?-?\s?\d{6}\s?-?\s?\d{4}\b
\b(?:213\s?-?\s?1|180\s?-?\s?0)\d{3}\s?-?\s?(?:\d{4}\s?-?\s?){2}\b
\b3[47]\d{2}\s?-?\s?\d{6}\s?-?\s?\d{5}\b
\b5[1-5]\d{2}\s?-?\s?(?:\d{4}\s?-?\s?){3}\b
\b35\d{2}\s?-?\s?(?:\d{4}\s?-?\s?){3}\b
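Before deploying rules like these, it can help to try them against sample text. The following Python sketch does so with the patterns above; it assumes that Python's re module treats these expressions the same way the engine does, and the function name find_card_numbers is hypothetical.

```python
import re

# Rules copied from the example above, one per line.
CARD_RULES = [
    r"\b4\s?-?\s?(?:\d{4}\s?-?\s?){3}\b",
    r"\b6011\s?-?\s?(?:\d{4}\s?-?\s?){3}\b",
    r"\b4\d{3}\s?-?\s?(?:\d{4}\s?-?\s?){3}\b",
    r"\b3(?:0[0-5]|6\d|8\d)\d\s?-?\s?\d{6}\s?-?\s?\d{4}\b",
    r"\b(?:213\s?-?\s?1|180\s?-?\s?0)\d{3}\s?-?\s?(?:\d{4}\s?-?\s?){2}\b",
    r"\b3[47]\d{2}\s?-?\s?\d{6}\s?-?\s?\d{5}\b",
    r"\b5[1-5]\d{2}\s?-?\s?(?:\d{4}\s?-?\s?){3}\b",
    r"\b35\d{2}\s?-?\s?(?:\d{4}\s?-?\s?){3}\b",
]

# Case-insensitive to mirror the documented matching behaviour.
COMPILED = [re.compile(rule, re.IGNORECASE) for rule in CARD_RULES]


def find_card_numbers(text):
    """Return every substring that matches one of the rules."""
    hits = []
    for rule in COMPILED:
        for match in rule.finditer(text):
            # The rules may swallow a trailing space, so trim it.
            hits.append(match.group(0).strip())
    return hits


if __name__ == "__main__":
    print(find_card_numbers("Card: 4111 1111 1111 1111 thanks"))
    print(find_card_numbers("Invoice 12345"))
```

Note that the rules only check the shape of a number, so any digit sequence with the right prefix and length will be reported.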
File Type
"File Name" and "MIME Type" detection may not be a true DLP feature, but a software company, for example, may use such a filter to detect source code files (text/x-c or .cpp) and quarantine them until an administrator or senior developer has cleared the intent.
Example
Our engine implements a technique called "magic": it inspects the beginning of a file to determine the appropriate MIME type for that file. A tool that detects file types (regardless of extension) is available on almost every Unix installation and is called "file". To detect the MIME type of a file, run file --mime-type filename.ext. The result shown is what should be used in your rules.
# file --mime-type main.cpp
main.cpp: text/x-c
Add ^text/x-c$ on a single line. Rules are matched as regular expressions, so the start (^) and end ($) of the MIME type should be marked: ^text/x-c$. For a file extension, escape the . and mark the end as well: \.cpp$. Otherwise the rule .cpp would also match a file name like acpp-report.doc. Matching is case-insensitive.
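The effect of anchoring and escaping can be checked with a short Python sketch. It assumes that rules are applied as an unanchored, case-insensitive search, which is what the advice above implies; the helper rule_matches is hypothetical, and text/x-c++ is used only as an illustrative MIME type.

```python
import re


def rule_matches(rule, value):
    """Apply a rule as an unanchored, case-insensitive search (an assumption)."""
    return re.search(rule, value, re.IGNORECASE) is not None


if __name__ == "__main__":
    checks = [
        (r".cpp", "acpp-report.doc"),    # unescaped, unanchored: false positive
        (r"\.cpp$", "acpp-report.doc"),  # escaped and anchored: no match
        (r"\.cpp$", "MAIN.CPP"),         # case does not matter
        (r"text/x-c", "text/x-c++"),     # unanchored: matches a longer type
        (r"^text/x-c$", "text/x-c++"),   # anchored: exact type only
        (r"^text/x-c$", "text/x-c"),
    ]
    for rule, value in checks:
        print(f"{rule!r} on {value!r}: {rule_matches(rule, value)}")
```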
Document Fingerprinting
"MD5 Fingerprint" and "SHA1 Fingerprint" rules allow exact file matching. They should primarily be used on files that are static by nature, such as images and binaries, because even the smallest change alters the document fingerprint. MD5 and SHA1 are both one-way hash algorithms: they take any data or document as input and output a fixed-length hexadecimal string that is, for practical purposes, unique to that document. For this purpose they are equally good.
Example
Tools to generate these hashes are available on all major operating systems. On Linux these tools are called "md5sum" and "sha1sum".
# md5sum document.ext
b07a682853e7bbafea145fa189dc7444 document.ext
# sha1sum document.ext
0cd377adf7ebbef00d7e4b0b388c05e21cfda9c7 document.ext
Add b07a682853e7bbafea145fa189dc7444 on a single line in an MD5 Fingerprint rule.
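Where md5sum and sha1sum are not at hand, the same fingerprints can be produced with Python's standard hashlib module. The sketch below is one way to do it; the function name fingerprints and the chunk size are arbitrary choices for illustration.

```python
import hashlib
import sys


def fingerprints(path, chunk_size=65536):
    """Return the (MD5, SHA1) hex digests of a file, read in chunks."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read in chunks so that large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()


if __name__ == "__main__":
    for file_name in sys.argv[1:]:
        md5_hex, sha1_hex = fingerprints(file_name)
        print(f"{file_name}: md5={md5_hex} sha1={sha1_hex}")
```

Either hex string can then be placed on a single line in the corresponding fingerprint rule.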
Step by step
This guide does not cover how to set up a quarantine.
You may now test the rule by sending a ZIP file containing source code files (.c or .cpp). The message should then be held in the quarantine, from where it may be released or deleted.
Oct 2 17:04:57 (warning) maildlpd: [67332] [a7f9f34e-ed18-11e0-8c7e-000c2902e326] Attachment mime-type 'text/x-c' violates DLP policy 'SOURCECODE'
Oct 2 17:04:57 (info) maildlpd: [67332] [a7f9f34e-ed18-11e0-8c7e-000c2902e326] Found DLP violation SOURCECODE
Oct 2 17:04:58 (info) mailscand: [67332] [a7f9f34e-ed18-11e0-8c7e-000c2902e326] Message was accepted for <user@halon.se> (quarantined)
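When testing several rules, it may be convenient to pull the violations out of the log automatically. The Python sketch below does this for lines shaped like the sample above. The pattern is inferred from that sample only, and reading the second bracketed value as a message identifier is an assumption.

```python
import re

# Inferred from the sample log lines; the real format may vary between versions.
VIOLATION = re.compile(
    r"maildlpd: \[\d+\] \[(?P<message_id>[0-9a-f-]+)\] "
    r"Found DLP violation (?P<policy>\S+)"
)


def found_violations(log_text):
    """Return (message_id, policy) pairs for each 'Found DLP violation' entry."""
    return [
        (match.group("message_id"), match.group("policy"))
        for match in VIOLATION.finditer(log_text)
    ]


SAMPLE_LOG = (
    "Oct 2 17:04:57 (warning) maildlpd: [67332] "
    "[a7f9f34e-ed18-11e0-8c7e-000c2902e326] Attachment mime-type 'text/x-c' "
    "violates DLP policy 'SOURCECODE'\n"
    "Oct 2 17:04:57 (info) maildlpd: [67332] "
    "[a7f9f34e-ed18-11e0-8c7e-000c2902e326] Found DLP violation SOURCECODE\n"
    "Oct 2 17:04:58 (info) mailscand: [67332] "
    "[a7f9f34e-ed18-11e0-8c7e-000c2902e326] Message was accepted for "
    "<user@halon.se> (quarantined)\n"
)

if __name__ == "__main__":
    for message_id, policy in found_violations(SAMPLE_LOG):
        print(f"{message_id} violated {policy}")
```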