What is Information Leakage

This introduction to information leakage is a primer on the types of information you may encounter, both from a security perspective and from a security testing standpoint. Information can leak in various ways, from documents posted online (PDF, Word documents, etc.) to items as seemingly benign as photos and even web page source code. This segment will be an introduction to where you may look in order to discover information leakage and how you can limit the risks associated with it.

What is information leakage?

Information leakage can occur in almost any data that you willingly or unwillingly post to the internet, and it can take multiple forms. Its core impact is that you unwittingly expose information that can feed larger information gathering or expose your data, users, databases or other sensitive information to an outsider. Even the most benign information can, in the wrong hands, be used against you and potentially lead to a compromise.

Failing to limit or sanitize information before it is posted on the network or sent to a recipient who is engaging with your company (a job candidate, a web application, Word documents, etc.) can pose a risk to your company. Being mindful of what types of information can be released puts you at an advantage, provided you know where to look and what tools are available to your systems, processes or documents to scrub sensitive information before it is released.

Word Document Leakage

Word documents are at times the starting point for many Google dorks against companies. Why? Quite simply, a Word document might not be something many people would think twice about, yet it can contain a multitude of information that an attacker may deem important. Even the smallest office document (Word, Excel, OpenOffice, LibreOffice, etc.) can contain information you don't intend to share.

In the case of Google dorks, it is important to limit sensitive information. Labels such as "sensitive", "not for public" and "confidential" are all terms that an attacker may combine with your company name in a Google search. For example, an attacker may search Google for: site:company.com filetype:xlsx OR filetype:xls OR filetype:doc OR filetype:docx OR filetype:odt as a small sample to see what returns. To make matters worse, additional key words can be attached to this search to enrich the information that is returned. What information, you ask? Well, a simple edit to the search such as: site:company.com (filetype:xlsx OR filetype:xls OR filetype:doc OR filetype:docx OR filetype:odt) ("not for public" OR "confidential" OR "sensitive") shows some of the markers that an attacker will include.
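To test this against your own company, the searches described above can be assembled programmatically. The following is a minimal sketch, not a definitive tool: build_dork is a hypothetical helper name, company.com is the placeholder domain from the text, and the sketch assumes the filetype: operator has to be repeated for each extension because it only applies to the term it is attached to.

```python
def build_dork(site, extensions, phrases=()):
    """Build a search query limited to one site, a set of file types,
    and (optionally) a set of exact phrases."""
    # Repeat the filetype: operator for every extension and OR them together.
    filetypes = " OR ".join(f"filetype:{ext}" for ext in extensions)
    query = f"site:{site} ({filetypes})"
    if phrases:
        # Quote each phrase so it is searched as an exact label.
        quoted = " OR ".join(f'"{phrase}"' for phrase in phrases)
        query += f" ({quoted})"
    return query


if __name__ == "__main__":
    print(build_dork(
        "company.com",
        ["xlsx", "xls", "doc", "docx", "odt"],
        ["not for public", "confidential", "sensitive"],
    ))
```

Swap in your own domain and labels, then paste the printed query into the search engine to see what you are exposing.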

To make matters worse still, attackers can and will include key words such as, but not limited to, firewall OR IDS, or other labels, to search for configuration items. How can this get even worse? Simply put, they can also search for the configuration files themselves: .conf, .config or configuration files with special extensions. Although we are discussing Word documents and this applies more to web applications, keep in mind that text files (.txt, .log, .config, .conf) are all ways you can leak information. We will discuss text files more specifically for web applications and services later. However, the premise of this example applies, especially if those labels are associated with a document of any type!

Ok, so what is the deal with Word documents anyway? For starters, at this point we are going to ignore the contents of the document itself. You know, the very portion you use to convey information -- the document body. If a Word document is not properly sanitized before being shared (whether online, by e-mail or in another form), its metadata can reveal who authored the file and their department and, if you know where to look, even network paths such as printers. While a printer might not seem like a big deal, consider this: with CVEs listing printers and exploit code available to leverage those devices, what sensitive information may be lurking on your company's printer? Oh, it's internal only? How about the risk of exposing internal IP addresses, so an attacker knows the IP convention being used internally? Is it a 192.168.x.x, 172.16.x.x or 10.x.x.x range? Is this blocked at the firewall from entering your network?
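To see the author information for yourself, remember that a .docx file is a ZIP archive and that the author fields normally live in docProps/core.xml inside it. The sketch below only uses the Python standard library; read_docx_author_fields is a hypothetical helper name, and it only covers the newer .docx/.xlsx style formats (older binary .doc files and printer paths stored elsewhere in the archive are out of its reach).

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core properties part of an Office Open XML file.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_docx_author_fields(path_or_file):
    """Return the creator / last-modified-by fields stored in a .docx file."""
    with zipfile.ZipFile(path_or_file) as archive:
        if "docProps/core.xml" not in archive.namelist():
            return {}  # no core properties part, nothing to report
        root = ET.fromstring(archive.read("docProps/core.xml"))

    fields = {}
    for label, tag in (("creator", "dc:creator"),
                       ("last_modified_by", "cp:lastModifiedBy")):
        node = root.find(tag, NS)
        if node is not None and node.text:
            fields[label] = node.text
    return fields
```

Running this over the documents your dorks return gives a quick list of names an attacker could use for social engineering.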

Text Files

Within this segment we will discuss text files. While this can be any text file, we will specifically call out a few that come to mind: 1) configuration files, 2) firewall / IDS or other security logs, 3) backup files (especially within Linux / Unix systems -- yes, that means you too, macOS!).

The first culprit that we will discuss is the wonderful configuration file. Those of us within the Linux and Unix environments know all too well that configuration items within these systems will almost always be a text file. Have a web application running? Yep! Config file (especially with the likes of the Apache service or any service, for that matter). Have a content management system running? Yep! Why is a configuration file dangerous if it is exposed, especially to a search engine? It exposes your security posture -- it allows the attacker to see what your thoughts are regarding security, or the lack thereof. For simplicity, we will focus on the /var/www/html/ or /var/www/public_html location used by most common web services.

With the introduction of CMS systems, you will commonly find configuration.php, wp-config.php, config.X or *config*.asp|php|aspx. Pick your poison, I'm sure it exists somewhere. The hazards of exposing a file like this to the public include: database access (user name / password), e-mail addresses that might be exposed to the internet (think of the contact address displayed when configuration issues arise), etc. If you aren't making the link to the e-mail issue, think about how someone can spoof that address, send from the server or send e-mails to that address and potentially socially engineer administrators into doing magical things for them.

You might have your configuration items locked down, and that is a good thing! The one thing we at times don't realize is that if you are remoted into a system and editing a file with vi/vim, those applications will create a .filename.ext.swp file. If the stars line up and the permissions allow it, attackers can search for those file types with a search engine and, before you know it, your configuration file(s) are jacked and on their way to being abused.

Now that we've veered off course just a bit, the second issue you will come across is: what happens if your IDS / firewall logs are exposed to the internet? Well, as we've seen with the Word documents, the dorking for these files is pretty much the same. Yet the question remains: why are firewall logs important? Aren't configuration items more important? Well, yes and -- no? The issue here is that an attacker can see the types of blocks that are in place: what IP addresses, what protocols are rejected, etc. If we know the stance the company is taking, we know what we can include in or exclude from our attack toolkit. This way, our attacks are more specialized to the "defenses" (said in quotes because they are exposing some of it) the company is deploying in order to not get breached. In this scope, we simply read what attacks they've experienced and try different tactics -- this also saves an attacker the time otherwise spent on trial and error.

Lastly, we are going to discuss the wonderful backup file (not the kind that keeps you safe from something like ransomware). Backup files are at times a blessing and a curse. Why? Because .bak files are created when someone is editing a "critical" file and wants a restore point if the edits cause the server to tank. While this is a quick reversion method, just like the configuration items, the file can be exposed if the permissions are changed and never reverted. Keep specific file types in mind when you are looking to lock down your systems, and .bak files will be one of them!

Some file types to consider: .sql, .tmp, .log, .txt, .ini, .bak, .swp, .conf, .config
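A quick way to check your own web root for these file types is to walk the directory and flag anything with one of the extensions listed above. This is a minimal sketch under a few assumptions: find_risky_files is a hypothetical helper name, the extension set is simply the list from this segment, and the web root path you pass in (for example /var/www/html) is whatever your server actually uses.

```python
from pathlib import Path

# The file types called out in this segment.
RISKY_SUFFIXES = {".sql", ".tmp", ".log", ".txt", ".ini",
                  ".bak", ".swp", ".conf", ".config"}


def find_risky_files(web_root):
    """Return files under web_root whose final extension is on the risky list."""
    root = Path(web_root)
    hits = []
    for path in root.rglob("*"):
        # path.suffix is the last extension, so ".wp-config.php.swp" yields ".swp".
        if path.is_file() and path.suffix.lower() in RISKY_SUFFIXES:
            hits.append(path.relative_to(root).as_posix())
    return sorted(hits)


if __name__ == "__main__":
    for name in find_risky_files("/var/www/html"):
        print(name)
```

Anything this prints is worth a second look: either it does not belong in the web root or its permissions should be checked.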

PDF Files

The good ol' PDF. Where would most companies be today without you? Ah, Adobe! What a beautiful technology (definitely not in the dial-up days, that's for sure!). PDF is a decent file type: everything you need is portable and can nowadays be viewed anywhere (mobile device, web browser, desktop -- you name it!). The one thing about PDF files is that they come at some of the same costs as Word documents. The metadata within them can expose authors (social engineering) and network paths (printers), which can further expose your network addressing scheme or the printer name itself. Exploits galore, anyone? The same issues that somewhat plague the document file types exist within this realm. We will discuss how to find these items and how we can sanitize them later.

For the most part, you will want to run the same searches against your own company that we described for the document and text files. Additionally, you want to add other labels such as, but not limited to, confidential, not for public, etc. in order to see not only the authors and printers but also what information you truly are exposing to the internet that can be used against you. We will also include some of the file labels you can consider within your searches, and we will show you how to craft your own Google dorks to test this for your company.

Web Applications

This segment alone needs its own book, that's for sure! Web applications can leak data in a number of ways and, because they are so talkative, the leak can come from any number of items. Those items can be: error pages, invalid input, unsanitized input, unsanitized URL/URI input, and the worst offender of them all: developers! Yes, I said it. Developers at times will include information on a web page that should not be there. If you right-click and view the source, you may see what I am talking about, especially if you are moving things between dev and prod. If you're not sanitizing this information, you will effectively expose your systems by providing 1) API keys or 2) usernames / passwords to an attacker, who may leverage them for additional access to your organization.

Within web applications, what are the things we need to be concerned with? Well, some of the things we can highlight are as follows:

Web Apps Issue | Risks | Notes
JavaScript / HTML comments: <!-- COMMENT -->, /* multi-line comment */, // single-line JS comment | Exposure of notes, updates, versions, config(s); potential database names; potential user name / password; API keys | Use RegEx to find specific areas and files (tools to come)
Server-side code (PHP, ASP/X): /* multi-line comment */, // single-line comment, REM asp comment, ' asp comment, <%-- asp comment --%> | Similar to above | Similar to above
Input errors (SQL, HTML, PHP, ASP/X) | Input errors can exist in a number of locations: 1) within URLs (index.php?&id=), 2) web forms. Input fields that take values can expose version numbers, error information about the server or technologies, and whether the application is vulnerable to injection. Additionally, if you are not leaking information but Blind SQL Injection is present, attackers can use time-based attacks to see what may or may not work. | Validate web applications and input; fuzz and attack with tools like WebScarab and Burp
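Until the tools mentioned in the Notes column are covered, the comment styles above can be hunted with a few regular expressions. The following is a rough sketch only: find_suspicious_comments and the KEYWORDS list are illustrative assumptions, and the patterns are deliberately naive (they do not understand string literals, so expect some false positives).

```python
import re

# One pattern per comment style listed above.
COMMENT_PATTERNS = [
    re.compile(r"<!--(.*?)-->", re.DOTALL),   # HTML comments
    re.compile(r"/\*(.*?)\*/", re.DOTALL),    # multi-line JS / PHP comments
    re.compile(r"(?<!:)//([^\n]*)"),          # single-line comments, skipping "://" in URLs
]

# Example words that often mark something sensitive; adjust for your environment.
KEYWORDS = ("password", "passwd", "api_key", "apikey", "secret", "token", "version")


def find_suspicious_comments(source):
    """Return comment bodies in page source that mention a sensitive keyword."""
    findings = []
    for pattern in COMMENT_PATTERNS:
        for match in pattern.finditer(source):
            text = match.group(1).strip()
            if any(word in text.lower() for word in KEYWORDS):
                findings.append(text)
    return findings
```

Feed it the output of "view source" (or a saved copy of the page) and review every hit before the page moves from dev to prod.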

Binaries and Source Code

To the untrained eye, a .o, .exe or Mach-O file looks unreadable, as if nothing can be obtained from it. However, you would be wrong to think so. Even clear-text items within an application / binary can be discovered or reverse engineered. Although the reverse engineering aspect is outside the scope of this documentation, a simple strings command, which you can see in our Malware Analysis Classes under Windows - Static Analysis with Strings or Linux Static Malware Analysis with Strings, Hexedit and Head, can give you a better idea of what may be inside of a binary.
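If you do not have the strings command handy, the idea behind it is easy to approximate: pull every run of printable ASCII characters of a minimum length out of the raw bytes. The sketch below is an approximation of that idea, not a reimplementation of the real utility; extract_strings is a hypothetical helper name and the default minimum length of 4 is an assumption borrowed from common usage.

```python
import re


def extract_strings(data, min_length=4):
    """Return runs of printable ASCII characters found in raw bytes."""
    # \x20-\x7e covers the printable ASCII range (space through tilde).
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.decode("ascii") for match in pattern.findall(data)]


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as handle:
        for text in extract_strings(handle.read()):
            print(text)
```

Run it against one of your own binaries and skim the output for anything that looks like a credential, a path or a host name.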

If you need to store passwords or user names within a binary, you should encrypt that information and make it nearly impossible to get to. For instance, a large password can be encrypted, broken down into multiple string labels or chars, and then assembled in order. This may make it a bit more difficult to pull the actual password. However, if you are using clear-text protocols, this will defeat the purpose altogether.

The last thing we should point out about binaries is that if they are specific to your environment, there are a few things an attacker can glean from them: 1) network addresses, 2) protocols in use, 3) username / password, 4) server locations, 5) language and potential compiler / version. As we've seen in other documents, version information can be abused in the sense that we can search CVE databases and find exploits for a given application or version to leverage against a company.
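As an illustration of the first item, network addresses, the text pulled out of a binary can be checked for IPv4 addresses and sorted into internal versus public ranges. This is a hedged sketch: find_ip_addresses is a hypothetical helper name, its input is assumed to be a list of strings already extracted from the binary, and "internal" here simply means whatever Python's ipaddress module reports as private.

```python
import ipaddress
import re

# Anything shaped like four dot-separated numbers; validated below.
IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_ip_addresses(strings):
    """Map each valid IPv4 address found in the strings to 'internal' or 'public'."""
    results = {}
    for text in strings:
        for candidate in IPV4_CANDIDATE.findall(text):
            try:
                address = ipaddress.ip_address(candidate)
            except ValueError:
                continue  # shaped like an address but not valid, e.g. 999.1.1.1
            results[candidate] = "internal" if address.is_private else "public"
    return results
```

Keep in mind that a four-part version number such as 1.2.3.4 also parses as an address, so treat the results as leads to review rather than confirmed findings.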

Dangers of Services Like VirusTotal
