Exploring Unsecured S3 Buckets

Security misconfiguration is one of the most significant causes of data leakage. It's listed at #6 in the 2017 edition of the OWASP Top Ten, and one of the clearest examples can be seen with unsecured S3 buckets.

Amazon's Simple Storage Service, S3 for short, is a web service that can store documents, data, media and pretty much anything else. It's extremely cost-effective, with monthly storage prices ranging from $0.024 / GB down to $0.0018 / GB. Data is stored in 'buckets' (think of them like folders) that can be configured either as private, for confidential or sensitive data, or as public, which you might want if you were storing images for a website.

Today Amazon goes to great lengths to ensure people only make S3 buckets public deliberately, but historically it was much easier to configure public access by accident. To put this into context: recently 845GB of data was leaked from a series of niche dating apps, 5 million files relating to clients of a project management company were leaked, and a core file in Twilio's JavaScript SDK was modified by an unknown party because it was stored in an unsecured S3 bucket. There are more examples on this GitHub list of S3 leaks.

"These aren't subtle vulnerabilities. These are stupid design decisions made by engineers who had no idea how to create a secure system. And this, in a nutshell, is the problem with the Internet of Things."

Bruce Schneier on a recent IoT security leak, December-2019

It's gotten so bad that vigilante guardian angels are actively destroying unsecured databases to protect user privacy in automated 'meow attacks', named as such because the only thing the attacker leaves behind is the word "meow". If you'd like to understand more about how people find exposed S3 buckets, there's a great post here from a developer who created a tool that finds and indexes exposed buckets so that anyone can search them: How to search for Open Amazon S3 buckets.

Curious to see what's out there, I took a look at Grey Hat Warfare's public S3 bucket search tool. It's not the only way to look for unsecured S3 buckets, there are numerous ways, but for me it's a passing interest, so something that's already built and set up suited my needs.

They've got four levels of user: unregistered and registered (both free), Premium (€25/mth) and Enterprise (€55/mth). For free you can have a bit of a play around, but you miss some features and don't see all of the content that's been indexed. The Premium tier gives you access to all of the content and misses only one significant feature: the ability to use Regular Expressions to search.

The keyword search is quite straightforward: it takes the typical syntax of a minus sign in front of keywords you wish to exclude, and you can order results by size as well. It gets really interesting when you start using the search modifiers, which are:

  • "Full Path" - searches not just the filename itself but also the directory structure in which the file is stored. This is a Premium only feature but is kind-of critical really if you're trying to deep-dive into the index.
  • "Treat as regex" - allows the use of Regular Expressions when searching which provides a much more powerful filter than you'd have otherwise. RegEx is an Enterprise feature however so it's only really available for those willing to stump up the cash.
  • "Filename Extensions" - this is really handy as you can choose just to search for filetypes you're interested in, for example .zip files or .bak files (typically backups). You can also use this to exclude uninteresting files as well (images, CSS files, etc. as there are billions).
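
To make those modifiers a bit more concrete, here's a rough Python sketch of how this kind of filtering could work over a list of indexed file paths. To be clear, this is my own illustration and not Grey Hat Warfare's actual implementation: the `matches` function, its parameter names and the sample paths are all made up for the example.

```python
import re
from pathlib import PurePosixPath


def matches(path, query, extensions=None, excluded_extensions=None,
            full_path=False, regex=False):
    """Return True if an indexed file path satisfies the search.

    Hypothetical helper for illustration only; the names and behaviour
    are assumptions, not the search tool's real API.
    """
    p = PurePosixPath(path)
    # "Full Path" searches the directory structure too, not just the filename.
    target = path if full_path else p.name
    ext = p.suffix.lower().lstrip(".")

    # "Filename Extensions": keep only wanted types, drop unwanted ones.
    if extensions and ext not in extensions:
        return False
    if excluded_extensions and ext in excluded_extensions:
        return False

    # "Treat as regex": the whole query is a single regular expression.
    if regex:
        return re.search(query, target, re.IGNORECASE) is not None

    # Plain keywords: every term must appear, and "-term" must not.
    haystack = target.lower()
    for term in query.lower().split():
        if term.startswith("-") and len(term) > 1:
            if term[1:] in haystack:
                return False
        elif term not in haystack:
            return False
    return True


# Made-up sample index to show the filters in action.
index = [
    "backups/2019/db_backup.zip",
    "site/img/backup-logo.png",
    "exports/customers.bak",
    "old/backup_test.zip",
]
hits = [f for f in index if matches(f, "backup -test", extensions={"zip", "bak"})]
```

With the sample index above, `hits` contains only `backups/2019/db_backup.zip`: the image is dropped by the extension filter, `backup_test.zip` by the excluded keyword, and `customers.bak` because the keyword isn't in its filename.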

So, what did I find?

There's a lot of data in there, and I did find quite a lot of disturbing personal data and confidential corporate files. Given that this data is already pre-indexed and there are lots of people searching it, I doubt I'm finding anything that's 'breaking news', but if I do find anything that I can attribute to an organisation I will responsibly disclose it to those affected. I'm not in the business of naming and shaming anyone or re-publishing sensitive data so I'll leave it there, but the types of data I found include:

  • Large backup files, some that appear to be from banking organisations
  • Microsoft SQL Server databases
  • Bank statements and identity proofs for job applicants

It's important to note that I don't really have a desire to see anyone's personal and confidential data, so I haven't downloaded or opened most of what I've found, but the process has certainly been an eye-opener. I'll leave it to you to explore for yourselves but it's frighteningly easy to stumble across enormous database files, backup files, PDFs of bank statements and other sensitive data.

The key take-away here - if you manage any AWS infrastructure, double-check your permissions and make sure you're not opening yourself up to the risk of data leakage.
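
As a starting point for that double-check, here's a minimal Python sketch that looks through a bucket ACL for grants to the two predefined "everyone" groups. It assumes the ACL is a dictionary shaped like the one boto3's `get_bucket_acl` call returns; the `public_grants` function and the sample ACL are my own illustration. ACLs are only one way a bucket can end up public, so treat this as a partial check: bucket policies and the Block Public Access settings need reviewing too.

```python
# Predefined S3 groups that mean "everyone" or "any authenticated AWS user".
PUBLIC_GROUPS = {
    "http://acs.amazonaws.com/groups/global/AllUsers",
    "http://acs.amazonaws.com/groups/global/AuthenticatedUsers",
}


def public_grants(acl):
    """Return (group, permission) pairs for grants made to public groups.

    Hypothetical helper: `acl` is assumed to be a dict with a "Grants"
    list, in the shape returned by boto3's get_bucket_acl.
    """
    found = []
    for grant in acl.get("Grants", []):
        grantee = grant.get("Grantee", {})
        if grantee.get("Type") == "Group" and grantee.get("URI") in PUBLIC_GROUPS:
            group = grantee["URI"].rsplit("/", 1)[-1]
            found.append((group, grant.get("Permission")))
    return found


# Made-up ACL for illustration: the owner has full control,
# and everyone on the internet can list the bucket.
sample_acl = {
    "Owner": {"ID": "example-owner-id"},
    "Grants": [
        {"Grantee": {"Type": "CanonicalUser", "ID": "example-owner-id"},
         "Permission": "FULL_CONTROL"},
        {"Grantee": {"Type": "Group",
                     "URI": "http://acs.amazonaws.com/groups/global/AllUsers"},
         "Permission": "READ"},
    ],
}
risky = public_grants(sample_acl)
```

For the sample ACL, `risky` holds a single entry, `("AllUsers", "READ")`. An empty list means no public ACL grants were found, which is not the same as the bucket being private.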