Quickstart
Install
Requires Python 3.11 or later.
Build a dataset
leakrfc stores metadata for the files, which in turn refers to the actual source files.
For example, take this public file listing archive: https://data.ddosecrets.com/Patriot%20Front/patriotfront/2021/Organizational%20Documents%20and%20Notes/
Crawl these documents into a dataset:
leakrfc -d ddos_patriotfront crawl "https://data.ddosecrets.com/Patriot%20Front/patriotfront/2021/Organizational%20Documents%20and%20Notes"
The metadata and source files are now stored in the archive (./data by default).
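The crawl URL above is percent-encoded because the listing path contains spaces. As a small illustration (not part of leakrfc itself), the same URL can be built from the human-readable path with Python's standard library. The variable names here are only for this example.

```python
from urllib.parse import quote, unquote

BASE = "https://data.ddosecrets.com/"

# Human-readable path of the listing used in this quickstart.
path = "Patriot Front/patriotfront/2021/Organizational Documents and Notes"

# quote() leaves "/" untouched by default and encodes each space as %20.
url = BASE + quote(path)
print(url)

# unquote() reverses the encoding, which helps when reading crawled paths.
print(unquote(url))
```

Quoting the URL in the shell, as the crawl command does, remains necessary so that the `%` sequences are passed through unchanged.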
Inspect files and archive
All metadata and other information lives in the ddos_patriotfront/.leakrfc
subdirectory. Files are keyed and accessible by their (relative) path.
Retrieve file metadata:
Retrieve actual file blob:
Show the metadata of all files present in the dataset archive:
Show only the file paths:
Show only the checksums (sha1 by default):
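To check a listed checksum independently, a SHA-1 digest can be computed with Python's standard library. This is a minimal sketch, not leakrfc's own implementation; the helper name `sha1_of_file` and the chunk size are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-1 digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with path.open("rb") as fh:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    import sys

    # Usage: python sha1_file.py <path> [<path> ...]
    for name in sys.argv[1:]:
        print(sha1_of_file(Path(name)), name)
```

If the digest printed here differs from the one stored for the same path, the local copy and the archived file are not identical.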
Tracking changes
The make command (re-)generates the dataset's metadata.
Delete a file:
Now regenerate:
The output will indicate that 1 file was deleted.
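The idea behind this kind of change tracking can be sketched as a comparison between the set of paths recorded earlier and the set of paths present now. The following is an illustration only and makes no claim about how leakrfc implements it. The helpers `list_files` and `diff_paths` are hypothetical, and a real archive would also need to skip its own metadata subdirectory.

```python
import tempfile
from pathlib import Path


def list_files(root: Path) -> set[str]:
    """Relative POSIX paths of all regular files below root."""
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def diff_paths(known: set[str], current: set[str]) -> dict[str, list[str]]:
    """Compare a previously recorded set of paths with the current one."""
    return {
        "added": sorted(current - known),
        "deleted": sorted(known - current),
    }


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "docs").mkdir()
    (root / "docs" / "a.txt").write_text("first")
    (root / "docs" / "b.txt").write_text("second")

    known = list_files(root)            # snapshot, standing in for stored metadata
    (root / "docs" / "b.txt").unlink()  # delete a file
    changes = diff_paths(known, list_files(root))

print(changes)
```

Keying files by relative path, as the archive does, is what makes this set comparison possible.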
Configure storage
storage_config:
  uri: s3://my_bucket
  backend_kwargs:
    endpoint_url: https://s3.example.org
    aws_access_key_id: ${AWS_ACCESS_KEY_ID}
    aws_secret_access_key: ${AWS_SECRET_ACCESS_KEY}
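The `${AWS_ACCESS_KEY_ID}` style placeholders keep credentials out of the config file. How leakrfc resolves them is not stated here, so the following is only a sketch of one way such placeholders could be expanded from the environment with the standard library. The `expand_env` helper, the example values, and the nested layout of the dictionary are assumptions for illustration.

```python
import os


def expand_env(value):
    """Recursively replace ${VAR} placeholders with environment values."""
    if isinstance(value, dict):
        return {key: expand_env(item) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env(item) for item in value]
    if isinstance(value, str):
        # expandvars() leaves unknown variables unchanged instead of failing.
        return os.path.expandvars(value)
    return value


# Example values only; real credentials would come from the shell environment.
os.environ["AWS_ACCESS_KEY_ID"] = "example-key-id"
os.environ["AWS_SECRET_ACCESS_KEY"] = "example-secret"

config = {
    "storage_config": {
        "uri": "s3://my_bucket",
        "backend_kwargs": {
            "endpoint_url": "https://s3.example.org",
            "aws_access_key_id": "${AWS_ACCESS_KEY_ID}",
            "aws_secret_access_key": "${AWS_SECRET_ACCESS_KEY}",
        },
    }
}

resolved = expand_env(config)
print(resolved["storage_config"]["backend_kwargs"]["aws_access_key_id"])
```

Because unset variables are left as literal `${...}` text, a check for leftover placeholders before connecting is a sensible extra step.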
Dataset config.yml
Follows the specification in ftmq.model.Dataset:
name: my_dataset # also known as "foreign_id"
title: An awesome leak
description: >
  Incidunt eum asperiores impedit. Nobis est dolorem et quam autem quo. Name
  labore sequi maxime qui non voluptatum ducimus voluptas. Exercitationem enim
  similique asperiores quod et quae maiores. Et accusantium accusantium error
  et alias aut omnis eos. Omnis porro sit eum et.
updated_at: 2024-09-25
index_url: https://static.example.org/my_dataset/index.json
# add more metadata
leakrfc: # see above