Make
This generates or updates a dataset archive. Run this command after files have been added to or deleted from the archive.
It can also be used to turn any existing directory or remote location into a leakrfc dataset.
Reference
Make or update a leakrfc dataset and check integrity
make_dataset(dataset, check_integrity=True, cleanup=True, metadata_only=False)
Make or update a leakrfc dataset and optionally check its integrity.
By default, this iterates through all the source files and creates (or updates) a JSON metadata file for each of them.
At the end, dataset statistics and documents.csv (and their diff) are created.
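
The flow described above (one metadata file per source file, then statistics and a `documents.csv` with a diff) can be illustrated with a standard-library sketch. This is not leakrfc's implementation: the helper names (`write_metadata`, `documents_csv`, `csv_diff`), the metadata file naming, the CSV columns, and the statistics fields are all assumptions made up for this example.

```python
import csv
import difflib
import hashlib
import io
import json
import tempfile
from pathlib import Path


def write_metadata(root: Path, meta_dir: Path) -> list[dict]:
    """Create or update one JSON metadata file per source file under root."""
    meta_dir.mkdir(parents=True, exist_ok=True)
    records = []
    for src in sorted(p for p in root.rglob("*") if p.is_file()):
        key = src.relative_to(root).as_posix()
        record = {
            "key": key,
            "size": src.stat().st_size,
            "checksum": hashlib.sha1(src.read_bytes()).hexdigest(),
        }
        # Hypothetical naming scheme: one flat JSON file per source key.
        meta_file = meta_dir / (key.replace("/", "__") + ".json")
        meta_file.write_text(json.dumps(record))
        records.append(record)
    return records


def documents_csv(records: list[dict]) -> str:
    """Render the records as CSV text (the columns are illustrative)."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["key", "size", "checksum"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def csv_diff(old: str, new: str) -> list[str]:
    """Return only the added (+) and removed (-) rows between two snapshots."""
    lines = list(
        difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm="", n=0)
    )
    # Skip the two file header lines and the "@@" hunk markers.
    return [line for line in lines[2:] if not line.startswith("@@")]


def run_demo() -> tuple[dict, list[str]]:
    """Run the sketch twice on a temporary directory, adding a file in between."""
    with tempfile.TemporaryDirectory() as tmp:
        root, meta = Path(tmp) / "src", Path(tmp) / "meta"
        root.mkdir()
        (root / "a.txt").write_text("hello")
        before = documents_csv(write_metadata(root, meta))

        (root / "b.txt").write_text("hi")
        records = write_metadata(root, meta)
        after = documents_csv(records)

        stats = {
            "file_count": len(records),
            "total_size": sum(r["size"] for r in records),
        }
        return stats, csv_diff(before, after)
```

Calling `run_demo()` returns the statistics for the second run together with a one-row diff for the newly added `b.txt`, which mirrors the idea of re-running make after files were added.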
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset | DatasetArchive | leakrfc Dataset instance | required |
check_integrity | bool \| None | Check checksum for each file (logs mismatches) | True |
cleanup | bool \| None | When checking integrity, fix mismatched metadata and delete unreferenced metadata files | True |
metadata_only | bool \| None | Only iterate through existing metadata files, don't look for new source files | False |
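
How the three flags interact can be sketched with a small in-memory model. This is only an illustration of the parameter descriptions above, not leakrfc code: `make_sketch`, the `sources` and `metadata` dictionaries, and the report keys are hypothetical, and treating metadata without a matching source file as "unreferenced" is an assumption.

```python
import hashlib


def checksum(data: bytes) -> str:
    """SHA-1 hex digest, standing in for whatever checksum the archive stores."""
    return hashlib.sha1(data).hexdigest()


def make_sketch(
    sources: dict[str, bytes],
    metadata: dict[str, str],
    check_integrity: bool = True,
    cleanup: bool = True,
    metadata_only: bool = False,
) -> dict[str, list[str]]:
    """Toy model: sources maps key -> content, metadata maps key -> checksum.

    The metadata dict is mutated in place, like metadata files on disk would be.
    """
    report = {"added": [], "mismatched": [], "fixed": [], "deleted": []}

    # metadata_only=True skips the search for new source files.
    if not metadata_only:
        for key in sorted(sources):
            if key not in metadata:
                metadata[key] = checksum(sources[key])
                report["added"].append(key)

    if check_integrity:
        for key in sorted(metadata):
            if key not in sources:
                # Assumption: metadata with no source file counts as unreferenced.
                if cleanup:
                    del metadata[key]
                    report["deleted"].append(key)
                continue
            actual = checksum(sources[key])
            if actual != metadata[key]:
                report["mismatched"].append(key)  # the real command logs these
                if cleanup:
                    metadata[key] = actual
                    report["fixed"].append(key)
    return report


sources = {"a.txt": b"changed", "c.txt": b"new"}
metadata = {"a.txt": checksum(b"original"), "b.txt": checksum(b"gone")}

# Report-only run over existing metadata: nothing is added, fixed, or deleted.
dry = make_sketch(sources, dict(metadata), cleanup=False, metadata_only=True)
# Default run: picks up c.txt, fixes a.txt, and drops the stale b.txt entry.
full = make_sketch(sources, metadata)
```

In this model, `cleanup` only has an effect when `check_integrity` is enabled, which follows the wording "When checking integrity" in the table.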