Aleph
Sync a leakrfc dataset into an Aleph instance. This uses alephclient, so the configured ALEPHCLIENT_API_KEY
needs to have the appropriate permissions.
Collections are created if they don't exist, and their metadata is updated (this can be disabled via --no-metadata). The Aleph collection's foreign ID can be set via --foreign-id and defaults to the leakrfc dataset name.
As long as the global cache is in use (environment CACHE=1, the default), only new documents are synced. The cache handles multiple Aleph instances and tracks the sync status for each of them individually.
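
The per-instance bookkeeping described above can be pictured with a minimal sketch. This is not leakrfc's cache implementation: the `SyncCache` class, the `pending_keys` helper, and the choice of `(host, foreign_id, key)` as the cache key are all assumptions made for illustration.

```python
from typing import Iterable, List, Set, Tuple


class SyncCache:
    """Hypothetical in-memory stand-in for the global cache."""

    def __init__(self) -> None:
        # One entry per (Aleph host, collection foreign id, document key).
        self._seen: Set[Tuple[str, str, str]] = set()

    def is_synced(self, host: str, foreign_id: str, key: str) -> bool:
        return (host, foreign_id, key) in self._seen

    def mark_synced(self, host: str, foreign_id: str, key: str) -> None:
        self._seen.add((host, foreign_id, key))


def pending_keys(
    cache: SyncCache, host: str, foreign_id: str, keys: Iterable[str]
) -> List[str]:
    """Return only the document keys not yet synced to this Aleph instance."""
    return [k for k in keys if not cache.is_synced(host, foreign_id, k)]
```

Because the host is part of the cache key, a document already synced to one Aleph instance still counts as new for another instance.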
The Aleph API configuration can also be set via the command line:
Sync documents into a subfolder that will be created if it doesn't exist:
Reference
Sync Aleph collections into leakrfc or vice versa via alephclient
sync_to_aleph(dataset, host, api_key, prefix=None, foreign_id=None, metadata=True)
Incrementally sync a leakrfc dataset into an Aleph instance.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset | DatasetArchive | leakrfc Dataset instance | required |
host | str \| None | Aleph host (can be set via env) | required |
api_key | str \| None | Aleph API key (can be set via env) | required |
prefix | str \| None | Add a folder prefix to import documents into | None |
foreign_id | str \| None | Aleph collection foreign_id (if different from leakrfc dataset name) | None |
metadata | bool \| None | Update Aleph collection metadata | True |
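
As a rough illustration of how the parameters fit together, the sketch below collects keyword arguments matching the signature of sync_to_aleph. The helper `aleph_sync_kwargs` is hypothetical and not part of leakrfc, and the environment variable name `ALEPHCLIENT_HOST` is an assumption; only `ALEPHCLIENT_API_KEY` is named on this page.

```python
import os
from typing import Any, Dict, Mapping, Optional


def aleph_sync_kwargs(
    dataset_name: str,
    prefix: Optional[str] = None,
    foreign_id: Optional[str] = None,
    metadata: bool = True,
    env: Mapping[str, str] = os.environ,
) -> Dict[str, Any]:
    """Hypothetical helper: build keyword arguments for sync_to_aleph."""
    # ALEPHCLIENT_HOST is an assumed variable name (see the note above).
    host = env.get("ALEPHCLIENT_HOST")
    api_key = env.get("ALEPHCLIENT_API_KEY")
    if not host or not api_key:
        raise ValueError("Aleph host and API key must be configured")
    return {
        "host": host,
        "api_key": api_key,
        "prefix": prefix,
        # Mirrors the documented default: fall back to the dataset name.
        "foreign_id": foreign_id or dataset_name,
        "metadata": metadata,
    }
```

Given a DatasetArchive instance `dataset`, the call would then look like `sync_to_aleph(dataset, **aleph_sync_kwargs("my_dataset", prefix="imports"))`. The import path of sync_to_aleph is not shown on this page and is left out here.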