CLI reference

Complete reference for every juditha subcommand. For high-level workflow, see Usage / CLI.

Every command honours the global JUDITHA_* environment variables documented in Usage / CLI / Environment variables.

Top-level

juditha [OPTIONS] COMMAND [ARGS]...

Option	Description
`--version`	Print the installed version and exit.
`--settings`	Print the resolved settings (URI, thresholds, …) and exit.
`--install-completion`	Install shell completion for the current shell.
`--show-completion`	Print shell completion script to stdout.
`--help`	Show top-level help and exit.

Ingest FollowTheMoney entities (newline-delimited JSON) into the LevelDB aggregator.

juditha load-entities [OPTIONS]

Option	Default	Description
`-i`	`-`	Input URI (file, HTTP, S3, stdin).

Does not build the index. Run juditha build afterwards.

Ingest a flat list of names (one per line). Each line is wrapped into a LegalEntity proxy.

juditha load-names [OPTIONS]

Option	Default	Description
`-i`	`-`	Input URI (file, HTTP, S3, stdin).

Does not build the index. Run juditha build afterwards.

Ingest a nomenklatura-style dataset (a JSON config that points at one or more entity resources).

juditha load-dataset [OPTIONS]

Option	Default	Description
`-i`	`-`	Dataset URI.

Does not build the index. Run juditha build afterwards.

Ingest a nomenklatura-style catalog (a JSON config that points at multiple datasets). Walks every dataset and loads it.

juditha load-catalog [OPTIONS]

Option	Default	Description
`-i`	`-`	Catalog URI.

Does not build the index. Run juditha build afterwards.

Rebuild the tantivy index and the Aho-Corasick extractor from the aggregator. Deletes the existing index directory first; safe to rerun.

juditha build [OPTIONS]

No options beyond --help.

Single-best-match search of the names index.

juditha lookup [OPTIONS] VALUE

Argument / Option	Default	Description
`VALUE`	(required)	The query string.
`--threshold`	`0.97`	Minimum Jaro similarity for a match. Lower it for noisier inputs.

Prints the Result (rich-formatted) on a hit, not found otherwise.

Stream every aggregated cluster in the LevelDB aggregator as newline-delimited JSON. Useful for inspecting / dumping the loaded corpus.

juditha iterate [OPTIONS]

Option	Default	Description
`-o`	`-`	Output URI (file, HTTP, S3, stdout).

Run the Aho-Corasick automaton over a fulltext and emit one JSON Mention per line. See Extract.

juditha extract [OPTIONS]

Option	Default	Description
`-i`	`-`	Input URI (the document to scan).
`-o`	`-`	Output URI (the mention stream).

Reverse-search the names index against a fulltext via tantivy phrase queries. Output shape matches extract. See Percolate.

juditha percolate [OPTIONS]

Option	Default	Description
`-i`	`-`	Input URI (the document to scan).
`-o`	`-`	Output URI (the mention stream).
`--slop`	`0`	Allowed intervening tokens between name parts. `0` = exact phrase.