Juditha

A super-fast in-process lookup service for canonical names, backed by tantivy.

juditha exists to tame the noise that follows from Named Entity Recognition: given a huge list of known names (company registries, persons of interest, sanctions lists), it tells you whether a span produced by your NER pipeline corresponds to one of them, even when the casing, accents, token order, or spelling differs.
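To illustrate what matching across casing, accents, and token order can look like, here is a minimal sketch using only the Python standard library. It is not juditha's implementation or API: `normalize_name`, `KNOWN_NAMES`, and `lookup` are hypothetical names invented for this example, and the sketch does not cover spelling differences, which would require fuzzy matching.

```python
import unicodedata
from typing import Optional


def normalize_name(name: str) -> str:
    """Reduce a name to a key that ignores case, accents, and token order.

    Hypothetical helper for illustration only, not part of juditha.
    """
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace punctuation with spaces so "Pachelbel, Johann" splits cleanly.
    cleaned = "".join(ch if ch.isalnum() else " " for ch in stripped)
    # Case-fold and sort the tokens so their order no longer matters.
    return " ".join(sorted(cleaned.casefold().split()))


# Assumed toy corpus of canonical names.
KNOWN_NAMES = ["Johann Pachelbel", "Juditha Dommer"]
INDEX = {normalize_name(name): name for name in KNOWN_NAMES}


def lookup(span: str) -> Optional[str]:
    """Return the canonical name for an NER span, or None if it is unknown."""
    return INDEX.get(normalize_name(span))
```

With this toy corpus, `lookup("PACHELBEL, johann")` and `lookup("Jóhann Pachelbel")` both resolve to the canonical `"Johann Pachelbel"`, while an unknown span such as `"Johann Bach"` yields `None`.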

The implementation uses a pre-populated names database and index. The data is either a set of FollowTheMoney entities or a simple list of names.

What you can do with it

Validate and canonicalise NER spans against a known-name corpus (Quickstart, Usage).
Load names from a flat list, FollowTheMoney entities, or a nomenklatura dataset / catalog (Load data).
Extract every known-name mention from a fulltext document, either via an Aho-Corasick automaton or via percolation (reverse search of the names index).
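The Aho-Corasick approach mentioned in the last item can be sketched in pure Python as follows. This is a simplified illustration of the general technique under stated assumptions, not juditha's own code: `build_automaton` and `extract` are hypothetical names, matching is done on lowercased ASCII-like text, and word boundaries are ignored.

```python
from collections import deque


def build_automaton(names):
    """Build an Aho-Corasick automaton (trie plus failure links) for the names."""
    goto = [{}]   # goto[node] maps a character to the child node
    fail = [0]    # fail[node] is the longest proper suffix that is also a trie prefix
    out = [[]]    # out[node] lists the names that end at this node
    for name in names:
        node = 0
        for ch in name.lower():
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(name)
    # Breadth-first traversal so that shallower failure links are ready first.
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            # Inherit matches that end at the failure target (shorter names).
            out[child] = out[child] + out[fail[child]]
            queue.append(child)
    return goto, fail, out


def extract(text, automaton):
    """Return (start_offset, name) for every known name found in the text."""
    goto, fail, out = automaton
    node = 0
    hits = []
    for i, ch in enumerate(text.lower()):
        # Follow failure links until a transition on ch exists or we reach the root.
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for name in out[node]:
            hits.append((i - len(name) + 1, name))
    return hits


automaton = build_automaton(["Johann Pachelbel", "Pachelbel"])
hits = extract("Juditha Dommer married johann pachelbel.", automaton)
```

The automaton scans the document once regardless of how many names it holds, which is why this family of algorithms suits very large name lists. In the usage above, `hits` contains both the full name and the shorter overlapping surname.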

Where to go next

Start with the Quickstart.
For command-line use, see Usage / CLI and the full CLI reference.
For use from Python, see Usage / Python.

The name

Juditha Dommer was the daughter of a coppersmith and raised seven children, while her husband Johann Pachelbel wrote a canon.

Versioning

To signal compatibility with followthemoney, juditha follows the same major version, which is currently 4.x.x.

License and copyright

juditha, (C) 2024 investigativedata.io. (C) 2025, 2026 Data and Research Center – DARC. Licensed under AGPLv3 or later. See NOTICE and LICENSE.