Investigraph

investigraph is a framework for building datasets of FollowTheMoney data.

About

investigraph is an ETL framework that allows research teams to build their own data catalog as easily and reproducibly as possible. The framework provides logic for extracting, transforming and loading any data source into followthemoney entities.

For most common data source formats, this process requires no programming knowledge, because it is driven by a simple YAML specification. However, if a specific dataset cannot be parsed with the built-in logic, a developer can plug in custom Python scripts at specific stages of the pipeline to handle even the most unusual edge cases in data processing.
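The idea of a declarative mapping with an optional custom hook can be sketched in plain Python. This is only an illustration under stated assumptions, not investigraph's actual API: the names `MAPPING`, `extract`, `map_record`, `custom_handler` and `run` are hypothetical, a Python dict stands in for the YAML specification, and the output is merely shaped like a followthemoney entity (`id`, `schema`, multi-valued `properties`).

```python
import csv
import hashlib
import io
import json
from typing import Callable, Iterator, Optional

# Hypothetical declarative mapping, standing in for a YAML specification.
# It maps source columns to properties of a "Person" entity.
MAPPING = {
    "schema": "Person",
    "keys": ["id"],
    "properties": {"name": "full_name", "nationality": "country"},
}

Handler = Callable[[dict], Iterator[dict]]


def extract(source: str) -> Iterator[dict]:
    """Extract stage: read tabular records from CSV text."""
    yield from csv.DictReader(io.StringIO(source))


def make_id(*parts: str) -> str:
    """Derive a stable entity id from the key columns."""
    return hashlib.sha1("|".join(parts).encode("utf-8")).hexdigest()


def map_record(record: dict, mapping: dict = MAPPING) -> Iterator[dict]:
    """Built-in transform stage: apply the declarative mapping to one record."""
    properties = {
        prop: [record[column]]
        for prop, column in mapping["properties"].items()
        if record.get(column)  # skip empty or missing values
    }
    yield {
        "id": make_id(*(record[key] for key in mapping["keys"])),
        "schema": mapping["schema"],
        "properties": properties,
    }


def custom_handler(record: dict) -> Iterator[dict]:
    """Custom code for an edge case the mapping cannot express:
    names stored as "Last, First" are reordered before mapping."""
    last, first = (part.strip() for part in record["full_name"].split(",", 1))
    yield from map_record(dict(record, full_name=f"{first} {last}"))


def run(source: str, handler: Optional[Handler] = None) -> list:
    """Run extract, transform and load; "load" here just serializes to JSON lines."""
    transform = handler or map_record
    return [
        json.dumps(entity, sort_keys=True)
        for record in extract(source)
        for entity in transform(record)
    ]


SOURCE = 'id,full_name,country\n1,"Doe, Jane",de\n'
```

Calling `run(SOURCE)` uses only the mapping, while `run(SOURCE, custom_handler)` swaps in the custom transform for the same source. Extraction and loading stay untouched, which is the point of plugging code in at a single stage.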

Features

Create datasets in the format for OpenAleph
Cached remote source fetching and archiving of sources
Data extraction based on pandas (runpandarun)
Data patching via datapatch
Transforming data records into followthemoney entities via YAML mappings
Loading result data into a wide range of targets, including cloud storage (via fsspec) or FtM stores (via ftmq)
"Bring your own code" and plug it into the right stage if the built-in logic doesn't fit your use case

Value for investigative research teams

Standardized process to convert different datasets into a uniform and thus comparable format
Control of this process for non-technical people
Creation of your own (internal) data catalog
Regular, automatic updates of the data
A growing community that makes more and more datasets accessible
Access to a public (open source) data library operated by the Data and Research Center and OpenSanctions

GitHub repositories

investigraph-etl - ETL pipeline framework for FollowTheMoney data
investigraph-eu - Catalog of European datasets powered by investigraph
runpandarun - A simple interface written in Python for reproducible I/O workflows around tabular data via pandas
ftmq - An attempt towards a followthemoney query DSL
investigraph-datasets - Example dataset configurations
investigraph-site - Landing page for investigraph (next.js app)
investigraph-api - Public API instance to use as a test playground
ftmq-api - Lightweight API that exposes an FtM store to a public endpoint

Supported by

In 2023, the development of investigraph was supported by Media Tech Lab Bayern batch #3 for six months.