
License: AGPLv3+

Memorious

A light-weight web scraping toolkit for Python.

Info

This is a hard fork of the original memorious project, which was discontinued in 2023. Currently, this package can only be installed via git:

pip install "memorious @ git+https://github.com/dataresearchcenter/memorious.git"

See the development section for what has changed since the fork.

Features

  • Modular pipelines - Compose crawlers from reusable stages
  • Built-in operations - Fetch, parse, store, and more
  • Incremental crawling - Skip already-processed items
  • HTTP caching - Conditional requests with ETag support
  • OpenAleph integration - Push data to OpenAleph instances
  • FTM support - Extract and store FollowTheMoney entities
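HTTP caching with ETags follows the standard conditional-request exchange. The client stores the ETag header from a response. On the next request to the same URL it sends that value back in an If-None-Match header, and a 304 Not Modified reply means the cached copy can be reused. The sketch below shows only that exchange, in isolation. ETagCache and its methods are hypothetical names, not part of memorious's API, and the cache is a plain in-memory dict.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class ETagCache:
    """Toy in-memory cache for HTTP conditional requests (illustrative only)."""

    # Maps URL -> (etag, cached body)
    entries: Dict[str, Tuple[str, bytes]] = field(default_factory=dict)

    def request_headers(self, url: str) -> Dict[str, str]:
        """Headers to send: If-None-Match when we hold an ETag for this URL."""
        entry = self.entries.get(url)
        if entry is None:
            return {}
        return {"If-None-Match": entry[0]}

    def handle_response(
        self, url: str, status: int, headers: Dict[str, str], body: bytes
    ) -> bytes:
        """Return the effective body and update the cache as needed."""
        if status == 304:
            # Not Modified: the server confirmed our stored copy is current.
            # A 304 is only expected if we sent If-None-Match, so an entry exists.
            return self.entries[url][1]
        # Simplification: real HTTP header names are case-insensitive.
        etag = headers.get("ETag")
        if status == 200 and etag is not None:
            self.entries[url] = (etag, body)
        return body


cache = ETagCache()
url = "https://example.com"
print(cache.request_headers(url))  # → {}
cache.handle_response(url, 200, {"ETag": '"v1"'}, b"<html>v1</html>")
print(cache.request_headers(url))  # → {'If-None-Match': '"v1"'}
print(cache.handle_response(url, 304, {}, b""))  # → b'<html>v1</html>'
```

Because a 304 response carries no body, the saving comes from skipping the transfer and reprocessing of unchanged pages.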

Quick Example

name: my_crawler
pipeline:
  init:
    method: seed
    params:
      url: https://example.com
    handle:
      pass: fetch

  fetch:
    method: fetch
    handle:
      pass: store

  store:
    method: directory
    params:
      path: ./output

pip install "memorious @ git+https://github.com/dataresearchcenter/memorious.git"
memorious run my_crawler.yml
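In the YAML above, each stage names a method, may supply params, and uses handle to route its output. In this example, pass names the next stage, so data flows init → fetch → store. The toy model below mirrors that routing so the control flow is easy to follow. It is not memorious's engine. The operation functions, the METHODS table and run() are all hypothetical stand-ins, and the config is written as a Python dict so that no YAML parser is needed.

```python
from collections import deque
from typing import Callable, Dict, List


# Hypothetical stand-ins for the seed/fetch/directory operations.
# They only pass data along so that the routing logic stays visible.
def seed(params: dict, data: dict) -> List[dict]:
    return [{"url": params["url"]}]


def fetch(params: dict, data: dict) -> List[dict]:
    # A real operation would perform an HTTP request here.
    return [{**data, "content": f"<fetched {data['url']}>"}]


def directory(params: dict, data: dict) -> List[dict]:
    # A real operation would write to params["path"]; we only record it.
    return [{**data, "stored_in": params["path"]}]


METHODS: Dict[str, Callable[[dict, dict], List[dict]]] = {
    "seed": seed,
    "fetch": fetch,
    "directory": directory,
}

# Same structure as my_crawler.yml above.
CONFIG = {
    "name": "my_crawler",
    "pipeline": {
        "init": {
            "method": "seed",
            "params": {"url": "https://example.com"},
            "handle": {"pass": "fetch"},
        },
        "fetch": {"method": "fetch", "handle": {"pass": "store"}},
        "store": {"method": "directory", "params": {"path": "./output"}},
    },
}


def run(config: dict, start: str = "init") -> List[dict]:
    """Run stages in FIFO order, routing each emitted item via handle['pass']."""
    pipeline = config["pipeline"]
    queue = deque([(start, {})])
    finished: List[dict] = []
    while queue:
        stage_name, data = queue.popleft()
        stage = pipeline[stage_name]
        method = METHODS[stage["method"]]
        results = method(stage.get("params", {}), data)
        next_stage = stage.get("handle", {}).get("pass")
        for item in results:
            if next_stage is None:
                # No handler: this is a terminal stage.
                finished.append(item)
            else:
                queue.append((next_stage, item))
    return finished


print(run(CONFIG))
# → [{'url': 'https://example.com', 'content': '<fetched https://example.com>', 'stored_in': './output'}]
```

Each stage function returns a list, so a single stage can fan out into many items (for example, a parse step emitting one item per link). Every item is then queued independently for the next stage.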

Documentation

Reference