Memorious

A light-weight web scraping toolkit for Python.

Info

This is a hard fork of the original memorious project, which was discontinued in 2023. To avoid a PyPI naming conflict, this package is published as memorious4:

pip install memorious4

See the development section for what has changed since the fork.

Features

  • Modular pipelines - Compose crawlers from reusable stages
  • Built-in operations - Fetch, parse, store, and more
  • Incremental crawling - Skip already-processed items
  • HTTP caching - Conditional requests with ETag support
  • OpenAleph integration - Push data to OpenAleph instances
  • FTM support - Extract and store FollowTheMoney entities
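The ideas behind incremental crawling and ETag-based HTTP caching can be illustrated with a minimal, standalone sketch. It is not memorious4's implementation or API: the names `conditional_headers`, `is_unchanged`, and `SeenItems` are hypothetical and exist only for this example. The only facts relied on are standard HTTP behavior: a client sends a cached ETag in the `If-None-Match` header, and the server answers `304 Not Modified` when the content has not changed.

```python
# Illustrative sketch only; these helpers are hypothetical and
# are NOT part of memorious4.
from typing import Optional


def conditional_headers(cached_etag: Optional[str]) -> dict:
    """Build request headers that let the server skip unchanged content."""
    # If-None-Match is the standard HTTP header for ETag revalidation.
    return {"If-None-Match": cached_etag} if cached_etag else {}


def is_unchanged(status_code: int) -> bool:
    """A 304 Not Modified response means the cached copy is still valid."""
    return status_code == 304


class SeenItems:
    """Hypothetical in-memory record of already-processed item keys."""

    def __init__(self) -> None:
        self._seen: set = set()

    def should_process(self, key: str) -> bool:
        # True the first time a key is offered, False on every later call.
        if key in self._seen:
            return False
        self._seen.add(key)
        return True
```

A real crawler would persist the seen keys and cached ETags between runs; the in-memory set here only keeps the sketch short.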

Quick Example

name: my_crawler
pipeline:
  init:
    method: seed
    params:
      url: https://example.com
    handle:
      pass: fetch

  fetch:
    method: fetch
    handle:
      pass: store

  store:
    method: directory
    params:
      path: ./output

pip install memorious4
memorious run my_crawler.yml
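
To make the stage wiring in the Quick Example easier to follow, here is a small sketch that mirrors the YAML as a plain Python dict and follows each stage's `handle: pass:` target. It assumes that `pass` names the next stage to run, which is how the example reads. The `stage_order` helper is hypothetical and is not part of memorious4.

```python
# Illustrative sketch only; stage_order is a hypothetical helper,
# not memorious4's actual pipeline runner.
pipeline = {
    "init": {
        "method": "seed",
        "params": {"url": "https://example.com"},
        "handle": {"pass": "fetch"},
    },
    "fetch": {"method": "fetch", "handle": {"pass": "store"}},
    "store": {"method": "directory", "params": {"path": "./output"}},
}


def stage_order(pipeline: dict, start: str = "init", rule: str = "pass") -> list:
    """Follow one handle rule from stage to stage and return the visited names."""
    order = []
    current = start
    # Stop at a stage with no matching rule, or if a stage repeats (cycle guard).
    while current is not None and current not in order:
        order.append(current)
        current = pipeline[current].get("handle", {}).get(rule)
    return order


print(" -> ".join(stage_order(pipeline)))  # prints "init -> fetch -> store"
```

The final `store` stage has no `handle` block, so the chain ends there.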

Documentation

Reference