Memorious
A lightweight web scraping toolkit for Python.
Info
This is a hard fork of the original memorious project, which was discontinued in 2023. To avoid a naming conflict on PyPI, this package is published as memorious4:
pip install memorious4
See the Development section for what has changed since the fork.
Features
- Modular pipelines - Compose crawlers from reusable stages
- Built-in operations - Fetch, parse, store, and more
- Incremental crawling - Skip already-processed items
- HTTP caching - Conditional requests with ETag support
- OpenAleph integration - Push data to OpenAleph instances
- FTM support - Extract and store FollowTheMoney entities
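The incremental crawling and HTTP caching features rest on two standard ideas: remember which items were already processed, and resend a stored ETag in an `If-None-Match` header so the server can answer `304 Not Modified` instead of the full body. The sketch below shows both ideas in isolation. `CrawlState` and its methods are hypothetical names for illustration, not the memorious4 API, and the state is kept in memory rather than in whatever backend memorious4 actually uses.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class CrawlState:
    """Hypothetical in-memory crawl state (illustration only, not memorious4's API)."""

    etags: Dict[str, str] = field(default_factory=dict)  # url -> last ETag seen
    seen: Set[str] = field(default_factory=set)  # keys of already-processed items

    def conditional_headers(self, url: str) -> Dict[str, str]:
        # Send If-None-Match only when an ETag was recorded for this URL.
        etag = self.etags.get(url)
        return {"If-None-Match": etag} if etag else {}

    def record_response(self, url: str, status: int, etag: Optional[str] = None) -> bool:
        """Return True if the response carries new content worth processing."""
        if status == 304:
            # Not Modified: the server confirmed the cached copy is still current.
            return False
        if etag:
            self.etags[url] = etag
        return status == 200

    def should_process(self, item_key: str) -> bool:
        # Incremental crawling: skip any item whose key was seen before.
        # The caller chooses the key (e.g. a URL or a content hash).
        if item_key in self.seen:
            return False
        self.seen.add(item_key)
        return True


state = CrawlState()
url = "https://example.com"
print(state.conditional_headers(url))  # {}
state.record_response(url, 200, '"abc123"')
print(state.conditional_headers(url))  # {'If-None-Match': '"abc123"'}
```

Keeping the skip key separate from the ETag reflects that the two checks answer different questions: "has this item been handled?" versus "has the remote resource changed?".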
Quick Example
name: my_crawler
pipeline:
  init:
    method: seed
    params:
      url: https://example.com
    handle:
      pass: fetch
  fetch:
    method: fetch
    handle:
      pass: store
  store:
    method: directory
    params:
      path: ./output
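In the example above, each stage names an operation (`method`) and uses `handle` to route its output to the next stage, so the crawler runs `init` → `fetch` → `store`. The sketch below transcribes that config into a Python dict (as a YAML loader would return it) and follows the `pass` handles to make the routing explicit. The helper names `check_handles` and `pass_chain`, and the assumption that execution starts at `init`, are illustrative; this is not how memorious4 itself schedules stages.

```python
from typing import Any, Dict, List, Optional

# The Quick Example config, transcribed as a plain dict.
CONFIG: Dict[str, Any] = {
    "name": "my_crawler",
    "pipeline": {
        "init": {"method": "seed", "params": {"url": "https://example.com"}, "handle": {"pass": "fetch"}},
        "fetch": {"method": "fetch", "handle": {"pass": "store"}},
        "store": {"method": "directory", "params": {"path": "./output"}},
    },
}


def check_handles(pipeline: Dict[str, Any]) -> None:
    """Raise ValueError if any handle points at a stage that is not defined."""
    for stage, spec in pipeline.items():
        for rule, target in spec.get("handle", {}).items():
            if target not in pipeline:
                raise ValueError(f"{stage}: handle {rule!r} -> unknown stage {target!r}")


def pass_chain(pipeline: Dict[str, Any], start: str = "init") -> List[str]:
    """Follow 'pass' handles from the start stage and return the stage order."""
    check_handles(pipeline)
    order: List[str] = []
    stage: Optional[str] = start
    # Real crawlers may loop (e.g. parse -> fetch), so stop at the first revisited stage.
    while stage is not None and stage not in order:
        order.append(stage)
        stage = pipeline[stage].get("handle", {}).get("pass")
    return order


print(pass_chain(CONFIG["pipeline"]))  # ['init', 'fetch', 'store']
```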
Documentation
- Quick Start - Get up and running in minutes
- Installation - Installation and setup
- Crawlers - How to configure crawlers
- Operations - Available operations
Reference
- CLI Reference - Command-line interface
- Crawler Reference - Configuration options
- Operations Reference - API documentation
- Settings Reference - Environment variables