
Installation

From PyPI

pip install memorious

From Source

git clone https://github.com/dataresearchcenter/memorious.git
cd memorious
pip install -e .

Optional Dependencies

Memorious has optional dependencies for specific features. Quote the package specifier so that shells such as zsh do not treat the square brackets as a glob pattern:

# SQL database support (SQLite, PostgreSQL)
pip install "memorious[sql]"

# PostgreSQL with psycopg2
pip install "memorious[postgres]"

# Redis support
pip install "memorious[redis]"

# FTP support
pip install "memorious[ftp]"
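To see which extras are already usable in the current environment, you can check whether a module each extra provides is importable. This is a hypothetical helper, not part of memorious, and the extra-to-module mapping below is an assumption; adjust it to match the package's actual dependencies.

```python
import importlib.util

# ASSUMPTION: the module each extra is expected to provide.
# These names are illustrative, not taken from memorious' packaging metadata.
EXTRA_MODULES = {
    "sql": "sqlalchemy",
    "postgres": "psycopg2",
    "redis": "redis",
}


def missing_extras(extra_modules=EXTRA_MODULES):
    """Return, sorted, the extras whose assumed module cannot be found."""
    return sorted(
        extra
        for extra, module in extra_modules.items()
        # find_spec returns None for a top-level module that is not installed.
        if importlib.util.find_spec(module) is None
    )


if __name__ == "__main__":
    for extra in missing_extras():
        print(f'not installed: pip install "memorious[{extra}]"')
```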

Verify Installation

memorious --version
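If the `memorious` command is not on your PATH (for example, because it was installed into a different virtualenv), you can ask the current Python interpreter which version it sees. This sketch uses only the standard library's `importlib.metadata` and assumes the distribution name is `memorious`, as in the pip commands above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="memorious"):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found or "memorious is not installed in this environment")
```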

Running a Crawler

Crawlers are defined in YAML files. Run a crawler with:

memorious run my_crawler.yml

Custom operations can be referenced directly by file path, with no extra installation required:

# my_crawler.yml
pipeline:
  process:
    method: ./src/my_ops.py:process_data
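A minimal sketch of what `./src/my_ops.py` might contain. It assumes the operation follows memorious' long-standing convention of a function taking `(context, data)` that passes results to the next stage with `context.emit(data=...)`; check the operations documentation for the exact interface in your version. The `title` field and the cleanup logic are purely hypothetical.

```python
# ./src/my_ops.py
def process_data(context, data):
    """Hypothetical operation: tidy the 'title' field and pass the record on.

    Assumes memorious calls operations as fn(context, data), where `data`
    is a dict produced by the previous stage.
    """
    raw_title = data.get("title") or ""
    # Collapse runs of whitespace and trim both ends.
    title = " ".join(raw_title.split())
    if not title:
        # Nothing useful to pass on, so stop this branch of the pipeline.
        context.log.info("Skipping record without a title: %r", data)
        return
    # Emit a copy so the incoming dict is not mutated.
    context.emit(data={**data, "title": title})
```

Because the function only touches `context.log` and `context.emit`, it can be unit-tested with a small stub object standing in for the real context.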

Environment Setup

Memorious is configured via environment variables. Create a .env file or export them:

# Base directory for data storage
export MEMORIOUS_BASE_PATH=./data

# Enable debug logging
export MEMORIOUS_DEBUG=true
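A `.env` file holds the same settings as `KEY=VALUE` lines. To show how the two forms line up, here is a deliberately small parser (hypothetical, not memorious' own loader) that accepts an optional `export ` prefix and `#` comment lines, so the exports above parse either way. It ignores quoting and multi-line values, which full `.env` loaders handle.

```python
def parse_env(text):
    """Parse KEY=VALUE lines into a dict (minimal sketch, no quoting support)."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Accept shell-style "export KEY=VALUE" as well as plain "KEY=VALUE".
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment; ignore it
        values[key.strip()] = value.strip()
    return values


EXAMPLE = """
# Base directory for data storage
export MEMORIOUS_BASE_PATH=./data

# Enable debug logging
MEMORIOUS_DEBUG=true
"""

settings = parse_env(EXAMPLE)
```

Calling `os.environ.update(settings)` would then apply the parsed values to the current process.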

Development Setup

For local development and testing, in-memory backends for the cache and job queue work out of the box:

export MEMORIOUS_BASE_PATH=./data
export MEMORIOUS_CACHE_URI=memory://
export PROCRASTINATE_DB_URI=memory:
export PROCRASTINATE_SYNC=1
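The same development settings can also be passed to a single run without exporting them into your shell, which keeps them out of other sessions. A sketch, assuming the `memorious run` command shown earlier is on your PATH; the helper names `dev_env` and `run_crawler` are hypothetical.

```python
import os
import subprocess

# Development settings from this section: in-memory cache and job queue.
DEV_SETTINGS = {
    "MEMORIOUS_BASE_PATH": "./data",
    "MEMORIOUS_CACHE_URI": "memory://",
    "PROCRASTINATE_DB_URI": "memory:",
    "PROCRASTINATE_SYNC": "1",
}


def dev_env(base=None):
    """Return a copy of `base` (default: os.environ) with the dev settings applied."""
    env = dict(os.environ if base is None else base)
    env.update(DEV_SETTINGS)
    return env


def run_crawler(config="my_crawler.yml"):
    """Run one crawler with the development settings (requires the memorious CLI)."""
    subprocess.run(["memorious", "run", config], env=dev_env(), check=True)
```

Copying the environment rather than editing `os.environ` in place means the parent process keeps its own settings.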

Production Setup

For production with multiple workers:

# Core
export MEMORIOUS_BASE_PATH=/var/lib/memorious

# Redis for shared cache
export MEMORIOUS_CACHE_URI=redis://localhost:6379/0

# PostgreSQL for job queue and tags
export MEMORIOUS_TAGS_URI=postgresql://user:pass@localhost/memorious
export PROCRASTINATE_DB_URI=postgresql://user:pass@localhost/memorious
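With several workers, every process must reach the same cache and queue, and in-memory backends cannot be shared between processes. A hypothetical pre-flight check (not part of memorious) can catch a worker started with development values. It uses only `urllib.parse.urlsplit` to confirm each URI uses the scheme from the production example above. The accepted schemes are an assumption; extend them if your deployment uses other backends.

```python
import os
from urllib.parse import urlsplit

# ASSUMPTION: schemes taken from the production example; extend as needed.
EXPECTED_SCHEMES = {
    "MEMORIOUS_CACHE_URI": {"redis"},
    "MEMORIOUS_TAGS_URI": {"postgresql"},
    "PROCRASTINATE_DB_URI": {"postgresql"},
}


def check_production_env(env=None):
    """Return a list of problems; an empty list means the URIs look production-ready."""
    env = os.environ if env is None else env
    problems = []
    for name, schemes in EXPECTED_SCHEMES.items():
        uri = env.get(name)
        if not uri:
            problems.append(f"{name} is not set")
            continue
        # urlsplit lowercases the scheme, e.g. "memory:" -> "memory".
        scheme = urlsplit(uri).scheme
        if scheme not in schemes:
            expected = ", ".join(sorted(schemes))
            problems.append(f"{name} uses {scheme!r}, expected {expected}")
    return problems
```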

See the Settings Reference for all configuration options.