
CLI Reference

Crawler framework for documents and structured scrapers.

The memorious command-line interface manages crawler execution, worker processes, and crawler state. Append --help to any command to print its options in the terminal.

memorious

memorious [OPTIONS] COMMAND [ARGS]...

Top-level command. Without a subcommand, displays the help screen.

| Option | Short | Description |
| --- | --- | --- |
| --version | -v | Show the installed memorious version and exit. |
| --settings | | Print the resolved runtime settings and exit. |
| --install-completion | | Install shell completion for the current shell. |
| --show-completion | | Print shell completion script for copying or customization. |
| --help | | Show the help message and exit. |
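
The two informational flags above can be combined into a quick preflight check before a crawl is started. The following is a minimal sketch, not part of memorious itself: the function name memorious_preflight and the MEMORIOUS_BIN override are assumptions introduced here (the override only exists so the helper can be pointed at a different binary or a stub).

```shell
# Hypothetical helper: print the installed version, then the resolved settings.
# MEMORIOUS_BIN is an assumption of this sketch, not a memorious feature.
memorious_preflight() {
  bin="${MEMORIOUS_BIN:-memorious}"
  # Stop early if the binary is missing or cannot report its version.
  "$bin" --version || return 1
  "$bin" --settings
}
```

Calling memorious_preflight at the top of a deployment script makes a missing installation fail fast, before any job is queued.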

Subcommands

| Command | Description |
| --- | --- |
| run | Run a crawler from a YAML config file. |
| worker | Start the procrastinate worker to process crawler jobs. |
| cancel | Cancel pending jobs for a crawler. |
| flush | Delete all data and tags generated by a crawler. |
| status | Show crawler status: recent runs and stored document count. |

memorious run

Run a crawler from a YAML config file.

memorious run [OPTIONS] URI

The crawler is loaded from the given YAML config (local path or remote URI), queued as a job in procrastinate, and executed by an embedded worker. The worker exits once the queue drains, unless --wait is given, in which case it keeps running until interrupted.

Arguments

| Argument | Required | Description |
| --- | --- | --- |
| URI | yes | URI or path to a crawler YAML config file. |

Options

| Option | Short | Default | Description |
| --- | --- | --- | --- |
| --continue-on-error | | False | Don't stop crawler execution on error. |
| --flush | | False | Delete all existing data before execution. |
| --concurrency | -c | 1 | Number of concurrent jobs (use >1 for I/O-bound crawlers). |
| --wait | -w | False | Keep worker running after jobs complete (until interrupted). |
| --idle-timeout | -t | 30 | Auto-stop after N seconds of inactivity. Defaults to 30 when concurrency>1; pass 0 to disable. |
| --clear-runs / --no-clear-runs | | --clear-runs | Cancel remaining tasks from previous runs before starting. Use --no-clear-runs to resume an interrupted crawl without losing queued jobs. |
| --help | | | Show the help message and exit. |

Examples

# Run a crawler from a local file
memorious run ./crawlers/example.yml

# Run with higher concurrency for I/O-bound work
memorious run ./crawlers/example.yml --concurrency 8

# Flush prior data and re-run from scratch
memorious run ./crawlers/example.yml --flush

# Resume a previously interrupted crawl
memorious run ./crawlers/example.yml --no-clear-runs
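
The examples above do not cover --idle-timeout or --continue-on-error. As a rough sketch, a small wrapper can pass both together with --concurrency. The function name run_crawl, its positional parameters, and the MEMORIOUS_BIN override are assumptions made for illustration; only the flags themselves come from the options table.

```shell
# Hypothetical wrapper around `memorious run` for an I/O-bound crawler.
# usage: run_crawl CONFIG [CONCURRENCY] [IDLE_TIMEOUT_SECONDS]
run_crawl() {
  config="${1:-}"
  concurrency="${2:-1}"     # same default as --concurrency
  idle_timeout="${3:-30}"   # pass 0 to disable the idle timeout
  if [ -z "$config" ]; then
    echo "usage: run_crawl CONFIG [CONCURRENCY] [IDLE_TIMEOUT_SECONDS]" >&2
    return 2
  fi
  # MEMORIOUS_BIN is an assumption of this sketch, not a memorious feature.
  "${MEMORIOUS_BIN:-memorious}" run "$config" \
    --concurrency "$concurrency" \
    --idle-timeout "$idle_timeout" \
    --continue-on-error
}
```

For instance, run_crawl ./crawlers/example.yml 8 0 would request eight concurrent jobs with the idle timeout disabled.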

memorious worker

Start the procrastinate worker to process crawler jobs.

memorious worker [OPTIONS]

Runs a standalone worker that consumes jobs from the procrastinate queue. Use this in deployments where workers run as long-lived processes separate from memorious run.

Options

| Option | Short | Default | Description |
| --- | --- | --- | --- |
| --concurrency | -c | 1 | Number of concurrent jobs (use >1 for I/O-bound crawlers). |
| --help | | | Show the help message and exit. |

Example

memorious worker --concurrency 4
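
In a deployment where the worker is a long-lived process, its concurrency is often supplied through the environment. The sketch below shows one possible entrypoint function. WORKER_CONCURRENCY, start_worker, and MEMORIOUS_BIN are hypothetical names chosen for this example; memorious itself is only known here to accept --concurrency.

```shell
# Hypothetical entrypoint for a standalone worker process.
# WORKER_CONCURRENCY and MEMORIOUS_BIN are assumptions of this sketch.
start_worker() {
  concurrency="${WORKER_CONCURRENCY:-1}"
  # Reject empty or non-numeric values before starting the worker.
  case "$concurrency" in
    ''|*[!0-9]*)
      echo "WORKER_CONCURRENCY must be a positive integer" >&2
      return 2
      ;;
  esac
  # Reject zero as well.
  if [ "$concurrency" -lt 1 ]; then
    echo "WORKER_CONCURRENCY must be a positive integer" >&2
    return 2
  fi
  "${MEMORIOUS_BIN:-memorious}" worker --concurrency "$concurrency"
}
```

Validating the value up front keeps a misconfigured environment from reaching the worker command at all.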

memorious cancel

Cancel pending jobs for a crawler.

memorious cancel [OPTIONS] URI

Marks all queued and in-progress tasks for the crawler as cancelled. Already-completed work is preserved.

Arguments

| Argument | Required | Description |
| --- | --- | --- |
| URI | yes | URI or path to a crawler YAML config file. |

Example

memorious cancel ./crawlers/example.yml
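
Because completed work is preserved, it can be useful to print the crawler status right after cancelling. The following sketch chains the two documented commands. The function name cancel_and_report and the MEMORIOUS_BIN override are assumptions made for illustration.

```shell
# Hypothetical helper: cancel a crawler's jobs, then show its latest run
# and stored document count.
cancel_and_report() {
  config="${1:-}"
  # MEMORIOUS_BIN is an assumption of this sketch, not a memorious feature.
  bin="${MEMORIOUS_BIN:-memorious}"
  # Only report status if the cancel step succeeded.
  "$bin" cancel "$config" && "$bin" status "$config" --runs 1
}
```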

memorious flush

Delete all data and tags generated by a crawler.

memorious flush [OPTIONS] URI

Removes archive entries, tags, and incremental state for the crawler. This operation is destructive and cannot be undone, so run it with care.

Arguments

| Argument | Required | Description |
| --- | --- | --- |
| URI | yes | URI or path to a crawler YAML config file. |

Example

memorious flush ./crawlers/example.yml
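
Since a flush cannot be undone, interactive use may benefit from a confirmation step. The sketch below is one way to add it in a shell wrapper. The prompt, the function name confirm_flush, and the MEMORIOUS_BIN override are assumptions of this example; the options table does not list any built-in confirmation flag for memorious flush.

```shell
# Hypothetical guard: ask for confirmation before running `memorious flush`.
confirm_flush() {
  config="${1:-}"
  printf 'Delete all data for %s? Type "yes" to continue: ' "$config" >&2
  # Treat end-of-input the same as a refusal.
  read -r answer || answer=""
  if [ "$answer" = "yes" ]; then
    # MEMORIOUS_BIN is an assumption of this sketch, not a memorious feature.
    "${MEMORIOUS_BIN:-memorious}" flush "$config"
  else
    echo "Aborted; nothing was deleted." >&2
    return 1
  fi
}
```

Anything other than the literal answer yes leaves the crawler data untouched and returns a non-zero status.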

memorious status

Show crawler status: recent runs and stored document count.

memorious status [OPTIONS] URI

Prints a table of recent runs (run id, start time, age) and the total number of stored documents.

Arguments

| Argument | Required | Description |
| --- | --- | --- |
| URI | yes | URI or path to a crawler YAML config file. |

Options

| Option | Short | Default | Description |
| --- | --- | --- | --- |
| --runs | -r | 5 | Number of recent runs to show. |
| --help | | | Show the help message and exit. |

Example

memorious status ./crawlers/example.yml --runs 10
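
When several crawler configs live in one directory, the status command can be looped over them. The following is a minimal sketch under that assumption. The directory layout, the function name status_all, and the MEMORIOUS_BIN override are hypothetical.

```shell
# Hypothetical helper: print the status of every *.yml config in a directory.
# usage: status_all [DIRECTORY]   (defaults to ./crawlers)
status_all() {
  dir="${1:-./crawlers}"
  for config in "$dir"/*.yml; do
    # Skip the unexpanded pattern when the directory holds no .yml files.
    [ -e "$config" ] || continue
    echo "== $config"
    # MEMORIOUS_BIN is an assumption of this sketch, not a memorious feature.
    "${MEMORIOUS_BIN:-memorious}" status "$config" --runs 3
  done
}
```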