CLI Reference
The memorious command-line interface manages crawler execution, worker processes, and crawler state. Run any command with --help to see options inline.
memorious
Top-level command. Without a subcommand, displays the help screen.
| Option | Short | Description |
|---|---|---|
| --version | -v | Show the installed memorious version and exit. |
| --settings | | Print the resolved runtime settings and exit. |
| --install-completion | | Install shell completion for the current shell. |
| --show-completion | | Print shell completion script for copying or customization. |
| --help | | Show the help message and exit. |
Subcommands
| Command | Description |
|---|---|
| run | Run a crawler from a YAML config file. |
| worker | Start the procrastinate worker to process crawler jobs. |
| cancel | Cancel pending jobs for a crawler. |
| flush | Delete all data and tags generated by a crawler. |
| status | Show crawler status: recent runs and stored document count. |
memorious run
Run a crawler from a YAML config file.
The crawler is loaded from the given YAML config (local path or remote URI), queued as a job in procrastinate, and executed by an embedded worker until the queue drains. With --wait, the worker keeps running after the jobs complete, until it is interrupted.
Arguments
| Argument | Required | Description |
|---|---|---|
| URI | yes | URI or path to a crawler YAML config file. |
Options
| Option | Short | Default | Description |
|---|---|---|---|
| --continue-on-error | | False | Don't stop crawler execution on error. |
| --flush | | False | Delete all existing data before execution. |
| --concurrency | -c | 1 | Number of concurrent jobs (use >1 for I/O-bound crawlers). |
| --wait | -w | False | Keep worker running after jobs complete (until interrupted). |
| --idle-timeout | -t | 30 | Auto-stop after N seconds of inactivity. Defaults to 30 when concurrency>1; pass 0 to disable. |
| --clear-runs / --no-clear-runs | | --clear-runs | Cancel remaining tasks from previous runs before starting. Use --no-clear-runs to resume an interrupted crawl without losing queued jobs. |
| --help | | | Show the help message and exit. |
Examples
# Run a crawler from a local file
memorious run ./crawlers/example.yml
# Run with higher concurrency for I/O-bound work
memorious run ./crawlers/example.yml --concurrency 8
# Flush prior data and re-run from scratch
memorious run ./crawlers/example.yml --flush
# Resume a previously interrupted crawl
memorious run ./crawlers/example.yml --no-clear-runs
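For batch jobs, the invocations above can be wrapped in a small script. The following is a sketch only: run_all_crawlers is a hypothetical helper, not part of memorious, and it assumes that crawler configs are stored as .yml files in a single directory. It relies only on the --concurrency and --continue-on-error options documented above.

```shell
# Hypothetical helper (not part of memorious): run every crawler config
# found in a directory, one after another.
# Usage: run_all_crawlers DIRECTORY [CONCURRENCY]
run_all_crawlers() {
  config_dir="$1"
  concurrency="${2:-1}"   # 1 is the documented default for --concurrency
  failed=0
  for config in "$config_dir"/*.yml; do
    # Skip the unexpanded pattern when the directory has no .yml files.
    [ -e "$config" ] || continue
    echo "Running $config"
    # Count the run as failed if the memorious command exits non-zero.
    memorious run "$config" --concurrency "$concurrency" --continue-on-error \
      || failed=$((failed + 1))
  done
  # Return non-zero if any invocation exited non-zero.
  [ "$failed" -eq 0 ]
}
```

Calling `run_all_crawlers ./crawlers 8` would then run each config in ./crawlers with eight concurrent jobs.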
memorious worker
Start the procrastinate worker to process crawler jobs.
Runs a standalone worker that consumes jobs from the procrastinate queue. Use this in deployments where workers run as long-lived processes separate from memorious run.
Options
| Option | Short | Default | Description |
|---|---|---|---|
| --concurrency | -c | 1 | Number of concurrent jobs (use >1 for I/O-bound crawlers). |
| --help | | | Show the help message and exit. |
Example
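A minimal sketch of a worker start-up helper for a long-lived deployment. The function name start_worker and the environment variable WORKER_CONCURRENCY are assumptions made for illustration; only the --concurrency option comes from the table above.

```shell
# Hypothetical entrypoint helper (not part of memorious).
# WORKER_CONCURRENCY is an assumed environment variable, not a memorious
# setting; when unset, fall back to the documented default of 1.
start_worker() {
  memorious worker --concurrency "${WORKER_CONCURRENCY:-1}"
}
```

With `WORKER_CONCURRENCY=4` exported, `start_worker` is equivalent to `memorious worker --concurrency 4`.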
memorious cancel
Cancel pending jobs for a crawler.
Marks all queued and in-progress tasks for the crawler as cancelled. Already-completed work is preserved.
Arguments
| Argument | Required | Description |
|---|---|---|
| URI | yes | URI or path to a crawler YAML config file. |
Example
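Because completed work is preserved, a cancel can be followed by a status call to review what remains stored. The sketch below assumes a hypothetical helper named cancel_and_report; it only chains the cancel and status subcommands described on this page.

```shell
# Hypothetical helper (not part of memorious): cancel pending jobs for a
# crawler, then print its status if the cancel succeeded.
# Usage: cancel_and_report CONFIG
cancel_and_report() {
  config="$1"
  memorious cancel "$config" && memorious status "$config"
}
```

For example, `cancel_and_report ./crawlers/example.yml` runs both commands against the same config file.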
memorious flush
Delete all data and tags generated by a crawler.
Removes archive entries, tags, and incremental state for the crawler. Run with care — the operation is destructive and cannot be undone.
Arguments
| Argument | Required | Description |
|---|---|---|
| URI | yes | URI or path to a crawler YAML config file. |
Example
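Since the deletion cannot be undone, scripts may want to ask before flushing. The wrapper below is a sketch under that assumption: flush_with_confirmation is a hypothetical name, and the prompt is not something memorious itself provides.

```shell
# Hypothetical safety wrapper (not part of memorious): ask for confirmation
# before flushing, because the deletion cannot be undone.
# Usage: flush_with_confirmation CONFIG
flush_with_confirmation() {
  config="$1"
  printf 'Delete all data for %s? [y/N] ' "$config"
  read -r answer
  case "$answer" in
    y|Y) memorious flush "$config" ;;
    *)   echo "Aborted."; return 1 ;;
  esac
}
```

Any answer other than y or Y leaves the crawler data untouched and returns a non-zero status.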
memorious status
Show crawler status: recent runs and stored document count.
Prints a table of recent runs (run id, start time, age) and the total number of stored documents.
Arguments
| Argument | Required | Description |
|---|---|---|
| URI | yes | URI or path to a crawler YAML config file. |
Options
| Option | Short | Default | Description |
|---|---|---|---|
| --runs | -r | 5 | Number of recent runs to show. |
| --help | | | Show the help message and exit. |
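Example
A sketch for checking several crawlers at once. The helper status_all is hypothetical and the section banner it prints is purely illustrative; the URI argument and the --runs option are the ones documented above.

```shell
# Hypothetical helper (not part of memorious): print the status of several
# crawlers in sequence, showing the same number of recent runs for each.
# Usage: status_all RUNS CONFIG...
status_all() {
  runs="$1"
  shift
  for config in "$@"; do
    echo "== $config =="
    memorious status "$config" --runs "$runs"
  done
}
```

For instance, `status_all 10 ./crawlers/a.yml ./crawlers/b.yml` would show the ten most recent runs for each of the two crawlers.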