Use as blob storage
Any anystore backend can act as a file-like blob storage. Of course, this makes the most sense for actual file-like backends such as a local directory or a remote s3-compatible endpoint, but technically a redis instance can act as blob storage as well.
In the blob storage use case, store keys are actual file paths and the corresponding values are the file contents as byte streams.
All high-level functions described in the basic usage work here as well.
Configure store
Configure the base uri via the ANYSTORE_URI environment variable or at runtime. See configuration.
The serialization mode must be set to raw, as we are always dealing with the raw byte blobs of the files. Set ANYSTORE_SERIALIZATION_MODE=raw in the environment or configure it at runtime:
List and retrieve contents
Iterate through paths, optionally filtering via prefix or glob:
for key in store.iterate_keys(glob="**/*.pdf"):
    print(key)

for key in store.iterate_keys(prefix="txt_files/src"):
    print(key)
Command line can be used, too:
Retrieve the content of a file (this is only useful for small files; consider using the file handler below for bigger blobs).
# change serialization mode to "auto" to retrieve a string instead of bytes
content = store.get("path/file.txt", serialization_mode="auto")
# use "json" mode if feasible
data = store.get("data.json", serialization_mode="json")
Stream a file line by line:
# each line is deserialized from json
for data in store.stream("data.jsonl", serialization_mode="json"):
    print(data)
Write (small) files
This is particularly useful for easily uploading the results of a data wrangling process to remote targets:
result = calculate_data()
store = get_store("s3://bucket/dataset", serialization_mode="json")
store.put("index.json", result)
For bigger data chunks that should be streamed, consider using the file handler below.
Get a file handler
Similar to Python's built-in open(), a BinaryIO handler can be accessed:
import csv

from anystore import get_store

store = get_store("s3://my_bucket")

with store.open("path/to/file.csv") as fh:
    # the handler is binary, so decode each line for csv parsing
    for row in csv.reader(line.decode() for line in fh):
        print(row)
Write data
Delete a file
As described in the basic usage, the pop or delete methods can be used to delete a file. This is obviously irreversible.
Copy contents from and to stores
Recursively crawl an http file listing:
import shutil

from anystore import get_store

source = get_store("https://example.org/files")
target = get_store("./downloads")

for path in source.iterate_keys():
    # streaming copy, chunk by chunk:
    with source.open(path) as i:
        with target.open(path, "wb") as o:
            shutil.copyfileobj(i, o)
Migrate the text files from a local directory to a redis backend:
import shutil

from anystore import get_store

source = get_store("./local_data")
target = get_store("redis://localhost")

for path in source.iterate_keys(glob="**/*.txt"):
    # streaming copy, chunk by chunk:
    with source.open(path) as i:
        with target.open(path, "wb") as o:
            shutil.copyfileobj(i, o)
Now, the content is accessible in the redis store as if it were a file store:
Process remote files locally
Download a remote file for temporary use. The local file will be cleaned up when leaving the context.