Serialization

Data written to or read from a store can be of different types, so it has to be (de)serialized.

Serialization happens when writing to the store (store.put); deserialization happens when reading from it (store.get, store.pop or store.stream).

Serialization can be configured in the store settings or at runtime.

Defaults

Without further options, the serialization mode is "auto" (see below), which makes it easy to store and retrieve arbitrary data. This applies to primitive data types (str, int, ...) as well as data structures that can be represented as JSON. More complex structures are pickled via cloudpickle and unpickled on retrieval, but consider handling serialization explicitly in such cases.

Put different kinds of data into the store:

# a text string
store.put("foo", "bar")

# a number
store.put("foo", 1)

# a dictionary
store.put("foo", {"data": 1})

# a pydantic object (data is an instance of a pydantic model)
store.put("foo", data)

# an arbitrary object
store.put("func", lambda x: x*2)
func = store.get("func")
assert func(2) == 4

When these values are retrieved, they are converted back to their original type (even the lambda function), with one exception: the pydantic model is returned as a plain dictionary of its data. See below for working with pydantic models explicitly.
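The fallback behaviour described for "auto" can be pictured with a minimal sketch. It is not the library's implementation: it uses the standard library's json and pickle as stand-ins for orjson and cloudpickle, and the names auto_dumps and auto_loads are hypothetical.

```python
import json
import pickle
from datetime import date


def auto_dumps(value) -> bytes:
    """Serialize to JSON when possible, otherwise fall back to pickle."""
    try:
        return json.dumps(value).encode("utf-8")
    except TypeError:
        # Not representable as JSON (e.g. sets, dates, custom objects).
        return pickle.dumps(value)


def auto_loads(payload: bytes):
    """Reverse auto_dumps: try JSON first, then pickle."""
    try:
        return json.loads(payload)
    except ValueError:
        # Binary pickle output does not parse as JSON.
        return pickle.loads(payload)


# JSON-friendly values and pickle-only values both survive the round trip.
for value in ("bar", 1, {"data": 1}, {1, 2, 3}, date(2024, 1, 1)):
    assert auto_loads(auto_dumps(value)) == value
```

Trying the strict format first keeps simple values readable in the backend, while the pickle fallback covers everything else. That convenience is also why explicit handling is the safer choice for complex objects.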

Use the serialization mode

Control how data is serialized using the mode keyword. The four modes are:

  • "raw": Return value as is, assuming bytes
  • "json": Use orjson to (de)serialize
  • "pickle": Use cloudpickle to (de)serialize
  • "auto": Try different serialization methods (the default, see above)

store.put("data", "hello")
# this will return bytes:
store.get("data", mode="raw")

# explicitly use json serialization (JSON object keys are strings)
store.put("data", {"a": 2}, mode="json")
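To illustrate how a mode keyword might select a codec, here is a small sketch. It assumes nothing about the library's internals: CODECS, dumps and loads are hypothetical names, and the standard library's json and pickle stand in for orjson and cloudpickle. The "auto" mode is left out for brevity.

```python
import json
import pickle

# mode name -> (serializer, deserializer); "raw" passes bytes through untouched
CODECS = {
    "raw": (lambda value: value, lambda payload: payload),
    "json": (lambda value: json.dumps(value).encode("utf-8"), json.loads),
    "pickle": (pickle.dumps, pickle.loads),
}


def dumps(value, mode: str = "json") -> bytes:
    serializer, _ = CODECS[mode]
    payload = serializer(value)
    if not isinstance(payload, bytes):
        # Only "raw" can end up here: it requires the caller to supply bytes.
        raise TypeError(f"mode {mode!r} expects bytes, got {type(value).__name__}")
    return payload


def loads(payload: bytes, mode: str = "json"):
    _, deserializer = CODECS[mode]
    return deserializer(payload)


stored = dumps({"a": 2}, mode="json")
print(loads(stored, mode="raw"))   # the undecoded bytes as stored
print(loads(stored, mode="json"))  # the original dictionary
```

Reading with "raw" skips decoding entirely, which is useful when the stored bytes are passed on to another system unchanged.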

Store and retrieve pydantic models

Pass the model option to work with pydantic data:

data = MyPydanticModel(hello="world")
store.put("data", data, model=MyPydanticModel)
# retrieve the data as the pydantic model:
res = store.get("data", model=MyPydanticModel)
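Why a model comes back as a dictionary unless the model option is given can be shown with a dependency-free sketch. A standard library dataclass stands in for the pydantic model here, and Greeting, dump_model and load_model are hypothetical names. With pydantic v2 the analogous calls would be model_dump_json and model_validate_json.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Greeting:
    """Hypothetical stand-in for a pydantic model."""

    hello: str


def dump_model(obj) -> bytes:
    # The stored payload is plain JSON; it carries no class information.
    return json.dumps(asdict(obj)).encode("utf-8")


def load_model(payload: bytes, model=None):
    data = json.loads(payload)
    # Without a model the caller gets the bare dictionary back.
    return model(**data) if model is not None else data


payload = dump_model(Greeting(hello="world"))
print(load_model(payload))                  # {'hello': 'world'}
print(load_model(payload, model=Greeting))  # Greeting(hello='world')
```

Because the stored bytes do not record which class produced them, the reader has to name the model to get an instance back.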

Use custom functions

Use serialization_func and deserialization_func as options:

# a generator cannot be saved to a store as is
data = (i for i in range(100))

def convert(data):
    # materialize the generator into a list
    return list(data)

store.put("data", data, serialization_func=convert)

# convert back to a generator
# (contrived, but you get the idea)
def unconvert(data):
    return (d for d in data)

result = store.get("data", deserialization_func=unconvert)

Lambda functions can be used here as well:

store.put("double_data", 4, serialization_func=lambda x: x*2)
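The following sketch shows how a pair of custom functions could slot into a put/get round trip. MemoryStore is a hypothetical in-memory stand-in, not the library's store. The sketch also assumes the custom functions produce and consume bytes themselves; whether the real library applies further encoding to their result is not stated here.

```python
import json


class MemoryStore:
    """Hypothetical in-memory store, only here to make the sketch runnable."""

    def __init__(self):
        self._data = {}

    def put(self, key, value, serialization_func=None):
        # Fall back to JSON bytes when no custom function is given.
        func = serialization_func or (lambda v: json.dumps(v).encode("utf-8"))
        self._data[key] = func(value)

    def get(self, key, deserialization_func=None):
        func = deserialization_func or json.loads
        return func(self._data[key])


def convert(data) -> bytes:
    # Materialize the generator, then encode the list as JSON bytes.
    return json.dumps(list(data)).encode("utf-8")


def unconvert(payload: bytes):
    # Decode the list and hand it back as a generator.
    return (d for d in json.loads(payload))


store = MemoryStore()
store.put("data", (i for i in range(5)), serialization_func=convert)
result = store.get("data", deserialization_func=unconvert)
print(list(result))  # [0, 1, 2, 3, 4]
```

Keeping the two functions as an explicit pair makes it obvious which transformation has to be undone on read.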

Reference

See the reference details.