Batch Functions

read_batches

from tempora.utils.batch import read_batches

Deserialize batches from a directory of Parquet files.

read_batches(
    path: str | Path,
    *,
    filesystem: fs.FileSystem | None = None,
    fs_config: dict[str, Any] | None = None,
    as_tensor: bool = False,
    pad_value: int | float = np.nan,
    output_format: OutputFormat | None = None,
    pin_memory: bool = False
) -> list[Batch]

Parameters

Name            Description
path            Directory path to serialized batch files.
filesystem      Optional PyArrow filesystem to read from.
fs_config       Filesystem configuration if filesystem is not provided.
as_tensor       Return tensor form if True, otherwise packed matrix form.
pad_value       Tensor padding value for uneven sequence lengths.
output_format   Batch data format. Supported aliases: 'pytorch' ('pt'), 'tensorflow' ('tf'), 'jax' ('jx'), 'numpy' ('np'), 'pandas' ('df'), 'pyarrow' ('pa').
pin_memory      For PyTorch tensors, use page-locked CPU memory to speed up GPU transfer.
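
The pad_value description can be made concrete with a small standard-library sketch. It assumes that "tensor form" means a rectangular layout in which shorter sequences are filled up to the longest length; that reading is an assumption, and pad_sequences is a hypothetical helper written for illustration, not part of tempora.

```python
import math


def pad_sequences(sequences, pad_value=math.nan):
    """Pad variable-length sequences to the length of the longest one.

    Hypothetical helper for illustration only; it is not tempora's code.
    """
    # default=0 keeps the helper safe when no sequences are passed.
    max_len = max((len(seq) for seq in sequences), default=0)
    return [list(seq) + [pad_value] * (max_len - len(seq)) for seq in sequences]


ragged = [[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]]
padded = pad_sequences(ragged)

# Every row now has the same length, so the data is rectangular.
for row in padded:
    print(row)

# With NaN padding, the real values can be recovered with an isnan mask.
valid = [[not math.isnan(value) for value in row] for row in padded]
print(valid)
```

NaN exists only for floating-point data, so integer sequences need an explicit sentinel such as pad_value=0 or pad_value=-1, chosen so that it cannot collide with a real value.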
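
The output_format aliases listed above pair each long name with a short form. The sketch below shows one way such aliases could be resolved to a single canonical name. The alias pairs are copied from the table; the normalize_output_format helper, its exact-match behavior, and its error message are assumptions made for illustration, and the library may resolve aliases differently (for example through its OutputFormat type).

```python
# Alias pairs copied from the output_format description: long name -> short alias.
FORMAT_ALIASES = {
    "pytorch": "pt",
    "tensorflow": "tf",
    "jax": "jx",
    "numpy": "np",
    "pandas": "df",
    "pyarrow": "pa",
}

# Map every accepted spelling (long or short) to the long name.
_LOOKUP = {name: name for name in FORMAT_ALIASES}
_LOOKUP.update({short: name for name, short in FORMAT_ALIASES.items()})


def normalize_output_format(value):
    """Return the long format name for a long or short alias.

    Hypothetical helper for illustration only; None is passed through
    because output_format defaults to None in the signature.
    """
    if value is None:
        return None
    try:
        return _LOOKUP[value]
    except KeyError:
        raise ValueError(f"unsupported output_format: {value!r}") from None


print(normalize_output_format("pt"))
print(normalize_output_format("pandas"))
```

Matching here is exact and case-sensitive, since the table lists only lowercase spellings; whether the library accepts other casings is not stated.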