Batch Functions
read_batches
from tempora.utils.batch import read_batches
Deserialize batches from a directory of Parquet files.
read_batches(
    path: str | Path,
    *,
    filesystem: fs.FileSystem | None = None,
    fs_config: dict[str, Any] | None = None,
    as_tensor: bool = False,
    pad_value: int | float = np.nan,
    output_format: OutputFormat | None = None,
    pin_memory: bool = False
) -> list[Batch]
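A call that follows this signature might look like the sketch below. The wrapper name `load_batches_for_pytorch` and the choice of `0.0` as the padding value are illustrative assumptions, not part of the library. Only the import path and the keyword names come from this page.

```python
from pathlib import Path


def load_batches_for_pytorch(path):
    """Hypothetical helper: read padded PyTorch batches from a directory."""
    # Imported lazily so the helper can be defined without tempora installed.
    from tempora.utils.batch import read_batches

    return read_batches(
        Path(path),
        as_tensor=True,      # request tensor form instead of packed matrix form
        pad_value=0.0,       # assumed choice; the documented default is np.nan
        output_format="pt",  # alias for 'pytorch'
        pin_memory=True,     # page-locked CPU memory for faster GPU transfer
    )
```

Everything after `path` is keyword-only because of the bare `*` in the signature, so those arguments have to be passed by name, as they are here.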
Parameters
| Name | Description |
|---|---|
| path | Directory path to serialized batch files. |
| filesystem | Optional PyArrow filesystem to read from. |
| fs_config | Filesystem configuration if filesystem is not provided. |
| as_tensor | Return tensor form if True, otherwise packed matrix form. |
| pad_value | Tensor padding value for uneven sequence lengths. |
| output_format | Batch data format. Supported aliases: 'pytorch' ('pt'), 'tensorflow' ('tf'), 'jax' ('jx'), 'numpy' ('np'), 'pandas' ('df'), 'pyarrow' ('pa'). |
| pin_memory | For PyTorch tensors, use page-locked CPU memory to speed up GPU transfer. |
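To illustrate what a padding value does for uneven sequence lengths, here is a minimal sketch in plain Python. It is not the library's implementation. The helper `pad_to_rectangular` is hypothetical, and one-dimensional sequences are assumed for brevity.

```python
import math


def pad_to_rectangular(sequences, pad_value=math.nan):
    """Right-pad each sequence with pad_value up to the longest length."""
    max_len = max((len(seq) for seq in sequences), default=0)
    return [list(seq) + [pad_value] * (max_len - len(seq)) for seq in sequences]


# Three sequences of lengths 3, 1, and 2.
sequences = [[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]]
padded = pad_to_rectangular(sequences)

# With NaN padding, a validity mask can be recovered from the data itself.
mask = [[not math.isnan(x) for x in row] for row in padded]
```

NaN padding keeps padded positions distinguishable from real values, but only if the data contains no NaN of its own. Otherwise a sentinel such as `0` or `-1` plus explicitly tracked lengths may be the safer choice.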