Sampler Methods & Properties

Full reference for sampler instance methods and shared properties.

Methods

Name           Description
__call__       Return an iterator over sampled Batch objects from a dataset.
iter_series    Iterate sampled Batch objects per series.
write_batches  Write sampled batches to disk (local directory or cloud storage).

Properties

Name            Description
data_schema     Data schema for the sampler output (available after sampling a dataset).
targets_schema  Target schema if a target_spec is configured.

__call__

__call__(
    dataset: Dataset,
    num_batches: int | None = None,
    *,
    reset: bool = False
) -> Iterator[Batch]

Return an iterator over sampled Batch objects from a dataset.

Parameter Name  Description
dataset         Dataset to sample from.
num_batches     Maximum number of batches to yield; None yields all available batches.
reset           If True, reset the sampler's internal state before sampling.
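
The call pattern above can be illustrated with a stand-in. This page does not name the sampler class or show how to construct one, so the sketch below uses a hypothetical ToySampler over a plain list. It mimics only the documented signature; the cursor-based state, the batch_size argument, and the list-of-ints batches are assumptions made for illustration, not the library's behavior.

```python
from __future__ import annotations

from collections.abc import Iterator


class ToySampler:
    """Hypothetical stand-in that mimics only the documented __call__ signature."""

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size
        self._cursor = 0  # assumed internal state: start index of the next batch

    def __call__(
        self,
        dataset: list[int],
        num_batches: int | None = None,
        *,
        reset: bool = False,
    ) -> Iterator[list[int]]:
        if reset:
            self._cursor = 0  # start again from the beginning of the dataset
        yielded = 0
        while self._cursor < len(dataset):
            if num_batches is not None and yielded >= num_batches:
                return  # stop once the requested maximum is reached
            batch = dataset[self._cursor : self._cursor + self.batch_size]
            self._cursor += self.batch_size
            yielded += 1
            yield batch


sampler = ToySampler(batch_size=2)
data = [0, 1, 2, 3, 4, 5]

first = list(sampler(data, num_batches=2))               # capped at two batches
rest = list(sampler(data))                               # None: everything left
again = list(sampler(data, num_batches=1, reset=True))   # state cleared first
```

In this toy model the second call continues where the first one stopped, which is why a reset flag is useful when the same sampler is reused across runs. Whether the real sampler carries state between calls in the same way is not stated on this page.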

iter_series

iter_series(
    dataset: Dataset
) -> Iterator[Iterator[Batch]]

Iterate sampled Batch objects per series.

Parameter Name  Description
dataset         Dataset to sample from.
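
The return type Iterator[Iterator[Batch]] implies a nested loop: the outer iterator yields one inner iterator per series, and each inner iterator yields that series' batches. A minimal sketch of consuming such a structure follows. ToySeriesSampler, the dict-of-lists dataset, and batch_size are hypothetical and stand in for the real sampler and Dataset types.

```python
from __future__ import annotations

from collections.abc import Iterator


class ToySeriesSampler:
    """Hypothetical stand-in that mimics only the documented iter_series signature."""

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size

    def _batches(self, values: list[int]) -> Iterator[list[int]]:
        # Slice one series into consecutive batches.
        for start in range(0, len(values), self.batch_size):
            yield values[start : start + self.batch_size]

    def iter_series(
        self, dataset: dict[str, list[int]]
    ) -> Iterator[Iterator[list[int]]]:
        # Outer iterator: one inner iterator of batches per series.
        for values in dataset.values():
            yield self._batches(values)


dataset = {"a": [1, 2, 3], "b": [10, 20]}
sampler = ToySeriesSampler(batch_size=2)

per_series = []
for series_batches in sampler.iter_series(dataset):  # one item per series
    per_series.append(list(series_batches))          # batches of that series
```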

write_batches

write_batches(
    dataset: Dataset,
    num_batches: int,
    path: str | Path,
    *,
    prefix: str = 'batch_',
    offset: int = 0,
    filesystem: fs.FileSystem | None = None,
    fs_config: dict[str, Any] | None = None,
    overwrite: bool = False,
    reset: bool = False
) -> None

Write sampled batches to disk (local directory or cloud storage).

Parameter Name  Description
dataset         Dataset to sample from.
num_batches     Number of batches to write.
path            Output directory path.
prefix          Filename prefix for each batch.
offset          Starting index for batch numbering.
filesystem      Optional PyArrow filesystem instance to write to.
fs_config       Optional filesystem configuration, used when filesystem is not provided.
overwrite       If True, overwrite existing files.
reset           If True, reset the sampler's internal state before sampling.
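
The prefix and offset parameters suggest that output can be written in several passes without filename collisions. The sketch below shows that idea with a hypothetical toy_write_batches function built on the standard library. The filename pattern (prefix, index, then .json), the JSON format, and the FileExistsError raised when overwrite is False are all assumptions; this page does not specify the real naming scheme, file format, or error behavior.

```python
from __future__ import annotations

import json
import tempfile
from collections.abc import Iterable
from pathlib import Path


def toy_write_batches(
    batches: Iterable[list[int]],
    num_batches: int,
    path: str | Path,
    *,
    prefix: str = "batch_",
    offset: int = 0,
    overwrite: bool = False,
) -> None:
    """Hypothetical stand-in: write up to num_batches batches as numbered files."""
    out_dir = Path(path)
    out_dir.mkdir(parents=True, exist_ok=True)
    for i, batch in zip(range(num_batches), batches):
        # Assumed naming scheme: prefix followed by (offset + position).
        target = out_dir / f"{prefix}{offset + i}.json"
        if target.exists() and not overwrite:
            raise FileExistsError(target)
        target.write_text(json.dumps(batch))


with tempfile.TemporaryDirectory() as tmp:
    toy_write_batches([[0, 1], [2, 3]], 2, tmp)        # writes indices 0 and 1
    toy_write_batches([[4, 5]], 1, tmp, offset=2)      # continues at index 2
    written = sorted(p.name for p in Path(tmp).iterdir())
```

In this model, the second call passes offset=2 so its file is numbered after the two already written rather than colliding with batch_0.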