Sampler Methods & Properties
Full reference for sampler instance methods and shared properties.
Methods
| Name | Description |
|---|---|
| `__call__` | Return an iterator over sampled Batch objects from a dataset. |
| `iter_series` | Iterate over sampled Batch objects, grouped per series. |
| `write_batches` | Write sampled batches to disk (local directory or cloud storage). |
Properties
| Name | Description |
|---|---|
| `data_schema` | Data schema for the sampler output (available after sampling a dataset). |
| `targets_schema` | Target schema, if a `target_spec` is configured. |
__call__
__call__(
dataset: Dataset,
num_batches: int | None = None,
*,
reset: bool = False
) -> Iterator[Batch]
Return an iterator over sampled Batch objects from a dataset.
| Parameter Name | Description |
|---|---|
| `dataset` | Dataset to sample from. |
| `num_batches` | Maximum number of batches to yield, or `None` to yield all available batches. |
| `reset` | If `True`, reset the sampler's internal state before sampling. |
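The interplay of `num_batches` and `reset` can be sketched with a stand-in class. `ToySampler` below is hypothetical and is not the library's implementation: it takes a plain list instead of a Dataset, yields lists instead of Batch objects, and assumes the sampler's internal state is a cursor that advances as batches are yielded, so a later call without `reset=True` continues where the previous one stopped. The real sampler's state may differ.

```python
from __future__ import annotations

from collections.abc import Iterator, Sequence


class ToySampler:
    """Hypothetical stand-in that mirrors the documented __call__ signature.

    Assumption: the sampler's internal state is a cursor into the dataset
    that advances with every yielded batch. The real sampler may track
    different or additional state.
    """

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size
        self._cursor = 0  # hypothetical internal state

    def __call__(
        self,
        dataset: Sequence[int],
        num_batches: int | None = None,
        *,
        reset: bool = False,
    ) -> Iterator[list[int]]:
        # This is a generator, so reset is applied when iteration starts.
        if reset:
            self._cursor = 0
        yielded = 0
        while self._cursor < len(dataset):
            if num_batches is not None and yielded >= num_batches:
                return  # stop early; the cursor keeps its position
            batch = list(dataset[self._cursor : self._cursor + self.batch_size])
            self._cursor += len(batch)
            yielded += 1
            yield batch


data = list(range(10))
sampler = ToySampler(batch_size=3)
first = list(sampler(data, num_batches=2))              # [[0, 1, 2], [3, 4, 5]]
rest = list(sampler(data))                               # [[6, 7, 8], [9]]
again = list(sampler(data, num_batches=1, reset=True))  # [[0, 1, 2]]
```

Because `__call__` in this sketch is a generator, `reset` takes effect when iteration begins rather than at call time; this reference does not say whether the real sampler behaves the same way.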
iter_series
iter_series(
dataset: Dataset
) -> Iterator[Iterator[Batch]]
Iterate over sampled Batch objects per series: the outer iterator yields one inner iterator of batches for each series in the dataset.
| Parameter Name | Description |
|---|---|
| `dataset` | Dataset to sample from. |
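Since `iter_series` returns an iterator of iterators, callers typically loop twice: once over series and once over each series' batches. The sketch below uses a hypothetical generator, `fake_iter_series`, in place of the real method, with `(series_id, value)` tuples standing in for dataset rows and lists standing in for Batch objects. It drains each inner iterator before advancing the outer one; this is a defensive assumption, since the reference does not state whether the real inner iterators share state with the outer one (as they do here, where they are built on `itertools.groupby`).

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator
from itertools import groupby


def _chunks(values: Iterable[int], batch_size: int) -> Iterator[list[int]]:
    """Yield consecutive lists of at most batch_size values."""
    batch: list[int] = []
    for value in values:
        batch.append(value)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def fake_iter_series(
    rows: list[tuple[str, int]], batch_size: int
) -> Iterator[Iterator[list[int]]]:
    """Hypothetical stand-in for iter_series.

    rows are (series_id, value) pairs sorted by series_id. Each inner
    iterator lazily yields one series' values in batches and shares the
    underlying groupby state with the outer iterator.
    """
    for _series_id, group in groupby(rows, key=lambda row: row[0]):
        yield _chunks((value for _, value in group), batch_size)


def batches_per_series(series_iter: Iterator[Iterator[list[int]]]) -> list[int]:
    """Count batches per series, draining each inner iterator before advancing."""
    counts = []
    for series_batches in series_iter:
        counts.append(sum(1 for _ in series_batches))
    return counts


rows = [("a", 1), ("a", 2), ("a", 3), ("b", 4), ("c", 5), ("c", 6)]
print(batches_per_series(fake_iter_series(rows, batch_size=2)))  # [2, 1, 1]
```

In this stand-in, collecting the outer iterator first (for example with `list(...)`) would leave the inner iterators unable to see their groups, because `groupby` stops exposing a group once it has advanced past it.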
write_batches
write_batches(
dataset: Dataset,
num_batches: int,
path: str | Path,
*,
prefix: str = 'batch_',
offset: int = 0,
filesystem: fs.FileSystem | None = None,
fs_config: dict[str, Any] | None = None,
overwrite: bool = False,
reset: bool = False
) -> None
Write sampled batches to disk (local directory or cloud storage).
| Parameter Name | Description |
|---|---|
| `dataset` | Dataset to sample from. |
| `num_batches` | Number of batches to write. |
| `path` | Output directory path. |
| `prefix` | Filename prefix for each batch. |
| `offset` | Starting index for batch numbering. |
| `filesystem` | Optional PyArrow filesystem instance to write to. |
| `fs_config` | Optional filesystem configuration, used when `filesystem` is not provided. |
| `overwrite` | If `True`, overwrite existing files. |
| `reset` | If `True`, reset the sampler's internal state before sampling. |
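Because `offset` sets the starting index for batch numbering, a large export can be split across several `write_batches` calls whose numbering continues instead of restarting at 0. The helpers below are a hypothetical sketch: `chunked_write_plan` and `planned_names` are not part of the library, and the naming scheme (`prefix` followed by the index, with no zero padding or file extension) is an assumption. Whether consecutive calls with `reset=False` continue the same sampling sequence is also assumed, not stated here.

```python
def chunked_write_plan(total_batches: int, chunk_size: int) -> list[tuple[int, int]]:
    """Split one large export into (num_batches, offset) pairs.

    Hypothetical helper: each pair is meant to feed one write_batches call,
    so batch numbering continues across calls instead of restarting at 0.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    plan = []
    for offset in range(0, total_batches, chunk_size):
        plan.append((min(chunk_size, total_batches - offset), offset))
    return plan


def planned_names(num_batches: int, offset: int, prefix: str = "batch_") -> list[str]:
    """Assumed naming scheme: prefix followed by the batch index."""
    return [f"{prefix}{offset + i}" for i in range(num_batches)]


plan = chunked_write_plan(10, 4)
print(plan)  # [(4, 0), (4, 4), (2, 8)]
for num_batches, offset in plan:
    # With a real sampler, each step would be a call such as:
    # sampler.write_batches(dataset, num_batches, "out/", offset=offset)
    print(planned_names(num_batches, offset))
```

Under these assumptions the three calls produce non-overlapping names (`batch_0` through `batch_9`), so `overwrite=True` is not needed when appending chunks to the same directory.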