Sampler Classes

`RandomSampler`

from tempora.samplers import RandomSampler

Randomly samples batches from a dataset.

RandomSampler(
    context_len: int | float | dt.timedelta | np.timedelta64 | pd.Timedelta | Length,
    batch_size: int = 32,
    columns: list[str] | None = None,
    transform_spec: TransformSpec | None = None,
    target_spec: TargetSpec | None = None,
    class_sampling: list[ClassSamplingSpec] | None = None,
    output_format: OutputFormat = 'ndarray',
    as_tensor: bool = False,
    pad_value: int | float = np.nan,
    options: SamplerOptions = SamplerOptions()
)

Parameters

Name	Description
`context_len`	Length of each sampled context window (`Length` or time delta).
`batch_size`	Number of samples per batch.
`columns`	Optional feature columns to include.
`transform_spec`	Optional `TransformSpec` for context-window transforms.
`target_spec`	Optional `TargetSpec` object.
`class_sampling`	Optional list of `ClassSamplingSpec` entries for weighted sampling of class-based targets. Each class spec pairs a SQL expression that matches the rows containing the target/label with a sampling weight. If the class weights sum to less than `1`, the remainder is treated as an implicit background class.
`output_format`	Batch data format. Supported aliases: `'pytorch'` (`'pt'`), `'tensorflow'` (`'tf'`), `'jax'` (`'jx'`), `'numpy'` (`'np'`), `'pandas'` (`'df'`), `'pyarrow'` (`'pa'`).
`as_tensor`	Convert output to tensor format where applicable.
`pad_value`	Value used to pad variable-length sequences.
`options`	Sampler options (buffering, randomness, etc.).
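The implicit background class described for `class_sampling` can be illustrated without the library. The following is a minimal sketch in plain Python, not the sampler's actual implementation: the helper names, the `"__background__"` label, and the class names `spike` and `drift` are all hypothetical.

```python
import random

def class_probabilities(weights):
    """Return class -> probability, adding an implicit background class.

    `weights` maps class names to weights in (0, 1]. The label
    "__background__" is a hypothetical placeholder for the remainder.
    """
    probs = dict(weights)
    remainder = 1.0 - sum(weights.values())
    if remainder > 1e-9:
        # Assumption: whatever the explicit classes leave over goes to background.
        probs["__background__"] = remainder
    return probs

def draw_classes(weights, n, seed=0):
    """Draw n class labels according to the weights (illustrative only)."""
    probs = class_probabilities(weights)
    rng = random.Random(seed)
    return rng.choices(list(probs), weights=list(probs.values()), k=n)

probs = class_probabilities({"spike": 0.2, "drift": 0.3})
labels = draw_classes({"spike": 0.2, "drift": 0.3}, n=100)
```

Here the explicit weights sum to 0.5, so roughly half of the draws fall into the background class.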

`SequentialSampler`

from tempora.samplers import SequentialSampler

Samples sequential batches from each entity/series.

SequentialSampler(
    context_len: int | float | dt.timedelta | np.timedelta64 | pd.Timedelta | Length,
    batch_size: int = 32,
    columns: list[str] | None = None,
    transform_spec: TransformSpec | None = None,
    target_spec: TargetSpec | None = None,
    output_format: OutputFormat = 'ndarray',
    as_tensor: bool = False,
    pad_value: int | float = np.nan,
    options: SamplerOptions = SamplerOptions(),
    random_start: bool = False,
    random_end: bool = False
)

Parameters

Name	Description
`context_len`	Length of each sampled context window (`Length` or time delta).
`batch_size`	Number of samples per batch.
`columns`	Optional feature columns to include.
`transform_spec`	Optional `TransformSpec` for context-window transforms.
`target_spec`	Optional `TargetSpec` object.
`output_format`	Batch data format. Supported aliases: `'pytorch'` (`'pt'`), `'tensorflow'` (`'tf'`), `'jax'` (`'jx'`), `'numpy'` (`'np'`), `'pandas'` (`'df'`), `'pyarrow'` (`'pa'`).
`as_tensor`	Convert output to tensor format where applicable.
`pad_value`	Value used to pad variable-length sequences.
`options`	Sampler options (buffering, randomness, etc.).
`random_start`	Randomize the start offset for each series.
`random_end`	Randomize the end offset for each series.
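One possible reading of sequential sampling with `random_start` is sketched below in plain Python. The semantics are an assumption: the sketch shifts the first window by a random offset smaller than one window and then takes consecutive, non-overlapping windows. The function `sequential_windows` is hypothetical and is not part of the library.

```python
import random

def sequential_windows(series_len, context_len, random_start=False, rng=None):
    """Return (start, end) index pairs of consecutive windows over one series."""
    rng = rng or random.Random(0)
    # Assumed semantics: a random offset smaller than one window length.
    offset = rng.randrange(context_len) if random_start else 0
    last_start = series_len - context_len
    return [(s, s + context_len) for s in range(offset, last_start + 1, context_len)]

windows = sequential_windows(series_len=10, context_len=4)
# windows == [(0, 4), (4, 8)]
```

With `random_start=True` the same series yields windows that begin at a different offset on each pass, so that window boundaries do not always fall at the same positions.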

`SeriesSampler`

from tempora.samplers import SeriesSampler

Samples batches from a dataset without requiring a `context_len` argument.

SeriesSampler(
    batch_size: int = 32,
    columns: list[str] | None = None,
    transform_spec: TransformSpec | None = None,
    target_spec: SeriesTargetSpec | None = None,
    output_format: OutputFormat = 'ndarray',
    as_tensor: bool = False,
    pad_value: int | float = np.nan,
    options: SamplerOptions = SamplerOptions()
)

Parameters

Name	Description
`batch_size`	Number of samples per batch.
`columns`	Optional feature columns to include.
`transform_spec`	Optional `TransformSpec` for context-window transforms.
`target_spec`	Optional `SeriesTargetSpec` object (series-level target specification).
`output_format`	Batch data format. Supported aliases: `'pytorch'` (`'pt'`), `'tensorflow'` (`'tf'`), `'jax'` (`'jx'`), `'numpy'` (`'np'`), `'pandas'` (`'df'`), `'pyarrow'` (`'pa'`).
`as_tensor`	Convert output to tensor format where applicable.
`pad_value`	Value used to pad variable-length sequences.
`options`	Sampler options (buffering, randomness, etc.).
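The role of `pad_value` can be shown with a small standalone sketch. It assumes right-padding to the longest sequence in the batch, which the reference above does not confirm; `pad_batch` is a hypothetical helper, not a library function.

```python
import math

def pad_batch(sequences, pad_value=math.nan):
    """Right-pad variable-length sequences to the longest one in the batch."""
    max_len = max(len(seq) for seq in sequences)
    return [list(seq) + [pad_value] * (max_len - len(seq)) for seq in sequences]

batch = pad_batch([[1.0, 2.0, 3.0], [4.0]], pad_value=0.0)
# batch == [[1.0, 2.0, 3.0], [4.0, 0.0, 0.0]]
```

With the default NaN padding, padded positions have to be detected with `math.isnan` (or an equivalent array function), because NaN does not compare equal to itself.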

`SamplerOptions`

from tempora.samplers import SamplerOptions

Options for batch samplers.

SamplerOptions(
    use_table_cache: bool = True,
    incremental_table_update: bool = True,
    allow_partial_segments: bool = True,
    allow_null_entity_keys: bool = False,
    weight_series: Literal['duration', 'inverse_duration', 'num_rows', 'inverse_num_rows'] | None = None,
    left_censor_len: int | float | dt.timedelta | np.timedelta64 | pd.Timedelta | Length | None = None,
    right_censor_len: int | float | dt.timedelta | np.timedelta64 | pd.Timedelta | Length | None = None,
    align_on_data: bool = False,
    segments_per_query: int | None = None,
    segments_buffer_len: int | None = None,
    max_attempts: int = 1000,
    rng_seed: int = 201174
)

Parameters

Name	Description
`use_table_cache`	Use the cached sampling table for the dataset; otherwise, compute and cache a new table before sampling.
`incremental_table_update`	Build the sampling table incrementally during sampling instead of computing it upfront (only for datasets partitioned on `entity_keys`).
`allow_partial_segments`	Allow sampling segments shorter than the sampler context length (for example at series boundaries).
`allow_null_entity_keys`	Allow sampling segments where one or more entity key columns are null.
`weight_series`	Optional series-level sample weighting. Use `'duration'` or `'num_rows'` to sample in proportion to series duration or row count, respectively (longer series yield more segments); use `'inverse_duration'` or `'inverse_num_rows'` to favor shorter series; leave as `None` (default) for approximately equal contribution per series.
`left_censor_len`	Optional length at the start of each time series to exclude from sampling.
`right_censor_len`	Optional length at the end of each time series to exclude from sampling.
`align_on_data`	Align the start of each sampled segment to the nearest time point (useful for unevenly sampled data).
`segments_per_query`	Optional number of sampled segments to generate per SQL query.
`segments_buffer_len`	Optional number of sampled segments to buffer from the server.
`max_attempts`	Maximum attempts to successfully sample a segment before raising an exception.
`rng_seed`	RNG seed for the sampler.
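The effect of `weight_series` on how often each series is chosen can be approximated as follows. This is a sketch under assumptions: it takes "in proportion to" literally and uses the reciprocal for the inverse modes, which may differ from the library's exact formula. The function `series_probabilities` and the series keys are hypothetical.

```python
PROPORTIONAL = {"duration", "num_rows"}
INVERSE = {"inverse_duration", "inverse_num_rows"}

def series_probabilities(sizes, weight_series=None):
    """Map series key -> selection probability.

    `sizes` holds each series' duration or row count, depending on the mode.
    """
    if weight_series is None:
        raw = {key: 1.0 for key in sizes}            # equal contribution
    elif weight_series in PROPORTIONAL:
        raw = {key: float(size) for key, size in sizes.items()}
    elif weight_series in INVERSE:
        raw = {key: 1.0 / size for key, size in sizes.items()}  # assumed form
    else:
        raise ValueError(f"unknown weight_series: {weight_series!r}")
    total = sum(raw.values())
    return {key: weight / total for key, weight in raw.items()}

probs = series_probabilities({"a": 10.0, "b": 30.0}, "duration")
# probs["a"] is about 0.25 and probs["b"] about 0.75
```

Under the inverse modes the same two series swap roles, so the shorter series `a` is selected about three times as often as `b`.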

`ClassSamplingSpec`

from tempora.samplers import ClassSamplingSpec

Class-level sampling specification for classification targets.

ClassSamplingSpec(
    name: str,
    expr: str,
    weight: float
)

Parameters

Name	Description
`name`	Class identifier used for logging/debugging. Must be unique within `class_sampling`.
`expr`	SQL-compatible expression that matches all rows in the dataset containing the desired target/label.
`weight`	Normalized class sampling weight in `(0, 1]`.
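The constraints stated in the table (unique `name`, `weight` in `(0, 1]`) can be checked up front. The sketch below uses a hypothetical stand-in dataclass rather than the real `ClassSamplingSpec`, and the rule that the total must not exceed `1` is an assumption inferred from the weights being described as normalized. The `label` column in the expressions is also hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ClassSpec:
    """Hypothetical stand-in mirroring the fields of ClassSamplingSpec."""
    name: str
    expr: str
    weight: float

def validate_class_specs(specs):
    """Check names and weights; return the total explicit weight."""
    names = [spec.name for spec in specs]
    if len(names) != len(set(names)):
        raise ValueError("class names must be unique")
    for spec in specs:
        if not 0 < spec.weight <= 1:
            raise ValueError(f"weight for {spec.name!r} must be in (0, 1]")
    total = sum(spec.weight for spec in specs)
    if total > 1 + 1e-9:
        # Assumption: normalized weights cannot sum to more than 1.
        raise ValueError("class weights must not sum to more than 1")
    return total

specs = [
    ClassSpec("spike", "label = 'spike'", 0.2),
    ClassSpec("drift", "label = 'drift'", 0.3),
]
total = validate_class_specs(specs)
```

Any weight left over (here about 0.5) would correspond to the implicit background class described under `class_sampling`.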