
Sampler Classes

RandomSampler

from tempora.samplers import RandomSampler

Randomly samples batches from a dataset.

RandomSampler(
    context_len: int | float | dt.timedelta | np.timedelta64 | pd.Timedelta | Length,
    batch_size: int = 32,
    columns: list[str] | None = None,
    transform_spec: TransformSpec | None = None,
    target_spec: TargetSpec | None = None,
    class_sampling: list[ClassSamplingSpec] | None = None,
    output_format: OutputFormat = 'ndarray',
    as_tensor: bool = False,
    pad_value: int | float = np.nan,
    options: SamplerOptions = SamplerOptions()
)

Parameters

| Name | Description |
|---|---|
| context_len | Length of each sampled context window, given as a number, a time delta, or a Length object. |
| batch_size | Number of samples per batch. |
| columns | Optional list of feature columns to include. |
| transform_spec | Optional TransformSpec for context-window transforms. |
| target_spec | Optional TargetSpec (target specification) object. |
| class_sampling | Optional list of ClassSamplingSpec entries for weighted sampling of class-based targets. Each entry pairs a SQL expression that matches the rows containing a given target/label with that class's sampling weight. If the class weights sum to less than 1, the remainder is assigned to an implicit background class. |
| output_format | Batch data format (default 'ndarray'). Supported aliases: 'pytorch' ('pt'), 'tensorflow' ('tf'), 'jax' ('jx'), 'numpy' ('np'), 'pandas' ('df'), 'pyarrow' ('pa'). |
| as_tensor | Convert the output to tensor format where applicable. |
| pad_value | Value used to pad variable-length sequences (default NaN). |
| options | SamplerOptions instance controlling buffering, randomness, and related behavior. |
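The class_sampling behavior described above (explicit class weights, plus an implicit background class that absorbs any remainder below 1) can be sketched in plain Python. This is a minimal illustration, not tempora's implementation: class_distribution, draw_classes and the BACKGROUND label are hypothetical names, rejecting weights that sum above 1 is an assumption, and the SQL expressions are not evaluated at all. The seed simply reuses the documented rng_seed default.

```python
import random

# Hypothetical label for the implicit background class (not a tempora name).
BACKGROUND = "<background>"


def class_distribution(class_weights):
    """Map {class name: weight} to sampling probabilities, adding an implicit
    background class that receives any weight left over below 1."""
    total = sum(class_weights.values())
    if total > 1 + 1e-9:  # assumption: totals above 1 are rejected
        raise ValueError(f"class weights sum to {total:.3f}; expected <= 1")
    dist = dict(class_weights)
    remainder = 1.0 - total
    if remainder > 1e-9:  # ignore floating-point dust
        dist[BACKGROUND] = remainder
    return dist


def draw_classes(class_weights, n, seed=201174):
    """Pick a target class for each of n segments, proportional to weight."""
    dist = class_distribution(class_weights)
    rng = random.Random(seed)
    return rng.choices(list(dist), weights=list(dist.values()), k=n)
```

For example, draw_classes({"fault": 0.3, "warning": 0.2}, 8) returns eight labels, roughly half of them background on average. In a real sampler each drawn class would then be resolved to a concrete segment by matching rows against that class's expr; that step depends on the backend and is omitted here.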

SequentialSampler

from tempora.samplers import SequentialSampler

Samples sequential batches from each entity/series.

SequentialSampler(
    context_len: int | float | dt.timedelta | np.timedelta64 | pd.Timedelta | Length,
    batch_size: int = 32,
    columns: list[str] | None = None,
    transform_spec: TransformSpec | None = None,
    target_spec: TargetSpec | None = None,
    output_format: OutputFormat = 'ndarray',
    as_tensor: bool = False,
    pad_value: int | float = np.nan,
    options: SamplerOptions = SamplerOptions(),
    random_start: bool = False,
    random_end: bool = False
)

Parameters

| Name | Description |
|---|---|
| context_len | Length of each sampled context window, given as a number, a time delta, or a Length object. |
| batch_size | Number of samples per batch. |
| columns | Optional list of feature columns to include. |
| transform_spec | Optional TransformSpec for context-window transforms. |
| target_spec | Optional TargetSpec (target specification) object. |
| output_format | Batch data format (default 'ndarray'). Supported aliases: 'pytorch' ('pt'), 'tensorflow' ('tf'), 'jax' ('jx'), 'numpy' ('np'), 'pandas' ('df'), 'pyarrow' ('pa'). |
| as_tensor | Convert the output to tensor format where applicable. |
| pad_value | Value used to pad variable-length sequences (default NaN). |
| options | SamplerOptions instance controlling buffering, randomness, and related behavior. |
| random_start | If True, randomize the start offset within each series. |
| random_end | If True, randomize the end offset within each series. |
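One way to picture SequentialSampler's random_start and random_end options is the sketch below, which walks a single series in consecutive, non-overlapping windows. It rests on assumptions not stated in this reference: offsets are counted in rows, random_start shifts the first window by fewer than context_len rows, and random_end trims fewer than context_len rows from the end. The function sequential_windows and its allow_partial flag (a stand-in for SamplerOptions.allow_partial_segments) are hypothetical.

```python
import random


def sequential_windows(n_rows, context_len, random_start=False,
                       random_end=False, allow_partial=True, rng=None):
    """Return (start, stop) row ranges that walk one series in order.

    Hypothetical sketch: offsets are in rows and windows do not overlap.
    """
    if rng is None:
        rng = random.Random(201174)
    # Optional random shift of the first window, in [0, context_len).
    start = rng.randrange(context_len) if random_start else 0
    # Optional random trim of the series end, in [0, context_len).
    stop_limit = n_rows - (rng.randrange(context_len) if random_end else 0)
    windows = []
    pos = start
    while pos < stop_limit:
        stop = min(pos + context_len, stop_limit)
        if stop - pos < context_len and not allow_partial:
            break  # drop a trailing window shorter than context_len
        windows.append((pos, stop))
        pos += context_len
    return windows
```

With the defaults, sequential_windows(10, 4) yields [(0, 4), (4, 8), (8, 10)], where the last window is partial. A likely motivation for randomizing the offsets is that successive epochs then do not cut each series at identical window boundaries.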

SeriesSampler

from tempora.samplers import SeriesSampler

Samples batches from a dataset. Unlike RandomSampler and SequentialSampler, it does not take a context_len argument.

SeriesSampler(
    batch_size: int = 32,
    columns: list[str] | None = None,
    transform_spec: TransformSpec | None = None,
    target_spec: SeriesTargetSpec | None = None,
    output_format: OutputFormat = 'ndarray',
    as_tensor: bool = False,
    pad_value: int | float = np.nan,
    options: SamplerOptions = SamplerOptions()
)

Parameters

| Name | Description |
|---|---|
| batch_size | Number of samples per batch. |
| columns | Optional list of feature columns to include. |
| transform_spec | Optional TransformSpec for context-window transforms. |
| target_spec | Optional SeriesTargetSpec (series-level target specification) object. |
| output_format | Batch data format (default 'ndarray'). Supported aliases: 'pytorch' ('pt'), 'tensorflow' ('tf'), 'jax' ('jx'), 'numpy' ('np'), 'pandas' ('df'), 'pyarrow' ('pa'). |
| as_tensor | Convert the output to tensor format where applicable. |
| pad_value | Value used to pad variable-length sequences (default NaN). |
| options | SamplerOptions instance controlling buffering, randomness, and related behavior. |
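The pad_value parameter matters because series of different lengths must be stacked into a rectangular batch. The following is a minimal sketch of that idea using plain lists; pad_batch is a hypothetical helper, and right-padding is an assumption, since this reference does not say on which side padding is applied.

```python
import math


def pad_batch(series_batch, pad_value=math.nan):
    """Right-pad each series to the batch's longest length.

    Returns (padded, mask), where mask marks real (non-padded) positions.
    """
    max_len = max((len(s) for s in series_batch), default=0)
    padded, mask = [], []
    for s in series_batch:
        n_pad = max_len - len(s)
        padded.append(list(s) + [pad_value] * n_pad)
        mask.append([True] * len(s) + [False] * n_pad)
    return padded, mask
```

The mask is returned alongside the data because, with the default NaN pad value, padding cannot be told apart from genuinely missing (NaN) observations by inspecting the values alone.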

SamplerOptions

from tempora.samplers import SamplerOptions

Options for batch samplers.

SamplerOptions(
    use_table_cache: bool = True,
    incremental_table_update: bool = True,
    allow_partial_segments: bool = True,
    allow_null_entity_keys: bool = False,
    weight_series: Literal['duration', 'inverse_duration', 'num_rows', 'inverse_num_rows'] | None = None,
    left_censor_len: int | float | dt.timedelta | np.timedelta64 | pd.Timedelta | Length | None = None,
    right_censor_len: int | float | dt.timedelta | np.timedelta64 | pd.Timedelta | Length | None = None,
    align_on_data: bool = False,
    segments_per_query: int | None = None,
    segments_buffer_len: int | None = None,
    max_attempts: int = 1000,
    rng_seed: int = 201174
)

Parameters

| Name | Description |
|---|---|
| use_table_cache | If True, use the cached sampling table for the dataset; otherwise, compute and cache a new table before sampling. |
| incremental_table_update | Build the sampling table incrementally during sampling instead of computing it upfront (supported only for datasets partitioned on entity_keys). |
| allow_partial_segments | Allow sampling segments shorter than the sampler context length (for example, at series boundaries). |
| allow_null_entity_keys | Allow sampling segments where one or more entity key columns are null. |
| weight_series | Optional series-level sample weighting. 'duration' or 'num_rows' sample each series in proportion to its duration or row count (longer series yield more segments); 'inverse_duration' or 'inverse_num_rows' favor shorter series; None (default) gives each series approximately equal weight. |
| left_censor_len | Optional length at the start of each time series to exclude from sampling. |
| right_censor_len | Optional length at the end of each time series to exclude from sampling. |
| align_on_data | Align the start of each sampled segment to the nearest observed time point (useful for unevenly sampled data). |
| segments_per_query | Optional number of sampled segments to generate per SQL query. |
| segments_buffer_len | Optional number of sampled segments to buffer from the server. |
| max_attempts | Maximum number of attempts to sample a valid segment before raising an exception. |
| rng_seed | Seed for the sampler's random number generator. |
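Three of these options describe simple computations that can be sketched directly: the per-series weights selected by weight_series, the window of allowed start times left after left_censor_len and right_censor_len, and the snapping of a start time performed by align_on_data. The helpers below (series_weights, valid_start_range, align_start) are hypothetical and only illustrate the documented semantics under stated assumptions: durations and row counts are treated identically, lengths are positive, times are plain numbers, and a start is valid only if a full context_len segment fits.

```python
import bisect


def series_weights(lengths, mode=None):
    """Normalized per-series sampling probabilities for a weight_series mode.

    lengths: positive per-series durations or row counts.
    """
    if mode is None:  # approximately equal contribution per series
        raw = [1.0] * len(lengths)
    elif mode in ("duration", "num_rows"):  # longer series sampled more
        raw = [float(x) for x in lengths]
    elif mode in ("inverse_duration", "inverse_num_rows"):  # favor shorter
        raw = [1.0 / x for x in lengths]
    else:
        raise ValueError(f"unknown weight_series mode: {mode!r}")
    total = sum(raw)
    return [w / total for w in raw]


def valid_start_range(t_first, t_last, context_len,
                      left_censor_len=0, right_censor_len=0):
    """Allowed [lo, hi] segment start times after censoring, or None when
    no full-length segment fits (assumption: partial segments excluded)."""
    lo = t_first + left_censor_len
    hi = t_last - right_censor_len - context_len
    return (lo, hi) if lo <= hi else None


def align_start(timestamps, t):
    """Snap a proposed start time t to the nearest observed timestamp.

    timestamps must be sorted and non-empty; ties go to the earlier one.
    """
    i = bisect.bisect_left(timestamps, t)
    candidates = timestamps[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda ts: abs(ts - t))
```

For instance, with weight_series='num_rows' two series of 10 and 30 rows receive probabilities 0.25 and 0.75, while 'inverse_num_rows' reverses them.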

ClassSamplingSpec

from tempora.samplers import ClassSamplingSpec

Class-level sampling specification for classification targets.

ClassSamplingSpec(
    name: str,
    expr: str,
    weight: float
)

Parameters

| Name | Description |
|---|---|
| name | Class identifier used for logging/debugging. Must be unique within class_sampling. |
| expr | SQL-compatible expression that matches all rows in the dataset containing the desired target/label. |
| weight | Normalized class sampling weight in (0, 1]. |
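The constraints listed above (unique names, weights in (0, 1]) can be checked before a sampler is built. The sketch below uses a hypothetical dataclass, ClassSpecSketch, that mirrors the ClassSamplingSpec signature; it is not the tempora class. Rejecting a total weight above 1 is an assumption inferred from the word "normalized"; the function returns the weight left for the implicit background class.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassSpecSketch:
    """Hypothetical stand-in mirroring ClassSamplingSpec(name, expr, weight)."""
    name: str
    expr: str
    weight: float

    def __post_init__(self):
        # Documented constraint: weight must lie in (0, 1]; NaN also fails.
        if not (0.0 < self.weight <= 1.0):
            raise ValueError(f"{self.name}: weight {self.weight} not in (0, 1]")


def check_class_sampling(specs):
    """Validate a class_sampling list; return the implicit background weight."""
    names = [s.name for s in specs]
    if len(names) != len(set(names)):
        raise ValueError("class names must be unique within class_sampling")
    total = math.fsum(s.weight for s in specs)
    if total > 1.0 + 1e-9:  # assumption: the total may not exceed 1
        raise ValueError(f"total class weight {total} exceeds 1")
    return max(0.0, 1.0 - total)
```

For example, two specs weighted 0.3 and 0.2 leave 0.5 for the background class.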