Transforms

TransformFunction

from tempora.samplers.transform_spec import TransformFunction

Callable protocol for transforming context-window or target-window data.

from typing import Literal, Protocol, TypedDict

import pandas as pd

class TransformMetadata(TypedDict):
    columns: list[str]
    source: Literal['context', 'target']
    time_column: str | None
    entity_keys: list[str] | None

class TransformFunction(Protocol):
    def __call__(
        self, data: pd.DataFrame, metadata: TransformMetadata,
    ) -> pd.DataFrame:
        ...

Parameters

Name Description
data Input window data as a pandas DataFrame.
metadata Metadata describing the window (columns, source, time_column, entity_keys).

Returns

Name Description
pd.DataFrame Transformed output data.

Notes:

  • The function must be serializable for client/server transport.
  • source is 'context' for sampler transforms and 'target' for target-spec transforms.
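
As a minimal sketch of a function with the call shape described above, the example below centers numeric value columns while leaving the time column and entity keys untouched. The name center_numeric and the sample data are hypothetical, and metadata is typed as a plain mapping so the snippet runs without tempora installed. It is written as a module-level function on the assumption that such functions serialize more readily than lambdas or closures; this section does not say which serializer is used.

```python
from typing import Any, Mapping

import pandas as pd


def center_numeric(data: pd.DataFrame, metadata: Mapping[str, Any]) -> pd.DataFrame:
    """Subtract the column mean from each numeric value column (hypothetical example)."""
    out = data.copy()  # work on a copy so the caller's window is not mutated
    # Columns that identify rows rather than carry values are left alone.
    protected = set(metadata.get("entity_keys") or [])
    if metadata.get("time_column"):
        protected.add(metadata["time_column"])
    for col in metadata["columns"]:
        if col in protected or col not in out.columns:
            continue
        is_numeric = pd.api.types.is_numeric_dtype(out[col])
        is_bool = pd.api.types.is_bool_dtype(out[col])
        if is_numeric and not is_bool:
            out[col] = out[col] - out[col].mean()
    return out


# Hypothetical context window and metadata, shaped like TransformMetadata.
window = pd.DataFrame({"ts": [1, 2, 3], "user": ["a", "a", "a"], "value": [1.0, 2.0, 3.0]})
meta = {"columns": ["ts", "user", "value"], "source": "context",
        "time_column": "ts", "entity_keys": ["user"]}
centered = center_numeric(window, meta)
```

Returning a new DataFrame instead of modifying data in place keeps the transform free of side effects, which is the safer choice when the same window may be reused.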

TransformSpec

from tempora.samplers.transform_spec import TransformSpec

Serializable transform specification used by samplers and target specifications.

TransformSpec(
    transform: TransformFunction,
    output_schema: pa.Schema | None = None,
    on_schema_mismatch: Literal['raise', 'coerce', 'skip'] = 'raise'
)

Parameters

Name Description
transform Callable that implements the TransformFunction protocol.
output_schema Optional PyArrow schema for transformed output validation/inference.
on_schema_mismatch Policy applied when the transformed schema differs from the expected schema: 'raise' (raise an exception), 'coerce' (attempt a column-wise cast and set uncastable values to null; if any required column is missing, fall through to 'skip' for that example), or 'skip' (skip the example and log a warning).
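
The three policies can be illustrated with a toy model in plain Python. This is only a sketch of the behavior described above, not tempora's implementation: the schema is a hypothetical dict from column name to Python type (standing in for a PyArrow schema), every schema column is assumed to be required, and apply_policy, SchemaMismatch, and SKIP are names invented for the example. Dropping extra columns under 'coerce' is also an assumption.

```python
import logging
from typing import Any, Dict

logger = logging.getLogger(__name__)
SKIP = object()  # sentinel: the caller should drop this example


class SchemaMismatch(Exception):
    """Raised under the 'raise' policy (hypothetical exception type)."""


def apply_policy(row: Dict[str, Any], schema: Dict[str, type], policy: str = "raise"):
    """Toy model: `schema` maps column name -> Python type; all columns are required."""
    conforms = set(row) == set(schema) and all(
        isinstance(row[name], typ) for name, typ in schema.items()
    )
    if conforms:
        return row
    if policy == "raise":
        raise SchemaMismatch(f"expected columns {sorted(schema)}, got {sorted(row)}")
    if policy == "coerce" and all(name in row for name in schema):
        coerced = {}
        for name, typ in schema.items():
            try:
                coerced[name] = typ(row[name])  # column-wise cast
            except (TypeError, ValueError):
                coerced[name] = None  # uncastable value becomes null
        return coerced  # assumption: columns outside the schema are dropped
    if policy not in ("coerce", "skip"):
        raise ValueError(f"unknown policy: {policy!r}")
    # 'skip', or 'coerce' with a missing required column: drop the example.
    logger.warning("schema mismatch, skipping example: %r", row)
    return SKIP


schema = {"x": int, "y": float}
coerced_row = apply_policy({"x": "3", "y": "oops"}, schema, "coerce")
missing_col = apply_policy({"x": "3"}, schema, "coerce")
```

Here coerced_row keeps the castable x and nulls the uncastable y, while missing_col falls through to the skip path because y is absent.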

LabelEncoder

from tempora.samplers.transform_spec import LabelEncoder

Built-in target transform that encodes target-window values into numeric class labels.

LabelEncoder(
    reduction: Literal['first', 'last', 'mode', 'unique'] = 'last',
    class_map: dict[Any, int] | None = None,
    unknown_policy: Literal['raise', 'skip', 'use_default'] = 'skip',
    default_label: int | None = None
)

Parameters

Name Description
reduction Reduction applied to target-window values before label encoding: 'first', 'last', 'mode', or 'unique'.
class_map Explicit mapping from raw labels to encoded integer labels; required for string and float/decimal targets.
unknown_policy How unknown labels are handled: 'raise', 'skip', or 'use_default'.
default_label Encoded label to use when unknown_policy is 'use_default'.

Notes:

  • Without class_map, LabelEncoder requires the target column dtype to be bool or integer (including dictionary-encoded variants).
  • String and float/decimal target columns are permitted only when class_map is provided.
  • class_map values must be unique integers.
  • Datetime-like target columns (for example timestamp / date) are not supported by LabelEncoder.
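
The interaction between reduction, class_map, and unknown_policy can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: encode_label, reduce_values, validate_class_map, and SKIP are hypothetical names, the window is modeled as a simple list of raw values, and the 'unique' reduction is left out because this section does not define its semantics.

```python
from collections import Counter
from typing import Any, Dict, Optional, Sequence

SKIP = object()  # sentinel: the caller should drop this example


def validate_class_map(class_map: Dict[Any, int]) -> None:
    """Enforce the documented rule that class_map values are unique integers."""
    values = list(class_map.values())
    if not all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        raise ValueError("class_map values must be integers")
    if len(set(values)) != len(values):
        raise ValueError("class_map values must be unique")


def reduce_values(values: Sequence[Any], reduction: str = "last") -> Any:
    """Collapse the target window to one raw value ('unique' is not modeled here)."""
    if not values:
        raise ValueError("empty target window")
    if reduction == "first":
        return values[0]
    if reduction == "last":
        return values[-1]
    if reduction == "mode":
        return Counter(values).most_common(1)[0][0]
    raise ValueError(f"reduction not modeled in this sketch: {reduction!r}")


def encode_label(values: Sequence[Any], class_map: Dict[Any, int],
                 reduction: str = "last", unknown_policy: str = "skip",
                 default_label: Optional[int] = None):
    validate_class_map(class_map)
    raw = reduce_values(values, reduction)
    if raw in class_map:
        return class_map[raw]
    if unknown_policy == "raise":
        raise KeyError(f"unknown label: {raw!r}")
    if unknown_policy == "use_default":
        if default_label is None:
            raise ValueError("default_label is required with 'use_default'")
        return default_label
    return SKIP  # 'skip': drop the example


class_map = {"churned": 1, "active": 0}
label = encode_label(["active", "churned", "churned"], class_map, reduction="mode")
```

With the 'mode' reduction, the most frequent raw value ("churned") is selected first and then mapped through class_map.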

EventEncoder

from tempora.samplers.transform_spec import EventEncoder

Built-in target transform that encodes event date/datetime targets into binary in-window labels.

EventEncoder(
    positive_label: int = 1,
    negative_label: int = 0,
    include_start: bool = True,
    include_end: bool = False,
    missing_policy: Literal['negative', 'skip', 'raise'] = 'negative'
)

Parameters

Name Description
positive_label Label assigned when an event occurs within the target window bounds.
negative_label Label assigned when no event occurs within the target window bounds.
include_start Whether to include the window start bound when checking event timestamps.
include_end Whether to include the window end bound when checking event timestamps.
missing_policy Handling for missing/invalid event values: 'negative', 'skip', or 'raise'.

Notes:

  • EventEncoder requires the target column dtype to be date or datetime (including dictionary-encoded variants).
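
The boundary handling controlled by include_start and include_end can be sketched as follows. This is a plain-Python illustration, not the library's implementation: encode_event and SKIP are hypothetical names, a missing event is modeled as None, and it is assumed that missing_policy='negative' assigns negative_label.

```python
from datetime import datetime
from typing import Optional

SKIP = object()  # sentinel: the caller should drop this example


def encode_event(event_time: Optional[datetime], window_start: datetime,
                 window_end: datetime, positive_label: int = 1,
                 negative_label: int = 0, include_start: bool = True,
                 include_end: bool = False, missing_policy: str = "negative"):
    """Return positive_label if event_time lies inside the target window."""
    if event_time is None:
        if missing_policy == "negative":
            return negative_label  # assumption: missing means "no event"
        if missing_policy == "skip":
            return SKIP
        raise ValueError("missing event value")
    # Each bound is inclusive or exclusive depending on its flag.
    after_start = event_time >= window_start if include_start else event_time > window_start
    before_end = event_time <= window_end if include_end else event_time < window_end
    return positive_label if (after_start and before_end) else negative_label


start, end = datetime(2024, 1, 1), datetime(2024, 1, 8)
at_start = encode_event(start, start, end)  # start bound is inclusive by default
at_end = encode_event(end, start, end)      # end bound is exclusive by default
```

With the defaults, the window behaves as the half-open interval [start, end), so consecutive target windows do not count the same event twice.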