Transforms
TransformFunction
```python
from tempora.samplers.transform_spec import TransformFunction
```
Callable protocol for transforming context-window or target-window data.
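```python
from typing import Literal, Protocol, TypedDict

import pandas as pd


# Describes the window passed to a transform.
class TransformMetadata(TypedDict):
    columns: list[str]
    source: Literal['context', 'target']
    time_column: str | None
    entity_keys: list[str] | None


# Any callable with this signature satisfies the protocol.
class TransformFunction(Protocol):
    def __call__(
        self, data: pd.DataFrame, metadata: TransformMetadata,
    ) -> pd.DataFrame:
        ...
```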
Parameters

| Name | Description |
|---|---|
| `data` | Input window data as a pandas DataFrame. |
| `metadata` | Metadata describing the window (`columns`, `source`, `time_column`, `entity_keys`). |
Returns

| Type | Description |
|---|---|
| `pd.DataFrame` | Transformed output data. |
Notes:
- Function must be serializable for client/server transport.
- `source` is `'context'` for sampler transforms and `'target'` for target-spec transforms.
TransformSpec
```python
from tempora.samplers.transform_spec import TransformSpec
```
Serializable transform specification used by samplers and target specifications.
```python
TransformSpec(
    transform: TransformFunction,
    output_schema: pa.Schema | None = None,
    on_schema_mismatch: Literal['raise', 'coerce', 'skip'] = 'raise'
)
```
Parameters

| Name | Description |
|---|---|
| `transform` | A `TransformFunction` callable. |
| `output_schema` | Optional PyArrow schema used to validate or infer the schema of the transformed output. |
| `on_schema_mismatch` | Policy applied when the transformed schema differs from the expected schema: `'raise'` raises an exception; `'coerce'` attempts a column-wise cast and sets uncastable values to null (if any required column is missing, it falls back to `'skip'` for that segment); `'skip'` skips the example and logs a warning. |
LabelEncoder
```python
from tempora.samplers.transform_spec import LabelEncoder
```
Built-in target transform that encodes target-window values into numeric class labels.
```python
LabelEncoder(
    reduction: Literal['first', 'last', 'mode', 'unique'] = 'last',
    class_map: dict[Any, int] | None = None,
    unknown_policy: Literal['raise', 'skip', 'use_default'] = 'skip',
    default_label: int | None = None
)
```
Parameters

| Name | Description |
|---|---|
| `reduction` | Reduction applied to the target-window values before label encoding. |
| `class_map` | Explicit mapping from raw labels to encoded integer labels; required for string targets. |
| `unknown_policy` | How unknown labels are handled: `'raise'`, `'skip'`, or `'use_default'`. |
| `default_label` | Encoded label to use when `unknown_policy` is `'use_default'`. |
Notes:
- `LabelEncoder` requires the target column dtype to be bool, integer, or string (including dictionary-encoded variants).
- String and float/decimal target columns are permitted only when `class_map` is provided.
- `class_map` values must be unique integers.
- Datetime-like target columns (for example, timestamp/date) are not supported by `LabelEncoder`.
EventEncoder
```python
from tempora.samplers.transform_spec import EventEncoder
```
Built-in target transform that encodes date/datetime event targets as binary labels indicating whether an event falls within the target window.
```python
EventEncoder(
    positive_label: int = 1,
    negative_label: int = 0,
    include_start: bool = True,
    include_end: bool = False,
    missing_policy: Literal['negative', 'skip', 'raise'] = 'negative'
)
```
Parameters

| Name | Description |
|---|---|
| `positive_label` | Label assigned when an event occurs within the target window bounds. |
| `negative_label` | Label assigned when no event occurs within the target window bounds. |
| `include_start` | Whether the window start bound is inclusive when checking event timestamps. |
| `include_end` | Whether the window end bound is inclusive when checking event timestamps. |
| `missing_policy` | Handling of missing or invalid event values: `'negative'`, `'skip'`, or `'raise'`. |
Notes:
- `EventEncoder` requires the target column dtype to be date or datetime (including dictionary-encoded variants).