
Data Sources

Files & Object Stores

from tempora.datasets import FSDataset

Class: FSDataset.

File system or object store-backed dataset.

FSDataset(
    source: str | Path | list[str | Path],
    file_format: Literal['parquet', 'csv', 'json', 'feather', 'orc'] | None = None,
    file_schema: pa.Schema | None = None,
    columns: list[str] | None = None,
    read_options: ReadOptions | None = None,
    parse_options: ParseOptions | None = None,
    convert_options: ConvertOptions | None = None,
    schema_inf_depth: float | None = None,
    partitioning: Literal['hive'] | Partitioning | None = None,
    fs_config: dict[str, Any] | None = None,
    dataset_name: str | None = None,
    time_column: str | None = None,
    entity_keys: list[str] | None = None,
    pivot: Pivot | None = None,
    materialize: bool = False,
    targets: bool = False
)

Parameters

source: File path(s). Use s3://, gcs://, or hdfs:// prefixes for remote storage, and the server: prefix for datasets stored on the Tempora server.
file_format: File format, or None to infer it from the file extensions.
file_schema: PyArrow Schema for the dataset.
columns: Column subset to load.
read_options: ReadOptions object.
parse_options: ParseOptions object.
convert_options: ConvertOptions object.
schema_inf_depth: MiB of data to use for schema inference (default 8 MiB).
partitioning: Partitioning configuration ('hive' or a Partitioning object).
fs_config: File system configuration passed to the underlying PyArrow filesystem (e.g. S3FileSystem, GcsFileSystem, HadoopFileSystem, or LocalFileSystem).
dataset_name: Optional dataset name.
time_column: Name of the time/sequence column (optional).
entity_keys: Primary key column(s) identifying entities.
pivot: Optional Pivot settings.
materialize: Whether to materialize the dataset on the server.
targets: Whether this dataset contains target data.
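A minimal construction sketch. The file paths, bucket, and column names below are placeholders, not part of the API:

```python
from tempora.datasets import FSDataset

# Local Parquet file; the format is inferred from the extension.
prices = FSDataset(
    "data/prices.parquet",
    time_column="timestamp",
    entity_keys=["ticker"],
)

# Remote CSV files on S3, with an explicit format and a column subset.
trades = FSDataset(
    ["s3://my-bucket/trades/2024.csv", "s3://my-bucket/trades/2025.csv"],
    file_format="csv",
    columns=["timestamp", "ticker", "price"],
    time_column="timestamp",
    entity_keys=["ticker"],
)
```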

BigQuery

from tempora.datasets import BigQueryDataset

Class: BigQueryDataset.

BigQuery-backed dataset.

You must first open a BigQuery connection using connect_bigquery before creating a BigQueryDataset.

BigQueryDataset(
    dataset: str,
    table: str,
    project: str | None = None,
    time_column: str | None = None,
    entity_keys: list[str] | None = None,
    pivot: Pivot | None = None,
    materialize: bool = False,
    targets: bool = False
)

Parameters

dataset: BigQuery dataset name.
table: Table name within the dataset.
project: Optional GCP project override. If omitted, the active BigQuery connection's project is used.
time_column: Name of the time/sequence column (optional).
entity_keys: Primary key column(s) identifying entities.
pivot: Optional Pivot settings.
materialize: Whether to materialize the dataset on the server.
targets: Whether this dataset contains target data.
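A minimal construction sketch, assuming a BigQuery connection has already been opened with connect_bigquery. The dataset, table, and column names are placeholders:

```python
from tempora.datasets import BigQueryDataset

# Requires an open BigQuery connection; see connect_bigquery.
events = BigQueryDataset(
    dataset="analytics",
    table="events",
    time_column="event_time",
    entity_keys=["user_id"],
)
```

Pass project only when the table lives outside the connection's default project.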

ClickHouse

from tempora.datasets import ClickhouseDataset

Class: ClickhouseDataset.

ClickHouse-backed dataset.

You must first open a ClickHouse connection using connect_clickhouse before creating a ClickhouseDataset.

ClickhouseDataset(
    database: str,
    table: str,
    time_column: str | None = None,
    entity_keys: list[str] | None = None,
    pivot: Pivot | None = None,
    materialize: bool = False,
    targets: bool = False
)

Parameters

database: ClickHouse database name.
table: Table name within the database.
time_column: Name of the time/sequence column (optional).
entity_keys: Primary key column(s) identifying entities.
pivot: Optional Pivot settings.
materialize: Whether to materialize the dataset on the server.
targets: Whether this dataset contains target data.

Databricks

from tempora.datasets import DatabricksDataset

Class: DatabricksDataset.

Databricks-backed dataset.

You must first open a Databricks connection using connect_databricks before creating a DatabricksDataset.

DatabricksDataset(
    schema: str,
    table: str,
    catalog: str | None = None,
    time_column: str | None = None,
    entity_keys: list[str] | None = None,
    pivot: Pivot | None = None,
    materialize: bool = False,
    targets: bool = False
)

Parameters

schema: Databricks schema name.
table: Table name within the schema.
catalog: Optional catalog name. If omitted, the connection's default catalog is used.
time_column: Name of the time/sequence column (optional).
entity_keys: Primary key column(s) identifying entities.
pivot: Optional Pivot settings.
materialize: Whether to materialize the dataset on the server.
targets: Whether this dataset contains target data.
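A minimal construction sketch, assuming a Databricks connection has already been opened with connect_databricks. The catalog, schema, table, and column names are placeholders:

```python
from tempora.datasets import DatabricksDataset

# Requires an open Databricks connection; see connect_databricks.
sensor_readings = DatabricksDataset(
    schema="iot",
    table="sensor_readings",
    catalog="prod",  # omit to use the connection's default catalog
    time_column="reading_time",
    entity_keys=["device_id"],
)
```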

InfluxDB

from tempora.datasets import InfluxDBDataset

Class: InfluxDBDataset.

InfluxDB-backed dataset.

You must first open an InfluxDB connection using connect_influxdb before creating an InfluxDBDataset.

Limitations: joins, pivots, and materialization are not currently supported.

InfluxDBDataset(
    database: str,
    table: str,
    time_column: str | None = None,
    entity_keys: list[str] | None = None,
    targets: bool = False
)

Parameters

database: InfluxDB database name.
table: Table name within the database.
time_column: Name of the time/sequence column (optional).
entity_keys: Primary key column(s) identifying entities.
targets: Whether this dataset contains target data.
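A minimal construction sketch, assuming an InfluxDB connection has already been opened with connect_influxdb. The database, table, and column names are placeholders:

```python
from tempora.datasets import InfluxDBDataset

# Requires an open InfluxDB connection; see connect_influxdb.
# Note there are no pivot or materialize arguments: joins, pivots,
# and materialization are not supported for InfluxDB-backed datasets.
metrics = InfluxDBDataset(
    database="telemetry",
    table="cpu",
    time_column="time",
    entity_keys=["host"],
)
```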

MotherDuck

from tempora.datasets import MotherDuckDataset

Class: MotherDuckDataset.

MotherDuck-backed dataset.

You must first open a MotherDuck connection using connect_motherduck before creating a MotherDuckDataset.

MotherDuckDataset(
    database: str,
    table: str,
    schema: str | None = None,
    time_column: str | None = None,
    entity_keys: list[str] | None = None,
    pivot: Pivot | None = None,
    materialize: bool = False,
    targets: bool = False
)

Parameters

database: MotherDuck database name.
table: Table name within the database.
schema: Optional schema name.
time_column: Name of the time/sequence column (optional).
entity_keys: Primary key column(s) identifying entities.
pivot: Optional Pivot settings.
materialize: Whether to materialize the dataset on the server.
targets: Whether this dataset contains target data.

MSSQL

from tempora.datasets import MSSQLDataset

Class: MSSQLDataset.

MSSQL-backed dataset.

You must first open an MSSQL connection using connect_mssql before creating an MSSQLDataset.

MSSQLDataset(
    database: str,
    table: str,
    schema: str | None = None,
    time_column: str | None = None,
    entity_keys: list[str] | None = None,
    pivot: Pivot | None = None,
    materialize: bool = False,
    targets: bool = False
)

Parameters

database: MSSQL database name.
table: Table name within the database.
schema: Optional schema name.
time_column: Name of the time/sequence column (optional).
entity_keys: Primary key column(s) identifying entities.
pivot: Optional Pivot settings.
materialize: Whether to materialize the dataset on the server.
targets: Whether this dataset contains target data.

MySQL

from tempora.datasets import MySQLDataset

Class: MySQLDataset.

MySQL-backed dataset.

You must first open a MySQL connection using connect_mysql before creating a MySQLDataset.

MySQLDataset(
    database: str,
    table: str,
    schema: str | None = None,
    time_column: str | None = None,
    entity_keys: list[str] | None = None,
    pivot: Pivot | None = None,
    materialize: bool = False,
    targets: bool = False
)

Parameters

database: MySQL database name.
table: Table name within the database.
schema: Optional schema name.
time_column: Name of the time/sequence column (optional).
entity_keys: Primary key column(s) identifying entities.
pivot: Optional Pivot settings.
materialize: Whether to materialize the dataset on the server.
targets: Whether this dataset contains target data.

Oracle

from tempora.datasets import OracleDataset

Class: OracleDataset.

Oracle-backed dataset.

You must first open an Oracle connection using connect_oracle before creating an OracleDataset.

OracleDataset(
    database: str,
    table: str,
    schema: str | None = None,
    time_column: str | None = None,
    entity_keys: list[str] | None = None,
    pivot: Pivot | None = None,
    materialize: bool = False,
    targets: bool = False
)

Parameters

database: Oracle database or service name.
table: Table name within the database.
schema: Optional schema name.
time_column: Name of the time/sequence column (optional).
entity_keys: Primary key column(s) identifying entities.
pivot: Optional Pivot settings.
materialize: Whether to materialize the dataset on the server.
targets: Whether this dataset contains target data.

PostgreSQL

from tempora.datasets import PostgreSQLDataset

Class: PostgreSQLDataset.

PostgreSQL-backed dataset.

You must first open a PostgreSQL connection using connect_postgresql before creating a PostgreSQLDataset.

PostgreSQLDataset(
    table: str,
    schema: str | None = None,
    time_column: str | None = None,
    entity_keys: list[str] | None = None,
    pivot: Pivot | None = None,
    materialize: bool = False,
    targets: bool = False
)

Parameters

table: PostgreSQL table name.
schema: Optional schema name.
time_column: Name of the time/sequence column (optional).
entity_keys: Primary key column(s) identifying entities.
pivot: Optional Pivot settings.
materialize: Whether to materialize the dataset on the server.
targets: Whether this dataset contains target data.
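A minimal construction sketch, assuming a PostgreSQL connection has already been opened with connect_postgresql. Unlike most of the SQL-backed datasets, there is no database argument: the database is fixed by the connection. The table, schema, and column names are placeholders:

```python
from tempora.datasets import PostgreSQLDataset

# Requires an open PostgreSQL connection; see connect_postgresql.
orders = PostgreSQLDataset(
    table="orders",
    schema="public",  # optional
    time_column="created_at",
    entity_keys=["customer_id"],
)
```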

Redshift

from tempora.datasets import RedshiftDataset

Class: RedshiftDataset.

Redshift-backed dataset.

You must first open a Redshift connection using connect_redshift before creating a RedshiftDataset.

RedshiftDataset(
    table: str,
    schema: str | None = None,
    time_column: str | None = None,
    entity_keys: list[str] | None = None,
    pivot: Pivot | None = None,
    materialize: bool = False,
    targets: bool = False
)

Parameters

table: Redshift table name.
schema: Optional schema name.
time_column: Name of the time/sequence column (optional).
entity_keys: Primary key column(s) identifying entities.
pivot: Optional Pivot settings.
materialize: Whether to materialize the dataset on the server.
targets: Whether this dataset contains target data.

Snowflake

from tempora.datasets import SnowflakeDataset

Class: SnowflakeDataset.

Snowflake-backed dataset.

You must first open a Snowflake connection using connect_snowflake before creating a SnowflakeDataset.

SnowflakeDataset(
    database: str,
    table: str,
    schema: str | None = None,
    time_column: str | None = None,
    entity_keys: list[str] | None = None,
    pivot: Pivot | None = None,
    materialize: bool = False,
    targets: bool = False
)

Parameters

database: Snowflake database name.
table: Table name within the database or schema.
schema: Optional schema name. If omitted, Snowflake's current schema rules apply.
time_column: Name of the time/sequence column (optional).
entity_keys: Primary key column(s) identifying entities.
pivot: Optional Pivot settings.
materialize: Whether to materialize the dataset on the server.
targets: Whether this dataset contains target data.
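A minimal construction sketch, assuming a Snowflake connection has already been opened with connect_snowflake. The database, schema, table, and column names are placeholders (shown uppercase here only because Snowflake stores unquoted identifiers that way):

```python
from tempora.datasets import SnowflakeDataset

# Requires an open Snowflake connection; see connect_snowflake.
sales = SnowflakeDataset(
    database="ANALYTICS",
    schema="PUBLIC",  # omit to fall back to Snowflake's current-schema rules
    table="DAILY_SALES",
    time_column="SALE_DATE",
    entity_keys=["STORE_ID"],
)
```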

Teradata

from tempora.datasets import TeradataDataset

Class: TeradataDataset.

Teradata-backed dataset.

You must first open a Teradata connection using connect_teradata before creating a TeradataDataset.

TeradataDataset(
    database: str,
    table: str,
    time_column: str | None = None,
    entity_keys: list[str] | None = None,
    pivot: Pivot | None = None,
    materialize: bool = False,
    targets: bool = False
)

Parameters

database: Teradata database name.
table: Table name within the database.
time_column: Name of the time/sequence column (optional).
entity_keys: Primary key column(s) identifying entities.
pivot: Optional Pivot settings.
materialize: Whether to materialize the dataset on the server.
targets: Whether this dataset contains target data.