
Data Sources

Files & Object Stores

from tempora.datasets import FSDataset

Class: FSDataset.

File system or object store-backed dataset.

FSDataset(
    source: str | Path | list[str | Path],
    file_format: Literal['parquet', 'csv', 'json', 'feather', 'orc'] | None = None,
    file_schema: pa.Schema | None = None,
    columns: list[str] | None = None,
    read_options: ReadOptions | None = None,
    parse_options: ParseOptions | None = None,
    convert_options: ConvertOptions | None = None,
    schema_inf_depth: float | None = None,
    partitioning: Literal['hive'] | Partitioning | None = None,
    fs_config: dict[str, Any] | None = None,
    dataset_name: str | None = None,
    time_column: str | None = None,
    entity_keys: list[str] | None = None,
    pivot: Pivot | None = None,
    materialize: bool = False,
    targets: bool = False
)

Parameters

source: File path(s). Use s3://, gcs://, or hdfs:// prefixes for remote storage, and the server: prefix for datasets stored on the Tempora server.
file_format: File format, or None to infer it from the file extensions.
file_schema: PyArrow Schema for the dataset.
columns: Column subset to load.
read_options: ReadOptions object.
parse_options: ParseOptions object.
convert_options: ConvertOptions object.
schema_inf_depth: MiB of data to use for schema inference (default 8 MiB).
partitioning: Partitioning configuration ('hive' or a Partitioning object).
fs_config: File system configuration passed to the underlying PyArrow filesystem (e.g. S3FileSystem, GcsFileSystem, HadoopFileSystem, or LocalFileSystem).
dataset_name: Optional dataset name.
time_column: Name of the time/sequence column (optional).
entity_keys: Primary key column(s) identifying entities.
pivot: Optional Pivot settings.
materialize: Whether to materialize the dataset on the server.
targets: Whether this dataset contains target data.
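A minimal construction sketch. The file paths, bucket, and column names below are placeholders, not part of the API:

```python
from tempora.datasets import FSDataset

# Local Parquet file; the format is inferred from the extension.
prices = FSDataset(
    "data/prices.parquet",
    time_column="timestamp",
    entity_keys=["ticker"],
)

# Remote CSV files on S3, with an explicit format and a column subset.
trades = FSDataset(
    ["s3://my-bucket/trades/2024.csv", "s3://my-bucket/trades/2025.csv"],
    file_format="csv",
    columns=["timestamp", "ticker", "price"],
    time_column="timestamp",
    entity_keys=["ticker"],
)
```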

BigQuery

from tempora.datasets import BigQueryDataset

Class: BigQueryDataset.

BigQuery-backed dataset.

You must first open a BigQuery connection using connect_bigquery before creating a BigQueryDataset.

BigQueryDataset(
    dataset: str,
    table: str,
    project: str | None = None,
    time_column: str | None = None,
    entity_keys: list[str] | None = None,
    pivot: Pivot | None = None,
    materialize: bool = False,
    targets: bool = False
)

Parameters

dataset: BigQuery dataset name.
table: Table name within the dataset.
project: Optional GCP project override. If omitted, the active BigQuery connection's project is used.
time_column: Name of the time/sequence column (optional).
entity_keys: Primary key column(s) identifying entities.
pivot: Optional Pivot settings.
materialize: Whether to materialize the dataset on the server.
targets: Whether this dataset contains target data.
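A minimal construction sketch, assuming a BigQuery connection has already been opened with connect_bigquery. The dataset, table, and column names are placeholders:

```python
from tempora.datasets import BigQueryDataset

# Requires an open BigQuery connection; see connect_bigquery.
events = BigQueryDataset(
    dataset="analytics",
    table="events",
    time_column="event_time",
    entity_keys=["user_id"],
)
```

Pass project only when the table lives outside the connection's default project.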

ClickHouse

from tempora.datasets import ClickhouseDataset

Class: ClickhouseDataset.

ClickHouse-backed dataset.

You must first open a ClickHouse connection using connect_clickhouse before creating a ClickhouseDataset.

ClickhouseDataset(
    database: str,
    table: str,
    time_column: str | None = None,
    entity_keys: list[str] | None = None,
    pivot: Pivot | None = None,
    materialize: bool = False,
    targets: bool = False
)

Parameters

database: ClickHouse database name.
table: Table name within the database.
time_column: Name of the time/sequence column (optional).
entity_keys: Primary key column(s) identifying entities.
pivot: Optional Pivot settings.
materialize: Whether to materialize the dataset on the server.
targets: Whether this dataset contains target data.

Databricks

from tempora.datasets import DatabricksDataset

Class: DatabricksDataset.

Databricks-backed dataset.

You must first open a Databricks connection using connect_databricks before creating a DatabricksDataset.

DatabricksDataset(
    schema: str,
    table: str,
    catalog: str | None = None,
    time_column: str | None = None,
    entity_keys: list[str] | None = None,
    pivot: Pivot | None = None,
    materialize: bool = False,
    targets: bool = False
)

Parameters

schema: Databricks schema name.
table: Table name within the schema.
catalog: Optional catalog name. If omitted, the connection's default catalog is used.
time_column: Name of the time/sequence column (optional).
entity_keys: Primary key column(s) identifying entities.
pivot: Optional Pivot settings.
materialize: Whether to materialize the dataset on the server.
targets: Whether this dataset contains target data.
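A minimal construction sketch, assuming a Databricks connection has already been opened with connect_databricks. The catalog, schema, table, and column names are placeholders:

```python
from tempora.datasets import DatabricksDataset

# Requires an open Databricks connection; see connect_databricks.
sensor_readings = DatabricksDataset(
    schema="iot",
    table="sensor_readings",
    catalog="prod",  # omit to use the connection's default catalog
    time_column="reading_time",
    entity_keys=["device_id"],
)
```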

InfluxDB

from tempora.datasets import InfluxDBDataset

Class: InfluxDBDataset.

InfluxDB-backed dataset.

You must first open an InfluxDB connection using connect_influxdb before creating an InfluxDBDataset.

Limitations: joins, pivots, and materialization are not currently supported.

InfluxDBDataset(
    database: str,
    table: str,
    time_column: str | None = None,
    entity_keys: list[str] | None = None,
    targets: bool = False
)

Parameters

database: InfluxDB database name.
table: Table name within the database.
time_column: Name of the time/sequence column (optional).
entity_keys: Primary key column(s) identifying entities.
targets: Whether this dataset contains target data.
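A minimal construction sketch, assuming an InfluxDB connection has already been opened with connect_influxdb. The database, table, and column names are placeholders:

```python
from tempora.datasets import InfluxDBDataset

# Requires an open InfluxDB connection; see connect_influxdb.
# Note there are no pivot or materialize arguments: joins, pivots,
# and materialization are not supported for InfluxDB-backed datasets.
metrics = InfluxDBDataset(
    database="telemetry",
    table="cpu",
    time_column="time",
    entity_keys=["host"],
)
```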

MotherDuck

from tempora.datasets import MotherDuckDataset

Class: MotherDuckDataset.

MotherDuck-backed dataset.

You must first open a MotherDuck connection using connect_motherduck before creating a MotherDuckDataset.

MotherDuckDataset(
    database: str,
    table: str,
    schema: str | None = None,
    time_column: str | None = None,
    entity_keys: list[str] | None = None,
    pivot: Pivot | None = None,
    materialize: bool = False,
    targets: bool = False
)

Parameters

database: MotherDuck database name.
table: Table name within the database.
schema: Optional schema name.
time_column: Name of the time/sequence column (optional).
entity_keys: Primary key column(s) identifying entities.
pivot: Optional Pivot settings.
materialize: Whether to materialize the dataset on the server.
targets: Whether this dataset contains target data.

MSSQL

from tempora.datasets import MSSQLDataset

Class: MSSQLDataset.

MSSQL-backed dataset.

You must first open an MSSQL connection using connect_mssql before creating an MSSQLDataset.

MSSQLDataset(
    database: str,
    table: str,
    schema: str | None = None,
    time_column: str | None = None,
    entity_keys: list[str] | None = None,
    pivot: Pivot | None = None,
    materialize: bool = False,
    targets: bool = False
)

Parameters

database: MSSQL database name.
table: Table name within the database.
schema: Optional schema name.
time_column: Name of the time/sequence column (optional).
entity_keys: Primary key column(s) identifying entities.
pivot: Optional Pivot settings.
materialize: Whether to materialize the dataset on the server.
targets: Whether this dataset contains target data.

MySQL

from tempora.datasets import MySQLDataset

Class: MySQLDataset.

MySQL-backed dataset.

You must first open a MySQL connection using connect_mysql before creating a MySQLDataset.

MySQLDataset(
    database: str,
    table: str,
    schema: str | None = None,
    time_column: str | None = None,
    entity_keys: list[str] | None = None,
    pivot: Pivot | None = None,
    materialize: bool = False,
    targets: bool = False
)

Parameters

database: MySQL database name.
table: Table name within the database.
schema: Optional schema name.
time_column: Name of the time/sequence column (optional).
entity_keys: Primary key column(s) identifying entities.
pivot: Optional Pivot settings.
materialize: Whether to materialize the dataset on the server.
targets: Whether this dataset contains target data.

Oracle

from tempora.datasets import OracleDataset

Class: OracleDataset.

Oracle-backed dataset.

You must first open an Oracle connection using connect_oracle before creating an OracleDataset.

OracleDataset(
    database: str,
    table: str,
    schema: str | None = None,
    time_column: str | None = None,
    entity_keys: list[str] | None = None,
    pivot: Pivot | None = None,
    materialize: bool = False,
    targets: bool = False
)

Parameters

database: Oracle database or service name.
table: Table name within the database.
schema: Optional schema name.
time_column: Name of the time/sequence column (optional).
entity_keys: Primary key column(s) identifying entities.
pivot: Optional Pivot settings.
materialize: Whether to materialize the dataset on the server.
targets: Whether this dataset contains target data.

PostgreSQL

from tempora.datasets import PostgreSQLDataset

Class: PostgreSQLDataset.

PostgreSQL-backed dataset.

You must first open a PostgreSQL connection using connect_postgresql before creating a PostgreSQLDataset.

PostgreSQLDataset(
    table: str,
    schema: str | None = None,
    time_column: str | None = None,
    entity_keys: list[str] | None = None,
    pivot: Pivot | None = None,
    materialize: bool = False,
    targets: bool = False
)

Parameters

table: PostgreSQL table name.
schema: Optional schema name.
time_column: Name of the time/sequence column (optional).
entity_keys: Primary key column(s) identifying entities.
pivot: Optional Pivot settings.
materialize: Whether to materialize the dataset on the server.
targets: Whether this dataset contains target data.
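A minimal construction sketch, assuming a PostgreSQL connection has already been opened with connect_postgresql. Unlike most of the SQL-backed datasets, there is no database argument: the database is fixed by the connection. The table, schema, and column names are placeholders:

```python
from tempora.datasets import PostgreSQLDataset

# Requires an open PostgreSQL connection; see connect_postgresql.
orders = PostgreSQLDataset(
    table="orders",
    schema="public",  # optional
    time_column="created_at",
    entity_keys=["customer_id"],
)
```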

Redshift

from tempora.datasets import RedshiftDataset

Class: RedshiftDataset.

Redshift-backed dataset.

You must first open a Redshift connection using connect_redshift before creating a RedshiftDataset.

RedshiftDataset(
    table: str,
    schema: str | None = None,
    time_column: str | None = None,
    entity_keys: list[str] | None = None,
    pivot: Pivot | None = None,
    materialize: bool = False,
    targets: bool = False
)

Parameters

table: Redshift table name.
schema: Optional schema name.
time_column: Name of the time/sequence column (optional).
entity_keys: Primary key column(s) identifying entities.
pivot: Optional Pivot settings.
materialize: Whether to materialize the dataset on the server.
targets: Whether this dataset contains target data.

Snowflake

from tempora.datasets import SnowflakeDataset

Class: SnowflakeDataset.

Snowflake-backed dataset.

You must first open a Snowflake connection using connect_snowflake before creating a SnowflakeDataset.

SnowflakeDataset(
    database: str,
    table: str,
    schema: str | None = None,
    time_column: str | None = None,
    entity_keys: list[str] | None = None,
    pivot: Pivot | None = None,
    materialize: bool = False,
    targets: bool = False
)

Parameters

database: Snowflake database name.
table: Table name within the database or schema.
schema: Optional schema name. If omitted, Snowflake's current schema rules apply.
time_column: Name of the time/sequence column (optional).
entity_keys: Primary key column(s) identifying entities.
pivot: Optional Pivot settings.
materialize: Whether to materialize the dataset on the server.
targets: Whether this dataset contains target data.
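A minimal construction sketch, assuming a Snowflake connection has already been opened with connect_snowflake. The database, schema, table, and column names are placeholders (shown uppercase here only because Snowflake stores unquoted identifiers that way):

```python
from tempora.datasets import SnowflakeDataset

# Requires an open Snowflake connection; see connect_snowflake.
sales = SnowflakeDataset(
    database="ANALYTICS",
    schema="PUBLIC",  # omit to fall back to Snowflake's current-schema rules
    table="DAILY_SALES",
    time_column="SALE_DATE",
    entity_keys=["STORE_ID"],
)
```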

Teradata

from tempora.datasets import TeradataDataset

Class: TeradataDataset.

Teradata-backed dataset.

You must first open a Teradata connection using connect_teradata before creating a TeradataDataset.

TeradataDataset(
    database: str,
    table: str,
    time_column: str | None = None,
    entity_keys: list[str] | None = None,
    pivot: Pivot | None = None,
    materialize: bool = False,
    targets: bool = False
)

Parameters

database: Teradata database name.
table: Table name within the database.
time_column: Name of the time/sequence column (optional).
entity_keys: Primary key column(s) identifying entities.
pivot: Optional Pivot settings.
materialize: Whether to materialize the dataset on the server.
targets: Whether this dataset contains target data.