# Data Sources

## Files & Object Stores

```python
from tempora.datasets import FSDataset
```

Class: `FSDataset` — a file system or object store-backed dataset.

```python
FSDataset(
    source: str | Path | list[str | Path],
    file_format: Literal['parquet', 'csv', 'json', 'feather', 'orc'] | None = None,
    file_schema: pa.Schema | None = None,
    columns: list[str] | None = None,
    read_options: ReadOptions | None = None,
    parse_options: ParseOptions | None = None,
    convert_options: ConvertOptions | None = None,
    schema_inf_depth: float | None = None,
    partitioning: Literal['hive'] | Partitioning | None = None,
    fs_config: dict[str, Any] | None = None,
    dataset_name: str | None = None,
    time_column: str | None = None,
    entity_keys: list[str] | None = None,
    pivot: Pivot | None = None,
    materialize: bool = False,
    targets: bool = False
)
```

### Parameters

| Name | Description |
|---|---|
| `source` | File path(s). Use `s3://`, `gcs://`, or `hdfs://` prefixes for remote storage, and the `server:` prefix for datasets stored on the Tempora server. |
| `file_format` | File format, or `None` to infer it from file extensions. |
| `file_schema` | PyArrow `Schema` for the dataset. |
| `columns` | Column subset to load. |
| `read_options` | `ReadOptions` object. |
| `parse_options` | `ParseOptions` object. |
| `convert_options` | `ConvertOptions` object. |
| `schema_inf_depth` | MiB of data to use for schema inference (default 8 MiB). |
| `partitioning` | Partitioning configuration (`'hive'` or a `Partitioning` object). |
| `fs_config` | File system configuration passed to PyArrow's `LocalFileSystem`, `GcsFileSystem`, or `HadoopFileSystem`. |
| `dataset_name` | Optional dataset name. |
| `time_column` | Name of the time/sequence column (optional). |
| `entity_keys` | Primary key column(s) identifying entities. |
| `pivot` | Optional `Pivot` settings. |
| `materialize` | Whether to materialize the dataset on the server. |
| `targets` | Whether this dataset contains target data. |
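For illustration, a minimal sketch of a file-backed dataset; the bucket, paths, and column names below are placeholders, not a real layout:

```python
from tempora.datasets import FSDataset

# Hive-partitioned Parquet files on S3; the bucket and column names
# ("ts", "symbol") are placeholders for this sketch.
trades = FSDataset(
    source="s3://example-bucket/trades/",
    file_format="parquet",
    partitioning="hive",
    time_column="ts",
    entity_keys=["symbol"],
)
```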
## BigQuery

```python
from tempora.datasets import BigQueryDataset
```

Class: `BigQueryDataset` — a BigQuery-backed dataset.

You must first open a BigQuery connection using `connect_bigquery` before creating a `BigQueryDataset`.

```python
BigQueryDataset(
    dataset: str,
    table: str,
    project: str | None = None,
    time_column: str | None = None,
    entity_keys: list[str] | None = None,
    pivot: Pivot | None = None,
    materialize: bool = False,
    targets: bool = False
)
```

### Parameters

| Name | Description |
|---|---|
| `dataset` | BigQuery dataset name. |
| `table` | Table name within the dataset. |
| `project` | Optional GCP project override. If omitted, the active BigQuery connection's project is used. |
| `time_column` | Name of the time/sequence column (optional). |
| `entity_keys` | Primary key column(s) identifying entities. |
| `pivot` | Optional `Pivot` settings. |
| `materialize` | Whether to materialize the dataset on the server. |
| `targets` | Whether this dataset contains target data. |
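A minimal sketch, assuming `connect_bigquery` has already been called as described above; the dataset, table, and column names are placeholders:

```python
from tempora.datasets import BigQueryDataset

# Assumes an open BigQuery connection; identifiers are placeholders.
sales = BigQueryDataset(
    dataset="analytics",
    table="daily_sales",
    time_column="sale_date",
    entity_keys=["store_id"],
)
```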
## ClickHouse

```python
from tempora.datasets import ClickhouseDataset
```

Class: `ClickhouseDataset` — a ClickHouse-backed dataset.

You must first open a ClickHouse connection using `connect_clickhouse` before creating a `ClickhouseDataset`.

```python
ClickhouseDataset(
    database: str,
    table: str,
    time_column: str | None = None,
    entity_keys: list[str] | None = None,
    pivot: Pivot | None = None,
    materialize: bool = False,
    targets: bool = False
)
```

### Parameters

| Name | Description |
|---|---|
| `database` | ClickHouse database name. |
| `table` | Table name within the database. |
| `time_column` | Name of the time/sequence column (optional). |
| `entity_keys` | Primary key column(s) identifying entities. |
| `pivot` | Optional `Pivot` settings. |
| `materialize` | Whether to materialize the dataset on the server. |
| `targets` | Whether this dataset contains target data. |
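A minimal sketch, assuming an open `connect_clickhouse` connection; database, table, and column names are placeholders:

```python
from tempora.datasets import ClickhouseDataset

# Assumes an open ClickHouse connection; identifiers are placeholders.
events = ClickhouseDataset(
    database="default",
    table="page_events",
    time_column="event_time",
    entity_keys=["user_id"],
)
```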
## Databricks

```python
from tempora.datasets import DatabricksDataset
```

Class: `DatabricksDataset` — a Databricks-backed dataset.

You must first open a Databricks connection using `connect_databricks` before creating a `DatabricksDataset`.

```python
DatabricksDataset(
    schema: str,
    table: str,
    catalog: str | None = None,
    time_column: str | None = None,
    entity_keys: list[str] | None = None,
    pivot: Pivot | None = None,
    materialize: bool = False,
    targets: bool = False
)
```

### Parameters

| Name | Description |
|---|---|
| `schema` | Databricks schema name. |
| `table` | Table name within the schema. |
| `catalog` | Optional catalog name. If omitted, the connection's default catalog is used. |
| `time_column` | Name of the time/sequence column (optional). |
| `entity_keys` | Primary key column(s) identifying entities. |
| `pivot` | Optional `Pivot` settings. |
| `materialize` | Whether to materialize the dataset on the server. |
| `targets` | Whether this dataset contains target data. |
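A minimal sketch with an explicit catalog override, assuming an open `connect_databricks` connection; all identifiers are placeholders:

```python
from tempora.datasets import DatabricksDataset

# Assumes an open Databricks connection; catalog/schema/table are placeholders.
orders = DatabricksDataset(
    schema="retail",
    table="orders",
    catalog="prod",            # omit to use the connection's default catalog
    time_column="order_ts",
    entity_keys=["customer_id"],
)
```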
## InfluxDB

```python
from tempora.datasets import InfluxDBDataset
```

Class: `InfluxDBDataset` — an InfluxDB-backed dataset.

You must first open an InfluxDB connection using `connect_influxdb` before creating an `InfluxDBDataset`.

Limitations: joins, pivots, and materialization are not currently supported.

```python
InfluxDBDataset(
    database: str,
    table: str,
    time_column: str | None = None,
    entity_keys: list[str] | None = None,
    targets: bool = False
)
```

### Parameters

| Name | Description |
|---|---|
| `database` | InfluxDB database name. |
| `table` | Table name within the database. |
| `time_column` | Name of the time/sequence column (optional). |
| `entity_keys` | Primary key column(s) identifying entities. |
| `targets` | Whether this dataset contains target data. |
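A minimal sketch, assuming an open `connect_influxdb` connection; note that, per the limitations above, there are no `pivot` or `materialize` parameters here. Identifiers are placeholders:

```python
from tempora.datasets import InfluxDBDataset

# Assumes an open InfluxDB connection; identifiers are placeholders.
readings = InfluxDBDataset(
    database="telemetry",
    table="sensor_readings",
    time_column="time",
    entity_keys=["sensor_id"],
)
```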
## MotherDuck

```python
from tempora.datasets import MotherDuckDataset
```

Class: `MotherDuckDataset` — a MotherDuck-backed dataset.

You must first open a MotherDuck connection using `connect_motherduck` before creating a `MotherDuckDataset`.

```python
MotherDuckDataset(
    database: str,
    table: str,
    schema: str | None = None,
    time_column: str | None = None,
    entity_keys: list[str] | None = None,
    pivot: Pivot | None = None,
    materialize: bool = False,
    targets: bool = False
)
```

### Parameters

| Name | Description |
|---|---|
| `database` | MotherDuck database name. |
| `table` | Table name within the database. |
| `schema` | Optional schema name. |
| `time_column` | Name of the time/sequence column (optional). |
| `entity_keys` | Primary key column(s) identifying entities. |
| `pivot` | Optional `Pivot` settings. |
| `materialize` | Whether to materialize the dataset on the server. |
| `targets` | Whether this dataset contains target data. |
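A minimal sketch, assuming an open `connect_motherduck` connection; identifiers are placeholders:

```python
from tempora.datasets import MotherDuckDataset

# Assumes an open MotherDuck connection; identifiers are placeholders.
metrics = MotherDuckDataset(
    database="my_db",
    table="daily_metrics",
    schema="main",             # optional; omit to use the default schema
    time_column="day",
    entity_keys=["account_id"],
)
```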
## MSSQL

```python
from tempora.datasets import MSSQLDataset
```

Class: `MSSQLDataset` — an MSSQL-backed dataset.

You must first open an MSSQL connection using `connect_mssql` before creating an `MSSQLDataset`.

```python
MSSQLDataset(
    database: str,
    table: str,
    schema: str | None = None,
    time_column: str | None = None,
    entity_keys: list[str] | None = None,
    pivot: Pivot | None = None,
    materialize: bool = False,
    targets: bool = False
)
```

### Parameters

| Name | Description |
|---|---|
| `database` | MSSQL database name. |
| `table` | Table name within the database. |
| `schema` | Optional schema name. |
| `time_column` | Name of the time/sequence column (optional). |
| `entity_keys` | Primary key column(s) identifying entities. |
| `pivot` | Optional `Pivot` settings. |
| `materialize` | Whether to materialize the dataset on the server. |
| `targets` | Whether this dataset contains target data. |
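A minimal sketch, assuming an open `connect_mssql` connection; identifiers are placeholders (`dbo` shown as a typical SQL Server schema):

```python
from tempora.datasets import MSSQLDataset

# Assumes an open MSSQL connection; identifiers are placeholders.
shipments = MSSQLDataset(
    database="logistics",
    table="shipments",
    schema="dbo",
    time_column="shipped_at",
    entity_keys=["shipment_id"],
)
```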
## MySQL

```python
from tempora.datasets import MySQLDataset
```

Class: `MySQLDataset` — a MySQL-backed dataset.

You must first open a MySQL connection using `connect_mysql` before creating a `MySQLDataset`.

```python
MySQLDataset(
    database: str,
    table: str,
    schema: str | None = None,
    time_column: str | None = None,
    entity_keys: list[str] | None = None,
    pivot: Pivot | None = None,
    materialize: bool = False,
    targets: bool = False
)
```

### Parameters

| Name | Description |
|---|---|
| `database` | MySQL database name. |
| `table` | Table name within the database. |
| `schema` | Optional schema name. |
| `time_column` | Name of the time/sequence column (optional). |
| `entity_keys` | Primary key column(s) identifying entities. |
| `pivot` | Optional `Pivot` settings. |
| `materialize` | Whether to materialize the dataset on the server. |
| `targets` | Whether this dataset contains target data. |
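A minimal sketch, assuming an open `connect_mysql` connection; identifiers are placeholders:

```python
from tempora.datasets import MySQLDataset

# Assumes an open MySQL connection; identifiers are placeholders.
sessions = MySQLDataset(
    database="webapp",
    table="user_sessions",
    time_column="started_at",
    entity_keys=["user_id"],
)
```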
## Oracle

```python
from tempora.datasets import OracleDataset
```

Class: `OracleDataset` — an Oracle-backed dataset.

You must first open an Oracle connection using `connect_oracle` before creating an `OracleDataset`.

```python
OracleDataset(
    database: str,
    table: str,
    schema: str | None = None,
    time_column: str | None = None,
    entity_keys: list[str] | None = None,
    pivot: Pivot | None = None,
    materialize: bool = False,
    targets: bool = False
)
```

### Parameters

| Name | Description |
|---|---|
| `database` | Oracle database or service name. |
| `table` | Table name within the database. |
| `schema` | Optional schema name. |
| `time_column` | Name of the time/sequence column (optional). |
| `entity_keys` | Primary key column(s) identifying entities. |
| `pivot` | Optional `Pivot` settings. |
| `materialize` | Whether to materialize the dataset on the server. |
| `targets` | Whether this dataset contains target data. |
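A minimal sketch, assuming an open `connect_oracle` connection; the service, schema, table, and column names are placeholders:

```python
from tempora.datasets import OracleDataset

# Assumes an open Oracle connection; identifiers are placeholders.
invoices = OracleDataset(
    database="ORCLPDB1",
    table="INVOICES",
    schema="FINANCE",
    time_column="INVOICE_DATE",
    entity_keys=["INVOICE_ID"],
)
```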
## PostgreSQL

```python
from tempora.datasets import PostgreSQLDataset
```

Class: `PostgreSQLDataset` — a PostgreSQL-backed dataset.

You must first open a PostgreSQL connection using `connect_postgresql` before creating a `PostgreSQLDataset`.

```python
PostgreSQLDataset(
    table: str,
    schema: str | None = None,
    time_column: str | None = None,
    entity_keys: list[str] | None = None,
    pivot: Pivot | None = None,
    materialize: bool = False,
    targets: bool = False
)
```

### Parameters

| Name | Description |
|---|---|
| `table` | PostgreSQL table name. |
| `schema` | Optional schema name. |
| `time_column` | Name of the time/sequence column (optional). |
| `entity_keys` | Primary key column(s) identifying entities. |
| `pivot` | Optional `Pivot` settings. |
| `materialize` | Whether to materialize the dataset on the server. |
| `targets` | Whether this dataset contains target data. |
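A minimal sketch, assuming an open `connect_postgresql` connection (unlike most other sources, the constructor takes no database argument); identifiers are placeholders:

```python
from tempora.datasets import PostgreSQLDataset

# Assumes an open PostgreSQL connection; identifiers are placeholders.
temps = PostgreSQLDataset(
    table="hourly_temps",
    schema="public",           # optional; the PostgreSQL default schema
    time_column="observed_at",
    entity_keys=["station_id"],
)
```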
## Redshift

```python
from tempora.datasets import RedshiftDataset
```

Class: `RedshiftDataset` — a Redshift-backed dataset.

You must first open a Redshift connection using `connect_redshift` before creating a `RedshiftDataset`.

```python
RedshiftDataset(
    table: str,
    schema: str | None = None,
    time_column: str | None = None,
    entity_keys: list[str] | None = None,
    pivot: Pivot | None = None,
    materialize: bool = False,
    targets: bool = False
)
```

### Parameters

| Name | Description |
|---|---|
| `table` | Redshift table name. |
| `schema` | Optional schema name. |
| `time_column` | Name of the time/sequence column (optional). |
| `entity_keys` | Primary key column(s) identifying entities. |
| `pivot` | Optional `Pivot` settings. |
| `materialize` | Whether to materialize the dataset on the server. |
| `targets` | Whether this dataset contains target data. |
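A minimal sketch, assuming an open `connect_redshift` connection (like PostgreSQL, no database argument is taken); identifiers are placeholders:

```python
from tempora.datasets import RedshiftDataset

# Assumes an open Redshift connection; identifiers are placeholders.
clicks = RedshiftDataset(
    table="ad_clicks",
    time_column="clicked_at",
    entity_keys=["campaign_id"],
)
```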
## Snowflake

```python
from tempora.datasets import SnowflakeDataset
```

Class: `SnowflakeDataset` — a Snowflake-backed dataset.

You must first open a Snowflake connection using `connect_snowflake` before creating a `SnowflakeDataset`.

```python
SnowflakeDataset(
    database: str,
    table: str,
    schema: str | None = None,
    time_column: str | None = None,
    entity_keys: list[str] | None = None,
    pivot: Pivot | None = None,
    materialize: bool = False,
    targets: bool = False
)
```

### Parameters

| Name | Description |
|---|---|
| `database` | Snowflake database name. |
| `table` | Table name within the database or schema. |
| `schema` | Optional schema name. If omitted, Snowflake's current-schema rules apply. |
| `time_column` | Name of the time/sequence column (optional). |
| `entity_keys` | Primary key column(s) identifying entities. |
| `pivot` | Optional `Pivot` settings. |
| `materialize` | Whether to materialize the dataset on the server. |
| `targets` | Whether this dataset contains target data. |
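A minimal sketch, assuming an open `connect_snowflake` connection; identifiers are placeholders (upper-cased as is conventional for Snowflake):

```python
from tempora.datasets import SnowflakeDataset

# Assumes an open Snowflake connection; identifiers are placeholders.
returns = SnowflakeDataset(
    database="ANALYTICS",
    table="PRODUCT_RETURNS",
    schema="PUBLIC",           # omit to rely on the current-schema rules
    time_column="RETURN_DATE",
    entity_keys=["ORDER_ID"],
)
```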
## Teradata

```python
from tempora.datasets import TeradataDataset
```

Class: `TeradataDataset` — a Teradata-backed dataset.

You must first open a Teradata connection using `connect_teradata` before creating a `TeradataDataset`.

```python
TeradataDataset(
    database: str,
    table: str,
    time_column: str | None = None,
    entity_keys: list[str] | None = None,
    pivot: Pivot | None = None,
    materialize: bool = False,
    targets: bool = False
)
```

### Parameters

| Name | Description |
|---|---|
| `database` | Teradata database name. |
| `table` | Table name within the database. |
| `time_column` | Name of the time/sequence column (optional). |
| `entity_keys` | Primary key column(s) identifying entities. |
| `pivot` | Optional `Pivot` settings. |
| `materialize` | Whether to materialize the dataset on the server. |
| `targets` | Whether this dataset contains target data. |
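A minimal sketch, assuming an open `connect_teradata` connection; identifiers are placeholders:

```python
from tempora.datasets import TeradataDataset

# Assumes an open Teradata connection; identifiers are placeholders.
usage = TeradataDataset(
    database="billing",
    table="meter_usage",
    time_column="reading_ts",
    entity_keys=["meter_id"],
)
```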