Dataset Methods & Properties

Full reference for dataset instance methods and shared properties.

Methods

Name	Description
`df`	Alias for `to_pandas`.
`drop`	Drop the dataset from the server.
`filter`	Filter the dataset.
`head`	Return the first `n` rows.
`join`	Join with another dataset.
`np`	Alias for `to_numpy`.
`to_arrow`	Return the dataset as a PyArrow `Table` (materialized) or `RecordBatchReader`.
`to_numpy`	Return the dataset as a NumPy array (if possible).
`to_pandas`	Return the dataset as a pandas DataFrame.
`write_dataset`	Write the dataset to a file system using `pyarrow.dataset.write_dataset`.

Properties

Name	Description
`data_schema`	PyArrow `Schema` for the dataset (fetched from server).
`fs_type`	File system type for `FSDataset` (`'server'`, `'s3'`, `'gcs'`, `'hdfs'`, `'local'`).
`num_rows`	Number of rows in the dataset.

`df`

df(
    use_nullable_dtypes: bool = False
) -> pandas.DataFrame

Alias for `to_pandas`.

Parameter Name	Description
`use_nullable_dtypes`	If True, use pandas nullable extension dtypes (for example `Int64` or `boolean`) instead of the default NumPy-backed dtypes.

`drop`

drop() -> None

Drop the dataset from the server.

`filter`

filter(
    ts_filter: str | None = None,
    /,
    *,
    columns: list[str] | None = None,
    materialize: bool = False
) -> FilteredDataset

Filter the dataset.

Parameter Name	Description
`ts_filter`	SQL `WHERE`-style filter expression. Positional-only.
`columns`	Subset of columns to include in the result. Keyword-only.
`materialize`	Materialize the filtered result on the server where supported.
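
As a rough illustration of what a `WHERE`-style filter combined with a column subset selects, the sketch below uses Python's built-in `sqlite3` module as a stand-in for the server. The table name `trades`, its columns, and the filter string are hypothetical, and the exact syntax accepted by `ts_filter` is not specified in this reference.

```python
import sqlite3

# In-memory table standing in for a server-side dataset (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (ts TEXT, symbol TEXT, price REAL)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    [
        ("2024-01-01", "AAA", 10.0),
        ("2024-01-02", "BBB", 25.0),
        ("2024-01-03", "AAA", 30.0),
    ],
)

where_clause = "price > 20 AND symbol = 'AAA'"  # plays the role of ts_filter
columns = ["ts", "price"]                       # plays the role of columns

# Only hard-coded, trusted strings are interpolated into the query here.
query = f"SELECT {', '.join(columns)} FROM trades WHERE {where_clause}"
rows = conn.execute(query).fetchall()
conn.close()

print(rows)
```

Only the rows that satisfy the condition are returned, and only the requested columns appear in each row.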

`head`

head(
    n: int = 10,
    *,
    as_arrow: bool = False
) -> pandas.DataFrame | pyarrow.Table

Return the first n rows.

Parameter Name	Description
`n`	Number of rows to return (default 10).
`as_arrow`	If True, return a PyArrow `Table` instead of a pandas DataFrame.

`join`

join(
    dataset: Dataset,
    join_condition: str | list[str],
    *,
    asof_join: bool = False,
    direction: str = 'forward',
    allow_exact_matches: bool = True
) -> Dataset

Join with another dataset.

ASOF joins must be the final join in a join chain and cannot be followed by regular SQL joins.

InfluxDB phase 1 does not support joins.

Parameter Name	Description
`dataset`	Dataset to join.
`join_condition`	List of column names to join on, or an SQL-style join condition string.
`asof_join`	ASOF join on time columns. Must be the final join in a join chain.
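`direction`	Match direction for an ASOF join: `'forward'` (default) or `'backward'`.
`allow_exact_matches`	If True (default), rows with equal time values may match. If False, exact matches are excluded.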
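
The `direction` and `allow_exact_matches` parameters resemble those of `pandas.merge_asof`, so the sketch below uses that function to show how such options typically change an ASOF match. This is an analogy, not this library's implementation: server-side semantics may differ, and the frames and column names are made up for illustration. Note that `pandas.merge_asof` defaults to `direction="backward"`, whereas `join` defaults to `'forward'`.

```python
import pandas as pd

# Two small, time-sorted frames (illustrative data).
trades = pd.DataFrame({"ts": [1, 5, 10], "trade": ["a", "b", "c"]})
quotes = pd.DataFrame({"ts": [1, 6, 9], "quote": [100, 101, 102]})

# Forward: take the first quote at or after each trade time.
fwd = pd.merge_asof(trades, quotes, on="ts", direction="forward")

# Backward: take the last quote at or before each trade time.
bwd = pd.merge_asof(trades, quotes, on="ts", direction="backward")

# Forward without exact matches: the quote must be strictly later.
fwd_strict = pd.merge_asof(
    trades, quotes, on="ts", direction="forward", allow_exact_matches=False
)

print(fwd)
print(bwd)
print(fwd_strict)
```

The trade at `ts=10` has no later quote, so its forward match is missing. The trade at `ts=1` matches the quote at `ts=1` only while exact matches are allowed.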

`np`

np() -> numpy.ndarray

Alias for `to_numpy`.

`to_arrow`

to_arrow(
    materialize: bool = True
) -> pyarrow.Table | pyarrow.RecordBatchReader

Return the dataset as a PyArrow `Table` (materialized) or `RecordBatchReader`.

Parameter Name	Description
`materialize`	Return `Table` if True, otherwise `RecordBatchReader`.

`to_numpy`

to_numpy() -> numpy.ndarray

Return the dataset as a NumPy array (if possible).

`to_pandas`

to_pandas(
    use_nullable_dtypes: bool = False
) -> pandas.DataFrame

Return the dataset as a pandas DataFrame.

Parameter Name	Description
`use_nullable_dtypes`	If True, use pandas nullable extension dtypes (for example `Int64` or `boolean`) instead of the default NumPy-backed dtypes.
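
To show why nullable dtypes matter, the sketch below uses plain pandas, independent of this library. An integer column with a missing value falls back to `float64` by default, while the nullable `Int64` dtype keeps integers and marks the gap as `pd.NA`. The data is illustrative, and the conversion `to_pandas` performs internally is an assumption, not documented here.

```python
import pandas as pd

# Default behaviour: a missing value forces the integers to float64 (NaN).
default_col = pd.Series([1, None, 3])

# Nullable behaviour: integers are preserved and the gap becomes pd.NA.
nullable_col = pd.Series([1, None, 3], dtype="Int64")

print(default_col.dtype)
print(nullable_col.dtype)
print(nullable_col.isna().sum())
```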

`write_dataset`

write_dataset(*args, **kwargs) -> None

Write the dataset to a file system using `pyarrow.dataset.write_dataset`.

Parameter Name	Description
`*args`	Positional arguments passed to `pyarrow.dataset.write_dataset`.
`**kwargs`	Keyword arguments passed to `pyarrow.dataset.write_dataset`.