
Dataset Methods & Properties

Full reference for dataset instance methods and shared properties.

Methods

Name            Description
df              Alias for to_pandas.
drop            Drop the dataset from the server.
filter          Filter rows with a SQL WHERE-style condition and optionally select columns.
head            Return the first n rows.
join            Join with another dataset.
np              Alias for to_numpy.
to_arrow        Return the dataset as a PyArrow Table (materialized) or RecordBatchReader.
to_numpy        Return the dataset as a NumPy array (if possible).
to_pandas       Return the dataset as a pandas DataFrame.
write_dataset   Write the dataset to a file system using pyarrow.dataset.write_dataset.

Properties

Name          Description
data_schema   PyArrow Schema for the dataset (fetched from the server).
fs_type       File system type for FSDataset: 'server', 's3', 'gcs', 'hdfs', or 'local'.
num_rows      Number of rows in the dataset.

df

df(
    use_nullable_dtypes: bool = False
) -> pandas.DataFrame

Alias for to_pandas.

Parameter             Description
use_nullable_dtypes   Use pandas nullable dtypes if True.

drop

drop() -> None

Drop the dataset from the server.

filter

filter(
    ts_filter: str | None = None,
    /,
    *,
    columns: list[str] | None = None,
    materialize: bool = False
) -> FilteredDataset

Filter the dataset.

Parameter     Description
ts_filter     SQL WHERE-style filter expression (positional-only).
columns       Subset of columns to keep.
materialize   Materialize the filtered result on the server where supported.
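Because `ts_filter` is described as a SQL WHERE-style filter, a call such as `ds.filter("sensor = 'a' AND value > 1", columns=["ts", "value"])` can be read as the SELECT it stands for. The sketch below assumes that reading and uses the standard-library `sqlite3` module as a stand-in for the server; `build_filter_query`, the `readings` table, and its columns are hypothetical, and the server's actual SQL dialect may differ.

```python
import sqlite3


def build_filter_query(table, ts_filter=None, columns=None):
    """Hypothetical helper: the SELECT that filter(ts_filter, columns=...) is read as."""
    # Without a column subset, every column is kept.
    cols = ", ".join(f'"{c}"' for c in columns) if columns else "*"
    query = f'SELECT {cols} FROM "{table}"'
    # The WHERE-style filter string is appended verbatim as the predicate.
    if ts_filter:
        query += f" WHERE {ts_filter}"
    return query


# In-memory database standing in for the server (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "readings" (ts INTEGER, sensor TEXT, value REAL)')
conn.executemany(
    'INSERT INTO "readings" VALUES (?, ?, ?)',
    [(1, "a", 0.5), (2, "b", 1.5), (3, "a", 2.5)],
)

query = build_filter_query("readings", "sensor = 'a' AND value > 1", columns=["ts", "value"])
rows = conn.execute(query).fetchall()
print(query)
print(rows)  # [(3, 2.5)]
```

The predicate is spliced in as raw text, so in this sketch only filter strings you control should be passed.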

head

head(
    n: int = 10,
    *,
    as_arrow: bool = False
) -> pandas.DataFrame | pyarrow.Table

Return the first n rows.

Parameter   Description
n           Number of rows to return.
as_arrow    Return a PyArrow Table instead of a pandas DataFrame if True.

join

join(
    dataset: Dataset,
    join_condition: str | list[str],
    *,
    asof_join: bool = False,
    direction: str = 'forward',
    allow_exact_matches: bool = True
) -> Dataset

Join with another dataset.

ASOF joins must be the final join in a join chain and cannot be followed by regular SQL joins.

InfluxDB phase 1 does not support joins.

Parameter             Description
dataset               Dataset to join with.
join_condition        List of column names or a SQL-style join condition.
asof_join             Perform an ASOF join on time columns. Must be the final join in a join chain.
direction             ASOF match direction: 'forward' or 'backward'.
allow_exact_matches   If False, exact time matches are not allowed in an ASOF join.
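The `direction` and `allow_exact_matches` parameters appear to follow the same semantics as pandas `merge_asof`; that correspondence is an assumption. The pure-Python sketch below matches each left timestamp to at most one right row: 'backward' picks the latest right time at or before the left time, 'forward' picks the earliest right time at or after it, and `allow_exact_matches=False` turns both into strict inequalities. `asof_match` is a hypothetical helper, not part of this API.

```python
from bisect import bisect_left, bisect_right


def asof_match(left_times, right_times, direction="forward", allow_exact_matches=True):
    """Hypothetical helper: for each left time, the index of the matched right time, or None.

    right_times must be sorted in ascending order.
    """
    matches = []
    for t in left_times:
        if direction == "backward":
            # Last right time <= t (or < t when exact matches are disallowed).
            i = bisect_right(right_times, t) if allow_exact_matches else bisect_left(right_times, t)
            matches.append(i - 1 if i > 0 else None)
        elif direction == "forward":
            # First right time >= t (or > t when exact matches are disallowed).
            i = bisect_left(right_times, t) if allow_exact_matches else bisect_right(right_times, t)
            matches.append(i if i < len(right_times) else None)
        else:
            raise ValueError("direction must be 'forward' or 'backward'")
    return matches


right = [1, 5, 10]
left = [0, 5, 7, 12]
print(asof_match(left, right))                             # [0, 1, 2, None]
print(asof_match(left, right, allow_exact_matches=False))  # [0, 2, 2, None]
print(asof_match(left, right, direction="backward"))       # [None, 1, 1, 2]
```

The first call uses the same defaults that `join` documents (`direction='forward'`, `allow_exact_matches=True`); note that pandas `merge_asof` defaults to 'backward' instead.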

np

np() -> numpy.ndarray

Alias for to_numpy.

to_arrow

to_arrow(
    materialize: bool = True
) -> pyarrow.Table | pyarrow.RecordBatchReader

Return the dataset as a PyArrow Table (materialized) or RecordBatchReader.

Parameter     Description
materialize   Return a Table if True, otherwise a RecordBatchReader.
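The `materialize` flag trades memory for streaming: a Table holds every row at once, while a RecordBatchReader yields record batches one at a time. The pure-Python sketch below models that contract with lists of integers standing in for record batches; `fetch_batches` and `to_arrow_like` are hypothetical and only illustrate the behaviour, not this library's implementation.

```python
from typing import Iterator, List, Union

Batch = List[int]  # stand-in for a pyarrow.RecordBatch


def fetch_batches(n_rows: int, batch_size: int) -> Iterator[Batch]:
    """Hypothetical server stream: yields rows in fixed-size batches."""
    for start in range(0, n_rows, batch_size):
        yield list(range(start, min(start + batch_size, n_rows)))


def to_arrow_like(n_rows: int, batch_size: int, materialize: bool = True) -> Union[Batch, Iterator[Batch]]:
    """Hypothetical analogue of to_arrow(materialize=...)."""
    batches = fetch_batches(n_rows, batch_size)
    if materialize:
        # Like a Table: concatenate every batch into one in-memory object.
        return [row for batch in batches for row in batch]
    # Like a RecordBatchReader: return the lazy iterator; nothing is read yet.
    return batches


table = to_arrow_like(5, 2)
print(table)  # [0, 1, 2, 3, 4]

reader = to_arrow_like(5, 2, materialize=False)
for batch in reader:
    print(batch)  # [0, 1] then [2, 3] then [4]
```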

to_numpy

to_numpy() -> numpy.ndarray

Return the dataset as a NumPy array (if possible).

to_pandas

to_pandas(
    use_nullable_dtypes: bool = False
) -> pandas.DataFrame

Return the dataset as a pandas DataFrame.

Parameter             Description
use_nullable_dtypes   Use pandas nullable dtypes if True.

write_dataset

write_dataset(*args, **kwargs) -> None

Write the dataset to a file system using pyarrow.dataset.write_dataset.

Parameter   Description
*args       Positional arguments passed to pyarrow.dataset.write_dataset.
**kwargs    Keyword arguments passed to pyarrow.dataset.write_dataset.
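`pyarrow.dataset.write_dataset` takes the data as its first parameter and `base_dir` as its second. A plausible reading of this method, which is an assumption, is that it supplies the dataset's own data first and forwards the caller's arguments unchanged. The sketch below shows that forwarding pattern with a stub writer standing in for pyarrow; `StubDataset` and `record_call` are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

calls: List[Tuple[tuple, Dict[str, Any]]] = []


def record_call(*args: Any, **kwargs: Any) -> None:
    """Stub standing in for pyarrow.dataset.write_dataset; records what it receives."""
    calls.append((args, kwargs))


class StubDataset:
    """Hypothetical dataset that forwards write_dataset arguments to a writer."""

    def __init__(self, data: Any, writer: Callable[..., None] = record_call) -> None:
        self._data = data
        self._writer = writer

    def write_dataset(self, *args: Any, **kwargs: Any) -> None:
        # Assumed behaviour: the dataset's own data goes first, then the
        # caller's positional and keyword arguments, unchanged.
        self._writer(self._data, *args, **kwargs)


ds = StubDataset(data=[{"x": 1}])
ds.write_dataset("out/", format="parquet")
print(calls)  # [(([{'x': 1}], 'out/'), {'format': 'parquet'})]
```

If that assumption holds, `ds.write_dataset("out/", format="parquet")` on a real dataset would write Parquet files under `out/`.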