Dataset Methods & Properties
Full reference for dataset instance methods and shared properties.
Methods
| Name | Description |
|---|---|
| df | Alias for to_pandas. |
| drop | Drop the dataset from the server. |
| filter | Filter the dataset. |
| head | Return the first n rows. |
| join | Join with another dataset. |
| np | Alias for to_numpy. |
| to_arrow | Return the dataset as a PyArrow Table (materialized) or RecordBatchReader. |
| to_numpy | Return the dataset as a NumPy array (if possible). |
| to_pandas | Return the dataset as a Pandas DataFrame. |
| write_dataset | Write the dataset to a file system using PyArrow write_dataset. |
Properties
| Name | Description |
|---|---|
| data_schema | PyArrow Schema for the dataset (fetched from server). |
| fs_type | File system type for FSDataset ('server', 's3', 'gcs', 'hdfs', 'local'). |
| num_rows | Number of rows in the dataset. |
df
df(
use_nullable_dtypes: bool = False
) -> pandas.DataFrame
Alias for to_pandas.
| Parameter Name | Description |
|---|---|
| use_nullable_dtypes | Use pandas nullable dtypes if True. |
drop
drop() -> None
Drop the dataset from the server.
filter
filter(
ts_filter: str | None = None,
/,
*,
columns: list[str] | None = None,
materialize: bool = False
) -> FilteredDataset
Filter the dataset.
| Parameter Name | Description |
|---|---|
| ts_filter | SQL WHERE-style filter. |
| columns | Column subset. |
| materialize | Materialize the filtered result on the server where supported. |
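The `/` and `*` markers in the signature mean that ts_filter can only be passed positionally, while columns and materialize can only be passed by keyword. The sketch below shows that calling convention with a hypothetical stand-in function, `filter_stub`, which is not part of the library and copies only the documented parameter layout. The filter string and column names are illustrative assumptions.

```python
from typing import List, Optional


def filter_stub(
    ts_filter: Optional[str] = None,
    /,
    *,
    columns: Optional[List[str]] = None,
    materialize: bool = False,
) -> dict:
    """Hypothetical stand-in: same parameter layout as the documented filter,
    but it only echoes its arguments back instead of querying anything."""
    return {"ts_filter": ts_filter, "columns": columns, "materialize": materialize}


# Accepted: filter text passed positionally, options passed by keyword.
ok = filter_stub("price > 100", columns=["ts", "price"])

# Rejected: ts_filter is positional-only, so naming it raises TypeError.
try:
    filter_stub(ts_filter="price > 100")
    keyword_rejected = False
except TypeError:
    keyword_rejected = True

# Rejected: columns is keyword-only, so a second positional argument raises TypeError.
try:
    filter_stub("price > 100", ["ts", "price"])
    positional_rejected = False
except TypeError:
    positional_rejected = True
```

With the real method, the accepted call shape would look like `dataset.filter("...", columns=[...])`, assuming `dataset` is an existing Dataset instance.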
head
head(
n: int = 10,
*,
as_arrow: bool = False
) -> pandas.DataFrame | pyarrow.Table
Return the first n rows.
| Parameter Name | Description |
|---|---|
| n | Rows to return. |
| as_arrow | Return PyArrow Table if True. |
join
join(
dataset: Dataset,
join_condition: str | list[str],
*,
asof_join: bool = False,
direction: str = 'forward',
allow_exact_matches: bool = True
) -> Dataset
Join with another dataset.
ASOF joins must be the final join in a join chain and cannot be followed by regular SQL joins.
InfluxDB phase 1 does not support joins.
| Parameter Name | Description |
|---|---|
| dataset | Dataset to join. |
| join_condition | Column list or SQL-style condition. |
| asof_join | If True, perform an ASOF join on time columns. Must be the final join in a join chain. |
| direction | Match direction: 'forward' or 'backward'. |
| allow_exact_matches | If False, exact matches are excluded. |
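The reference does not spell out how direction and allow_exact_matches interact. The following is a minimal pure-Python sketch of ASOF matching, assuming the semantics mirror those of pandas.merge_asof: 'forward' picks the first right-hand timestamp at or after the left one, 'backward' picks the last one at or before it, and allow_exact_matches=False makes the comparison strict. The helper `asof_match` and the sample timestamps are hypothetical and not part of the library.

```python
from bisect import bisect_left, bisect_right


def asof_match(left_ts, right_ts, direction="forward", allow_exact_matches=True):
    """Return, for each left timestamp, the index of the matched right
    timestamp, or None when there is no match.

    Assumes right_ts is sorted in ascending order.
    """
    matches = []
    for t in left_ts:
        if direction == "forward":
            # First right timestamp >= t (strictly > t if exact matches are excluded).
            i = bisect_left(right_ts, t) if allow_exact_matches else bisect_right(right_ts, t)
            matches.append(i if i < len(right_ts) else None)
        elif direction == "backward":
            # Last right timestamp <= t (strictly < t if exact matches are excluded).
            i = bisect_right(right_ts, t) if allow_exact_matches else bisect_left(right_ts, t)
            matches.append(i - 1 if i > 0 else None)
        else:
            raise ValueError("direction must be 'forward' or 'backward'")
    return matches


right = [10, 20, 30]
left = [5, 20, 25, 35]

forward = asof_match(left, right, "forward")
backward = asof_match(left, right, "backward")
forward_strict = asof_match(left, right, "forward", allow_exact_matches=False)
```

In this sketch the left timestamp 20 matches the right timestamp 20 by default, but with allow_exact_matches=False and a forward direction it matches 30 instead.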
np
np() -> numpy.ndarray
Alias for to_numpy.
to_arrow
to_arrow(
materialize: bool = True
) -> pyarrow.Table | pyarrow.RecordBatchReader
Return the dataset as a PyArrow Table (materialized) or RecordBatchReader.
| Parameter Name | Description |
|---|---|
| materialize | Return Table if True, otherwise RecordBatchReader. |
to_numpy
to_numpy() -> numpy.ndarray
Return the dataset as a NumPy array (if possible).
to_pandas
to_pandas(
use_nullable_dtypes: bool = False
) -> pandas.DataFrame
Return the dataset as a Pandas DataFrame.
| Parameter Name | Description |
|---|---|
| use_nullable_dtypes | Use pandas nullable dtypes if True. |
write_dataset
write_dataset(*args, **kwargs) -> None
Write the dataset to a file system using PyArrow write_dataset.
| Parameter Name | Description |
|---|---|
| *args | Positional arguments passed to pyarrow.dataset.write_dataset. |
| **kwargs | Keyword arguments passed to pyarrow.dataset.write_dataset. |