bodo.pandas.BodoDataFrame.to_parquet
BodoDataFrame.to_parquet(path=None, engine="auto", compression="snappy", index=None, partition_cols=None, storage_options=None, row_group_size=-1, **kwargs)
Parameters
- path : str: Output path to write. It can be a local path (e.g. output.parquet), AWS S3 (s3://...), Azure ADLS (abfs://..., abfss://...), or GCP GCS (gcs://..., gs://...).
- compression : str, default 'snappy': File compression to use. Can be None, 'snappy', 'gzip', or 'brotli'.
- row_group_size : int: Row group size in output Parquet files. -1 lets the backend choose. See the sketch after this list.
- All other parameters will trigger a fallback to pandas.DataFrame.to_parquet.
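For instance, compression and row_group_size can be combined in a single call. A minimal sketch; the file name and sizes below are illustrative:

import bodo.pandas as bd

bdf = bd.DataFrame({"A": bd.array(range(100), "Int64")})
# Write gzip-compressed Parquet, capping each row group at 50 rows.
# The default row_group_size=-1 lets the backend choose instead.
bdf.to_parquet("output_gzip.parquet", compression="gzip", row_group_size=50)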
Example
import bodo.pandas as bd
bdf = bd.DataFrame(
{
"A": bd.array([1, 2, 3, 7] * 3, "Int64"),
"B": ["A1", "B1", "C1", "Abc"] * 3,
"C": bd.array([6, 5, 4] * 4, "Int64"),
}
)
# Write the frame to a local Parquet file, then read it back to verify.
bdf.to_parquet("output.parquet")
print(bd.read_parquet("output.parquet"))
Output:
A B C
0 1 A1 6
1 2 B1 5
2 3 C1 4
3 7 Abc 6
4 1 A1 5
5 2 B1 4
6 3 C1 6
7 7 Abc 5
8 1 A1 4
9 2 B1 6
10 3 C1 5
11 7 Abc 4
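The same call works against object storage by passing a URI path, as described in the path parameter above. A minimal sketch, assuming a hypothetical S3 bucket and credentials supplied via the environment:

# Bucket name is hypothetical; S3 credentials are assumed to be available
# in the environment (e.g. AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY).
bdf.to_parquet("s3://my-bucket/output.parquet")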