Input/Output
bodo.pandas.read_parquet
bodo.pandas.read_parquet(
    path,
    engine="auto",
    columns=None,
    storage_options=None,
    use_nullable_dtypes=lib.no_default,
    dtype_backend=lib.no_default,
    filesystem=None,
    filters=None,
    **kwargs,
) -> BodoDataFrame
Creates a BodoDataFrame object for reading from parquet file(s) lazily.
Parameters
- path : str, list[str]: Location of the Parquet file(s) to read. Refer to pandas.read_parquet for more details. The type of this argument differs from Pandas.
- All other parameters will trigger a fallback to pandas.read_parquet if a non-default value is provided.
Returns
- BodoDataFrame
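The fallback rule above can be sketched as a small pre-flight check. This is only an illustration of the documented rule, not Bodo's implementation: `fallback_triggers`, `READ_PARQUET_DEFAULTS`, and `NO_DEFAULT` are hypothetical names, `NO_DEFAULT` merely stands in for pandas' `lib.no_default` sentinel, and treating extra `**kwargs` as fallback triggers is an assumption.

```python
# Hypothetical helper, not part of Bodo: it only mirrors the rule stated above.
NO_DEFAULT = object()  # stand-in for pandas' lib.no_default sentinel

# Defaults copied from the read_parquet signature documented above.
READ_PARQUET_DEFAULTS = {
    "engine": "auto",
    "columns": None,
    "storage_options": None,
    "use_nullable_dtypes": NO_DEFAULT,
    "dtype_backend": NO_DEFAULT,
    "filesystem": None,
    "filters": None,
}


def fallback_triggers(**kwargs):
    """Return the sorted names of arguments that differ from their defaults."""
    triggers = []
    for name, value in kwargs.items():
        if name not in READ_PARQUET_DEFAULTS:
            # Assumption: anything passed through **kwargs counts as non-default.
            triggers.append(name)
        elif value != READ_PARQUET_DEFAULTS[name]:
            triggers.append(name)
    return sorted(triggers)


# Passing only default values keeps the list empty.
print(fallback_triggers(engine="auto", columns=None))
# Non-default values are reported by name.
print(fallback_triggers(columns=["foo"], engine="pyarrow"))
```

An empty result suggests the call would stay on the lazy path under the stated rule. Any reported name is an argument that, according to the rule above, would hand the call over to pandas.read_parquet.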
Example
import bodo
import bodo.pandas as bodo_pd
import pandas as pd

original_df = pd.DataFrame(
    {"foo": range(15), "bar": range(15, 30)}
)

@bodo.jit
def write_parquet(df):
    df.to_parquet("example.pq")

write_parquet(original_df)

restored_df = bodo_pd.read_parquet("example.pq")
print(type(restored_df))
print(restored_df.head())
Output:
bodo.pandas.read_iceberg
bodo.pandas.read_iceberg(
    table_identifier: str,
    catalog_name: str | None = None,
    catalog_properties: dict[str, Any] | None = None,
    row_filter: str | None = None,
    selected_fields: tuple[str] | None = None,
    case_sensitive: bool = True,
    snapshot_id: int | None = None,
    limit: int | None = None,
    scan_properties: dict[str, Any] | None = None,
) -> BodoDataFrame
Creates a BodoDataFrame object for reading from an Iceberg table lazily. Refer to pandas.read_iceberg for more details.
Warning
This function is experimental in Pandas and may change in future releases.
Parameters
- table_identifier: str: Identifier of the Iceberg table to read. This should be in the format schema.table.
- catalog_name: str, optional: Name of the catalog to use. If not provided, the default catalog will be used. See PyIceberg's documentation for more details.
- catalog_properties: dict[str, Any], optional: Properties for the catalog connection.
- row_filter: str, optional: Expression to filter rows.
- selected_fields: tuple[str], optional: Fields to select from the table. If not provided, all fields will be selected.
- snapshot_id: int, optional: ID of the snapshot to read from. If not provided, the latest snapshot will be used.
- limit: int, optional: Maximum number of rows to read. If not provided, all rows will be read.
- Non-default values for case_sensitive and scan_properties will trigger a fallback to pandas.read_iceberg.
Returns
- BodoDataFrame
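As a rough sketch of how the parameters above fit together, the helper below checks that the identifier has the schema.table form and collects only the options that were explicitly set. `build_read_iceberg_kwargs` is a hypothetical name and not part of Bodo or pandas. The strict two-part check is an assumption based on the format stated above.

```python
from typing import Any


def build_read_iceberg_kwargs(table_identifier: str, **options: Any) -> dict[str, Any]:
    """Assemble keyword arguments for read_iceberg (hypothetical helper)."""
    # Assumption: exactly two non-empty parts, as in "schema.table".
    parts = table_identifier.split(".")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected 'schema.table', got {table_identifier!r}")

    # Drop options left at None so the call carries only explicit settings.
    kwargs: dict[str, Any] = {"table_identifier": table_identifier}
    kwargs.update({k: v for k, v in options.items() if v is not None})
    return kwargs


kwargs = build_read_iceberg_kwargs(
    "my_schema.my_table",
    catalog_name="my_catalog",
    row_filter=None,  # left unset, so it is omitted from the result
    limit=1000,
)
print(kwargs)
```

The resulting dictionary could then be passed on as `bodo_pd.read_iceberg(**kwargs)`. Leaving case_sensitive and scan_properties out of the options keeps the call clear of the fallback described above.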
Example
Read a table using a predefined PyIceberg catalog.
import bodo
import bodo.pandas as bodo_pd

df = bodo_pd.read_iceberg(
    table_identifier="my_schema.my_table",
    catalog_name="my_catalog",
    row_filter="col1 > 10",
    selected_fields=("col1", "col2"),
    snapshot_id=123456789,
    limit=1000
)
Read a table using a new PyIceberg catalog with custom properties.