
Input/Output

bodo.pandas.read_parquet

bodo.pandas.read_parquet(
    path,
    engine="auto",
    columns=None,
    storage_options=None,
    use_nullable_dtypes=lib.no_default,
    dtype_backend=lib.no_default,
    filesystem=None,
    filters=None,
    **kwargs,
) -> BodoDataFrame

Creates a BodoDataFrame object for reading from parquet file(s) lazily.

Parameters

path: str or list[str]: Location of the Parquet file(s) to read. Refer to pandas.read_parquet for more details. Unlike Pandas, this argument also accepts a list of paths.

Providing a non-default value for any other parameter triggers a fallback to pandas.read_parquet.

Returns

BodoDataFrame

Example

import bodo.pandas as bd

original_df = bd.DataFrame(
    {"foo": range(15), "bar": range(15, 30)}
)

original_df.to_parquet("example.pq")

restored_df = bd.read_parquet("example.pq")
print(type(restored_df))
print(restored_df.head())

Output:

<class 'bodo.pandas.frame.BodoDataFrame'>
   foo  bar
0    0   15
1    1   16
2    2   17
3    3   18
4    4   19

bodo.pandas.read_iceberg

bodo.pandas.read_iceberg(
    table_identifier: str,
    catalog_name: str | None = None,
    catalog_properties: dict[str, Any] | None = None,
    row_filter: str | None = None,
    selected_fields: tuple[str] | None = None,
    case_sensitive: bool = True,
    snapshot_id: int | None = None,
    limit: int | None = None,
    scan_properties: dict[str, Any] | None = None,
    location: str | None = None,
) -> BodoDataFrame

Creates a BodoDataFrame object for reading from an Iceberg table lazily.

Refer to pandas.read_iceberg for more details.

Warning

This function is experimental in Pandas and may change in future releases.

Parameters

table_identifier: str: Identifier of the Iceberg table to read, in the format schema.table.

catalog_name: str, optional: Name of the catalog to use. If not provided, the default catalog will be used. See PyIceberg's documentation for more details.

catalog_properties: dict[str, Any], optional: Properties for the catalog connection.

row_filter: str, optional: Expression used to filter rows.

selected_fields: tuple[str], optional: Fields to select from the table. If not provided, all fields are selected.

snapshot_id: int, optional: ID of the snapshot to read from. If not provided, the latest snapshot will be used.

limit: int, optional: Maximum number of rows to read. If not provided, all rows will be read.

location: str, optional: Location of the table (if supported by the catalog). If a path is passed and both catalog_name and catalog_properties are None, a filesystem catalog rooted at the provided location is used. If the location is an S3 Tables ARN, the S3TablesCatalog is used.

Non-default values for case_sensitive and scan_properties will trigger a fallback to pandas.read_iceberg.

Returns

BodoDataFrame

Examples

Simple read of a table stored without a catalog on the filesystem:

import bodo.pandas as bd

df = bd.read_iceberg("my_table", location="s3://path/to/iceberg/warehouse")

Read a table using a predefined PyIceberg catalog:

import bodo.pandas as bd

df = bd.read_iceberg(
    table_identifier="my_schema.my_table",
    catalog_name="my_catalog",
    row_filter="col1 > 10",
    selected_fields=("col1", "col2"),
    snapshot_id=123456789,
    limit=1000
)

Read a table using a new PyIceberg catalog with custom properties:

import bodo.pandas as bd
import pyiceberg.catalog

# Placeholder: directory containing the Iceberg warehouse.
path_to_warehouse_dir = "/path/to/warehouse"

df = bd.read_iceberg(
    table_identifier="my_schema.my_table",
    catalog_properties={
        pyiceberg.catalog.PY_CATALOG_IMPL: "bodo.io.iceberg.catalog.dir.DirCatalog",
        pyiceberg.catalog.WAREHOUSE_LOCATION: path_to_warehouse_dir,
    },
)

Read a table from an S3 Tables bucket using the location parameter:

import bodo.pandas as bd

df = bd.read_iceberg(
    table_identifier="my_table",
    location="arn:aws:s3tables:<region>:<account_number>:my-bucket/my-table"
)