Introduction to Iceberg in Bodo

Apache Iceberg is an open table format designed for storing large datasets as a lakehouse. With Iceberg, data stored in open-source file formats in a data lake (e.g. S3) can be used like a data warehouse. This addresses many of the shortcomings of traditional data lakes by providing:

- ACID (Atomicity, Consistency, Isolation, and Durability) transaction compliance
- Evolving table schemas
- Consistent metadata storage formats
- Scalable reads and writes
- Time travel

Bodo has first-class read and write support for Iceberg tables in both Python and SQL. Bodo supports Iceberg tables that use the Apache Parquet file format.

Note

Iceberg support is generally available as of v2024.4. If you are using a previous alpha version, we recommend that you upgrade.

Getting Started

The Iceberg Connector comes pre-installed on the Bodo Platform for immediate use.

For a general introduction to using Iceberg in Bodo, take a look at our quickstart with Iceberg.

If you are not using the Bodo Platform, start by installing the bodo-iceberg-connector package from Conda:

conda install -c bodo.ai bodo-iceberg-connector
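After installing, it can be useful to confirm that the connector is visible to the active Python environment. The following is a minimal sketch using only the standard library. The module name bodo_iceberg_connector is an assumption derived from the package name and may differ in your installation.

```python
import importlib.util


def connector_available(module_name: str = "bodo_iceberg_connector") -> bool:
    """Return True if the named top-level module can be located.

    The default module name is an assumption based on the Conda
    package name; adjust it if your installation differs.
    """
    # find_spec returns None when a top-level module cannot be found.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if connector_available():
        print("Iceberg connector found in this environment.")
    else:
        print("Iceberg connector not found; check the active Conda environment.")
```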

Supported Iceberg Catalogs

These are the Iceberg catalogs supported in Bodo Python and SQL:

Catalog Name	Bodo Python Support	BodoSQL Support	Additional Notes
HadoopCatalog	Yes	Yes, via the FileSystemCatalog	Local and S3 Support
Snowflake's Managed Iceberg Catalog	Yes	Yes, via the SnowflakeCatalog	Integrated into BodoSQL's Snowflake support
Tabular's RESTCatalog	Yes	Yes, via the TabularCatalog	Only tested on S3
GlueCatalog	Yes	Yes, via TablePath
HiveCatalog	Yes	Yes, via TablePath
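As a quick reference, the BodoSQL column of the table above can be expressed as a lookup. This is only an illustrative sketch: the dictionary and the helper bodosql_access_path are hypothetical names and not part of Bodo. The values are the BodoSQL entry points named in the table.

```python
# BodoSQL access path per Iceberg catalog, transcribed from the table above.
# This mapping is illustrative and is not part of the Bodo API.
BODOSQL_ACCESS = {
    "HadoopCatalog": "FileSystemCatalog",
    "Snowflake's Managed Iceberg Catalog": "SnowflakeCatalog",
    "Tabular's RESTCatalog": "TabularCatalog",
    "GlueCatalog": "TablePath",
    "HiveCatalog": "TablePath",
}


def bodosql_access_path(catalog_name: str) -> str:
    """Return the BodoSQL entry point listed for an Iceberg catalog."""
    try:
        return BODOSQL_ACCESS[catalog_name]
    except KeyError:
        raise ValueError(
            f"{catalog_name!r} is not listed as a supported Iceberg catalog"
        ) from None


if __name__ == "__main__":
    for name in BODOSQL_ACCESS:
        print(f"{name}: use {bodosql_access_path(name)} in BodoSQL")
```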

Limitations and Considerations

The following limitations apply when working with Iceberg tables in Bodo:

Iceberg Features

- Bodo only supports data files in the Parquet format. Avro and ORC are currently unsupported.
- Bodo can read from V1 tables, but cannot write to them.
- Bodo cannot read or write Iceberg columns of type UUID or FIXED.
- Bodo does not support reading or writing delete files. Thus, it does not support Merge-on-Read yet.
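The limitations above can be turned into a simple pre-flight check before pointing Bodo at a table. The sketch below is an assumption-laden illustration: TableInfo and check_bodo_compat are hypothetical names, and the table summary would need to be filled in from your own catalog metadata. It only encodes the rules listed above.

```python
from dataclasses import dataclass, field

# Column types listed above as unsupported for both reads and writes.
UNSUPPORTED_COLUMN_TYPES = {"uuid", "fixed"}


@dataclass
class TableInfo:
    """Hypothetical summary of an Iceberg table (not a Bodo or Iceberg type)."""
    format_version: int
    file_format: str
    column_types: dict = field(default_factory=dict)  # column name -> type name
    has_delete_files: bool = False


def check_bodo_compat(table: TableInfo, for_write: bool = False) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if table.file_format.lower() != "parquet":
        problems.append(f"data files are {table.file_format}, only Parquet is supported")
    if for_write and table.format_version == 1:
        problems.append("V1 tables can be read but not written")
    for name, type_name in table.column_types.items():
        # Strip a length suffix such as "fixed(16)" before comparing.
        base = type_name.lower().split("(")[0].strip()
        if base in UNSUPPORTED_COLUMN_TYPES:
            problems.append(f"column {name!r} has unsupported type {type_name}")
    if table.has_delete_files:
        problems.append("table has delete files (Merge-on-Read is not supported)")
    return problems


if __name__ == "__main__":
    info = TableInfo(1, "parquet", {"id": "long", "token": "fixed(16)"})
    for problem in check_bodo_compat(info, for_write=True):
        print("-", problem)
```

Returning a list of problems rather than raising on the first one lets a caller report every incompatibility at once.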

SQL Features

- Iceberg tables do not support the TEMPORARY or TRANSIENT options when creating tables.
- The Iceberg View spec is not currently supported. The effect depends on the catalog:
  - Filesystem Catalog: View names will be undefined.
  - Snowflake Catalog: View names will be defined if there is a definition in Snowflake. Otherwise, they will be undefined.