
Introduction to Iceberg in Bodo

Apache Iceberg is an open table format for storing large datasets in a lakehouse. With Iceberg, data kept in open-source file formats in a data lake (e.g., S3) can be used like a data warehouse. Iceberg addresses many shortcomings of traditional data lakes by providing:

  • ACID (Atomicity, Consistency, Isolation, and Durability) transaction compliance
  • Evolving table schemas
  • Consistent metadata storage formats
  • Reads and writes that scale to large datasets
  • Time travel

Bodo has first-class read and write support for Iceberg tables in both Python and SQL. Bodo supports Iceberg tables that use the Apache Parquet file format.

Note

Iceberg support is generally available as of v2024.4. If you are using a previous alpha version, we recommend that you upgrade.

Getting Started

The Iceberg Connector comes pre-installed on the Bodo Platform for immediate use.

For a general introduction to using Iceberg in Bodo, see our Iceberg quickstart.

If you are not using the Bodo Platform, start by installing the bodo-iceberg-connector package from the bodo.ai Conda channel:

conda install -c bodo.ai bodo-iceberg-connector
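After installing, you can confirm that the connector is visible to your Python environment by looking for its module. This is a minimal sketch that assumes the Conda package installs an importable module named `bodo_iceberg_connector`; that module name is an assumption and is not stated on this page, and `iceberg_connector_installed` is a hypothetical helper.

```python
import importlib.util


def iceberg_connector_installed(module_name: str = "bodo_iceberg_connector") -> bool:
    """Return True if the given top-level module can be found on the Python path.

    The default module name is an assumption about how the
    bodo-iceberg-connector Conda package is importable.
    """
    # find_spec returns None for a top-level module that cannot be located,
    # and it does so without actually importing the module.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if iceberg_connector_installed():
        print("bodo-iceberg-connector appears to be installed.")
    else:
        print("Not found; run: conda install -c bodo.ai bodo-iceberg-connector")
```

Using `find_spec` rather than a bare `import` avoids executing the package's import-time code just to check whether it is present.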

Supported Iceberg Catalogs

These are the Iceberg catalogs supported in Bodo Python and SQL:

Catalog Name | Bodo Python Support | BodoSQL Support | Additional Notes
--- | --- | --- | ---
HadoopCatalog | Yes | Yes, via the FileSystemCatalog | Local and S3 support
Snowflake's Managed Iceberg Catalog | Yes | Yes, via the SnowflakeCatalog | Integrated into BodoSQL's Snowflake support
GlueCatalog | Yes | Yes, via TablePath |
HiveCatalog | Yes | Yes, via TablePath |
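If a script needs to decide how to reach a given catalog from BodoSQL, the catalog table can be encoded as a simple lookup. The sketch below is hypothetical: `CatalogSupport`, `SUPPORTED_CATALOGS`, and `bodosql_access` are illustrative names, not part of Bodo's API; only the strings they hold are taken from the table.

```python
from typing import NamedTuple


class CatalogSupport(NamedTuple):
    bodo_python: bool   # supported from Bodo Python
    bodosql_via: str    # how BodoSQL reaches this catalog, per the table
    notes: str


# Hypothetical encoding of the "Supported Iceberg Catalogs" table.
SUPPORTED_CATALOGS = {
    "HadoopCatalog": CatalogSupport(True, "FileSystemCatalog", "Local and S3 support"),
    "Snowflake's Managed Iceberg Catalog": CatalogSupport(
        True, "SnowflakeCatalog", "Integrated into BodoSQL's Snowflake support"
    ),
    "GlueCatalog": CatalogSupport(True, "TablePath", ""),
    "HiveCatalog": CatalogSupport(True, "TablePath", ""),
}


def bodosql_access(catalog: str) -> str:
    """Return the BodoSQL entry point listed for a catalog, or raise if it is not listed."""
    try:
        return SUPPORTED_CATALOGS[catalog].bodosql_via
    except KeyError:
        raise ValueError(f"{catalog!r} is not listed as a supported Iceberg catalog") from None


print(bodosql_access("GlueCatalog"))  # TablePath
```

Raising on unknown names, rather than returning a default, keeps an unlisted catalog from being silently treated as supported.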

Limitations and Considerations

The following limitations apply when working with Iceberg tables in Bodo:

Iceberg Features

  • Bodo only supports data files in the Parquet format; Avro and ORC are currently unsupported.
  • Bodo can read from V1 tables, but not write to them.
  • Bodo can't read or write Iceberg columns of type UUID or FIXED.
  • Bodo does not support reading or writing delete files, so Merge-on-Read is not yet supported.
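Before pointing Bodo at an existing table, it can help to check that table against the limitations above. The sketch below is a hypothetical pre-flight check: `IcebergTableInfo` and `bodo_limitations` are illustrative names, not Bodo APIs, and it assumes you have already gathered the table's format version, data file formats, column types, and whether it has delete files from its metadata (how you obtain them is out of scope here).

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class IcebergTableInfo:
    """Hypothetical summary of the table metadata the checks below need."""
    format_version: int            # Iceberg format version: 1 or 2
    data_file_formats: set[str]    # e.g. {"parquet"} or {"parquet", "orc"}
    column_types: dict[str, str]   # column name -> Iceberg type, e.g. "fixed[16]"
    has_delete_files: bool = False


UNSUPPORTED_TYPES = {"uuid", "fixed"}


def bodo_limitations(table: IcebergTableInfo, write: bool = False) -> list[str]:
    """Return the documented Bodo limitations this table would run into."""
    issues = []
    # Only Parquet data files are supported; Avro and ORC are not.
    other_formats = {f.lower() for f in table.data_file_formats} - {"parquet"}
    if other_formats:
        issues.append(f"unsupported data file formats: {sorted(other_formats)}")
    # V1 tables can be read but not written.
    if write and table.format_version == 1:
        issues.append("cannot write to a V1 table")
    # UUID and FIXED columns can be neither read nor written.
    for name, typ in table.column_types.items():
        # Drop a length suffix such as "fixed[16]" or "fixed(16)".
        base = re.split(r"[\[(]", typ.strip().lower(), maxsplit=1)[0]
        if base in UNSUPPORTED_TYPES:
            issues.append(f"column {name!r} has unsupported type {typ}")
    # Delete files (Merge-on-Read) are not supported.
    if table.has_delete_files:
        issues.append("table has delete files (Merge-on-Read)")
    return issues


info = IcebergTableInfo(1, {"parquet"}, {"id": "long", "token": "uuid"})
print(bodo_limitations(info, write=True))
```

Returning every issue at once, rather than stopping at the first, gives a complete picture of what would need to change before the table works with Bodo.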

SQL Features

  • Iceberg tables do not support the TEMPORARY or TRANSIENT options when creating tables.
  • The Iceberg View specification is not currently supported. As a result:
    • FileSystemCatalog: view names are undefined.
    • SnowflakeCatalog: a view name is defined only if Snowflake has a definition for it; otherwise, it is undefined.
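Because the TEMPORARY and TRANSIENT options are not accepted when creating Iceberg tables, code that generates DDL may want to catch them before a query is submitted. The sketch below is a hypothetical, regex-based check (`uses_unsupported_table_option` is an illustrative name, not a Bodo API). It only looks for those two keywords directly after `CREATE` or `CREATE OR REPLACE`; it is a text heuristic, not a SQL parser.

```python
import re
from typing import Optional

# Matches e.g. "CREATE TEMPORARY TABLE" or "CREATE OR REPLACE TRANSIENT TABLE",
# case-insensitively, at the start of the statement.
_UNSUPPORTED_CREATE = re.compile(
    r"^\s*CREATE\s+(?:OR\s+REPLACE\s+)?(TEMPORARY|TRANSIENT)\s+TABLE\b",
    re.IGNORECASE,
)


def uses_unsupported_table_option(sql: str) -> Optional[str]:
    """Return the offending option in upper case, or None if none was found."""
    match = _UNSUPPORTED_CREATE.match(sql)
    return match.group(1).upper() if match else None


print(uses_unsupported_table_option("create transient table t (a int)"))  # TRANSIENT
print(uses_unsupported_table_option("CREATE TABLE t AS SELECT 1"))        # None
```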