Introduction to Iceberg in Bodo
Apache Iceberg is an open table format designed for storing large datasets in a lakehouse. With Iceberg, data stored in open-source file formats in a data lake (e.g., S3) can be used like a data warehouse. This addresses many of the shortcomings of traditional data lakes by providing:
- ACID (Atomicity, Consistency, Isolation, and Durability) transaction compliance
- Evolving table schemas
- Consistent metadata storage formats
- Scalable reads and writes
- Time travel
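Time travel works because every write commits a new snapshot, and the table metadata records when each snapshot became the current one. The sketch below is not Bodo's API. It only illustrates the general Iceberg mechanism on a parsed `metadata.json` dictionary, assuming the `snapshot-log` field from the Iceberg table spec (a list of `timestamp-ms`/`snapshot-id` entries). The function name `snapshot_as_of` is hypothetical.

```python
from bisect import bisect_right
from typing import Optional


def snapshot_as_of(metadata: dict, timestamp_ms: int) -> Optional[int]:
    """Hypothetical helper: ID of the snapshot that was current at timestamp_ms, or None.

    Reads the "snapshot-log" of a parsed Iceberg metadata.json: each entry records
    when the table's current snapshot changed and which snapshot it changed to.
    """
    log = sorted(metadata.get("snapshot-log", []), key=lambda e: e["timestamp-ms"])
    times = [entry["timestamp-ms"] for entry in log]
    # Index of the first entry strictly after timestamp_ms; the entry before it was current.
    i = bisect_right(times, timestamp_ms)
    return log[i - 1]["snapshot-id"] if i > 0 else None


metadata = {
    "snapshot-log": [
        {"timestamp-ms": 1000, "snapshot-id": 1},
        {"timestamp-ms": 2000, "snapshot-id": 2},
        {"timestamp-ms": 3000, "snapshot-id": 3},
    ]
}
print(snapshot_as_of(metadata, 2500))  # 2
print(snapshot_as_of(metadata, 500))   # None (the table had no snapshot yet)
```

Because old snapshots are kept until they are explicitly expired, a reader can resolve a past timestamp to a snapshot ID and read the table exactly as it was at that moment.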
Bodo has first-class read and write support for Iceberg tables in both Python and SQL. Bodo supports Iceberg tables that use the Apache Parquet file format.
Note
Iceberg support is generally available as of v2024.4. If you are using a previous alpha version, we recommend that you upgrade.
Getting Started
The Iceberg Connector comes pre-installed on the Bodo Platform for immediate use.
For a general introduction to using Iceberg in Bodo, take a look at our quickstart with Iceberg.
Otherwise, start by installing the `bodo-iceberg-connector` package from Conda.
Supported Iceberg Catalogs
These are the Iceberg catalogs supported in Bodo Python and SQL:
Catalog Name | Bodo Python Support | BodoSQL Support | Additional Notes |
---|---|---|---|
HadoopCatalog | Yes | Yes, via the FileSystemCatalog | Local and S3 Support |
Snowflake's Managed Iceberg Catalog | Yes | Yes, via the SnowflakeCatalog | Integrated into BodoSQL's Snowflake support |
Tabular's RESTCatalog | Yes | Yes, via the TabularCatalog | Only tested on S3 |
GlueCatalog | Yes | Yes, via TablePath | |
HiveCatalog | Yes | Yes, via TablePath | |
Limitations and Considerations
Bodo has the following limitations when working with Iceberg tables:
Iceberg Features
- Bodo only supports data files in the Parquet format. Avro and ORC are currently unsupported.
- Bodo can read from V1 tables, but not write to them.
- Bodo cannot read or write Iceberg columns of type UUID or FIXED.
- Bodo does not support reading or writing delete files, so it does not yet support Merge-on-Read.
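Before pointing Bodo at an existing table, it can help to check the table's metadata against the limitations above. The following is a minimal, hypothetical sketch (`bodo_iceberg_issues` and `_primitive_types` are illustrative names, not part of Bodo) that inspects a parsed Iceberg `metadata.json` dictionary using key names from the Iceberg table spec: `format-version`, `schemas`/`current-schema-id`, the `write.format.default` table property, and the optional `total-delete-files` snapshot summary field. It only looks at the default write format and the current schema and snapshot, so treat it as a quick heuristic rather than a guarantee.

```python
from typing import Any, Iterator, List


def _primitive_types(type_: Any) -> Iterator[str]:
    """Yield every primitive type name in an Iceberg schema type, recursing into nested types."""
    if isinstance(type_, str):  # primitives are plain strings, e.g. "long", "uuid", "fixed[16]"
        yield type_
    elif type_["type"] == "struct":
        for field in type_["fields"]:
            yield from _primitive_types(field["type"])
    elif type_["type"] == "list":
        yield from _primitive_types(type_["element"])
    elif type_["type"] == "map":
        yield from _primitive_types(type_["key"])
        yield from _primitive_types(type_["value"])


def bodo_iceberg_issues(metadata: dict) -> List[str]:
    """Hypothetical helper: list which Bodo limitations a parsed metadata.json appears to hit."""
    issues = []

    # V1 tables: Bodo can read them but not write to them.
    if metadata.get("format-version", 1) == 1:
        issues.append("format-version 1: read-only in Bodo")

    # Only Parquet data files are supported. This checks the default write format only.
    fmt = metadata.get("properties", {}).get("write.format.default", "parquet")
    if fmt.lower() != "parquet":
        issues.append(f"default file format is {fmt}")

    # UUID and FIXED columns cannot be read or written.
    if "schemas" in metadata:
        schema = next(s for s in metadata["schemas"]
                      if s["schema-id"] == metadata["current-schema-id"])
    else:
        schema = metadata["schema"]  # older V1 metadata stores a single schema
    bad = sorted({t for t in _primitive_types(schema)
                  if t == "uuid" or t.startswith("fixed[")})
    if bad:
        issues.append("unsupported column types: " + ", ".join(bad))

    # Delete files (Merge-on-Read) are not supported. The summary field is optional;
    # if it is missing, this sketch assumes there are no delete files.
    current = metadata.get("current-snapshot-id")
    for snap in metadata.get("snapshots", []):
        if snap["snapshot-id"] == current:
            if int(snap.get("summary", {}).get("total-delete-files", "0")) > 0:
                issues.append("current snapshot has delete files")
    return issues


example = {
    "format-version": 2,
    "current-schema-id": 0,
    "schemas": [{
        "type": "struct", "schema-id": 0,
        "fields": [
            {"id": 1, "name": "id", "required": True, "type": "uuid"},
            {"id": 2, "name": "tags", "required": False,
             "type": {"type": "list", "element-id": 3, "element": "string",
                      "element-required": False}},
        ],
    }],
    "properties": {},
    "current-snapshot-id": 10,
    "snapshots": [{"snapshot-id": 10, "timestamp-ms": 1700000000000,
                   "summary": {"operation": "overwrite", "total-delete-files": "2"}}],
}
print(bodo_iceberg_issues(example))
# ['unsupported column types: uuid', 'current snapshot has delete files']
```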
SQL Features
- Iceberg tables do not support the `TEMPORARY` or `TRANSIENT` options when creating tables.
- The Iceberg View spec is not currently supported. View names are handled as follows, depending on the catalog:
  - FileSystemCatalog: view names are undefined.
  - SnowflakeCatalog: view names are defined if the view has a definition in Snowflake; otherwise, they are undefined.