
Bodo 2024.5 Release (Date: 5/3/2024)

New Features:

  • Added an environment variable BODO_SQL_STYLE to control some of the defaults for BodoSQL’s SQL dialect. The default is SNOWFLAKE, which uses Snowflake’s defaults for identifier case sensitivity and null ordering. The other option is SPARK, which uses Spark’s defaults for identifier case sensitivity and null ordering. The environment variable’s value is not case-sensitive.
  • Added support for groupby sum of decimal values
  • Added support for writing Puffin files with Iceberg writes by setting BODO_ENABLE_THETA_SKETCHES=1. A future release will enable this by default.
  • Added support for casting between decimal values with different precision and scale
  • Added support for multiplication of two decimal scalars/arrays.
  • Added support for CREATE SCHEMA and DROP SCHEMA commands in Iceberg and Snowflake
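
The two environment variables introduced in this release can be set from Python. The following is a minimal sketch: only the variable names and values (BODO_SQL_STYLE with SNOWFLAKE or SPARK, and BODO_ENABLE_THETA_SKETCHES=1) come from these notes. The helper configure_bodo_env is hypothetical, and setting the variables before importing Bodo is an assumption rather than something these notes state.

```python
import os

# Values named in the release notes; the value is not case-sensitive.
VALID_SQL_STYLES = {"SNOWFLAKE", "SPARK"}


def configure_bodo_env(sql_style="SNOWFLAKE", theta_sketches=False):
    """Hypothetical helper: set the environment variables described above.

    Assumption: the variables are read when Bodo/BodoSQL starts up, so call
    this before importing those packages.
    """
    if sql_style.upper() not in VALID_SQL_STYLES:
        raise ValueError(f"unsupported BODO_SQL_STYLE: {sql_style!r}")
    os.environ["BODO_SQL_STYLE"] = sql_style
    if theta_sketches:
        os.environ["BODO_ENABLE_THETA_SKETCHES"] = "1"


# Opt in to Spark-style defaults and Puffin file writes.
configure_bodo_env(sql_style="spark", theta_sketches=True)
```

The same effect can be had by exporting the variables in the shell that launches the program; the helper only adds an early check that the style is one of the two values listed above.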

Performance Improvements:

  • Slightly reduced the total number of metadata queries made to Snowflake during compilation time to determine when a string column should be dictionary encoded by removing unnecessary/redundant requests.

Bug Fixes:

  • Fixed a bug that caused some Iceberg nullability filters to not be pushed down.
  • Fixed a rarely occurring segfault when gathering a map array.
  • Fixed an issue that caused an error while loading string columns from Snowflake managed Iceberg tables.
  • Fixed some bugs in handling nested-types in joins.

Dependency Upgrades:

  • Upgraded to Iceberg 1.5.1

2024.5.1 New Features and Improvements

New Features:

  • Support MIN and MAX on string columns when GROUP BY is not provided.

Bug Fixes:

  • Adjusted BodoSQL compile-time Snowflake metadata requests to avoid an error due to changing formats of Snowflake result sets for certain description queries.

2024.5.2 New Features and Improvements

New Features:

  • Updated BodoSQL plans to contain an explicit Cache Node to indicate when the planner is reusing part of a plan from a cached result.
  • Simplified plans by concatenating join filters into a single node in BodoSQL plans.
  • Updated BodoSQL plans to clarify when computation is being done by BodoSQL via updates to each plan operator’s name.
  • Enabled automatic creation or updating of Theta Sketches for columns with certain data types in BodoSQL when an Iceberg table is created with CREATE TABLE AS SELECT or updated with INSERT INTO. The Theta Sketches are written to a Puffin file during an Iceberg write. See the Bodo documentation on Puffin files for more details, including how to disable this feature.
  • Support casting floats to decimals.
  • Support multiplying integers and decimals.
  • Users can now supply statistics for Parquet datasets when using the TablePath API. This can significantly improve the quality of the SQL plans in many cases.
  • Support for CREATE VIEW with the Snowflake Catalog

Performance Improvements:

  • Support for merging aggregates in the planner to avoid unnecessary aggregations.

Bug Fixes:

  • Output of a SUM aggregation with a GROUP BY clause on integer columns is now up-casted to int64 to prevent overflows.
  • Made loading UDF/UDTF information from Snowflake more robust to future Snowflake changes to metadata query outputs.
  • Fixed a gap where duplicate streaming joins wouldn’t be cached.
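
To see why up-casting SUM results to int64 matters, the sketch below simulates a grouped SUM with a 32-bit accumulator and with a 64-bit one in plain Python. It does not use Bodo: wrap_signed and grouped_sum are hypothetical helpers, and the wraparound shown is an assumption about how a fixed-width int32 sum behaves in general, not a description of Bodo's earlier behavior.

```python
from collections import defaultdict


def wrap_signed(value, bits):
    """Wrap an integer into the signed two's-complement range of `bits` bits."""
    mask = (1 << bits) - 1
    value &= mask
    # Values with the top bit set represent negative numbers.
    return value - (1 << bits) if value >> (bits - 1) else value


def grouped_sum(rows, bits):
    """Sum values per key, wrapping the accumulator to `bits` bits after each add."""
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] = wrap_signed(totals[key] + value, bits)
    return dict(totals)


rows = [("a", 2_000_000_000), ("a", 2_000_000_000), ("b", 7)]

sum_int32 = grouped_sum(rows, bits=32)  # accumulator overflows for key "a"
sum_int64 = grouped_sum(rows, bits=64)  # wide enough to hold the true total
```

Each input value fits in int32, but the total for key "a" does not, so the 32-bit accumulator wraps to a negative number while the 64-bit accumulator holds the true total of 4,000,000,000.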

2024.5.3/2024.5.4 New Features and Improvements

New Features:

  • Added support for CREATE [OR REPLACE] VIEW for Snowflake catalogs and Iceberg catalogs that support views.
  • Added support for reading Iceberg views from catalogs that support views.
  • Changed the way that JavaScript UDFs are displayed in the emitted SQL plans so that it is possible to tell which JavaScript UDF is being called. Previously, all such UDFs would just display as a function call SNOWFLAKE_NATIVE_UDF, but now they display as SNOWFLAKE_NATIVE_JAVASCRIPT_UDF::<function_name>.
  • Added support for join filters with TablePath and Local Tables
  • Added Iceberg REST catalog support in pandas APIs
  • Added BodoSQL TabularCatalog to connect to Tabular's Iceberg REST catalog

Performance Improvements:

  • Improved performance of decimal multiplication.
  • Improved the planner’s propagation of NDV (number of distinct values) estimates across filters with an IS NOT NULL condition.
  • Bodo is now compiled with link-time optimizations by default, providing a ~5% performance boost.

Bug Fixes:

  • Fixed a bug where join filters pushed into a subsequent join wouldn’t always use the bloom filter.
  • Fixed a bug in ROUND that could produce an incorrect result when an overflow occurred.