Bodo 2024.5 Release (Date: 5/3/2024)

New Features:

  • Added an environment variable, BODO_SQL_STYLE, that controls some of the defaults for BodoSQL’s SQL dialect. The default is SNOWFLAKE, which uses Snowflake’s conventions for identifier case sensitivity and null ordering. Another option is SPARK, which uses Spark’s defaults for the same settings. The environment variable’s value is not case-sensitive.
  • Added support for groupby sum of decimal values.
  • Added support for writing puffin files with Iceberg writes by setting BODO_ENABLE_THETA_SKETCHES=1. A future release will enable this by default.
  • Added support for casting between decimal values with different precision and scale.
  • Added support for multiplication of two decimal scalars/arrays.
  • Added support for CREATE SCHEMA and DROP SCHEMA commands in Iceberg and Snowflake.
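
As a rough illustration of the two environment variables named above, the sketch below sets them from Python with the standard library. The helper `set_sql_style` and the `VALID_STYLES` set are hypothetical names introduced here, not part of Bodo. The set lists only the two values these notes mention. Uppercasing is optional, since the value is not case-sensitive. Setting the variables before importing Bodo is an assumption about when they are read.

```python
import os

# Only the two values named in these release notes (assumption: others may exist).
VALID_STYLES = {"SNOWFLAKE", "SPARK"}


def set_sql_style(style):
    """Hypothetical helper: validate and export BODO_SQL_STYLE."""
    # The value is not case-sensitive, so normalizing is for readability only.
    normalized = style.strip().upper()
    if normalized not in VALID_STYLES:
        raise ValueError(f"unexpected BODO_SQL_STYLE value: {style!r}")
    os.environ["BODO_SQL_STYLE"] = normalized
    return normalized


# Assumption: set these before importing bodo / bodosql in the same process.
chosen = set_sql_style("spark")

# Opt in to writing Puffin files during Iceberg writes.
os.environ["BODO_ENABLE_THETA_SKETCHES"] = "1"
```

The same effect can be had by exporting the variables in the shell that launches the job.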

Performance Improvements:

  • Slightly reduced the number of metadata queries made to Snowflake at compile time to determine whether a string column should be dictionary encoded, by removing redundant requests.

Bug Fixes:

  • Fixed a bug that caused some Iceberg nullability filters to not be pushed down.
  • Fixed a rarely occurring segfault when gathering a map array.
  • Fixed an issue that caused an error while loading string columns from Snowflake managed Iceberg tables.
  • Fixed some bugs in handling nested types in joins.

Dependency Upgrades:

  • Upgraded to Iceberg 1.5.1.

2024.5.1 New Features and Improvements

New Features:

  • Support MIN and MAX on string columns when GROUP BY is not provided.

Bug Fixes:

  • Adjusted BodoSQL compile-time Snowflake metadata requests to avoid an error due to changing formats of Snowflake result sets for certain description queries.

2024.5.2 New Features and Improvements

New Features:

  • Updated BodoSQL plans to contain an explicit Cache Node that indicates when the planner is reusing part of a plan from a cached result.
  • Simplified plans by concatenating RuntimeJoinFilters into a single node in BodoSQL plans.
  • Updated BodoSQL plans to clarify when computation is being done by BodoSQL via updates to each plan operator’s name.
  • Enabled automatic creation or updating of Theta Sketches for columns with certain data types in BodoSQL when an Iceberg table is created with CREATE TABLE AS SELECT or updated with INSERT INTO. The Theta Sketches are written to a Puffin file during an Iceberg write. See the Bodo documentation on Puffin files for more details, including how to disable this feature.
  • Support casting floats to decimals.
  • Support multiplying integers and decimals.
  • Users can now supply statistics for Parquet datasets when using the TablePath API. This can significantly improve the quality of the SQL plans in many cases.
  • Support for CREATE VIEW with the Snowflake Catalog.
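
To make the float-to-decimal cast and the integer-by-decimal multiplication listed above concrete, here is a minimal sketch using Python's standard `decimal` module. It shows only the general idea of fixed-scale decimals and is not Bodo's implementation. The helper `cast_to_decimal`, the choice of half-up rounding, and the sample values are assumptions made for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP


def cast_to_decimal(value, scale):
    """Hypothetical helper: cast a float to a decimal with a fixed scale."""
    # scale=2 gives a quantum of 0.01, i.e. two digits after the decimal point.
    quantum = Decimal(1).scaleb(-scale)
    # Going through str() avoids carrying the float's binary representation error.
    # Half-up rounding is an assumption, not a documented Bodo behavior.
    return Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)


price = cast_to_decimal(19.999, 2)  # rounds to two decimal places
total = 3 * price                   # integer times decimal keeps the decimal scale

print(price, total)
```

The point of the sketch is that the result of the multiplication stays an exact decimal with a known scale, rather than falling back to floating point.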

Performance Improvements:

  • Support for merging aggregates in the planner to avoid unnecessary aggregations.

Bug Fixes:

  • The output of a SUM aggregation with a GROUP BY clause on integer columns is now upcast to int64 to prevent overflow.
  • Made loading UDF/UDTF information from Snowflake more robust to future Snowflake changes in metadata query outputs.
  • Fixed a gap where duplicate streaming joins wouldn’t be cached.
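
The SUM fix above matters because a sum of narrow integers can exceed the input type's range. The sketch below simulates this with the standard `ctypes` module, which wraps on overflow instead of raising. The helper `sum_with_width` and the sample values are made up for illustration, and the code does not reflect how Bodo computes aggregations internally.

```python
import ctypes


def sum_with_width(values, ctype):
    """Accumulate values in a fixed-width integer, wrapping on overflow."""
    total = ctype(0)
    for v in values:
        # ctypes integers do no overflow checking, so the result wraps silently.
        total = ctype(total.value + v)
    return total.value


# Two values that each fit in int32, but whose sum does not.
values = [2_000_000_000, 2_000_000_000]

wrapped = sum_with_width(values, ctypes.c_int32)  # accumulator as narrow as the input
widened = sum_with_width(values, ctypes.c_int64)  # accumulator upcast to int64

print(wrapped, widened)
```

With the 32-bit accumulator the sum wraps to a negative number, while the 64-bit accumulator holds the correct total of 4,000,000,000.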