Bodo 2024.5 Release (Date: 5/3/2024)¶
New Features:¶
- Added an environment variable
BODO_SQL_STYLE
to control some of the defaults for BodoSQL’s SQL dialect. The default isSNOWFLAKE
which uses the Snowflake protocols for identifier case sensitivity and null ordering defaults. Another option for the environment variable isSPARK
which uses spark’s defaults for identifier case sensitivity and null ordering defaults. The environment variable’s value is not case-sensitive. - Added support for groupby sum of decimal values
- Added support for writing puffin files with Iceberg writes by setting
BODO_ENABLE_THETA_SKETCHES=1
. A future release will enable this by default. - Added support for casting between decimal values with different precision and scale
- Added support for multiplication of two decimal scalars/arrays.
- Added support for
CREATE SCHEMA
andDROP SCHEMA
commands in Iceberg and Snowflake
Performance Improvements:¶
- Slightly reduced the total number of metadata queries made to Snowflake during compilation time to determine when a string column should be dictionary encoded by removing unnecessary/redundant requests.
Bug Fixes:¶
- Fixed a bug that caused some Iceberg nullability filters to not be pushed down.
- Fixed a rarely occurring segfault when gathering a map array.
- Fixed an issue that caused an error while loading string columns from Snowflake managed Iceberg tables.
- Fixed some bugs in handling nested-types in joins.
Dependency Upgrades:¶
- Upgraded to Iceberg 1.5.1
2024.5.1 New Features and Improvements¶
New Features:¶
- Support
MIN
andMAX
on string columns when GROUP BY is not provided.
Bug Fixes:¶
- Adjusted BodoSQL compile-time Snowflake metadata requests to avoid an error due to changing formats of Snowflake result sets for certain description queries.
2024.5.2 New Features and Improvements¶
New Features:¶
- Update BodoSQL plans to contain an explicit Cache Node to indicate when the planner is reusing part of a plan from a cached result.
- Simplified plans by concatenating join filters into a single node in BodoSQL plans.
- Updated BodoSQL plans to clarify when computation is being done by BodoSQL via updates to each plan operator’s name.
- Enabled automatic creation or updating of Theta Sketches for columns with certain data types in BodoSQL when an Iceberg table is created with
CREATE TABLE AS SELECT
or updated withINSERT INTO
. The Theta Sketches are written to a Puffin file during an Iceberg write. See the Bodo documentation on Puffin files for more details, including how to disable this feature. - Support casting floats to decimals.
- Support multiplying integers and decimals.
- Users can now supply statistics for Parquet datasets when using the TablePath API. This can significantly improve the quality of the SQL plans in many cases.
- Support for
CREATE VIEW
with the Snowflake Catalog
Performance Improvements:¶
- Support for merging aggregates in the planner to avoid unnecessary aggregations.
Bug Fixes:¶
- Output of a
SUM
aggregation with aGROUP BY
clause on integer columns is now up-casted to int64 to prevent overflows. - Made loading from UDF/UDTF information from Snowflake more robust to handle future Snowflake changes to metadata query outputs.
- Fixed a gap where duplicate streaming joins wouldn’t be cached.
2024.5.3/2024.5.4 New Features and Improvements¶
New Features:¶
- Added support for
CREATE [OR REPLACE] VIEW
for Snowflake catalogs and Iceberg catalogs that support views. - Added support for reading Iceberg views from catalogs that support views.
- Changed the way that JavaScript UDFs are displayed in the emitted SQL plans so that it is possible to tell which JavaScript UDF is being called. Previously, all such UDFs would just display as a function call
SNOWFLAKE_NATIVE_UDF
, but now they display asSNOWFLAKE_NATIVE_JAVASCRIPT_UDF::<function_name>
. - Added support for join filters with TablePath and Local Tables
- Added Iceberg REST catalog support in pandas APIs
- Added BodoSQL TabularCatalog to connect to Tabular's Iceberg REST catalog
Performance Improvements:¶
- Improved performance of decimal multiplication.
- Improved the propagation NDV estimates of the planner across filters with an
IS NOT NULL
condition. - Bodo is now compiled with link-time optimizations by default, providing a ~5% performance boost.
Bug Fixes:¶
- Fixed a bug where join filters pushed into a subsequent join wouldn’t always use the bloom filter.
- Fixes a bug in ROUND which could lead to an incorrect result in case of overflows.