Bodo 2024.8 Release (Date: 8/6/2024)
New Features:
- Expanded decimal support:
  - Added native decimal support for addition/subtraction.
  - Added native decimal support for MEDIAN with GROUP BY.
  - Added native decimal support for the following numeric functions: ATAN, ATAN2, ATANH, COS, COSH, COT, DEGREES, RADIANS, SIN, SINH, TAN, TANH, EXP, POWER, LOG, LN, SQRT, SQUARE, ROUND, ABS, DIV0, SIGN.
  - Support for binary operations between decimal and integer.
- Extended use of low-ndv IN join filters to Iceberg.
- Generate Join Filters with empty build tables and Snowflake IO.
- Support for Join Filters with Interval Joins.
- Streaming support for several more aggregate patterns, especially with PIVOT.
- Streaming support for grouping sets without the empty set.
- Support Create Table As Select (CTAS) with TABLE COMMENTS, COLUMN COMMENTS, and TBLPROPERTIES for Iceberg.
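As a rough illustration of the low-NDV IN join filter idea listed above: when the build side of a join has only a few distinct key values, those values can be turned into an IN-list predicate and applied to the probe side before the join runs. The sketch below is plain Python, not BodoSQL code. The names `build_in_filter` and `apply_filter` and the `max_ndv` threshold are hypothetical and chosen only for this example.

```python
from typing import Hashable, Iterable, Optional


def build_in_filter(build_keys: Iterable[Hashable], max_ndv: int = 4) -> Optional[frozenset]:
    """Return the distinct build-side keys if there are few enough, else None.

    max_ndv is a hypothetical threshold, not a documented Bodo setting.
    """
    distinct = set()
    for key in build_keys:
        distinct.add(key)
        if len(distinct) > max_ndv:
            return None  # too many distinct values: no IN filter is generated
    return frozenset(distinct)


def apply_filter(probe_rows, key_index, in_filter):
    """Keep only probe rows whose join key appears in the IN filter."""
    if in_filter is None:
        return list(probe_rows)  # no filter available: keep every row
    return [row for row in probe_rows if row[key_index] in in_filter]


probe = [(1, "a"), (2, "b"), (3, "c"), (2, "d")]

# Build side with a single distinct key: only matching probe rows survive.
filtered = apply_filter(probe, 0, build_in_filter([2, 2, 2]))

# An empty build table yields an empty filter, which prunes every probe row.
pruned = apply_filter(probe, 0, build_in_filter([]))
```

In this toy model an empty build table removes all probe rows, which is the intuition behind generating join filters even when the build table is empty.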
Performance Improvements:
- Revamped shuffle in SQL join and groupby to use non-blocking MPI, which can improve performance significantly.
- Improved streaming performance for more rank window functions (RANK, PERCENT_RANK, CUME_DIST, ROW_NUMBER).
- Added streaming support for MAX(X) OVER (), MIN(X) OVER (), SUM(X) OVER (), and COUNT(X) OVER ().
- Expanded sort support in the streaming sort-based implementation of window functions to all array types.
- Started decomposing AVG, var/std functions, covar functions, and CORR into combinations of sum/count, both to allow for decimal support and to increase re-use of computations when multiple such functions are present.
- Significant improvements to Iceberg Scan Planning performance, such as fetching the Parquet metadata in parallel.
- Improved shuffle performance for variable length types (string/array).
- Improved BodoSQL’s ability to prune empty plan sections.
- Improved performance of decimal to string and decimal to double casting.
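The decomposition of AVG and the variance functions into sums and counts mentioned above rests on a standard identity: population variance equals (SUM(x*x) - SUM(x)^2 / COUNT(x)) / COUNT(x). The sketch below shows that identity in plain Python with `decimal.Decimal`. It illustrates the general idea only and is not necessarily the exact set of expressions BodoSQL generates. The function names are hypothetical.

```python
from decimal import Decimal


def var_pop_from_sums(values):
    """Population variance computed only from COUNT(x), SUM(x) and SUM(x*x)."""
    n = len(values)                                  # COUNT(x)
    s1 = sum(values, Decimal(0))                     # SUM(x)
    s2 = sum((v * v for v in values), Decimal(0))    # SUM(x * x)
    return (s2 - s1 * s1 / n) / n


def avg_from_sums(values):
    """AVG(x) as SUM(x) / COUNT(x), reusing the same two aggregates."""
    return sum(values, Decimal(0)) / len(values)


data = [Decimal("1.0"), Decimal("2.0"), Decimal("3.0"), Decimal("4.0")]
variance = var_pop_from_sums(data)
mean = avg_from_sums(data)
```

Because AVG and the variance share SUM(x) and COUNT(x), a query that asks for both only needs to compute those aggregates once, and each of them is a simple operation on decimals.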
Bug Fixes:
- Fixed bug related to string columns in streaming join sometimes causing segmentation faults.
- Fixed bug when calling DATEADD (and similar functions) with month/quarter/year units when the input date was a leap day.
- Fixed bug in verbose mode that would cause incorrect timer information to be displayed for non-streaming rel nodes.
- Fixed bug where writing to an Iceberg table with a non-Iceberg type (int8/uint8) caused an error instead of an upcast.
- Fixed a bug that would cause SHOW TABLES commands to error when the tables had a large number of rows/bytes.
- Fixed a bug in comparisons for elements of nested arrays with dictionary-encoded data.
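To make the leap-day edge case in the DATEADD fix above concrete, here is a minimal sketch of month-based date arithmetic that clamps the day to the end of the target month. It assumes Snowflake-style clamping semantics (February 29 plus one year becomes February 28). The helper `add_months` is a hypothetical name and is not Bodo's implementation.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Add a number of months, clamping the day to the end of the target month."""
    total = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


leap_day = date(2024, 2, 29)
plus_one_year = add_months(leap_day, 12)      # a year unit treated as 12 months
plus_one_quarter = add_months(leap_day, 3)    # a quarter unit treated as 3 months
plus_four_years = add_months(leap_day, 48)    # lands on the next leap day
```

A naive `date(d.year + 1, d.month, d.day)` would raise an error for this input because February 29, 2025 does not exist, which is the kind of failure this class of bug tends to produce.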
Dependency Upgrades:
- Upgraded to Numba 0.60.
2024.8.1 New Features and Improvements
New Features:
- Added support for storage account name, storage account key, and VM identities when using Azure.
- Added support for SUM on decimals when invoked as a window function or a regular aggregation without groupby.
- Added support for VAR_POP, VAR_SAMP, STDDEV_POP, STDDEV_SAMP, COVAR_POP, COVAR_SAMP, CORR, PERCENTILE_DISC and PERCENTILE_CONT on decimals.
- Added support for FACTORIAL, CEIL, FLOOR, ROUND, and TRUNC on decimals.
- Added support for DIV0NULL.
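For readers unfamiliar with DIV0 and DIV0NULL, the sketch below models their behavior as described in Snowflake's documentation: DIV0 returns 0 when the divisor is 0, and DIV0NULL additionally returns 0 when the divisor is NULL. It is a plain Python model using `None` for NULL, not BodoSQL code. The handling of a NULL dividend is an assumption made for this example.

```python
from decimal import Decimal


def div0(a, b):
    """Division that returns 0 instead of failing when the divisor is 0."""
    if a is None or b is None:
        return None  # assumed: NULL input gives NULL output
    if b == 0:
        return Decimal(0)
    return a / b


def div0null(a, b):
    """Like div0, but a NULL divisor also yields 0."""
    if a is None:
        return None  # assumed: a NULL dividend still gives NULL
    if b is None or b == 0:
        return Decimal(0)
    return a / b


zero_divisor = div0(Decimal("1.5"), Decimal("0"))
null_divisor_div0 = div0(Decimal("1.5"), None)
null_divisor_div0null = div0null(Decimal("1.5"), None)
regular = div0null(Decimal("6"), Decimal("3"))
```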
Performance Improvements:
- SQL Planner improvements to better handle use of NDV values and nullability of functions.
- Ensured BodoSQL now decomposes VAR_POP, VAR_SAMP, STDDEV_POP, STDDEV_SAMP, COVAR_POP, COVAR_SAMP and CORR into combinations of SUM and COUNT to improve re-use of computations across multiple functions.
- Added streaming support for AVG(X) OVER () and COUNT(*) OVER ().
- Added streaming support for SUM as a window function when there are partitions but no frames.
- Improved performance of reading string columns for Iceberg tables. This is only available on the Bodo Cloud Platform at this stage.
- Improved robustness when pushing down very large IN queries with Iceberg.
- Enabled pruning an entire Iceberg table via a JoinFilter with an empty build table.
- Enabled rewriting COUNT(DISTINCT) to use an internal optimization via grouping sets that allows more equal work distribution across cores.
- Improved performance of reading small Iceberg tables.
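The COUNT(DISTINCT) rewrite listed above is internal to BodoSQL and uses grouping sets. The general two-phase idea behind such rewrites can still be sketched in plain Python: first group by (key, value) to remove duplicates, then count the remaining rows per key. The function names are hypothetical, and NULL handling is not modelled.

```python
from collections import Counter


def count_distinct_direct(rows):
    """COUNT(DISTINCT value) per key, computed with one set per key."""
    seen = {}
    for key, value in rows:
        seen.setdefault(key, set()).add(value)
    return {key: len(values) for key, values in seen.items()}


def count_distinct_two_phase(rows):
    """Same result via GROUP BY (key, value) followed by a count per key."""
    deduped = set(rows)                               # phase 1: group by (key, value)
    return dict(Counter(key for key, _ in deduped))   # phase 2: count rows per key


rows = [("a", 1), ("a", 1), ("a", 2), ("b", 7), ("b", 7)]
direct = count_distinct_direct(rows)
two_phase = count_distinct_two_phase(rows)
```

In a distributed setting, the first phase can partition work on the (key, value) pair rather than on the key alone, so a single very frequent key no longer has to be handled by one core.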
Bug Fixes:
- Fixed bug enabling streaming for CUME_DIST and PERCENT_RANK when there are no partition columns.
- Arrays of tuple values are now returned to regular Python properly as arrays of tuples instead of arrays of structs.
- Fixed bugs that prevented compiling complex nested UDF inlining cases.
Dependency Upgrades:
- Upgraded to Calcite 1.37.
- Upgraded to Arrow 17.