Bodo 2024.8 Release (Date: 8/6/2024)¶
New Features:¶
- Expanded decimal support:
- Added native decimal support for addition/subtraction.
- Added native decimal support for
MEDIAN
withGROUP BY
. - Added native decimal support for the following numeric functions:
ATAN
,ATAN2
,ATANH
,COS
,COSH
,COT
,DEGREES
,RADIANS
,SIN
,SINH
,TAN
,TANH
,EXP
,POWER
,LOG
,LN
,SQRT
,SQUARE
,ROUND
,ABS
,DIV0
,SIGN
. - Support for binary operations between decimal and integer.
- Extended use of low-ndv IN join filters to Iceberg.
- Generate Join Filters with empty build tables and Snowflake IO.
- Support for Join Filters with Interval Joins.
- Streaming support for several more aggregate patterns, especially with
PIVOT
. - Streaming support for grouping sets without the empty set.
- Support Create Table As Select (CTAS) with
TABLE COMMENTS
,COLUMN COMMENTS
, andTBLPROPERTIES
for Iceberg.
Performance Improvements:¶
- Revamp shuffle in SQL join and groupby to use non-blocking MPI which can improve performance significantly.
- Improved streaming performance for more rank window functions (
RANK
,PERCENT_RANK
,CUME_DIST
,ROW_NUMBER
). - Added streaming support for
MAX(X) OVER ()
,MIN(X) OVER ()
,SUM(X) OVER ()
, andCOUNT(X) OVER ()
. - Expanded sort for the streaming sort based implementation of window functions to all array types.
- Started decomposing
AVG
, var/std functions, covar functions, andCORR
into combinations of sum/count to both allow for decimal support, and also allow for increased re-use of computations when multiple such functions are present. - Significant improvements to Iceberg Scan Planning performance such as fetching the parquet metadata in parallel.
- Improved shuffle performance for variable length types (string/array).
- Improved BodoSQL’s ability to prune empty plan sections.
- Improved performance of decimal to string and decimal to double casting.
Bug Fixes:¶
- Fixed bug related to string columns in streaming join sometimes causing segmentation faults.
- Fixed bug when calling
DATEADD
(and similar functions) with month/quarter/year units when the input date was a leapday. - Fixed bug in verbose mode that would cause incorrect timer information to be displayed for non-streaming rel nodes.
- Fixed bug writing where writing to an iceberg table with a non-iceberg type (
int8
/uint8
) caused an error instead of an upcast. - Fixed a bug that would cause
SHOW TABLES
commands to error when the tables had a large number of rows/bytes. - Fixed a bug in comparisons for elements of nested arrays with dictionary-encoded data.
Dependency Upgrades:¶
- Upgraded to Numba 0.60.
2024.8.1 New Features and Improvements¶
New Features:¶
- Added support for storage account name and storage account key and VM-identities when using Azure.
- Added support for
SUM
on decimals when invoked as a window function or a regular aggregation without groupby. - Added support for
VAR_POP
,VAR_SAMP
,STDDEV_POP
,STDDEV_SAMP
,COVAR_POP
,COVAR_SAMP
,CORR
,PERCENTILE_DISC
andPERCENTILE_CONT
on decimals. - Added support for
FACTORIAL
,CEIL
,FLOOR
,ROUND
, andTRUNC
on decimals. - Added support for
DIV0NULL
.
Performance Improvements:¶
- SQL Planner improvements to better handle use of NDV values and nullability of functions.
- Ensured BodoSQL now decomposes
VAR_POP
,VAR_SAMP
,STDDEV_POP
,STDDEV_SAMP
,COVAR_POP
,COVAR_SAMP
andCORR
into combinations ofSUM
andCOUNT
to improve re-use of computations across multiple functions. - Added streaming support for
AVG(X) OVER ()
andCOUNT(*) OVER ()
. - Added streaming support for
SUM
as a window function when there are partitions but no frames. - Improved performance of reading string columns for Iceberg tables. This is only available on the Bodo Cloud Platform at this stage.
- Improved robustness when pushing down very large IN queries with Iceberg.
- Enabled pruning an entire Iceberg table via a JoinFilter with an empty build table.
- Enabled rewriting
COUNT(DISTINCT)
to use an internal optimization via grouping sets that allows more equal work distribution across cores. - Improve performance of reading small Iceberg tables.
Bug Fixes:¶
- Fixed bug enabling streaming for
CUME_DIST
andPERCENT_RANK
when there are no partition columns. - Arrays of tuple values are now returned to regular Python properly as arrays of tuples instead of arrays of structs.
- Fixed bugs that prevented compiling complex nested UDF inlining cases.
Dependency Upgrades:¶
- Upgraded to Calcite 1.37.
- Upgraded to Arrow 17.