2024)¶

Expanded decimal support:
- Added native decimal support for addition/subtraction.
- Added native decimal support for MEDIAN with GROUP BY.
- Added native decimal support for the following numeric functions: ATAN, ATAN2, ATANH, COS, COSH, COT, DEGREES, RADIANS, SIN, SINH, TAN, TANH, EXP, POWER, LOG, LN, SQRT, SQUARE, ROUND, ABS, DIV0, SIGN.
- Support for binary operations between decimal and integer.
Extended use of low-ndv IN join filters to Iceberg.
Generate Join Filters with empty build tables and Snowflake IO.
Support for Join Filters with Interval Joins.
Streaming support for several more aggregate patterns, especially with PIVOT.
Streaming support for grouping sets without the empty set.
Support Create Table As Select (CTAS) with TABLE COMMENTS, COLUMN COMMENTS, and TBLPROPERTIES for Iceberg.

Revamp shuffle in SQL join and groupby to use non-blocking MPI which can improve performance significantly.
Improved streaming performance for more rank window functions (RANK, PERCENT_RANK, CUME_DIST, ROW_NUMBER).
Added streaming support for MAX(X) OVER (), MIN(X) OVER (), SUM(X) OVER (), and COUNT(X) OVER ().
Expanded sort for the streaming sort based implementation of window functions to all array types.
Started decomposing AVG, var/std functions, covar functions, and CORR into combinations of sum/count to both allow for decimal support, and also allow for increased re-use of computations when multiple such functions are present.
Significant improvements to Iceberg Scan Planning performance such as fetching the parquet metadata in parallel.
Improved shuffle performance for variable length types (string/array).
Improved BodoSQL’s ability to prune empty plan sections.
Improved performance of decimal to string and decimal to double casting.

Fixed bug related to string columns in streaming join sometimes causing segmentation faults.
Fixed bug when calling DATEADD (and similar functions) with month/quarter/year units when the input date was a leapday.
Fixed bug in verbose mode that would cause incorrect timer information to be displayed for non-streaming rel nodes.
Fixed bug writing where writing to an iceberg table with a non-iceberg type (int8/uint8) caused an error instead of an upcast.
Fixed a bug that would cause SHOW TABLES commands to error when the tables had a large number of rows/bytes.
Fixed a bug in comparisons for elements of nested arrays with dictionary-encoded data.

2024.8.1 New Features and Improvements¶

Added support for storage account name and storage account key and VM-identities when using Azure.
Added support for SUM on decimals when invoked as a window function or a regular aggregation without groupby.
Added support for VAR_POP, VAR_SAMP, STDDEV_POP, STDDEV_SAMP, COVAR_POP, COVAR_SAMP, CORR, PERCENTILE_DISC and PERCENTILE_CONT on decimals.
Added support for FACTORIAL, CEIL, FLOOR, ROUND, and TRUNC on decimals.
Added support for DIV0NULL.

SQL Planner improvements to better handle use of NDV values and nullability of functions.
Ensured BodoSQL now decomposes VAR_POP, VAR_SAMP, STDDEV_POP, STDDEV_SAMP, COVAR_POP, COVAR_SAMP and CORR into combinations of SUM and COUNT to improve re-use of computations across multiple functions.
Added streaming support for AVG(X) OVER () and COUNT(*) OVER ().
Added streaming support for SUM as a window function when there are partitions but no frames.
Improved performance of reading string columns for Iceberg tables. This is only available on the Bodo Cloud Platform at this stage.
Improved robustness when pushing down very large IN queries with Iceberg.
Enabled pruning an entire Iceberg table via a JoinFilter with an empty build table.
Enabled rewriting COUNT(DISTINCT) to use an internal optimization via grouping sets that allows more equal work distribution across cores.
Improve performance of reading small Iceberg tables.

Fixed bug enabling streaming for CUME_DIST and PERCENT_RANK when there are no partition columns.
Arrays of tuple values are now returned to regular Python properly as arrays of tuples instead of arrays of structs.
Fixed bugs that prevented compiling complex nested UDF inlining cases.