Bodo 2023.11 Release (Date: 11/07/2023)¶
New Features and Improvements¶
- BodoSQL generated plans have been further optimized to reduce compile time, runtime and memory usage, as well as resolve several bugs pertaining to stability.
- Support more functionality in BodoSQL:
EQUAL_NULL
and<=>
now supported on semi-structured data.- Added support for the semi-structured functions
ARRAY_CONSTRUCT
,ARRAYS_OVERLAP
,ARRAY_POSITION
. CONCAT
(and||
) now support concating binary data.LAST_DAY
now supports unquoted interval literals as arguments.- Support for writing semi-structured array columns and some cases of semi-structured object columns to Snowflake.
- Some casting functions that previously had incorrect behavior with nullable data are now fixed.
2023.11.1 New Features and Improvements¶
Feature Updates:¶
- Added the table function
FLATTEN
which can be used to explode a column of arrays alongside theLATERAL
keyword. Currently allows ARRAY columns being passed in to the INPUT argument, as well as JSON columns under limited circumstances. Currently only allows the defaults for named argumentsPATH
,OUTER
,RECURSIVE
andMODE
. - Added the function
OBJECT_CONSTRUCT_KEEP_NULL
, including the special syntaxOBJECT_CONSTRUCT_KEEP_NULL(*)
, as long as all of the key arguments are string literals. - Added the aggregation function
OBJECT_AGG
which takes in a column of strings and a column of data of any type and combines them into a JSON value where the first column is the keys and the second column is the values. Currently only supported within a GROUP BY clause. - Added the function
OBJECT_DELETE
. Non-literals key strings are partially supported. - Support
ARRAY_CONTAINS
,ARRAY_UNIQUE_AGG
,ARRAY_EXCEPT
,ARRAY_INTERSECTION
,ARRAY_CAT
, andSTRTOK_TO_ARRAY
. - Support for the table function
SPLIT_TO_TABLE
. - Support all sub-second interval units and their aliases.
- Support all interval literal units without quotes.
- Support unquoted date/time units in all functions that accept date/time units.
- Support reading Object and deeply nested semi-structured columns from Snowflake.
Improvements and Bug Fixes:¶
- Enabled streaming in pipelines with semi-structured data.
- Reduced out of memory risk during shuffle.
- Optimized handling of nullable columns during SQL plan optimization.
Dependency Updates:¶
- Upgrade to Calcite 1.34
- Upgrade to Pandas 2
- Upgrade to Cython 3
- Upgrade to HDF5 1.14
2023.11.2 New Features and Improvements¶
Feature Updates:¶
- General improvements to the spill-to-disk functionality.
- Support
ARRAY_COMPACT
. - Support for indexing into array values via
GET()
andarr[idx]
. - Added logging for failed metadata collection during planning. This is available with verbose_level >= 2.
- Support for precision and scale arguments with to_number try_to_number, and aliases.
DATE_FROM_PARTS
/TIME_FROM_PARTS
/TIMESTAMP_FROM_PARTS
& their aliases can take any numeric arguments, as opposed to only integers.
Improvements and Bug Fixes:¶
- Fixed bug where output of certain aggregate calls would have the wrong nullability.
- Fixed various bugs that were limiting ability to call functions on columns of arrays of JSON data.
- Fixed a bug in the Union (Distinct) operator which could lead to a hang in some cases.
- Fixed
TO_ARRAY
andARRAY_TO_STRING
on column inputs and case statement. - Fixed shuffle corner case bug for semi-structured data array nulls.
- Fixes to
COALESCE
typing behavior.
Dependency Updates:¶
- Bodo now uses PyArrow arrays to box/unbox nested arrays with zero-copy (made possible by Pandas 2 upgrade).