Bodo 2024.9 Release (Date: 9/25/2024)
New Features:
- Added support for pd.Series.argmin, pd.Series.argmax, pd.Series.str.removeprefix, pd.Series.str.removesuffix, pd.Series.str.casefold, and pd.Series.str.fullmatch.
- Added support for pd.Series.str.partition with expand=True.
- Added support for HAVERSINE with Decimal input data type.
- Changed Bodo logger defaults to stdout instead of stderr.
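
As a quick illustration of the newly supported Series methods listed above, here is a minimal sketch in plain pandas. The sample data and variable names are made up for illustration; under Bodo the same calls would presumably sit inside a function decorated with bodo.jit, which is omitted here so the snippet runs without Bodo installed.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
s = pd.Series([3, 1, 4, 1, 5])

# argmin/argmax return the integer position of the first minimum/maximum.
pos_min = s.argmin()
pos_max = s.argmax()

names = pd.Series(["tmp_report.csv", "tmp_Data.csv", "notes.txt"])

# removeprefix/removesuffix only strip the text when it is actually present.
stems = names.str.removeprefix("tmp_").str.removesuffix(".csv")

# casefold is a more aggressive lower() intended for caseless comparison.
folded = names.str.casefold()

# fullmatch requires the whole string to match the regular expression.
is_csv = names.str.fullmatch(r"tmp_\w+\.csv")

# partition with expand=True yields a three-column DataFrame:
# text before the separator, the separator itself, and the remainder.
parts = pd.Series(["key=value", "a=b=c", "plain"]).str.partition("=", expand=True)

print(pos_min, pos_max)
print(stems.tolist())
print(is_csv.tolist())
print(parts)
```

Note that partition splits only at the first occurrence of the separator, so "a=b=c" keeps "b=c" intact in the last column, and a string without the separator yields two empty trailing columns.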
Performance Improvements:
- Changed Iceberg write to use Arrow azurefs instead of Hadoop.
- Changed to use Iceberg metadata instead of Parquet metadata for file scan planning to speed up Iceberg reads overall.
- Added the ability to fetch metadata for Snowflake-managed Iceberg tables at the beginning of query execution and in parallel for faster Iceberg file scan planning.
- Added streaming support for the window functions COUNT(X), COUNT_IF, BOOLAND_AGG, BOOLOR_AGG, BITAND_AGG, BITOR_AGG, and BITXOR_AGG.
- Added streaming support for the window functions LEAD, LAG, and NTILE when a PARTITION BY clause is provided.
- Added streaming support for the window functions FIRST_VALUE, LAST_VALUE, ANY_VALUE, MIN, and MAX on numeric data.
- Ensured BodoSQL decomposes the window functions PERCENT_RANK, CUME_DIST, and RATIO_TO_REPORT into other window functions that can be computed together with streaming.
- Enabled computation of multiple window functions at once while streaming.
- Enabled window functions computed with an OVER () window in streaming to spill data to disk, reducing peak memory utilization.
- Improved the quality of the BodoSQL planner to reduce redundant computation.
- Added various optimizations for the streaming sort operator.
- Made the BodoSQL planner more aggressive with eliminating common subexpressions that are not top-level expressions.
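
To make the decomposition of PERCENT_RANK, CUME_DIST, and RATIO_TO_REPORT mentioned above more concrete, the following sketch expresses each one in terms of simpler window aggregates (RANK, COUNT, SUM) over a single partition. It follows the standard SQL definitions and is only an assumption about the general idea; it is not necessarily the exact rewrite the BodoSQL planner applies. The function name and sample values are hypothetical.

```python
def decompose_window_functions(values):
    """Compute PERCENT_RANK, CUME_DIST and RATIO_TO_REPORT for one partition
    ordered by the value itself, using only simpler building blocks."""
    n = len(values)      # COUNT(*) OVER ()
    total = sum(values)  # SUM(x) OVER ()
    rows = []
    for v in values:
        # RANK() OVER (ORDER BY x): one plus the number of strictly smaller rows.
        rank = 1 + sum(1 for u in values if u < v)
        # Number of rows ordered at or before the current row, peers included.
        at_or_before = sum(1 for u in values if u <= v)
        rows.append({
            "value": v,
            # PERCENT_RANK = (RANK - 1) / (COUNT - 1), defined as 0 for a single row.
            "percent_rank": (rank - 1) / (n - 1) if n > 1 else 0.0,
            # CUME_DIST = rows at or before the current row / COUNT.
            "cume_dist": at_or_before / n,
            # RATIO_TO_REPORT = x / SUM(x); None stands in for SQL NULL when SUM is 0.
            "ratio_to_report": v / total if total != 0 else None,
        })
    return rows


rows = decompose_window_functions([10, 20, 20, 50])
for row in rows:
    print(row)
```

Because all three results depend only on RANK, COUNT, and SUM over the same window, they can share one pass over the partition, which is the kind of combined computation the streaming changes above describe.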
Bug Fixes:
- Improved the amount of possible query decorrelation in BodoSQL.
- Fixed a bug in Snowflake-managed Iceberg table writer where the catalog integration creation could fail in the presence of another concurrent writer.
- Fixed various bugs in the streaming sort operator.
- Fixed behavior of pd.Series.str.split when n>=1 but the delimiter is not provided.
- Improved stability when reading from CSV files.
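
For reference, the pd.Series.str.split case named in the fixes above (n>=1 with no delimiter) looks like this in plain pandas, which is the behavior Bodo is presumably expected to match. The sample strings are hypothetical.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
s = pd.Series(["a b c", "one two three", "single"])

# With no delimiter, pandas splits on whitespace; n=1 limits it to one split,
# so everything after the first split stays together in the last element.
result = s.str.split(n=1)

print(result.tolist())
```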
Dependency Upgrades:
- Upgraded to Pandas 2.2.