Bodo 2024.9 Release (Date: 9/25/2024)
New Features:
- Added support for `pd.Series.argmin`, `pd.Series.argmax`, `pd.Series.str.removeprefix`, `pd.Series.str.removesuffix`, `pd.Series.str.casefold`, and `pd.Series.str.fullmatch`.
- Added support for `pd.Series.str.partition` with `expand=True`.
- Added support for `HAVERSINE` with Decimal inputs.
- Changed the Bodo logger to write to stdout instead of stderr by default.
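To illustrate what `HAVERSINE` computes, the sketch below implements the standard haversine great-circle formula in plain Python and accepts `Decimal` arguments by converting them to `float` before the trigonometry. The helper name `haversine`, the 6371 km mean Earth radius, and the float conversion are assumptions for illustration. They are not a description of how Bodo handles Decimal inputs internally.

```python
import math
from decimal import Decimal
from typing import Union

Number = Union[float, int, Decimal]

# Assumed mean Earth radius in kilometers (not taken from the Bodo docs).
EARTH_RADIUS_KM = 6371.0


def haversine(lat1: Number, lon1: Number, lat2: Number, lon2: Number) -> float:
    """Great-circle distance in km between two points given in degrees.

    Decimal arguments are converted to float first; this mirrors an
    assumed cast, not a documented Bodo code path.
    """
    phi1, lam1, phi2, lam2 = (
        math.radians(float(v)) for v in (lat1, lon1, lat2, lon2)
    )
    a = (
        math.sin((phi2 - phi1) / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin((lam2 - lam1) / 2) ** 2
    )
    # Clamp at 1 so rounding error near antipodal points cannot push asin out of its domain.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


# Example with Decimal inputs (New York to London).
print(haversine(Decimal("40.7128"), Decimal("-74.0060"), Decimal("51.5074"), Decimal("-0.1278")))
```

Converting to `float` up front keeps the trigonometry in hardware floating point, since `math` functions do not operate on `Decimal` directly.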
Performance Improvements:
- Changed Iceberg writes to Azure storage to use Arrow `azurefs` instead of Hadoop.
- Switched Iceberg file scan planning to use Iceberg metadata instead of Parquet metadata, which speeds up Iceberg reads overall.
- Added the ability to fetch metadata for Snowflake-managed Iceberg tables at the start of query execution and in parallel, for faster Iceberg file scan planning.
- Added streaming support for the window functions `COUNT(X)`, `COUNT_IF`, `BOOLAND_AGG`, `BOOLOR_AGG`, `BITAND_AGG`, `BITOR_AGG`, and `BITXOR_AGG`.
- Added streaming support for the window functions `LEAD`, `LAG`, and `NTILE` when a `PARTITION BY` clause is provided.
- Added streaming support for the window functions `FIRST_VALUE`, `LAST_VALUE`, `ANY_VALUE`, `MIN`, and `MAX` on numeric data.
- Ensured BodoSQL decomposes the window functions `PERCENT_RANK`, `CUME_DIST`, and `RATIO_TO_REPORT` into other window functions that can be computed together while streaming.
- Enabled computing multiple window functions at once while streaming.
- Enabled window functions computed with an `OVER ()` window in streaming to spill data to disk, reducing peak memory utilization.
- Improved the quality of the BodoSQL planner to reduce redundant computation.
- Added various optimizations to the streaming sort operator.
- Made the BodoSQL planner more aggressive in eliminating common subexpressions that are not top-level expressions.
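Decomposing `PERCENT_RANK`, `CUME_DIST`, and `RATIO_TO_REPORT` is possible because of standard SQL identities that express each one in terms of `RANK`, `COUNT`, and `SUM` windows, which can share a single pass over a partition. The Python sketch below checks those identities on one in-memory partition with no NULLs, ordered ascending by value. The helper names are hypothetical, and the exact rewrite BodoSQL performs is not described in these notes.

```python
from bisect import bisect_left, bisect_right
from typing import List, Optional


def rank(values: List[float]) -> List[int]:
    """RANK() OVER (ORDER BY v): 1 + number of strictly smaller values."""
    ordered = sorted(values)
    return [bisect_left(ordered, v) + 1 for v in values]


def running_count(values: List[float]) -> List[int]:
    """COUNT(*) OVER (ORDER BY v RANGE UNBOUNDED PRECEDING): rows <= v, peers included."""
    ordered = sorted(values)
    return [bisect_right(ordered, v) for v in values]


def percent_rank(values: List[float]) -> List[float]:
    """PERCENT_RANK = (RANK - 1) / (COUNT(*) OVER () - 1), defined as 0 for a single row."""
    n = len(values)
    if n == 1:
        return [0.0]
    return [(r - 1) / (n - 1) for r in rank(values)]


def cume_dist(values: List[float]) -> List[float]:
    """CUME_DIST = running_count / COUNT(*) OVER ()."""
    n = len(values)
    return [c / n for c in running_count(values)]


def ratio_to_report(values: List[float]) -> List[Optional[float]]:
    """RATIO_TO_REPORT(v) = v / SUM(v) OVER (); returning None for a zero total is an assumption."""
    total = sum(values)
    if total == 0:
        return [None] * len(values)
    return [v / total for v in values]


partition = [10, 20, 20, 40]
print(percent_rank(partition), cume_dist(partition), ratio_to_report(partition))
```

Because every result is a per-row combination of a few shared aggregates, an engine only needs to compute `RANK`, the running count, `COUNT(*)`, and `SUM` once per partition and can then derive all three functions together.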
Bug Fixes:
- Expanded the range of queries that BodoSQL can decorrelate.
- Fixed a bug in the Snowflake-managed Iceberg table writer where creating the catalog integration could fail in the presence of another concurrent writer.
- Fixed various bugs in the streaming sort operator.
- Fixed the behavior of `pd.Series.str.split` when `n >= 1` but no delimiter is provided.
- Improved stability when reading from CSV files.
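pandas documents that `Series.str.split` with `pat=None` splits on whitespace and that `n` limits the number of splits, with `None`, `0`, and `-1` all meaning "no limit". Assuming Bodo follows those pandas semantics, the per-element behavior can be modeled with Python's built-in `str.split`. `split_whitespace` is a hypothetical helper for illustration, not a Bodo or pandas API.

```python
from typing import List, Optional


def split_whitespace(s: Optional[str], n: Optional[int] = -1) -> Optional[List[str]]:
    """Model one element of Series.str.split(pat=None, n=n)."""
    if s is None:
        return None  # missing values stay missing
    if n is None or n == 0:
        n = -1  # pandas treats None, 0, and -1 as "return all splits"
    # With sep=None, str.split splits on runs of whitespace and performs at most n splits.
    return s.split(None, n)


print(split_whitespace("a  b c", 1))  # ['a', 'b c']
```

Note that once the split limit is reached, the remainder keeps its internal and trailing whitespace; only the leading whitespace before it is dropped.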
Dependency Upgrades:
- Upgraded to Pandas 2.2.