Skip to content

Bodo 2024.9 Release (Date: 9/25/2024)

New Features:

  • Added support for pd.Series.argmin, pd.Series.argmax, pd.Series.str.removeprefix, pd.Series.str.removesuffix, pd.Series.str.casefold and Series.str.fullmatch.
  • Added support for pd.Series.str.partition with expand=True.
  • Added support for support HAVERSINE with Decimal input data type.
  • Changed Bodo logger defaults to stdout instead of stderr.

Performance Improvements:

  • Changed Iceberg write to use Arrow azurefs instead of hadoop.
  • Changed to use Iceberg metadata instead of Parquet metadata for file scan planning to speed up Iceberg reads overall.
  • Added ability to fetch metadata for Snowflake-managed Iceberg tables at the beginning of query execution and in-parallel for faster Iceberg file scan planning.
  • Added streaming support for the window functions COUNT(X), COUNT_IF, BOOLAND_AGG, BOOLOR_AGG, BITAND_AGG, BITOR_AGG and BITXOR_AGG.
  • Added streaming support for the window functions LEAD, LAG and NTILE when a PARTITION BY clause is provided.
  • Added streaming support for the window functions FIRST_VALUE, LAST_VALUE, ANY_VALUE, MIN, and MAX on numeric data.
  • Ensured BodoSQL decomposes the window functions PERCENT_RANK, CUME_DIST and RATIO_TO_REPORT into other window functions that can be computed together with streaming.
  • Enabled computation of multiple window functions at once while streaming.
  • Enabled window functions computed with an OVER () window in streaming to spill data to disk, reducing peak memory utilization.
  • Improved the quality of BodoSQL planner to reduce redundant computation.
  • Added various optimizations for the streaming sort operator.
  • Made the BodoSQL planner more aggressive with eliminating common subexpressions that are not top-level expressions.

Bug Fixes:

  • Improved the amount of possible query decorrelation in BodoSQL.
  • Fixed a bug in Snowflake-managed Iceberg table writer where the catalog integration creation could fail in the presence of another concurrent writer.
  • Fixed various bugs in the streaming sort operator.
  • Fixed behavior of pd.Series.str.split when n>=1 but the delimiter is not provided.
  • Improved stability when reading from CSV files.

Dependency Upgrades:

  • Upgraded to Pandas 2.2.