2025)¶

🎉 Highlights¶

In this release, we are excited to introduce the first experimental version of the Bodo DataFrame library which is a drop-in replacement for Pandas. The current release has an initial feature set including parquet read, filtering, projection, UDFs, lazy evaluation, query plan optimization, and streaming parallel execution (both local and clusters). See our blog “Rethinking DataFrames” for more details and refer to our docs page for supported features and examples.

✨ New Features¶

Added support for read_parquet for reading parquet files into Bodo DataFrames and from_pandas for converting Pandas DataFrames into Bodo DataFrames.
Added support for Series.map and DataFrame.apply as well as Series string methods Series.str.lower and Series.str.strip
Added support for DataFrame and Series head and limit pushdown.
Added support for column projections e.g. df[“A”] and projection pushdown
Added support filtering e.g. df[df.A < 10] and filter pushdown.
Added support for setting columns in a DataFrame.
Added graceful fallbacks to Pandas for cases the DataFrame library does not support yet.
Added support for optimizing query plans using DuckDB’s optimizer.
Added support for streaming parallel execution to allow seamless scaling and prevent out of memory issues.

🐛 Bug Fixes¶

Improve error messages for Iceberg write schema validation.
Fix bugs related to distribution of arguments in Spawn mode.

⚙️ Dependency Upgrades¶

Added DuckDB as a vendored dependency for query optimization. We have removed some of the code we don’t need and plan to remove more going forward.
Upgraded Numba version to 0.61.2