Bodo 2025.5 Release (Date: 05/16/2025)¶
🎉 Highlights¶
In this release, we are excited to introduce the first experimental version of the Bodo DataFrame library which is a drop-in replacement for Pandas. The current release has an initial feature set including parquet read, filtering, projection, UDFs, lazy evaluation, query plan optimization, and streaming parallel execution (both local and clusters). See our blog “Rethinking DataFrames” for more details and refer to our docs page for supported features and examples.
✨ New Features¶
- Added support for
read_parquet
for reading parquet files into Bodo DataFrames andfrom_pandas
for converting Pandas DataFrames into Bodo DataFrames. - Added support for
Series.map
andDataFrame.apply
as well as Series string methodsSeries.str.lower
andSeries.str.strip
- Added support for DataFrame and Series
head
and limit pushdown. - Added support for column projections e.g.
df[“A”]
and projection pushdown - Added support filtering e.g.
df[df.A < 10]
and filter pushdown. - Added support for setting columns in a DataFrame.
- Added graceful fallbacks to Pandas for cases the DataFrame library does not support yet.
- Added support for optimizing query plans using DuckDB’s optimizer.
- Added support for streaming parallel execution to allow seamless scaling and prevent out of memory issues.
🐛 Bug Fixes¶
- Improve error messages for Iceberg write schema validation.
- Fix bugs related to distribution of arguments in Spawn mode.
⚙️ Dependency Upgrades¶
- Added DuckDB as a vendored dependency for query optimization. We have removed some of the code we don’t need and plan to remove more going forward.
- Upgraded Numba version to 0.61.2