Bodo 2025.10 Release (Date: 10/03/2025)
🎉 Highlights
This release significantly improves the responsiveness of Bodo DataFrames with lazy JIT imports, optimizes performance with Common Table Expressions (CTEs), and upgrades to Arrow 21.
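Lazy imports defer loading a heavy module until it is first used, which is the general idea behind the responsiveness gain mentioned above. The following is a minimal sketch of that generic pattern in plain Python, not Bodo's implementation: LazyModule is a hypothetical helper, and the standard-library json module stands in for an expensive dependency such as a JIT compiler.

```python
import importlib


class LazyModule:
    """Hypothetical proxy that imports a module on first attribute access."""

    def __init__(self, name):
        self._name = name
        self._module = None  # nothing is imported yet

    def __getattr__(self, attr):
        # Called only for attributes not found on the proxy itself.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)


heavy = LazyModule("json")            # cheap: no import happens here
loaded_before = heavy._module is not None
encoded = heavy.dumps({"rows": 3})    # first use triggers the real import
loaded_after = heavy._module is not None
```

Code paths that never touch the proxy never pay the import cost, which is why operations that do not need the JIT can respond faster.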
✨ New Features
- Getting the length of a BodoDataFrame or BodoSeries now returns a lazily evaluated BodoScalar.
- Added support for the subset argument to drop_duplicates.
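To illustrate what a lazily evaluated scalar means in practice, here is a minimal conceptual sketch in plain Python. LazyScalar and count_rows are hypothetical names invented for this example; they are not Bodo's BodoScalar and do not reflect its internals. The sketch only shows the idea that a value such as a row count can be carried around, and combined arithmetically, without running the underlying computation until the result is actually needed.

```python
import operator


class LazyScalar:
    """Hypothetical deferred scalar: stores a thunk and evaluates it once, on demand."""

    def __init__(self, thunk):
        self._thunk = thunk
        self._value = None
        self._done = False

    def _binary(self, other, op):
        # Build a new deferred node instead of computing immediately.
        def thunk():
            rhs = other.get() if isinstance(other, LazyScalar) else other
            return op(self.get(), rhs)
        return LazyScalar(thunk)

    def __add__(self, other):
        return self._binary(other, operator.add)

    def __mul__(self, other):
        return self._binary(other, operator.mul)

    def get(self):
        # Force evaluation; cache so repeated reads do no extra work.
        if not self._done:
            self._value = self._thunk()
            self._done = True
        return self._value


calls = []

def count_rows():
    calls.append("scan")   # stands in for an expensive distributed row count
    return 1000

n = LazyScalar(count_rows)   # nothing runs yet
expr = n * 2 + 1             # still deferred
pending = len(calls)         # no scan has happened so far
result = expr.get()          # evaluation happens here
```

Because the whole expression is known before anything runs, an engine has the opportunity to optimize it as a unit instead of executing each step eagerly.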
🏎️ Performance Improvements
- Support lazy BodoScalar binary operations for better optimizations.
- Recognize duplicate computations in execution trees and execute them only once using Common Table Expressions (CTEs).
- Support internal gather/scatter calls without JIT for faster response times.
- Support Iceberg read/write without JIT import for faster response times.
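The duplicate-computation item above can be pictured with a small sketch. This is a simplified illustration under stated assumptions, not Bodo's planner: the plan is modeled as nested tuples, the operator names (scan, filter_even, sum, count, div) are made up for the example, and a dictionary cache plays the role of a CTE by letting a shared subtree run once and be reused.

```python
def evaluate(node, cache, log):
    """Evaluate a plan node, reusing results for structurally identical subtrees."""
    if node in cache:                  # duplicate subtree: reuse, do not recompute
        return cache[node]
    op, *args = node
    log.append(op)                     # record every operator that actually executes
    if op == "scan":
        result = list(args[0])
    elif op == "filter_even":
        result = [x for x in evaluate(args[0], cache, log) if x % 2 == 0]
    elif op == "sum":
        result = sum(evaluate(args[0], cache, log))
    elif op == "count":
        result = len(evaluate(args[0], cache, log))
    elif op == "div":
        result = evaluate(args[0], cache, log) / evaluate(args[1], cache, log)
    else:
        raise ValueError(f"unknown op: {op}")
    cache[node] = result
    return result


# Plan nodes are hashable tuples, so equal subtrees map to the same cache key.
evens = ("filter_even", ("scan", (1, 2, 3, 4, 5, 6)))
plan = ("div", ("sum", evens), ("count", evens))   # `evens` appears twice

log = []
mean_of_evens = evaluate(plan, {}, log)
```

Without the cache, the scan and filter would each run twice, once under sum and once under count; with it, each appears in the log a single time.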
⚙️ Dependency Changes
- Upgraded Arrow dependency to 21.0.
2025.10.1
✨ New Features
- Added a torch_train function and supporting functions to automatically distribute and initialize PyTorch training.
- Changed Pandas fallback behavior to automatically convert results back to Bodo if possible, so that subsequent operations can continue lazily.
- drop_duplicates now supports the subset and keep arguments.
- Added Series floor support.
- Added Series and expression modulus (%) support.
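For readers unfamiliar with the subset and keep arguments, the sketch below spells out their meaning in plain Python over a list of dicts. It assumes the arguments follow the pandas definitions (keep may be "first", "last", or False); the function is a hypothetical model written for this example, not Bodo or pandas code.

```python
from collections import Counter


def drop_duplicates(rows, subset=None, keep="first"):
    """Plain-Python model of pandas-style drop_duplicates over a list of dicts."""
    if keep not in ("first", "last", False):
        raise ValueError("keep must be 'first', 'last', or False")
    if not rows:
        return []
    # With no subset, every column takes part in the duplicate check.
    cols = list(subset) if subset is not None else list(rows[0])
    keys = [tuple(row[c] for c in cols) for row in rows]

    if keep is False:
        # Drop every row whose key occurs more than once.
        counts = Counter(keys)
        return [row for row, k in zip(rows, keys) if counts[k] == 1]

    # Pick the occurrence to keep for each key; original order is preserved.
    chosen = {}
    for i, k in enumerate(keys):
        if keep == "first":
            chosen.setdefault(k, i)
        else:
            chosen[k] = i
    keep_idx = set(chosen.values())
    return [row for i, row in enumerate(rows) if i in keep_idx]


rows = [
    {"user": "a", "day": 1, "score": 10},
    {"user": "a", "day": 2, "score": 20},
    {"user": "b", "day": 1, "score": 30},
    {"user": "c", "day": 1, "score": 40},
]
first = [r["score"] for r in drop_duplicates(rows, subset=["user"], keep="first")]
last = [r["score"] for r in drop_duplicates(rows, subset=["user"], keep="last")]
unique_only = [r["score"] for r in drop_duplicates(rows, subset=["user"], keep=False)]
```

Here only user "a" is duplicated, so keep="first" retains the score 10 row, keep="last" retains the score 20 row, and keep=False drops both.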
🏎️ Performance Improvements
- to_datetime now runs as native code.
- Cross-product now supported for common table expressions.
- str.match now runs as native code.
🐛 Bug Fixes
- Series.ai.embed, Series.ai.tokenize, and Series.ai.llm_generate now accept large_string types as input.
- Fixed output type of np.asarray on numeric BodoDataFrames.
- Better support for the boolean not operator.
- Proper support for arithmetic mixing of numeric and boolean values.
2025.10.2
✨ New Features
- Added prepare_dataset to simplify data redistribution and sampling in torch_train training loops.