2025)

🎉 Highlights

In this release, we are excited to significantly improve the responsiveness of Bodo DataFrames with lazy JIT imports, optimize performance with Common Table Expressions (CTEs), and upgrade to Arrow 21.

✨ New Features

Getting the length of a BodoDataFrame or BodoSeries now returns a lazily evaluated BodoScalar.
Added support for the subset argument in drop_duplicates.
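To illustrate what a lazily evaluated scalar means, here is a minimal plain-Python sketch. The `LazyScalar` class, its `evaluate()` method, and the `count_rows` helper are hypothetical and are not Bodo's `BodoScalar` API; the sketch only shows how a length result, and binary operations on it, can be recorded as deferred work and computed in one pass when the value is finally requested.

```python
from typing import Any, Callable


class LazyScalar:
    """Hypothetical deferred scalar: holds a zero-argument function, not a value."""

    def __init__(self, compute: Callable[[], Any]) -> None:
        self._compute = compute

    def _combine(self, other: Any, op: Callable[[Any, Any], Any]) -> "LazyScalar":
        # Build a new deferred node; neither operand is evaluated here.
        if isinstance(other, LazyScalar):
            return LazyScalar(lambda: op(self.evaluate(), other.evaluate()))
        return LazyScalar(lambda: op(self.evaluate(), other))

    def __add__(self, other: Any) -> "LazyScalar":
        return self._combine(other, lambda a, b: a + b)

    def __mul__(self, other: Any) -> "LazyScalar":
        return self._combine(other, lambda a, b: a * b)

    def evaluate(self) -> Any:
        # The deferred computation runs only when the value is requested.
        return self._compute()


calls = []  # records when the underlying "length" computation actually runs


def count_rows() -> int:
    calls.append("count_rows")
    return len([10, 20, 30, 40])


n = LazyScalar(count_rows)  # nothing has run yet
expr = n * 2 + 1            # still nothing has run
assert calls == []
result = expr.evaluate()    # the whole expression runs here, once
print(result, calls)
```

Deferring the work this way lets an engine see the full expression before executing it, which is what makes the lazy binary operations listed under Performance Improvements optimizable.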

🏎️ Performance Improvements

Binary operations on BodoScalar values are now lazy, allowing better optimization.
Duplicate computations in execution trees are now recognized and executed only once using Common Table Expressions (CTEs).
Internal gather/scatter calls now run without JIT, improving response times.
Iceberg read/write now works without a JIT import, improving response times.
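The CTE optimization can be pictured as finding structurally identical subtrees in a plan and computing each one only once. The sketch below assumes a toy plan built from hypothetical `Node` objects and uses value-based hashing plus a result cache; it is not Bodo's planner, only an illustration of the idea.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical plan node: an operator name plus child nodes."""
    op: str
    children: Tuple["Node", ...] = ()


def find_shared(root: Node) -> Set[Node]:
    # Frozen dataclasses hash by value, so identical subtrees share a key.
    counts: Dict[Node, int] = {}

    def visit(node: Node) -> None:
        counts[node] = counts.get(node, 0) + 1
        for child in node.children:
            visit(child)

    visit(root)
    # Subtrees that occur more than once are candidates for a CTE.
    return {node for node, count in counts.items() if count > 1}


def execute(node: Node, cache: Dict[Node, str], log: List[str]) -> str:
    if node in cache:  # already computed: reuse the stored result
        return cache[node]
    inputs = [execute(child, cache, log) for child in node.children]
    log.append(node.op)  # record that this operator actually ran
    result = f"{node.op}({', '.join(inputs)})"
    cache[node] = result
    return result


scan = Node("scan_orders")
filt = Node("filter", (scan,))
root = Node("join", (Node("sum", (filt,)), Node("count", (filt,))))

shared = find_shared(root)
log: List[str] = []
execute(root, {}, log)
print(log)
```

Without the cache, `scan_orders` and `filter` would each run twice, once for each input of the join.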

⚙️ Dependency Changes

Upgraded Arrow dependency to 21.0.

2025.10.1

✨ New Features

Added a torch_train function and supporting functions to automatically distribute and initialize PyTorch training.
Changed Pandas fallback behavior to automatically convert results back to Bodo if possible, so that subsequent operations can continue lazily.
drop_duplicates now supports the subset and keep arguments.
Added Series floor support.
Added support for the modulus operator (%) on Series and expressions.
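The new `drop_duplicates` arguments and the modulus operator follow pandas semantics. The snippet below uses plain pandas with made-up data; assuming Bodo DataFrames matches pandas for these calls, replacing `import pandas as pd` with `import bodo.pandas as pd` should give the same results, although that swap is not exercised here.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "user": ["a", "a", "b", "b", "c"],
        "day": [1, 2, 1, 1, 3],
        "amount": [10, 25, 7, 9, 40],
    }
)

# Only the "user" column decides what counts as a duplicate;
# keep="last" retains the final row seen for each user.
last_per_user = df.drop_duplicates(subset=["user"], keep="last")

# keep=False drops every row whose ("user", "day") pair is repeated.
unique_pairs = df.drop_duplicates(subset=["user", "day"], keep=False)

# Element-wise modulus on a Series and inside a column expression.
parity = df["amount"] % 2
df["bucket"] = (df["amount"] + df["day"]) % 3

print(last_per_user)
print(unique_pairs)
```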

🏎️ Performance Improvements

to_datetime now runs as native code.
Cross products are now supported in common table expressions.
str.match now runs as native code.

🐛 Bug Fixes

Series.ai.embed, Series.ai.tokenize, and Series.ai.llm_generate now accept large_string types as input.
Fixed output type of np.asarray on numeric BodoDataFrames.
Improved support for the boolean not operator.
Fixed arithmetic that mixes numeric and boolean values.
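Assuming the boolean not operator here refers to element-wise negation with `~`, as in pandas, the two fixes above cover patterns like the following. The snippet uses plain pandas with illustrative data and only shows the expected pandas semantics.

```python
import pandas as pd

flags = pd.Series([True, False, True])
values = pd.Series([10, 20, 30])

# Element-wise "not" on a boolean Series is written with ~.
# Python's `not` keyword cannot be applied element-wise and raises on a Series.
inverted = ~flags

# Booleans take part in arithmetic as 0 and 1.
adjusted = values + flags
weighted = values * inverted

print(inverted.tolist(), adjusted.tolist(), weighted.tolist())
```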

2025.10.2

✨ New Features

Added prepare_dataset to simplify data redistribution and sampling in torch_train training loops.