Bodo 2021.1 Release (Date: 1/26/2021)
This release includes many new features, bug fixes and performance improvements. Overall, 61 code patches were merged since the last release.
New Features and Improvements
- Connectors:
  - Support filter pushdown when reading partitioned parquet datasets: at compile time, Bodo detects if filters are applied to a dataframe after read_parquet, and generates code that applies those filters at read time so that only the required parquet files are read.
  - Support for Series.to_csv()
  - Support passing the file and dtype arguments of np.fromfile as kwargs.
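The effect of filter pushdown on a partitioned dataset can be sketched in plain Python. This is only a conceptual illustration of partition pruning, not Bodo's implementation: it assumes a hive-style directory layout (key=value folder names), and the helper names partition_values and prune_files, as well as the file paths, are hypothetical.

```python
from pathlib import PurePosixPath

# User-level pattern described in the release note (shown as comments only):
#   df = pd.read_parquet("sales")
#   df = df[df["year"] >= 2020]
# With pushdown, the filter is applied while selecting files to read,
# instead of after every file has been loaded.


def partition_values(path):
    """Parse hive-style `key=value` directory names from a file path."""
    dirs = PurePosixPath(path).parts[:-1]  # drop the file name
    return dict(d.split("=", 1) for d in dirs if "=" in d)


def prune_files(paths, column, predicate):
    """Keep only files whose partition value for `column` passes `predicate`."""
    return [p for p in paths if predicate(partition_values(p)[column])]


# Hypothetical partitioned dataset layout.
files = [
    "sales/year=2019/part-0.parquet",
    "sales/year=2020/part-0.parquet",
    "sales/year=2020/part-1.parquet",
    "sales/year=2021/part-0.parquet",
]

# Only files in partitions satisfying the filter would need to be read.
selected = prune_files(files, "year", lambda v: int(v) >= 2020)
```

In this sketch the 2019 partition is skipped entirely, which is the saving the release note refers to when it says only the required parquet files are read.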
- Support for f-strings in Bodo jitted functions
- Support passing Bodo distributed JIT functions to other Bodo JIT functions
- Pandas coverage:
  - Support groupby with pd.NamedAgg()
  - Support for groupby.size
  - Support for groupby.shift
  - Match input row order of pandas in groupby.apply when applicable
  - Support min_periods in rolling calls
  - Support passing a dictionary of data types to df.astype()
  - Support dataframe setitem of multiple columns. For example: df[["A", "B"]] = 1.3
  - Support for Index.get_loc()
  - Support ddof argument (delta degrees of freedom) of Series.cov
  - Support Series.is_monotonic property
  - Initial support for dictionaries in Series.replace
  - Support Series.reset_index(drop=True)
  - Support level argument with all levels in reset_index()
  - Several documentation improvements
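As a rough illustration of several of the pandas calls listed above, the following sketch uses plain pandas on made-up data. It is not taken from the Bodo documentation; under Bodo, such code would typically be placed inside a function decorated with bodo.jit, and whether each call compiles depends on the Bodo version in use.

```python
import pandas as pd

# Made-up sample data for illustration only.
df = pd.DataFrame(
    {
        "key": ["a", "a", "b", "b", "b"],
        "val": [1, 2, 3, 4, 5],
        "A": [0.0] * 5,
        "B": [0.0] * 5,
    }
)

# groupby with pd.NamedAgg: name each output column explicitly.
summary = df.groupby("key").agg(
    total=pd.NamedAgg(column="val", aggfunc="sum"),
    largest=pd.NamedAgg(column="val", aggfunc="max"),
)

# groupby.size and groupby.shift
sizes = df.groupby("key").size()
shifted = df.groupby("key")["val"].shift(1)

# rolling with min_periods: partial windows at the start still give a value.
rolled = df["val"].rolling(3, min_periods=1).sum()

# astype with a dictionary mapping column names to data types.
typed = df.astype({"val": "float64"})

# setitem of multiple columns with a scalar.
df[["A", "B"]] = 1.3

# Index.get_loc returns the integer position of a label.
pos = pd.Index(["x", "y", "z"]).get_loc("y")

# Series.cov with ddof=0 (population covariance).
cov0 = pd.Series([1.0, 2.0, 3.0, 4.0]).cov(
    pd.Series([2.0, 4.0, 6.0, 8.0]), ddof=0
)
```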
- Scikit-learn:
  - Support for sklearn.model_selection.train_test_split inside jit functions.
  - Support for sklearn.preprocessing.MinMaxScaler inside jit functions.
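The two scikit-learn entries above can be combined in a short sketch. It uses plain scikit-learn on made-up arrays and is not taken from the Bodo documentation; with Bodo, the same calls would be placed inside a function decorated with bodo.jit, and the exact set of supported arguments there is an assumption to verify against the Bodo docs.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Made-up data: 10 samples with 2 features each.
X = np.arange(20, dtype=np.float64).reshape(10, 2)
y = np.arange(10)

# Hold out 20% of the samples for testing; fix the seed for repeatability.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

Fitting the scaler on the training split alone keeps information from the test split out of the preprocessing step.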