Bodo 2021.4 Release (Date: 4/19/2021)¶
This release includes many new features, bug fixes and usability improvements. Overall, 98 code patches were merged since the last release.
New Features and Improvements¶
-
Bodo is available for Windows as a Conda package (similar to Linux and macOS)
-
Removed boost library dependency
-
Many improvements to error checking and reporting, including:
- Internal compiler errors and stack traces are now avoided more effectively (clear errors are thrown)
- Ensure that an error is thrown if user specifies an argument as distributed but it must be replicated
- Improvements in error checking for user-defined functions (UDFs)
-
Connectors:
- Support for writing partitioned Parquet datasets
(
df.to_parquetwithpartition_colsparameter) - Support for S3 anonymous access with
storage_options={"anon": True}inpd.read_parquet() - Parquet read: optimized metadata collection for nested parquet directories (includes hive-partitioned dataset)
- To reduce Parquet read time, schema validation of multi-file
parquet datasets can be disabled with
bodo.parquet_validate_schema=False
- Support for writing partitioned Parquet datasets
(
-
Reduced compilation time for Pandas APIs
-
Improved compilation time for
df.head/tail -
Support for format spec in f-strings, for example:
f"{a:0.0%}" -
Support for arrays in
bodo.rebalance() -
Pandas coverage:
- Support for
df.filterfor filtering columns - Support for
indicator=Trueinpd.merge() - Support for
DataFrame/Series/GroupBy.pipe() - Support for setting dataframe columns using a 2D array
- Support for string and nullable arrays (e.g. pd.Int64Dtype) in
DataFrame/Series.shift() - Support for
pandas.tseries.offsets.MonthBegin Series.whereandSeries.mask: support for nullable arrays (e.g. pd.Int64Dtype)
- Support for
-
Scikit-learn:
- Support for
sklearn.ensemble.RandomForestRegressor
- Support for