Bodo 2020.12 Release (Date: 12/30/2020)
This release includes many new features, bug fixes and performance improvements. Overall, 60 code patches were merged since the last release.
New Features and Improvements
- Bodo is updated to use Numba 0.52 (latest)
- Support for reading CSV and Parquet from Azure Data Lake Storage (ADLS)
- Improved support for UDFs
  - More robust user function handling
  - Improved support for date/time data types in UDFs
- Improved support for rolling window functions
  - Support raw argument of apply()
  - Support column selection from rolling objects
  - Support for nullable int values
- Pandas coverage:
  - Support for groupby.apply
  - Support for groupby rolling functions
  - Improved support for dataframe indexing using df.loc/df.iloc
  - Improved dtype handling in read_csv
  - Support for Series.mask
  - Improved robustness for highly skewed string data (e.g. when most of the string data is on a few processes due to uneven data distribution)
  - Support for dataframes with repeated column names
  - Support for datetime.date arrays as Index in pivot_table and as argument to pd.DatetimeIndex
  - Improved error checking in Pandas implementations
  - Unroll constant loops for type stability in more cases
- NumPy coverage:
  - Support for np.hstack
- Scikit-learn:
  - Support for sklearn.preprocessing.StandardScaler inside jit functions
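As a rough illustration of a few of the Pandas and NumPy items listed in this release, the sketch below exercises rolling apply() with raw=True on a column selected from a rolling object, Series.mask, groupby.apply and np.hstack. The function names and sample data are hypothetical and not taken from the Bodo documentation. The code has only been reasoned through as plain pandas/NumPy; it assumes that the same bodies compile under bodo.jit and falls back to a no-op decorator when Bodo is not installed.

```python
import numpy as np
import pandas as pd

try:
    import bodo  # assumption: Bodo is installed in a real deployment
    jit = bodo.jit
except ImportError:
    # Fallback so the sketch still runs as plain pandas/NumPy without Bodo.
    def jit(func):
        return func


@jit
def rolling_sum_raw(df):
    # Select column "x" from the rolling object, then apply a function.
    # raw=True passes each window to the function as a NumPy array.
    return df.rolling(2)["x"].apply(lambda w: w.sum(), raw=True)


@jit
def mask_large(s):
    # Series.mask replaces the values where the condition is True.
    return s.mask(s > 2.0, 0.0)


@jit
def range_per_group(df):
    # groupby.apply runs a user function once per group.
    return df.groupby("key").apply(lambda g: g["x"].max() - g["x"].min())


@jit
def stack_arrays(a, b):
    # np.hstack concatenates 1-D arrays end to end.
    return np.hstack((a, b))


# Hypothetical sample data for demonstration only.
sample = pd.DataFrame({"key": ["a", "a", "b", "b"], "x": [1.0, 2.0, 3.0, 5.0]})
rolled = rolling_sum_raw(sample)
masked = mask_large(sample["x"])
ranges = range_per_group(sample)
stacked = stack_arrays(np.array([1, 2]), np.array([3]))
```

On a single process the results should match plain pandas. When run on several processes, Bodo distributes the data, so each process would typically hold only a chunk of the returned Series.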