Bodo 2020.10 Release (Date: 10/20/2020)
This release includes many new features, bug fixes and performance improvements. Overall, 117 code patches were merged since the last release.
New Features and Improvements
- Initial support for Python classes using the `bodo.jitclass` decorator.
- Scikit-learn:
    - Initial support for these scikit-learn classes:
        - `sklearn.linear_model.SGDClassifier`
        - `sklearn.linear_model.SGDRegressor`
        - `sklearn.cluster.KMeans`

      For more information, please refer to the documentation [here](https://docs.bodo.ai/latest/source/sklearn.html).
    - Improved scaling of `RandomForestClassifier` training
- Memory management and memory consumption improvements
- Improvements for user-defined functions (UDFs):
    - Compilation errors are now clearly shown for UDFs
    - Support for more complex UDFs (by running a full compiler pipeline)
    - Support for passing keyword arguments to UDFs in `DataFrame.apply()` and `Series.apply()`
    - Support for a much wider range of UDF types in `groupby.agg`
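
The keyword-argument and `groupby.agg` items above can be illustrated with a minimal sketch. It is written in plain pandas so that it runs without Bodo; the assumption (not shown here) is that the same calls would sit inside a function compiled with `bodo.jit`. The UDF names `scale_and_shift` and `row_total` and their parameters are hypothetical.

```python
import pandas as pd


def scale_and_shift(x, factor=1.0, offset=0.0):
    """Hypothetical element-wise UDF; factor and offset arrive as keyword arguments."""
    return x * factor + offset


def row_total(row, weight=1.0):
    """Hypothetical row-wise UDF, called once per row when axis=1."""
    return (row["a"] + row["b"]) * weight


df = pd.DataFrame({"key": ["x", "x", "y"], "a": [1, 2, 3], "b": [10, 20, 30]})

# Series.apply forwards extra keyword arguments to the UDF.
scaled = df["a"].apply(scale_and_shift, factor=2.0, offset=1.0)

# DataFrame.apply does the same; axis=1 passes each row as a Series.
totals = df[["a", "b"]].apply(row_total, axis=1, weight=0.5)

# A lambda UDF used as a groupby aggregation: value range per group.
spread = df.groupby("key")["a"].agg(lambda s: s.max() - s.min())
```

Keyword arguments keep the UDF reusable with different settings without wrapping it in a closure for each call.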
 
 
- Connectors:
    - Improved connector error handling
    - Improved performance of `pd.read_csv` (further improvements in the next release)
    - `pd.read_parquet` supports columns containing all NA (null) values
 
 
- Caching: for Bodo functions that receive Parquet file names as string arguments, the cache is now reused when the file name arguments differ but have the same Parquet dataset type (schema).
- Significantly improved the performance of merge/join operations in some cases
- Support for loops over dataframe columns by automatic loop unrolling
- Support for using global dataframe/array values inside jit functions
- Performance optimization for the `series.str.split().explode()` pattern
- Pandas coverage:
    - Support setting `df.columns` and `df.index`
    - Support setting values in Categorical arrays
    - `series.str.split`: added support for regular expressions and the `n` parameter
    - `Series.replace` support for more array types
    - Support `pd.Series.dt.quarter`
    - Support `series.str.slice_replace`
    - Support `series.str.repeat`
    - Improved support for `df.pivot_table` and `pd.crosstab`
    - Support for `Series.notnull`
    - Support integer label indexing for DataFrames and Series with `RangeIndex`
    - Support setting `None` and `Optional` values for most arrays
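
As a rough illustration of several pandas items listed above, here is a minimal sketch in plain pandas (it does not import Bodo; the assumption is that the same calls would appear inside a function compiled with `bodo.jit`). The data and variable names are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "s": ["a-b-c", "d-e", None]})

# Setting df.columns and df.index by assignment.
df.columns = ["value", "text"]
df.index = ["r0", "r1", "r2"]

# Series.notnull flags the non-missing entries.
present = df["text"].notnull()

# str.split with n=1 splits at most once. With pandas defaults, a
# multi-character pattern is interpreted as a regular expression.
split_once = df["text"].str.split("-", n=1)
split_regex = pd.Series(["a - b", "c-d"]).str.split(r"\s*-\s*")

# The split-then-explode pattern: one output row per token.
tokens = pd.Series(["a,b", "c"]).str.split(",").explode()

# str.slice_replace and str.repeat.
replaced = pd.Series(["abcdef"]).str.slice_replace(1, 3, "XY")
repeated = pd.Series(["ab", "c"]).str.repeat(2)

# dt.quarter on a datetime Series.
quarters = pd.Series(pd.to_datetime(["2020-01-15", "2020-10-20"])).dt.quarter

# Integer label indexing on a default RangeIndex.
ranged = pd.DataFrame({"a": [10, 20, 30]})
second = ranged.loc[1, "a"]

# Setting a value in a Categorical array (the value must be an existing category).
cat = pd.Categorical(["a", "b", "a"], categories=["a", "b"])
cat[0] = "b"
```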
 
- NumPy coverage:
    - Support for `np.union1d`
    - `np.where`, `np.unique`, `np.sort`, `np.repeat`: support for Series and most array types
    - Support `np.argmax` with `axis=1`
    - Support for `np.min`, `np.max`, `min`, `max`, `np.sum`, `sum`, `np.prod` on nullable arrays
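
For reference, the NumPy functions named above behave as follows on ordinary arrays. This is a minimal sketch in plain NumPy; it does not exercise the Series or nullable-array inputs that this release adds under Bodo, and the array values are made up for the example.

```python
import numpy as np

a = np.array([3, 1, 2, 3])
b = np.array([2, 5])

# Sorted unique values drawn from both inputs.
union = np.union1d(a, b)

# Sorted unique values, sorted copy, and element-wise repetition.
uniq = np.unique(a)
ordered = np.sort(a)
rep = np.repeat(b, 2)

# Element-wise selection: keep values greater than 2, otherwise 0.
picked = np.where(a > 2, a, 0)

# Index of the largest value in each row of a 2-D array.
m = np.array([[1, 9, 3], [7, 2, 4]])
col_of_max = np.argmax(m, axis=1)

# Reductions.
total = np.sum(a)
product = np.prod(b)
smallest, largest = np.min(a), np.max(a)
```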