Bodo 2020.08 Release (Date: 08/21/2020)
This release includes many new features, bug fixes and performance improvements. Overall, 112 code patches were merged since the last release.
New Features and Improvements
- Bodo is updated to use the latest versions of Numba, pandas and Arrow:
  - Numba 0.51.0
  - pandas 1.1.0
  - Arrow 1.0
- Support reading and writing Parquet files with columns whose values are arrays or structs, which can contain other arrays or structs with arbitrary nesting.
- S3 I/O: automatically determine the region of the S3 bucket when reading and writing.
- Initial support for scikit-learn RandomForestClassifier (fit, predict and score methods)
- Support sklearn.metrics.precision_score, sklearn.metrics.recall_score and sklearn.metrics.f1_score
- Improved caching support (caching @bodo.jit functions with cache=True)
- Initial support for arrays of map data structures
- Support the count and offset arguments of np.fromfile
- New bodo.rebalance() function for manually load balancing dataframes, if desired
- Support setting a dataframe column as an attribute, for example: df.B = "AA"
- Support DataFrame min/max/sum/prod/mean/median functions with axis=1
- Support df.loc[:, columns] indexing
- pd.concat support for a mix of NumPy and nullable integer/bool arrays
- Support parallel append to dataframes (concatenation reduction)
- Support GroupBy.idxmin and GroupBy.idxmax
- Improvements and optimizations in user-defined function (UDF) handling
- Basic support for Series.where()
- Support calling bodo.jit functions inside prange loops
- Support DataFrame.select_dtypes with constant strings
- Support DataFrame.sample
- Support Series.replace() and df.replace() (scalars and lists)
- Support for Series.dt methods: total_seconds() and to_pytimedelta()
- Improved support for Categorical data types
- Support for pandas.Timestamp.isocalendar()
- Support np.digitize()
- Improved error handling during I/O when the input CSV or Parquet file does not exist
- Support pd.concat(axis=1) for dataframes
- Significant improvements in compilation time for dataframes with a large number of columns
- bodo.is_jit_execution() can be used to check whether a function is running with Bodo.
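Several of the items above are ordinary pandas DataFrame operations. As a rough illustration of their semantics, the sketch below uses row-wise reductions with axis=1, df.loc[:, columns] selection, and attribute-style column assignment in plain pandas, run without Bodo. The frame, its column names and its values are illustrative assumptions; the notes only state that these calls are supported inside @bodo.jit functions as of this release.

```python
import pandas as pd

# Plain pandas (not compiled with Bodo); columns and values are illustrative.
df = pd.DataFrame({"A": [1, 4, 7], "B": [2, 5, 8], "C": [3, 6, 9]})

# Row-wise reductions (axis=1) combine the values across columns for each row.
row_min = df.min(axis=1)        # [1, 4, 7]
row_max = df.max(axis=1)        # [3, 6, 9]
row_sum = df.sum(axis=1)        # [6, 15, 24]
row_prod = df.prod(axis=1)      # [6, 120, 504]
row_mean = df.mean(axis=1)      # [2.0, 5.0, 8.0]
row_median = df.median(axis=1)  # [2.0, 5.0, 8.0]

# df.loc[:, columns]: all rows, a subset of columns.
subset = df.loc[:, ["A", "C"]]

# Attribute-style assignment overwrites the existing column B with a scalar.
df.B = "AA"

print(row_sum.tolist())      # [6, 15, 24]
print(list(subset.columns))  # ['A', 'C']
print(df["B"].tolist())      # ['AA', 'AA', 'AA']
```

Attribute assignment only targets a column when the name already exists. For a new name, pandas sets a plain Python attribute instead, so df["name"] = ... remains the safer general form.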
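The pd.concat items cover two cases: mixing NumPy-backed and nullable integer arrays, and side-by-side concatenation with axis=1. The minimal pandas sketch below shows both. It runs without Bodo, and the Series and DataFrames are hypothetical examples rather than anything taken from these notes.

```python
import pandas as pd

# Plain pandas (not compiled with Bodo); the data is illustrative.
# A NumPy-backed int64 Series and a nullable Int64 Series with a missing value.
numpy_ints = pd.Series([1, 2], dtype="int64")
nullable_ints = pd.Series([3, None], dtype="Int64")

# Concatenating them gives a nullable Int64 result that keeps the <NA>.
combined = pd.concat([numpy_ints, nullable_ints], ignore_index=True)

# axis=1 places dataframes side by side, aligning rows on the index.
left = pd.DataFrame({"A": [1, 2]})
right = pd.DataFrame({"B": ["x", "y"]})
wide = pd.concat([left, right], axis=1)

print(combined.dtype)      # Int64
print(list(wide.columns))  # ['A', 'B']
```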
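GroupBy.idxmin and GroupBy.idxmax return, for each group, the index label of the row that holds the group's minimum or maximum value. The sketch below demonstrates this in plain pandas on a made-up key/value frame. It is not run through Bodo.

```python
import pandas as pd

# Plain pandas (not compiled with Bodo); keys and values are illustrative.
df = pd.DataFrame({"key": ["a", "b", "a", "b"], "val": [3, 1, 5, 0]})

grouped = df.groupby("key")["val"]

# Index label of the smallest / largest "val" within each group.
min_rows = grouped.idxmin()  # a -> 0 (val 3), b -> 3 (val 0)
max_rows = grouped.idxmax()  # a -> 2 (val 5), b -> 1 (val 1)

print(min_rows.tolist())  # [0, 3]
print(max_rows.tolist())  # [2, 1]
```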
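Series.replace() and df.replace() with scalars and lists, together with basic Series.where(), follow the pandas rules sketched below. The snippet is ordinary pandas executed without Bodo, and its inputs are arbitrary assumptions. It does not attempt to show which argument combinations Bodo accepts beyond "scalars and lists".

```python
import pandas as pd

# Plain pandas (not compiled with Bodo); inputs are illustrative.
s = pd.Series([1, 2, 3, 2])

# Scalar replacement: every 2 becomes 20.
scalar_replaced = s.replace(2, 20)           # [1, 20, 3, 20]
# List replacement: 1 -> 10 and 3 -> 30, matched position by position.
list_replaced = s.replace([1, 3], [10, 30])  # [10, 2, 30, 2]

# where keeps values where the condition is True and uses `other` elsewhere.
masked = s.where(s > 1, 0)                   # [0, 2, 3, 2]

# DataFrame.replace applies the same rule to every column.
df = pd.DataFrame({"A": [1, 2], "B": [2, 3]})
df_replaced = df.replace(2, -1)              # A: [1, -1], B: [-1, 3]

print(scalar_replaced.tolist(), list_replaced.tolist(), masked.tolist())
```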
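DataFrame.select_dtypes with constant strings chooses columns by dtype family, and DataFrame.sample draws random rows. The following plain-pandas sketch, run without Bodo, shows both calls on a hypothetical three-column frame. The random_state argument is used here only so that the draw is reproducible in pandas. These notes do not say whether Bodo honors it.

```python
import pandas as pd

# Plain pandas (not compiled with Bodo); the frame is illustrative.
df = pd.DataFrame({"A": [1, 2, 3], "B": [1.5, 2.5, 3.5], "C": ["x", "y", "z"]})

# Constant string arguments select columns by dtype family.
numeric = df.select_dtypes(include="number")      # columns A, B
non_numeric = df.select_dtypes(exclude="number")  # column C

# Draw two rows at random; random_state fixes the pandas RNG seed.
picked = df.sample(n=2, random_state=0)

print(list(numeric.columns), list(non_numeric.columns), len(picked))
```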
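The new date/time items are Series.dt.total_seconds(), Series.dt.to_pytimedelta() and pandas.Timestamp.isocalendar(). Their pandas behavior looks roughly like the sketch below. It runs as plain pandas without Bodo. The timedeltas are illustrative, and the timestamp is simply this release's date. Recent pandas versions may emit a FutureWarning from to_pytimedelta().

```python
import datetime

import pandas as pd

# Plain pandas (not compiled with Bodo); the timedeltas are illustrative.
td = pd.Series([pd.Timedelta(minutes=1), pd.Timedelta(hours=2), pd.Timedelta(days=1)])

# total_seconds() converts each timedelta to a float number of seconds.
seconds = td.dt.total_seconds()  # [60.0, 7200.0, 86400.0]

# to_pytimedelta() yields standard-library datetime.timedelta objects.
py_deltas = list(td.dt.to_pytimedelta())
print(py_deltas[0] == datetime.timedelta(minutes=1))  # True

# isocalendar() gives the ISO (year, week, weekday) triple; weekday 1 is Monday.
iso = tuple(pd.Timestamp("2020-08-21").isocalendar())  # (2020, 34, 5)

print(seconds.tolist(), iso)
```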
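Two NumPy items appear above: np.digitize(), and the count and offset arguments of np.fromfile. The sketch below calls them through NumPy directly, without Bodo. The temporary file and its contents are hypothetical. Note that offset is measured in bytes, while count is measured in items of the given dtype.

```python
import os
import tempfile

import numpy as np

# Plain NumPy (not compiled with Bodo); bins and file contents are illustrative.
# With increasing bins, digitize returns i such that bins[i-1] <= x < bins[i].
bins = np.array([1, 2, 3])
bin_ids = np.digitize(np.array([0.5, 1.5, 2.5, 3.5]), bins)  # [0, 1, 2, 3]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.bin")
    np.arange(10, dtype=np.int64).tofile(path)  # raw int64 values 0..9
    # offset skips 16 bytes (two int64 values); count reads three items.
    window = np.fromfile(path, dtype=np.int64, count=3, offset=16)  # [2, 3, 4]

print(bin_ids.tolist(), window.tolist())
```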