Bodo 2020.08 Release (Date: 08/21/2020)¶
This release includes many new features, bug fixes and performance improvements. Overall, 112 code patches were merged since the last release.
New Features and Improvements¶
- 
Bodo is updated to use the latest versions of Numba, pandas and Arrow:
- Numba 0.51.0
 - pandas 1.1.0
 - Arrow 1.0
 
 - 
Support reading and writing Parquet files with columns where values are arrays or structs, which can contain other arrays/structs with arbitrary nesting.
 - 
S3 I/O: automatically determine the region of the S3 bucket when reading and writing.
 - 
Initial support for scikit-learn RandomForestClassifier (fit, predict and score methods)
 - 
Support
sklearn.metrics.precision_score,sklearn.metrics.recall_scoreandsklearn.metrics.f1_score. - 
Improved caching support (caching
@bodo.jitfunctions with cache=True) - 
Initial support for arrays of map data structures
 - 
Support
countandoffsetarguments ofnp.fromfile - 
New
bodo.rebalance()function for load balancing dataframes manually if desired - 
Support setting dataframe column as attribute, for example:
df.B = "AA" - 
Support DataFrame min/max/sum/prod/mean/median functions with
axis=1 - 
Support
df.loc[:,columns]indexing - 
pd.concatsupport for mix of Numpy and nullable integer/bool arrays - 
Support parallel append to dataframes (concatenation reduction)
 - 
Support
GroupBy.idxminandGroupBy.idxmax - 
Improvements and optimizations in user-defined function (UDF) handling
 - 
Basic support for
Series.where() - 
Support calling bodo.jit functions inside prange loops
 - 
Support
DataFrame.select_dtypeswith constant strings - 
Support
DataFrame.sample - 
Support
Series.replace()anddf.replace()(scalars and lists) - 
Support for Series.dt methods:
total_seconds()andto_pytimedelta() - 
Improved support for Categorical data types
 - 
Support for
pandas.Timestamp.isocalendar() - 
Support
np.digitize() - 
Improved error handling during I/O when input CSV or Parquet file does not exist
 - 
Support pd.concat(axis=1) for dataframes
 - 
Significant improvements in compilation time for dataframes with large number of columns
 - 
bodo.is_jit_execution()can be used to know if a function is running with Bodo.