Bodo 2020.08 Release (Date: 08/21/2020)
This release includes many new features, bug fixes and performance improvements. Overall, 112 code patches were merged since the last release.
New Features and Improvements
Bodo is updated to use the latest versions of Numba, pandas and Arrow:
Support reading and writing Parquet files with columns where values are arrays or structs, which can contain other arrays/structs with arbitrary nesting.
S3 I/O: automatically determine the region of the S3 bucket when reading and writing.
Initial support for scikit-learn RandomForestClassifier (fit, predict and score methods)
Improved caching support (caching @bodo.jit functions with cache=True)
Initial support for arrays of map data structures
bodo.rebalance() function for load balancing dataframes manually if desired
Support setting dataframe column as attribute, for example:
df.B = "AA"
Support DataFrame min/max/sum/prod/mean/median functions with axis=1
pd.concat support for a mix of NumPy and nullable integer/bool arrays
Support parallel append to dataframes (concatenation reduction)
Improvements and optimizations in user-defined function (UDF) handling
Basic support for
Support calling bodo.jit functions inside prange loops
DataFrame.select_dtypes with constant strings
df.replace() (scalars and lists)
Support for Series.dt methods:
Improved support for Categorical data types
Improved error handling during I/O when input CSV or Parquet file does not exist
Support pd.concat(axis=1) for dataframes
Significant improvements in compilation time for dataframes with a large number of columns
bodo.is_jit_execution() can be used to check whether a function is running with Bodo.
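As a rough illustration of a few of the dataframe items above (axis=1 reductions, pd.concat(axis=1), and df.replace() with scalars), here is a minimal sketch. The function name summarize, the sample data, and the fallback jit decorator are hypothetical and not part of Bodo. The fallback only lets the snippet run as plain pandas when Bodo is not installed. Whether a given Bodo version compiles this exact function is an assumption, not something these notes confirm.

```python
import pandas as pd

try:
    import bodo
    jit = bodo.jit
except ImportError:
    # Hypothetical stand-in: run the function as ordinary pandas code
    # when Bodo is not available.
    def jit(func=None, **kwargs):
        if func is None:
            return lambda f: f
        return func


@jit
def summarize(df1, df2):
    # Row-wise reduction (axis=1) across the columns of df1.
    row_max = df1.max(axis=1)
    # Column-wise concatenation of two dataframes.
    wide = pd.concat([df1, df2], axis=1)
    # Scalar replacement: every 5 becomes 50.
    cleaned = df1.replace(5, 50)
    return row_max, wide, cleaned


df1 = pd.DataFrame({"A": [1, 5, 3], "B": [4, 2, 6]})
df2 = pd.DataFrame({"C": [0.5, 1.5, 2.5]})

row_max, wide, cleaned = summarize(df1, df2)
print(row_max)
print(wide)
print(cleaned)
```

With the sample data, row_max holds the larger of A and B for each row, wide has the three columns A, B and C, and cleaned has 50 in place of the 5 in column A.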