Bodo 2020.11 Release (Date: 11/19/2020)
This release includes many new features, bug fixes and performance improvements. Overall, 126 code patches were merged since the last release.
New Features and Improvements
- Bodo is updated to use Apache Arrow 2.0 (latest)
- Performance and memory optimizations
  - Significant memory usage optimizations for several operations involving string arrays
  - Up to 2x speedup for many string operations such as Series.str.replace/get/contains and groupby.sum()
- User-defined functions (UDFs)
  - Support for returning dataframes from DataFrame.apply() and Series.apply()
  - Support for returning nested arrays
- Caching: for Bodo functions that receive CSV and JSON file names as string arguments, the cache will now be reused when file name arguments differ but have the same dataset type (schema).
- Support for distributed deep learning with TensorFlow and PyTorch: https://docs.bodo.ai/latest/source/dl.html
- Pandas coverage:
  - Support for tuple values in Series and DataFrame columns
  - Improvements to error checking and handling
  - Automatic unrolling of loops over dataframe columns when necessary for type stability
  - Support for integer column names for dataframes
  - Support for pd.Timedelta values
  - Support for pd.tseries.offsets.DateOffset and pd.tseries.offsets.MonthEnd
  - Support for Series.dt, Timestamp, and DatetimeIndex attributes (is_month_start, is_month_end, is_quarter_start, is_quarter_end, is_year_start, is_year_end, week, weekofyear, weekday)
  - Support for Series.dt and Timestamp normalize method
  - Support for Timestamp.components and Timestamp.strftime
  - Support for Series.dt.ceil and Series.dt.round
  - Support for pd.to_timedelta
  - Support for Series.replace for categorical arrays where value and to_replace are scalars or lists
  - Support for comparison operators on Decimal types
  - Support for Series.add() with string, datetime, and timedelta
  - Support for Series.mul() with string and int literal
  - Support for setting values in categorical arrays
  - Initial support for pd.get_dummies()
  - Support for Series.groupby()
- Scikit-learn: the following classes and functions are supported inside jit functions:
  - sklearn.linear_model.LinearRegression
  - sklearn.linear_model.LogisticRegression
  - sklearn.linear_model.Ridge
  - sklearn.linear_model.Lasso
  - sklearn.svm.LinearSVC
  - sklearn.naive_bayes.MultinomialNB
  - sklearn.metrics.accuracy_score
  - sklearn.metrics.mean_squared_error
  - sklearn.metrics.mean_absolute_error
- XGBoost: training XGBoost models (with a Scikit-learn-like API) is now supported inside jit functions:
  - xgboost.XGBClassifier
  - xgboost.XGBRegressor

  Visit <https://docs.bodo.ai/latest/source/ml.html> for more information about supported ML functions.
- NumPy coverage:
  - Support for numpy.any and numpy.all for all array types
  - Support for numpy.cbrt
  - Support for numpy.linspace arguments endpoint, retstep, and dtype
  - Support for np.argmin with axis=1
  - Support for np.float32(str)
- Support for str.format, math.factorial, zlib.crc32
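
As an illustration of the last item, the sketch below uses str.format, math.factorial, and zlib.crc32 together in one function. The function name summarize and its arguments are hypothetical and not part of the release. It is left undecorated so it runs as plain Python; the assumption is that, with Bodo installed, a function of this kind could be compiled by adding the @bodo.jit decorator, and which argument types are accepted under jit is not stated in these notes.

```python
import math
import zlib


# Assumption: under Bodo this function would carry the @bodo.jit decorator.
# It is shown undecorated so the sketch runs without Bodo installed.
def summarize(n, payload):
    # math.factorial: exact integer factorial of n
    fact = math.factorial(n)
    # zlib.crc32: unsigned 32-bit checksum of a bytes object
    checksum = zlib.crc32(payload)
    # str.format: positional substitution into the template string
    return "{}! = {}, crc32 = {}".format(n, fact, checksum)


print(summarize(5, b"hello"))
```

Calling summarize(5, b"hello") returns a string that begins with "5! = 120" and ends with the CRC-32 checksum of the payload.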