Bodo 2020.11 Release (Date: 11/19/2020)¶
This release includes many new features, bug fixes and performance improvements. Overall, 126 code patches were merged since the last release.
New Features and Improvements¶
-
Bodo is updated to use Apache Arrow 2.0 (latest)
-
Performance and memory optimizations
- Significant memory usage optimizations for several operations involving string arrays
- Up to 2x speedup for many string operations such as
Series.str.replace/get/containsandgroupby.sum()
-
User-defined functions (UDFs)
- Support for returning datafarames from
DataFrame.apply()andSeries.apply() - Support for returning nested arrays
- Support for returning datafarames from
-
Caching: for Bodo functions that receive CSV and JSON file names as string arguments, the cache will now be reused when file name arguments differ but have the same dataset type (schema).
-
Support for distributed deep learning with Tensorflow and PyTorch: https://docs.bodo.ai/latest/source/dl.html
-
Pandas coverage:
- Support for tuple values in Series and DataFrame columns
- Improvements to error checking and handling
- Automatic unrolling of loops over dataframe columns when necessary for type stability
- Support integer column names for Dataframes
- Support for
pd.Timedeltavalues - Support for
pd.tseries.offsets.DateOffsetandpd.tseries.offsets.Monthend - Support for Series.dt, Timestamp, and DateTimeIndex attributes
(
is_month_start,is_month_end,is_quarter_start,is_quarter_end,is_year_start,is_year_end,week,weekofyear,weekday) - Support for Series.dt and Timestamp
normalizemethod - Support for
Timestamp.componentsandTimestamp.strftime - Support for
Series.dt.ceilandSeries.dt.round - Support for
pd.to_timedelta - Support
Series.replacefor categorical arrays wherevalueandto_replaceare scalars or lists - Support for comparison operators on Decimal types
- Support for Series.add() with String, datetime, and timedelta
- Support for Series.mul() with string and int literal
- Support for setting values in categorical arrays
- Initial support for
pd.get_dummies() - Support for
Series.groupby()
-
Scikit-learn: the following classes and functions are supported inside jit functions:
sklearn.linear_model.LinearRegressionsklearn.linear_model.LogisticRegressionsklearn.linear_model.Ridgesklearn.linear_model.Lassosklearn.svm.LinearSVCsklearn.naive_bayes.MultinomialNBsklearn.metrics.accuracy_scoresklearn.metrics.mean_squared_errorsklearn.metrics.mean_absolute_error
-
XGBoost: Training XGBoost model (with Scitkit-learn like API) is now supported inside jit functions:
xgboost.XGBClassifierxgboost.XGBRegressor
Visit <https://docs.bodo.ai/latest/source/ml.htmlfor more information about supported ML functions.
-
NumPy coverage:
- Support for
numpy.anyandnumpy.allfor all array types - Support for
numpy.cbrt - Support for
numpy.linspaceargumentsendpoint,retstep, anddtype np.argminwith axis=1- Support for
np.float32(str)
- Support for
-
Support for
str.format,math.factorial,zlib.crc32