2021)¶

This release includes many new features, bug fixes and usability improvements. Overall, 70 code patches were merged since the last release.

New Features and Improvements¶

Bodo is updated to use pandas 1.2 and Arrow 3.0 (latest)
Many improvements to error checking and reporting
Several documentation improvements
Support tuple return from Bodo functions where elements of the tuple have a mix of distributed and replicated distributions
Improvements in automatic loop unrolling to support column names generated in loops, e.g. pd.DataFrame(X, columns=["y"] + ["x{}".format(i) for i in range(m)])
Improvements in caching to cover missing cases
Pandas coverage:
- Support column indices in read_csv() dtype argument. For example: df = pd.read_csv(fname, dtype={3: str})
- Support for df.to_string()
- Initial support for pd.Categorical()
- Support Series.min and Series.max for categorical data
- Support pd.to_datetime() with categorical string input
- Support pd.Series() constructor without data argument specified
- Support dtype="str" in Series constructor
- Support for Series.to_dict()
- Support for Series.between()
- Support Series.loc[] setitem with boolean array index, such as S.loc[idx] = val where idx is a boolean array or Series
- Support dictionary input in Series.map(), such as S.map({1.0: "A", 4.0: "DD"})
- Support for pd.TimedeltaIndex min and max
- Support for pd.tseries.offsets.Week
Numpy coverage:
- Support axis=1 in distributed np.concatenate
- Initial support for np.random.multivariate_normal
Scikit-learn:
- Add coef_ attribute to SGDClassifier model.
- Add coef_ attribute to LinearRegression model.
- Support for sklearn.preprocessing.LabelEncoder inside jit functions.
- Support for sklearn.metrics.r2_score inside jit functions.