Bodo 2020.09 Release (Date: 09/17/2020)¶
This release includes many new features, bug fixes and performance improvements. Overall, 88 code patches were merged since the last release.
New Features and Improvements¶
-
Bodo is updated to use the latest versions of Numba, pandas and Arrow:
- Numba 0.51.2
- pandas 1.1.2
- Arrow 1.0.1
-
Major improvements in memory management. Bodo's memory consumption is reduced significantly by releasing memory as soon as possible in various operations such as Join, GroupBy, and Sort.
-
Significant improvements in checking and handling various errors in I/O, providing clear error messages and graceful exits.
-
Improvements in speed and scalability of
read_parquetwhen reading from directories with large number of files. -
Distributed diagnostics is improved to provide clear messages on why a variable was assigned REP distribution.
-
Improvements in caching support for I/O calls and groupby user-defined functions (UDFs).
-
Support for more distributed getitem/setitem cases on arrays.
-
Improvements on checking for unsupported functions and optional arguments.
-
Significant performance improvements in groupby transformations (e.g.
GroupBy.cumsum). -
Enhanced support for
DataFrame.select_dtypes. -
Support for
axis=1inDataFrame.var/std. -
Support for
Series.autocorr. -
Support for
Series.is_monotonic_increasing/is_monotonic_decreasing. -
Support
pd.Series()constructor with a scalar data value. -
Support for
dayofweek,is_leap_yearanddays_in_monthinTimestampand `Series.dt]{.title-ref}. -
Support for
isocalendarinSeries.dtandDatetimeIndex. -
Support for
Series.cumsum/cummin/cummax. -
Support for
Decimalvalues in nested data structures. -
Improvements in table join performance.
-
Support for
Series.drop_duplicates. -
Support for
np.dotand@operator onSeries. -
Improvements in
pd.concatsupport. -
Optimized
Series.astype(str)forint64values. -
Support for
pd.Indexconstructor.