Bodo 2021.5 Release (Date: 5/19/2021)
This release includes many new features, optimizations, bug fixes and usability improvements. Overall, 70 code patches were merged since the last release.
New Features and Improvements
Bodo is updated to use Arrow 4.0 (latest)
Significantly improved the performance of pd.read_parquet for large multi-file datasets by optimizing Parquet metadata collection
Bodo now reads only the first few rows from a Parquet dataset if the program only requires df.shape. This helps with exploring large datasets without the need for a large cluster to load the full data in memory.
Visualization: Bodo now supports calling many Matplotlib plotting functions directly from JIT code. See the “Data Visualization” section of our documentation for more details. The current support gathers the data into one process, but this will be avoided in future releases.
Improved compilation time for dataframe functions
Improved the performance and scalability of
Many improvements to error checking and reporting
Bodo now avoids printing empty slices of distributed data to make print output easier to read.
memory_usage() for DataFrame and Series
nbytes for array and Index types
df.describe() with datetime data (assumes datetime_is_numeric=True)
Initial support for CategoricalIndex type and categorical keys in groupby
Support for groupby idxmax with nullable Integer and Boolean arrays
Support for timedelta64 in
bins and other optional arguments in
df.astype(), for example:
Support for boolean
Support for passing list of quantile values to
sum() method of Boolean Arrays
Initial support for
String array comparison returns a Pandas nullable boolean array instead of a Numpy boolean array
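
To illustrate the last item, here is a minimal sketch in plain pandas (no Bodo JIT involved) of a string array comparison that produces a nullable boolean array. It assumes that Bodo's result now matches what pandas returns for its "string" extension dtype. The sample values and variable names are made up for illustration.

```python
import pandas as pd

# A pandas "string" extension array containing one missing value.
names = pd.array(["bodo", None, "arrow"], dtype="string")

# Element-wise comparison yields a pandas nullable boolean array:
# the missing entry stays <NA> instead of being coerced to False.
mask = names == "bodo"
print(mask.dtype)  # boolean

# sum() on the nullable boolean array counts True entries and skips <NA>.
matches = int(mask.sum())
print(matches)
```

With a NumPy boolean result, the missing entry would have to be stored as either True or False. The nullable array keeps it distinct, so downstream filters and aggregations can handle it explicitly.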