Bodo 2021.5 Release (Date: 5/19/2021)
This release includes many new features, optimizations, bug fixes and usability improvements. Overall, 70 code patches were merged since the last release.
New Features and Improvements
- Bodo is updated to use Arrow 4.0 (latest)
- Connectors:
    - Significantly improved the performance of pd.read_parquet for large multi-file datasets by optimizing Parquet metadata collection
    - Bodo now reads only the first few rows from a Parquet dataset if the program only requires df.head(n) and/or df.shape. This helps with exploring large datasets without the need for a large cluster to load the full data in memory.
- Visualization: Bodo now supports calling many Matplotlib plotting functions directly from JIT code. See the "Data Visualization" section of our documentation for more details. The current support gathers the data into one process, but this will be avoided in future releases.
- Improved compilation time for dataframe functions
- Improved the performance and scalability of groupby.nunique
- Many improvements to error checking and reporting
- Bodo now avoids printing empty slices of distributed data to make print output easier to read.
- Pandas coverage:
    - Support for DataFrame.info()
    - Support for memory_usage() for DataFrame and Series
    - Support for nbytes for array and Index types
    - Support for df.describe() with datetime data (assumes datetime_is_numeric=True)
    - Support for groupby.value_counts()
    - Support for pd.NamedAgg with nunique in groupby
    - Initial support for CategoricalIndex type and categorical keys in groupby
    - Support for groupby idxmin and idxmax with nullable Integer and Boolean arrays
    - Support for timedelta64 in Groupby.agg
    - Support for bins and other optional arguments in Series.value_counts()
    - Support for df.dtypes
    - Support for passing df.dtypes to df.astype(), for example: df1.astype(df2.dtypes)
    - Support for boolean pd.Index
    - Support for Series.sort_index()
    - Support for Timestamp.day_name() and Series.dt.day_name()
    - Support for Series.quantile() with datetime
    - Support for passing a list of quantile values to Series.quantile()
    - Support for Series.to_frame()
    - Support for sum() method of Boolean Arrays
    - Initial support for MultiIndex.from_product
    - String array comparison returns a Pandas nullable boolean array instead of a NumPy boolean array
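
A few of the Pandas calls listed above can be illustrated with a minimal sketch. It runs in plain pandas to show the semantics only. The function name coverage_demo and the sample data are hypothetical, and the sketch does not exercise Bodo itself. In a Bodo program, a function like this would presumably be decorated with bodo.jit, which is an assumption not verified here.

```python
import pandas as pd


def coverage_demo():
    # Hypothetical sample data, used only for illustration.
    df1 = pd.DataFrame({"key": ["a", "a", "b"], "val": [1, 2, 2]})
    df2 = pd.DataFrame({"key": ["x"], "val": [0.5]})

    # Pass one frame's dtypes to another frame's astype(),
    # as in the df1.astype(df2.dtypes) example from the list.
    converted = df1.astype(df2.dtypes)

    # pd.NamedAgg with nunique in a groupby aggregation.
    uniques = df1.groupby("key").agg(
        n_vals=pd.NamedAgg(column="val", aggfunc="nunique")
    )

    # A list of quantile values passed to Series.quantile().
    quantiles = df1["val"].quantile([0.25, 0.75])

    # Series.dt.day_name() on datetime data.
    days = pd.Series(pd.to_datetime(["2021-05-19"])).dt.day_name()

    return converted, uniques, quantiles, days


converted, uniques, quantiles, days = coverage_demo()
print(converted.dtypes)
print(uniques)
print(quantiles)
print(days)
```

Here the val column of converted takes the float64 dtype from df2, and the quantile call returns a Series indexed by the requested quantile values.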
 
 - Support for