Bodo 2021.7 Release (Date: 7/23/2021)
This release includes many new features, optimizations, bug fixes and usability improvements. Overall, 109 code patches were merged since the last release.
New Features and Improvements
- Documentation has been reorganized and updated, with improved navigation and a detailed walkthrough of Pandas equivalents of PySpark functions.
- Improvements to enable BodoSQL features
- Connectors:
    - Improved performance of `pd.read_parquet` when reading from remote storage systems like S3
    - Support reading categorical columns of Parquet
- Performance improvements:
    - Improved performance and scalability of `sort_values`
    - Optimized `pd.Series.isin(values)` performance for a long list of `values`
- UDFs in `Series.apply` and `DataFrame.apply`: the Bodo compiler transforms the code to pass main function values referenced in the UDF ("free variables") as arguments to `apply()` automatically if possible (to simplify UDF usage).
- Support passing Bodo data types to objmode directly (in addition to string representation of the data types). For example, the following code sets the return type to `bodo.int64`:

      import random

      import bodo

      @bodo.jit
      def f(a, b):
          # Run random.randint in object mode and declare `res` as int64.
          with bodo.objmode(res=bodo.int64):
              res = random.randint(a, b)
          return res

- Compilation time improvements for some dataframe operations
- Distributed support for `pd.RangeIndex` calls
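To make the term "free variable" in the UDF item above concrete, here is a minimal sketch in plain Python, run outside Bodo. It does not reproduce the compiler's transformation; it only contrasts a UDF that captures a value from the enclosing function with the equivalent form that receives the value as an explicit argument. The function names `scale_with_closure` and `scale_with_args` are hypothetical and chosen for illustration.

```python
def scale_with_closure(values, factor):
    # The UDF reads `factor` from the enclosing function: a free variable.
    udf = lambda x: x * factor
    return [udf(v) for v in values], udf.__code__.co_freevars


def scale_with_args(values, factor):
    # Equivalent form: the value is passed explicitly, so the UDF
    # has no free variables.
    def udf(x, factor):
        return x * factor

    return [udf(v, factor) for v in values], udf.__code__.co_freevars


closure_result, closure_freevars = scale_with_closure([1, 2, 3], 10)
explicit_result, explicit_freevars = scale_with_args([1, 2, 3], 10)
```

Both forms compute the same values; only the second makes the dependency on `factor` visible in the UDF's signature, which is the shape the release note describes Bodo producing automatically where possible.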
- Pandas coverage:
    - Initial support for binary arrays, including within series/dataframes
    - Support for `groupby.transform`
    - Groupby: support repeated input columns. For example:

          df.groupby("A").agg(
              D=pd.NamedAgg(column="B", aggfunc=lambda A: A.sum()),
              F=pd.NamedAgg(column="C", aggfunc="max"),
              E=pd.NamedAgg(column="B", aggfunc="min"),
          )

    - Support Groupby with `dropna=False`
    - Support for `dropna` in `Series.nunique`, `DataFrame.nunique`, and `groupby.nunique`
    - Support for `DataFrame.insert()`
    - Support `tolist()` for string and numpy arrays
    - Expanded `astype` support:
        - str to timedelta64/datetime64
        - timedelta64/datetime64 to int64
        - date arrays
        - Numeric-like inputs to datetime/timedelta
        - Support for `pd.StringDtype()` in `astype`
        - numeric-like to nullable integer types
    - Support for `pd.Timestamp.now()`
    - Support Timestamp in `pd.to_datetime`
    - Support for Timestamp/Timedelta as the scalar value for a Series
    - Support for `Series.dt.month_name`, `Timestamp.month_name`
    - Support for min/max on timedelta64 series/arrays
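As a rough illustration of a few of the Pandas items above (`dropna` in `nunique`, Groupby with `dropna=False`, and `DataFrame.insert()`), the following sketch uses plain pandas outside of `@bodo.jit`. It assumes pandas 1.1 or later and assumes that Bodo-compiled code follows the same pandas semantics, which is not verified here. The sample data is made up for the example.

```python
import pandas as pd

df = pd.DataFrame({"A": ["x", "x", None, "y"], "B": [1.0, None, 3.0, 3.0]})

# Series.nunique skips missing values by default and counts them
# as one extra distinct value when dropna=False.
n_default = df["B"].nunique()
n_with_nan = df["B"].nunique(dropna=False)

# Groupby with dropna=False keeps the group whose key is missing.
sizes_default = df.groupby("A").size()
sizes_keep_na = df.groupby("A", dropna=False).size()

# DataFrame.insert adds a column in place at the given position.
df.insert(1, "C", [10, 20, 30, 40])
```

Here `n_default` is 2 and `n_with_nan` is 3, the second groupby has one more group than the first, and the column order after the insert is `A`, `C`, `B`.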
- Python coverage:
    - Support for `bytes.fromhex()`
    - Support for `bytes.__hash__`
    - Support for `min` and `max` for string values
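The Python features listed above behave as follows in standard CPython. This is a small sketch run outside Bodo; per the list above, the same calls are now accepted inside Bodo-compiled functions, though that is not exercised here. The variable names are illustrative only.

```python
# bytes.fromhex builds a bytes object from a hexadecimal string.
raw = bytes.fromhex("deadbeef")

# bytes.__hash__ makes bytes usable as dictionary keys and set members;
# equal bytes objects hash to the same value.
same_hash = hash(raw) == hash(b"\xde\xad\xbe\xef")
counts = {raw: 1}

# min and max on strings compare lexicographically.
smallest = min("pear", "apple", "fig")
largest = max("pear", "apple", "fig")
```

With these inputs, `smallest` is `"apple"` and `largest` is `"pear"`.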