Bodo 2022.4 Release (Date: 4/29/2022)
This release includes many new features, usability and performance improvements, and bug fixes. Overall, 60 code patches were merged since the last release.
New Features and Improvements
- Support for Python 3.10 (Conda/pip packages will be available soon)
- Support for Pandas 1.4 (along with continued support for v1.3)
- Connectors:
  - When passing a list of paths to pd.read_parquet, the paths can be a combination of paths to files and glob strings.
  - Improved performance of pd.read_parquet on remote filesystems when passing long lists of files.
  - DataFrame.to_parquet now supports row_group_size parameter, which can be used to specify the maximum number of rows in generated row groups. Bodo now has a default row group size of 1M rows, to improve performance when reading the generated parquet datasets in parallel.
  - pd.read_parquet: string columns can be forced to be read with dictionary encoding by passing a list of column names with _bodo_read_as_dict parameter.
  - Support for S3 anonymous access with storage_options={"anon": True} in pd.read_csv and pd.read_json
  - Improved performance and memory utilization of pd.read_csv at compilation and run time (especially when reading first n rows from remote filesystems such as S3)
- Parallel support for pd.date_range: Bodo automatically creates a date range that is distributed across processes
- Improved performance of Series.str.startswith/endswith/contains for dictionary-encoded string arrays
- Reduced compilation time for DataFrame.memory_usage()
- Reduced compilation time when using pandas.read_sql() with wide tables.
- Pandas:
  - Support for DataFrame.melt() and pd.melt()
  - Support for