Bodo 2022.2 Release (Date: 2/28/2022)
This release includes many new features and usability improvements. Overall, 82 code patches were merged since the last release.
New Features and Improvements
- Reduced the import time of the Bodo package substantially
- Bodo is now available with pip on x86 Mac
- Bodo is upgraded to use Numba 0.55.1 (the latest release)
- Bodo is upgraded to use scikit-learn v1
- Bodo now supports MPICH version 3.4
- Connectors:
  - pd.read_sql: Support and getting started documentation for Oracle DB and PostgreSQL.
  - pd.read_parquet now supports glob patterns
  - Support for the escapechar argument in pd.read_csv
  - Decreased compilation time when reading wide schemas with 1000s of columns using pd.read_parquet.
  - Optimized runtime of pd.read_parquet with head(0) to skip any unnecessary schema collection for each parquet file and just look at the metadata. This optimization is helpful when loading a DataFrame schema.
  - Support using filter pushdown with a single filter consisting of Series.isna, Series.isnull, Series.notna, or Series.notnull.
  - Full filter pushdown support with hdfs and gcs using pd.read_parquet
  - Improved performance and error handling when using DataFrame.to_sql with Snowflake.
  - Bodo now prints a warning if the number of Parquet row groups is too small for effective parallel I/O.
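As a small illustration of the escapechar argument mentioned in the connector notes, the sketch below reads a CSV whose field contains an escaped delimiter. It is written in plain pandas so it runs without Bodo; the file name, column names, and data are hypothetical. Under Bodo, the same read_csv call would sit inside a function decorated with bodo.jit, which may additionally need a constant file path or explicit column types.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data: the comma inside "Smith\, John" is escaped
# with a backslash so it is not treated as a field separator.
csv_text = "name,score\nSmith\\, John,10\nDoe,7\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name, written to a temporary directory.
    csv_path = os.path.join(tmp_dir, "scores.csv")
    with open(csv_path, "w") as f:
        f.write(csv_text)

    # escapechar tells the parser to take the character after "\" literally.
    df = pd.read_csv(csv_path, escapechar="\\")

print(df)
```

Without escapechar, the first data row would be split into three fields instead of two.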
- Support for using lists and sets as constant global values.
- Support for distributed global DataFrame values
- Added a compiler optimization for forcing the columns in a DataFrame to match a DataFrame with an existing schema via DataFrame.dtypes. In particular, when Bodo encounters code that converts a DataFrame using the dtypes of another DataFrame, df2 (for example, by passing df2.dtypes to astype), Bodo will automatically use the internal Bodo types for all columns in df2. This enables using astype for conversions that are typically not possible in Pandas because the column has an object dtype. For example, this can be used to convert a column from datetime64[ns] to datetime.date with astype.
- Improved runtime performance when copying string data from one array to another or when computing an array of string lengths.
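The DataFrame.dtypes schema-matching pattern described above can be sketched as follows. This is a minimal plain-pandas illustration, not Bodo's own example: the frames df1 and df2 and their columns are hypothetical, and the datetime64[ns] to datetime.date conversion is not shown because it depends on Bodo's internal types. Under Bodo, the same lines would be placed inside a function decorated with bodo.jit.

```python
import pandas as pd

# Hypothetical reference frame whose schema we want to reuse.
df2 = pd.DataFrame({"A": [1, 2, 3], "B": [0.5, 1.5, 2.5]})

# Hypothetical input frame with the same column names but different dtypes.
df1 = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4, 5, 6]})

# df2.dtypes is a Series mapping column name -> dtype,
# and DataFrame.astype accepts such a Series directly.
df1 = df1.astype(df2.dtypes)

print(df1.dtypes)
```

After the call, each column of df1 has the dtype of the same-named column in df2.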
- Pandas:
  - Support for passing multiple columns to values and index with DataFrame.pivot() and DataFrame.pivot_table()
  - Support for using pd.pivot() and pd.pivot_table(). Functionality is equivalent to DataFrame.pivot() and DataFrame.pivot_table()
  - Support for DataFrame.explode()
  - Support for DataFrame.where() and DataFrame.mask()
  - Support for Series.duplicated() and Index.duplicated().
  - Support for Series.rename_axis()
  - Support for using object in DataFrame.astype. Bodo doesn't have a generic "object" type, so the type of the column remains the same.
- Support for passing multiple columns to