Bodo 2021.8 Release (Date: 8/30/2021)¶
This release includes many new features, optimizations, bug fixes and usability improvements. Overall, 74 code patches were merged since the last release.
New Features and Improvements¶
-
Bodo is updated to use pandas 1.3 and Arrow 5.0 (latest)
-
Updated
bodo.jit
flag handling to remove the need for thedistributed
flag in most cases. Arguments and return types are now automatically inferred as distributed in many cases. Automatic inference can be disabled using the newreturns_maybe_distributed
andargs_maybe_distributed
flags if necessary. -
Connectors:
- Improved
pd.read_sql
performance on Snowflake by using the Snowflake connector APIs directly. - Improved performance of
pd.read_parquet
when reading large partitioned datasets
- Improved
-
Performance improvements:
- Reduced compilation time for some DataFrame operations
- General performance improvements in Bodo's execution engine resulting in better speed and memory efficiency for a wide range of operations
- Improved performance of
merge
andjoin
operations - Improved performance and scalability of
groupby
operations - Improved performance of
groupby.apply
- Improved performance of
groupby.transform
- Significantly optimized
Series.str.contains(..., regex=True)
- Improved performance of filtering operations involving string arrays
-
Pandas:
- Support for passing string function names to
Series.apply
. The string can refer to a Series method or a Numpy ufunc. - Support for passing string function names to
DataFrame.apply
. The string can refer to a DataFrame method.axis
can be provided if the method takes anaxis
argument. - Enhanced support for binary arrays, including within series/dataframes
astype()
support for casting strings to nullable integers- Support for
operator.mul
between aTimedelta
scalar and integers Series - Support for
std
ingroupby.transform
.
- Support for passing string function names to
-
Scikit-learn:
- Support for
sklearn.feature_extraction.text.CountVectorizer
- Support for
coef_
attribute forRidge
- Support for
BodoSQL 2021.8beta Release (Date: 8/30/2021)
This release adds more SQL coverage, introduces new BodoSQL specific features, and fixes bugs. Overall, 53 code patches were merged since the last release.
New Features and Improvements¶
-
Parameterized Queries:
Parameterized queries allow replacing scalars with Python variables during runtime execution. This enables caching more complex BodoSQL queries When paired with Bodo jit: a query parameter can change without the need to recompile the query. More information and example usage can be found in our documentation.
-
SQL Coverage:
This release added the following additional SQL coverage to BodoSQL. Please refer to our documentation for more details regarding usage.
- Support for != and <=operators
- Support for CAST
- Support for LEAST
- Support for [NOT] IN with lists of literals
- Support for the offset optional argument in queries with LIMIT (i.e. SELECT A from table LIMIT 1, 4)
- Initial support for YEAR/MONTH interval literals/scalars. Currently these are only supported with addition and subtraction operators and cannot be used as a column type.
- Support the following string functions:
- CHAR (to convert a value to a string)
- LENGTH
- Support for the following Timestamp functions:
- ADDDATE
- SUBDATE
- TIMESTAMPDIFF
- WEEKDAY
- YEARWEEK
- LAST_DAY