Bodo 2022.8 Release (Date: 08/31/2022)
New Features and Improvements
Compilation / Performance improvements:
- BodoSQL generated plans are now more optimized to reduce runtime, compile time, and memory usage.
- Performance improvements to pivot_table by reducing the amount of data being shuffled.
- BodoSQL CASE statements are now faster to compile.
I/O:
- Bodo now uses a new optimized connector to write to Snowflake efficiently in parallel (with standard DataFrame.to_sql() syntax).
- Support for reading string columns with dictionary encoding when fetching data from Snowflake.
- Bodo is now upgraded to use Arrow 8.
- Bodo can avoid loading any columns from Parquet files if only the length needs to be computed.
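As a rough illustration of the "standard DataFrame.to_sql() syntax" mentioned above, the sketch below writes a small DataFrame through the ordinary pandas call. It uses an in-memory SQLite database as a stand-in so that it runs anywhere; the table name, the helper `write_table`, and the sample data are hypothetical. With Bodo and Snowflake, the assumption is that the same call is made inside a Bodo-compiled function with a Snowflake connection string in place of the SQLite connection.

```python
import sqlite3

import pandas as pd


def write_table(df, conn, table_name="example_table"):
    """Write df with the standard pandas to_sql() call.

    `conn` is a SQLite connection here purely as a stand-in; with Bodo the
    assumption is that a Snowflake connection string is passed instead.
    """
    df.to_sql(table_name, conn, if_exists="replace", index=False)


# Hypothetical sample data.
df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})

conn = sqlite3.connect(":memory:")
write_table(df, conn)

# Read the row count back to confirm the write happened.
row_count = conn.execute("SELECT COUNT(*) FROM example_table").fetchone()[0]
```

The call site stays the same as in plain pandas, so existing code that already uses to_sql() should not need restructuring to benefit from the new connector.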
Iceberg:
- Support for limit pushdown with data read from Iceberg.
Pandas coverage:
- Added support for dictionary-encoded string arrays (that have reduced memory usage and execution time) with pandas.concat
- Support for groupby.sum() with boolean columns.
- Support for MultiIndex.nbytes
- Support for Series.str.index
- Support for Series.str.rindex
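The newly covered pandas APIs can be sketched as follows. This is plain pandas code with hypothetical sample data and a hypothetical function name; under Bodo the function would typically be decorated with `bodo.jit`, which is left out here so the snippet runs without Bodo installed.

```python
import pandas as pd


def pandas_coverage_demo():
    # groupby.sum() on a boolean column counts the True values per group.
    df = pd.DataFrame({"key": ["a", "a", "b"], "flag": [True, True, False]})
    true_counts = df.groupby("key")["flag"].sum()

    # Series.str.index / Series.str.rindex return the lowest / highest
    # position of a substring and raise ValueError if it is not found.
    s = pd.Series(["abcab", "xabab"])
    first_pos = s.str.index("ab")
    last_pos = s.str.rindex("ab")

    # MultiIndex.nbytes reports the number of bytes used by the index data.
    mi = pd.MultiIndex.from_arrays([[1, 2, 3], ["x", "y", "z"]])
    index_bytes = mi.nbytes

    # pandas.concat on string data; whether the arrays are dictionary-encoded
    # is an internal decision made by Bodo, not something shown here.
    combined = pd.concat([s, s], ignore_index=True)

    return true_counts, first_pos, last_pos, index_bytes, combined


true_counts, first_pos, last_pos, index_bytes, combined = pandas_coverage_demo()
```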
BodoSQL:
- Updated the default null ordering with ORDER BY (nulls first with ASC, nulls last with DESC).
- Updated aggregation without a GROUP BY to return a replicated result.
- Improved runtime performance when computing a SUM inside a window function.
- Added support for the following column functions: ACOSH, ASINH, ATANH, BITAND, BITOR, BITXOR, BITNOT, BITSHIFTLEFT, BITSHIFTRIGHT, BOOLAND, BOOLNOT, BOOLOR, BOOLXOR, CBRT, COSH, DATEADD, DECODE, DIV0, EDITDISTANCE, EQUAL_NULL, FACTORIAL, GETBIT, HAVERSINE, INITCAP, REGEXP, REGEXP_COUNT, REGEXP_INSTR, REGEXP_LIKE, REGEXP_REPLACE, REGEXP_SUBSTR, REGR_VALX, REGR_VALY, RLIKE, SINH, SPLIT_PART, SQUARE, STRTOK, TANH, TRANSLATE, WIDTH_BUCKET
- Added support for binary data with the following functions: LEFT, LEN, LENGTH, LPAD, REVERSE, RIGHT, RPAD, SUBSTR, SUBSTRING
- Added support for the following window/aggregation functions: ANY_VALUE, COUNT_IF, CONDITIONAL_CHANGE_EVENT, CONDITIONAL_TRUE_EVEN