Bodo 2022.7 Release (Date: 07/31/2022)
New Features and Improvements
Compilation / Performance improvements:
- Groupby operations are now faster to compile and support super-wide DataFrames.
- Groupby.apply() operations have improved compilation time, runtime memory usage and performance.
- Most BodoSQL select statements are now faster to compile.
- Cache is now automatically invalidated when upgrading Bodo.
Iceberg:
- Added support for writing Iceberg tables via to_sql
I/O:
- to_csv, to_json, and to_parquet now support a custom argument _bodo_file_prefix to specify the prefix of files written in distributed cases.
- Snowflake data load now supports filter pushdown with Series.str.startswith and Series.str.endswith.
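
To illustrate what filter pushdown saves, the sketch below uses an in-memory SQLite table as a stand-in for a remote Snowflake table. This is an assumption made only so the example is self-contained; it is not Bodo's connector, and the query shown is not necessarily the one Bodo generates. The table name, column name, and helper functions are hypothetical. Without pushdown every row is transferred and then filtered; with pushdown the prefix test runs inside the query, so only matching rows are returned.

```python
import sqlite3

# Hypothetical table standing in for a remote Snowflake table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?)",
    [("alice",), ("albert",), ("bob",), ("Alan",)],
)


def load_all_then_filter(prefix):
    """No pushdown: every row is fetched, then filtered locally."""
    rows = [r[0] for r in conn.execute("SELECT name FROM customers ORDER BY name")]
    return [name for name in rows if name.startswith(prefix)]


def load_with_pushdown(prefix):
    """Pushdown: the case-sensitive prefix test runs inside the database."""
    query = (
        "SELECT name FROM customers "
        "WHERE substr(name, 1, length(?)) = ? ORDER BY name"
    )
    return [r[0] for r in conn.execute(query, (prefix, prefix))]
```

Both helpers return the same rows; the difference is how much data crosses the connection before the filter is applied.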
Pandas coverage:
- read_csv and read_json now support argument sample_nrows to set the number of rows that are sampled to infer column dtypes (by default sample_nrows=100).
- Support for DataFrame.rank
- Support for Groupby.ngroup
- Added support for dictionary-encoded string arrays (which have reduced memory usage and execution time) in the following functions:
  - Groupby.min
  - Groupby.max
  - Groupby.first
  - Groupby.last
  - Groupby.shift
  - Groupby.head
  - Groupby.nunique
  - Groupby.sum
  - Groupby.cumsum
  - Groupby.transform
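
As a rough illustration of why dictionary encoding can reduce memory usage and execution time, the sketch below stores each distinct string once and represents the column as integer codes. It then computes a per-group minimum by comparing only the distinct strings seen in each group. This shows the general idea only, not Bodo's internal array layout; `dict_encode` and `groupby_min` are hypothetical helper names.

```python
def dict_encode(values):
    """Encode strings as (dictionary, codes); missing values keep a None code."""
    dictionary = []   # each distinct string stored once
    positions = {}    # string -> index into dictionary
    codes = []        # one small integer per row
    for value in values:
        if value is None:
            codes.append(None)
            continue
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        codes.append(positions[value])
    return dictionary, codes


def groupby_min(keys, dictionary, codes):
    """Per-key minimum string, comparing only the distinct codes in each group."""
    seen = {}
    for key, code in zip(keys, codes):
        if code is not None:
            seen.setdefault(key, set()).add(code)
    return {key: min(dictionary[c] for c in group) for key, group in seen.items()}


fruit = ["pear", "apple", "pear", "fig", "apple", "pear"]
keys = ["a", "a", "b", "b", "a", "b"]
dictionary, codes = dict_encode(fruit)
result = groupby_min(keys, dictionary, codes)
```

With many repeated strings, the dictionary stays small while the codes are fixed-width integers, which is where the savings come from.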
BodoSQL:
- Added support for the following query syntax:
  - QUALIFY
  - GROUP BY GROUPING SETS
  - GROUP BY CUBE
  - GROUP BY ROLLUP
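
In standard SQL, ROLLUP and CUBE are shorthand for lists of grouping sets. The sketch below expands both forms and aggregates once per grouping set, to show what rows such a query produces. It models the standard semantics in plain Python and is not BodoSQL's implementation; the helper names and the sample rows are hypothetical.

```python
from itertools import combinations


def rollup_sets(columns):
    """ROLLUP(a, b) expands to the grouping sets (a, b), (a), ()."""
    return [tuple(columns[:n]) for n in range(len(columns), -1, -1)]


def cube_sets(columns):
    """CUBE(a, b) expands to every subset of the columns, largest first."""
    return [
        subset
        for n in range(len(columns), -1, -1)
        for subset in combinations(columns, n)
    ]


def aggregate(rows, grouping_sets, value):
    """Sum `value` once per grouping set; returns (grouping set, key, total) tuples."""
    out = []
    for gset in grouping_sets:
        totals = {}
        for row in rows:
            key = tuple(row[c] for c in gset)
            totals[key] = totals.get(key, 0) + row[value]
        for key, total in sorted(totals.items()):
            out.append((gset, key, total))
    return out


rows = [
    {"region": "EU", "city": "Oslo", "sales": 1},
    {"region": "EU", "city": "Rome", "sales": 2},
    {"region": "US", "city": "NYC", "sales": 4},
]
summary = aggregate(rows, rollup_sets(["region", "city"]), "sales")
```

The empty grouping set `()` corresponds to the grand-total row; in SQL output the columns left out of a grouping set appear as NULL.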
- Added support for the following functions:
  - IFF
  - NULLIFZERO
  - NVL2
  - ZEROIFNULL
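
For readers unfamiliar with these functions, the sketch below models their conventional semantics (as documented for Snowflake-style SQL) in Python, with `None` standing in for NULL. It assumes BodoSQL follows the same conventions; the Python function names are illustrative only.

```python
def iff(condition, true_value, false_value):
    """IFF(cond, a, b): a when cond is true, otherwise b (a NULL cond is not true)."""
    return true_value if condition else false_value


def nullifzero(x):
    """NULLIFZERO(x): NULL when x is 0, otherwise x (NULL stays NULL)."""
    return None if x == 0 else x


def nvl2(expr, if_not_null, if_null):
    """NVL2(e1, e2, e3): e2 when e1 is not NULL, otherwise e3."""
    return if_not_null if expr is not None else if_null


def zeroifnull(x):
    """ZEROIFNULL(x): 0 when x is NULL, otherwise x."""
    return 0 if x is None else x
```

NULLIFZERO and ZEROIFNULL are inverses on the values 0 and NULL, which makes them handy for guarding divisions and for filling missing totals respectively.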
- Added support for the following windowed aggregation functions:
  - RANK
  - DENSE_RANK
  - PERCENT_RANK
  - CUME_DIST
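
The four ranking functions differ mainly in how they treat ties. The sketch below computes all of them for a single window partition ordered ascending, following the standard SQL definitions. It is a plain-Python model for illustration, not BodoSQL's implementation, and `ranking_functions` is a hypothetical helper.

```python
from bisect import bisect_left, bisect_right


def ranking_functions(values):
    """Return (value, rank, dense_rank, percent_rank, cume_dist) per row."""
    ordered = sorted(values)
    distinct = sorted(set(ordered))
    n = len(ordered)
    rows = []
    for value in ordered:
        # RANK: 1 + number of rows strictly before this value (gaps after ties).
        rank = bisect_left(ordered, value) + 1
        # DENSE_RANK: 1 + number of distinct values before this one (no gaps).
        dense_rank = bisect_left(distinct, value) + 1
        # PERCENT_RANK: (rank - 1) / (rows - 1), defined as 0 for a single row.
        percent_rank = (rank - 1) / (n - 1) if n > 1 else 0.0
        # CUME_DIST: fraction of rows with a value <= the current value.
        cume_dist = bisect_right(ordered, value) / n
        rows.append((value, rank, dense_rank, percent_rank, cume_dist))
    return rows


ranked = ranking_functions([20, 10, 50, 20, 30])
```

For the input above, the two rows with value 20 share rank 2, the next row gets rank 4 under RANK but 3 under DENSE_RANK.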
- The following functions are much faster to compile:
  - ADDDATE / DATE_ADD / SUBDATE / DATE_SUB if the second argument is an integer column
  - ASCII
  - CHAR
  - COALESCE
  - CONV
  - DAYNAME
  - FORMAT
  - FROM_DAYS
  - FROM_UNIXTIME
  - IF
  - IFNULL
  - INSTR
  - LAST_DAY
  - LEFT
  - LOG
  - LPAD
  - MAKEDATE
  - MONTHNAME
  - NULLIF
  - NVL
  - ORD
  - REPEAT
  - REPLACE
  - REVERSE
  - RIGHT
  - RPAD
  - SPACE
  - STRCMP
  - SUBSTRING
  - SUBSTRING_INDEX
  - TIMESTAMPDIFF (if the unit is Month, Quarter, or Year)
  - Unary -
  - WEEKDAY
  - YEAROFWEEKISO
- Support for binary data in complex join operations.
- Support for UTF-8 string literals in queries (previously just ASCII).