Bodo 2022.7 Release (Date: 07/31/2022)
New Features and Improvements
Compilation / Performance improvements:
- Groupby operations are now faster to compile and support very wide DataFrames.
- Groupby.apply() operations have improved compilation time, runtime memory usage, and performance.
- Most BodoSQL select statements are now faster to compile.
- The cache is now automatically invalidated when Bodo is upgraded.
Iceberg:
- Added support for writing Iceberg tables via to_sql.
I/O:
- to_csv, to_json, and to_parquet now support a custom argument, _bodo_file_prefix, that sets the prefix of the files written when the data is distributed.
- Snowflake data loading now supports filter pushdown with Series.str.startswith and Series.str.endswith.
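Filter pushdown means the predicate is evaluated by Snowflake as part of the query, so only matching rows are transferred instead of the whole table being loaded and then filtered. The sketch below illustrates the idea with a hypothetical helper, pushdown_query, that turns a prefix or suffix test into a WHERE clause using Snowflake's STARTSWITH and ENDSWITH functions. It is an illustration under that assumption, not the query Bodo actually generates.

```python
# Hypothetical sketch: express a Series.str.startswith / Series.str.endswith
# filter as a Snowflake WHERE clause so the filtering runs inside Snowflake.

def _sql_string_literal(value: str) -> str:
    # Snowflake treats backslash as an escape character in single-quoted
    # strings, so escape backslashes first, then double any single quotes.
    escaped = value.replace("\\", "\\\\").replace("'", "''")
    return f"'{escaped}'"


def pushdown_query(table: str, column: str, method: str, pattern: str) -> str:
    """Build a query that filters rows in Snowflake rather than after loading.

    method mirrors the pandas method name: "startswith" or "endswith".
    Table and column names are assumed to be trusted identifiers.
    """
    functions = {"startswith": "STARTSWITH", "endswith": "ENDSWITH"}
    if method not in functions:
        raise ValueError(f"unsupported filter method: {method}")
    predicate = f"{functions[method]}({column}, {_sql_string_literal(pattern)})"
    return f"SELECT * FROM {table} WHERE {predicate}"


# Without pushdown, every row is loaded and then filtered with
# df[df["name"].str.startswith("A")]. With pushdown, only matching rows
# leave Snowflake:
print(pushdown_query("customers", "name", "startswith", "A"))
# SELECT * FROM customers WHERE STARTSWITH(name, 'A')
```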
Pandas coverage:
- read_csv and read_json now support the argument sample_nrows, which sets the number of rows sampled to infer column dtypes (default: sample_nrows=100).
- Support for DataFrame.rank.
- Support for Groupby.ngroup.
- Added support for dictionary-encoded string arrays (which reduce memory usage and execution time) in the following functions:
  - Groupby.min
  - Groupby.max
  - Groupby.first
  - Groupby.last
  - Groupby.shift
  - Groupby.head
  - Groupby.nunique
  - Groupby.sum
  - Groupby.cumsum
  - Groupby.transform
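Inferring dtypes from a sample trades inference cost for accuracy: if the values that would widen a column's type appear only after the sampled rows, inference settles on the narrower type. The plain-Python sketch below shows sample-based inference with a hypothetical helper, infer_column_types, and simplified int/float/str rules; it is not Bodo's actual inference logic.

```python
import csv
import io


def _value_type(text: str) -> type:
    # Classify a single CSV field as int, float, or str.
    for candidate in (int, float):
        try:
            candidate(text)
            return candidate
        except ValueError:
            pass
    return str


def infer_column_types(csv_text: str, sample_nrows: int = 100) -> dict:
    """Infer one type per column from only the first sample_nrows data rows."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    widening = {int: 0, float: 1, str: 2}  # int < float < str
    inferred = {name: int for name in header}
    for i, row in enumerate(reader):
        if i >= sample_nrows:
            break  # rows beyond the sample are never inspected
        for name, field in zip(header, row):
            t = _value_type(field)
            if widening[t] > widening[inferred[name]]:
                inferred[name] = t
    return inferred


# Column "x" holds integers in the first 150 rows and a float after them.
data = "x\n" + "\n".join(["1"] * 150 + ["2.5"]) + "\n"
print(infer_column_types(data))                    # {'x': <class 'int'>}
print(infer_column_types(data, sample_nrows=200))  # {'x': <class 'float'>}
```

Raising sample_nrows makes inference see the late float value, at the cost of reading more rows during inference.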
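Groupby.ngroup numbers each group from 0 to the number of groups minus 1 and returns, for every row, the number of the group it belongs to. A minimal plain-Python model of those pandas semantics follows; the ngroup function here is a hypothetical stand-in that handles only non-missing keys (pandas labels rows with missing keys as NaN).

```python
def ngroup(keys, sort=True):
    """Return the group number of each row, modeled on pandas GroupBy.ngroup.

    With sort=True (the groupby default) groups are numbered in sorted key
    order; with sort=False they are numbered in order of first appearance.
    """
    if sort:
        unique = sorted(set(keys))
    else:
        unique = list(dict.fromkeys(keys))  # preserves first-seen order
    numbers = {key: i for i, key in enumerate(unique)}
    return [numbers[key] for key in keys]


print(ngroup(["b", "a", "b", "c", "a"]))              # [1, 0, 1, 2, 0]
print(ngroup(["b", "a", "b", "c", "a"], sort=False))  # [0, 1, 0, 2, 1]
```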
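Dictionary encoding stores each distinct string once and represents the column as integer codes into that dictionary, which saves memory and work on columns with many repeated values. The sketch below models the idea in plain Python with hypothetical helpers (dict_encode, groupby_nunique, groupby_min); it is a simplification, not Bodo's internal array layout.

```python
from collections import defaultdict


def dict_encode(values):
    """Dictionary-encode a list of strings into (dictionary, codes)."""
    dictionary, index, codes = [], {}, []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes


def groupby_nunique(keys, codes):
    # Distinct counting works on the integer codes, never on the strings.
    seen = defaultdict(set)
    for k, c in zip(keys, codes):
        seen[k].add(c)
    return {k: len(s) for k, s in seen.items()}


def groupby_min(keys, dictionary, codes):
    # Compare strings through the dictionary, but keep only codes per group.
    best = {}
    for k, c in zip(keys, codes):
        if k not in best or dictionary[c] < dictionary[best[k]]:
            best[k] = c
    return {k: dictionary[c] for k, c in best.items()}


cities = ["Boston", "Paris", "Boston", "Paris", "Boston", "Tokyo"]
keys = [1, 1, 1, 2, 2, 2]
dictionary, codes = dict_encode(cities)
print(dictionary)                                # ['Boston', 'Paris', 'Tokyo']
print(codes)                                     # [0, 1, 0, 1, 0, 2]
print(groupby_nunique(keys, codes))              # {1: 2, 2: 3}
print(groupby_min(keys, dictionary, codes))      # {1: 'Boston', 2: 'Boston'}
```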
BodoSQL:
- Added support for the following query syntax:
  - QUALIFY
  - GROUP BY GROUPING SETS
  - GROUP BY CUBE
  - GROUP BY ROLLUP
- Added support for the following functions:
  - IFF
  - NULLIFZERO
  - NVL2
  - ZEROIFNULL
- Added support for the following windowed aggregation functions:
  - RANK
  - DENSE_RANK
  - PERCENT_RANK
  - CUME_DIST
- The following functions are much faster to compile:
  - ADDDATE/DATE_ADD/SUBDATE/DATE_SUB (if the second argument is an integer column)
  - ASCII
  - CHAR
  - COALESCE
  - CONV
  - DAYNAME
  - FORMAT
  - FROM_DAYS
  - FROM_UNIXTIME
  - IF
  - IFNULL
  - INSTR
  - LAST_DAY
  - LEFT
  - LOG
  - LPAD
  - MAKEDATE
  - MONTHNAME
  - NULLIF
  - NVL
  - ORD
  - REPEAT
  - REPLACE
  - REVERSE
  - RIGHT
  - RPAD
  - SPACE
  - STRCMP
  - SUBSTRING
  - SUBSTRING_INDEX
  - TIMESTAMPDIFF (if the unit is Month, Quarter, or Year)
  - Unary minus (-)
  - WEEKDAY
  - YEAROFWEEKISO
- Support for binary data in complex join operations.
- Support for UTF-8 string literals in queries (previously only ASCII).
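IFF, NULLIFZERO, NVL2, and ZEROIFNULL are null-handling shortcuts whose names match Snowflake SQL functions. Assuming BodoSQL follows the same semantics, they can be modeled in plain Python with None standing in for NULL. The functions below are an illustrative model, not how BodoSQL evaluates them.

```python
from typing import Any, Optional

# None stands in for SQL NULL throughout.


def iff(condition: Optional[bool], expr1: Any, expr2: Any) -> Any:
    # Only a TRUE condition selects expr1; FALSE and NULL (None is falsy)
    # both select expr2.
    return expr1 if condition else expr2


def nullifzero(x):
    # Turn 0 into NULL; every other value (including NULL) passes through.
    return None if x == 0 else x


def nvl2(expr1, expr2, expr3):
    # expr2 when expr1 is not NULL, otherwise expr3.
    return expr2 if expr1 is not None else expr3


def zeroifnull(x):
    # Turn NULL into 0; every other value passes through.
    return 0 if x is None else x


rows = [5, 0, None]
print([nullifzero(x) for x in rows])                    # [5, None, None]
print([zeroifnull(x) for x in rows])                    # [5, 0, 0]
print([nvl2(x, "has value", "is null") for x in rows])  # ['has value', 'has value', 'is null']
# In SQL, NULL > 0 is NULL, so model the comparison the same way:
print([iff(None if x is None else x > 0, "positive", "other") for x in rows])
# ['positive', 'other', 'other']
```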
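The four new window functions follow standard SQL definitions over an ordered partition: RANK gives tied rows the same rank and leaves gaps, DENSE_RANK leaves no gaps, PERCENT_RANK is (RANK - 1) / (rows - 1), and CUME_DIST is the fraction of rows ordered at or before the current row. The hypothetical helper ranking_functions below computes all four for a single partition ordered ascending, as a plain-Python illustration of those definitions.

```python
from bisect import bisect_left, bisect_right


def ranking_functions(values):
    """Return (RANK, DENSE_RANK, PERCENT_RANK, CUME_DIST) for each input row,
    treating the values as one partition ordered ascending."""
    ordered = sorted(values)
    distinct = sorted(set(values))
    n = len(values)
    result = []
    for v in values:
        rank = bisect_left(ordered, v) + 1          # rows strictly before, plus one
        dense_rank = bisect_left(distinct, v) + 1   # distinct values before, plus one
        percent_rank = (rank - 1) / (n - 1) if n > 1 else 0.0
        cume_dist = bisect_right(ordered, v) / n    # rows with value <= v, over n
        result.append((rank, dense_rank, percent_rank, cume_dist))
    return result


for row in ranking_functions([10, 20, 20, 30]):
    print(row)
# (1, 1, 0.0, 0.25)
# (2, 2, 0.3333333333333333, 0.75)
# (2, 2, 0.3333333333333333, 0.75)
# (4, 3, 1.0, 1.0)
```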
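GROUP BY GROUPING SETS computes one aggregation per listed set of grouping columns and stacks the results, returning NULL for columns outside a given set. GROUP BY CUBE and the standard SQL ROLLUP are shorthands for particular lists of grouping sets. The sketch below uses hypothetical helpers (rollup, cube, grouping_sets_sum) with None for NULL to expand those shorthands and compute a SUM per set; it does not model the standard GROUPING() function that distinguishes subtotal rows from genuine NULL keys, and the row order is only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def rollup(cols):
    # ROLLUP(a, b) -> (a, b), (a), ()
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]


def cube(cols):
    # CUBE(a, b) -> every subset: (a, b), (a), (b), ()
    return [c for r in range(len(cols), -1, -1) for c in combinations(cols, r)]


def grouping_sets_sum(rows, key_cols, value_col, sets):
    """SUM(value_col) per grouping set; key columns outside a set become None."""
    out = []
    for gset in sets:
        totals = defaultdict(int)
        for row in rows:
            key = tuple(row[c] if c in gset else None for c in key_cols)
            totals[key] += row[value_col]
        out.extend(key + (total,) for key, total in totals.items())
    return out


sales = [
    {"region": "East", "product": "A", "amount": 10},
    {"region": "East", "product": "B", "amount": 5},
    {"region": "West", "product": "A", "amount": 7},
]
print(rollup(["region", "product"]))  # [('region', 'product'), ('region',), ()]
print(cube(["region", "product"]))    # [('region', 'product'), ('region',), ('product',), ()]
for r in grouping_sets_sum(sales, ["region", "product"], "amount", rollup(["region", "product"])):
    print(r)
# ('East', 'A', 10)
# ('East', 'B', 5)
# ('West', 'A', 7)
# ('East', None, 15)
# ('West', None, 7)
# (None, None, 22)
```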
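QUALIFY filters rows on the result of a window function, much as HAVING filters on the result of GROUP BY; without it, the window function has to be computed in a subquery and filtered in the outer query. The sketch below shows a QUALIFY query (in a comment) next to its subquery rewrite, and runs the rewrite with Python's built-in sqlite3 module only because it ships with Python (window functions need SQLite 3.25 or later). The orders table and its data are hypothetical.

```python
import sqlite3

# QUALIFY form (shown for reference, not executed here):
#   SELECT customer, order_date, amount
#   FROM orders
#   QUALIFY ROW_NUMBER() OVER (PARTITION BY customer ORDER BY order_date DESC) = 1

# Equivalent rewrite: compute the window function in a subquery, then
# filter on it in the outer query.
EQUIVALENT_QUERY = """
SELECT customer, order_date, amount
FROM (
    SELECT customer, order_date, amount,
           ROW_NUMBER() OVER (PARTITION BY customer ORDER BY order_date DESC) AS rn
    FROM orders
)
WHERE rn = 1
ORDER BY customer
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, order_date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("alice", "2022-07-01", 10), ("alice", "2022-07-15", 25), ("bob", "2022-07-03", 7)],
)
# Keep only each customer's most recent order.
latest = conn.execute(EQUIVALENT_QUERY).fetchall()
print(latest)  # [('alice', '2022-07-15', 25), ('bob', '2022-07-03', 7)]
conn.close()
```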