2022)¶

New Features and Improvements¶

Compilation / Performance improvements:

BodoSQL generated plans are now more optimized to reduce runtime, compile time, and memory usage.
Performance improvements to pivot_table by reducing the amount of data being shuffled.
BodoSQL CASE statements are now faster to compile.

I/O:

Bodo now uses a new optimized connector to write to Snowflake efficiently in parallel (with standard DataFrame.to_sql() syntax).
Support for reading strings columns with dictionary encoding when fetching data from Snowflake.
Bodo is now upgraded to use Arrow 8.
Bodo can avoid loading any columns with parquet if only the length needs to be computed.

Iceberg:

Pandas coverage:

Added support for dictionary-encoded string arrays (that have reduced memory usage and execution time) with pandas.concat
Support for groupby.sum() with boolean columns.
Support for MultiIndex.nbytes
Support for Series.str.index
Support for Series.str.rindex

BodoSQL:

Update the default null ordering with ORDER BY (nulls first with ASC, nulls last with DESC).
Updates aggregation without a GROUP BY to return a replicated result.
Improved runtime performance when computing a SUM inside a window function.
Added support for the following column functions
- ACOSH
- ASINH
- ATANH
- BITAND
- BITOR
- BITXOR
- BITNOT
- BITSHIFTLEFT
- BITSHIFTRIGHT
- BOOLAND
- BOOLNOT
- BOOLOR
- BOOLXOR
- CBRT
- COSH
- DATEADD
- DECODE
- DIV0
- EDITDISTANCE
- EQUAL_NULL
- FACTORIAL
- GETBIT
- HAVERSINE
- INITCAP
- REGEXP
- REGEXP_COUNT
- REGEXP_INSTR
- REGEXP_LIKE
- REGEXP_REPLACE
- REGEXP_SUBSTR
- REGR_VALX
- REGR_VALY
- RLIKE
- SINH
- SPLIT_PART
- SQUARE
- STRTOK
- TANH
- TRANSLATE
- WIDTH_BUCKET
Added support for binary data with the following functions:
- LEFT
- LEN
- LENGTH
- LPAD
- REVERSE
- RIGHT
- RPAD
- SUBSTR
- SUBSTRING
Added support for the following window/aggregation functions
- ANY_VALUE
- COUNT_IF
- CONDITIONAL_CHANGE_EVENT
- CONDITIONAL_TRUE_EVEN