Skip to content

Bodo 2022.7 Release (Date: 07/31/2022)

New Features and Improvements

Compilation / Performance improvements:

  • Groupby operations are now faster to compile and support super-wide DataFrames
  • Groupby.apply() operations have improved compilation time, runtime memory usage and performance.
  • Most BodoSQL select statements are now faster to compile.
  • Cache is now automatically invalidated when upgrading Bodo.

Iceberg:

  • Added support for writing Iceberg tables via to_sql

I/O:

  • to_csv, to_json, and to_parquet now support a custom argument _bodo_file_prefix to specify the prefix of files written in distributed cases.
  • Snowflake data load now supports filter pushdown with Series.str.startswith and Series.str.endswith.

Pandas coverage:

  • read_csv and read_json now support argument sample_nrows to set the number of rows that are sampled to infer column dtypes (by default sample_nrows=100).
  • Support for DataFrame.rank
  • Support for Groupby.ngroup
  • Added support for dictionary-encoded string arrays (that have reduced memory usage and execution time) in the following functions:
    • Groupby.min
    • Groupby.max
    • Groupby.first
    • Groupby.last
    • Groupby.shift
    • Groupby.head
    • Groupby.nunique
    • Groupby.sum
    • Groupby.cumsum
    • Groupby.transform

BodoSQL:

  • Added support for the following query syntax

    • QUALIFY
    • GROUP BY GROUPING SETS
    • GROUP BY CUBE
    • GROUP BY ROLLING
  • Added support for the following functions:

    • IFF
    • NULLIFZERO
    • NVL2
    • ZEROIFNULL
  • Added support for the following windowed aggregation functions:

    • RANK
    • DENSE_RANK
    • PERCENT_RANK
    • CUME_DIST
  • The following functions are much faster to compile:

    • ADDDATE/DATE_ADD/SUBDATE/DATE_SUB if the second argument is an integer column
    • ASCII
    • CHAR
    • COALESCE
    • CONV
    • DAYNAME
    • FORMAT
    • FROM_DAYS
    • FROM_UNIXTIME
    • IF
    • IFNULL
    • INSTR
    • LAST_DAY
    • LEFT
    • LOG
    • LPAD
    • MAKEDATE
    • MONTHNAME
    • NULLIF
    • NVL
    • ORD
    • REPEAT
    • REPLACE
    • REVERSE
    • RIGHT
    • RPAD
    • SPACE
    • STRCMP
    • SUBSTRING
    • SUBSTRING_INDEX
    • TIMESTAMPDIFF (if the unit is Month, Quarter, or Year)
    • Unary -
    • WEEKDAY
    • YEAROFWEEKISO
  • Support for binary data in complex join operations

  • Support for UTF-8 string literals in queries (previously just ASCII).