Bodo 2021.1 Release (Date: 1/26/2021)

This release includes many new features, bug fixes and performance improvements. Overall, 61 code patches were merged since the last release.

New Features and Improvements

  • Connectors:

    • Support filter pushdown when reading partitioned parquet datasets. At compile time, Bodo detects whether filters are applied to a dataframe after read_parquet and generates code that applies those filters at read time, so that only the required parquet files are read.

    • Support for Series.to_csv()

    • Support passing the file and dtype arguments of np.fromfile as keyword arguments

  • Support for f-strings in Bodo jitted functions

  • Support passing Bodo distributed JIT functions to other Bodo JIT functions

  • Pandas coverage:

    • Support groupby with pd.NamedAgg()

    • Support for groupby.size

    • Support for groupby.shift

    • Match input row order of pandas in groupby.apply when applicable

    • Support min_periods in rolling calls

    • Support passing a dictionary of data types to df.astype()

    • Support dataframe setitem of multiple columns. For example: df[["A", "B"]] = 1.3

    • Support for Index.get_loc()

    • Support ddof argument (delta degrees of freedom) of Series.cov

    • Support Series.is_monotonic property

    • Initial support for dictionaries in Series.replace

    • Support Series.reset_index(drop=True)

    • Support the level argument of reset_index() when all index levels are specified

    • Several documentation improvements

  • Scikit-learn:

    • Support for sklearn.model_selection.train_test_split inside jit functions

    • Support for sklearn.preprocessing.MinMaxScaler inside jit functions
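
Example

The sketch below exercises several of the pandas calls listed under "Pandas coverage" (pd.NamedAgg, groupby.size, groupby.shift, min_periods in rolling, a dictionary passed to astype, setitem of multiple columns, Index.get_loc, and Series.reset_index(drop=True)). It is plain pandas so that it runs without Bodo installed. The column names, the sample data, and the summarize helper are hypothetical, and it is an assumption (not stated in these notes) that decorating summarize with bodo.jit would compile it unchanged.

```python
import pandas as pd

# Hypothetical sample data; column names are illustrative only.
df = pd.DataFrame(
    {
        "key": ["a", "b", "a", "b"],
        "value": [1, 2, 3, 4],
        "A": [0.0, 0.0, 0.0, 0.0],
        "B": [0.0, 0.0, 0.0, 0.0],
    }
)

# Pass a dictionary of data types to astype().
df = df.astype({"value": "float64"})

# Setitem of multiple columns with a scalar, as in the release note.
df[["A", "B"]] = 1.3


def summarize(df):
    # Assumption: under Bodo this function would carry a bodo.jit decorator.
    grouped = df.groupby("key")

    # Named aggregation with pd.NamedAgg.
    totals = grouped.agg(
        total=pd.NamedAgg(column="value", aggfunc="sum"),
        largest=pd.NamedAgg(column="value", aggfunc="max"),
    )

    # Number of rows per group.
    sizes = grouped.size()

    # Previous value within each group (NaN for the first row of a group).
    previous = grouped["value"].shift(1)

    # Rolling mean that emits a result as soon as one observation is available.
    rolling_mean = df["value"].rolling(3, min_periods=1).mean()

    return totals, sizes, previous, rolling_mean


totals, sizes, previous, rolling_mean = summarize(df)

# Integer position of a label in an index.
position = totals.index.get_loc("b")

# Drop the old index after filtering instead of keeping it as a column.
clean = df["value"][df["value"] > 1].reset_index(drop=True)
```

With min_periods=1 the rolling mean has no leading NaN values, which is the main practical difference from the default window behavior.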