Bodo 2020.07 Release (Date: 07/16/2020)¶
This release includes many new features and bug fixes. Overall, 59 code patches were merged since the last release, including the major addition of support for columns of array and struct values with arbitrary nesting.
New Features and Improvements¶
-
Bodo is updated to use the latest version of Numba (Numba 0.50.1)
-
Series and dataframe columns can have values that are arrays. For example:
A B 0 0 [1, 2, 3] 1 1 [4, 5] 2 2 [6] 3 3 [7, 8, 9]
-
Series and dataframe columns can have values that are structs. For example:
A B 0 0 {'A': 1, 'B': 2.1} 1 1 {'A': 3, 'B': 4.3} 2 2 {'A': 5, 'B': 6.5} 3 3 {'A': 7, 'B': 8.4}
-
Array and Struct values can contain other arrays/structs with arbitrary nesting. For example:
A B 0 0 {'A': [1, 2], 'B': [3]} 1 1 {'A': [4, 5, 6], 'B': [7]} 2 2 {'A': [8, 9, 10, 11], 'B': [12, 13]} 3 3 {'A': [14], 'B': [15]}
-
df.drop_duplicates()
anddf.merge()
is supported for nested array/struct columns. -
Added support for categorical array data type without explicit categories. Added support for
Series.astype(cat_dtype)
andSeries.astype("category")
. -
Generalized
df.dropna()
to all arrays, and added support for 'how' and 'subset' options. -
Support for
Series.explode()
-
series.median()
: support 'skipna' option and Decimal type, and bug fixes. -
Added Series.radd/rsub/rmul/rdiv/rpow/rmod
-
Support for Series.dot/kurt/kurtosis/skew/sem
-
Added Series.mad (mean absolute deviation)
-
Series.var()
andSeries.std()
: Added support for 'skipna' and 'ddof' options -
Support Series.equals
-
Series product/sum: added support for 'min_count' and 'skipna' options
-
Support Index.map() for all Index type
-
Support all Bodo array types as output of Series.map/apply, df.apply
-
Support df.values with nullable int columns
-
Bodo release builds now enforce licensing (expiration date and maximum core count) via license keys provided as a file or an environment variable called "BODO_LICENSE".