Bodo 2021.12 Release (Date: 12/29/2021)
This release includes many new features and usability improvements. Overall, 67 code patches were merged since the last release.
New Features and Improvements
- Significant upgrades to the Bodo documentation to improve the developer experience
- Improvements to documentation and unsupported attribute handling for Pandas APIs
- Significant enhancements to `objmode` user experience and robustness, such as automatic output data type checking and automatic conversion if possible
- Improved support for the `re` package, such as support for `re` flags, better support for returning `None` when necessary, and better catching of unsupported corner cases
- Support for caching functions that take a string as input and build a file path from it by concatenation
- Connectors:
  - Improved `read_parquet` runtime performance when reading from S3
  - Decreased compilation time for `read_csv` on DataFrames with a large number of columns (100)
- Improved compilation time for DataFrames with a large number of columns (>10,000)
- Improved NA handling in user-defined functions with `df.apply` when functions are not inlined
- Support for using `logging.RootLogger.info` when passing the logger as an argument to a JIT function
- Support for `datetime.datetime.today`
- Simpler `bodo.scatterv` usage from regular Python. Data on other ranks is ignored, and those ranks are not required to have `None` as their data
- Improved support for map arrays in various operations
- Support for `feature_importances_` of XGBoost
- Support for `predict_proba` and `predict_log_proba` in Scikit-learn classifier algorithms
- Pandas:
  - Support for the Bodo-specific argument `_bodo_upcast_to_float64` in `pd.read_csv`. This can be used when all data is numeric but schema inference cannot accurately predict data types
  - Support for using `DataFrame.to_parquet` with "wide" DataFrames that have a large number of columns
  - Support for storing a `DatetimeIndex` with `DataFrame.to_parquet`
  - Support for the `method` argument in `DataFrame.fillna` and `Series.fillna`
  - Support for `Series.bfill`, `Series.ffill`, `Series.pad`, and `Series.backfill`
  - Support for `Series.keys`
  - Support for `Series.infer_objects` and `DataFrame.infer_objects`
  - Decreased runtime when calling `.astype("category")` on Series with large numbers of categories
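
As a rough illustration of the `re` item in the list above, the following plain-Python sketch shows the two behaviors it names: passing a flag and returning `None` when nothing matches. It uses only the standard library and is not decorated with `bodo.jit`. The function name `find_code`, the pattern, and the sample strings are hypothetical, and these notes do not confirm which specific patterns or flags compile under Bodo.

```python
import re


def find_code(text):
    """Return the digits following an "id-" prefix, or None if absent.

    Hypothetical helper for illustration only; not part of Bodo.
    """
    # re.IGNORECASE makes "id-" also match "ID-", "Id-", and so on.
    match = re.search(r"id-(\d+)", text, flags=re.IGNORECASE)
    # re.search returns None when there is no match, so this function
    # returns either a string or None.
    if match is None:
        return None
    return match.group(1)


found = find_code("Order ID-42 shipped")
missing = find_code("no identifier here")
```

Here `found` holds the captured digits as a string and `missing` is `None`. A function with this kind of optional return is presumably what the "returning `None` when necessary" improvement is about.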