2020)¶

New Features and Improvements¶

Bodo is updated to use the latest versions of Numba and Apache Arrow packages:
- numba 0.49.0
- Apache Arrow 0.17.0
Various improvements to clarity and conciseness of error messages
Initial support for pandas.DataFrame.to_sql()
pandas.read_sql() support sql and con passed to Bodo-decorated functions
Added support for pandas.read_json() and pandas.DataFrame.to_json() from & to POSIX, S3, and Hadoop File Systems.
Initial support for pandas.read_excel()
numpy.fromfile() and numpy.tofile() from and to S3, and Hadoop File Systems.
Reduction in number of requests in I/O read calls
Initial support for array of lists of fixed sized values
List of strings data type support for pandas.DataFrame.join(), pandas.DataFrame.drop_duplicates(), and pandas.DataFrame.groupby()
pandas.Timestamp subtraction, min and max
Improved support for null values in datetime and timedelta operations
Support copy() function for Series of decimal.Decimal and datetime.date data types and most Index types
Improved support for Series decimal.Decimal dtype
String Series and Dataframe Column are now mutable and support inplace fillna()
pandas.Series.round()
pandas.Dataframe.assign()
Support groupby(...).first() operation
pandas.Dataframe.iloc support for extracting a subset of columns
numpy.array.sum(axis=0)
numpy.reshape() multi-dimensional distributed arrays
Initial implementation of experimental legacy mode
Proper error when using unsupported pandas.(...) & pandas.Series.(...) functions
Improved robustness of pandas.DataFrame inplace operations
Memory usage improvements
Type safety improvements
Compilation time improvements

Bug Fixes¶

Fixed an issue in pandas.read_csv() reading a large CSV file in specific distributed cases
numpy.dot() with empty vector/matrix input