Bodo 2020.05 Release (Date: 05/06/2020)
New Features and Improvements
- Bodo is updated to use the latest versions of Numba and Apache Arrow packages:
  - numba 0.49.0
  - Apache Arrow 0.17.0
- Various improvements to clarity and conciseness of error messages
- Initial support for pandas.DataFrame.to_sql()
- pandas.read_sql() supports the sql and con arguments being passed to Bodo-decorated functions
- Added support for pandas.read_json() and pandas.DataFrame.to_json() from and to POSIX, S3, and Hadoop File Systems
- Initial support for pandas.read_excel()
- numpy.fromfile() and numpy.tofile() from and to S3 and Hadoop File Systems
- Reduction in number of requests in I/O read calls
- Initial support for arrays of lists of fixed-size values
- List of strings data type support for pandas.DataFrame.join(), pandas.DataFrame.drop_duplicates(), and pandas.DataFrame.groupby()
- pandas.Timestamp subtraction, min and max
- Improved support for null values in datetime and timedelta operations
- Support copy() function for Series of decimal.Decimal and datetime.date data types and most Index types
- Improved support for Series decimal.Decimal dtype
- String Series and DataFrame columns are now mutable and support inplace fillna()
- pandas.Series.round()
- pandas.DataFrame.assign()
- Support groupby(...).first() operation
- pandas.DataFrame.iloc support for extracting a subset of columns
- numpy.array.sum(axis=0)
- numpy.reshape() for multi-dimensional distributed arrays
- Initial implementation of experimental legacy mode
- Proper error when using unsupported pandas.(...) and pandas.Series.(...) functions
- Improved robustness of pandas.DataFrame inplace operations
- Memory usage improvements
- Type safety improvements
- Compilation time improvements
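As a rough illustration of several pandas operations named in the list above (inplace fillna() on a string Series, Series.round(), DataFrame.assign(), groupby(...).first(), and iloc column selection), the sketch below uses plain pandas so that it runs without Bodo installed. The column names and data are hypothetical. It is an assumption, not something these notes state, that the same function body would compile unchanged when decorated with bodo.jit.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "store": ["a", "a", "b"],
    "price": [1.26, 2.04, 3.38],
    "qty": [1, 2, 3],
})


def summarize(df):
    # Under Bodo this function would presumably carry the @bodo.jit
    # decorator (assumption); here it runs as ordinary pandas code.
    out = df.assign(price_rounded=df["price"].round(1))  # assign() + Series.round()
    first = out.groupby("store").first()                 # first row per group
    subset = out.iloc[:, [0, 2]]                         # subset of columns by position
    return out, first, subset


out, first, subset = summarize(df)

# Inplace fillna() on a standalone string Series.
labels = pd.Series(["x", None, "y"])
labels.fillna("unknown", inplace=True)
```

Keeping the logic in a single function mirrors how Bodo-decorated code is typically organized, so the same body could be tried under the decorator once Bodo is available.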
Bug Fixes
- Fixed an issue in pandas.read_csv() reading a large CSV file in specific distributed cases
- numpy.dot() with empty vector/matrix input
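The notes do not say how numpy.dot() misbehaved on empty input. For reference, the sketch below shows what plain NumPy returns for an empty vector and for matrices with a zero-length inner dimension, which is presumably the behavior the fix aligns with (an assumption). The array shapes are chosen only for illustration.

```python
import numpy as np

# Dot product of two empty vectors: an empty sum, so the result is 0.0.
empty_vec = np.zeros(0)
vec_result = np.dot(empty_vec, empty_vec)

# Matrix product with a zero-length inner dimension:
# (3, 0) @ (0, 2) yields a (3, 2) matrix filled with zeros.
left = np.zeros((3, 0))
right = np.zeros((0, 2))
mat_result = np.dot(left, right)
```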