Bodo 2020.05 Release (Date: 05/06/2020)
New Features and Improvements
- Bodo is updated to use the latest versions of Numba and Apache Arrow packages:
  - numba 0.49.0
  - Apache Arrow 0.17.0
- Various improvements to clarity and conciseness of error messages
- Initial support for pandas.DataFrame.to_sql()
- pandas.read_sql() supports the sql and con arguments being passed to Bodo-decorated functions
- Added support for pandas.read_json() and pandas.DataFrame.to_json() from and to POSIX, S3, and Hadoop file systems
- Initial support for pandas.read_excel()
- numpy.fromfile() and numpy.ndarray.tofile() from and to S3 and Hadoop file systems
- Reduction in the number of requests in I/O read calls
- Initial support for arrays of lists of fixed-size values
- List of strings data type support for pandas.DataFrame.join(), pandas.DataFrame.drop_duplicates(), and pandas.DataFrame.groupby()
- pandas.Timestamp subtraction, min, and max
- Improved support for null values in datetime and timedelta operations
- Support for the copy() function for Series of decimal.Decimal and datetime.date data types and most Index types
- Improved support for Series with decimal.Decimal dtype
- String Series and DataFrame columns are now mutable and support in-place fillna()
- pandas.Series.round()
- pandas.DataFrame.assign()
- Support for the groupby(...).first() operation
- pandas.DataFrame.iloc support for extracting a subset of columns
- numpy.ndarray.sum(axis=0)
- numpy.reshape() for multi-dimensional distributed arrays
- Initial implementation of experimental legacy mode
- Proper errors when using unsupported pandas.(...) and pandas.Series.(...) functions
- Improved robustness of pandas.DataFrame in-place operations
- Memory usage improvements
- Type safety improvements
- Compilation time improvements
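As a rough illustration of a few of the DataFrame features listed above (pandas.DataFrame.assign(), pandas.Series.round(), and groupby(...).first()), the sketch below combines them in one small function. The function name, column names, and sample data are hypothetical. The decorator falls back to plain pandas when Bodo is not installed, and whether this exact combination compiles under this particular release is an assumption rather than something these notes confirm.

```python
import pandas as pd

# Use bodo.jit when Bodo is installed; otherwise fall back to a no-op
# decorator so the sketch still runs with plain pandas. The fallback is an
# assumption made for illustration and is not part of Bodo itself.
try:
    import bodo
    jit = bodo.jit
except ImportError:
    def jit(func):
        return func


@jit
def first_order_per_store(orders):
    # DataFrame.assign() adds a derived column; Series.round() rounds it
    # to one decimal place.
    with_total = orders.assign(total=(orders["price"] * orders["qty"]).round(1))
    # groupby(...).first() takes the first non-null value of each column
    # within each group.
    return with_total.groupby("store").first()


# Hypothetical sample data.
orders = pd.DataFrame(
    {
        "store": ["a", "a", "b"],
        "price": [1.25, 2.0, 3.333],
        "qty": [2, 1, 3],
    }
)
result = first_order_per_store(orders)
```

The result is indexed by store and holds the first price, qty, and rounded total seen for each store.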
Bug Fixes
- Fixed an issue in pandas.read_csv() when reading a large CSV file in specific distributed cases
- Fixed numpy.dot() with empty vector/matrix input
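These notes do not describe how numpy.dot() failed on empty input. For reference, the sketch below shows what plain NumPy returns for a few empty operands, which is presumably the behavior compiled code is expected to match. It uses NumPy only, with no Bodo decorator, and the variable names are illustrative.

```python
import numpy as np

empty_vec = np.empty(0)

# Dot product of two empty vectors: a sum over zero terms.
scalar = np.dot(empty_vec, empty_vec)

# A (2, 0) matrix times an empty vector: one zero per row of the matrix.
mat_vec = np.dot(np.ones((2, 0)), empty_vec)

# A (0, 3) matrix times a length-3 vector: an empty result vector.
empty_rows = np.dot(np.empty((0, 3)), np.ones(3))
```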