Bodo 2020.05 Release (Date: 05/06/2020)¶
New Features and Improvements¶
-
- Bodo is updated to use the latest versions of Numba and Apache Arrow packages:
-
- numba 0.49.0
- Apache Arrow 0.17.0
-
Various improvements to clarity and conciseness of error messages
-
Initial support for
pandas.DataFrame.to_sql()
-
pandas.read_sql()
supportsql
andcon
passed to Bodo-decorated functions -
Added support for
pandas.read_json()
andpandas.DataFrame.to_json()
from & to POSIX, S3, and Hadoop File Systems. -
Initial support for
pandas.read_excel()
-
numpy.fromfile()
andnumpy.tofile()
from and to S3, and Hadoop File Systems. -
Reduction in number of requests in I/O read calls
-
Initial support for array of lists of fixed sized values
-
List of strings data type support for
pandas.DataFrame.join()
,pandas.DataFrame.drop_duplicates()
, andpandas.DataFrame.groupby()
-
pandas.Timestamp
subtraction, min and max -
Improved support for null values in datetime and timedelta operations
-
Support
copy()
function for Series ofdecimal.Decimal
anddatetime.date
data types and most Index types -
Improved support for Series
decimal.Decimal
dtype -
String Series and Dataframe Column are now mutable and support inplace
fillna()
-
pandas.Series.round()
-
pandas.Dataframe.assign()
-
Support
groupby(...).first()
operation -
pandas.Dataframe.iloc
support for extracting a subset of columns -
numpy.array.sum(axis=0)
-
numpy.reshape()
multi-dimensional distributed arrays -
Initial implementation of experimental legacy mode
-
Proper error when using unsupported
pandas.(...)
&pandas.Series.(...)
functions -
Improved robustness of
pandas.DataFrame
inplace operations -
Memory usage improvements
-
Type safety improvements
-
Compilation time improvements
Bug Fixes¶
- Fixed an issue in
pandas.read_csv()
reading a large CSV file in specific distributed cases numpy.dot()
with empty vector/matrix input