Bodo 2020.04 Release (Date: 04/08/2020)
New Features and Improvements
- Support for scatterv operation
- Improved memory management for DataFrame and Series data
- Initial support for pandas.read_sql()
- pandas.read_csv() reads a directory of CSV files
- pandas.read_csv() reads from S3 and Hadoop Distributed File System (HDFS)
- pandas.read_parquet() now reads all integer types (like int16) and gets nullable information for integer columns from pandas metadata
- pandas.read_parquet() now supports reading columns whose elements are lists of strings
- Avoid type error for unselected columns in Parquet files
- Support pandas.RangeIndex when reading a non-partitioned Parquet dataset
- pandas.DataFrame.to_parquet() writes to Hadoop Distributed File System (HDFS)
- pandas.DataFrame.to_parquet() always writes pandas.RangeIndex to Parquet metadata
- Support pandas.DataFrame.to_parquet() writing datetime64 (default in Pandas) and datetime.date types to Parquet files
- Support decimal.Decimal type in dataframes and Parquet I/O
- Support for &, |, and pandas.Series.dt in pandas.DataFrame.query()
- Support added for groupby last operation
- min, max, and sum support in groupby() for string columns
- Support for a non-constant list of column names as an argument to functions like groupby()
- MultiIndex support for groupby(...).agg(as_index=False)
- pandas.DataFrame.merge() with one dataframe on its index and the other on a column
- Sorting compilation time improvement
- Support for integer, float, string, string list, datetime.date, datetime.datetime, and datetime.timedelta types in pandas.Series.cummin(), pandas.DataFrame.cummin(), pandas.Series.cummax(), and pandas.DataFrame.cummax()
- NAs in datetime.date arrays
- Better datetime.timedelta support
- Support for min and max in pandas.Timestamp and datetime.date
- pandas.DataFrame.all() for boolean series
- pandas.Series.astype() to float, int, str
- Convert string columns to float using astype()
- NA support for Series.str.split()
- Refactored and improved DataFrame indexing: pandas.DataFrame.loc(), pandas.DataFrame.iloc(), and pandas.DataFrame.iat()
- Better support for pandas.Series.shift(), pandas.Series.pct_change(), and pandas.DataFrame.drop()
- Set dataframe column using a scalar
- Support for Index.values
- Additional support for string columns
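
A few of the DataFrame operations listed above, written as a minimal sketch in plain pandas. The column names and data are made up for illustration. Under Bodo, code like this would normally sit inside a function decorated with bodo.jit; that wrapper is left out so the sketch runs without Bodo installed, and it does not verify which of these calls compile under this particular release.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame(
    {
        "key": ["a", "b", "a", "b"],
        "name": ["x", "y", "z", "w"],
        "val": [3, 1, 2, 5],
    }
)

# & and | inside DataFrame.query(); parentheses make the grouping explicit.
selected = df.query("(val > 1) & (key == 'a') | (val == 5)")

# groupby last, and min on a string column.
last_val = df.groupby("key")["val"].last()
min_name = df.groupby("key")["name"].min()

# Column names passed as a list built at run time rather than a literal.
group_cols = [c for c in df.columns if c == "key"]
totals = df.groupby(group_cols, as_index=False).agg({"val": "sum"})

# Cumulative min and max on a Series.
running_min = df["val"].cummin()
running_max = df["val"].cummax()

# Merge where the left frame joins on a column and the right frame on its index.
weights = pd.DataFrame({"weight": [10, 20]}, index=["a", "b"])
merged = df.merge(weights, left_on="key", right_index=True)

print(selected)
print(totals)
print(merged)
```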
Bug Fixes
- pandas.join() produces the correct index
- pandas.groupby() uses the latest schema
- groupby(...).cumsum() preserves the index
- groupby(...).agg() when passing a dictionary of functions: support a mix of multi-function lists and single functions
- Fixed NumPy slicing error in a corner case where the slice is equivalent to the whole array and the array size is a constant
- Proper construction of a dataframe from slicing a NumPy 2D array
- pandas.read_csv() reads a dataframe containing only datetime-like columns
- When using pandas.merge() and pandas.join(), integer columns that can have a missing value (NA) are returned as nullable integer arrays (as opposed to 0 and -1 before)
- Avoid errors in comparing Pandas and NumPy
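
Two of the fixes above concern result shapes that are easy to picture with a small example. The sketch below uses plain pandas and made-up data. Plain pandas returns float64 with NaN for unmatched rows of a left merge, so the explicit cast to the nullable Int64 dtype here only illustrates the kind of array the note describes; it is not a claim about how Bodo produces it. The second part shows an agg() dictionary that mixes a list of functions with a single function.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
left = pd.DataFrame({"id": [1, 2, 3], "val": [10, 20, 30]})
right = pd.DataFrame({"id": [1, 3], "qty": [7, 9]})

# id == 2 has no match on the right, so its qty is missing after the merge.
merged = left.merge(right, on="id", how="left")

# Plain pandas yields float64 with NaN here; a nullable integer array keeps
# the integer values and marks the gap as NA instead of a 0 or -1 sentinel.
qty_nullable = merged["qty"].astype("Int64")

# agg() with a dictionary mixing a multi-function list and a single function.
sales = pd.DataFrame({"key": ["a", "a", "b"], "val": [1, 4, 2], "qty": [1, 1, 5]})
stats = sales.groupby("key").agg({"val": ["min", "max"], "qty": "sum"})

print(qty_nullable)
print(stats)
```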