Bodo 2020.04 Release (Date: 04/08/2020)
New Features and Improvements
- Support for `scatterv` operation
- Improved memory management for DataFrame and Series data
- Initial support for `pandas.read_sql()`
- `pandas.read_csv()` reads a directory of CSV files
- `pandas.read_csv()` reads from S3 and Hadoop Distributed File System (HDFS)
- `pandas.read_parquet()` now reads all integer types (like int16) and gets nullable information for integer columns from pandas metadata
- `pandas.read_parquet()` now supports reading columns of list of string elements
- avoid type error for unselected columns in Parquet files
- support `pandas.RangeIndex` when reading a non-partitioned Parquet dataset
- `pandas.DataFrame.to_parquet()` to Hadoop Distributed File System (HDFS)
- `pandas.DataFrame.to_parquet()` always writes `pandas.RangeIndex` to Parquet metadata
- support `pandas.DataFrame.to_parquet()` writing datetime64 (default in Pandas) and `datetime.date` types to Parquet files
- support `decimal.Decimal` type in dataframes and Parquet I/O
- Support for `&`, `|`, and `pandas.Series.dt` in `pandas.DataFrame.query()`
- Support added for groupby `last` operation
- `min`, `max`, and `sum` support in `groupby()` for string columns
- non-constant list of column names as argument support for functions like `groupby()`
- MultiIndex support for `groupby(...).agg(as_index=False)`
- `pandas.DataFrame.merge()` one dataframe on index, and the other on a column
- sorting compilation time improvement
- support for integer, float, string, string list, `datetime.date`, `datetime.datetime`, and `datetime.timedelta` types in `pandas.Series.cummin()`, `pandas.DataFrame.cummin()`, `pandas.Series.cummax()`, and `pandas.DataFrame.cummax()`
- `NA`s in `datetime.date` array
- better `datetime.timedelta` support
- Support for `min` and `max` in `pandas.Timestamp` and `datetime.date`
- `pandas.DataFrame.all()` for boolean series
- `pandas.Series.astype()` to float, int, str
- Convert string columns to float using `astype()`
- `NA` support for `Series.str.split()`
- refactored and improved DataFrame indexing: `pandas.DataFrame.loc`, `pandas.DataFrame.iloc`, and `pandas.DataFrame.iat`
- better support for `pandas.Series.shift()`, `pandas.Series.pct_change()`, `pandas.DataFrame.drop()`
- set dataframe column using a scalar
- support for `Index.values`
- Addition support for String columns
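As a rough illustration of a few of the items above (string-to-float `astype()`, `min`/`max` on string columns in `groupby()`, the groupby `last` operation, and `cummax()`), here is a minimal sketch written in plain pandas. The data, column names, and the `summarize` function are hypothetical, and the sketch deliberately does not call Bodo itself; it assumes that a function of this shape is the kind of code these features target.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "key": ["a", "a", "b", "b"],
    "name": ["x", "y", "z", "w"],
    "amount": ["1.5", "2.5", "3.0", "4.0"],  # numbers stored as strings
})


def summarize(df):
    # Convert a string column to float using astype().
    df = df.assign(amount=df["amount"].astype(float))
    # min and max on a string column within each group.
    name_min = df.groupby("key")["name"].min()
    name_max = df.groupby("key")["name"].max()
    # The groupby `last` operation: last value seen in each group.
    amount_last = df.groupby("key")["amount"].last()
    # Running maximum over a float Series.
    running_max = df["amount"].cummax()
    return name_min, name_max, amount_last, running_max


name_min, name_max, amount_last, running_max = summarize(df)
```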
Bug Fixes
- `pandas.join()` produces the correct index.
- `pandas.groupby()` uses the latest schema
- `groupby(...).cumsum()` preserves index
- `groupby(...).agg()` when passing a dictionary of functions: support mix of multi-function lists and single functions
- Fixed Numpy slicing error in a corner case when the slice is equivalent to array and array size is a constant
- proper construction of dataframe from slicing Numpy 2D array
- `pandas.read_csv` reads a dataframe containing only datetime-like columns
- When using `pandas.merge()` and `pandas.join()`, integer columns which can have a missing value `NA` are returned as nullable integer array (as opposed to `0` and `-1` before)
- avoid errors in comparing Pandas and Numpy
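To make the nullable-integer item above concrete, the following sketch uses plain pandas rather than Bodo. It assumes that pandas' own nullable `"Int64"` extension dtype is a fair stand-in for the "nullable integer array" mentioned in the fix; the frames and column names are hypothetical. A left merge leaves one row without a match, so the integer column has to represent a missing value.

```python
import pandas as pd

left = pd.DataFrame({"k": [1, 2, 3]})

# Plain NumPy-backed int64 column: it has no way to represent a missing value.
right_plain = pd.DataFrame({"k": [1, 2], "v": [10, 20]})

# Nullable integer column (pandas "Int64" extension dtype), used here as an
# assumed analogue of the nullable integer array described in the fix.
right_nullable = pd.DataFrame({"k": [1, 2], "v": pd.array([10, 20], dtype="Int64")})

# k == 3 has no match on the right side, so "v" is missing in that row.
merged_plain = left.merge(right_plain, on="k", how="left")
merged_nullable = left.merge(right_nullable, on="k", how="left")

# Plain pandas upcasts the int64 column to float64 to hold NaN, while the
# nullable column keeps its integer dtype and marks the gap as <NA>.
print(merged_plain["v"].dtype)
print(merged_nullable["v"].dtype)
```

Either representation keeps the missing entry distinguishable from real data, which a sentinel such as `0` or `-1` does not.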