Bodo 2020.04 Release (Date: 04/08/2020)
New Features and Improvements
- Support for scatterv operation
- Improved memory management for DataFrame and Series data
- Initial support for pandas.read_sql()
- pandas.read_csv() reads a directory of csv files
- pandas.read_csv() reads from S3 and Hadoop Distributed File System (HDFS)
- pandas.read_parquet() now reads all integer types (like int16) and gets nullable information for integer columns from pandas metadata
- pandas.read_parquet() now supports reading columns of list of string elements
- avoid type error for unselected columns in Parquet files
- support pandas.RangeIndex when reading a non-partitioned parquet dataset
- pandas.DataFrame.to_parquet() writes to Hadoop Distributed File System (HDFS)
- pandas.DataFrame.to_parquet() always writes pandas.RangeIndex to Parquet metadata
- support pandas.DataFrame.to_parquet() writing datetime64 (default in Pandas) and datetime.date types to Parquet files
- support decimal.Decimal type in dataframes and Parquet I/O
- Support for &, |, and pandas.Series.dt in pandas.DataFrame.query()
- Support added for groupby last operation
- min, max, and sum support in groupby() for string columns
- support for non-constant lists of column names as arguments to functions like groupby()
- MultiIndex support for groupby(...).agg(as_index=False)
- pandas.DataFrame.merge() with one dataframe on index and the other on a column
- sorting compilation time improvement
- support for integer, float, string, string list, datetime.date, datetime.datetime, and datetime.timedelta types in pandas.Series.cummin(), pandas.DataFrame.cummin(), pandas.Series.cummax(), and pandas.DataFrame.cummax()
- NAs in datetime.date arrays
- better datetime.timedelta support
- Support for min and max in pandas.Timestamp and datetime.date
- pandas.DataFrame.all() for boolean series
- pandas.Series.astype() to float, int, str
- Convert string columns to float using astype()
- NA support for Series.str.split()
- refactored and improved DataFrame indexing: pandas.DataFrame.loc(), pandas.DataFrame.iloc(), and pandas.DataFrame.iat()
- better support for pandas.Series.shift(), pandas.Series.pct_change(), pandas.DataFrame.drop()
- set dataframe column using a scalar
- support for Index.values
- Addition support for String columns
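As an illustration of the query() item in the list above, here is a minimal sketch in plain pandas. The column names, the sample data, and the helper name filter_rows are hypothetical. The function is left undecorated so that it runs without Bodo installed; under Bodo it would presumably be compiled with the bodo.jit decorator, which is an assumption not shown in these notes.

```python
import pandas as pd

# Hypothetical sample data: two integer columns and a datetime column.
df = pd.DataFrame({
    "A": [1, 2, 3, 4],
    "B": [10, 20, 30, 40],
    "ts": pd.to_datetime(["2019-12-31", "2020-01-15", "2020-04-08", "2021-01-01"]),
})

def filter_rows(df):
    # Combine conditions with & and |, and use the Series.dt accessor
    # inside the query string. engine="python" is chosen here because the
    # numexpr engine may not handle the .dt accessor.
    return df.query("((A > 1) & (B < 40)) | (ts.dt.year == 2021)", engine="python")

result = filter_rows(df)
print(result)
```

Each comparison is parenthesized so the expression reads the same regardless of how operator precedence is handled by the query parser.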
Bug Fixes
- pandas.join() produces the correct index
- pandas.groupby() uses the latest schema
- groupby(...).cumsum() preserves index
- groupby(...).agg()when passing a dictionary of functions: support mix of multi-function lists and single functions
- Fixed Numpy slicing error in a corner case when the slice is equivalent to array and array size is a constant
- proper construction of dataframe from slicing Numpy 2D array
- pandas.read_csv() reads a dataframe containing only datetime-like columns
- When using pandas.merge() and pandas.join(), integer columns which can have a missing value NA are returned as nullable integer arrays (as opposed to 0 and -1 before)
- avoid errors in comparing Pandas and Numpy
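To make the nullable integer item in the list above concrete, the sketch below uses plain pandas with hypothetical data. Plain pandas upcasts the unmatched integer column to float64 after a left merge, so the example converts it explicitly to the nullable Int64 type to show the representation described above. It does not run Bodo itself.

```python
import pandas as pd

# Hypothetical inputs: key 2 has no match in the right-hand table.
left = pd.DataFrame({"key": [1, 2, 3], "a": [10, 20, 30]})
right = pd.DataFrame({"key": [1, 3], "b": [100, 300]})

merged = left.merge(right, on="key", how="left")

# In plain pandas the integer column "b" becomes float64 with NaN for the
# missing row. Casting to "Int64" gives a nullable integer array in which
# the missing value is an explicit NA rather than a sentinel such as 0 or -1.
nullable_b = merged["b"].astype("Int64")
print(nullable_b)
```

Keeping the missing entry as NA lets integer columns stay integers without a sentinel being mistaken for real data.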