DataFrame
Bodo provides extensive support for pandas DataFrames. The tables in this section list the `pd.DataFrame` constructor, attributes, and methods covered, grouped by category.
Creation
| Function | Description | 
|---|---|
| pd.DataFrame | Create a DataFrame | 
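The following is a minimal sketch of constructing DataFrames. It is written in plain pandas so it runs without Bodo installed. In Bodo, code like this would typically sit inside a function decorated with `@bodo.jit`. The constructor arguments used here have not been checked against Bodo's supported-argument list, and the column names and data are illustrative.

```python
import numpy as np
import pandas as pd

# From a dict of equal-length columns, with explicit row labels.
df = pd.DataFrame(
    {"A": [1, 2, 3], "B": [0.5, 1.5, 2.5], "C": ["x", "y", "z"]},
    index=[10, 20, 30],
)

# From a 2-D NumPy array, naming the columns explicitly.
arr_df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["p", "q"])
```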
Attributes and underlying data
| Function | Description | 
|---|---|
| pd.DataFrame.columns | The column labels of the DataFrame | 
| pd.DataFrame.dtypes | Return the dtypes in the DataFrame | 
| pd.DataFrame.empty | Indicator of whether the DataFrame is empty |
| pd.DataFrame.index | The index (row labels) of the DataFrame | 
| pd.DataFrame.ndim | Number of axes / array dimensions | 
| pd.DataFrame.select_dtypes | Return a subset of the DataFrame's columns based on the column dtypes | 
| pd.DataFrame.filter | Subset the DataFrame rows or columns according to the specified index labels | 
| pd.DataFrame.shape | Return a tuple representing the dimensionality of the DataFrame | 
| pd.DataFrame.size | Number of elements in the DataFrame | 
| pd.DataFrame.to_numpy | Convert the DataFrame to a NumPy array |
| pd.DataFrame.values | Return a NumPy representation of the DataFrame |
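This sketch reads several of the attributes above and calls `select_dtypes`, `filter`, and `to_numpy` on a small illustrative DataFrame. It uses plain pandas. Whether Bodo accepts each argument shown, for example `include="number"`, is an assumption rather than something this page states.

```python
import pandas as pd

# Illustrative data: two numeric columns and one string column.
df = pd.DataFrame({"A": [1, 2, 3], "B": [0.5, 1.5, 2.5], "C": ["x", "y", "z"]})

labels = df.columns.tolist()   # column labels
types = df.dtypes              # one dtype per column
n_rows, n_cols = df.shape      # (rows, columns)
total_cells = df.size          # rows * columns
is_empty = df.empty            # True only if an axis has length 0

# Keep only numeric columns, then pick columns by label.
numeric = df.select_dtypes(include="number")
subset = df.filter(items=["A", "C"])

# NumPy array of the numeric block; int A and float B are upcast to float64.
arr = numeric.to_numpy()
```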
Conversion
| Function | Description | 
|---|---|
| pd.DataFrame.astype | Cast a pandas object to a specified dtype | 
| pd.DataFrame.copy | Make a copy of the DataFrame | 
| pd.DataFrame.isna | Detect missing values | 
| pd.DataFrame.isnull | Detect missing values | 
| pd.DataFrame.notna | Detect existing (non-missing) values | 
| pd.DataFrame.notnull | Detect existing (non-missing) values | 
| pd.DataFrame.info | Print a concise summary of a DataFrame | 
| pd.DataFrame.infer_objects | Attempt to infer better dtypes for object columns | 
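The conversion methods can be sketched in plain pandas as follows. The data is made up, and the column-to-dtype mapping passed to `astype` is one of several forms pandas accepts. Bodo's support for each form is not confirmed here.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, None], "B": ["1", "2", "3"]})

# Cast column B from strings to 64-bit integers; other columns are unchanged.
converted = df.astype({"B": "int64"})

# Boolean masks of missing / non-missing cells (isnull/notnull are aliases).
missing = df.isna()
present = df.notna()

# copy() is deep by default, so editing the copy leaves df untouched.
backup = df.copy()
backup.loc[0, "A"] = 100.0

# infer_objects() turns an object column holding ints into an integer column.
obj = pd.DataFrame({"x": pd.Series([1, 2, 3], dtype="object")})
inferred = obj.infer_objects()
```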
Indexing, iteration
| Function | Description | 
|---|---|
| pd.DataFrame.head | Return the first `n` rows |
| pd.DataFrame.iat | Access a single value for a row/column pair by integer position | 
| pd.DataFrame.iloc | Purely integer-location based indexing for selection by position | 
| pd.DataFrame.insert | Insert column into DataFrame at specified location | 
| pd.DataFrame.isin | Determine whether each element in the DataFrame is contained in the given values |
| pd.DataFrame.itertuples | Iterate over DataFrame rows as namedtuples | 
| pd.DataFrame.query | Query the columns of a DataFrame with a boolean expression | 
| pd.DataFrame.tail | Return the last `n` rows |
| pd.DataFrame.where | Replace values where the condition is False | 
| pd.DataFrame.mask | Replace values where the condition is True | 
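The indexing and iteration methods above are sketched here in plain pandas. Note that `insert` mutates the DataFrame in place, so it is called last to keep the earlier results simple. The data and the query string are illustrative, and Bodo-specific restrictions (for example, on `query` expressions) are not covered.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [10, 20, 30, 40]})

first_two = df.head(2)        # first 2 rows
last_one = df.tail(1)         # last row
cell = df.iat[2, 1]           # row position 2, column position 1
block = df.iloc[1:3, [0]]     # rows at positions 1-2, first column only

big = df.query("A > 2")       # rows satisfying a boolean expression

kept = df.where(df > 2)       # cells where the condition is False become NaN
zeroed = df.mask(df > 25, 0)  # cells where the condition is True become 0

flags = df.isin([2, 30])      # element-wise membership test

# insert() modifies df in place: new column "C" at position 1.
df.insert(1, "C", [0, 0, 0, 0])
rows = list(df.itertuples(index=False))  # namedtuples with fields A, C, B
```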
Function Application, GroupBy & Window
| Function | Description | 
|---|---|
| pd.DataFrame.apply | Apply a function along an axis of the DataFrame | 
| pd.DataFrame.groupby | Group DataFrame using a mapper or by a Series of columns | 
| pd.DataFrame.rolling | Provide rolling window calculations | 
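Here is a minimal plain-pandas sketch of row-wise `apply`, a `groupby` aggregation, and a `rolling` window. The key and value columns are illustrative. Which aggregation functions and window options Bodo compiles is not stated on this page, so treat those choices as assumptions.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "b", "a", "b"], "val": [1, 2, 3, 4]})

# axis=1 passes each row to the function as a Series.
doubled = df.apply(lambda row: row["val"] * 2, axis=1)

# Sum of "val" within each group of "key".
totals = df.groupby("key")["val"].sum()

# Mean over a sliding window of 2 rows; the first row has no full window (NaN).
rolling_mean = df[["val"]].rolling(2).mean()
```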
Computations / Descriptive Stats
| Function | Description | 
|---|---|
| pd.DataFrame.abs | Return a DataFrame with absolute numeric value of each element | 
| pd.DataFrame.corr | Compute pairwise correlation of columns, excluding NA/null values | 
| pd.DataFrame.count | Count non-NA cells for each column or row | 
| pd.DataFrame.cov | Compute pairwise covariance of columns, excluding NA/null values | 
| pd.DataFrame.cumprod | Return cumulative product over a DataFrame or Series axis | 
| pd.DataFrame.cumsum | Return cumulative sum over a DataFrame or Series axis | 
| pd.DataFrame.describe | Generate descriptive statistics | 
| pd.DataFrame.diff | First discrete difference of element | 
| pd.DataFrame.max | Return the maximum of the values for the requested axis | 
| pd.DataFrame.mean | Return the mean of the values for the requested axis | 
| pd.DataFrame.median | Return the median of the values for the requested axis | 
| pd.DataFrame.min | Return the minimum of the values for the requested axis | 
| pd.DataFrame.nunique | Count distinct observations over requested axis | 
| pd.DataFrame.pct_change | Percentage change between the current and a prior element | 
| pd.DataFrame.pipe | Apply a function that takes the DataFrame as its first argument: `func(self, *args, **kwargs)` |
| pd.DataFrame.prod | Return the product of the values for the requested axis | 
| pd.DataFrame.product | Return the product of the values for the requested axis | 
| pd.DataFrame.quantile | Return values at the given quantile over requested axis | 
| pd.DataFrame.rank | Compute numerical data ranks (1 through n) along axis | 
| pd.DataFrame.std | Return sample standard deviation over requested axis | 
| pd.DataFrame.sum | Return the sum of the values for the requested axis | 
| pd.DataFrame.var | Return unbiased variance over requested axis | 
| pd.DataFrame.memory_usage | Return the memory usage of each column in bytes | 
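Several of the descriptive statistics above are shown here on two small, perfectly anti-correlated columns. The sketch is plain pandas with default arguments; non-default options such as other correlation methods may or may not be supported by Bodo.

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0], "B": [4.0, 3.0, 2.0, 1.0]})

col_sums = df.sum()        # per-column sums
col_means = df.mean()
running = df.cumsum()      # cumulative sum down each column
steps = df.diff()          # row-to-row difference; first row is NaN
corr = df.corr()           # Pearson correlation matrix
stats = df.describe()      # count, mean, std, min, quartiles, max
q50 = df.quantile(0.5)     # median via linear interpolation
ranks = df.rank()          # 1 = smallest value in each column

# Bytes used by each column's data, excluding the index.
mem = df.memory_usage(index=False)
```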
Reindexing / Selection / Label manipulation
| Function | Description | 
|---|---|
| pd.DataFrame.drop | Drop specified labels from rows or columns | 
| pd.DataFrame.drop_duplicates | Return DataFrame with duplicate rows removed | 
| pd.DataFrame.duplicated | Return boolean Series denoting duplicate rows | 
| pd.DataFrame.first | Select initial periods of time series data based on a date offset | 
| pd.DataFrame.idxmax | Return the row label of the first occurrence of the maximum value in each column |
| pd.DataFrame.idxmin | Return the row label of the first occurrence of the minimum value in each column |
| pd.DataFrame.last | Select final periods of time series data based on a date offset | 
| pd.DataFrame.rename | Alter axes labels | 
| pd.DataFrame.reset_index | Reset the index of the DataFrame | 
| pd.DataFrame.set_index | Set the DataFrame index using existing columns | 
| pd.DataFrame.take | Return the elements in the given positional indices along an axis | 
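The following plain-pandas sketch shows duplicate handling, dropping and renaming columns, moving a column in and out of the index, and positional selection with `take`. The `id`/`score` data is illustrative. The time-based `first`/`last` methods are left out because they need a `DatetimeIndex` and are deprecated in recent pandas releases.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 2, 3], "score": [5, 7, 7, 9]})

dups = df.duplicated()                  # True for the second (2, 7) row
unique_rows = df.drop_duplicates()
no_score = df.drop(columns=["score"])
renamed = df.rename(columns={"score": "points"})

# Move "id" into the index, then turn it back into a column.
indexed = unique_rows.set_index("id")
restored = indexed.reset_index()

best = indexed.idxmax()["score"]        # row label holding the largest score
picked = df.take([0, 3])                # rows at positions 0 and 3
```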
Missing data handling
| Function | Description | 
|---|---|
| pd.DataFrame.dropna | Remove missing values | 
| pd.DataFrame.fillna | Fill NA/NaN values using the specified method | 
| pd.DataFrame.replace | Replace values given in `to_replace` with `value` |
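This is a short plain-pandas sketch of the three missing-data methods. It fills with explicit values rather than the deprecated `method=` argument of `fillna`. Whether Bodo accepts a per-column dict for `fillna` is an assumption.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [4.0, 5.0, np.nan]})

dropped = df.dropna()                            # drop rows containing any NaN
filled = df.fillna(0.0)                          # one fill value for every column
per_col = df.fillna({"A": -1.0, "B": 99.0})      # a different fill value per column
replaced = df.replace(3.0, 30.0)                 # swap one specific value
```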
Reshaping, sorting, transposing
| Function | Description | 
|---|---|
| pd.DataFrame.explode | Transform each element of a list-like to a row, replicating index values | 
| pd.DataFrame.melt | Unpivot a DataFrame from wide to long format | 
| pd.DataFrame.pivot | Return reshaped DataFrame organized by given index / column values | 
| pd.DataFrame.pivot_table | Create a spreadsheet-style pivot table as a DataFrame | 
| pd.DataFrame.sample | Return a random sample of items (rows by default) from an axis |
| pd.DataFrame.sort_index | Sort object by labels (along an axis) | 
| pd.DataFrame.sort_values | Sort by the values along either axis | 
| pd.DataFrame.to_string | Render a DataFrame to a console-friendly tabular output | 
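This sketch covers the reshaping round trip (`melt` from wide to long, then `pivot` back), an aggregating `pivot_table`, `explode`, and sorting. It uses plain pandas with made-up data. The `aggfunc` and keyword choices are assumptions, not confirmed Bodo requirements.

```python
import pandas as pd

wide = pd.DataFrame({"id": [1, 2], "x": [10, 20], "y": [30, 40]})

# Wide -> long: one row per (id, variable) pair, in columns id/variable/value.
long = wide.melt(id_vars="id", value_vars=["x", "y"])

# Long -> wide again.
back = long.pivot(index="id", columns="variable", values="value")

# pivot_table aggregates duplicate (region, qtr) pairs; here with the mean.
sales = pd.DataFrame({"region": ["N", "N", "S"], "qtr": ["Q1", "Q1", "Q1"], "amt": [1.0, 3.0, 5.0]})
table = sales.pivot_table(index="region", columns="qtr", values="amt", aggfunc="mean")

# One output row per list element; the original index labels repeat.
lists = pd.DataFrame({"k": ["a", "b"], "vals": [[1, 2], [3]]})
exploded = lists.explode("vals")

ordered = wide.sort_values("x", ascending=False)
restored_order = ordered.sort_index()
text = wide.to_string(index=False)
```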
Combining / joining / merging
| Function | Description | 
|---|---|
| pd.DataFrame.assign | Assign new columns to a DataFrame | 
| pd.DataFrame.join | Join columns with other DataFrame either on index or on a key column | 
| pd.DataFrame.merge | Merge DataFrame or named Series objects with a database-style join | 
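Here is a minimal plain-pandas sketch contrasting `merge` (joins on a key column) with `join` (joins on the index), followed by `assign`. The `key`/`lval`/`rval` tables are illustrative. The join types Bodo supports for each method are not listed on this page, so `how=` values are assumptions.

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b", "c"], "lval": [1, 2, 3]})
right = pd.DataFrame({"key": ["a", "b", "d"], "rval": [10, 20, 40]})

inner = left.merge(right, on="key", how="inner")  # keys present in both: a, b
outer = left.merge(right, on="key", how="outer")  # union of keys: a, b, c, d

# join() matches on the index by default, so move "key" into the index first.
joined = left.set_index("key").join(right.set_index("key"), how="left")

# assign() returns a new DataFrame with the extra column.
with_total = inner.assign(total=lambda d: d["lval"] + d["rval"])
```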
Time series-related
| Function | Description | 
|---|---|
| pd.DataFrame.shift | Shift index by desired number of periods with an optional time freq | 
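The sketch below contrasts the two behaviours of `shift` in plain pandas. Without `freq`, the values move relative to a fixed index. With `freq`, the index labels move and values stay attached to them. The daily index is illustrative, and Bodo's support for the `freq` argument is an assumption.

```python
import pandas as pd

idx = pd.date_range("2024-01-01", periods=3, freq="D")
df = pd.DataFrame({"v": [1, 2, 3]}, index=idx)

lagged = df.shift(1)           # values move down one row; first row becomes NaN
moved = df.shift(1, freq="D")  # every index label moves forward by one day
```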
Serialization, IO, Conversion
| Function | Description | 
|---|---|
| pd.DataFrame.to_csv | Write object to a comma-separated values (CSV) file |
| pd.DataFrame.to_json | Convert the object to a JSON string | 
| pd.DataFrame.to_parquet | Write a DataFrame to the binary parquet format | 
| pd.DataFrame.to_sql | Write records stored in a DataFrame to a SQL database | 
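This plain-pandas sketch writes to strings instead of files so it runs anywhere. It is an illustration only: in a Bodo program you would usually pass a file path, and how Bodo distributes such writes is not described here. `to_parquet` (which needs a Parquet engine such as pyarrow) and `to_sql` (which needs a database connection) are not shown.

```python
import io
import json
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": ["x", "y"]})

# With no path, to_csv returns the CSV text.
csv_text = df.to_csv(index=False)
roundtrip = pd.read_csv(io.StringIO(csv_text))  # read it back from memory

# With no path, to_json returns a JSON string; "records" gives one object per row.
json_text = df.to_json(orient="records")
records = json.loads(json_text)
```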
Plotting
| Function | Description | 
|---|---|
| pd.DataFrame.plot | Make plots of the DataFrame |