
DataFrame

Bodo supports a large part of the pandas DataFrame API. The tables below list the supported DataFrame constructor, attributes, and methods, grouped by category.

Creation

Function Description
pd.DataFrame Create a DataFrame

Attributes and underlying data

Function Description
pd.DataFrame.columns The column labels of the DataFrame
pd.DataFrame.dtypes Return the dtypes in the DataFrame
pd.DataFrame.empty Indicator whether DataFrame is empty
pd.DataFrame.index The index (row labels) of the DataFrame
pd.DataFrame.ndim Number of axes / array dimensions
pd.DataFrame.select_dtypes Return a subset of the DataFrame's columns based on the column dtypes
pd.DataFrame.filter Subset the DataFrame rows or columns according to the specified index labels
pd.DataFrame.shape Return a tuple representing the dimensionality of the DataFrame
pd.DataFrame.size Number of elements in the DataFrame
pd.DataFrame.to_numpy Return a NumPy representation of the DataFrame
pd.DataFrame.values Return a NumPy representation of the DataFrame
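
As a rough illustration of the attributes listed above, the sketch below builds a small frame and reads a few of them. It uses plain pandas only; the column names `a` and `b` are made up for the example, and running the same code inside a Bodo-compiled function is assumed rather than shown.

```python
import pandas as pd

# Hypothetical two-column frame: one integer column, one float column.
df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})

n_rows, n_cols = df.shape      # (number of rows, number of columns)
n_cells = df.size              # rows * columns
n_dims = df.ndim               # a DataFrame always has 2 axes
is_empty = df.empty            # False, because the frame has rows

# Keep only the float64 columns.
floats_only = df.select_dtypes(include="float64")

# Mixed int/float columns are upcast to a single common dtype.
arr = df.to_numpy()

print(n_rows, n_cols, n_cells, n_dims, is_empty)
print(list(floats_only.columns), arr.dtype)
```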

Conversion

Function Description
pd.DataFrame.astype Cast a pandas object to a specified dtype
pd.DataFrame.copy Make a copy of the DataFrame
pd.DataFrame.isna Detect missing values
pd.DataFrame.isnull Detect missing values
pd.DataFrame.notna Detect existing (non-missing) values
pd.DataFrame.notnull Detect existing (non-missing) values
pd.DataFrame.info Print a concise summary of a DataFrame
pd.DataFrame.infer_objects Attempt to infer better dtypes for object columns

Indexing, iteration

Function Description
pd.DataFrame.head Return the first n rows
pd.DataFrame.iat Access a single value for a row/column pair by integer position
pd.DataFrame.iloc Purely integer-location based indexing for selection by position
pd.DataFrame.insert Insert column into DataFrame at specified location
pd.DataFrame.isin Determine if values are contained in a Series or DataFrame
pd.DataFrame.itertuples Iterate over DataFrame rows as namedtuples
pd.DataFrame.query Query the columns of a DataFrame with a boolean expression
pd.DataFrame.tail Return the last n rows
pd.DataFrame.where Replace values where the condition is False
pd.DataFrame.mask Replace values where the condition is True
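
Because `where` and `mask` are mirror images of each other, a short side-by-side sketch may help. This is plain pandas with a made-up column `x`; it is an illustration under those assumptions, not a statement about which arguments Bodo accepts.

```python
import pandas as pd

df = pd.DataFrame({"x": [-2, -1, 0, 1, 2]})
positive = df > 0

# where: keep values where the condition is True, replace the rest with 0.
kept = df.where(positive, 0)

# mask: replace values where the condition is True with 0, keep the rest.
hidden = df.mask(positive, 0)

# Positional access: first two rows via iloc, a single cell via iat.
first_two = df.iloc[:2]
first_cell = df.iat[0, 0]

print(kept["x"].tolist())
print(hidden["x"].tolist())
```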

Function Application, GroupBy & Window

Function Description
pd.DataFrame.apply Apply a function along an axis of the DataFrame
pd.DataFrame.groupby Group DataFrame using a mapper or by a Series of columns
pd.DataFrame.rolling Provide rolling window calculations
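
The three entry points above can be sketched on one small frame. The example uses plain pandas, and the column names `key` and `val` are hypothetical; whether every option shown is available under Bodo is not verified here.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "b", "a", "b"], "val": [1.0, 2.0, 3.0, 4.0]})

# groupby: sum of val within each key.
totals = df.groupby("key")["val"].sum()

# apply with axis=1: the function receives one row at a time.
doubled = df.apply(lambda row: row["val"] * 2, axis=1)

# rolling: mean over a sliding window of 2 rows; the first row has no
# full window, so its result is NaN.
rolled = df[["val"]].rolling(window=2).mean()

print(totals)
print(doubled.tolist())
print(rolled)
```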

Computations / Descriptive Stats

Function Description
pd.DataFrame.abs Return a DataFrame with absolute numeric value of each element
pd.DataFrame.corr Compute pairwise correlation of columns, excluding NA/null values
pd.DataFrame.count Count non-NA cells for each column or row
pd.DataFrame.cov Compute pairwise covariance of columns, excluding NA/null values
pd.DataFrame.cumprod Return cumulative product over a DataFrame or Series axis
pd.DataFrame.cumsum Return cumulative sum over a DataFrame or Series axis
pd.DataFrame.describe Generate descriptive statistics
pd.DataFrame.diff First discrete difference of each element (by default, relative to the previous row)
pd.DataFrame.max Return the maximum of the values for the requested axis
pd.DataFrame.mean Return the mean of the values for the requested axis
pd.DataFrame.median Return the median of the values for the requested axis
pd.DataFrame.min Return the minimum of the values for the requested axis
pd.DataFrame.nunique Count distinct observations over requested axis
pd.DataFrame.pct_change Percentage change between the current and a prior element
pd.DataFrame.pipe Apply func(self, *args, **kwargs)
pd.DataFrame.prod Return the product of the values for the requested axis
pd.DataFrame.product Return the product of the values for the requested axis
pd.DataFrame.quantile Return values at the given quantile over requested axis
pd.DataFrame.rank Compute numerical data ranks (1 through n) along axis
pd.DataFrame.std Return sample standard deviation over requested axis
pd.DataFrame.sum Return the sum of the values for the requested axis
pd.DataFrame.var Return unbiased variance over requested axis
pd.DataFrame.memory_usage Return the memory usage of each column in bytes
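
Two details in the table above are easy to miss: `std` and `var` are the sample (unbiased, `ddof=1`) versions, and `prod` and `product` are aliases. A minimal plain-pandas sketch, using a made-up column `a`:

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0]})

# Default ddof=1: sum of squared deviations divided by (n - 1).
sample_var = df.var()["a"]
# ddof=0 gives the population variance, divided by n.
population_var = df.var(ddof=0)["a"]
sample_std = df.std()["a"]

# prod and product compute the same thing.
p1 = df.prod()["a"]
p2 = df.product()["a"]

running_total = df.cumsum()["a"].tolist()
median = df.median()["a"]

print(sample_var, population_var, sample_std, p1, p2, running_total, median)
```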

Reindexing / Selection / Label manipulation

Function Description
pd.DataFrame.drop Drop specified labels from rows or columns
pd.DataFrame.drop_duplicates Return DataFrame with duplicate rows removed
pd.DataFrame.duplicated Return boolean Series denoting duplicate rows
pd.DataFrame.first Select initial periods of time series data based on a date offset
pd.DataFrame.idxmax Return the row label of the maximum value
pd.DataFrame.idxmin Return the row label of the minimum value
pd.DataFrame.last Select final periods of time series data based on a date offset
pd.DataFrame.rename Alter axes labels
pd.DataFrame.reset_index Reset the index of the DataFrame
pd.DataFrame.set_index Set the DataFrame index using existing columns
pd.DataFrame.take Return the elements in the given positional indices along an axis

Missing data handling

Function Description
pd.DataFrame.dropna Remove missing values
pd.DataFrame.fillna Fill NA/NaN values using the specified method
pd.DataFrame.replace Replace values given in to_replace with value
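
A small sketch of the three missing-data methods above, in plain pandas. The frame, the fill values, and the `"missing"`/`"unknown"` placeholders are assumptions chosen for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", "y", None]})

# dropna: drop every row that contains at least one missing value.
dropped = df.dropna()

# fillna: a dict gives a separate fill value per column.
filled = df.fillna({"a": 0.0, "b": "missing"})

# replace: swap one value for another across the frame.
replaced = filled.replace("missing", "unknown")

print(len(dropped))
print(filled)
print(replaced["b"].tolist())
```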

Reshaping, sorting, transposing

Function Description
pd.DataFrame.explode Transform each element of a list-like to a row, replicating index values
pd.DataFrame.melt Unpivot a DataFrame from wide to long format
pd.DataFrame.pivot Return reshaped DataFrame organized by given index / column values
pd.DataFrame.pivot_table Create a spreadsheet-style pivot table as a DataFrame
pd.DataFrame.sample Return a random sample of items from an axis of object
pd.DataFrame.sort_index Sort object by labels (along an axis)
pd.DataFrame.sort_values Sort by the values along either axis
pd.DataFrame.to_string Render a DataFrame to a console-friendly tabular output
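
`melt` and `pivot` are roughly inverse operations, which the following plain-pandas sketch tries to show. The column names (`id`, `q1`, `q2`, `quarter`, `sales`) are hypothetical.

```python
import pandas as pd

wide = pd.DataFrame({"id": [1, 2], "q1": [10, 30], "q2": [20, 40]})

# melt: one row per (id, quarter) pair instead of one column per quarter.
long = wide.melt(id_vars="id", var_name="quarter", value_name="sales")

# pivot: back to one column per quarter, indexed by id.
back = long.pivot(index="id", columns="quarter", values="sales")

# sort_values: order the long table by sales, largest first.
by_sales = long.sort_values("sales", ascending=False)

print(long)
print(back)
print(by_sales["sales"].tolist())
```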

Combining / joining / merging

Function Description
pd.DataFrame.assign Assign new columns to a DataFrame
pd.DataFrame.join Join columns with other DataFrame either on index or on a key column
pd.DataFrame.merge Merge DataFrame or named Series objects with a database-style join
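
To make the difference between join types concrete, here is a minimal plain-pandas sketch of `merge` and `assign`. The frames and the column names `key`, `l`, `r`, and `is_big` are invented for the example.

```python
import pandas as pd

left = pd.DataFrame({"key": [1, 2, 3], "l": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 4], "r": [20, 30, 40]})

# Inner join (the default): only keys present in both frames survive.
inner = left.merge(right, on="key")

# Outer join: every key from either frame, with missing values where
# one side has no match.
outer = left.merge(right, on="key", how="outer")

# assign returns a new frame with the extra column; left is unchanged.
flagged = left.assign(is_big=left["key"] > 1)

print(inner)
print(outer)
print(flagged)
```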

Time series-related

Function Description
pd.DataFrame.shift Shift index by desired number of periods with an optional time freq

Serialization, IO, Conversion

Function Description
pd.DataFrame.to_csv Write object to a comma-separated values (CSV) file
pd.DataFrame.to_json Convert the object to a JSON string
pd.DataFrame.to_parquet Write a DataFrame to the binary Parquet format
pd.DataFrame.to_sql Write records stored in a DataFrame to a SQL database

Plotting

Function Description
pd.DataFrame.plot Plot data