DataFrame
Bodo supports a large part of the pandas DataFrame API. The tables below list the DataFrame functions and attributes covered, grouped by category.
Creation

Function | Description |
---|---|
pd.DataFrame | Create a DataFrame |
Attributes and underlying data

Function | Description |
---|---|
pd.DataFrame.columns | The column labels of the DataFrame |
pd.DataFrame.dtypes | Return the dtypes in the DataFrame |
pd.DataFrame.empty | Indicator whether DataFrame is empty |
pd.DataFrame.index | The index (row labels) of the DataFrame |
pd.DataFrame.ndim | Number of axes / array dimensions |
pd.DataFrame.select_dtypes | Return a subset of the DataFrame's columns based on the column dtypes |
pd.DataFrame.filter | Subset the DataFrame rows or columns according to the specified index labels |
pd.DataFrame.shape | Return a tuple representing the dimensionality of the DataFrame |
pd.DataFrame.size | Number of elements in the DataFrame |
pd.DataFrame.to_numpy | Return a NumPy representation of the DataFrame |
pd.DataFrame.values | Return a NumPy representation of the DataFrame |
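
As a minimal sketch of the attributes above, the snippet below inspects a small frame using plain pandas. The sample data and variable names are hypothetical. Under Bodo, the same calls would normally sit inside a function decorated with `bodo.jit`, and only a subset of each function's arguments may be supported there.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"A": [1, 2, 3], "B": [0.5, 1.5, 2.5], "C": ["x", "y", "z"]})

n_rows, n_cols = df.shape                      # rows and columns as a tuple
total_cells = df.size                          # n_rows * n_cols
numeric = df.select_dtypes(include="number")   # keeps the numeric columns A and B
only_ab = df.filter(items=["A", "B"])          # select columns by label
arr = numeric.to_numpy()                       # 2-D NumPy array of the numeric columns

print(df.dtypes)
print(n_rows, n_cols, total_cells, arr.shape)
```
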
Conversion

Function | Description |
---|---|
pd.DataFrame.astype | Cast a pandas object to a specified dtype |
pd.DataFrame.copy | Make a copy of the DataFrame |
pd.DataFrame.isna | Detect missing values |
pd.DataFrame.isnull | Detect missing values |
pd.DataFrame.notna | Detect existing (non-missing) values |
pd.DataFrame.notnull | Detect existing (non-missing) values |
pd.DataFrame.info | Print a concise summary of a DataFrame |
pd.DataFrame.infer_objects | Attempt to infer better dtypes for object columns |
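
The conversion functions can be illustrated with a short pandas sketch. The data is made up for the example, and the code shows standard pandas behavior rather than anything specific to Bodo's implementation.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value.
df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [10, 20, 30]})

missing = df.isna()                      # True where a value is missing
present = df.notna()                     # element-wise inverse of isna()
as_float = df.astype({"B": "float64"})   # cast column B only
backup = df.copy()                       # deep copy by default
backup.loc[0, "A"] = -1.0                # modifies the copy, not df

print(missing)
print(as_float.dtypes)
```
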
Indexing, iteration

Function | Description |
---|---|
pd.DataFrame.head | Return the first n rows |
pd.DataFrame.iat | Access a single value for a row/column pair by integer position |
pd.DataFrame.iloc | Purely integer-location based indexing for selection by position |
pd.DataFrame.insert | Insert column into DataFrame at specified location |
pd.DataFrame.isin | Determine if values are contained in a Series or DataFrame |
pd.DataFrame.itertuples | Iterate over DataFrame rows as namedtuples |
pd.DataFrame.query | Query the columns of a DataFrame with a boolean expression |
pd.DataFrame.tail | Return the last n rows |
pd.DataFrame.where | Replace values where the condition is False |
pd.DataFrame.mask | Replace values where the condition is True |
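
The difference between positional access, `query`, and the `where`/`mask` pair is easiest to see side by side. The following is a small pandas sketch on hypothetical data; it is not taken from the Bodo documentation.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [10, 20, 30, 40]})

first_two = df.head(2)               # rows 0 and 1
cell = df.iat[1, 1]                  # row position 1, column position 1
block = df.iloc[1:3, 0]              # row positions 1 and 2 of column A
hits = df.query("A > 2 and B < 40")  # boolean expression over column names
kept = df.where(df > 2)              # entries failing the condition become NaN
hidden = df.mask(df > 2)             # entries meeting the condition become NaN
flags = df.isin([1, 40])             # True where the value is 1 or 40

print(cell, block.tolist(), hits["A"].tolist())
```

`where` and `mask` are complements: for the same condition, every cell survives in exactly one of the two results.
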
Function Application, GroupBy & Window

Function | Description |
---|---|
pd.DataFrame.apply | Apply a function along an axis of the DataFrame |
pd.DataFrame.groupby | Group DataFrame using a mapper or by a Series of columns |
pd.DataFrame.rolling | Provide rolling window calculations |
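
A brief sketch of the three functions above, written in plain pandas with made-up data. The column names `key` and `val` are hypothetical.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"key": ["a", "b", "a", "b"], "val": [1.0, 2.0, 3.0, 4.0]})

doubled = df.apply(lambda row: row["val"] * 2, axis=1)  # row-wise function
per_key = df.groupby("key")["val"].sum()                # one total per group
window = df[["val"]].rolling(2).mean()                  # mean over 2-row windows

print(per_key)
print(window)
```

The first row of `window` is NaN because a 2-row window is not yet full at that point.
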
Computations / Descriptive Stats

Function | Description |
---|---|
pd.DataFrame.abs | Return a DataFrame with absolute numeric value of each element |
pd.DataFrame.corr | Compute pairwise correlation of columns, excluding NA/null values |
pd.DataFrame.count | Count non-NA cells for each column or row |
pd.DataFrame.cov | Compute pairwise covariance of columns, excluding NA/null values |
pd.DataFrame.cumprod | Return cumulative product over a DataFrame or Series axis |
pd.DataFrame.cumsum | Return cumulative sum over a DataFrame or Series axis |
pd.DataFrame.describe | Generate descriptive statistics |
pd.DataFrame.diff | First discrete difference of element |
pd.DataFrame.max | Return the maximum of the values for the requested axis |
pd.DataFrame.mean | Return the mean of the values for the requested axis |
pd.DataFrame.median | Return the median of the values for the requested axis |
pd.DataFrame.min | Return the minimum of the values for the requested axis |
pd.DataFrame.nunique | Count distinct observations over requested axis |
pd.DataFrame.pct_change | Percentage change between the current and a prior element |
pd.DataFrame.pipe | Apply `func(self, *args, **kwargs)` |
pd.DataFrame.prod | Return the product of the values for the requested axis |
pd.DataFrame.product | Return the product of the values for the requested axis |
pd.DataFrame.quantile | Return values at the given quantile over requested axis |
pd.DataFrame.rank | Compute numerical data ranks (1 through n) along axis |
pd.DataFrame.std | Return sample standard deviation over requested axis |
pd.DataFrame.sum | Return the sum of the values for the requested axis |
pd.DataFrame.var | Return unbiased variance over requested axis |
pd.DataFrame.memory_usage | Return the memory usage of each column in bytes |
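
To make a few of these reductions and cumulative operations concrete, here is a small pandas sketch. The data is hypothetical and chosen so that column B is exactly twice column A.

```python
import pandas as pd

# Hypothetical sample data: B is always 2 * A.
df = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0], "B": [2.0, 4.0, 6.0, 8.0]})

col_means = df.mean()        # one value per column
col_std = df.std()           # sample standard deviation (ddof=1)
running = df.cumsum()        # running total down each column
step = df.diff()             # difference from the previous row; first row is NaN
corr = df.corr()             # pairwise correlation between columns
medians = df.quantile(0.5)   # 50th percentile of each column
distinct = df.nunique()      # number of distinct values per column

print(col_means)
print(corr)
```
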
Reindexing / Selection / Label manipulation

Function | Description |
---|---|
pd.DataFrame.drop | Drop specified labels from rows or columns |
pd.DataFrame.drop_duplicates | Return DataFrame with duplicate rows removed |
pd.DataFrame.duplicated | Return boolean Series denoting duplicate rows |
pd.DataFrame.first | Select initial periods of time series data based on a date offset |
pd.DataFrame.idxmax | Return the row label of the maximum value |
pd.DataFrame.idxmin | Return the row label of the minimum value |
pd.DataFrame.last | Select final periods of time series data based on a date offset |
pd.DataFrame.rename | Alter axes labels |
pd.DataFrame.reset_index | Reset the index of the DataFrame |
pd.DataFrame.set_index | Set the DataFrame index using existing columns |
pd.DataFrame.take | Return the elements in the given positional indices along an axis |
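
Several of these label operations are commonly chained. The sketch below uses plain pandas and hypothetical column names (`id`, `score`, `tmp`) to show one such chain.

```python
import pandas as pd

# Hypothetical sample data; rows 1 and 2 are identical once "tmp" is dropped.
df = pd.DataFrame(
    {"id": [1, 2, 2, 3], "score": [0.1, 0.9, 0.9, 0.4], "tmp": [0, 0, 0, 0]}
)

trimmed = df.drop(columns=["tmp"])             # remove a column by label
dup_flags = trimmed.duplicated()               # True for repeats of an earlier row
unique_rows = trimmed.drop_duplicates()        # keep the first of each duplicate set
renamed = unique_rows.rename(columns={"score": "value"})
by_id = renamed.set_index("id")                # move "id" into the index
best_id = by_id.idxmax()["value"]              # index label of the largest value
flat = by_id.reset_index()                     # move "id" back to a column
picked = df.take([0, 3])                       # rows at positions 0 and 3

print(dup_flags.tolist(), best_id)
```
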
Missing data handling

Function | Description |
---|---|
pd.DataFrame.dropna | Remove missing values |
pd.DataFrame.fillna | Fill NA/NaN values using the specified method |
pd.DataFrame.replace | Replace values given in to_replace with value |
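
The three functions above handle missing or unwanted values in different ways. This is a minimal pandas sketch with invented data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value in each column.
df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [np.nan, 5.0, 6.0]})

complete = df.dropna()           # keep only rows with no missing values
filled = df.fillna(0.0)          # substitute a constant for every missing value
swapped = df.replace(3.0, 30.0)  # replace one specific value with another

print(complete)
print(filled)
```
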
Reshaping, sorting, transposing

Function | Description |
---|---|
pd.DataFrame.explode | Transform each element of a list-like to a row, replicating index values |
pd.DataFrame.melt | Unpivot a DataFrame from wide to long format |
pd.DataFrame.pivot | Return reshaped DataFrame organized by given index / column values |
pd.DataFrame.pivot_table | Create a spreadsheet-style pivot table as a DataFrame |
pd.DataFrame.sample | Return a random sample of items from an axis of object |
pd.DataFrame.sort_index | Sort object by labels (along an axis) |
pd.DataFrame.sort_values | Sort by the values along either axis |
pd.DataFrame.to_string | Render a DataFrame to a console-friendly tabular output |
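
`melt` and `pivot` are roughly inverse operations, which the following pandas sketch tries to show on a tiny hypothetical frame. It also includes `sort_values` and `explode`. All names here are illustrative.

```python
import pandas as pd

# Hypothetical wide-format data.
wide = pd.DataFrame({"id": [1, 2], "x": [10, 20], "y": [30, 40]})

# Wide to long: one row per (id, variable) pair.
long = wide.melt(id_vars="id", var_name="variable", value_name="value")

# Long back to wide: one column per distinct "variable".
back = long.pivot(index="id", columns="variable", values="value")

ordered = long.sort_values("value", ascending=False)

# explode: one output row per list element, repeating the original index.
nested = pd.DataFrame({"id": [1, 2], "tags": [["a", "b"], ["c"]]})
flat = nested.explode("tags")

print(long)
print(back)
```
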
Combining / joining / merging

Function | Description |
---|---|
pd.DataFrame.append | Append rows of other to the end of caller, returning a new object |
pd.DataFrame.assign | Assign new columns to a DataFrame |
pd.DataFrame.join | Join columns with other DataFrame either on index or on a key column |
pd.DataFrame.merge | Merge DataFrame or named Series objects with a database-style join |
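
As an illustration of how `merge`, `join`, and `assign` differ, here is a short pandas sketch. The frames `left` and `right` and their columns are hypothetical.

```python
import pandas as pd

# Hypothetical sample data sharing a "key" column.
left = pd.DataFrame({"key": [1, 2, 3], "l": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 4], "r": [20.0, 30.0, 40.0]})

# merge matches rows on column values.
inner = left.merge(right, on="key", how="inner")     # keys present in both
left_join = left.merge(right, on="key", how="left")  # all keys from left

# join matches rows on the index.
joined = left.set_index("key").join(right.set_index("key"), how="inner")

# assign returns a new frame with an extra derived column.
extended = inner.assign(r2=lambda d: d["r"] * 2)

print(inner)
print(left_join)
```
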
Time series-related

Function | Description |
---|---|
pd.DataFrame.shift | Shift index by desired number of periods with an optional time freq |
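
A common use of `shift` is comparing each row with the previous one. The sketch below uses plain pandas with a hypothetical daily price series.

```python
import pandas as pd

# Hypothetical daily data.
df = pd.DataFrame(
    {"price": [100.0, 101.0, 103.0]},
    index=pd.date_range("2024-01-01", periods=3, freq="D"),
)

lagged = df.shift(1)   # values move down one row; the first row becomes NaN
delta = df - lagged    # change relative to the previous day

print(delta)
```
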
Serialization, IO, Conversion

Function | Description |
---|---|
pd.DataFrame.to_csv | Write object to a comma-separated values (csv) file |
pd.DataFrame.to_json | Convert the object to a JSON string |
pd.DataFrame.to_parquet | Write a DataFrame to the binary parquet format |
pd.DataFrame.to_sql | Write records stored in a DataFrame to a SQL database |
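
The sketch below round-trips a small hypothetical frame through CSV and JSON text using plain pandas. It serializes to in-memory strings so that it runs without any files. `to_parquet` and `to_sql` are left out because they need extra dependencies (a Parquet engine and a database connection). How Bodo writes output from a jitted function may differ from this in-memory pattern.

```python
import io
import json

import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"A": [1, 2], "B": ["x", "y"]})

csv_text = df.to_csv(index=False)           # returns a string when no path is given
json_text = df.to_json(orient="records")    # one JSON object per row

restored = pd.read_csv(io.StringIO(csv_text))
records = json.loads(json_text)

print(csv_text)
print(records)
```
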
Plotting

Function | Description |
---|---|
pd.DataFrame.plot | Plot data |