DataFrame¶
Bodo provides extensive DataFrame support documented below.
pd.DataFrame¶
pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=None)
Supported Arguments
- data: constant key dictionary, 2D Numpy array
  - columns argument is required when using a 2D Numpy array
- index: List, Tuple, Pandas index types, Pandas array types, Pandas series types, Numpy array types
- columns: Constant list of String, Constant tuple of String
  - Must be constant at Compile Time
- dtype: All values supported with dataframe.astype (see below)
- copy: boolean
  - Must be constant at Compile Time
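The 2D Numpy array case above can be sketched as follows. This is a minimal illustration, not an official Bodo example: the names `jit` and `make_frame` are hypothetical, and the `try`/`except` fallback is an assumption added so the snippet also runs with plain Pandas when Bodo is not installed.

```python
import numpy as np
import pandas as pd

try:
    import bodo  # optional: fall back to plain Pandas when Bodo is absent
    jit = bodo.jit
except ImportError:
    def jit(func):  # hypothetical stand-in so the sketch stays runnable
        return func


@jit
def make_frame():
    data = np.arange(6).reshape(3, 2)
    # columns is a literal list of strings, so it is known at compile time
    return pd.DataFrame(data, columns=["A", "B"])


df = make_frame()
print(df)
```

The column names are written as a literal list because the `columns` argument must be constant at compile time.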
Attributes and underlying data¶
pd.DataFrame.columns¶
pandas.DataFrame.columns
Example Usage
pd.DataFrame.dtypes¶
pandas.DataFrame.dtypes
Example Usage
pd.DataFrame.empty¶
pandas.DataFrame.empty
Example Usage
pd.DataFrame.index¶
pandas.DataFrame.index
Example Usage
pd.DataFrame.ndim¶
pandas.DataFrame.ndim
Example Usage
pd.DataFrame.select_dtypes¶
pandas.DataFrame.select_dtypes(include=None, exclude=None)
Supported Arguments
- include: string, type, List or tuple of string/type
  - Must be constant at Compile Time
- exclude: string, type, List or tuple of string/type
  - Must be constant at Compile Time
Example Usage
>>> @bodo.jit
... def f():
...     df = pd.DataFrame({"A": [1], "B": ["X"], "C": [pd.Timedelta(10, unit="D")], "D": [True], "E": [3.1]})
...     out_1 = df.select_dtypes(exclude=[np.float64, "bool"])
...     out_2 = df.select_dtypes(include="int")
...     out_3 = df.select_dtypes(include=np.bool_, exclude=(np.int64, "timedelta64[ns]"))
...     formatted_out = "\n".join([out_1.to_string(), out_2.to_string(), out_3.to_string()])
...     return formatted_out
>>> f()
   A  B       C
0  1  X 10 days
   A
0  1
      D
0  True
pd.DataFrame.filter¶
pandas.DataFrame.filter(items=None, like=None, regex=None, axis=None)
Supported Arguments
- items: Constant list of String
- like: Constant string
- regex: Constant String
- axis (only supports the "column" axis): Constant String, Constant integer
Example Usage
>>> @bodo.jit
... def f():
...     df = pd.DataFrame({"ababab": [1], "hello world": [2], "A": [3]})
...     filtered_df_1 = df.filter(items=["A"])
...     filtered_df_2 = df.filter(like="hello", axis="columns")
...     filtered_df_3 = df.filter(regex="(ab){3}", axis=1)
...     formatted_out = "\n".join([filtered_df_1.to_string(), filtered_df_2.to_string(), filtered_df_3.to_string()])
...     return formatted_out
>>> f()
   A
0  3
   hello world
0            2
   ababab
0       1
pd.DataFrame.shape¶
pandas.DataFrame.shape
Example Usage
pd.DataFrame.size¶
pandas.DataFrame.size
Example Usage
pd.DataFrame.to_numpy¶
pandas.DataFrame.to_numpy(dtype=None, copy=False, na_value=NoDefault.no_default)
Supported Arguments
- copy: boolean
Example Usage
pd.DataFrame.values¶
pandas.DataFrame.values (only for numeric dataframes)
Example Usage
Conversion¶
pd.DataFrame.astype¶
pandas.DataFrame.astype(dtype, copy=True, errors='raise')
Supported Arguments
- dtype: dict with string column names as keys and Strings/types as values, String (string must be parsable by np.dtype), Valid type (see types), The following functions: float, int, bool, str
  - Must be constant at Compile Time
Example Usage
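A minimal sketch of the dictionary form of `dtype`, assuming a hypothetical `jit` helper that falls back to plain Pandas when Bodo is not installed. The function name `to_float` is illustrative only.

```python
import pandas as pd

try:
    import bodo  # optional: fall back to plain Pandas when Bodo is absent
    jit = bodo.jit
except ImportError:
    def jit(func):  # hypothetical stand-in so the sketch stays runnable
        return func


@jit
def to_float():
    df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
    # The dtype dict is a literal, so it is constant at compile time;
    # "float64" is a string that np.dtype can parse
    return df.astype({"A": "float64"})


out = to_float()
print(out.dtypes)
```

Only column `A` is converted; column `B` is not named in the dictionary and keeps its integer values.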
pd.DataFrame.copy¶
pandas.DataFrame.copy(deep=True)
Supported Arguments
- deep: boolean
Example Usage
>>> @bodo.jit
... def f():
...     df = pd.DataFrame({"A": [1,2,3]})
...     shallow_df = df.copy(deep=False)
...     deep_df = df.copy()
...     shallow_df["A"][0] = -1
...     formatted_out = "\n".join([df.to_string(), shallow_df.to_string(), deep_df.to_string()])
...     return formatted_out
>>> f()
   A
0 -1
1  2
2  3
   A
0 -1
1  2
2  3
   A
0  1
1  2
2  3
pd.DataFrame.isna¶
pandas.DataFrame.isna()
Example Usage
pd.DataFrame.isnull¶
pandas.DataFrame.isnull()
Example Usage
pd.DataFrame.notna¶
pandas.DataFrame.notna()
Example Usage
pd.DataFrame.notnull¶
pandas.DataFrame.notnull()
Example Usage
pd.DataFrame.info¶
pandas.DataFrame.info(verbose=None, buf=None, max_cols=None, memory_usage=None, show_counts=None, null_counts=None)
Supported Arguments: None
Example Usage
>>> @bodo.jit
... def f():
...     df = pd.DataFrame({"A": [1,2,3], "B": ["X", "Y", "Z"], "C": [pd.Timedelta(10, unit="D"), pd.Timedelta(10, unit="H"), pd.Timedelta(10, unit="S")]})
...     return df.info()
>>> f()
<class 'DataFrameType'>
RangeIndexType(none): 3 entries, 0 to 2
Data columns (total 3 columns):
#   Column  Non-Null Count  Dtype
0   A       3 non-null      int64
1   B       3 non-null      unicode_type
2   C       3 non-null      timedelta64[ns]
dtypes: int64(1), timedelta64[ns](1), unicode_type(1)
memory usage: 108.0 bytes
Note
The exact output string may vary slightly from Pandas.
pd.DataFrame.infer_objects¶
pandas.DataFrame.infer_objects()
Example Usage
>>> @bodo.jit
... def f():
...     df = pd.DataFrame({"A": [1,2,3]})
...     return df.infer_objects()
>>> f()
   A
0  1
1  2
2  3
Note
Bodo does not internally use the object dtype, so types are never inferred. As a result, this API just produces a deep copy, consistent with Pandas.
Indexing, iteration¶
pd.DataFrame.head¶
pandas.DataFrame.head(n=5)
Supported Arguments
- n: integer
Example Usage
pd.DataFrame.iat¶
pandas.DataFrame.iat
Note
Indexing with iat is only supported using a pair of integers. The second integer (the column index) must be a compile time constant.
Example Usage
pd.DataFrame.iloc¶
pandas.DataFrame.iloc
getitem:
- df.iloc supports single integer indexing (returns row as series): df.iloc[0]
- df.iloc supports single list/array/series of integers/bool: df.iloc[[0,1,2]]
- for tuples indexing df.iloc[row_idx, col_idx] we allow:
  - row_idx to be int, list/array/series of integers/bool, or slice
  - col_idx to be constant int, constant list of integers, or constant slice
  - e.g.: df.iloc[[0,1,2], :]
setitem:
- df.iloc only supports scalar setitem
- df.iloc only supports tuple indexing df.iloc[row_idx, col_idx]
- row_idx can be anything supported for series setitem:
  - int
  - list/array/series of integers/bool
  - slice
- col_idx can be: constant int, constant list/tuple of integers
Example Usage
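The getitem and setitem forms listed for `iloc` can be sketched as below. This is an illustrative example under stated assumptions: `jit` and `iloc_demo` are hypothetical names, and the fallback decorator lets the snippet run with plain Pandas when Bodo is unavailable.

```python
import pandas as pd

try:
    import bodo  # optional: fall back to plain Pandas when Bodo is absent
    jit = bodo.jit
except ImportError:
    def jit(func):  # hypothetical stand-in so the sketch stays runnable
        return func


@jit
def iloc_demo():
    df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})
    rows = df.iloc[[0, 2]]        # getitem: list of integer row positions
    block = df.iloc[1:, [0, 2]]   # getitem: row slice, constant column list
    df.iloc[0, 1] = -4            # setitem: scalar value, tuple indexing
    return rows, block, df


rows, block, df = iloc_demo()
print(block)
```

The column positions are literals in every call, matching the requirement that `col_idx` be constant.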
pd.DataFrame.insert¶
pandas.DataFrame.insert(loc, column, value, allow_duplicates=False)
Supported Arguments
- loc: constant integer
- column: constant string
- value: scalar, list/tuple, Pandas/Numpy array, Pandas index types, series
- allow_duplicates: constant boolean
Example Usage
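A minimal sketch of `insert` with a constant position and column name. The `jit` fallback and the function name `insert_demo` are assumptions for illustration, not part of Bodo's documentation.

```python
import pandas as pd

try:
    import bodo  # optional: fall back to plain Pandas when Bodo is absent
    jit = bodo.jit
except ImportError:
    def jit(func):  # hypothetical stand-in so the sketch stays runnable
        return func


@jit
def insert_demo():
    df = pd.DataFrame({"A": [1, 2, 3], "C": [7, 8, 9]})
    # loc and column are literals; value is a list with one entry per row
    df.insert(1, "B", [4, 5, 6])
    return df


out = insert_demo()
print(out)
```

`insert` modifies the DataFrame in place, so the function returns `df` itself rather than the call's result.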
pd.DataFrame.isin¶
pandas.DataFrame.isin(values)
Supported Arguments
- values: DataFrame (must have same indices) + iterable type, Numpy array types, Pandas array types, List/Tuple, Pandas Index Types (excluding interval Index and MultiIndex)
Example Usage
>>> @bodo.jit
... def f():
...     df = pd.DataFrame({"A": [1,2,3], "B": [4,5,6], "C": [7,8,9]})
...     isin_1 = df.isin([1,5,9])
...     isin_2 = df.isin(pd.DataFrame({"A": [4,5,6], "C": [7,8,9]}))
...     formatted_out = "\n".join([isin_1.to_string(), isin_2.to_string()])
...     return formatted_out
>>> f()
       A      B      C
0   True  False  False
1  False   True  False
2  False  False   True
       A      B     C
0  False  False  True
1  False  False  True
2  False  False  True
Note
DataFrame.isin ignores DataFrame indices. For example, with Bodo:
>>> @bodo.jit
... def f():
...     df = pd.DataFrame({"A": [1,2,3], "B": [4,5,6], "C": [7,8,9]})
...     return df.isin(pd.DataFrame({"A": [1,2,3]}, index=["A", "B", "C"]))
>>> f()
      A      B      C
0  True  False  False
1  True  False  False
2  True  False  False
The same function without the @bodo.jit decorator (plain Pandas) aligns on the index and returns:
>>> def f():
...     df = pd.DataFrame({"A": [1,2,3], "B": [4,5,6], "C": [7,8,9]})
...     return df.isin(pd.DataFrame({"A": [1,2,3]}, index=["A", "B", "C"]))
>>> f()
       A      B      C
0  False  False  False
1  False  False  False
2  False  False  False
pd.DataFrame.itertuples¶
pandas.DataFrame.itertuples(index=True, name='Pandas')
Supported Arguments: None
Example Usage
pd.DataFrame.query¶
pandas.DataFrame.query(expr, inplace=False, **kwargs)
Supported Arguments
- expr: Constant String
Example Usage
>>> @bodo.jit
... def f(a):
...     df = pd.DataFrame({"A": [1,2,3], "B": [4,5,6], "C": [7,8,9]})
...     return df.query('A > @a')
>>> f(1)
   A  B  C
1  2  5  8
2  3  6  9
Note
- The output of the query must evaluate to a 1d boolean array.
- Cannot refer to the index by name in the query string.
- Query must be one line.
- Variables from the enclosing environment (referenced with @ in the query string) should be passed as arguments to the function.
pd.DataFrame.tail¶
pandas.DataFrame.tail(n=5)
Supported Arguments
- n: Integer
Example Usage
pd.DataFrame.where¶
pandas.DataFrame.where(cond, other=np.nan, inplace=False, axis=1, level=None, errors='raise', try_cast=NoDefault.no_default)
Supported Arguments
- cond: Boolean DataFrame, Boolean Series, Boolean Array
  - If 1-dimensional array or Series is provided, equivalent to Pandas df.where with axis=1.
- other: Scalar, DataFrame, Series, 1 or 2-D Array, None
  - Data types in other must match corresponding entries in DataFrame.
  - None or omitting argument defaults to the respective NA value for each type.
Note
DataFrame can contain categorical data if other is a scalar.
Example Usage
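A short sketch of `where` with a boolean DataFrame condition and a scalar `other`, assuming an integer replacement for integer columns so the data types match. The names `jit` and `where_demo` are hypothetical, and the fallback lets the snippet run with plain Pandas.

```python
import pandas as pd

try:
    import bodo  # optional: fall back to plain Pandas when Bodo is absent
    jit = bodo.jit
except ImportError:
    def jit(func):  # hypothetical stand-in so the sketch stays runnable
        return func


@jit
def where_demo():
    df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
    # Keep entries greater than 2; replace every other entry with 0
    return df.where(df > 2, 0)


out = where_demo()
print(out)
```

`mask` is the complement: passing the same condition to `df.mask` would replace the entries where the condition holds instead.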
pd.DataFrame.mask¶
pandas.DataFrame.mask(cond, other=np.nan, inplace=False, axis=1, level=None, errors='raise', try_cast=NoDefault.no_default)
Supported Arguments
- cond: Boolean DataFrame, Boolean Series, Boolean Array
  - If 1-dimensional array or Series is provided, equivalent to Pandas df.mask with axis=1.
- other: Scalar, DataFrame, Series, 1 or 2-D Array, None
  - Data types in other must match corresponding entries in DataFrame.
  - None or omitting argument defaults to the respective NA value for each type.
Note
DataFrame can contain categorical data if other is a scalar.
Example Usage
Function application, GroupBy & Window¶
pd.DataFrame.apply¶
pandas.DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), _bodo_inline=False, **kwargs)
Supported Arguments
- func: function (e.g. lambda) (axis must = 1), jit function (axis must = 1), String which refers to a supported DataFrame method
  - Must be constant at Compile Time
- axis: Integer (0, 1), String (only if the method takes axis as an argument)
  - Must be constant at Compile Time
- _bodo_inline: boolean
  - Must be constant at Compile Time
Example Usage
>>> @bodo.jit
... def f():
...     df = pd.DataFrame({"A": [1,2,3], "B": [4,5,6], "C": [7,8,9]})
...     return df.apply(lambda x: x["A"] * (x["B"] + x["C"]), axis=1)
>>> f()
0    11
1    26
2    45
dtype: int64
Note
Supports the extra _bodo_inline boolean argument to manually control Bodo's inlining behavior. Inlining user-defined functions (UDFs) can potentially improve performance at the expense of extra compilation time. Bodo uses heuristics to make a decision automatically if _bodo_inline is not provided.
pd.DataFrame.groupby¶
pandas.DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=NoDefault.no_default, observed=False, dropna=True)
Supported Arguments
- by: String column label, List/Tuple of column labels
  - Must be constant at Compile Time
- as_index: boolean
  - Must be constant at Compile Time
- dropna: boolean
  - Must be constant at Compile Time
Note
sort=False and observed=True are set by default. These are the only supported values for sort and observed. For more information on using groupby, see the groupby section.
Example Usage
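A minimal sketch of a grouped sum with a constant `by` label and `as_index=False`. The names `jit` and `group_sum` are hypothetical, and the fallback decorator is an assumption so the snippet runs with plain Pandas too. Because Bodo does not sort groups, the result is sorted explicitly afterwards to get a stable order.

```python
import pandas as pd

try:
    import bodo  # optional: fall back to plain Pandas when Bodo is absent
    jit = bodo.jit
except ImportError:
    def jit(func):  # hypothetical stand-in so the sketch stays runnable
        return func


@jit
def group_sum():
    df = pd.DataFrame({"key": ["a", "b", "a", "b"], "val": [1, 2, 3, 4]})
    # by and as_index are literals, so both are constant at compile time
    return df.groupby("key", as_index=False)["val"].sum()


# Group order is not guaranteed with sort=False, so sort before comparing
out = group_sum().sort_values("key").reset_index(drop=True)
print(out)
```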
pd.DataFrame.rolling¶
pandas.DataFrame.rolling(window, min_periods=None, center=False, win_type=None, on=None, axis=0, closed=None, method='single')
Supported Arguments
- window: Integer, String (must be parsable as a time offset), datetime.timedelta, pd.Timedelta, List/Tuple of column labels
- min_periods: Integer
- center: boolean
- on: Scalar column label
  - Must be constant at Compile Time
- dropna: boolean
  - Must be constant at Compile Time
Example Usage
>>> @bodo.jit
... def f():
...     df = pd.DataFrame({"A": [1,2,3,4,5]})
...     return df.rolling(3, center=True).mean()
>>> f()
     A
0  NaN
1  2.0
2  3.0
3  4.0
4  NaN
For more information, please see the Window section.
Computations / Descriptive Stats¶
pd.DataFrame.abs¶
pandas.DataFrame.abs()
Note
Only supported for dataframes containing numerical data and Timedeltas.
Example Usage
pd.DataFrame.corr¶
pandas.DataFrame.corr(method='pearson', min_periods=1)
Supported Arguments
- min_periods: Integer
Example Usage
pd.DataFrame.count¶
pandas.DataFrame.count(axis=0, level=None, numeric_only=False)
Supported Arguments: None
Example Usage
pd.DataFrame.cov¶
pandas.DataFrame.cov(min_periods=None, ddof=1)
Supported Arguments
- min_periods: Integer
Example Usage
pd.DataFrame.cumprod¶
pandas.DataFrame.cumprod(axis=None, skipna=True)
Supported Arguments: None
Example Usage
>>> @bodo.jit
... def f():
...     df = pd.DataFrame({"A": [1, 2, 3], "B": [.1, np.nan, 12.3]})
...     return df.cumprod()
>>> f()
   A    B
0  1  0.1
1  2  NaN
2  6  NaN
Note
Not supported for dataframes with nullable integer columns.
pd.DataFrame.cumsum¶
pandas.DataFrame.cumsum(axis=None, skipna=True)
Supported Arguments: None
Example Usage
>>> @bodo.jit
... def f():
...     df = pd.DataFrame({"A": [1, 2, 3], "B": [.1, np.nan, 12.3]})
...     return df.cumsum()
>>> f()
   A    B
0  1  0.1
1  3  NaN
2  6  NaN
Note
Not supported for dataframes with nullable integer columns.
pd.DataFrame.describe¶
pandas.DataFrame.describe(percentiles=None, include=None, exclude=None, datetime_is_numeric=False)
Supported Arguments: None
Example Usage
>>> @bodo.jit
... def f():
...     df = pd.DataFrame({"A": [1,2,3], "B": [pd.Timestamp(2000, 10, 2), pd.Timestamp(2001, 9, 5), pd.Timestamp(2002, 3, 11)]})
...     return df.describe()
>>> f()
         A                    B
count  3.0                    3
mean   2.0  2001-07-16 16:00:00
min    1.0  2000-10-02 00:00:00
25%    1.5  2001-03-20 00:00:00
50%    2.0  2001-09-05 00:00:00
75%    2.5  2001-12-07 12:00:00
max    3.0  2002-03-11 00:00:00
std    1.0                  NaN
Note
Only supported for dataframes containing numeric data and datetime data. datetime_is_numeric defaults to True in JIT code.
pd.DataFrame.diff¶
pandas.DataFrame.diff(periods=1, axis=0)
Supported Arguments
- periods: Integer
Example Usage
>>> @bodo.jit
... def f():
...     df = pd.DataFrame({"A": [1,2,3], "B": [pd.Timestamp(2000, 10, 2), pd.Timestamp(2001, 9, 5), pd.Timestamp(2002, 3, 11)]})
...     return df.diff(1)
>>> f()
     A        B
0  NaN      NaT
1  1.0 338 days
2  1.0 187 days
Note
Only supported for dataframes containing float, non-null int, and datetime64ns values.
pd.DataFrame.max¶
pandas.DataFrame.max(axis=None, skipna=None, level=None, numeric_only=None)
Supported Arguments
- axis: Integer (0 or 1)
  - Must be constant at Compile Time
Example Usage
>>> @bodo.jit
... def f():
...     df = pd.DataFrame({"A": [1,2,3], "B": [4,5,6], "C": [7,8,9]})
...     return df.max(axis=1)
>>> f()
0    7
1    8
2    9
Note
Only supported for dataframes containing float, non-null int, and datetime64ns values.
pd.DataFrame.mean¶
pandas.DataFrame.mean(axis=None, skipna=None, level=None, numeric_only=None)
Supported Arguments
- axis: Integer (0 or 1)
  - Must be constant at Compile Time
Example Usage
>>> @bodo.jit
... def f():
...     df = pd.DataFrame({"A": [1,2,3], "B": [4,5,6], "C": [7,8,9]})
...     return df.mean(axis=1)
>>> f()
0    4.0
1    5.0
2    6.0
Note
Only supported for dataframes containing float, non-null int, and datetime64ns values.
pd.DataFrame.median¶
pandas.DataFrame.median(axis=None, skipna=None, level=None, numeric_only=None)
Supported Arguments
- axis: Integer (0 or 1)
  - Must be constant at Compile Time
Example Usage
>>> @bodo.jit
... def f():
...     df = pd.DataFrame({"A": [1,2,3], "B": [4,5,6], "C": [7,8,9]})
...     return df.median(axis=1)
>>> f()
0    4.0
1    5.0
2    6.0
Note
Only supported for dataframes containing float, non-null int, and datetime64ns values.
pd.DataFrame.min¶
pandas.DataFrame.min(axis=None, skipna=None, level=None, numeric_only=None)
Supported Arguments
- axis: Integer (0 or 1)
  - Must be constant at Compile Time
Example Usage
>>> @bodo.jit
... def f():
...     df = pd.DataFrame({"A": [1,2,3], "B": [4,5,6], "C": [7,8,9]})
...     return df.min(axis=1)
>>> f()
0    1
1    2
2    3
Note
Only supported for dataframes containing float, non-null int, and datetime64ns values.
pd.DataFrame.nunique¶
pandas.DataFrame.nunique(axis=0, dropna=True)
Supported Arguments
- dropna: boolean
Example Usage
pd.DataFrame.pct_change¶
pandas.DataFrame.pct_change(periods=1, fill_method='pad', limit=None, freq=None)
Supported Arguments
- periods: Integer
Example Usage
pd.DataFrame.pipe¶
pandas.DataFrame.pipe(func, *args, **kwargs)
Supported Arguments
- func: JIT function or callable defined within a JIT function.
  - Additional arguments for func can be passed as additional arguments.
Note
func cannot be a tuple.
Example Usage
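A minimal sketch of `pipe` with a JIT function and one extra positional argument. The helper names `jit`, `add_const`, and `pipe_demo` are hypothetical, and the fallback decorator is an assumption so the snippet also runs with plain Pandas.

```python
import pandas as pd

try:
    import bodo  # optional: fall back to plain Pandas when Bodo is absent
    jit = bodo.jit
except ImportError:
    def jit(func):  # hypothetical stand-in so the sketch stays runnable
        return func


@jit
def add_const(df, k):
    # Receives the DataFrame first, then the extra argument passed to pipe
    return df + k


@jit
def pipe_demo():
    df = pd.DataFrame({"A": [1, 2, 3]})
    return df.pipe(add_const, 10)


out = pipe_demo()
print(out)
```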
pd.DataFrame.prod¶
pandas.DataFrame.prod(axis=None, skipna=None, level=None, numeric_only=None)
Supported Arguments
- axis: Integer (0 or 1)
  - Must be constant at Compile Time
Example Usage
pd.DataFrame.product¶
pandas.DataFrame.product(axis=None, skipna=None, level=None, numeric_only=None)
Supported Arguments
- axis: Integer (0 or 1)
  - Must be constant at Compile Time
Example Usage
pd.DataFrame.quantile¶
pandas.DataFrame.quantile(q=0.5, axis=0, numeric_only=True, interpolation='linear')
Supported Arguments
- q: Float or Int
  - Must satisfy 0 <= q <= 1
- axis: Integer (0 or 1)
  - Must be constant at Compile Time
Example Usage
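A short sketch of a column-wise median computed through `quantile`, with `q` inside the required range. The names `jit` and `quantile_demo` are hypothetical; the fallback decorator is an assumption that keeps the snippet runnable with plain Pandas.

```python
import pandas as pd

try:
    import bodo  # optional: fall back to plain Pandas when Bodo is absent
    jit = bodo.jit
except ImportError:
    def jit(func):  # hypothetical stand-in so the sketch stays runnable
        return func


@jit
def quantile_demo():
    df = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0], "B": [10.0, 20.0, 30.0, 40.0]})
    # q=0.5 with the default linear interpolation gives the median per column
    return df.quantile(0.5)


out = quantile_demo()
print(out)
```

The result is a Series indexed by column name, one quantile value per column.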
pd.DataFrame.std¶
pandas.DataFrame.std(axis=None, skipna=None, level=None, ddof=1, numeric_only=None)
Supported Arguments
- axis: Integer (0 or 1)
  - Must be constant at Compile Time
Example Usage
pd.DataFrame.sum¶
pandas.DataFrame.sum(axis=None, skipna=None, level=None, numeric_only=None, min_count=0)
Supported Arguments
- axis: Integer (0 or 1)
  - Must be constant at Compile Time
Example Usage
pd.DataFrame.var¶
pandas.DataFrame.var(axis=None, skipna=None, level=None, ddof=1, numeric_only=None)
Supported Arguments
- axis: Integer (0 or 1)
  - Must be constant at Compile Time
Example Usage
pd.DataFrame.memory_usage¶
pandas.DataFrame.memory_usage(index=True, deep=False)
Supported Arguments
- index: boolean
Example Usage
Reindexing / Selection / Label manipulation¶
pd.DataFrame.drop¶
pandas.DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
- Only dropping columns is supported, either using the columns argument or setting axis=1 and using the labels argument
- labels and columns require constant string, or constant list/tuple of string values
- inplace supported with a constant boolean value
- All other arguments are unsupported
Example Usage
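The two supported ways of dropping columns can be sketched as follows. The names `jit` and `drop_demo` are hypothetical, and the fallback decorator is an assumption so the snippet runs with plain Pandas when Bodo is not installed.

```python
import pandas as pd

try:
    import bodo  # optional: fall back to plain Pandas when Bodo is absent
    jit = bodo.jit
except ImportError:
    def jit(func):  # hypothetical stand-in so the sketch stays runnable
        return func


@jit
def drop_demo():
    df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})
    by_columns = df.drop(columns=["B"])       # constant list via columns
    by_labels = df.drop(labels="C", axis=1)   # constant string via labels
    return by_columns, by_labels


by_columns, by_labels = drop_demo()
print(by_columns)
print(by_labels)
```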
pd.DataFrame.drop_duplicates¶
pandas.DataFrame.drop_duplicates(subset=None, keep='first', inplace=False, ignore_index=False)
Supported Arguments
- subset: Constant list/tuple of String column names, Constant list/tuple of Integer column names, Constant String column names, Constant Integer column names
Example Usage
pd.DataFrame.duplicated¶
pandas.DataFrame.duplicated(subset=None, keep='first')
Supported Arguments: None
Example Usage
pd.DataFrame.first¶
pandas.DataFrame.first(offset)
Supported Arguments
- offset: String or Offset type
  - String argument must be a valid frequency alias.
Note
DataFrame must have a valid DatetimeIndex and is assumed to already be sorted. This function has undefined behavior if the DatetimeIndex is not sorted.
Example Usage
>>> @bodo.jit
... def f(df, offset):
...     return df.first(offset)
>>> df = pd.DataFrame({"A": np.arange(100), "B": np.arange(100, 200)}, index=pd.date_range(start='1/1/2022', end='12/31/2024', periods=100))
>>> f(df, "2M")
                               A    B
2022-01-01 00:00:00.000000000  0  100
2022-01-12 01:27:16.363636363  1  101
2022-01-23 02:54:32.727272727  2  102
2022-02-03 04:21:49.090909091  3  103
2022-02-14 05:49:05.454545454  4  104
2022-02-25 07:16:21.818181818  5  105
pd.DataFrame.idxmax¶
pandas.DataFrame.idxmax(axis=0, skipna=True)
Supported Arguments: None
Example Usage
pd.DataFrame.idxmin¶
pandas.DataFrame.idxmin(axis=0, skipna=True)
Supported Arguments: None
Example Usage
pd.DataFrame.last¶
pandas.DataFrame.last(offset)
Supported Arguments
- offset: String or Offset type
  - String argument must be a valid frequency alias.
Note
DataFrame must have a valid DatetimeIndex and is assumed to already be sorted. This function has undefined behavior if the DatetimeIndex is not sorted.
Example Usage
>>> @bodo.jit
... def f(df, offset):
...     return df.last(offset)
>>> df = pd.DataFrame({"A": np.arange(100), "B": np.arange(100, 200)}, index=pd.date_range(start='1/1/2022', end='12/31/2024', periods=100))
>>> f(df, "2M")
                                A    B
2024-11-05 16:43:38.181818176  94  194
2024-11-16 18:10:54.545454544  95  195
2024-11-27 19:38:10.909090912  96  196
2024-12-08 21:05:27.272727264  97  197
2024-12-19 22:32:43.636363632  98  198
2024-12-31 00:00:00.000000000  99  199
pd.DataFrame.rename¶
pandas.DataFrame.rename(mapper=None, index=None, columns=None, axis=None, copy=True, inplace=False, level=None, errors='ignore')
Supported Arguments
- mapper: must be constant dictionary.
  - Can only be used alongside axis=1
- columns: must be constant dictionary
- axis: Integer
  - Can only be used alongside mapper argument
- copy: boolean
- inplace: must be constant boolean
Example Usage
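A minimal sketch of the two supported ways to rename columns: a constant `columns` dictionary, or a constant `mapper` dictionary together with `axis=1`. The names `jit` and `rename_demo` are hypothetical, and the fallback decorator is an assumption that keeps the snippet runnable with plain Pandas.

```python
import pandas as pd

try:
    import bodo  # optional: fall back to plain Pandas when Bodo is absent
    jit = bodo.jit
except ImportError:
    def jit(func):  # hypothetical stand-in so the sketch stays runnable
        return func


@jit
def rename_demo():
    df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
    via_columns = df.rename(columns={"A": "a"})   # constant columns dict
    via_mapper = df.rename({"B": "b"}, axis=1)    # mapper requires axis=1
    return via_columns, via_mapper


via_columns, via_mapper = rename_demo()
print(via_columns)
print(via_mapper)
```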
pd.DataFrame.reset_index¶
pandas.DataFrame.reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill='')
Supported Arguments
- level: Integer
  - If specified, must drop all levels.
- drop: Constant boolean
- inplace: Constant boolean
Example Usage
pd.DataFrame.set_index¶
pandas.DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)
Supported Arguments
- keys: must be a constant string
Example Usage
pd.DataFrame.take¶
pandas.DataFrame.take(indices, axis=0, is_copy=None)
Supported Arguments
- indices: scalar Integer, Pandas Integer Array, Numpy Integer Array, Integer Series
Example Usage
Missing data handling¶
pd.DataFrame.dropna¶
pandas.DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
Supported Arguments
- how: Constant String: either "all" or "any"
- thresh: Integer
- subset: Constant list/tuple of String column names, Constant list/tuple of Integer column names, Constant String column names, Constant Integer column names
Example Usage
>>> @bodo.jit
... def f():
...     df = pd.DataFrame({"A": [1,2,3,None], "B": [4, 5,None, None], "C": [6, None, None, None]})
...     df_1 = df.dropna(how="all", subset=["B", "C"])
...     df_2 = df.dropna(thresh=3)
...     formatted_out = "\n".join([df_1.to_string(), df_2.to_string()])
...     return formatted_out
>>> f()
   A  B     C
0  1  4     6
1  2  5  <NA>
   A  B  C
0  1  4  6
pd.DataFrame.fillna¶
pandas.DataFrame.fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None)
Supported Arguments
- value: various scalars
  - Must be of the same type as the filled column
- inplace: Constant boolean
  - inplace is not supported alongside method
- method: One of bfill, backfill, ffill, or pad
  - Must be constant at Compile Time
  - inplace is not supported alongside method
Example Usage
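A short sketch of `fillna` with a scalar `value` whose type matches the float columns being filled. The names `jit` and `fillna_demo` are hypothetical, and the fallback decorator is an assumption so the snippet runs with plain Pandas when Bodo is not installed.

```python
import numpy as np
import pandas as pd

try:
    import bodo  # optional: fall back to plain Pandas when Bodo is absent
    jit = bodo.jit
except ImportError:
    def jit(func):  # hypothetical stand-in so the sketch stays runnable
        return func


@jit
def fillna_demo():
    df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [np.nan, 5.0, np.nan]})
    # 0.0 is a float, the same type as the columns it fills
    return df.fillna(0.0)


out = fillna_demo()
print(out)
```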
pd.DataFrame.replace¶
pandas.DataFrame.replace(to_replace=None, value=None, inplace=False, limit=None, regex=False, method='pad')
Supported Arguments
- to_replace: various scalars
  - Required argument
- value: various scalars
  - Must be of the same type as to_replace
Example Usage
Reshaping, sorting, transposing¶
pd.DataFrame.explode¶
pandas.DataFrame.explode(column, ignore_index=False)
Supported Arguments
- column: Constant Column label or list of labels
Example Usage
pd.DataFrame.pivot¶
pandas.DataFrame.pivot(values=None, index=None, columns=None)
Supported Arguments
- values: Constant Column Label or list of labels
- index: Constant Column Label or list of labels
- columns: Constant Column Label
Note
The number of columns and names of the output DataFrame won't be known at compile time. To update typing information on the DataFrame, you should pass it back to Python.
Example Usage
pd.DataFrame.pivot_table¶
pandas.DataFrame.pivot_table(values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, dropna=True, margins_name='All', observed=False, sort=True)
Supported Arguments
- values: Constant Column Label or list of labels
- index: Constant Column Label or list of labels
- columns: Constant Column Label
- aggfunc: String Constant
Note
This code takes two different paths depending on whether pivot values are annotated. When pivot values are annotated, the output columns are set to the annotated values. For example, @bodo.jit(pivots={'pt': ['small', 'large']}) declares that the output pivot table pt will have columns called small and large.
If pivot values are not annotated, then the number of columns and names of the output DataFrame won't be known at compile time. To update typing information on the DataFrame, you should pass it back to Python.
Example Usage
>>> @bodo.jit(pivots={'pivoted_tbl': ['X', 'Y']})
... def f():
...     df = pd.DataFrame({"A": ["X","X","X","X","Y","Y"], "B": [1,2,3,4,5,6], "C": [10,11,12,20,21,22]})
...     pivoted_tbl = df.pivot_table(columns="A", index="B", values="C", aggfunc="mean")
...     return pivoted_tbl
>>> f()
      X     Y
B
1  10.0   NaN
2  11.0   NaN
3  12.0   NaN
4  20.0   NaN
5   NaN  21.0
6   NaN  22.0
pd.DataFrame.sample¶
pandas.DataFrame.sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, ignore_index=False)
Supported Arguments
- n: Integer
- frac: Float
- replace: boolean
Example Usage
pd.DataFrame.sort_index¶
pandas.DataFrame.sort_index(axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True, ignore_index=False, key=None)
Supported Arguments
- ascending: boolean
- na_position: constant String ("first" or "last")
Example Usage
pd.DataFrame.sort_values¶
pandas.DataFrame.sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False, key=None)
Supported Arguments
- by: constant String or constant list of strings
- ascending: boolean, list/tuple of boolean, with length equal to the number of key columns
- inplace: Constant boolean
- na_position: constant String ("first" or "last"), constant list/tuple of String, with length equal to the number of key columns
Example Usage
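A minimal sketch of sorting by two key columns with one `ascending` flag per key. The names `jit` and `sort_demo` are hypothetical, and the fallback decorator is an assumption that keeps the snippet runnable with plain Pandas.

```python
import pandas as pd

try:
    import bodo  # optional: fall back to plain Pandas when Bodo is absent
    jit = bodo.jit
except ImportError:
    def jit(func):  # hypothetical stand-in so the sketch stays runnable
        return func


@jit
def sort_demo():
    df = pd.DataFrame({"A": [2, 1, 2, 1], "B": [1.0, 4.0, 3.0, 2.0]})
    # Two key columns, so ascending has two entries: A up, B down
    return df.sort_values(by=["A", "B"], ascending=[True, False])


out = sort_demo()
print(out)
```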
pd.DataFrame.to_string¶
pandas.DataFrame.to_string(buf=None, columns=None, col_space=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, justify=None, max_rows=None, min_rows=None, max_cols=None, show_dimensions=False, decimal='.', line_width=None, max_colwidth=None, encoding=None)
Supported Arguments
- buf
- columns
- col_space
- header
- index
- na_rep
- formatters
- float_format
- sparsify
- index_names
- justify
- max_rows
- min_rows
- max_cols
- show_dimensions
- decimal
- line_width
- max_colwidth
- encoding
Example Usage
>>> @bodo.jit
... def f():
...     df = pd.DataFrame({"A": [1,2,3]})
...     return df.to_string()
>>> f()
   A
0  1
1  2
2  3
Note
- This function is not optimized.
- When called on a distributed dataframe, the string returned on each rank reflects only the portion of the dataframe held by that rank.
Combining / joining / merging¶
pd.DataFrame.append¶
pandas.DataFrame.append(other, ignore_index=False, verify_integrity=False, sort=False)
Supported Arguments
- other: DataFrame, list/tuple of DataFrame
- ignore_index: constant boolean
Example Usage
pd.DataFrame.assign¶
pandas.DataFrame.assign(**kwargs)
Example Usage
>>> @bodo.jit
... def f():
...     df = pd.DataFrame({"A": [1,2,3], "B": [4,5,6]})
...     df2 = df.assign(C = 2 * df["B"], D = lambda x: x.C - 1)
...     return df2
>>> f()
   A  B   C   D
0  1  4   8   7
1  2  5  10   9
2  3  6  12  11
Note
Arguments can be JIT functions, lambda functions, or values that can be used to initialize a Pandas Series.
pd.DataFrame.join¶
pandas.DataFrame.join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)
Supported Arguments
- other: DataFrame
- on: constant string column name, constant list/tuple of column names
Example Usage
>>> @bodo.jit
... def f():
...     df = pd.DataFrame({"A": [1,1,3], "B": [4,5,6]})
...     return df.join(on="A", other=pd.DataFrame({"C": [-1,-2,-3], "D": [4,5,6]}))
>>> f()
   A  B     C     D
0  1  4    -2     5
1  1  5    -2     5
2  3  6  <NA>  <NA>
Note
Joined dataframes cannot have common columns. The output dataframe is not sorted by default, for better parallel performance.
pd.DataFrame.merge¶
pandas.DataFrame.merge(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)
Note
See pd.merge for the full list of supported arguments, and more examples.
Example Usage
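A minimal sketch of an inner merge on a shared key column. The names `jit` and `merge_demo` are hypothetical, and the fallback decorator is an assumption so the snippet runs with plain Pandas. The result is sorted afterwards because row order is not relied upon here.

```python
import pandas as pd

try:
    import bodo  # optional: fall back to plain Pandas when Bodo is absent
    jit = bodo.jit
except ImportError:
    def jit(func):  # hypothetical stand-in so the sketch stays runnable
        return func


@jit
def merge_demo():
    left = pd.DataFrame({"key": [1, 2, 3], "x": [10, 20, 30]})
    right = pd.DataFrame({"key": [2, 3, 4], "y": [200, 300, 400]})
    # Inner merge keeps only keys present in both frames (2 and 3)
    return left.merge(right, on="key", how="inner")


# Sort explicitly so the comparison does not depend on output row order
out = merge_demo().sort_values("key").reset_index(drop=True)
print(out)
```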
Time series-related¶
pd.DataFrame.shift¶
pandas.DataFrame.shift(periods=1, freq=None, axis=0, fill_value=NoDefault.no_default)
Supported Arguments
- periods: Integer
Example Usage
>>> @bodo.jit
... def f():
...     df = pd.DataFrame({"A": [1,1,3], "B": [4,5,6]})
...     return df.shift(1)
>>> f()
     A    B
0  NaN  NaN
1  1.0  4.0
2  1.0  5.0
Note
Only supported for dataframes containing numeric, boolean, datetime.date and string types.
Serialization, IO, Conversion¶
Also see S3 and HDFS configuration requirements and more on Scalable File I/O.
pd.DataFrame.to_csv¶
pandas.DataFrame.to_csv
- compression argument defaults to None in JIT code. This is the only supported value of this argument.
- mode argument supports only the default value "w".
- errors argument supports only the default value strict.
- storage_options argument supports only the default value None.
pd.DataFrame.to_json¶
pandas.DataFrame.to_json
pd.DataFrame.to_parquet¶
pandas.DataFrame.to_parquet
pd.DataFrame.to_sql¶
pandas.DataFrame.to_sql
- See Example Usage and more system specific instructions.
- Argument con is supported, but only in string form. SQLalchemy connectable is not supported.
- Arguments name, schema, if_exists, index, index_label, dtype, method are supported.
- Argument chunksize is not supported.
Plotting¶
pd.DataFrame.plot¶
pandas.DataFrame.plot(x=None, y=None, kind="line", figsize=None, xlabel=None, ylabel=None, title=None, legend=True, fontsize=None, xticks=None, yticks=None, ax=None)
Supported Arguments
- x: Constant String column name, Constant integer
- y: Constant String column name, Constant integer
- kind: constant String ("line" or "scatter")
- figsize: constant numeric tuple (width, height)
- xlabel: constant String
- ylabel: constant String
- title: constant String
- legend: boolean
- fontsize: integer
- xticks: Constant Tuple
- yticks: Constant Tuple
- ax: Matplotlib Axes Object