DataFrame
Bodo provides extensive DataFrame support documented below.
pd.DataFrame
pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=None)
Supported Arguments
- data: constant key dictionary, 2D NumPy array (the columns argument is required when using a 2D NumPy array)
- index: List, Tuple, Pandas index types, Pandas array types, Pandas series types, NumPy array types
- columns: constant list of String, constant tuple of String (must be constant at compile time)
- dtype: all values supported with dataframe.astype (see below)
- copy: boolean (must be constant at compile time)
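The sketch below shows the two data forms listed above. The first is a dictionary with constant string keys. The second is a 2D NumPy array with an explicit, compile-time constant columns list. The function names are hypothetical. The try/except fallback is an assumption added so the sketch also runs as plain pandas where Bodo is not installed.

```python
import numpy as np
import pandas as pd

try:
    import bodo
    jit = bodo.jit
except ImportError:
    # Assumption: without Bodo, run the same code as plain pandas.
    def jit(func):
        return func


@jit
def from_dict():
    # Each constant dictionary key becomes a column.
    return pd.DataFrame({"A": [1, 2, 3], "B": [4.0, 5.0, 6.0]})


@jit
def from_array():
    arr = np.arange(6).reshape(3, 2)
    # A 2D NumPy array needs explicit, constant column names.
    return pd.DataFrame(arr, columns=["X", "Y"])


df1 = from_dict()
df2 = from_array()
print(df1)
print(df2)
```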
Attributes and underlying data
pd.DataFrame.columns
pandas.DataFrame.columns
Example Usage
pd.DataFrame.dtypes
pandas.DataFrame.dtypes
Example Usage
pd.DataFrame.empty
pandas.DataFrame.empty
Example Usage
pd.DataFrame.index
pandas.DataFrame.index
Example Usage
pd.DataFrame.ndim
pandas.DataFrame.ndim
Example Usage
pd.DataFrame.select_dtypes
pandas.DataFrame.select_dtypes(include=None, exclude=None)
Supported Arguments
- include: string, type, or list/tuple of strings/types (must be constant at compile time)
- exclude: string, type, or list/tuple of strings/types (must be constant at compile time)
Example Usage
>>> @bodo.jit
... def f():
...     df = pd.DataFrame({"A": [1], "B": ["X"], "C": [pd.Timedelta(10, unit="D")], "D": [True], "E": [3.1]})
...     out_1 = df.select_dtypes(exclude=[np.float64, "bool"])
...     out_2 = df.select_dtypes(include="int")
...     out_3 = df.select_dtypes(include=np.bool_, exclude=(np.int64, "timedelta64[ns]"))
...     formatted_out = "\n".join([out_1.to_string(), out_2.to_string(), out_3.to_string()])
...     return formatted_out
>>> print(f())
   A  B       C
0  1  X 10 days
   A
0  1
      D
0  True
pd.DataFrame.filter
pandas.DataFrame.filter(items=None, like=None, regex=None, axis=None)
Supported Arguments
- items: constant list of String
- like: constant String
- regex: constant String
- axis (only the "columns" axis is supported): constant String, constant integer
Example Usage
>>> @bodo.jit
... def f():
...     df = pd.DataFrame({"ababab": [1], "hello world": [2], "A": [3]})
...     filtered_df_1 = df.filter(items=["A"])
...     filtered_df_2 = df.filter(like="hello", axis="columns")
...     filtered_df_3 = df.filter(regex="(ab){3}", axis=1)
...     formatted_out = "\n".join([filtered_df_1.to_string(), filtered_df_2.to_string(), filtered_df_3.to_string()])
...     return formatted_out
>>> print(f())
   A
0  3
   hello world
0            2
   ababab
0       1
pd.DataFrame.shape
pandas.DataFrame.shape
Example Usage
pd.DataFrame.size
pandas.DataFrame.size
Example Usage
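The attribute entries above can be read inside a single JIT function. The sketch below shows this for shape, size, ndim, empty and columns. The helper names are hypothetical. The try/except fallback to plain pandas is an assumption so the sketch runs without Bodo.

```python
import pandas as pd

try:
    import bodo
    jit = bodo.jit
except ImportError:
    # Assumption: without Bodo, run the same code as plain pandas.
    def jit(func):
        return func


@jit
def summarize(df):
    # Read several DataFrame attributes inside JIT code.
    return df.shape, df.size, df.ndim, df.empty


@jit
def column_labels(df):
    return df.columns


df = pd.DataFrame({"A": [1, 2, 3], "B": [4.5, 5.5, 6.5]})
shape, size, ndim, empty = summarize(df)
labels = list(column_labels(df))
print(shape, size, ndim, empty)
print(labels)
```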
pd.DataFrame.to_numpy
pandas.DataFrame.to_numpy(dtype=None, copy=False, na_value=NoDefault.no_default)
Supported Arguments
- copy: boolean
Example Usage
pd.DataFrame.values
pandas.DataFrame.values (only for numeric dataframes)
Example Usage
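The sketch below converts an all-numeric DataFrame to a 2D NumPy array in two ways: with to_numpy(copy=True) and with values. A numeric-only frame is used because values is documented above as numeric-only. The helper name and the plain-pandas fallback are assumptions for illustration.

```python
import pandas as pd

try:
    import bodo
    jit = bodo.jit
except ImportError:
    # Assumption: without Bodo, run the same code as plain pandas.
    def jit(func):
        return func


@jit
def as_arrays(df):
    # copy=True requests a freshly allocated 2D array.
    copied = df.to_numpy(copy=True)
    # values is limited to numeric dataframes.
    raw = df.values
    return copied, raw


df = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0]})
copied, raw = as_arrays(df)
print(copied)
```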
Conversion
pd.DataFrame.astype
pandas.DataFrame.astype(dtype, copy=True, errors='raise')
Supported Arguments
- dtype: dict with string column names as keys and Strings/types as values; a String (must be parsable by np.dtype); a valid type (see types); or one of the functions float, int, bool, str (must be constant at compile time)
- copy: boolean
Example Usage
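The sketch below shows two of the dtype forms listed above. A constant dict converts only column A, and a NumPy type converts every column. The helper name and the plain-pandas fallback are assumptions made so the sketch runs on its own.

```python
import numpy as np
import pandas as pd

try:
    import bodo
    jit = bodo.jit
except ImportError:
    # Assumption: without Bodo, run the same code as plain pandas.
    def jit(func):
        return func


@jit
def convert(df):
    # Per-column conversion with a constant dict of column -> dtype string.
    per_column = df.astype({"A": "float64"})
    # Whole-frame conversion with a NumPy type.
    whole = df.astype(np.float32)
    return per_column, whole


df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
per_column, whole = convert(df)
print(per_column.dtypes)
print(whole.dtypes)
```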
pd.DataFrame.copy
pandas.DataFrame.copy(deep=True)
Supported Arguments
- deep: boolean
Example Usage
>>> @bodo.jit
... def f():
...     df = pd.DataFrame({"A": [1,2,3]})
...     shallow_df = df.copy(deep=False)
...     deep_df = df.copy()
...     shallow_df["A"][0] = -1
...     formatted_out = "\n".join([df.to_string(), shallow_df.to_string(), deep_df.to_string()])
...     return formatted_out
>>> print(f())
   A
0 -1
1  2
2  3
   A
0 -1
1  2
2  3
   A
0  1
1  2
2  3
pd.DataFrame.isna
pandas.DataFrame.isna()
Example Usage
pd.DataFrame.isnull
pandas.DataFrame.isnull()
Example Usage
pd.DataFrame.notna
pandas.DataFrame.notna()
Example Usage
pd.DataFrame.notnull
pandas.DataFrame.notnull()
Example Usage
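The four missing-value checks above return boolean DataFrames of the same shape. The sketch below builds the masks for a frame with one float column and one string column. The helper name and the plain-pandas fallback are assumptions.

```python
import numpy as np
import pandas as pd

try:
    import bodo
    jit = bodo.jit
except ImportError:
    # Assumption: without Bodo, run the same code as plain pandas.
    def jit(func):
        return func


@jit
def null_masks(df):
    # isna/isnull and notna/notnull are aliases returning boolean DataFrames.
    return df.isna(), df.notnull()


df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": ["x", None, "z"]})
missing, present = null_masks(df)
print(missing)
```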
pd.DataFrame.info
pandas.DataFrame.info(verbose=None, buf=None, max_cols=None, memory_usage=None, show_counts=None, null_counts=None)
Supported Arguments: None
Example Usage
>>> @bodo.jit
... def f():
...     df = pd.DataFrame({"A": [1,2,3], "B": ["X", "Y", "Z"], "C": [pd.Timedelta(10, unit="D"), pd.Timedelta(10, unit="H"), pd.Timedelta(10, unit="S")]})
...     return df.info()
>>> f()
<class 'DataFrameType'>
RangeIndexType(none): 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
 0   A       3 non-null      int64
 1   B       3 non-null      unicode_type
 2   C       3 non-null      timedelta64[ns]
dtypes: int64(1), timedelta64[ns](1), unicode_type(1)
memory usage: 108.0 bytes
Note
The exact output string may vary slightly from Pandas.
pd.DataFrame.infer_objects
pandas.DataFrame.infer_objects()
Example Usage
>>> @bodo.jit
... def f():
...     df = pd.DataFrame({"A": [1,2,3]})
...     return df.infer_objects()
>>> f()
   A
0  1
1  2
2  3
Note
Bodo does not internally use the object dtype, so types are never inferred. As a result, this API just produces a deep copy, consistent with Pandas.
Indexing, iteration
pd.DataFrame.head
pandas.DataFrame.head(n=5)
Supported Arguments
- n: integer
Example Usage
pd.DataFrame.iat
pandas.DataFrame.iat
Note
We only support indexing using iat with a pair of integers. We require that the second integer (the column index) is a compile-time constant.
Example Usage
pd.DataFrame.iloc
pandas.DataFrame.iloc
getitem:
- df.iloc supports single integer indexing (returns the row as a Series), e.g. df.iloc[0]
- df.iloc supports a single list/array/series of integers/bool, e.g. df.iloc[[0,1,2]]
- for tuple indexing df.iloc[row_idx, col_idx] we allow:
  - row_idx to be an int, a list/array/series of integers/bool, or a slice
  - col_idx to be a constant int, a constant list of integers, or a constant slice
  - e.g.: df.iloc[[0,1,2], :]
setitem:
- df.iloc only supports scalar setitem
- df.iloc only supports tuple indexing df.iloc[row_idx, col_idx]
  - row_idx can be anything supported for series setitem: an int, a list/array/series of integers/bool, or a slice
  - col_idx can be a constant int or a constant list/tuple of integers
Example Usage
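The sketch below exercises the getitem forms listed above: a single integer, a list of integers, and a tuple with a row slice and a constant column list. It also shows iat with a constant column position and one scalar tuple setitem. An all-float frame is used so the row returned by df.iloc[0] has a single dtype. The helper names and the plain-pandas fallback are assumptions.

```python
import pandas as pd

try:
    import bodo
    jit = bodo.jit
except ImportError:
    # Assumption: without Bodo, run the same code as plain pandas.
    def jit(func):
        return func


@jit
def read_by_position(df):
    first_row = df.iloc[0]        # single integer: the row as a Series
    some_rows = df.iloc[[0, 2]]   # list of integers
    block = df.iloc[1:, [0, 1]]   # row slice, constant list of column positions
    value = df.iat[1, 0]          # pair of integers, constant column position
    return first_row, some_rows, block, value


@jit
def write_by_position(df):
    df.iloc[0, 1] = 100.0         # scalar setitem with tuple indexing
    return df


df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]})
first_row, some_rows, block, value = read_by_position(df)
updated = write_by_position(df.copy())  # pass a copy to keep df unchanged
```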
pd.DataFrame.insert
pandas.DataFrame.insert(loc, column, value, allow_duplicates=False)
Supported Arguments
- loc: constant integer
- column: constant string
- value: scalar, list/tuple, Pandas/NumPy array, Pandas index types, series
- allow_duplicates: constant boolean
Example Usage
pd.DataFrame.isin
pandas.DataFrame.isin(values)
Supported Arguments
- values: DataFrame (must have the same indices), iterable type, NumPy array types, Pandas array types, List/Tuple, Pandas Index Types (excluding interval Index and MultiIndex)
Example Usage
>>> @bodo.jit
... def f():
...     df = pd.DataFrame({"A": [1,2,3], "B": [4,5,6], "C": [7,8,9]})
...     isin_1 = df.isin([1,5,9])
...     isin_2 = df.isin(pd.DataFrame({"A": [4,5,6], "C": [7,8,9]}))
...     formatted_out = "\n".join([isin_1.to_string(), isin_2.to_string()])
...     return formatted_out
>>> print(f())
       A      B      C
0   True  False  False
1  False   True  False
2  False  False   True
       A      B      C
0  False  False   True
1  False  False   True
2  False  False   True
Note
DataFrame.isin ignores DataFrame indices. For example:
>>> @bodo.jit
... def f():
...     df = pd.DataFrame({"A": [1,2,3], "B": [4,5,6], "C": [7,8,9]})
...     return df.isin(pd.DataFrame({"A": [1,2,3]}, index=["A", "B", "C"]))
>>> f()
      A      B      C
0  True  False  False
1  True  False  False
2  True  False  False
By contrast, the same function in regular Pandas (without @bodo.jit) aligns on the index, so no values match:
>>> def f():
...     df = pd.DataFrame({"A": [1,2,3], "B": [4,5,6], "C": [7,8,9]})
...     return df.isin(pd.DataFrame({"A": [1,2,3]}, index=["A", "B", "C"]))
>>> f()
       A      B      C
0  False  False  False
1  False  False  False
2  False  False  False
pd.DataFrame.itertuples
pandas.DataFrame.itertuples(index=True, name='Pandas')
Supported Arguments: None
Example Usage
pd.DataFrame.query
pandas.DataFrame.query(expr, inplace=False, **kwargs)
Supported Arguments
- expr: constant String
Example Usage
>>> @bodo.jit
... def f(a):
...     df = pd.DataFrame({"A": [1,2,3], "B": [4,5,6], "C": [7,8,9]})
...     return df.query('A > @a')
>>> f(1)
   A  B  C
1  2  5  8
2  3  6  9
Note
- The output of the query must evaluate to a 1d boolean array.
- Cannot refer to the index by name in the query string.
- Query must be one line.
- If the query refers to variables from the surrounding scope (with @), they should be passed as arguments to the function.
pd.DataFrame.tail
pandas.DataFrame.tail(n=5)
Supported Arguments
- n: integer
Example Usage
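head and tail both take an integer n, which can be a runtime argument. The sketch below returns the first and last three rows of a ten-row frame. The helper name and the plain-pandas fallback are assumptions.

```python
import numpy as np
import pandas as pd

try:
    import bodo
    jit = bodo.jit
except ImportError:
    # Assumption: without Bodo, run the same code as plain pandas.
    def jit(func):
        return func


@jit
def both_ends(df, n):
    # n is an ordinary integer argument, not a compile-time constant.
    return df.head(n), df.tail(n)


df = pd.DataFrame({"A": np.arange(10)})
first, last = both_ends(df, 3)
print(first)
print(last)
```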
pd.DataFrame.where
pandas.DataFrame.where(cond, other=np.nan, inplace=False, axis=1, level=None, errors='raise', try_cast=NoDefault.no_default)
Supported Arguments
- cond: Boolean DataFrame, Boolean Series, Boolean Array
  - If a 1-dimensional array or Series is provided, this is equivalent to Pandas df.where with axis=1.
- other: Scalar, DataFrame, Series, 1- or 2-D Array, None
  - Data types in other must match the corresponding entries in the DataFrame.
  - None or omitting the argument defaults to the respective NA value for each type.
Note
The DataFrame can contain categorical data if other is a scalar.
Example Usage
pd.DataFrame.mask
pandas.DataFrame.mask(cond, other=np.nan, inplace=False, axis=1, level=None, errors='raise', try_cast=NoDefault.no_default)
Supported Arguments
- cond: Boolean DataFrame, Boolean Series, Boolean Array
  - If a 1-dimensional array or Series is provided, this is equivalent to Pandas df.mask with axis=1.
- other: Scalar, DataFrame, Series, 1- or 2-D Array, None
  - Data types in other must match the corresponding entries in the DataFrame.
  - None or omitting the argument defaults to the respective NA value for each type.
Note
The DataFrame can contain categorical data if other is a scalar.
Example Usage
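where and mask are complements. where keeps entries where cond is True, and mask replaces them. The sketch below uses a scalar other of the same integer type as the data, as the argument notes require. It assumes that an elementwise DataFrame comparison (df > 2) is available to build the boolean cond. The helper name and the plain-pandas fallback are also assumptions.

```python
import pandas as pd

try:
    import bodo
    jit = bodo.jit
except ImportError:
    # Assumption: without Bodo, run the same code as plain pandas.
    def jit(func):
        return func


@jit
def keep_and_hide(df):
    cond = df > 2               # boolean DataFrame
    kept = df.where(cond, 0)    # keep entries where cond is True, else use 0
    hidden = df.mask(cond, -1)  # replace entries where cond is True with -1
    return kept, hidden


df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 0]})
kept, hidden = keep_and_hide(df)
print(kept)
print(hidden)
```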
Function application, GroupBy & Window
pd.DataFrame.apply
pandas.DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), _bodo_inline=False, **kwargs)
Supported Arguments
- func: function (e.g. lambda) (axis must = 1), jit function (axis must = 1), String which refers to a supported DataFrame method (must be constant at compile time)
- axis: Integer (0, 1), String (only if the method takes axis as an argument) (must be constant at compile time)
- _bodo_inline: boolean (must be constant at compile time)
Example Usage
>>> @bodo.jit
... def f():
...     df = pd.DataFrame({"A": [1,2,3], "B": [4,5,6], "C": [7,8,9]})
...     return df.apply(lambda x: x["A"] * (x["B"] + x["C"]), axis=1)
>>> f()
0    11
1    26
2    45
dtype: int64
Note
Supports an extra _bodo_inline boolean argument to manually control Bodo's inlining behavior. Inlining user-defined functions (UDFs) can potentially improve performance at the expense of extra compilation time. Bodo uses heuristics to make a decision automatically if _bodo_inline is not provided.
pd.DataFrame.groupby
pandas.DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=NoDefault.no_default, observed=False, dropna=True)
Supported Arguments
- by: String column label, List/Tuple of column labels (must be constant at compile time)
- as_index: boolean (must be constant at compile time)
- dropna: boolean (must be constant at compile time)
Note
sort=False and observed=True are set by default. These are the only supported values for sort and observed. For more information on using groupby, see the groupby section.
Example Usage
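The sketch below groups by a single constant column label and sums one column. Because groups come back unsorted (sort=False), the result is sorted explicitly when a stable order is needed. The helper name and the plain-pandas fallback are assumptions.

```python
import pandas as pd

try:
    import bodo
    jit = bodo.jit
except ImportError:
    # Assumption: without Bodo, run the same code as plain pandas.
    def jit(func):
        return func


@jit
def group_totals(df):
    # Group order is not guaranteed, so sort the keys explicitly.
    return df.groupby("key")["value"].sum().sort_index()


df = pd.DataFrame({"key": ["a", "b", "a", "c", "b"], "value": [1, 2, 3, 4, 5]})
totals = group_totals(df)
print(totals)
```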
pd.DataFrame.rolling
pandas.DataFrame.rolling(window, min_periods=None, center=False, win_type=None, on=None, axis=0, closed=None, method='single')
Supported Arguments
- window: Integer, String (must be parsable as a time offset), datetime.timedelta, pd.Timedelta, List/Tuple of column labels
- min_periods: Integer
- center: boolean
- on: Scalar column label (must be constant at compile time)
- dropna: boolean (must be constant at compile time)
Example Usage
>>> @bodo.jit
... def f():
...     df = pd.DataFrame({"A": [1,2,3,4,5]})
...     return df.rolling(3, center=True).mean()
>>> f()
     A
0  NaN
1  2.0
2  3.0
3  4.0
4  NaN
For more information, please see the Window section.
Computations / Descriptive Stats
pd.DataFrame.abs
pandas.DataFrame.abs()
Note
Only supported for dataframes containing numerical data and Timedeltas.
Example Usage
pd.DataFrame.corr
pandas.DataFrame.corr(method='pearson', min_periods=1)
Supported Arguments
- min_periods: Integer
Example Usage
pd.DataFrame.count
pandas.DataFrame.count(axis=0, level=None, numeric_only=False)
Supported Arguments: None
Example Usage
pd.DataFrame.cov
pandas.DataFrame.cov(min_periods=None, ddof=1)
Supported Arguments
- min_periods: Integer
Example Usage
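corr and cov each return a square DataFrame indexed by column name. In the sketch below, B is exactly 2 * A. With the default ddof=1, the expected results are corr(A, B) = 1, var(A) = 1, cov(A, B) = 2 and var(B) = 4. The helper name and the plain-pandas fallback are assumptions.

```python
import pandas as pd

try:
    import bodo
    jit = bodo.jit
except ImportError:
    # Assumption: without Bodo, run the same code as plain pandas.
    def jit(func):
        return func


@jit
def pairwise_stats(df):
    # Pearson correlation and sample covariance (ddof=1).
    return df.corr(), df.cov()


df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [2.0, 4.0, 6.0]})
corr, cov = pairwise_stats(df)
print(corr)
print(cov)
```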
pd.DataFrame.cumprod
pandas.DataFrame.cumprod(axis=None, skipna=True)
Supported Arguments: None
Example Usage
>>> @bodo.jit
... def f():
...     df = pd.DataFrame({"A": [1, 2, 3], "B": [.1, np.nan, 12.3]})
...     return df.cumprod()
>>> f()
   A    B
0  1  0.1
1  2  NaN
2  6  NaN
Note
Not supported for dataframes with nullable integers.
pd.DataFrame.cumsum
pandas.DataFrame.cumsum(axis=None, skipna=True)
Supported Arguments: None
Example Usage
>>> @bodo.jit
... def f():
...     df = pd.DataFrame({"A": [1, 2, 3], "B": [.1, np.nan, 12.3]})
...     return df.cumsum()
>>> f()
   A    B
0  1  0.1
1  3  NaN
2  6  NaN
Note
Not supported for dataframes with nullable integers.
pd.DataFrame.describe
pandas.DataFrame.describe(percentiles=None, include=None, exclude=None, datetime_is_numeric=False)
Supported Arguments: None
Example Usage
>>> @bodo.jit
... def f():
...     df = pd.DataFrame({"A": [1,2,3], "B": [pd.Timestamp(2000, 10, 2), pd.Timestamp(2001, 9, 5), pd.Timestamp(2002, 3, 11)]})
...     return df.describe()
>>> f()
         A                    B
count  3.0                    3
mean   2.0  2001-07-16 16:00:00
min    1.0  2000-10-02 00:00:00
25%    1.5  2001-03-20 00:00:00
50%    2.0  2001-09-05 00:00:00
75%    2.5  2001-12-07 12:00:00
max    3.0  2002-03-11 00:00:00
std    1.0                  NaN
Note
Only supported for dataframes containing numeric and datetime data. datetime_is_numeric defaults to True in JIT code.
pd.DataFrame.diff
pandas.DataFrame.diff(periods=1, axis=0)
Supported Arguments
- periods: Integer
Example Usage
>>> @bodo.jit
... def f():
...     df = pd.DataFrame({"A": [1,2,3], "B": [pd.Timestamp(2000, 10, 2), pd.Timestamp(2001, 9, 5), pd.Timestamp(2002, 3, 11)]})
...     return df.diff(1)
>>> f()
     A        B
0  NaN      NaT
1  1.0 338 days
2  1.0 187 days
Note
Only supported for dataframes containing float, non-null int, and datetime64ns values.
pd.DataFrame.max
pandas.DataFrame.max(axis=None, skipna=None, level=None, numeric_only=None)
Supported Arguments
- axis: Integer (0 or 1) (must be constant at compile time)
Example Usage
>>> @bodo.jit
... def f():
...     df = pd.DataFrame({"A": [1,2,3], "B": [4,5,6], "C": [7,8,9]})
...     return df.max(axis=1)
>>> f()
0    7
1    8
2    9
Note
Only supported for dataframes containing float, non-null int, and datetime64ns values.
pd.DataFrame.mean
pandas.DataFrame.mean(axis=None, skipna=None, level=None, numeric_only=None)
Supported Arguments
- axis: Integer (0 or 1) (must be constant at compile time)
Example Usage
>>> @bodo.jit
... def f():
...     df = pd.DataFrame({"A": [1,2,3], "B": [4,5,6], "C": [7,8,9]})
...     return df.mean(axis=1)
>>> f()
0    4.0
1    5.0
2    6.0
Note
Only supported for dataframes containing float, non-null int, and datetime64ns values.
pd.DataFrame.median
pandas.DataFrame.median(axis=None, skipna=None, level=None, numeric_only=None)
Supported Arguments
- axis: Integer (0 or 1) (must be constant at compile time)
Example Usage
>>> @bodo.jit
... def f():
...     df = pd.DataFrame({"A": [1,2,3], "B": [4,5,6], "C": [7,8,9]})
...     return df.median(axis=1)
>>> f()
0    4.0
1    5.0
2    6.0
Note
Only supported for dataframes containing float, non-null int, and datetime64ns values.
pd.DataFrame.min
pandas.DataFrame.min(axis=None, skipna=None, level=None, numeric_only=None)
Supported Arguments
- axis: Integer (0 or 1) (must be constant at compile time)
Example Usage
>>> @bodo.jit
... def f():
...     df = pd.DataFrame({"A": [1,2,3], "B": [4,5,6], "C": [7,8,9]})
...     return df.min(axis=1)
>>> f()
0    1
1    2
2    3
Note
Only supported for dataframes containing float, non-null int, and datetime64ns values.
pd.DataFrame.nunique
pandas.DataFrame.nunique(axis=0, dropna=True)
Supported Arguments
- dropna: boolean
Example Usage
pd.DataFrame.pct_change
pandas.DataFrame.pct_change(periods=1, fill_method='pad', limit=None, freq=None)
Supported Arguments
- periods: Integer
Example Usage
pd.DataFrame.pipe
pandas.DataFrame.pipe(func, *args, **kwargs)
Supported Arguments
- func: JIT function or callable defined within a JIT function
  - Additional arguments for func can be passed as additional arguments.
Note
func cannot be a tuple.
Example Usage
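The sketch below pipes a DataFrame through a separate JIT function and forwards one extra positional argument, as described in the argument notes above. The helper add_offset is hypothetical and exists only for this illustration. The plain-pandas fallback is an assumption.

```python
import pandas as pd

try:
    import bodo
    jit = bodo.jit
except ImportError:
    # Assumption: without Bodo, run the same code as plain pandas.
    def jit(func):
        return func


@jit
def add_offset(df, offset):
    # Hypothetical helper: returns a new frame with an extra column C.
    return df.assign(C=df["A"] + offset)


@jit
def f(df):
    # Extra positional arguments after func are forwarded to it.
    return df.pipe(add_offset, 10)


df = pd.DataFrame({"A": [1, 2, 3]})
out = f(df)
print(out)
```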
pd.DataFrame.prod
pandas.DataFrame.prod(axis=None, skipna=None, level=None, numeric_only=None)
Supported Arguments
- axis: Integer (0 or 1) (must be constant at compile time)
Example Usage
pd.DataFrame.product
pandas.DataFrame.product(axis=None, skipna=None, level=None, numeric_only=None)
Supported Arguments
- axis: Integer (0 or 1) (must be constant at compile time)
Example Usage
pd.DataFrame.quantile
pandas.DataFrame.quantile(q=0.5, axis=0, numeric_only=True, interpolation='linear')
Supported Arguments
- q: Float or Int (must be 0 <= q <= 1)
- axis: Integer (0 or 1) (must be constant at compile time)
Example Usage
pd.DataFrame.rank
pandas.DataFrame.rank(axis=0, method='average', numeric_only=NoDefault.no_default, na_option='keep', ascending=True, pct=False)
Supported Arguments
- method: String in {'average', 'min', 'max', 'first', 'dense'}
- na_option: String in {'keep', 'top', 'bottom'}
- ascending: Boolean
- pct: Boolean
Note
- Using method='first' with ascending=False is currently unsupported.
Example Usage
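The sketch below ranks a column with a tie in two ways. method="dense" gives ranks 1, 2, 2, 3. pct=True with the default 'average' method gives ranks 1, 2.5, 2.5, 4 divided by 4. The helper name and the plain-pandas fallback are assumptions.

```python
import pandas as pd

try:
    import bodo
    jit = bodo.jit
except ImportError:
    # Assumption: without Bodo, run the same code as plain pandas.
    def jit(func):
        return func


@jit
def rankings(df):
    dense = df.rank(method="dense")  # ties share a rank, no gaps
    pct = df.rank(pct=True)          # average rank divided by row count
    return dense, pct


df = pd.DataFrame({"A": [10, 20, 20, 30]})
dense, pct = rankings(df)
print(dense)
print(pct)
```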
pd.DataFrame.std
pandas.DataFrame.std(axis=None, skipna=None, level=None, ddof=1, numeric_only=None)
Supported Arguments
- axis: Integer (0 or 1) (must be constant at compile time)
Example Usage
pd.DataFrame.sum
pandas.DataFrame.sum(axis=None, skipna=None, level=None, numeric_only=None, min_count=0)
Supported Arguments
- axis: Integer (0 or 1) (must be constant at compile time)
Example Usage
pd.DataFrame.var
pandas.DataFrame.var(axis=None, skipna=None, level=None, ddof=1, numeric_only=None)
Supported Arguments
- axis: Integer (0 or 1) (must be constant at compile time)
Example Usage
pd.DataFrame.memory_usage
pandas.DataFrame.memory_usage(index=True, deep=False)
Supported Arguments
- index: boolean
Example Usage
Reindexing / Selection / Label manipulation
pd.DataFrame.drop
pandas.DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
- Only dropping columns is supported, either by using the columns argument or by setting axis=1 and using the labels argument.
- labels and columns require a constant string, or a constant list/tuple of string values.
- inplace is supported with a constant boolean value.
- All other arguments are unsupported.
Example Usage
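The sketch below shows the two column-dropping forms described above: the columns argument, and labels with axis=1. The helper name and the plain-pandas fallback are assumptions.

```python
import pandas as pd

try:
    import bodo
    jit = bodo.jit
except ImportError:
    # Assumption: without Bodo, run the same code as plain pandas.
    def jit(func):
        return func


@jit
def drop_columns(df):
    a = df.drop(columns=["B"])       # constant list via columns=
    b = df.drop(["A", "C"], axis=1)  # constant labels with axis=1
    return a, b


df = pd.DataFrame({"A": [1], "B": [2], "C": [3]})
a, b = drop_columns(df)
print(a)
print(b)
```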
pd.DataFrame.drop_duplicates
pandas.DataFrame.drop_duplicates(subset=None, keep='first', inplace=False, ignore_index=False)
Supported Arguments
- subset: constant list/tuple of String column names, constant list/tuple of Integer column names, constant String column names, constant Integer column names
Example Usage
pd.DataFrame.duplicated
pandas.DataFrame.duplicated(subset=None, keep='first')
Supported Arguments: None
Example Usage
pd.DataFrame.first
pandas.DataFrame.first(offset)
Supported Arguments
- offset: String or Offset type (a String argument must be a valid frequency alias)
Note
The DataFrame must have a valid DatetimeIndex and is assumed to already be sorted. This function has undefined behavior if the DatetimeIndex is not sorted.
Example Usage
>>> @bodo.jit
... def f(df, offset):
...     return df.first(offset)
>>> df = pd.DataFrame({"A": np.arange(100), "B": np.arange(100, 200)}, index=pd.date_range(start='1/1/2022', end='12/31/2024', periods=100))
>>> f(df, "2M")
                               A    B
2022-01-01 00:00:00.000000000  0  100
2022-01-12 01:27:16.363636363  1  101
2022-01-23 02:54:32.727272727  2  102
2022-02-03 04:21:49.090909091  3  103
2022-02-14 05:49:05.454545454  4  104
2022-02-25 07:16:21.818181818  5  105
pd.DataFrame.idxmax
pandas.DataFrame.idxmax(axis=0, skipna=True)
Supported Arguments: None
Example Usage
pd.DataFrame.idxmin
pandas.DataFrame.idxmin(axis=0, skipna=True)
Supported Arguments: None
Example Usage
pd.DataFrame.last
pandas.DataFrame.last(offset)
Supported Arguments
- offset: String or Offset type (a String argument must be a valid frequency alias)
Note
The DataFrame must have a valid DatetimeIndex and is assumed to already be sorted. This function has undefined behavior if the DatetimeIndex is not sorted.
Example Usage
>>> @bodo.jit
... def f(df, offset):
...     return df.last(offset)
>>> df = pd.DataFrame({"A": np.arange(100), "B": np.arange(100, 200)}, index=pd.date_range(start='1/1/2022', end='12/31/2024', periods=100))
>>> f(df, "2M")
                                A    B
2024-11-05 16:43:38.181818176  94  194
2024-11-16 18:10:54.545454544  95  195
2024-11-27 19:38:10.909090912  96  196
2024-12-08 21:05:27.272727264  97  197
2024-12-19 22:32:43.636363632  98  198
2024-12-31 00:00:00.000000000  99  199
pd.DataFrame.rename
pandas.DataFrame.rename(mapper=None, index=None, columns=None, axis=None, copy=True, inplace=False, level=None, errors='ignore')
Supported Arguments
- mapper: must be a constant dictionary (can only be used alongside axis=1)
- columns: must be a constant dictionary
- axis: Integer (can only be used alongside the mapper argument)
- copy: boolean
- inplace: must be a constant boolean
Example Usage
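The sketch below renames columns in the two supported ways. It uses columns= with a constant dict, and a constant mapper dict passed positionally together with axis=1. The helper name and the plain-pandas fallback are assumptions.

```python
import pandas as pd

try:
    import bodo
    jit = bodo.jit
except ImportError:
    # Assumption: without Bodo, run the same code as plain pandas.
    def jit(func):
        return func


@jit
def relabel(df):
    a = df.rename(columns={"A": "alpha"})  # columns= with a constant dict
    b = df.rename({"B": "beta"}, axis=1)   # mapper with axis=1
    return a, b


df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
a, b = relabel(df)
print(a.columns.tolist(), b.columns.tolist())
```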
pd.DataFrame.reset_index
pandas.DataFrame.reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill='')
Supported Arguments
- level: Integer (if specified, must drop all levels)
- drop: constant boolean
- inplace: constant boolean
Example Usage
pd.DataFrame.set_index
pandas.DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)
Supported Arguments
- keys: must be a constant string
Example Usage
pd.DataFrame.take
pandas.DataFrame.take(indices, axis=0, is_copy=None)
Supported Arguments
- indices: scalar Integer, Pandas Integer Array, NumPy Integer Array, Integer Series
Example Usage
Missing data handling
pd.DataFrame.dropna
pandas.DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
Supported Arguments
- how: constant String, either "all" or "any"
- thresh: Integer
- subset: constant list/tuple of String column names, constant list/tuple of Integer column names, constant String column names, constant Integer column names
Example Usage
>>> @bodo.jit
... def f():
...     df = pd.DataFrame({"A": [1,2,3,None], "B": [4, 5, None, None], "C": [6, None, None, None]})
...     df_1 = df.dropna(how="all", subset=["B", "C"])
...     df_2 = df.dropna(thresh=3)
...     formatted_out = "\n".join([df_1.to_string(), df_2.to_string()])
...     return formatted_out
>>> print(f())
   A  B     C
0  1  4     6
1  2  5  <NA>
   A  B  C
0  1  4  6
pd.DataFrame.fillna
pandas.DataFrame.fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None)
Supported Arguments
- value: various scalars (must be of the same type as the filled column)
- inplace: constant boolean (inplace is not supported alongside method)
- method: one of bfill, backfill, ffill, or pad (must be constant at compile time; inplace is not supported alongside method)
Example Usage
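The sketch below fills missing values with a scalar whose type (float) matches the filled columns, as the value note above requires. Only the value form is shown, because newer pandas releases deprecate the method keyword of fillna. The helper name and the plain-pandas fallback are assumptions.

```python
import numpy as np
import pandas as pd

try:
    import bodo
    jit = bodo.jit
except ImportError:
    # Assumption: without Bodo, run the same code as plain pandas.
    def jit(func):
        return func


@jit
def fill_missing(df):
    # The fill value is a float because both columns are float.
    return df.fillna(0.0)


df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [np.nan, 5.0, 6.0]})
filled = fill_missing(df)
print(filled)
```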
pd.DataFrame.replace
pandas.DataFrame.replace(to_replace=None, value=None, inplace=False, limit=None, regex=False, method='pad')
Supported Arguments
- to_replace: various scalars (required argument)
- value: various scalars (must be of the same type as to_replace)
Example Usage
Reshaping, sorting, transposing
pd.DataFrame.explode
pandas.DataFrame.explode(column, ignore_index=False)
Supported Arguments
- column: constant column label or list of labels
Example Usage
pd.DataFrame.melt
pandas.DataFrame.melt(id_vars=None, value_vars=None, var_name=None, value_name='value', col_level=None)
Supported Arguments
- id_vars: constant column label or list of labels
- value_vars: constant column label or list of labels
Example Usage
>>> @bodo.jit
... def f(df, id_vars, value_vars):
...     return df.melt(id_vars, value_vars)
>>> df = pd.DataFrame({"A": ["a", "b", "c"], 'B': [1, 3, 5], 'C': [2, 4, 6]})
>>> f(df, ["A"], ["B", "C"])
   A variable  value
0  a        B      1
1  b        B      3
2  c        B      5
3  a        C      2
4  b        C      4
5  c        C      6
Note
To offer increased performance, row ordering and corresponding Index value may not match Pandas when run on multiple cores.
pd.DataFrame.pivot
pandas.DataFrame.pivot(values=None, index=None, columns=None)
Supported Arguments
- values: constant column label or list of labels
- index: constant column label or list of labels
- columns: constant column label
Note
The number of columns and names of the output DataFrame won't be known at compile time. To update typing information on the DataFrame, you should pass it back to Python.
Example Usage
pd.DataFrame.pivot_table
pandas.DataFrame.pivot_table(values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, dropna=True, margins_name='All', observed=False, sort=True)
Supported Arguments
- values: constant column label or list of labels
- index: constant column label or list of labels
- columns: constant column label
- aggfunc: String constant
Note
This code takes two different paths depending on whether pivot values are annotated. When pivot values are annotated, the output columns are set to the annotated values. For example, @bodo.jit(pivots={'pt': ['small', 'large']}) declares that the output pivot table pt will have columns called small and large.
If pivot values are not annotated, then the number of columns and names of the output DataFrame won't be known at compile time. To update typing information on the DataFrame, you should pass it back to Python.
Example Usage
>>> @bodo.jit(pivots={'pivoted_tbl': ['X', 'Y']})
... def f():
...     df = pd.DataFrame({"A": ["X","X","X","X","Y","Y"], "B": [1,2,3,4,5,6], "C": [10,11,12,20,21,22]})
...     pivoted_tbl = df.pivot_table(columns="A", index="B", values="C", aggfunc="mean")
...     return pivoted_tbl
>>> f()
      X     Y
B
1  10.0   NaN
2  11.0   NaN
3  12.0   NaN
4  20.0   NaN
5   NaN  21.0
6   NaN  22.0
pd.DataFrame.sample
pandas.DataFrame.sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, ignore_index=False)
Supported Arguments
- n: Integer
- frac: Float
- replace: boolean
Example Usage
pd.DataFrame.sort_index
pandas.DataFrame.sort_index(axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True, ignore_index=False, key=None)
Supported Arguments
- ascending: boolean
- na_position: constant String ("first" or "last")
Example Usage
pd.DataFrame.sort_values
pandas.DataFrame.sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False, key=None)
Supported Arguments
- by: constant String or constant list of strings
- ascending: boolean, or list/tuple of boolean with length equal to the number of key columns
- inplace: constant boolean
- na_position: constant String ("first" or "last"), or constant list/tuple of String with length equal to the number of key columns
Example Usage
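The sketch below sorts by one key, then by two keys with one ascending flag per key column. Because B is sorted descending within each value of A, the expected B order is 3, 2, 4, 1. The helper name and the plain-pandas fallback are assumptions.

```python
import pandas as pd

try:
    import bodo
    jit = bodo.jit
except ImportError:
    # Assumption: without Bodo, run the same code as plain pandas.
    def jit(func):
        return func


@jit
def sort_rows(df):
    by_one = df.sort_values("A")
    # One ascending flag per key column: A ascending, B descending.
    by_two = df.sort_values(["A", "B"], ascending=[True, False])
    return by_one, by_two


df = pd.DataFrame({"A": [2, 1, 2, 1], "B": [1.0, 3.0, 4.0, 2.0]})
by_one, by_two = sort_rows(df)
print(by_two)
```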
pd.DataFrame.to_string
pandas.DataFrame.to_string(buf=None, columns=None, col_space=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, justify=None, max_rows=None, min_rows=None, max_cols=None, show_dimensions=False, decimal='.', line_width=None, max_colwidth=None, encoding=None)
Supported Arguments
- buf
- columns
- col_space
- header
- index
- na_rep
- formatters
- float_format
- sparsify
- index_names
- justify
- max_rows
- min_rows
- max_cols
- show_dimensions
- decimal
- line_width
- max_colwidth
- encoding
Example Usage
>>> @bodo.jit
... def f():
...     df = pd.DataFrame({"A": [1,2,3]})
...     return df.to_string()
>>> print(f())
   A
0  1
1  2
2  3
Note
- This function is not optimized.
- When called on a distributed dataframe, the string returned for each rank will be reflective of the dataframe for that rank.
Combining / joining / merging
pd.DataFrame.append
pandas.DataFrame.append(other, ignore_index=False, verify_integrity=False, sort=False)
Supported Arguments
- other: DataFrame, list/tuple of DataFrame
- ignore_index: constant boolean
Example Usage
pd.DataFrame.assign
pandas.DataFrame.assign(**kwargs)
Example Usage
>>> @bodo.jit
... def f():
...     df = pd.DataFrame({"A": [1,2,3], "B": [4,5,6]})
...     df2 = df.assign(C = 2 * df["B"], D = lambda x: x.C - 1)
...     return df2
>>> f()
   A  B   C   D
0  1  4   8   7
1  2  5  10   9
2  3  6  12  11
Note
Arguments can be JIT functions, lambda functions, or values that can be used to initialize a Pandas Series.
pd.DataFrame.join
pandas.DataFrame.join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)
Supported Arguments
- other: DataFrame
- on: constant string column name, constant list/tuple of column names
Example Usage
>>> @bodo.jit
... def f():
...     df = pd.DataFrame({"A": [1,1,3], "B": [4,5,6]})
...     return df.join(on="A", other=pd.DataFrame({"C": [-1,-2,-3], "D": [4,5,6]}))
>>> f()
   A  B     C     D
0  1  4    -2     5
1  1  5    -2     5
2  3  6  <NA>  <NA>
Note
Joined dataframes cannot have common columns. The output dataframe is not sorted by default, for better parallel performance.
pd.DataFrame.merge
pandas.DataFrame.merge(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)
Note
See pd.merge for the full list of supported arguments and more examples.
Example Usage
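The sketch below performs an inner merge on a shared key column. Parallel output may not come back in key order, so the result is sorted in regular Python before it is inspected. The table contents and helper name are hypothetical, and the plain-pandas fallback is an assumption.

```python
import pandas as pd

try:
    import bodo
    jit = bodo.jit
except ImportError:
    # Assumption: without Bodo, run the same code as plain pandas.
    def jit(func):
        return func


@jit
def join_tables(left, right):
    # Inner join keeps only keys present in both tables.
    return left.merge(right, on="key", how="inner")


left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})
merged = join_tables(left, right).sort_values("key")
print(merged)
```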
Time series-related
pd.DataFrame.shift
pandas.DataFrame.shift(periods=1, freq=None, axis=0, fill_value=NoDefault.no_default)
Supported Arguments
- periods: Integer
Example Usage
>>> @bodo.jit
... def f():
...     df = pd.DataFrame({"A": [1,1,3], "B": [4,5,6]})
...     return df.shift(1)
>>> f()
     A    B
0  NaN  NaN
1  1.0  4.0
2  1.0  5.0
Note
Only supported for dataframes containing numeric, boolean, datetime.date and string types.
Serialization, IO, Conversion
Also see S3 and HDFS configuration requirements and more on Scalable File I/O.
pd.DataFrame.to_csv
pandas.DataFrame.to_csv
- The compression argument defaults to None in JIT code. This is the only supported value of this argument.
- The mode argument supports only the default value "w".
- The errors argument supports only the default value "strict".
- The storage_options argument supports only the default value None.
pd.DataFrame.to_json
pandas.DataFrame.to_json
pd.DataFrame.to_parquet
pandas.DataFrame.to_parquet(path, engine='auto', compression='snappy', index=None, partition_cols=None, storage_options=None)
- path is a required argument and must be a string. When writing distributed dataframes, the path refers to a directory of parquet files.
- The engine argument only supports "auto" and "pyarrow". Default: "auto", which uses the pyarrow engine.
- The compression argument must be one of: "snappy", "gzip", "brotli", None. Default: "snappy".
- The index argument must be a constant bool or None. Default: None.
- The partition_cols argument is supported in most cases, except when the columns in the DataFrame cannot be determined at compile time. This must be a list of column names or None. Default: None.
- The storage_options argument supports only the default value None.
- The row_group_size argument can be used to specify the maximum size of the row groups in the generated parquet files; the actual size of the written row groups may be smaller than this value. This must be an integer. If not specified, Bodo writes row groups with 1M rows.
Note
Bodo writes multiple files in parallel (one per core), and the total number of row groups across all files is roughly max(num_cores, total_rows / row_group_size). The size of the row groups can affect read performance significantly. In general, the dataset should have at least as many row groups as the number of cores used for reading, but ideally many more. At the same time, the row groups shouldn't be too small, since this can lead to overheads at read time. For more details, refer to the parquet file format.
Example Usage
pd.DataFrame.to_sql
pandas.DataFrame.to_sql
- See Example Usage and more system-specific instructions.
- Argument con is supported, but only in string form. A SQLAlchemy connectable is not supported.
- Arguments name, schema, if_exists, index, index_label, dtype, and method are supported.
- Argument chunksize is not supported.
Plotting
pd.DataFrame.plot
pandas.DataFrame.plot(x=None, y=None, kind="line", figsize=None, xlabel=None, ylabel=None, title=None, legend=True, fontsize=None, xticks=None, yticks=None, ax=None)
Supported Arguments
- x: constant String column name, constant integer
- y: constant String column name, constant integer
- kind: constant String ("line" or "scatter")
- figsize: constant numeric tuple (width, height)
- xlabel: constant String
- ylabel: constant String
- title: constant String
- legend: boolean
- fontsize: integer
- xticks: constant Tuple
- yticks: constant Tuple
- ax: Matplotlib Axes object