GroupBy¶
pd.DataFrame.groupby
¶
-
pandas.DataFrame. groupby (by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=NoDefault.no_default, observed=False, dropna=True)Supported Arguments
by
: Column label or list of column labels- Must be constant at Compile Time
- This argument is required
as_index
: Boolean- Must be constant at Compile Time
dropna
: Boolean- Must be constant at Compile Time
Example Usage
pd.Series.groupby
¶
-
pandas.Series. groupby (by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=NoDefault.no_default, observed=False, dropna=True)Supported Arguments
by
: Array-like or Series data. This is not supported with Decimal or Categorical data.- Must be constant at Compile Time
level
: integer- Must be constant at Compile Time
- Only
level=0
is supported and not with MultiIndex.
Important
You must provide exactly one of
by
andlevel
Example Usage
>>> @bodo.jit ... def f(S, by_series): ... return S.groupby(by_series).count() >>> S = pd.Series([1, 2, 24, None] * 5) >>> by_series = pd.Series(["421", "f31"] * 10) >>> f(S, by_series) 421 10 f31 5 Name: , dtype: int64
Note
Series.groupby
doesn't currently keep the name of the original Series.
pd.core.groupby.Groupby.apply
¶
-
pandas.core.groupby.Groupby. apply (func, *args, **kwargs)Supported Arguments
func
: JIT function, callable defined within a JIT function that returns a DataFrame or Series- Additional arguments for
func
can be passed as additional arguments.
- Additional arguments for
Example Usage
>>> @bodo.jit ... def f(df, y): ... return df.groupby("B", dropna=True).apply(lambda group, y: group.sum(axis=1) + y, y=y) >>> df = pd.DataFrame( ... { ... "A": [1, 2, 24, None] * 5, ... "B": ["421", "f31"] * 10, ... "C": [1.51, 2.421, 233232, 12.21] * 5 ... } ... ) >>> y = 4 >>> f(df, y) B 421 0 6.510 2 8.421 4 233260.000 6 16.210 8 6.510 10 8.421 12 233260.000 14 16.210 16 6.510 18 8.421 f31 1 233260.000 3 16.210 5 6.510 7 8.421 9 233260.000 11 16.210 13 6.510 15 8.421 17 233260.000 19 16.210 dtype: float64
pd.core.groupby.Groupby.agg
¶
-
pandas.core.groupby.Groupby. agg (func, *args, **kwargs)Supported Arguments
func
: JIT function, callable defined within a JIT function, constant dictionary mapping column name to a function- Additional arguments for
func
can be passed as additional arguments.
- Additional arguments for
Note
- Passing a list of functions is also supported if only one output column is selected.
- Output column names can be specified using keyword arguments and
pd.NamedAgg()
.
Example Usage
pd.core.groupby.DataFrameGroupby.aggregate
¶
-
pandas.core.groupby.DataFrameGroupby. aggregate (func, *args, **kwargs)Supported Arguments
func
: JIT function, callable defined within a JIT function, constant dictionary mapping column name to a function- Additional arguments for
func
can be passed as additional arguments.
- Additional arguments for
Note
- Passing a list of functions is also supported if only one output column is selected.
- Output column names can be specified using keyword arguments and
pd.NamedAgg()
.
Example Usage
pd.core.groupby.DataFrameGroupby.transform
¶
-
pandas.core.groupby.DataFrameGroupby. transform (func, *args, engine=None, engine_kwargs=None, **kwargs)Supported Arguments
func
: Constant string, Python function from the builtins module that matches a supported operation- Numpy functions cannot be provided.
Note
The supported builtin functions are
'count'
,'first'
,'last'
,'min'
,'max'
,'mean'
,'median'
,'nunique'
,'prod'
,'std'
,'sum'
, and'var'
Example Usage
>>> @bodo.jit ... def f(df): ... return df.groupby("B", dropna=True).transform(max) >>> df = pd.DataFrame( ... { ... "A": [1, 2, 24, None] * 5, ... "B": ["421", "f31"] * 10, ... "C": [1.51, 2.421, 233232, 12.21] * 5 ... } ... ) >>> f(df) A C 0 24.0 233232.00 1 2.0 12.21 2 24.0 233232.00 3 2.0 12.21 4 24.0 233232.00 5 2.0 12.21 6 24.0 233232.00 7 2.0 12.21 8 24.0 233232.00 9 2.0 12.21 10 24.0 233232.00 11 2.0 12.21 12 24.0 233232.00 13 2.0 12.21 14 24.0 233232.00 15 2.0 12.21 16 24.0 233232.00 17 2.0 12.21 18 24.0 233232.00 19 2.0 12.21
pd.core.groupby.Groupby.pipe
¶
-
pandas.core.groupby.Groupby. pipe (func, *args, **kwargs)Supported Arguments
func
: JIT function, callable defined within a JIT function.- Additional arguments for
func
can be passed as additional arguments.
- Additional arguments for
Note
func
cannot be a tupleExample Usage
>>> @bodo.jit ... def f(df, y): ... return df.groupby("B").pipe(lambda grp, y: grp.sum() - y, y=y) >>> df = pd.DataFrame( ... { ... "A": [1, 2, 24, None] * 5, ... "B": ["421", "f31"] * 10, ... "C": [1.51, 2.421, 233232, 12.21] * 5 ... } ... ) >>> y = 5 >>> f(df, y) A C B 421 120.0 1166162.550 f31 5.0 68.155
pd.core.groupby.Groupby.count
¶
-
pandas.core.groupby.Groupby. count ()
Example Usage
pd.core.groupby.Groupby.cumsum
¶
-
pandas.core.groupby.Groupby. cumsum (axis=0)Note
cumsum
is only supported on numeric columns and is not supported on boolean columnsExample Usage
>>> @bodo.jit ... def f(df): ... return df.groupby("B").cumsum() >>> df = pd.DataFrame( ... { ... "A": [1, 2, 24, None] * 5, ... "B": ["421", "f31"] * 10, ... "C": [1.51, 2.421, 233232, 12.21] * 5 ... } ... ) >>> f(df) A C 0 1.0 1.510 1 2.0 2.421 2 25.0 233233.510 3 NaN 14.631 4 26.0 233235.020 5 4.0 17.052 6 50.0 466467.020 7 NaN 29.262 8 51.0 466468.530 9 6.0 31.683 10 75.0 699700.530 11 NaN 43.893 12 76.0 699702.040 13 8.0 46.314 14 100.0 932934.040 15 NaN 58.524 16 101.0 932935.550 17 10.0 60.945 18 125.0 1166167.550 19 NaN 73.155
pd.core.groupby.Groupby.first
¶
-
pandas.core.groupby.Groupby. first (numeric_only=False, min_count=-1)Note
first
is not supported on columns with nested array typesExample Usage
pd.core.groupby.Groupby.head
¶
-
pandas.core.groupby.Groupby. head (n=5)Supported Arguments
n
: Non-negative integer- Must be constant at Compile Time
Example Usage
>>> @bodo.jit ... def f(df): ... return df.groupby("B").head() >>> df = pd.DataFrame( ... { ... "A": [1, 2, 24, None] * 5, ... "B": ["421", "f31"] * 10, ... "C": [1.51, 2.421, 233232, 12.21] * 5 ... } ... ) >>> f(df) A B C 0 1.0 421 1.510 1 2.0 f31 2.421 2 24.0 421 233232.000 3 NaN f31 12.210 4 1.0 421 1.510 5 2.0 f31 2.421 6 24.0 421 233232.000 7 NaN f31 12.210 8 1.0 421 1.510 9 2.0 f31 2.421
pd.core.groupby.Groupby.last
¶
-
pandas.core.groupby.Groupby. last (numeric_only=False, min_count=-1)Note
last
is not supported on columns with nested array typesExample Usage
pd.core.groupby.Groupby.max
¶
-
pandas.core.groupby.Groupby. max (numeric_only=False, min_count=-1)Note
max
is not supported on columns with nested array types.- Categorical columns must be ordered.
Example Usage
pd.core.groupby.Groupby.mean
¶
-
pandas.core.groupby.Groupby. mean (numeric_only=NoDefault.no_default)Note
mean
is only supported on numeric columns and is not supported on boolean columnExample Usage
pd.core.groupby.Groupby.median
¶
-
pandas.core.groupby.Groupby. median (numeric_only=NoDefault.no_default)Note
median
is only supported on numeric columns and is not supported on boolean columnExample Usage
pd.core.groupby.Groupby.min
¶
-
pandas.core.groupby.Groupby. min (numeric_only=False, min_count=-1)Note
min
is not supported on columns with nested array types- Categorical columns must be ordered.
Example Usage
pd.core.groupby.Groupby.prod
¶
-
pandas.core.groupby.Groupby. prod (numeric_only=NoDefault.no_default, min_count=0)Note
prod
is not supported on columns with nested array typesExample Usage
pd.core.groupby.Groupby.rolling
¶
-
pandas.core.groupby.Groupby. rolling (window, min_periods=None, center=False, win_type=None, on=None, axis=0, closed=None, method='single')Supported Arguments
window
: Integer, String, Datetime, Timedeltamin_periods
: Integercenter
: Booleanon
: Column label- Must be constant at Compile Time
Note
This is equivalent to performing the DataFrame API on each groupby. All operations of the rolling API can be used with groupby.
Example Usage
>>> @bodo.jit ... def f(df): ... return df.groupby("B").rolling(2).mean >>> df = pd.DataFrame( ... { ... "A": [1, 2, 24, None] * 5, ... "B": ["421", "f31"] * 10, ... "C": [1.51, 2.421, 233232, 12.21] * 5 ... } ... ) >>> f(df) A C B 421 0 NaN NaN 2 NaN NaN 4 12.5 116616.7550 6 NaN 7.3155 8 12.5 116616.7550 10 NaN 7.3155 12 12.5 116616.7550 14 NaN 7.3155 16 12.5 116616.7550 18 NaN 7.3155 f31 1 12.5 116616.7550 3 NaN 7.3155 5 12.5 116616.7550 7 NaN 7.3155 9 12.5 116616.7550 11 NaN 7.3155 13 12.5 116616.7550 15 NaN 7.3155 17 12.5 116616.7550 19 NaN 7.3155
pd.core.groupby.Groupby.size
¶
-
pandas.core.groupby.Groupby. size ()Example Usage
pd.core.groupby.Groupby.std
¶
-
pandas.core.groupby.Groupby. std (ddof=1)Note
std
is only supported on numeric columns and is not supported on boolean columnExample Usage
pd.core.groupby.Groupby.sum
¶
-
pandas.core.groupby.Groupby. sum (numeric_only=NoDefault.no_default, min_count=0)Note
sum
is not supported on columns with nested array typesExample Usage
pd.core.groupby.Groupby.var
¶
-
pandas.core.groupby.Groupby. var (ddof=1)Note
var
is only supported on numeric columns and is not supported on boolean columnExample Usage
pd.core.groupby.DataFrameGroupby.idxmax
¶
-
pandas.core.groupby.DataFrameGroupby. idxmax (axis=0, skipna=True)
Example Usage
pd.core.groupby.DataFrameGroupby.idxmin
¶
-
pandas.core.groupby.DataFrameGroupby. idxmin (axis=0, skipna=True)
Example Usage
pd.core.groupby.DataFrameGroupby.nunique
¶
-
pandas.core.groupby.DataFrameGroupby. nunique (dropna=True)Supported Arguments
dropna
: boolean
Note
nunique
is not supported on columns with nested array typesExample Usage
pd.core.groupby.DataFrameGroupby.shift
¶
-
pandas.core.groupby.DataFrameGroupby. shift (periods=1, freq=None, axis=0, fill_value=None)Note
shift
is not supported on columns with nested array typesExample Usage
>>> @bodo.jit ... def f(df): ... return df.groupby("B").shift() >>> df = pd.DataFrame( ... { ... "A": [1, 2, 24, None] * 5, ... "B": ["421", "f31"] * 10, ... "C": [1.51, 2.421, 233232, 12.21] * 5 ... } ... ) >>> f(df) A C 0 NaN NaN 1 NaN NaN 2 1.0 1.510 3 2.0 2.421 4 24.0 233232.000 5 NaN 12.210 6 1.0 1.510 7 2.0 2.421 8 24.0 233232.000 9 NaN 12.210 10 1.0 1.510 11 2.0 2.421 12 24.0 233232.000 13 NaN 12.210 14 1.0 1.510 15 2.0 2.421 16 24.0 233232.000 17 NaN 12.210 18 1.0 1.510 19 2.0 2.421
pd.core.groupby.SeriesGroupBy.value_counts
¶
-
pandas.core.groupby.SeriesGroupby. value_counts (normalize=False, sort=True, ascending=False, bins=None, dropna=True)Supported Arguments
ascending
: boolean- Must be constant at Compile Time
Example Usage