bodo.pandas.BodoDataFrame.apply
BodoDataFrame.apply(
func,
axis=0,
raw=False,
result_type=None,
args=(),
by_row="compat",
engine="bodo",
engine_kwargs=None,
**kwargs,
) -> BodoSeries
Apply a function along an axis of the BodoDataFrame.
Currently, only functions that return a scalar value for each row (i.e., axis=1) are supported.
All other uses fall back to pandas.
See pandas.DataFrame.apply for more details.
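As a rough mental model of the supported axis=1 case, the plain-Python sketch below calls func once per row. It passes any extra positional and keyword arguments through, in the same way pandas.DataFrame.apply forwards args and **kwargs, and it collects one scalar per row. The rows-as-dicts representation and the apply_rows name are hypothetical illustrations, not Bodo internals.

```python
from typing import Any, Callable


def apply_rows(
    rows: list[dict[str, Any]],
    func: Callable[..., Any],
    args: tuple = (),
    **kwargs: Any,
) -> list[Any]:
    """Hypothetical model of DataFrame.apply(func, axis=1, args=args, **kwargs).

    Each row is a dict mapping column name to value; func must return a scalar.
    """
    # One call per row: func(row, *args, **kwargs), mirroring pandas semantics.
    return [func(row, *args, **kwargs) for row in rows]


rows = [{"a": a, "b": b} for a, b in zip([1, 2, 3], [4, 5, 6])]
# Similar in spirit to bdf.apply(f, axis=1, args=(1,), scale=10)
out = apply_rows(rows, lambda x, k, scale: (x["a"] + k) * scale, args=(1,), scale=10)
print(out)  # [20, 30, 40]
```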
By default, bodo.jit is applied to func. If this JIT compilation fails for any reason, func is run as a normal Python function instead. If compilation succeeds, the JIT-compiled function is used for apply, which avoids the overhead of running Python code from within the execution pipeline.
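The compile-then-fall-back behavior described above follows a common pattern. The sketch below shows only the control flow. try_compile is a hypothetical stand-in for bodo.jit that accepts only functions carrying a hypothetical jittable attribute, so both paths can be exercised without Bodo installed. It does not reflect how Bodo actually decides whether a function compiles.

```python
from typing import Any, Callable


class CompilationError(Exception):
    """Hypothetical error raised when a UDF cannot be JIT compiled."""


def try_compile(func: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical stand-in for bodo.jit: 'compiles' only functions marked jittable."""
    if not getattr(func, "jittable", False):
        raise CompilationError(f"cannot compile {func.__name__}")
    return func  # a real JIT would return a compiled function here


def prepare_udf(func: Callable[..., Any]) -> tuple[Callable[..., Any], str]:
    """Return the function to run per row and which path was chosen."""
    try:
        return try_compile(func), "jit"
    except Exception:
        # Compilation failed for any reason: run func as ordinary Python.
        return func, "python"


def add_one(row):
    return row["a"] + 1


add_one.jittable = True  # pretend this UDF compiles


def describe(row):
    return f"row with a={row['a']}"  # no jittable marker: pretend this fails to compile


print(prepare_udf(add_one)[1])   # jit
print(prepare_udf(describe)[1])  # python
```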
Note
If JIT compilation fails, calling BodoDataFrame.apply immediately executes a plan to generate a small sample of the BodoDataFrame. It then calls pandas.DataFrame.apply on that sample to infer output types before proceeding with lazy evaluation.
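The sampling step can be pictured as follows: run func in plain Python on the first few rows and record the result type, which then determines the output type of the lazy result. This is a simplified, hypothetical sketch. The sample size, the infer_output_type name, and the use of Python types in place of Arrow dtypes are all assumptions.

```python
from typing import Any, Callable


def infer_output_type(
    rows: list[dict[str, Any]],
    func: Callable[[dict[str, Any]], Any],
    sample_size: int = 5,
) -> type:
    """Hypothetical: infer func's scalar result type from a small sample of rows."""
    sample = rows[:sample_size]  # materialize only a few rows
    if not sample:
        raise ValueError("cannot infer an output type from an empty sample")
    result_types = {type(func(row)) for row in sample}
    if len(result_types) != 1:
        raise TypeError(f"inconsistent result types in sample: {result_types}")
    return result_types.pop()


rows = [{"a": a, "c": c} for a, c in zip([1, 2, 3] * 4, ["a", "b", "c"] * 4)]
print(infer_output_type(rows, lambda x: x["a"] + 1))      # <class 'int'>
print(infer_output_type(rows, lambda x: x["c"].upper()))  # <class 'str'>
```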
Note
Functions passed to func (whether or not they are explicitly wrapped with a JIT decorator) must not use Numba's with objmode context. Doing so will result in a runtime exception.
Parameters
- func : function: Function to apply to each row.
- axis : {0 or 1}, default 0: The axis to apply the function over. axis=0 (the default) will fall back to pandas.DataFrame.apply.
- args : tuple: Additional positional arguments to pass to func.
- engine : {'bodo', 'python', 'numba'}, default 'bodo': The engine to use to compute the UDF. By default, engine='bodo' applies bodo.jit to func, with the fallback to Python described above. Use engine='python' to avoid any JIT compilation. engine='numba' will trigger a fallback to pandas.DataFrame.apply.
- **kwargs: Additional keyword arguments to pass as keyword arguments to func.
- All other parameters will trigger a fallback to pandas.DataFrame.apply if a non-default value is provided.
Returns
- BodoSeries: The result of applying func to each row in the BodoDataFrame.
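The fallback rules in the parameter list can be summarized as a single predicate. The sketch below is a hypothetical helper written from the rules on this page, not a Bodo API. It cannot check the remaining requirement that func return a scalar for each row, because that is only known once func runs.

```python
from typing import Any, Optional


def falls_back_to_pandas(
    axis: int = 0,
    engine: str = "bodo",
    raw: bool = False,
    result_type: Optional[str] = None,
    by_row: Any = "compat",
    engine_kwargs: Optional[dict] = None,
) -> bool:
    """Hypothetical: True if these arguments would send apply to pandas.DataFrame.apply."""
    if axis != 1:  # only row-wise application is supported natively
        return True
    if engine == "numba":  # engine='numba' always falls back
        return True
    # A non-default value for any remaining parameter triggers a fallback.
    return (
        raw is not False
        or result_type is not None
        or by_row != "compat"
        or engine_kwargs is not None
    )


print(falls_back_to_pandas())                       # True: default axis=0 falls back
print(falls_back_to_pandas(axis=1))                 # False
print(falls_back_to_pandas(axis=1, engine="numba")) # True
```

Because the default axis is 0, axis=1 has to be passed explicitly for apply to avoid the pandas fallback.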
Example
import bodo.pandas as bd
bdf = bd.DataFrame(
{
"a": bd.array([1, 2, 3] * 4, "Int64"),
"b": bd.array([4, 5, 6] * 4, "Int64"),
"c": ["a", "b", "c"] * 4,
},
)
out_bodo = bdf.apply(lambda x: x["a"] + 1, axis=1)
print(type(out_bodo))
print(out_bodo)
Output:
<class 'bodo.pandas.series.BodoSeries'>
0 2
1 3
2 4
3 2
4 3
5 4
6 2
7 3
8 4
9 2
10 3
11 4
dtype: int64[pyarrow]