bodo.pandas.BodoDataFrame.apply

BodoDataFrame.apply(
    func,
    axis=0,
    raw=False,
    result_type=None,
    args=(),
    by_row="compat",
    engine="bodo",
    engine_kwargs=None,
    **kwargs,
) -> BodoSeries

Apply a function along an axis of the BodoDataFrame.

Currently, only functions that return a scalar value for each row (i.e. axis=1) are supported. All other uses fall back to pandas. See pandas.DataFrame.apply for more details.

By default, bodo.jit is applied to func. If JIT compilation fails for any reason, func is run as a normal Python function. If compilation succeeds, the JIT-compiled function is used for apply, which avoids the overhead of running Python code from within the execution pipeline.
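The fallback behavior described above follows a common compile-or-fall-back pattern. The sketch below is conceptual only and is not Bodo's implementation: resolve_udf, compile_fn, and failing_compiler are hypothetical names, with compile_fn standing in for whatever step applies bodo.jit internally.

```python
def resolve_udf(func, compile_fn):
    """Return (callable, used_jit) following a compile-or-fall-back policy.

    compile_fn is a hypothetical stand-in for a JIT compiler; it is expected
    to raise an exception if func cannot be compiled.
    """
    try:
        return compile_fn(func), True
    except Exception:
        # Compilation failed for any reason: run func as plain Python.
        return func, False


def add_one(row):
    # A simple row-wise UDF that returns a scalar.
    return row["a"] + 1


def failing_compiler(func):
    # Simulates a compiler that rejects the function.
    raise TypeError("unsupported construct")


udf, used_jit = resolve_udf(add_one, failing_compiler)
print(used_jit, udf({"a": 1}))
```

In this sketch the caller gets a working callable either way; only the used_jit flag records whether the compiled path was taken.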

Note

If JIT compilation of func fails, calling BodoDataFrame.apply immediately executes a plan to generate a small sample of the BodoDataFrame, then calls pandas.DataFrame.apply on that sample to infer output types before proceeding with lazy evaluation.

Note

Functions passed to func (whether explicitly wrapped with a JIT decorator or not) may not use Numba's with objmode context. Doing so will result in a runtime exception.

Parameters

func : function: Function to apply to each row.

axis : {0 or 1}, default 0: The axis to apply the function over. axis=0 will fall back to pandas.DataFrame.apply.

args : tuple: Additional positional arguments to pass to func.

engine : {'bodo', 'python', 'numba'}, default 'bodo': The engine to use to compute the UDF. By default, engine='bodo' applies bodo.jit to func, with the fallback to Python described above. Use engine='python' to avoid any JIT compilation. engine='numba' triggers a fallback to pandas.DataFrame.apply.

**kwargs: Additional keyword arguments to pass as keyword arguments to func.

All other parameters will trigger a fallback to pandas.DataFrame.apply if a non-default value is provided.
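As a sketch of how the parameters above combine, the hypothetical UDF scaled_offset below receives one extra positional argument through args and one keyword argument through **kwargs. The column names and values are illustrative, and run_with_bodo is a hypothetical helper that keeps the Bodo import local so the UDF can be checked on its own. engine="python" is used here to skip JIT compilation.

```python
def scaled_offset(row, offset, scale=1):
    # row is indexed by column name and yields a scalar per column.
    return (row["a"] + offset) * scale


def run_with_bodo():
    # Requires Bodo to be installed; not called automatically.
    import bodo.pandas as bd

    bdf = bd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
    # args supplies offset; scale is forwarded to the UDF through **kwargs.
    return bdf.apply(scaled_offset, axis=1, args=(10,), engine="python", scale=2)


# The UDF can be tried on a plain dict standing in for a single row.
print(scaled_offset({"a": 1, "b": 4}, 10, scale=2))
```

Calling run_with_bodo() in an environment with Bodo installed should return a BodoSeries. Leaving engine at its default would instead attempt JIT compilation of scaled_offset first.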

Returns

BodoSeries: The result of applying func to each row in the BodoDataFrame.

Example

import bodo.pandas as bd

bdf = bd.DataFrame(
    {
        "a": bd.array([1, 2, 3] * 4, "Int64"),
        "b": bd.array([4, 5, 6] * 4, "Int64"),
        "c": ["a", "b", "c"] * 4,
    },
)

out_bodo = bdf.apply(lambda x: x["a"] + 1, axis=1)

print(type(out_bodo))
print(out_bodo)

Output:

<class 'bodo.pandas.series.BodoSeries'>
0     2
1     3
2     4
3     2
4     3
5     4
6     2
7     3
8     4
9     2
10    3
11    4
dtype: int64[pyarrow]