bodo.pandas.BodoDataFrame.map_partitions¶
Apply a function to groups of rows in a DataFrame and return a DataFrame or Series of the same size.
If the input DataFrame is lazy (i.e. its plan has not been evaluated yet) and func returns a Series, then the output will be lazy as well. When the lazy output is evaluated, func will take batches of rows from the input DataFrame. In the cases where func returns a DataFrame or the input DataFrame is not lazy, each worker will call func on their entire local chunk of the input DataFrame.
Parameters
-
func : Callable: A function that takes in a DataFrame and returns a DataFrame or Series (with the same number of rows). Currently, functions that return a DataFrame will trigger execution even if the input DataFrame has a lazy plan.
-
*args: Additional positional arguments to pass to func.
-
**kwargs: Additional keyword arguments to pass as keyword arguments to func.
Returns
-
BodoSeries or BodoDataFrame: The result of applying func to the BodoDataFrame.
Example
import bodo.pandas as bd
bdf = bd.DataFrame(
{"foo": range(15), "bar": range(15, 30)}
)
bdf_mapped = bdf.map_partitions(lambda df_: df_.foo + df_.bar)
print(bdf_mapped)
Output:
0 15
1 17
2 19
3 21
4 23
5 25
6 27
7 29
8 31
9 33
10 35
11 37
12 39
13 41
14 43
dtype: int64[pyarrow]