Skip to content

Using Regular Python inside JIT (Object Mode)

Regular Python functions and Bodo JIT functions can be used together in applications arbitrarily, but there are cases where regular Python code needs to be used inside JIT code. For example, you may want to use Bodo's parallel constructs with some code that does not have JIT support yet. Object Mode allows switching to a Python interpreted context to be able to run non-jittable code. The main requirement is that the user has to specify the type of variables used in later JIT code.

For example, the following code calls a non-JIT function on rows of a distributed dataframe. The code inside with bodo.objmode runs as regular Python, but variable y is returned to JIT code (since it is used after the with block). Therefore, the y="float64" type annotation is required.

import pandas as pd
import numpy as np
import bodo
import scipy.special as sc


def my_non_jit_function(a, b):
    return np.log(a) + sc.entr(b)


@bodo.jit
def f(row):
    with bodo.objmode(y="float64"):
        y = my_non_jit_function(row.A, row.B)
    return y


@bodo.jit
def objmode_example(n):
    df = pd.DataFrame({"A": np.random.ranf(n), "B": np.arange(n)})
    df["C"] = df.apply(f, axis=1)
    print(df["C"].sum())


objmode_example(10)

We recommend keeping the code inside the with bodo.objmode block minimal and call outside Python functions instead (as in this example). This reduces compilation time and sidesteps potential compiler limitations.

Object Mode Type Annotations

There are various ways to specify the data types in objmode. Basic data types such as float64 and int64 can be specified as string values (as in the previous example). For more complex data types like dataframes, bodo.typeof() can be used on sample data that has the same type as expected outputs. For example:

df_sample = pd.DataFrame({"A": [0], "B": ["AB"]}, index=[0])
df_type = bodo.typeof(df_sample)


@bodo.jit
def f():
    with bodo.objmode(df=df_type):
        df = pd.DataFrame({"A": [1, 2, 3], "B": ["ab", "bc", "cd"]}, index=[3, 2, 1])
    return df

This is equivalent to creating the DataFrameType directly:

@bodo.jit
def f():
    with bodo.objmode(
        df=bodo.DataFrameType(
            (bodo.int64[::1], bodo.string_array_type),
            bodo.NumericIndexType(bodo.int64),
            ("A", "B"),
        )
    ):
        df = pd.DataFrame({"A": [1, 2, 3], "B": ["ab", "bc", "cd"]}, index=[3, 2, 1])
    return df

The data type can be registered in Bodo so it can be referenced using a string name later:

df_sample = pd.DataFrame({"A": [0], "B": ["AB"]}, index=[0])
bodo.register_type("my_df_type", bodo.typeof(df_sample))


@bodo.jit
def f():
    with bodo.objmode(df="my_df_type"):
        df = pd.DataFrame({"A": [1, 2, 3], "B": ["ab", "bc", "cd"]}, index=[3, 2, 1])
    return df

See pandas datatypes for more details on Bodo data types in general. Bodo's Object Mode is built on top of Numba's Object Mode (see Numba objmode for more details).

What Can Be Done Inside Object Mode

The code inside Object Mode runs in regular Python on all parallel processes, which means Object Mode does not include Bodo compiler's automatic parallel communication management. Therefore, the computation inside Object Mode should be independent on different processors and not require communication. In general:

  • Operations on scalars are safe
  • Operations that compute on rows independently are safe
  • Operations that compute across rows may not be safe

The example below demonstrates a valid use of Object Mode, since it uses df.apply(axis=1) which runs on different rows independently.

df_type = bodo.typeof(pd.DataFrame({"A": [1], "B": [1], "C": [1]}))


def f(df):
    return df.assign(C=df.apply(lambda r: r.A + r.B, axis=1))


@bodo.jit
def valid_objmode():
    df = pd.read_parquet("in_file.pq")
    with bodo.objmode(df2=df_type):
        df2 = f(df)
    df2.to_parquet("out_file.pq")


valid_objmode()

In contrast, the example below demonstrates an invalid use of Object Mode. The reason is that groupby computation requires grouping together all rows with the same key across all chunks. However, on each processor, Bodo passes a chunk of df to Object Mode which returns results from local groupby computation. Therefore, df2 does not include valid global groupby output.

df_type = bodo.typeof(pd.DataFrame({"A": [1], "B": [1]}))


def f(df):
    return df.groupby("A", as_index=False).sum()


@bodo.jit
def invalid_objmode():
    df = pd.read_parquet("in_file.pq")
    # Invalid use of objmode
    with bodo.objmode(df2=df_type):
        df2 = f(df)
    df2.to_parquet("out_file.pq")


invalid_objmode()

Groupby/Apply Object Mode Pattern

ML algorithms and other complex data science computations are often called on groups of dataframe rows. Bodo supports parallelizing these computations (which may not have JIT support yet) using Object Mode inside groupby/apply. For example, the code below runs Prophet on groups of rows. This is a valid use of Object Mode since Bodo handles shuffle communication for groupby/apply and brings all rows of each group in the same local chunk. Therefore, the apply function running in Object Mode has all the data it needs.

import bodo
import pandas as pd
import numpy as np
from fbprophet import Prophet

prophet_output_type = bodo.typeof(pd.DataFrame({"ds": pd.date_range("2017-01-03", periods=1), "yhat": [0.0]}))

def run_prophet(df):
    m = Prophet()
    m.fit(df)
    return m.predict(df)[["ds", "yhat"]]


@bodo.jit
def apply_func(df):
    with bodo.objmode(df2=prophet_output_type):
        df2 = run_prophet(df)
    return df2


@bodo.jit
def f(df):
    df2 = df.groupby("A").apply(apply_func)
    return df2


n = 10
df = pd.DataFrame({"A": np.arange(n) % 3, "ds": pd.date_range("2017-01-03", periods=n), "y": np.arange(n)})
print(f(df))