13. Compilation Tips and Troubleshooting

13.1. Compilation Tips

As a general rule, use Bodo to compile only the code that is performance critical or needs to scale. In other words:

  • Only use Bodo for data processing and analytics code.

  • Don’t use Bodo for scripts that set up infrastructure or perform initialization.

Doing so reduces both the risk of hitting unsupported features and the compilation time. To follow this rule, factor out the code that needs to be compiled and pass data into the Bodo-compiled functions. This recommendation is similar to the one in Numba’s “What to compile” guidance.
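The factoring described above can be sketched as follows. This is a minimal illustration, not a prescribed layout: load_config and count_above are hypothetical names, and the try/except fallback is an assumption added only so that the sketch also runs where Bodo is not installed.

```python
import numpy as np

try:
    import bodo
    jit = bodo.jit
except ImportError:
    # Assumption for illustration: fall back to plain Python without Bodo.
    def jit(func):
        return func

def load_config():
    # Regular Python: setup and initialization stay outside Bodo.
    return {"n_rows": 1000, "threshold": 0.5}

@jit
def count_above(values, threshold):
    # Performance-critical kernel: the only part that is compiled.
    return (values > threshold).sum()

config = load_config()
values = np.linspace(0.0, 1.0, config["n_rows"])
result = count_above(values, config["threshold"])
print(result)
```

Only plain values (an array and a float) cross the boundary into the compiled function, so the configuration dictionary never has to be typed by the compiler.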

13.2. Compilation Error

13.2.1. Why Compilation error

First, let’s understand why the code doesn’t compile.

The most common reason is that the code relies on features that Bodo does not currently support, so it’s important to understand Bodo’s limitations. There are four main limitations:

  1. Unsupported Pandas APIs (see Supported Pandas Operations)

  2. Unsupported NumPy APIs (see Supported NumPy Operations)

  3. Unsupported datatypes

  4. Unsupported Python programs due to type instability

13.2.2. Troubleshooting Compilation Error

Now that we understand what causes the compilation error, let’s fix it!

For the first three limitations listed in the previous section, Why Compilation error (unsupported Pandas operations, unsupported NumPy operations, and unsupported datatypes), try the following:

  1. Make sure your code works in Python: often, a Bodo-decorated function that fails to compile also fails to run in regular Python.

  2. Rewrite your code with supported operations if possible. For example, as mentioned earlier, a dictionary containing heterogeneous values (e.g. thisdict = {"A": 1, "B": "a", "C": 0.1}) can be replaced with a namedtuple.

  3. Refactor your code to use regular Python for the unsupported parts, as explained in the Integration with non-Bodo APIs section of the Bodo tutorial:
    1. Pass data in and out of Bodo functions, as discussed in Compilation Tips earlier

    2. Use Bodo object mode, as explained in the Object mode section of the Bodo tutorial
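The namedtuple rewrite from step 2 can be sketched in regular Python as follows. The name Record is hypothetical; the sketch only shows the data-structure change and does not call any Bodo API.

```python
from collections import namedtuple

# Heterogeneous dictionary: the values have three different types.
thisdict = {"A": 1, "B": "a", "C": 0.1}

# A namedtuple gives every field a fixed position, so each field
# can keep its own type.
Record = namedtuple("Record", ["A", "B", "C"])
record = Record(**thisdict)

print(record.A, record.B, record.C)
```

Fields are then accessed as attributes (record.A) instead of by key (thisdict["A"]).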

For the last of the four limitations listed in the previous section, Why Compilation error (unsupported Python programs due to type instability), refactor your code to make it type stable:

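import bodo
import numpy as np

flag = True  # example value; any boolean works

# previous code: type unstable, `a` is a float in one branch and an array in the other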

@bodo.jit
def f(flag):
    if flag:
        a = 1.0
    else:
        a = np.ones(10)
    return a

print(f(flag))

# modified type stable code

@bodo.jit
def f1():
    return 1.0

@bodo.jit
def f2():
    return np.ones(10)

if flag:
    print(f1())
else:
    print(f2())
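To see why the first version is type unstable, it can help to run the undecorated function in regular Python and compare the return types of the two branches. The following is a minimal sketch using only NumPy; f_plain is a hypothetical name that stands in for f without the Bodo decorator.

```python
import numpy as np

def f_plain(flag):
    # Same body as f above, without the Bodo decorator.
    if flag:
        a = 1.0
    else:
        a = np.ones(10)
    return a

# Collect the return type of each branch.
return_types = {type(f_plain(True)), type(f_plain(False))}
print(return_types)
```

Two different types in the set mean that the return type depends on a runtime value, so a compiler cannot assign the function a single return type.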

13.2.3. Common compilation/runtime errors

Some parameters passed to supported APIs have to be literal constants. This requirement could be due to several reasons such as type stability and performance. For example, the following will raise a compilation error:

@bodo.jit
def f(df1, df2, how_mode):
    df3 = df1.merge(df2, how=how_mode)
    return df3

On the other hand, the following works:

@bodo.jit
def f(df1, df2):
    df3 = df1.merge(df2, how='inner')
    return df3
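If the join type really has to be chosen at run time, one possible workaround is to mirror the f1/f2 refactoring shown earlier: compile one function per literal value and choose between them in regular Python. This is a sketch under assumptions, not a documented Bodo pattern. The names merge_inner, merge_left, MERGE_BY_MODE, merge, and the column name key are hypothetical, and the try/except fallback exists only so that the sketch runs without Bodo installed.

```python
import pandas as pd

try:
    import bodo
    jit = bodo.jit
except ImportError:
    # Assumption for illustration: fall back to plain Python without Bodo.
    def jit(func):
        return func

@jit
def merge_inner(df1, df2):
    # `how` is a literal constant inside the compiled function.
    return df1.merge(df2, on="key", how="inner")

@jit
def merge_left(df1, df2):
    return df1.merge(df2, on="key", how="left")

# Regular Python maps the runtime value to a compiled variant.
MERGE_BY_MODE = {"inner": merge_inner, "left": merge_left}

def merge(df1, df2, how_mode):
    return MERGE_BY_MODE[how_mode](df1, df2)

df1 = pd.DataFrame({"key": [1, 2, 3], "x": [10, 20, 30]})
df2 = pd.DataFrame({"key": [2, 3, 4], "y": [200, 300, 400]})
print(len(merge(df1, df2, "inner")), len(merge(df1, df2, "left")))
```

The dispatch function merge is deliberately left undecorated, so the non-constant how_mode never reaches the compiler.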

Zero-length dataframe arguments to Bodo functions can cause compilation errors due to potential type ambiguity. Dataframes can become empty inadvertently when data is split into variable-length chunks across multiple processes. The solution is to specify the types in the decorator:

@bodo.jit(locals={'df': {'A': bodo.float64[:],
                         'B': bodo.int64[:]}})
def f(df):
    ...
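On the regular Python side, it can also help to check which dtypes an empty chunk carries before it reaches a Bodo function. The following is a minimal sketch, assuming the columns A (float64) and B (int64) from the decorator above. empty_chunk is a hypothetical helper, and the source does not confirm that explicit dtypes alone avoid the compilation error.

```python
import numpy as np
import pandas as pd

def empty_chunk():
    # Zero rows, but each column still carries an explicit dtype.
    return pd.DataFrame({
        "A": np.empty(0, dtype=np.float64),
        "B": np.empty(0, dtype=np.int64),
    })

df = empty_chunk()
print(len(df), dict(df.dtypes))
```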

Sometimes output printed to standard output may not appear when the program fails, due to Python’s I/O buffering. Therefore, setting the PYTHONUNBUFFERED environment variable is recommended for debugging:

export PYTHONUNBUFFERED=1
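In regular Python code (outside Bodo-compiled functions), a script can check whether the variable is set and can also flush individual prints explicitly. The following is a minimal sketch; is_unbuffered is a hypothetical helper name.

```python
import os

def is_unbuffered():
    # Python treats any non-empty PYTHONUNBUFFERED value as "unbuffered".
    return bool(os.environ.get("PYTHONUNBUFFERED"))

# flush=True forces this one message out even when buffering is on.
print("unbuffered:", is_unbuffered(), flush=True)
```

Note that changing os.environ inside a running script does not change the buffering mode of that process; the variable has to be set before the interpreter starts.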