About Bodo

Bodo is a new just-in-time (JIT) inferential compiler that brings supercomputing-style performance and scalability to native Python analytics code automatically. Bodo has several advantages over other big data analytics systems (which are usually distributed scheduler libraries):

Simple programming with native Python APIs such as Pandas and NumPy (no "Pandas-like" API layers)
Extreme performance and scalability using true parallelism and advanced compiler technology
Very high reliability due to binary code generation, which avoids distributed library failures
Simple deployment using standard Python workflows
Flexible integration with other systems such as cloud storage, data warehouses, and visualization tools

This documentation covers the basics of using Bodo and provides a reference for supported Python features and APIs. In a nutshell, Bodo provides a JIT compilation workflow through the @bodo.jit decorator, which automatically replaces each decorated Python function with an optimized, parallelized binary version. For example, the program below can perform data transformation on large datasets:

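import bodo
import pandas as pd

@bodo.jit
def data_transform(file_name):
    df = pd.read_parquet(file_name)
    # Keep only rows whose timestamp in column C falls in January
    df = df[df.C.dt.month == 1]
    # For each group in A, count the values in B and D equal to "ABC"
    df2 = df.groupby("A")[["B", "D"]].agg(
        lambda S: (S == "ABC").sum()
    )
    df2.to_parquet("output.pq")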

Bodo programs such as this example can be run from the command line, for example with mpiexec -n 1024 python data_transform.py (to run on 1024 cores), or from a Jupyter Notebook.
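To see what the filter and aggregation in data_transform compute, the same steps can be tried with plain Pandas on a small in-memory DataFrame. This is only an illustrative sketch: the sample values are made up, it does not use Bodo or Parquet files, and the variable names january and counts are not part of the original example.

```python
import pandas as pd

# Hypothetical sample data; only the column names A, B, C, D come from
# the data_transform example.
df = pd.DataFrame(
    {
        "A": ["x", "x", "y", "y"],
        "B": ["ABC", "ABC", "ABC", "DEF"],
        "C": pd.to_datetime(
            ["2021-01-05", "2021-01-20", "2021-01-11", "2021-02-01"]
        ),
        "D": ["ABC", "XYZ", "ABC", "ABC"],
    }
)

# Keep only rows whose timestamp in column C falls in January.
january = df[df.C.dt.month == 1]

# For each group in A, count how many values in B and D equal "ABC".
counts = january.groupby("A")[["B", "D"]].agg(lambda S: (S == "ABC").sum())
print(counts)
```

With this sample, the February row is dropped by the filter, so group "x" ends up with two matches in B and one in D, while group "y" has one match in each column.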