Bodo Python Quickstart (Local)

This quickstart guide will walk you through the process of running a simple Python computation using Bodo on your local machine.

Prerequisites

Conda is the recommended way to install Bodo in your local environment. The Community Edition, which you can install with conda, is free to use on up to 8 cores.

conda create -n Bodo python=3.12 -c conda-forge
conda activate Bodo
conda install bodo -c bodo.ai -c conda-forge

These commands create a conda environment called Bodo and install Bodo Community Edition.

Generate Sample Data

Let's start by creating a Parquet file with some sample data. The following Python code creates a Parquet file with two columns, A and B, and 20 million rows. Column A cycles through the values 0 to 29, and column B contains the values 0 to 19,999,999.

import pandas as pd
import numpy as np
import bodo
import time

NUM_GROUPS = 30
NUM_ROWS = 20_000_000
df = pd.DataFrame({
    "A": np.arange(NUM_ROWS) % NUM_GROUPS,
    "B": np.arange(NUM_ROWS)
})
df.to_parquet("my_data.pq")

A Simple Pandas Computation

Now let's write a simple Python function that uses pandas to compute the sum of column A over all rows where B is greater than 4. We decorate the function with @bodo.jit to indicate that we want Bodo to compile it; the cache=True argument asks Bodo to cache the compiled function for later runs. Let's also add a timer to measure the execution time.

@bodo.jit(cache=True)
def computation():
    t1 = time.time()
    df = pd.read_parquet("my_data.pq")
    df1 = df[df.B > 4].A.sum()
    print("Execution time:", time.time() - t1)
    return df1

result = computation()
print(result)
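
To sanity-check the printed result, the expected value can be worked out without loading the data. The sketch below is not part of Bodo; it relies only on the sample data defined above, where B equals the row index and A equals the row index modulo NUM_GROUPS. The helper name mod_sum_below is hypothetical and introduced only for this illustration.

```python
NUM_GROUPS = 30
NUM_ROWS = 20_000_000
THRESHOLD = 4  # the computation keeps rows where B > 4


def mod_sum_below(n, groups):
    """Return the sum of (i % groups) for i in range(n), without looping."""
    full_cycles, remainder = divmod(n, groups)
    # Each full cycle contributes 0 + 1 + ... + (groups - 1).
    per_cycle = groups * (groups - 1) // 2
    # The partial cycle contributes 0 + 1 + ... + (remainder - 1).
    return full_cycles * per_cycle + remainder * (remainder - 1) // 2


# B is the row index, so B > 4 drops rows 0..4 and keeps rows 5..NUM_ROWS-1.
expected = mod_sum_below(NUM_ROWS, NUM_GROUPS) - mod_sum_below(THRESHOLD + 1, NUM_GROUPS)
print(expected)  # 289999890
```

If the value printed by computation() differs from this number, the Parquet file was probably generated with different NUM_GROUPS or NUM_ROWS settings.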

Running the Code

Bringing it all together, the complete code looks like this:

import pandas as pd
import numpy as np
import bodo
import time

NUM_GROUPS = 30
NUM_ROWS = 20_000_000

df = pd.DataFrame({
    "A": np.arange(NUM_ROWS) % NUM_GROUPS,
    "B": np.arange(NUM_ROWS)
})
df.to_parquet("my_data.pq")

@bodo.jit(cache=True)
def computation():
    t1 = time.time()
    df = pd.read_parquet("my_data.pq")
    df1 = df[df.B > 4].A.sum()
    print("Execution time:", time.time() - t1)
    return df1

result = computation()
print(result)

To run the code, save it to a file, e.g. test_bodo.py, and run the following command in your terminal:

mpiexec -n 8 python test_bodo.py

Replace 8 with the number of cores you want to use. The first run may take a few seconds because Bodo has to compile the function. Because the function is decorated with cache=True, later runs can reuse the cached compilation and should start noticeably faster. Check the Python API Reference for the full list of supported Python operations.
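
If you are unsure what number to pass to -n, one option is to derive it from the machine. The following is a minimal sketch, not a Bodo utility: build_mpiexec_command and MAX_COMMUNITY_CORES are hypothetical names, and the cap of 8 is taken from the Community Edition limit mentioned in the prerequisites. Note that os.cpu_count() reports logical CPUs, which may exceed the number of physical cores.

```python
import os

# Assumption: the Community Edition limit of 8 cores stated in the prerequisites.
MAX_COMMUNITY_CORES = 8


def build_mpiexec_command(script, available_cores=None):
    """Build the mpiexec argument list for a script (hypothetical helper)."""
    if available_cores is None:
        # os.cpu_count() may return None, so fall back to a single process.
        available_cores = os.cpu_count() or 1
    # Use at least 1 process and at most the Community Edition limit.
    num_processes = max(1, min(available_cores, MAX_COMMUNITY_CORES))
    return ["mpiexec", "-n", str(num_processes), "python", script]


# Print the command so it can be copied into a terminal.
print(" ".join(build_mpiexec_command("test_bodo.py")))
```

The helper only prints the command; run it in your terminal as shown above, or pass the returned list to subprocess.run if you prefer to launch it from Python.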