Skip to content

Bodo Python Quick Start

This quickstart guide will walk you through the process of running a simple Python computation using Bodo DataFrames on your local machine.

Installation

Install Bodo DataFrames to get started (e.g., pip install -U bodo or conda install bodo -c conda-forge).

Drop-in Pandas Replacement with Bodo DataFrames

Bodo DataFrames can be used as a drop-in replacement for Pandas by changing import pandas as pd with import bodo.pandas as pd. For example:

import bodo.pandas as pd
import numpy as np
import time

NUM_GROUPS = 30
NUM_ROWS = 20_000_000

df = pd.DataFrame({
    "A": np.arange(NUM_ROWS) % NUM_GROUPS,
    "B": np.arange(NUM_ROWS)
})
df.to_parquet("my_data.pq")

def computation():
    t1 = time.time()
    df = pd.read_parquet("my_data.pq")
    df["C"] = df.apply(lambda r: 0 if r.A == 0 else (r.B // r.A), axis=1)
    df.to_parquet("out.pq")
    print("Execution time:", time.time() - t1)

computation()

Bodo DataFrames will optimize and parallelize the code automatically when possible. It will fall back to Pandas seamlessly when some API isn't supported yet and throw a warning. See the Bodo DataFrames API reference for supported Pandas APIs.

Bodo JIT Compilation for Best Native End-to-end Performance

JIT compilation converts Python functions to optimized parallel binaries. Unlike Bodo DataFrames, JIT can optimize both Pandas and Numpy operations together and in some cases provide better performance than using Bodo DataFrames Pandas APIs. For example:

import bodo
import pandas as pd
import numpy as np
import time

NUM_GROUPS = 30
NUM_ROWS = 20_000_000

df = pd.DataFrame({
    "A": np.arange(NUM_ROWS) % NUM_GROUPS,
    "B": np.arange(NUM_ROWS)
})

@bodo.jit
def computation(df):
    t1 = time.time()
    df["C"] = df.apply(lambda r: 0 if r.A == 0 else (r.B // r.A), axis=1)
    df["D"] = np.sin(df.A)
    df.to_parquet("out.pq")
    print("Execution time:", time.time() - t1)

computation(df)

All the code in JIT functions has to be compilable by Bodo JIT (will throw appropriate errors otherwise). See JIT development guide and JIT API reference for supported Python features and APIs.