Installing Bodo Community Edition
Bodo can be installed using the conda command (see how to install conda below).
We recommend creating a conda environment and installing 
Bodo and its dependencies in it as shown below:
conda create -n Bodo python=3.12 -c conda-forge
conda activate Bodo
conda install bodo -c bodo.ai -c conda-forge
Bodo uses MPI
for parallelization, which is automatically installed as part of the
conda install command above. This command installs Bodo Community
Edition by default, which is free and works on up to 8 cores. For
information on Bodo Enterprise Edition and pricing, please contact
us.
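To confirm that the install landed in the active environment, a short check like the one below can help. This is a minimal sketch that only uses the Python standard library; the helper name `installed_version` is hypothetical, and it assumes the package's distribution name matches its import name (`bodo`).

```python
import importlib.util
from importlib import metadata


def installed_version(package):
    """Return the installed version of a top-level package, or None if absent."""
    if importlib.util.find_spec(package) is None:
        return None  # the package cannot be imported from this environment
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return "unknown"  # importable, but no distribution metadata was found


version = installed_version("bodo")
if version is None:
    print("bodo is not installed in the active environment")
else:
    print("bodo version:", version)
```

If the check reports that bodo is missing, the most likely cause is that the conda environment created above has not been activated in the current shell.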
How to Install Conda
Install Conda using the instructions below.
On Linux
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
chmod +x miniconda.sh
./miniconda.sh -b
export PATH=$HOME/miniconda3/bin:$PATH
On macOS
curl https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -L -o miniconda.sh
chmod +x miniconda.sh
./miniconda.sh -b
export PATH=$HOME/miniconda3/bin:$PATH
On Windows
start /wait "" Miniconda3-latest-Windows-x86_64.exe /InstallationType=JustMe /RegisterPython=0 /S /D=%UserProfile%\Miniconda3
Open the Anaconda Prompt to use Bodo (click Start, select Anaconda Prompt). You may use other terminals if you have already added Anaconda to your PATH.
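Whichever platform you installed on, it can be useful to confirm that the conda executable is actually reachable from your terminal before continuing. The sketch below uses only the standard library; the helper name `find_tool` is hypothetical.

```python
import shutil


def find_tool(name):
    """Return the full path of an executable on PATH, or None if not found."""
    return shutil.which(name)


conda_path = find_tool("conda")
if conda_path is None:
    print("conda was not found on PATH; re-check the PATH step for your platform")
else:
    print("conda found at:", conda_path)
```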
Optional Dependencies
Some Bodo functionality may require other dependencies, as summarized in the table below. All optional dependencies except Hadoop can be installed using the commands
conda install gcsfs sqlalchemy snowflake-connector-python hdf5='1.14.*=*mpich*' openjdk -c conda-forge
and
| Functionality | Dependency | 
|---|---|
| pd.read_sql / df.to_sql | sqlalchemy | 
| Snowflake I/O | snowflake-connector-python | 
| GCS I/O | gcsfs | 
| Delta Lake | deltalake | 
| HDFS or ADLS Gen2 | hadoop (only the Hadoop client is needed) | 
| HDF5 | hdf5 (MPI version) | 
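To see which of the Python-level optional dependencies are present in the current environment, a check along these lines may be useful. It is a sketch, not part of Bodo: the import names in `OPTIONAL_MODULES` are assumptions based on the package names in the table, and Hadoop and HDF5 are left out because they are not plain Python modules.

```python
import importlib.util

# Assumed import names for the Python packages listed in the table.
OPTIONAL_MODULES = {
    "pd.read_sql / df.to_sql": "sqlalchemy",
    "Snowflake I/O": "snowflake.connector",
    "GCS I/O": "gcsfs",
    "Delta Lake": "deltalake",
}


def is_importable(module_name):
    """Return True if the module can be located without fully importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec imports parent packages for dotted names, so a missing
        # parent (e.g. "snowflake") surfaces here as ModuleNotFoundError.
        return False


report = {feature: is_importable(mod) for feature, mod in OPTIONAL_MODULES.items()}
for feature, available in report.items():
    print(f"{feature}: {'available' if available else 'missing'}")
```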
Testing your Installation
Once you have activated your conda environment and installed Bodo in
it, you can test it using the example program below. This program has
two functions:
- The function gen_data creates a sample dataset with 20 million rows and writes it to a parquet file called example1.pq.
- The function test reads example1.pq and performs multiple computations on it.
import bodo
import pandas as pd
import numpy as np
import time
@bodo.jit
def gen_data():
    NUM_GROUPS = 30
    NUM_ROWS = 20_000_000
    df = pd.DataFrame({
        "A": np.arange(NUM_ROWS) % NUM_GROUPS,
        "B": np.arange(NUM_ROWS)
    })
    df.to_parquet("example1.pq")
@bodo.jit
def test():
    df = pd.read_parquet("example1.pq")
    t0 = time.time()
    df2 = df.groupby("A")["B"].agg(
        (lambda a: (a==1).sum(), lambda a: (a==2).sum(), lambda a: (a==3).sum())
    )
    m = df2.mean()
    print("Result:", m, "\nCompute time:", time.time() - t0, "secs")
gen_data()
test()
Save this code in a file called example.py, and run it on a single
core as follows:
python example.py
Alternatively, to run the code on four cores, you can use mpiexec:
mpiexec -n 4 python example.py
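If you launch runs from a script rather than by hand, the single-core and multi-core invocations can be built programmatically. The sketch below is an illustration only: `build_command` and `MAX_COMMUNITY_CORES` are hypothetical names, the 8-core cap reflects the Community Edition limit mentioned earlier, and `-n` is the standard mpiexec flag for the number of processes.

```python
import shlex
import sys

MAX_COMMUNITY_CORES = 8  # Community Edition limit stated earlier in this guide


def build_command(script, cores=1):
    """Build the argument list for running `script` on `cores` cores."""
    if not 1 <= cores <= MAX_COMMUNITY_CORES:
        raise ValueError(f"cores must be between 1 and {MAX_COMMUNITY_CORES}")
    # Use the current interpreter so the run stays in the active environment.
    python_cmd = [sys.executable, script]
    if cores == 1:
        return python_cmd
    return ["mpiexec", "-n", str(cores)] + python_cmd


# Print the four-core command instead of running it.
print(shlex.join(build_command("example.py", cores=4)))
```

To actually start the run, the returned list can be passed to `subprocess.run(..., check=True)`.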
Note
You may need to delete example1.pq between consecutive runs.
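As a sanity check on the numbers printed by test(), the same computation can be reproduced in plain Python on a much smaller dataset. This is a sketch under the assumption that Bodo follows pandas semantics for this groupby; `expected_means` is a hypothetical helper, not part of Bodo. Because each of the values 1, 2, and 3 occurs exactly once in column B, each aggregated column holds a single 1 across 30 groups, so every mean should be 1/30 (about 0.033).

```python
from collections import defaultdict


def expected_means(num_rows, num_groups=30, targets=(1, 2, 3)):
    """Mean over groups of the per-group count of B == target, per target."""
    # Group the values of column B by A = B % num_groups, as gen_data() does.
    groups = defaultdict(list)
    for b in range(num_rows):
        groups[b % num_groups].append(b)

    means = []
    for target in targets:
        # One count per group, mirroring `lambda a: (a == target).sum()`.
        counts = [sum(1 for b in members if b == target)
                  for members in groups.values()]
        means.append(sum(counts) / len(counts))
    return means


# A small row count keeps this fast; the result does not depend on num_rows
# as long as there are at least num_groups rows.
print(expected_means(3_000))
```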