Skip to content

Deep Learning

Bodo works seamlessly with Horovod to support large-scale distributed deep learning with PyTorch and TensorFlow.

Prerequisites

You will need to install Horovod and a deep learning framework in your Bodo conda environment. Horovod needs to be compiled and linked with the same MPI library that Bodo uses.

Installing Horovod and PyTorch

Here are simple instructions for installing Horovod and PyTorch in your Bodo environment without CUDA support:

# Activate the Bodo conda environment
conda install -c pytorch -c conda-forge -c defaults bokeh pytorch=1.5 torchvision=0.6
pip install horovod[pytorch]

For information on setting up Horovod in a conda environment with CUDA see here.

How it works

Bodo works seamlessly with Horovod for distributed deep learning. The main thing to consider if you are using GPUs for deep learning is that a Bodo application typically uses all CPU cores on a cluster (there is one worker or process per core), whereas for deep learning only a subset of Bodo workers are pinned to a GPU (one worker per GPU). This means that data processed and generated by Bodo will need to be distributed to the GPU workers before training.

Note

Bodo can automatically assign a subset of workers in your cluster to GPU devices, initialize Horovod with these workers, and distribute data for deep learning to these workers.

To ensure that Bodo automatically handles all of the above call bodo.dl.start() before starting training and bodo.dl.prepare_data(X) to distribute the data.

API

bodo.dl.start

  • bodo.dl.start(framework)

    framework is a string specifying the DL framework to use ("torch" or "tensorflow"). Note that this must be called before starting deep learning. It initializes Horovod and pins workers to GPUs.

bodo.dl.prepare_data

  • bodo.dl.prepare_data(X)

    Redistributes the given data to DL workers.

bodo.dl.end

  • bodo.dl.end()

    On calling this function, non-DL workers will wait for DL workers. They will become idle to free up computational resources for DL workers. This has to be called by every process.

Example

The code snippet below shows how deep learning can be integrated in a Bodo application:

# Deep learning code in regular Python usign Horovod
def deep_learning(X, y):
    # Note: X and y have already been distributed by Bodo and there is no
    # need to partition data with Horovod
    if hvd.initialized():
        # do deep learning with Horovod and your DL framework of choice
        ...
        use_cuda = bodo.dl.is_cuda_available()
        ...
    else:
        # this rank does not participate in DL (not pinned to GPU)
        pass


@bodo.jit
def main()
    ...
    X = ... # distributed NumPy array generated with Bodo
    y = ... # distributed NumPy array generated with Bodo
    bodo.dl.start("torch")  # Initialize Horovod with PyTorch
    X = bodo.dl.prepare_data(X)
    y = bodo.dl.prepare_data(y)
    with bodo.objmode:
        deep_learning()  # DL user code
    bodo.dl.end()

As we can see, the deep learning code is not compiled by Bodo. It runs in Python (in objmode or outside Bodo jitted functions) and must use Horovod. Note that data coming from Bodo has already been partitioned and distributed by Bodo (in bodo.dl.prepare_data), and that you don't have to initialize Horovod.

A full distributed training example with the MNIST dataset can be found here.