Deploying Bodo with Kubernetes¶
This section shows how to deploy a Bodo application with Kubernetes using the Kubeflow MPI-Operator.
Setting Up¶
You need the following to deploy your Bodo application using Kubernetes:
- Access to a Kubernetes cluster.

  For this example, we use kops on AWS. See the section below on creating a Kubernetes cluster for how we set it up.

- A Docker image, available on a Docker registry so that Kubernetes can pull it, containing your Bodo application scripts and the Bodo version they require.

  For this example, we created a Docker image using this Dockerfile and uploaded it to Docker Hub. It includes a Bodo application called `pi.py` that calculates the value of pi using the Monte Carlo method and can be used to validate your setup. You can use this as a base image for your own Docker image (a build-and-push sketch follows this list). If you want to use a private registry, you can follow the instructions here.
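As a rough sketch of publishing such an image, the commands below build from a local Dockerfile and push the result to Docker Hub; the repository name `<your-dockerhub-username>/bodo-kubernetes` and the `latest` tag are placeholders for your own registry path.

```shell
# Build the application image from the Dockerfile in the current directory
docker build -t <your-dockerhub-username>/bodo-kubernetes:latest .

# Push it to Docker Hub so the Kubernetes cluster can pull it
docker push <your-dockerhub-username>/bodo-kubernetes:latest
```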
Warning
Make sure to provide correct CPU and memory requests in the YAML file for your Bodo jobs. If the values are incorrect, or the cluster doesn't have enough CPU or memory for the job, the job will be terminated and worker pods may keep respawning. You can get a good estimate of the CPU and memory requirements by extrapolating from running the job locally on a smaller dataset.
Creating a Kubernetes Cluster using KOPS¶
Here are the steps to create a Kubernetes cluster on AWS using KOPS.
- Install KOPS on your local machine:
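  For instance, on a Linux x86_64 machine you can download the latest release binary roughly as follows (asset names differ for other platforms; see the kops releases page for the exact download):

  ```shell
  # Download the latest kops release binary (Linux x86_64)
  curl -Lo kops https://github.com/kubernetes/kops/releases/latest/download/kops-linux-amd64

  # Make it executable and put it on your PATH
  chmod +x kops
  sudo mv kops /usr/local/bin/kops
  ```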
- Create a location to store your cluster configuration:

  First, create an S3 bucket to use as your `KOPS_STATE_STORE`.
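  A minimal sketch, where `<your-kops-state-store>` and `<your-cluster-name>` are placeholders for your own bucket and cluster names (a cluster name ending in `.k8s.local` lets kops use gossip-based discovery instead of a hosted DNS zone):

  ```shell
  # Create an S3 bucket for the kops state store (bucket names must be globally unique)
  aws s3 mb s3://<your-kops-state-store> --region us-east-2

  # Point kops at the state store and name the cluster
  export KOPS_STATE_STORE=s3://<your-kops-state-store>
  export KOPS_CLUSTER_NAME=<your-cluster-name>.k8s.local
  ```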
- Create your cluster:

  The following command creates a cluster of 2 nodes, each with 4 physical cores. You can modify the `node-count` argument to change the number of worker nodes and the `node-size` argument to change the instance type. You can deploy the cluster in a different AWS region and availability zone by modifying the `zones` argument.

  ```shell
  kops create cluster \
    --node-count=2 \
    --node-size=c5.2xlarge \
    --control-plane-size=c5.large \
    --zones=us-east-2c \
    --name=${KOPS_CLUSTER_NAME}
  ```
  Tip

  The `control-plane-size` parameter refers to the leader node that manages Kubernetes but doesn't do any Bodo computation, so you should keep the instance size small.
- Finish creating the cluster with the following command:

  Note

  This might take several minutes to finish.
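  In the standard kops workflow this is the `update` command, which applies the configuration created above and provisions the AWS resources:

  ```shell
  kops update cluster --name ${KOPS_CLUSTER_NAME} --yes
  ```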
- Verify that the cluster setup is finished by running:
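  For example, the following waits until the cluster validates and then lists the nodes:

  ```shell
  # Wait up to 10 minutes for the cluster to become healthy
  kops validate cluster --wait 10m

  # The worker nodes should report a Ready status
  kubectl get nodes
  ```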
Deploying Bodo on a Kubernetes Cluster Manually¶
Install MPIJob Custom Resource Definitions (CRD)¶
The most up-to-date installation guide is available on the MPI-Operator GitHub repository. This example was tested with v0.4.0, as shown below:
```shell
git clone https://github.com/kubeflow/mpi-operator --branch v0.4.0
cd mpi-operator
kubectl apply -f deploy/v2beta1/mpi-operator.yaml
```
You can check whether the MPIJob custom resource is installed via:
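For example, listing the installed custom resource definitions:

```shell
kubectl get crd
```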
The output should include an entry for `mpijobs.kubeflow.org`.
Run your Bodo application¶
- Define a Kubernetes resource for your Bodo workload, such as the one defined in `mpijob.yaml` that runs the pi example (a sketch of this manifest follows this list). You can modify it based on your cluster configuration:
    - Update `spec.slotsPerWorker` with the number of physical cores (not vCPUs) on each node.
    - Set `spec.mpiReplicaSpecs.Worker.replicas` to the number of worker nodes in your cluster.
    - Build the image using the Dockerfile or use `bodoaidocker/bodo-kubernetes`, and replace the image at `spec.mpiReplicaSpecs.Launcher.template.spec.containers.image` and `spec.mpiReplicaSpecs.Worker.template.spec.containers.image`.
    - Check that the container arguments refer to the Python file you intend to run.
    - Lastly, make sure `-n` equals `spec.mpiReplicaSpecs.Worker.replicas` multiplied by `spec.slotsPerWorker`, i.e. the total number of physical cores on your worker nodes.
- Run the example by deploying it on your cluster with `kubectl create -f mpijob.yaml`. This should add one worker pod to each worker node and a launcher pod to your master node.
- View the pods generated by this deployment with `kubectl get pods`. You can inspect the logs of any individual pod to monitor progress.
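The following is a minimal sketch of such an `mpijob.yaml`, matching the 2-node c5.2xlarge cluster above (2 workers × 4 slots = `-n 8`). The job name, the script path `/app/pi.py`, the `latest` image tag, and the resource requests are illustrative assumptions; adjust them to match your image and cluster.

```yaml
apiVersion: kubeflow.org/v2beta1
kind: MPIJob
metadata:
  name: bodo-pi                      # illustrative job name
spec:
  slotsPerWorker: 4                  # physical cores per worker node
  mpiReplicaSpecs:
    Launcher:
      replicas: 1
      template:
        spec:
          containers:
            - name: bodo-launcher
              image: bodoaidocker/bodo-kubernetes:latest   # assumed tag
              command: ["mpirun"]
              args: ["-n", "8", "python", "/app/pi.py"]    # assumed script path
    Worker:
      replicas: 2                    # number of worker nodes
      template:
        spec:
          containers:
            - name: bodo-worker
              image: bodoaidocker/bodo-kubernetes:latest   # assumed tag
              resources:
                requests:            # illustrative values; size to your workload
                  cpu: 4
                  memory: 8Gi
                limits:
                  cpu: 4
                  memory: 8Gi
```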
Retrieve the Results¶
When the job finishes running, the launcher pod's status changes to Completed, and any stdout output can be found in its logs:
```shell
PODNAME=$(kubectl get pods -o=name)
kubectl logs -f ${PODNAME}
```
Teardown¶
- When a job has finished running, you can remove it by running `kubectl delete -f mpijob.yaml`.
- If you want to delete the MPI-Operator CRD, please follow the steps on the MPI-Operator GitHub repository.