
Using Bodo Cloud Platform

Bodo Cloud Platform Concepts

This section describes the fundamental concepts you will need to know to use the Bodo Cloud Platform.

Organizations

Organizations on the Bodo Cloud Platform are tenants for billing and cloud resource management purposes. An organization can have multiple workspaces and cloud configurations, and users can be part of multiple organizations.

User-Mgmt

Cloud-Configurations

A cloud-configuration is an entity used to store information about your AWS or Azure account. It consists of:

  1. Details regarding the trust relationship between the platform and your cloud provider account. For AWS accounts, this is done through a cross-account IAM role. For Azure accounts, this is done through a service principal (scoped to a specific resource group) for the Bodo Platform application. This gives the platform the ability to provision and manage cloud resources in your account.
  2. Details regarding metadata storage. The platform needs to store certain metadata to carry out its functions, such as the state of your various cloud deployments, logs, etc. On AWS, this data is stored in an S3 bucket and a DynamoDB table. On Azure, this data is stored in a storage container.

Cloud-Configs

Workspaces

A workspace on the Bodo Cloud Platform consists of:

  1. A shared filesystem where you can collaborate with your team on your projects.
  2. Networking infrastructure such as virtual networks, security groups and subnets in which your compute clusters and Jupyter servers will be securely deployed.

A workspace is tied to a particular cloud-configuration and has its own user management; that is, you can have different subsets of users with different sets of roles and permissions in different workspaces within the same organization.

Important

If a user who is not part of the organization is invited to a workspace in the organization, they are automatically added to the organization with minimal permissions.

Workspaces

To create a workspace, go to the "Workspaces" section in the sidebar and click on "Create Workspace". In the creation form, enter the name of the workspace, select the cloud-configuration to use for provisioning it and the region where it should be deployed, and click on "Create Workspace".

Create-Workspace

This will start the workspace deployment. When the workspace is in the "READY" state, click on the button next to it to enter it.

Enter-Workspace

Notebooks

Jupyter servers act as your interface to both your shared filesystem and your compute clusters. Users can execute code from their notebooks on the compute cluster from the Jupyter interface. A Jupyter server is automatically provisioned for your use when you first enter the workspace.

Notebook-View

You can view and manage all the Jupyter servers in the "Notebook Manager" section of "Workspace Settings".

Notebook-Manager

Creating Clusters

In the left bar, click on Clusters (or click on the second step in the Onboarding list). This will take you to the Clusters page. At the top right corner, click on Create Cluster, which opens the cluster creation form.

Cluster-Create

Cluster creation form:

Cluster-Form

First, choose a name for your cluster.

Then, select the type of nodes in the cluster to be created from the Instance type dropdown list. On AWS, Elastic Fabric Adapter (EFA) will be used if the instance type supports it.

Cluster-Form-Instance

Note

If the Instance type dropdown list does not populate, either the credentials are not entered properly or they are not valid. Please see how to set your AWS or Azure credentials and make sure your credentials are valid.

Next, enter the number of nodes for your cluster in Number of Instances and choose the Bodo Version to be installed on your cluster. Typically, the three latest Bodo releases are available.

Cluster-Form-Bodo

Then, select a value for Cluster auto pause. This is the period of inactivity after which the platform will pause the cluster automatically.

Cluster-Form-Auto-Pause

Additionally, you can select a value for Cluster auto shutdown. This is the period of inactivity after which the platform will remove the cluster. Activity is determined through attached notebooks (see how to attach a notebook to a cluster) and jobs (see how to run a job). Therefore, if you don't plan to attach a notebook or a job to this cluster (and plan to use it via SSH instead), we recommend setting this to Never, since otherwise the cluster will be removed after the set time.

Cluster-Form-Advanced

Finally, click on CREATE. You will see that a new task for creating the cluster has been created. The status is updated to INPROGRESS when the task starts executing and cluster creation is in progress.

Cluster-Status-InProgress

You can click on the Details drop-down to monitor the progress of the cluster creation.

Cluster-Info

Once the cluster is successfully created and ready to use, the status is updated to FINISHED.

Cluster-Status-Finished

Attaching a Notebook to a Cluster

To attach a notebook to a cluster, select the cluster from the drop-down in the top-left.

Attach-Cluster

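To execute your code across the attached cluster, use the IPyParallel magics %%px and %autopx. %%px runs a single cell on the cluster, while %autopx toggles parallel execution for all subsequent cells.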

Run-Code-Notebook
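As a minimal sketch of what a cell run with %%px might contain, the snippet below reports which host and process executed it; on an attached cluster, each engine would be expected to print its own line. The function name describe_engine is illustrative and not part of the platform. In a notebook, %%px would be the first line of the cell; it is shown here as a comment so that the snippet also runs as plain Python.

```python
# In a notebook attached to a cluster, the first line of the cell would be:
#   %%px
import os
import socket


def describe_engine() -> str:
    """Return a short label for the process that runs this code."""
    return f"host={socket.gethostname()} pid={os.getpid()}"


print(describe_engine())
```

Seeing several distinct host names in the output is a quick sign that the code really ran on the cluster nodes rather than only on the notebook server.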

Note that parallel execution is only allowed when the notebook is attached to a cluster. If you execute a cell without a cluster attached, the following warning will be shown:

Detached-Notebook-Warning

Managing Packages on the cluster using IPyParallel magics - Conda and Pip

We recommend installing all packages with Conda, since that is what we use in our environments. Any conda command can be run in parallel on all the nodes of your cluster using %pconda. To install a new package on all the nodes of your cluster, use %pconda install. All conda install arguments work as expected, e.g. -c conda-forge to set the channel.

%pconda install -c conda-forge <PACKAGE_NAME>

To list the packages installed on the cluster nodes, use %pconda list.

%pconda list

To remove a conda package on all the nodes of your cluster, use %pconda remove.

%pconda remove <PACKAGE_NAME>

Conda-Magic

Any pip command can be run in parallel on all the nodes of your cluster using %ppip.

Example:

%ppip install <PACKAGE_NAME>

To see the details of an installed package, use %ppip show.

%ppip show <PACKAGE_NAME>

To remove the same package on all the nodes of your cluster, use %ppip uninstall.

%ppip uninstall <PACKAGE_NAME> -y

Pip-Magic
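After installing or removing a package, it can be useful to confirm that every node ended up in the same state. The following is a minimal sketch, using only the Python standard library, of a check you might run in a %%px cell. The helper name installed_version and the package name are illustrative assumptions, not part of the platform.

```python
# Run under %%px to check every node; the package name below is only an example.
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print(installed_version("numpy"))
```

If different nodes report different versions (or None), rerunning the corresponding %pconda or %ppip command is a reasonable first step.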

Running shell commands on the cluster using IPyParallel magics

Shell commands can be run in parallel on the nodes of your cluster using %psh <shell_command>.

%psh echo "Hello World"

Shell-Magic

Connecting to a Cluster

We recommend interacting with clusters primarily through Jupyter notebooks and Jobs. However, it may be necessary to connect directly to a cluster in some cases. In that case, you can connect through a notebook terminal.

Connecting with a Notebook Terminal

First, you need to create a cluster and attach a notebook to the cluster. This will create the SSH key at ~/cluster_ssh_keys/id_rsa-<CLUSTER_UUID>.

Then, go to the Clusters tab and find your cluster. Click on DETAILS and copy the cluster UUID and the IP address of the node you would like to connect to.

Cluster-UUID-Info

Next, go to the notebooks tab and select OPEN NOTEBOOK. In the Launcher, click on Terminal.

Notebook-Terminal

In the terminal, you can connect to any of the cluster nodes by running:

ssh -i ~/cluster_ssh_keys/id_rsa-<CLUSTER_UUID> <IP>

Connect-Cluster

Through this terminal, you can interact with the /shared folder, which is shared by all the instances in the cluster and the notebook instance. Before working directly on your cluster, verify your connection as described in the next section.

Verify your Connection

Once you have connected to a node in your cluster, you should verify that you can run operations across all the instances in the cluster.

  1. Verify the path to the hostfile for your cluster. You can find it by running:
    ls -la /shared/.hostfile-<CLUSTER_UUID>
    
  2. Check that you can run a command across your cluster. To do this, run:

    mpiexec -n <TOTAL_CORE_COUNT> -f /shared/.hostfile-<CLUSTER_UUID> hostname
    

    This will print one line per core in the cluster, with one unique hostname per cluster node.

    Your cluster's TOTAL_CORE_COUNT is usually half the number of vCPUs on each instance times the number of instances in your cluster. For example, if you have a 4-instance cluster of c5.4xlarge (16 vCPUs each), then your TOTAL_CORE_COUNT is (16 / 2) × 4 = 32.

  3. Verify that you can run a python command across your cluster. For example, run:

    mpiexec -n <TOTAL_CORE_COUNT> -f /shared/.hostfile-<CLUSTER_UUID> python --version
    

If all commands succeed, you should be able to execute workloads across your cluster. You can place scripts and small data that are shared across cluster nodes in /shared. However, external storage, such as S3, should be used for reading and writing large data.
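The core-count rule of thumb and the verification command from the steps above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the "half the vCPUs" rule is the usual case described above, not a guarantee for every instance type.

```python
def total_core_count(vcpus_per_instance: int, num_instances: int) -> int:
    """Rule of thumb from the text: half the vCPUs per instance times the node count."""
    return (vcpus_per_instance // 2) * num_instances


def mpiexec_command(total_cores: int, cluster_uuid: str, program: str) -> str:
    """Build the verification command line used in the steps above."""
    hostfile = f"/shared/.hostfile-{cluster_uuid}"
    return f"mpiexec -n {total_cores} -f {hostfile} {program}"


# A c5.4xlarge instance has 16 vCPUs, so four of them give 32 cores.
cores = total_core_count(vcpus_per_instance=16, num_instances=4)
print(cores)
print(mpiexec_command(cores, "<CLUSTER_UUID>", "hostname"))
```

Replace the <CLUSTER_UUID> placeholder with the UUID copied from the cluster's DETAILS view before running the printed command in the terminal.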

Running a Job

Bodo Cloud Platform supports running scheduled (and immediate) Python jobs without the need for Jupyter Notebooks. To create a Job, navigate to the Jobs page by selecting Jobs in the left bar.

Sidebar-Jobs

This page displays any INPROGRESS jobs you have previously scheduled and allows you to schedule new Jobs. At the top right corner, click on CREATE JOB. This opens a job creation form.

First, enter a name for your job and specify the cluster on which you want to deploy your job. If you have an existing cluster that is not currently bound to a notebook or another job, you can select this cluster from the dropdown menu. Alternatively, you can create a cluster specifically for this job by selecting the NEW button next to the cluster dropdown menu. When creating a cluster specifically for a job, note that the cluster is only used for that job and is removed once the job completes.

After selecting your cluster, indicate when you want your job to be executed in the Schedule section. Then, enter the Command that you want to execute inside this cluster.

Note

This command is automatically prepended with mpiexec -n <CORE_COUNT> python. For example, to run a file ex.py with the argument 1, you would enter the command ex.py 1.
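To make the prepending rule concrete, here is a small sketch of how the final command line would look and what the script would receive as arguments. The helper names are hypothetical, and the core count of 32 is only an example value; the platform fills in the real <CORE_COUNT> for your cluster.

```python
import shlex
from typing import List


def full_job_command(user_command: str, core_count: int) -> str:
    """Compose the command line as the note above describes."""
    return f"mpiexec -n {core_count} python {user_command}"


def script_argv(user_command: str) -> List[str]:
    """What sys.argv would contain inside the script for this command."""
    return shlex.split(user_command)


print(full_job_command("ex.py 1", core_count=32))
print(script_argv("ex.py 1"))
```

In other words, the Command field should contain only the script name and its arguments, never mpiexec or python themselves.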

To specify your source code location, fill in the Path line with a valid Git URL that leads to a repository containing your code.

Note

When selecting a GitHub URL, use the URL shown in your web browser's address bar and NOT the URL used for cloning the repository; that is, your path SHOULD NOT end in .git.
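If you only have the clone URL at hand, the following sketch shows one way to turn an HTTPS clone URL into the browser-style form expected in the Path field. The function name and the repository URL are hypothetical, and SSH-style clone URLs (git@github.com:...) are not handled here.

```python
def browser_style_url(url: str) -> str:
    """Strip surrounding whitespace, trailing slashes and a trailing '.git'."""
    cleaned = url.strip().rstrip("/")
    if cleaned.endswith(".git"):
        cleaned = cleaned[: -len(".git")]
    return cleaned


# Hypothetical repository, used only for illustration.
print(browser_style_url("https://github.com/example-org/example-repo.git"))
```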

Jobs-Forms-Standard

If you are cloning a private repository, you need to provide the platform with valid Git credentials to download your repository. To do so, select Show advanced in the bottom right of the form. Then, in Workspace username, enter your Git username, and in Workspace password, enter either your password or a valid GitHub Access Token. The advanced options also allow you to specify a particular commit or branch with Workspace reference and to load other custom environment variables in Other.

Note

If your GitHub account uses 2FA, please use a GitHub Access Token to avoid any possible authentication issues.

Once your form is complete, select CREATE to begin your job.

Job-Run

After you select CREATE, you will see a NEW task created in your jobs page.

If you created a cluster specifically for this job, a new cluster will also appear in your clusters page.

Your job will begin once it reaches its scheduled time and any necessary clusters have been created. It will then transition to INPROGRESS status.

At this point your job will execute your desired command. Once it finishes executing, your job will transition to FINISHED status. You can find any stdout information that you may need by pressing DETAILS followed by SHOW LOGS. If a cluster was specifically created for this job, it will be deleted after the job finishes.

Note

Bodo DOES NOT preserve artifacts written to local storage. If you have any information that you need to persist and later review, you should write it to external storage, such as Amazon S3. You may also write to stdout/stderr, but output logs may be truncated, so they should not be considered reliable for large outputs that need to be read later.
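As a rough sketch of persisting a job artifact, the snippet below builds a timestamped object key and uploads a file with boto3's S3 client. Everything specific here is an assumption for illustration: the key layout, the job name, the file path and the bucket name are hypothetical, and the upload presumes that boto3 is installed on the cluster and that the cluster has write access to the bucket.

```python
from datetime import datetime, timezone
from pathlib import Path


def artifact_key(job_name: str, local_path: str, now: datetime) -> str:
    """Build a timestamped object key: jobs/<job>/<UTC time>/<file name>."""
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"jobs/{job_name}/{stamp}/{Path(local_path).name}"


def persist_to_s3(local_path: str, bucket: str, key: str) -> None:
    """Upload one file; needs boto3 and AWS credentials on the cluster."""
    import boto3  # imported lazily so the helper above works without it

    boto3.client("s3").upload_file(local_path, bucket, key)


key = artifact_key("nightly-etl", "/tmp/result.parquet", datetime.now(timezone.utc))
print(key)
# persist_to_s3("/tmp/result.parquet", "my-example-bucket", key)  # hypothetical bucket
```

Including the timestamp in the key keeps the outputs of repeated scheduled runs from overwriting each other.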

Troubleshooting

Here are solutions to potential issues you may encounter while using the Bodo Cloud Platform.

Unexpected number of ranks

If you are getting an unexpected number of ranks, the cause could be an inaccurate MPI hostfile for the cluster. This is most likely to happen after scaling up a cluster. You can update the hostfile using the IPyParallel magic %update_hostfile and then restart the kernel to apply the changes.

%update_hostfile

Update-Hostfile
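To check whether the hostfile matches your cluster, you can count the hosts and ranks it lists. The sketch below assumes the hostfile uses the common MPICH (Hydra) layout of one "host" or "host:slots" entry per line; that format, the function name and the sample addresses are assumptions for illustration, so inspect your actual /shared/.hostfile-<CLUSTER_UUID> first.

```python
from typing import Dict


def parse_hostfile(text: str) -> Dict[str, int]:
    """Map each host to its slot count; a host without ':N' counts as 1 slot."""
    hosts: Dict[str, int] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        entry = line.split()[0]  # ignore any extra per-host options
        host, _, slots = entry.partition(":")
        hosts[host] = hosts.get(host, 0) + (int(slots) if slots else 1)
    return hosts


# Hypothetical two-node hostfile with 8 slots per node.
sample = "10.0.0.11:8\n10.0.0.12:8\n"
ranks = parse_hostfile(sample)
print(len(ranks), sum(ranks.values()))
```

If the number of hosts is lower than the number of instances in the cluster, the hostfile is probably stale and %update_hostfile should refresh it.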

For AWS troubleshooting, refer to this guide.

For Azure troubleshooting, refer to this guide.
