5. Bodo Cloud Platform

This page describes how to use the Bodo Cloud Platform, including registration, cluster creation, notebook attachment, and running jobs.

5.1. Registration

  1. Subscribe through the AWS Marketplace.

  2. After confirming your subscription, you’ll be directed to Bodo Platform’s registration page.

  3. Fill out the fields with your information. If this is your individual account, use a unique name such as firstname_lastname for the Organization Name field.

  4. Check the box for accepting terms and conditions and click on SIGN UP:

    Signup-Page
  5. A page confirming that an activation link was sent to your email will appear. Please open the email and click on the activation link:

    Signup-Page-Confirmation

    Clicking on the activation link will take you to the Bodo Platform page, where you can sign in with your newly created credentials:

    Login-Page

5.2. Setting AWS Credentials

The next step is to link your AWS account to the Bodo platform. This can be done either using the Settings page in the left bar or the first item in the Onboarding list highlighted in green as shown in the picture below:

Dashboard

To be able to use the Bodo Platform to launch clusters and notebooks, you must grant it permission to access your AWS account and provision the required resources in it. This can be done through an AWS Cross Account IAM Role for the Bodo Platform.

5.2.1. Create a Cross-Account IAM Role

There are two ways to create such an IAM Role: (a) you can create it manually, or (b) you can provide us with Access Keys and we can create an IAM Role in your AWS account. Directions for both methods are provided below.

5.2.1.1. Create the IAM Role Manually

  1. Log in to the AWS Management Console and navigate to the IAM Service.

  2. Select the Roles tab in the sidebar, and click Create Role.

  3. In Select type of trusted entity, select Another AWS Account.

  4. Enter the Bodo Platform Account ID 481633624848 in the Account ID field.

  5. Check the Require external ID option.

    Create Role Form Step 1

    In the External ID field, copy over the External ID from the Settings page on the Bodo Platform.

    External ID Platform
  6. Click the Next: Permissions button.

  7. Click the Next: Tags button.

  8. Click the Next: Review button.

  9. In the Role name field, enter a role name, e.g. BodoPlatformUser.

    Create Role Form Review
  10. Click Create Role. You will be taken back to the list of IAM Roles in your account.

  11. In the list of IAM Roles, click on the role you just created.

  12. Click on Add inline policy.

    Create Role Summary Page
  13. Click the JSON tab.

    Create Role Manual Policy Editor
  14. Bodo Cloud Platform requires a specific set of AWS permissions which are documented in Bodo-Platform Policy. Paste the contents of the linked JSON file into the policy editor.

  15. Click on Review policy.

  16. In the Name field, add a policy name, e.g. Bodo-Platform-User-Policy. Click on Create policy. You will be taken back to the Role Summary.

  17. From the role summary, copy the Role ARN. This is the value that you will enter into the Role ARN field on the Settings page on the Bodo Platform.

    Create Role Final Summary
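
For reference, steps 3 to 5 above cause the AWS console to attach a trust policy to the new role. The sketch below shows what such a cross-account trust policy typically looks like, using the Bodo Platform Account ID from step 4. The function name build_trust_policy and the placeholder External ID are illustrative assumptions; the console generates the real document for you, and the permissions policy from step 14 is separate and not reproduced here.

```python
import json

# Account ID of the Bodo Platform, as given in step 4 above.
BODO_PLATFORM_ACCOUNT_ID = "481633624848"


def build_trust_policy(external_id: str) -> dict:
    """Build a standard cross-account trust policy that requires an external ID.

    `external_id` is the value copied from the Settings page on the Bodo Platform.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Trust the Bodo Platform AWS account.
                "Principal": {"AWS": f"arn:aws:iam::{BODO_PLATFORM_ACCOUNT_ID}:root"},
                "Action": "sts:AssumeRole",
                # Only allow the role to be assumed when the external ID matches.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    # "<EXTERNAL_ID>" is a placeholder, not a real value.
    print(json.dumps(build_trust_policy("<EXTERNAL_ID>"), indent=2))
```

Comparing this shape against the Trust relationships tab of the role can help confirm that the Account ID and External ID were entered correctly.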

5.2.1.2. Let the Bodo Platform create the IAM Role

  1. Follow the instructions in the AWS Account and Access Keys guide to create or retrieve your AWS access key ID and secret access key.

  2. Click on Create Role For Me below the Role ARN field on the Settings page. This will open up a panel.

    Create Role Button on Platform
  3. Enter the Access Keys created in step 1 in the form and click on CREATE ROLE.

    Enter Access Keys to create role on Platform

    Note: We will not save the provided Access Keys for security reasons.

  4. Click OK on the popup confirmation box.

  5. We will use the provided Access Keys to create an IAM Role in your AWS Account.

  6. The created Role ARN will be displayed on the same form.

    Role ARN generated on the Platform
  7. Copy the generated Role ARN. This is the value that you will enter into the Role ARN field on the Settings page on the Bodo Platform.

  8. In some cases, role creation might fail. This can happen for several reasons:

    1. A role already exists: In this case, please open the AWS Management Console, and navigate to the IAM Service. Click on Roles in the sidebar. Look for a Role named BodoPlatformUser. Click on the role, and copy over the Role ARN from the role summary. Alternatively, you can delete the existing role from the AWS Console and then try to create an IAM role again via the Bodo Platform. This will ensure you have the role set up with the correct permissions.

      Note: If this is a shared AWS Account, ensure that no one else is actively using this IAM Role before deleting it.

    2. Provided access keys are not valid: Please ensure that valid access keys are provided.

    3. Provided access keys don’t have the right permissions to create a role: Please ensure that the provided access keys have the permissions required to create an IAM Role.

    If none of these work, try creating the IAM Role manually as described in Create the IAM Role Manually.

Once you have generated an IAM Role using either of the methods described above, you are ready to fill out the Settings Form on the Bodo Platform.

  1. Follow the instructions in the AWS Account ID guide to retrieve your AWS account ID, and enter it in the AWS Account ID field in the Settings Form on the Bodo Platform.

  2. Enter the Role ARN created using one of the above options into the Role ARN field in the Settings Form.

  3. Select a region from the dropdown list. This is the region that your resources will be deployed in by default.

  4. Click on SAVE.
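
Before saving, it can help to sanity-check the two values entered in steps 1 and 2. The sketch below assumes the usual IAM role ARN layout (arn:<partition>:iam::<12-digit account ID>:role/<role name>) and assumes that the role lives in the same AWS account whose ID you enter in the form. The helper names parse_role_arn and settings_look_consistent are hypothetical and are not part of the platform.

```python
import re

# Usual layout of an IAM role ARN: arn:<partition>:iam::<account id>:role/<name>
ROLE_ARN_PATTERN = re.compile(r"^arn:(aws[a-z-]*):iam::(\d{12}):role/([\w+=,.@/-]+)$")


def parse_role_arn(role_arn: str) -> dict:
    """Split a role ARN into its parts, or raise ValueError if it looks malformed."""
    match = ROLE_ARN_PATTERN.match(role_arn.strip())
    if match is None:
        raise ValueError(f"Not a valid IAM role ARN: {role_arn!r}")
    partition, account_id, role_name = match.groups()
    return {"partition": partition, "account_id": account_id, "role_name": role_name}


def settings_look_consistent(aws_account_id: str, role_arn: str) -> bool:
    """Return True if the account ID embedded in the ARN matches the account ID field."""
    return parse_role_arn(role_arn)["account_id"] == aws_account_id.strip()


if __name__ == "__main__":
    # 123456789012 is a placeholder account ID used only for illustration.
    arn = "arn:aws:iam::123456789012:role/BodoPlatformUser"
    print(parse_role_arn(arn))
    print(settings_look_consistent("123456789012", arn))
```

This only checks the format of the values. It does not verify that the role exists or that its policy grants the required permissions.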

You can see the progress on granting AMI launch permissions to your account ID in the AMI Share Status field. Your account is ready when it turns green.

Note: We grant AMI launch permissions to your account in the following AWS regions: us-east-1, us-east-2, us-west-1, and us-west-2.

Note: It is highly recommended that you ensure sufficient limits on your AWS account to launch resources. See Resources Created in Your AWS Environment for the resources required for Bodo Cloud Platform.

5.3. Creating Clusters

In the left bar click on Clusters (or click on the second step in the Onboarding list):

Sidebar-Clusters

This will take you to the Clusters page. At the top right corner, click on Create Cluster which opens the cluster creation form. First, choose a name for your cluster and check the EFA checkbox if you want to use EFA-enabled nodes. Then, select the type of nodes in the cluster to be created from the Instance type dropdown list.

Note: If the Instance type dropdown list does not populate, either the AWS credentials are not entered properly or they are not valid. Please go back to Setting AWS Credentials and make sure you complete it with valid credentials.

Next, enter the number of nodes for your cluster in Number of Instances, and choose the Bodo Version to be installed on your cluster. Typically, the three latest Bodo releases are available.

Note: If the Bodo Version dropdown list does not populate, either the AWS credentials are not entered properly or the permissions to Bodo’s AMIs have not been granted to your account. Please go back to Setting AWS Credentials and make sure you complete it with valid credentials and that AMIs have been successfully shared with your AWS account.

Then, select a value for Cluster auto shutdown. This is the amount of time of inactivity after which the platform will remove the cluster automatically. Activity is determined through attached notebooks (see Attaching a Notebook to a Cluster) and jobs (see Running a Job). Therefore, if you don’t plan to attach a notebook or a job to this cluster (and use it via ssh instead), it’s recommended to set this to Never, since otherwise the cluster will be removed after the set time.

Cluster-creation-form

Finally click on CREATE. You will see that a new task for creating the cluster has been created.

Cluster-Status-New

The status is updated to INPROGRESS when the task starts executing and cluster creation is in progress.

Cluster-Status-InProgress

You can click on the Details drop down to monitor the progress for the cluster creation.

Cluster-Info

Once the cluster is successfully created and ready to use, the status is updated to FINISHED.

Cluster-Status-Finished

5.4. Attaching a Notebook to a Cluster

Go to the notebooks page by clicking on Notebooks in the left bar (or on the third green step in the Onboarding list at the top).

Sidebar-Notebooks

This will take you to the Notebooks page. At the top right corner, click on the Create Notebook button, which opens the notebook creation form. Choose a name for your notebook and select the type of node that will host the notebook from the Instance type dropdown list. Note that this node is for running the Jupyter notebook itself, and will not run cluster workloads. Lastly, select the cluster to attach the notebook to from the Cluster dropdown menu and click on CREATE.

Notebook-Creation-Form

After clicking CREATE, a new task for creating the notebook and its corresponding node is created.

Notebook-Status-New

The status updates to INPROGRESS when the task starts executing.

Notebook-Status-InProgress

After creating the notebook, the platform runs AWS readiness probe checks:

Notebook-Status-ReadinessProbe

The notebook is ready to use after all checks are complete. OPEN NOTEBOOK will open the notebook in the current browser page, while the dropdown allows opening the notebook in a new tab.

Notebook-Status-Finished

5.5. Connecting to a Cluster

We recommend interacting with clusters primarily through Jupyter notebooks and Jobs. However, it may be necessary to connect directly to a cluster in some cases. You can either connect through a notebook terminal (recommended), or ssh directly from your machine. The latter requires providing your ssh public key during cluster creation.

5.5.1. Connecting with a Notebook Terminal

Follow the steps in Creating Clusters and Attaching a Notebook to a Cluster to attach a Notebook to a cluster.

Then, go to the Clusters tab and find your cluster. Click on DETAILS and copy the cluster UUID.

Cluster-UUID-Info

Next, go to the notebooks tab and select OPEN NOTEBOOK. In the Launcher, click on Terminal.

Notebook-Terminal

Through this terminal, you can interact with the /shared folder, which is shared by all the instances in the cluster and the Notebook instance. Follow the steps in Verify your Connection to interact directly with your cluster.

5.5.2. SSH From Your Machine

First, navigate to the Clusters tab and select Create Cluster. Click on Show Advanced and add your public key in SSH Public Key. Then, click on Add your IP in the Access from IP address section to enable access to your cluster from your machine.

Cluster-Creation-Advanced-Settings

Fill the rest of the form by following the steps in Creating Clusters.

In the Clusters tab, select your cluster and click on DETAILS to find the list of IP addresses for your cluster nodes. Use any of the IP addresses as the ssh destination. Also copy the cluster UUID, which is needed to execute commands across the cluster.

Cluster-IP-Info

From any ssh client, you can connect to one of your nodes with:

ssh -i <path_to_private_key> bodo@<IP_ADDRESS>

To add additional ssh options, please refer to the documentation for your ssh client.

5.5.3. Verify your Connection

Once you have connected to a node in your cluster, you should verify that you can run operations across all the instances in the cluster.

  1. Verify the path to the hostfile for your cluster. You can find it by running:

    ls -la /shared/.hostfile-<CLUSTER_UUID>
    
  2. Check that you can run a command across your cluster. To do this, run:

    mpiexec -n <TOTAL_CORE_COUNT> -f /shared/.hostfile-<CLUSTER_UUID> hostname
    

    This will print one line per core in the cluster, and the output should contain one unique hostname per cluster node.

    Your cluster’s TOTAL_CORE_COUNT is usually half the number of vCPUs on each instance times the number of instances in your cluster. For example, a 4-instance cluster of c5.4xlarge (16 vCPUs each) has a TOTAL_CORE_COUNT of 8 × 4 = 32.

  3. Verify that you can run a python command across your cluster. For example, run:

    mpiexec -n <TOTAL_CORE_COUNT> -f /shared/.hostfile-<CLUSTER_UUID> python --version
    

If all commands succeed, you should be able to execute workloads across your cluster. You can place scripts and small data that are shared across cluster nodes in /shared. However, external storage, such as S3, should be used for reading and writing large data.
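
The arithmetic and the expected output of step 2 can be sketched in a few lines of Python. This is an illustrative helper, not something the platform provides: total_core_count applies the "half the vCPUs per instance" rule of thumb stated above, and check_hostname_output inspects text that you have captured yourself from the mpiexec ... hostname command. The sample hostnames are made up.

```python
from collections import Counter


def total_core_count(vcpus_per_instance: int, num_instances: int) -> int:
    """Apply the rule of thumb: half the vCPUs on each instance times the instance count."""
    return (vcpus_per_instance // 2) * num_instances


def check_hostname_output(output: str, num_instances: int, expected_cores: int) -> bool:
    """Check captured `mpiexec ... hostname` output.

    Expects one non-empty line per core and one distinct hostname per node.
    """
    hostnames = [line.strip() for line in output.splitlines() if line.strip()]
    distinct = Counter(hostnames)
    return len(hostnames) == expected_cores and len(distinct) == num_instances


if __name__ == "__main__":
    # c5.4xlarge has 16 vCPUs; a 4-instance cluster gives 8 * 4 = 32 cores.
    cores = total_core_count(vcpus_per_instance=16, num_instances=4)
    print(cores)

    # Made-up output: 32 lines spread over 4 hypothetical hostnames.
    sample = "\n".join(f"ip-10-0-0-{i % 4}" for i in range(cores))
    print(check_hostname_output(sample, num_instances=4, expected_cores=cores))
```

If the check fails, compare the distinct hostnames in the output against the IP list shown under DETAILS for the cluster.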

5.6. Running a Job

Bodo Cloud Platform has support for running scheduled (and immediate) Python jobs without the need for Jupyter Notebooks. To create a Job, navigate to the Jobs page by selecting Jobs in the left bar.

Sidebar-Jobs

This page displays any INPROGRESS jobs you have previously scheduled and allows you to schedule new Jobs. At the top right corner, click on CREATE JOB. This opens a job creation form.

First, select a name for your job and specify the cluster on which you want to deploy your job. If you have an existing cluster that is not currently bound to a notebook or another job, you can select this cluster from the dropdown menu. Alternatively, you can create a cluster specifically for this job by selecting the NEW button next to the cluster dropdown menu. When creating a cluster specifically for a job, note that the cluster is only used for that job and is removed once the job completes. After selecting your cluster, indicate when you want your job to be executed in the Schedule section. Then, enter the Command that you want to execute inside this cluster.

Note: This command is automatically prepended with mpiexec -n <CORE_COUNT> python. For example, to run a file ex.py with the argument 1, you would enter the command ex.py 1.
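
To make the prepending rule concrete, the sketch below composes the full command line from the text entered in the Command field. It is an illustration only: the function name full_job_command is hypothetical, and the check for a leading mpiexec or python is an added guard against a likely mistake, not documented platform behavior.

```python
def full_job_command(command: str, core_count: int) -> str:
    """Return the command line after `mpiexec -n <CORE_COUNT> python` is prepended."""
    stripped = command.strip()
    if not stripped:
        raise ValueError("Command must not be empty")

    # The prefix is added automatically, so repeating it would duplicate it.
    first_word = stripped.split()[0]
    if first_word in ("mpiexec", "python", "python3"):
        raise ValueError(
            f"Do not start the command with {first_word!r}; it is prepended automatically"
        )

    return f"mpiexec -n {core_count} python {stripped}"


if __name__ == "__main__":
    # Entering "ex.py 1" on a 32-core cluster:
    print(full_job_command("ex.py 1", core_count=32))
    # prints: mpiexec -n 32 python ex.py 1
```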

To specify your source code location, fill in the Path line with a valid Git URL or S3 URI that leads to a repository containing your code.

Note: When selecting a GitHub URL, you should select the URL available at the top of your web browser and NOT the path when cloning the repository, i.e. your path SHOULD NOT end in .git. If selecting an S3 URI, your S3 bucket must be in the same region as your cluster.
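
The two rules in the note above can be checked before submitting the form. The sketch below is a hypothetical pre-check, not platform code: it accepts an S3 URI with a bucket name, accepts an http(s) repository URL that does not end in .git, and rejects everything else. It cannot verify that an S3 bucket is in the same region as the cluster. The example URL and bucket name are placeholders.

```python
from urllib.parse import urlparse


def check_source_path(path: str) -> str:
    """Classify a job source path as "git" or "s3", or raise ValueError."""
    parsed = urlparse(path.strip())

    if parsed.scheme == "s3":
        if not parsed.netloc:
            raise ValueError("S3 URI is missing a bucket name")
        return "s3"

    if parsed.scheme in ("http", "https"):
        # Use the browser URL of the repository, not the clone URL.
        if parsed.path.rstrip("/").endswith(".git"):
            raise ValueError("Repository URL should not end in .git")
        return "git"

    raise ValueError(f"Unsupported source path: {path!r}")


if __name__ == "__main__":
    print(check_source_path("https://github.com/example-org/example-repo"))
    print(check_source_path("s3://example-bucket/jobs/"))
```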

Jobs-Forms-Standard

If you are cloning a private repository, you need to provide the platform with valid Git credentials to download your repository. To do so, select Show advanced in the bottom right of the form. Then in Workspace username, enter your Git username, and in Workspace password, enter either your password or a valid GitHub Access Token. The advanced options also allow you to specify a particular commit or branch with Workspace reference and to load other custom environment variables in Other.

Note: If your GitHub account uses 2FA, please use a GitHub Access Token to avoid any possible authentication issues.

Once your form is complete, select CREATE to begin your job.

Jobs-Forms-Advanced

After you select CREATE, you will see a NEW task created on your Jobs page.

New-Job

If you created a cluster specifically for this job, a new cluster will also appear in your clusters page.

New-Job-Cluster

Your job will begin once it reaches its scheduled time and any necessary clusters have been created. It will then transition to INPROGRESS status.

InProgress-Job

At this point your job will execute your desired command. Once it finishes executing, your job will transition to FINISHED status. You can find any stdout information that you may need by pressing DETAILS followed by SHOW LOGS. If a cluster was specifically created for this job, it will be deleted after the job finishes.

Finished-Job

Note: Bodo DOES NOT preserve artifacts written to local storage. If you have any information that you need to persist and later review, you should write it to external storage, such as Amazon S3. You may also write to stdout/stderr, but output logs may be truncated, so they should not be considered reliable for large outputs that need to be read later.

5.7. Resources Created in Your AWS Environment

Bodo deploys cluster/notebook resources in your own AWS environment to ensure security of your data. Below is a list of AWS resources that the Bodo Platform creates in your account to enable clusters and notebooks.

AWS Service                                                      Purpose
---------------------------------------------------------------  -----------------------------------------------
EC2 Instances                                                    Cluster/notebook workers
EFS                                                              Shared file system for clusters
VPC, Subnets, NAT Gateway, Elastic IP, ENI, Security Groups, …   Secure networking for clusters/notebooks
S3 and DynamoDB                                                  Resource states
AWS Systems Manager                                              Managing EC2 instances
KMS                                                              Cluster secrets (e.g. SSH keys)
IAM Role for Clusters                                            Allow cluster workers to access resources above

Note: These resources incur additional AWS infrastructure charges and are not included in the Bodo Platform charges.

5.8. AWS Account Cleanup

As explained in Resources Created in Your AWS Environment, the platform creates two types of resources in the users’ AWS environments: organization level resources and cluster specific resources. The organization level resources are created by the platform to set up shared resources (such as a VPC, an EFS Mount, etc.) that are used later by all created resources. The cluster specific resources (such as EC2 instances, ENIs, etc.) are created by the platform to host/manage a specific cluster. This includes notebooks and corresponding resources as well.

The cluster specific resources are removed when you request a cluster to be removed. The organization level resources persist in the user account so they can be used by clusters deployed in the future. However, if you need to remove these resources for any reason (AWS limits, etc.), an option to do so is provided. Navigate to the Settings page and click on Show Advanced in the bottom-right corner.

Settings-Account-Cleanup

This will bring up a section called AWS Resource Cleanup.

Advanced-Settings-Account-Cleanup

Select the region from which you would like to remove these resources (i.e. the region in which the resources you want to delete have been created), and click CLEANUP AWS RESOURCES. Note that this will only work if you don’t have any active clusters in that region deployed through the platform. Otherwise, the request will be rejected, and you’ll be asked to remove all clusters in that region before trying again. Removing active clusters (including clusters with a FAILED status) is necessary because this process will make them inaccessible to the platform.

5.9. Troubleshooting

Here are solutions to potential issues you may encounter while using the Bodo Cloud Platform:

5.9.1. Cluster Creation Fails

Most cluster creation failures are due to one of the following:

  • Your account hits AWS resource limits such as limits on the number of VPCs and EC2 instances

  • Your AWS credentials do not have the required permissions (see Setting AWS Credentials)

  • AWS does not have enough of the requested resources (such as some of the large EC2 instances)

In case of failure, the logs are made available on the platform and should provide some details regarding why the failure occurred. Even though cluster creation was not successful, some AWS resources may still have been provisioned. Click on the delete icon to remove all the created resources, otherwise you may incur charges for the provisioned AWS resources. You can try to create a cluster again after addressing the underlying issue such as increasing limits or providing AWS credentials with the required permissions.

5.9.2. Cluster Deletion Fails

Failures during cluster deletion are very rare and usually only occur when the provisioned resources have been manually modified in some way. In these cases, logs are provided to help you diagnose the issue. For instance, if logs indicate that some resource cannot be deleted due to a dependent resource, you can try to delete the resource manually through the AWS Management Console and try to remove the cluster through the platform again.

5.9.3. Cleanup Shared Resources Manually

As described in AWS Account Cleanup, an option to remove organization level shared resources provisioned by Bodo in your AWS environment is provided. If you need to remove resources manually (e.g. because the automated cleanup fails), below is the list of organization level resources and the order in which to remove them.

Note: Please ensure that you have removed all clusters and related resources before proceeding. Deleting the resources listed below may result in the platform losing access to those clusters for removal in the future.

The resources should be easy to identify within their respective sections on the AWS Management Console since their names are all prefixed with bodo.

  1. Navigate to the AWS Management Console. Sign in if you are not already signed in. Make sure you have selected the region from which you want to remove the shared resources.

  2. Click on Services in the top-right corner. Navigate to the EC2 section (under Compute) and then to Network Interfaces in the sidebar (under Network & Security). You will see two Network Interfaces. One of them is required for an EFS Mount (shared storage), and the other is required by a NAT Gateway. These dependent resources need to be removed first.

    1. Click on Services and navigate to the EFS section (under Storage). Click on File Systems in the sidebar. Delete the File System prefixed with bodo by selecting it and clicking on Delete.

    2. Click on Services and navigate to the VPC section (under Networking & Content Delivery). Select NAT Gateways in the sidebar (under Virtual Private Cloud). Select the NAT Gateway prefixed with bodo and delete it.

    Navigate back to Network Interfaces in the EC2 section and ensure that the two ENIs are deleted (or have the status available). This may take a few minutes in some cases.

  3. Click on Services and navigate to the VPC section (under Networking & Content Delivery). Select Your VPCs in the sidebar (under Virtual Private Cloud). Select the VPC prefixed with bodo and delete it. If there is a dependency warning, wait for a few minutes and try again. You can also try to delete the linked dependent resources manually if it does not resolve on its own.

  4. Click on Services in the top-right corner. Navigate to the EC2 section (under Compute) and select Elastic IPs in the sidebar (under Network & Security). Select the EIP prefixed with bodo and select Release Elastic IP addresses under Actions.

  5. Click on Services in the top-right corner. Navigate to the Key Management Service (KMS) section (under Security, Identity, & Compliance) and select Customer managed keys in the sidebar. Click on the key prefixed with bodoai-kms. Go to the Aliases tab. There should be a single alias defined. Select this alias and delete it. Next, click on Key actions (top-right) and select Schedule key deletion.

    Optional: Reduce the Waiting period from 30 days to 7 days.

    Next, check Confirm that you want to delete this key in XX days and click on Schedule deletion.

  6. Finally, click on Services in the top-right corner and navigate to Systems Manager (under Management & Governance). Select Parameter Store from the sidebar. Look for parameters prefixed with /<EXTERNAL_ID>, where EXTERNAL_ID is the same as the External ID visible on the Settings page on the Bodo Platform (see Create the IAM Role Manually). Select all these parameter entries and delete them.

The steps above should remove the organization level resources provisioned by Bodo in your AWS environment.

5.10. Billing

Users subscribed to the Bodo Platform through the AWS Marketplace will be charged for their use of the platform as part of their regular AWS bill. The platform charges are based on the type of instances deployed and the duration of their usage (to the nearest minute). The hourly rate for the supported instance types can be found on our website. For any cluster deployed through the platform, users are charged starting from when the cluster has been successfully deployed, until the time the user requests the cluster to be removed.
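
As a rough illustration of the billing rule above, the sketch below estimates the platform charge for one cluster. Several things here are assumptions rather than documented behavior: the hourly rate is treated as a per-instance rate, the duration is rounded with Python's built-in round, and the rate used in the example is a made-up number. Actual rates are listed on the website, and the AWS bill is authoritative.

```python
from datetime import datetime


def estimate_platform_charge(
    hourly_rate: float,
    num_instances: int,
    deployed_at: datetime,
    removal_requested_at: datetime,
) -> float:
    """Estimate the platform charge for one cluster.

    Billing runs from successful deployment until removal is requested,
    with the duration rounded to the nearest minute.
    """
    seconds = (removal_requested_at - deployed_at).total_seconds()
    if seconds < 0:
        raise ValueError("Removal time must not be before deployment time")

    billed_minutes = round(seconds / 60)
    # Assumption: the hourly rate applies to each instance in the cluster.
    return hourly_rate * num_instances * billed_minutes / 60


if __name__ == "__main__":
    # Hypothetical rate of 1.0 per instance-hour, 4 instances, 90 minutes of use.
    charge = estimate_platform_charge(
        hourly_rate=1.0,
        num_instances=4,
        deployed_at=datetime(2024, 1, 1, 12, 0),
        removal_requested_at=datetime(2024, 1, 1, 13, 30),
    )
    print(charge)
```

This covers Bodo Platform charges only. The AWS infrastructure charges for the underlying resources are billed separately.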

Note: Users are not charged in case of failures in cluster creation.

As mentioned previously in Resources Created in Your AWS Environment, the AWS resources set up by the platform in your AWS environment incur additional AWS infrastructure charges, and are not included in the Bodo Platform charges.