Skip to content

Recommended Cluster Configuration

This page describes best practices for configuring compute clusters for Bodo applications.

Minimizing Communication Overheads

Communication across cores is usually the largest overhead in parallel applications including Bodo. To minimize it:

  • For a given number of physical cores, use fewer large nodes with high core count rather than many small nodes with a low core count.

    This ensures that more cross core communication happens inside nodes. For example, a cluster with two c5n.18xlarge AWS instances will generally perform better than a cluster with four c5n.9xlarge instances, even though the two options have equivalent cost and compute power.

  • Use node types that support high bandwidth networking.

    AWS instance types with n in their name, such as c5n.18xlarge, m5n.24xlarge and r5n.24xlarge provide high bandwidth. On Azure, use virtual machines that support Accelerated Networking.

  • Use instance types that support RDMA networking.

    Examples of such instance types are Elastic Fabric Adapter (EFA) (AWS) and Infiniband (Azure). In our empirical testing, we found that EFA can significantly accelerate inter-node communication during expensive operations such as shuffle (which is used in join, groupby, sorting and others).

  • Ensure that the server nodes are located physically close to each other.

    On AWS this can be done by adding all instances to a placement group with the cluster strategy. Similarly on Azure, you can use Proximity Placement Groups.

For most applications, we recommend using c5n.18xlarge instances on AWS for best performance. For memory intensive use cases r5n.24xlarge instances are a good alternative. Both instance types support 100 Gbps networking as well as EFA.

Other Best Practices

  • Ensure that the file descriptor limit (ulimit -n) is set to a large number like 65000.

    This is especially useful when using IPyParallel which opens direct connections between ipengine and ipcontroller processes.

  • Avoid unnecessary threading inside the application since it can conflict with MPI parallelism.

    You can set the following environment variables in your shell (e.g. in bashrc) to avoid threading:

    export OPENBLAS_NUM_THREADS=1
    export OMP_NUM_THREADS=1
    export MKL_NUM_THREADS=1
    
  • Use Intel MPI for best performance. See our recommended MPI settings for more details.