Recommended Cluster Configuration¶
This section describes best practices for configuring compute clusters for Bodo applications.
Minimizing Communication Overheads¶
Communication across cores is usually the largest overhead in parallel applications including Bodo. To minimize it:
-
For a given number of physical cores, use fewer large nodes with high core count rather than many small nodes with a low core count.
This ensures that more cross core communication happens inside nodes. For example, a cluster with two
c5n.18xlarge
AWS instances will generally perform better than a cluster with fourc5n.9xlarge
instances, even though the two options have equivalent cost and compute power. -
Use node types that support high bandwidth networking.
AWS instance types with
n
in their name, such asc5n.18xlarge
,m5n.24xlarge
andr5n.24xlarge
provide high bandwidth. On Azure, use virtual machines that support Accelerated Networking. -
Use instance types that support RDMA networking.
Examples of such instance types are Elastic Fabric Adapter (EFA) (AWS) and Infiniband (Azure). In our empirical testing, we found that EFA can significantly accelerate inter-node communication during expensive operations such as shuffle (which is used in join, groupby, sorting and others).
-
List of AWS EC2 instance types that support EFA. For more information about EFA refer to the section on Recommended AWS Network Interface .
-
-
Ensure that the server nodes are located physically close to each other.
On AWS this can be done by adding all instances to a placement group with the
cluster
strategy. Similarly on Azure, you can use Proximity Placement Groups.
For most applications, we recommend using c5n.18xlarge
instances on
AWS for best performance. For memory intensive use cases r5n.24xlarge
instances are a good alternative. Both instance types support 100 Gbps
networking as well as EFA.
Other Best Practices¶
-
Ensure that the file descriptor limit (
ulimit -n
) is set to a large number like65000
.This is especially useful when using IPyParallel which opens direct connections between
ipengine
andipcontroller
processes. -
Avoid unnecessary threading inside the application since it can conflict with MPI parallelism.
You can set the following environment variables in your shell (e.g. in
bashrc
) to avoid threading: -
Use Intel MPI for best performance. See our recommended MPI settings for more details.