Organizations large and small run workloads on AWS using AWS compute. Here, AWS compute refers to the set of AWS services that help you build and deploy your own solutions and services; this can include workloads as diverse as websites, data analytics engines, Machine Learning (ML), High-Performance Computing (HPC), and more. As one of the first services to be released, Amazon Elastic Compute Cloud (EC2) is sometimes used synonymously with the term compute, and it offers a wide variety of instance types, processors, memory, and storage configurations for your workloads.
Apart from EC2, compute services suited to specific types of workloads include Amazon Elastic Container Service (ECS), Elastic Kubernetes Service (EKS), Batch, Lambda, Wavelength, and Outposts. Networking on AWS refers to foundational networking services, including Amazon Virtual Private Cloud (VPC), AWS Transit Gateway, and AWS PrivateLink. These services, along with the various compute services, enable you to build secure and performant networked systems at a global scale. AWS compute and networking are two broad topics, and understanding them is important for many of the concepts discussed in the following chapters.
Compute and networking also form two important pillars of HPC, along with data management, which was discussed in the last chapter. HPC applications are generally optimized for high levels of distributed compute, which in turn depends on networking.
In this chapter, you will learn about the different services AWS offers for compute and networking, how these services are used for different types of computing workloads, and lastly, best practices for HPC workloads on AWS that go beyond the AWS Well-Architected Framework.
Specifically, in this chapter, we will cover the following topics:
Compute lies at the foundation of every HPC application that you will read about in and outside of this book. In AWS and other clouds in general, compute refers to a group of services that offer the basic building blocks of performing a computation or some business logic. This can range from basic data computations to ML.
The basic units of measuring compute power on AWS (regardless of the service we are talking about) are as follows:
Typical HPC applications access multiple instances and hence can take advantage of pooled compute and memory resources for larger workloads.
The foundational service that provides compute resources for customers to build their applications on AWS is called Amazon EC2. Amazon EC2 provides customers with a choice of about 500 instance types (at the time of writing this book and according to public documentation). Customers can then tailor the right combination of instance types for their business applications.
Amazon EC2 provides five types of instances:
Each of the instance types listed here is actually a family of instances, as shown in Figure 3.1:
Figure 3.1 – Amazon EC2 instance types
In the following section, we will highlight some important facts about these instance types.
General purpose instances can be used for a variety of workloads. They have the right balance of compute, memory, and storage for most typical applications that customers have on AWS. On AWS, there are several types of general purpose instances:
Figure 3.2 – CPU utilization versus time for burstable T-type instances on AWS
In the following section, we will discuss compute optimized instances on AWS.
Many HPC applications that will be described in this book take advantage of the high-performance, compute optimized instance types on AWS.
There are several types of compute optimized instances:
In the following section, we will discuss accelerated compute instances on AWS.
Accelerated computing instances use co-processors such as GPUs to accelerate performance for workloads such as the floating-point calculations used in ML, deep learning, and graphics processing.
Accelerated compute instances use hardware-based compute accelerators such as the following:
GPUs were originally used for 3D graphics but are now used as general-purpose co-processors for applications such as HPC and deep learning. HPC applications are computation- and bandwidth-heavy. Several types of NVIDIA GPUs are available on AWS; detailed information can be found at https://aws.amazon.com/nvidia/.
Let’s dive into the basics of how GPUs help with compute-heavy calculations. Imagine adding a list of numbers to another list of the same size. Visually, this looks like the following diagram:
Figure 3.3 – Adding two arrays
The naïve way of adding these two arrays is to loop through all elements of each array and add each corresponding number from the top and bottom arrays. This may be fine for small arrays, but what about arrays that are millions of elements long? To do this on a GPU, we first allocate memory for these two very long arrays and then use threads to parallelize the computations. Adding these arrays using a single thread on a single GPU is the same as our earlier naïve approach. Using multiple threads (say, 256) can help parallelize this operation by allocating a part of the work to each thread. For example, the first few elements (the total size divided by 256, in this case) will be handled by the first thread, and so on. This speeds up the operation by letting each thread focus on a smaller portion of the work and perform each of these split-up addition operations in parallel; see the shaded region in the following diagram:
Figure 3.4 – Multiple threads handling a portion of the computation
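To make the chunking idea concrete, here is a small Python sketch that splits the addition across CPU threads. Real GPU code would use a framework such as CUDA, but the division of work per thread is analogous.

```python
from concurrent.futures import ThreadPoolExecutor

def add_arrays(a, b, num_threads=4):
    """Add two equal-length lists by splitting the work into chunks,
    one chunk per thread -- mirroring how each GPU thread takes a
    slice of the arrays."""
    n = len(a)
    out = [0] * n
    chunk = (n + num_threads - 1) // num_threads  # ceil(n / num_threads)

    def work(start):
        # Each thread adds its own contiguous slice of the arrays
        for i in range(start, min(start + chunk, n)):
            out[i] = a[i] + b[i]

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        pool.map(work, range(0, n, chunk))
    return out

print(add_arrays([1, 2, 3, 4], [10, 20, 30, 40]))  # [11, 22, 33, 44]
```

For millions of elements, the per-thread loops run concurrently, just as the 256 GPU threads in the example above each handle their own portion of the arrays.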
GPUs today are architected in a way that allows even higher levels of parallelism – multiple processing threads make up a block, and there are usually multiple blocks in a GPU. Each block can run concurrently in a Streaming Multiprocessor (SM) and process the same set of computations or kernels. Visually, this looks like the following:
Figure 3.5 – Multiple blocks in a GPU
To give you an idea of what you can access on AWS, consider the p4d.24xlarge instance. This instance has eight GPUs, as seen in the following figure, each of which is an NVIDIA A100 housing 108 SMs, with each SM capable of running 2,048 threads in parallel:
Figure 3.6 – A single instance with multiple GPUs
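Using the figures quoted above, a quick back-of-the-envelope calculation shows how many threads a single p4d.24xlarge can keep resident across all of its GPUs:

```python
gpus_per_instance = 8    # NVIDIA A100 GPUs on a p4d.24xlarge
sms_per_gpu = 108        # streaming multiprocessors per A100
threads_per_sm = 2048    # maximum resident threads per SM

total_threads = gpus_per_instance * sms_per_gpu * threads_per_sm
print(f"{total_threads:,}")  # 1,769,472
```

That is over 1.7 million resident threads on one instance, which is why GPU instances are so well suited to massively parallel workloads.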
On AWS, P4d instances can be used to provision a supercomputer or an EC2 Ultracluster with more than 4,000 A100 GPUs, Petabit-scale networking, and scalable, shared high throughput storage on Amazon FSx for Lustre (https://aws.amazon.com/fsx/lustre/). Application and package developers use the NVIDIA CUDA library to build massively parallel applications for HPC and deep learning. For example, PyTorch, a popular ML library, uses NVIDIA’s CUDA GPU programming library for training large-scale models. Another example is Ansys Fluent, a popular Computational Fluid Dynamics (CFD) simulation software that uses GPU cores to accelerate fluid flow computations.
On AWS, there are several families of GPU instances:
Amazon EC2 F1 instances allow you to develop and deploy hardware-accelerated applications easily on the cloud. Example applications include (but are not limited to) big data analytics, genomics, and simulation-related applications. Developers can use high-level C/C++ code to program their applications, register the FPGA design as an Amazon FPGA Image (AFI), and deploy the application to an F1 instance. For more information on F1 instances, please refer to the links in the References section at the end of this chapter.
In the following section, we will discuss memory optimized compute instances on AWS.
Memory optimized instances on AWS are suited to run applications that require storage of extremely large data in memory. Typical applications that fall into this category are in-memory databases, HPC applications, simulation, and Electronic Design Automation (EDA) applications. On AWS, there are several types of memory optimized instances:
Storage optimized instances are well suited for applications that need frequent, sequential reads and writes from local storage by providing very high I/O Operations Per Second (IOPS). There are several storage optimized instances on AWS:
Now that we have discussed different instance types that you can choose for your applications on AWS, we can move on to the topic of Amazon Machine Images (AMIs). AMIs contain all the information needed to launch an instance. This includes the following:
You can create your own AMI, or buy, share, or sell AMIs on the AWS Marketplace. AWS maintains Amazon Linux-based AMIs that are stable and secure, are updated and maintained on a regular basis, and include several AWS tools and packages. Furthermore, these AMIs are provided free of charge to AWS customers.
In the previous section, we spoke about AMIs on AWS, which can help isolate and replicate applications across several instances and instance types. Containers can be used to further isolate and launch one or more applications onto instances. The most popular container platform is Docker, an open platform for developing, shipping, and running applications. Docker provides the ability to package and run an application in a loosely isolated environment called a container. A Docker container is a runnable instance of a Docker image, and these images can be run locally on your computer, on virtual machines, or in the cloud. Docker containers can be run on any host operating system, and as such are extremely portable, as long as Docker is running on the host system.
A Docker container contains everything that is needed to run the applications that are defined inside it – this includes configuration information, directory structure, software dependencies, binaries, and packages. This may sound complicated, but it is actually very easy to define a Docker image; this is done in a Dockerfile that may look similar to this:
FROM python:3.7-alpine
COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt
CMD ["gunicorn", "-w", "4", "main:app"]
The preceding file, named Dockerfile, defines the Docker image that runs a sample Python application using the popular Gunicorn package (see the last line in the file). Before running the application, we tell Docker to use the Python 3.7 base image (FROM python:3.7-alpine), copy all the required files from the host system to a folder called /app, and install the dependencies that the application needs to run successfully (RUN pip install -r requirements.txt). You can now test this application locally before deploying it at scale on the cloud.
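For completeness, the main:app referenced in the Dockerfile's CMD could be a minimal Python WSGI application such as the following. This file (main.py) is a hypothetical stand-in for whatever application you copy into /app:

```python
# main.py -- a minimal WSGI application that Gunicorn could serve
# via "gunicorn main:app". A real application would typically use a
# framework such as Flask or FastAPI instead of raw WSGI.
def app(environ, start_response):
    body = b"Hello from the container!"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Because the app is packaged with its dependencies in the image, the same container runs identically on a laptop, an EC2 instance, or a managed container service.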
On AWS, you can run containers on EC2 instances of your choice or make use of the many container services available:
In the previous section, you read about AWS Fargate, which lets you run applications and code based on Docker containers, without the need to manage infrastructure. This is an example of a serverless service on AWS. AWS offers serverless services that have the following features in common:
Serverless compute technologies on AWS are AWS Lambda and Fargate. AWS Lambda is a serverless computing service that lets you run any code that can be triggered by over 200 services and SaaS applications. Code can be written in popular languages such as Python, Node.js, Go, and Java or can be packaged as Docker containers, as described earlier. With AWS Lambda, you only pay for the number of milliseconds that your code runs, beyond a very generous free tier of over a million free requests. AWS Lambda supports the creation of a wide variety of applications including file processing, streaming, web applications, IoT backend applications, and mobile app backends.
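To illustrate, a minimal Python Lambda handler might look like the following. The event shape shown here is hypothetical, since it depends on which of the 200+ triggering services invokes the function:

```python
import json

def handler(event, context):
    # A Lambda handler receives the triggering event and a runtime
    # context object; the business logic goes in the body. Here we
    # simply echo a field from a hypothetical event payload.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```

You pay only while the handler executes, and AWS scales the number of concurrent invocations automatically in response to incoming events.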
For more information on serverless computing on AWS, please refer to the links included in the References section.
In the next section, we will cover basic concepts around networking on AWS.
Networking on AWS is a vast topic that is out of the scope of this book. However, to more easily explain some of the sections and chapters that follow, we will provide a brief overview here. First, AWS has a concept called regions, which are physical areas around the world where AWS places clusters of data centers. Each region contains multiple logically separated groups of data centers called availability zones. Each availability zone has independent power, cooling, and physical security. Availability zones are connected via redundant, ultra-low-latency AWS networks. At the time of writing this chapter, AWS has 26 regions and 84 availability zones.
The next foundational concept we will discuss here is a Virtual Private Cloud (VPC). A VPC is a logical partition that lets you launch and group AWS resources. In the following diagram, we can see that a region contains multiple availability zones and that a VPC can span multiple availability zones:
Figure 3.7 – Relationship between regions, VPCs, and availability zones
A subnet is a range of IP addresses associated with the VPC you have defined. A route table is a set of rules that determines how traffic will flow within the VPC. Every subnet you create in a VPC is automatically associated with the main route table of the VPC. A VPC endpoint lets resources within a VPC privately connect to supported AWS services without traversing the public internet.
Next, we will discuss Classless Inter-Domain Routing (CIDR) blocks and routing.
CIDR is a set of standards that is useful for assigning IP addresses to a device or group of devices. A CIDR block looks like the following:
10.0.0.0/16
This defines the starting IP address and the number of IP addresses in the block. Here, the 16 means that there are 2^(32-16), or 65,536, unique addresses. When you create a CIDR block, you have to make sure that all IP addresses are contiguous, the block size is a power of 2, and the IPs fall within the range 0.0.0.0 to 255.255.255.255.
For example, the CIDR block 10.117.50.0/22 has a total of 2^(32-22), or 1,024, addresses. Now, if we would like to partition this network into four smaller networks with 256 addresses each, we could use the following CIDR blocks:
10.117.50.0/22 (1,024 addresses)
10.117.50.0/24 – 256 addresses
10.117.51.0/24 – 256 addresses
10.117.52.0/24 – 256 addresses
10.117.53.0/24 – 256 addresses
Figure 3.8 – Example of using CIDR blocks to create four partitions on the network
Great, now that we know how CIDR blocks work, let us apply the same to VPCs and subnets.
Referring back to Figure 3.8, we have made a few modifications to show CIDR blocks that define two subnets within VPC1 in the following diagram:
Figure 3.9 – CIDR blocks used to define two subnets within VPC1
As we can see in Figure 3.9, VPC 1 has a CIDR block of 10.0.0.0/16 (amounting to 65,536 addresses), and the two subnets (/24) are allocated 256 addresses each. As you may have noticed, there are several unallocated addresses in this VPC, which can be used in the future for more subnets. Routing decisions are defined using a route table, as shown in the figure. Here, each subnet is considered private, as traffic originating from within the VPC cannot leave the VPC. This also means that resources within this VPC cannot, by default, access the internet. One way to allow resources within a subnet to access the internet is to add an internet gateway. To allow only outbound internet connections from a private subnet, you can use a NAT gateway; this is often a requirement for security-sensitive workloads. This modification results in the following change to our network diagram:
Figure 3.10 – Adding an internet gateway to Subnet 1
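The subnet layout in Figure 3.9 can also be checked programmatically. The exact /24 ranges below are assumed for illustration, since the figure only specifies the VPC's /16 block and two /24 subnets:

```python
import ipaddress

vpc = ipaddress.ip_network("10.0.0.0/16")
subnet1 = ipaddress.ip_network("10.0.0.0/24")  # assumed range
subnet2 = ipaddress.ip_network("10.0.1.0/24")  # assumed range

# Both subnets must fall inside the VPC block and must not overlap
assert subnet1.subnet_of(vpc) and subnet2.subnet_of(vpc)
assert not subnet1.overlaps(subnet2)

# Address space still unallocated in the VPC after carving out both /24s
used = subnet1.num_addresses + subnet2.num_addresses
print(vpc.num_addresses - used)  # 65024
```

Checks like these are handy when planning subnets, since overlapping or out-of-range CIDR blocks are rejected by the VPC APIs.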
The main route table is associated with all subnets in the VPC, but we can also define custom route tables for each subnet. This determines whether a subnet is private, public, or VPN-only. Now, if we need resources in Subnet 2 to access only VPN resources in a corporate network via a Virtual Private Gateway (VGW), we can create two route tables and associate them with Subnet 1 and Subnet 2, as shown in the following diagram:
Figure 3.11 – Adding a VGW to connect to on-premises resources
A feature called VPC peering can be used in order to privately access resources in another VPC on AWS. With VPC peering, you can use a private networking connection between two VPCs to enable communication between them. For more information, you can visit https://docs.aws.amazon.com/vpc/latest/peering/what-is-vpc-peering.html. As shown in the following diagram, VPC peering allows resources in VPC 1 and VPC 2 to communicate with each other as though they are in the same network:
Figure 3.12 – Adding VPC peering and VPC endpoints
VPC peering can be done between VPCs in the same region or between VPCs in different regions. A VPC endpoint allows resources within a VPC (here, VPC 2) to access AWS services privately. Here, an EC2 instance can make private API calls to services such as Amazon S3, Kinesis, or SageMaker. These are called interface-type endpoints. Gateway-type VPC endpoints are also available for Amazon S3 and DynamoDB, where you can further customize access control using policies (for example, bucket policies for Amazon S3).
Large enterprise customers with workloads that run on-premises, as well as on the cloud, may have a setup similar to Figure 3.13:
Figure 3.13 – Enterprise network architecture example
Each corporate location may be connected to AWS by using Direct Connect (a service for creating dedicated network connections to AWS), with a VPN as a backup. Private subnets may host single EC2 instances or clusters of them for large, permanent workloads. The cluster of EC2 instances is placed in a multi-AZ autoscaling group so that the workload can recover from the unlikely event of an AZ failure, and a minimum number of EC2 instances is maintained.
For ephemeral workloads, managed services such as EKS, Glue, or SageMaker can be used. In the preceding diagram, a private EKS cluster is placed in VPC 2. Since internet access is disabled by default, all container images must be local to the VPC or copied onto an ECR repository; that is, you cannot use an image from Docker Hub. To publish logs and save checkpoints, VPC endpoints are required in VPC 2 to connect to the Amazon S3 and CloudWatch services. Data stores and databases are not discussed in this diagram but are important considerations in hybrid architectures. This is because some data cannot leave the corporate network but may be anonymized and replicated on AWS temporarily.
Typically, this temporary data on AWS is used for analytics purposes before getting deleted. Lastly, hybrid architectures may also involve AWS Outposts, a fully managed service that extends AWS services such as EC2, ECS, EKS, S3, EMR, and Relational Database Service (RDS) to on-premises environments.
Now that you have learned about the foundations of compute and network on AWS, we are ready to explore some typical architectural patterns for compute on AWS.
Selecting the right compute for HPC and ML applications involves considering the rest of the architecture you are designing, and therefore involves all aspects of the Well-Architected Framework:
We cover best practices across these pillars at the end of this section, but first, we will start with the most basic pattern of computing on AWS and add complexity as we progress.
Many HPC applications that are built for simulations, financial services, CFD, or genomics can run on a single EC2 instance as long as the right instance type is selected. We discussed many of these instance-type options in the Introducing AWS compute ecosystem section. As shown in the following diagram, a CloudFormation Template can be used to launch an EC2 Instance in a VPC, and Secure Shell (SSH) access can be provided to the user for installing and using software on this instance:
Figure 3.14 – CloudFormation Template used to launch an EC2 Instance inside a VPC
Next, we will describe a pattern that uses AWS ParallelCluster.
AWS ParallelCluster can be used to provision a cluster with head and worker nodes for massive-scale parallel processing or HPC. ParallelCluster, once launched, will be similar to on-premises HPC clusters with the added benefits of security and scalability in the cloud. These clusters can be permanent or provisioned and de-provisioned on an as-needed basis. On AWS, a user can use the AWS ParallelCluster Command-Line Interface (CLI) to create a cluster of EC2 instances on the fly. AWS CloudFormation is used to launch the infrastructure, including required networking, storage, and AMI configurations. As the user (or multiple users) submit jobs through the job scheduler, more instances are provisioned and de-provisioned in the autoscaling group, as shown in the following diagram:
Figure 3.15 – Using AWS ParallelCluster for distributed workloads on AWS
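As an illustration, a minimal AWS ParallelCluster (v3) configuration file might look similar to the following sketch. The region, subnet ID, key name, and instance types are placeholders that you would replace with your own values:

```yaml
Region: us-east-1
Image:
  Os: alinux2
HeadNode:
  InstanceType: c5.xlarge
  Networking:
    SubnetId: subnet-0123456789abcdef0   # placeholder subnet ID
  Ssh:
    KeyName: my-key-pair                 # placeholder key pair
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: compute
      ComputeResources:
        - Name: compute-nodes
          InstanceType: c5n.18xlarge
          MinCount: 0                    # scale to zero when idle
          MaxCount: 16                   # upper bound for autoscaling
      Networking:
        SubnetIds:
          - subnet-0123456789abcdef0     # placeholder subnet ID
```

With a configuration like this saved as cluster.yaml, a command along the lines of pcluster create-cluster --cluster-name my-cluster --cluster-configuration cluster.yaml launches the CloudFormation stack that provisions the head node, queues, and autoscaling compute fleet.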
Once the user is done using the cluster for their HPC workloads, they can use the CLI or CloudFormation APIs to delete all created resources. As a modification to the preceding architecture, you can replace the head/master EC2 node with an Amazon SQS queue to get a queue-based architecture for typical HPC workloads.
Next, we will discuss how you can use AWS Batch.
AWS Batch helps run HPC and big data applications that are based on independent input configurations or files, without the need to manage infrastructure. To submit a job to AWS Batch, you package your application as a container and use the CLI or supported APIs to define and submit a job. With AWS Batch, you can get started quickly by using default job configurations, a built-in job queue, and integration with workflow services such as AWS Step Functions and Luigi.
As you can see in the following diagram, the user first defines a Docker image (much like the image we discussed in the section on containers) and then registers this image with Amazon ECR. The user can then create a job definition in AWS Batch and submit one or more jobs to the job queue. Input data can be pulled from Amazon S3, and output data can be written to a different location on Amazon S3:
Figure 3.16 – Using AWS Batch along with AWS EC2 instances for batch workloads
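As a rough sketch, the parameters you might pass to boto3's batch.register_job_definition and batch.submit_job calls are shown below as plain dictionaries, so the shape of a job definition and a job submission is clear without making live API calls. The image URI, queue name, bucket, and job names are all placeholders:

```python
# Parameters for batch.register_job_definition(**job_definition);
# the ECR image URI and names below are placeholders.
job_definition = {
    "jobDefinitionName": "my-hpc-job",
    "type": "container",
    "containerProperties": {
        "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest",
        # Ref::input_file is substituted from the job's parameters
        "command": ["python", "run_simulation.py", "Ref::input_file"],
        "resourceRequirements": [
            {"type": "VCPU", "value": "4"},
            {"type": "MEMORY", "value": "8192"},  # MiB
        ],
    },
}

# Parameters for batch.submit_job(**job_submission)
job_submission = {
    "jobName": "simulation-001",
    "jobQueue": "my-job-queue",
    "jobDefinition": "my-hpc-job",
    "parameters": {"input_file": "s3://my-bucket/inputs/run1.json"},
}
```

Submitting many such jobs with different input_file parameters is how Batch fans out independent work items across the compute environment.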
Next, we will discuss patterns that help with hybrid architectures on AWS.
Customers who have already invested in large on-premises clusters, and who also want to make use of the on-demand, highly scalable, and secure AWS environment for their jobs, generally opt for a hybrid approach. In this approach, organizations decide to do one of the following:
On-premises data can be transferred to Amazon S3 using a software agent called DataSync (see https://docs.aws.amazon.com/datasync/latest/userguide/working-with-agents.html). Clusters that use Lustre’s shared high-performance file system on-premises can make use of Amazon FSx for Lustre on AWS (for more information, see https://aws.amazon.com/fsx/lustre/). The following diagram is a reference architecture for hybrid workloads:
Figure 3.17 – Using FSx, S3, and AWS DataSync for hybrid architectures
Next, we will discuss patterns for container-based distributed processing.
The following diagram is a reference architecture for container-based distributed processing workflows that are suited for HPC and other related applications:
Figure 3.18 – EKS-based architecture for distributed computing
Admins can use command-line tools such as eksctl or CloudFormation to provision resources. Pods, which consist of one or more containers, can be run on managed EC2 nodes of your choice or via the AWS Fargate service. EMR on EKS can also be used to run open source big data applications (for example, based on Spark) directly on EKS-managed nodes. In all of the preceding cases, containers provided by AWS can be used as a baseline, or completely custom containers that you build and push to ECR may be used. Applications running in EKS pods can access data from Amazon S3, Redshift, DynamoDB, or a host of other services and applications. To learn more about EKS, Fargate, or EMR on EKS, please take a look at the links provided in the References section.
The following diagram is an example of serverless architecture that can be used for real-time, serverless processing, analytics, and business intelligence:
Figure 3.19 – Architecture for real-time, serverless processing and business analytics
First, Kinesis Data Streams captures data from one or more data producers. Next, Kinesis Data Analytics can be used to build real-time applications for transforming this incoming data using SQL, Java, Python, or Scala. Data can also be interactively processed using managed Apache Zeppelin notebooks (https://zeppelin.apache.org/). In this case, a Lambda Function is being used to continuously post-process the output of the Kinesis Analytics application before dropping a filtered set of results into the serverless, NoSQL database DynamoDB.
Simultaneously, the Kinesis Data Firehose component is used to save incoming data into S3, which is then processed by several other serverless components, such as AWS Glue and AWS Lambda, and orchestrated using AWS Step Functions. With AWS Glue, you can run serverless Extract-Transform-Load (ETL) applications written using familiar interfaces such as SQL or Spark. You can then save the output of Glue transform jobs to data stores such as Amazon S3 or Amazon Redshift. ML applications that run on Amazon SageMaker can also make use of the output data from real-time streaming analytics.
Once the data is transformed, it is ready to be queried interactively using Amazon Athena. Amazon Athena makes it possible for you to query data that resides in Amazon S3 using standard SQL commands. Athena is also directly integrated with the Glue Data Catalog, which makes it much easier to work with these two services without the additional burden of writing ETL jobs or scripts to enable this connection. Athena is built on the open source engine Presto (https://prestodb.io/) and can be used to query a variety of standard formats such as CSV, JSON, Parquet, and Avro. Athena also supports federated data sources, and visualization tools such as Amazon QuickSight can run complex SQL queries through Athena.
Rather than using a dataset to visualize outputs, QuickSight, when configured correctly, can directly send these SQL queries to Athena. The results of the query can then be directly visualized interactively using multiple chart types and organized into a dashboard. These dashboards can then be shared with business analysts for further research.
In this section, we have covered various patterns around the topic of compute on AWS. Although this is not an exhaustive list of patterns, this should give you a basic idea of the components or services used and how these components are connected to each other to achieve different requirements. Next, we will describe some best practices related to HPC on AWS.
The AWS Well-Architected Framework helps with the architecting of secure, cost-effective, resilient, and high-performing applications and workloads on the cloud. It is the go-to reference when building any application. Details about the AWS Well-Architected Framework can be obtained at https://aws.amazon.com/architecture/well-architected/. However, applications in certain domains and verticals require further scrutiny and have details that need to be handled differently from the generic guidance that the AWS Well-Architected Framework provides. Thus, we have many other documents called lenses that provide best practice guidance; some of these lenses that are relevant to our current discussion are listed as follows:
While it is out of the scope of this book to go over best practices from the generic AWS Well-Architected Framework, as well as these individual lenses, we will list some common, important design considerations that are relevant to our current topic of HPC and ML:
On AWS, compute clusters can be right-sized at any given point in time, and the use of managed services can help with provisioning resources on the fly. For example, Amazon SageMaker allows users to provision various instance types for training without the undifferentiated heavy lifting of maintaining clusters or infrastructure. Customers only need to choose the framework of interest, point to training data in Amazon S3, and use the APIs to start, monitor, and stop training jobs. Customers only pay for what they use and don’t pay for any idle time.
On SageMaker, using Spot Instances is very simple – you just need to pass an argument to supported training APIs. On the other hand, for high-performance workloads, it is important to prioritize On-Demand Instances over Spot Instances so that the results of simulations or ML training jobs can be returned and analyzed in a timely manner. When choosing services or applications for your HPC or ML workloads, prefer pay-as-you-go pricing over licensing and upfront costs.
Similarly, in HPC, the choice of software for molecular dynamics simulations will determine the scale of simulations that can be done, which AWS services are compatible with the package, and which team members are trained and ready to make use of this software setup on AWS.
In this section, we have listed some best practices for HPC workloads on AWS.
In this chapter, we first described the AWS Compute ecosystem, including the various types of EC2 instances, as well as container-based services (Fargate, ECS, and EKS), and serverless compute options (AWS Lambda). We then introduced networking concepts on AWS and applied them to typical workloads using a visual walk-through. To help guide you through selecting the right compute for HPC workloads, we described several typical patterns including standalone, self-managed instances, AWS ParallelCluster, AWS Batch, hybrid architectures, container-based architectures, and completely serverless architectures for HPC. Lastly, we discussed various best practices that may further help you right-size your instances and clusters and apply the Well-Architected Framework to your workloads.
In the next chapter, we will outline the various storage services that can be used on AWS for HPC and ML workloads.
For additional information on the topics covered in this chapter, please navigate to the following pages: