In the previous chapter, you learnt a lot about monitoring our AWS infrastructure, especially the EC2 instances using Amazon CloudWatch. We also created our very first alarms using CloudWatch and monitored our instance's CPU, memory, and disk utilization and performance using the same.
In this chapter, we are going continue where we last dropped off and introduce an amazing and awesome concept called Auto Scaling! AWS has been one of the first public cloud providers to provide this feature and really it is something that you must try out and use in your environments! This chapter will teach you the basics of Auto Scaling, its concepts and terminologies, and even how to create an auto scaled environment using AWS. It will also cover Amazon Elastic Load Balancers and how you can use them in conjunction with Auto Scaling to manage your applications more effectively! So without wasting any more time, let's first get started by understanding what Auto Scaling is and how it actually works!
We have been talking about AWS and the concept of dynamic scalability, also known as Elasticity in general throughout this book; well now is the best time to look at it in depth with the help of Auto Scaling!
Auto Scaling basically enables you to scale your compute capacity (EC2 instances) either up or down, depending on the conditions you specify. These conditions could be as simple as a number that maintains the count of your EC2 instances at any given time, or even complex conditions that measure the load and performance of your instances such as CPU utilization, memory utilization, and so on. But a simple question that may arise here is why do I even need Auto Scaling? Is it really that important? Let's look at a dummy application's load and performance graph to get a better understanding of things; let's take a look at the following screenshot:
The graph to the left depicts the traditional approach that is usually taken to map an application's performance requirements with a fixed infrastructure capacity. Now, to meet this application's unpredictable performance requirements, you would have to plan and procure additional hardware upfront, as depicted by the red line. And since there is no guaranteed way to plan for unpredictable workloads, you generally end up procuring more than you need. This is a standard approach employed by many businesses and it doesn't come without its own set of problems. For example, the region highlighted in red is when most of the procured hardware capacity is idle and wasted as the application simply does not have that high a requirement. Whereas there can be cases as well where the procured hardware simply did not match the application's high performance requirements, as shown by the green region. All these issues, in turn, have an impact on your business, which frankly can prove to be quite expensive. That's where the elasticity of a cloud comes into play. Rather than procuring at the nth hour and ending up with wasted resources, you grow and shrink your resources dynamically as per your application's requirements, as depicted in the graph on the right. This not only helps you in saving overall costs but also makes your application's management a lot more easy and efficient. And don't worry if your application does not have an unpredictable load pattern! Auto Scaling is designed to work with both predictable and unpredictable workloads so that no matter what application you may have, you can rest assured that the required compute capacity is always going to be made available for use when required. Keeping that in mind, let us summarize some of the benefits that AWS Auto Scaling provides:
With these basics in mind, let us understand how Auto Scaling actually works out in AWS.
To get started with Auto Scaling on AWS, you will be required to work with three primary components, each described briefly as follows.
An Auto Scaling group is a core component of the Auto Scaling service. It is basically a logical grouping of instances that share some common scaling characteristics between them. For example, a web application can contain a set of web server instances that can form one Auto Scaling group and another set of application server instances that become a part of another Auto Scaling group and so on. Each group has its own set of criteria specified that includes the minimum and maximum number of instances that the group should have, along with the desired number of instances that the group must have at all times.
Auto Scaling Groups are also responsible for performing periodic health checks on the instances contained within them. An instance with degraded health is then immediately swapped out and replaced by a new one by the Auto Scaling Group, thus ensuring that each of the instances within the Group works at optimum levels.
A launch configuration is a set of blueprint statements that the Auto Scaling Group uses to launch instances. You can create a single launch configuration and use it with multiple Auto Scaling Groups; however, you can only associate one Launch Configuration with a single Auto Scaling Group at a time. What does a Launch Configuration contain? Well to start off with, it contains the AMI ID using which Auto Scaling launches the instances in the Auto Scaling Group. It also contains additional information about your instances such as instance type, the security group it has to be associated with, block device mappings, key pairs, and so on. An important thing to note here is that once you create a Launch Configuration, there is no way you can edit it again. The only way to make changes to a Launch Configuration is by creating a new one in its place and associating that with the Auto Scaling Group.
With your Launch Configuration created, the final step left is to create one or more scaling plans. Scaling Plans describe how the Auto Scaling Group should actually scale. There are three scaling mechanisms you can use with your Auto Scaling Groups, each described as follows:
Before we go ahead and create our first Auto Scaling activity, we need to understand one additional AWS service that will help us balance and distribute the incoming traffic across our auto scaled EC2 instances. Enter the Elastic Load Balancer!