Chapter 7. Manage Your Applications with Auto Scaling and Elastic Load Balancing

In the previous chapter, we learned a lot about monitoring our AWS infrastructure, especially our EC2 instances, using Amazon CloudWatch. We also created our very first CloudWatch alarms and used them to monitor our instances' CPU, memory, and disk utilization and performance.

In this chapter, we are going to continue where we left off and introduce a powerful concept called Auto Scaling! AWS was one of the first public cloud providers to offer this feature, and it is something you really must try out and use in your environments! This chapter will teach you the basics of Auto Scaling, its concepts and terminology, and how to create an auto scaled environment on AWS. It will also cover Amazon Elastic Load Balancers and how you can use them in conjunction with Auto Scaling to manage your applications more effectively! So, without wasting any more time, let's get started by understanding what Auto Scaling is and how it actually works!

An overview of Auto Scaling

We have been talking about AWS and the concept of dynamic scalability, also known as elasticity, throughout this book; now is the best time to look at it in depth with the help of Auto Scaling!

Auto Scaling enables you to scale your compute capacity (EC2 instances) up or down depending on conditions you specify. These conditions can be as simple as maintaining a fixed count of EC2 instances at any given time, or more complex conditions that measure the load and performance of your instances, such as CPU utilization, memory utilization, and so on. A simple question may arise here: why do I even need Auto Scaling? Is it really that important? Let's look at a sample application's load and performance graph to get a better understanding of things; take a look at the following screenshot:

[Figure: Fixed-capacity provisioning (left) versus dynamic scaling (right)]

The graph on the left depicts the traditional approach to matching an application's performance requirements with a fixed infrastructure capacity. To meet this application's unpredictable performance requirements, you have to plan and procure additional hardware upfront, as depicted by the red line. And since there is no guaranteed way to plan for unpredictable workloads, you generally end up procuring more than you need. This is a standard approach employed by many businesses, and it doesn't come without its own set of problems. For example, the region highlighted in red is where most of the procured hardware capacity sits idle and wasted, as the application simply does not need that much. There can also be cases where the procured hardware simply cannot match the application's peak performance requirements, as shown by the green region. All of these issues have an impact on your business, which frankly can prove quite expensive.

That's where the elasticity of the cloud comes into play. Rather than procuring at the eleventh hour and ending up with wasted resources, you grow and shrink your resources dynamically as per your application's requirements, as depicted in the graph on the right. This not only helps you save on overall costs but also makes managing your application much easier and more efficient. And don't worry if your application does not have an unpredictable load pattern! Auto Scaling is designed to work with both predictable and unpredictable workloads, so no matter what application you have, you can rest assured that the required compute capacity will always be made available when it is needed. Keeping that in mind, let us summarize some of the benefits that AWS Auto Scaling provides:

  • Cost savings: By far the biggest advantage of Auto Scaling is that you gain fine-grained control over the deployment of your instances, and over their costs, by launching instances only when they are needed and terminating them when they aren't.
  • Ease of use: AWS provides a variety of tools with which you can create and manage Auto Scaling, such as the AWS CLI and the EC2 Management Dashboard. Auto Scaling can also be created and managed programmatically via a simple, easy-to-use web service API.
  • Scheduled scaling actions: Apart from scaling instances as per a given policy, you can also schedule scaling actions to be executed in the future. This type of scaling comes in handy when your application's workload patterns are predictable and well known in advance.
  • Geographic redundancy and scalability: AWS Auto Scaling enables you to scale, distribute, and load balance your application automatically across multiple availability zones within a given region.
  • Easier maintenance and fault tolerance: AWS Auto Scaling replaces unhealthy instances automatically based on predefined alarms and thresholds.

With these basics in mind, let us understand how Auto Scaling actually works out in AWS.

Auto Scaling components

To get started with Auto Scaling on AWS, you will be required to work with three primary components, each described briefly as follows.

Auto Scaling groups

An Auto Scaling group is the core component of the Auto Scaling service. It is a logical grouping of instances that share some common scaling characteristics. For example, a web application can contain a set of web server instances that form one Auto Scaling group, and another set of application server instances that become part of a second Auto Scaling group, and so on. Each group specifies its own set of criteria, including the minimum and maximum number of instances the group is allowed to have, along with the desired number of instances that the group must have at all times.

Note

The desired number of instances is an optional field in an Auto Scaling group. If the desired capacity value is not specified, the Auto Scaling group uses the minimum number of instances as the desired value instead.

Auto Scaling groups are also responsible for performing periodic health checks on the instances contained within them. An instance with degraded health is immediately swapped out and replaced by a new one, thus ensuring that each instance within the group works at an optimum level.
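As an illustrative sketch, an Auto Scaling group with such minimum, maximum, and desired values can be created using the AWS CLI. All of the names and values below (my-web-asg, my-launch-config, the availability zones, and so on) are hypothetical placeholders, not values from this chapter:

```shell
# Create a hypothetical Auto Scaling group spanning two availability
# zones, keeping between 2 and 6 instances with a desired count of 2.
# Assumes a launch configuration named "my-launch-config" already exists.
aws autoscaling create-auto-scaling-group \
    --auto-scaling-group-name my-web-asg \
    --launch-configuration-name my-launch-config \
    --min-size 2 \
    --max-size 6 \
    --desired-capacity 2 \
    --availability-zones us-east-1a us-east-1b \
    --health-check-type EC2 \
    --health-check-grace-period 300
```

The `--health-check-type` and `--health-check-grace-period` flags govern the periodic health checks described above; with the `EC2` type, the group relies on the EC2 instance status checks to decide when to replace an instance.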

Launch configurations

A launch configuration is a template that the Auto Scaling group uses to launch instances. You can create a single launch configuration and use it with multiple Auto Scaling groups; however, you can associate only one launch configuration with a given Auto Scaling group at a time. What does a launch configuration contain? To start with, it contains the ID of the AMI that Auto Scaling uses to launch the instances in the Auto Scaling group. It also contains additional information about your instances, such as the instance type, the security groups they should be associated with, block device mappings, key pairs, and so on. An important thing to note here is that once you create a launch configuration, there is no way to edit it again. The only way to make changes is to create a new launch configuration in its place and associate that with the Auto Scaling group.
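A minimal launch configuration can be sketched with the AWS CLI as follows. The AMI ID, key pair name, and security group shown here are placeholder values you would replace with your own:

```shell
# Create a hypothetical launch configuration: the AMI ID, key pair,
# and security group below are placeholders, not real resources.
aws autoscaling create-launch-configuration \
    --launch-configuration-name my-launch-config \
    --image-id ami-0abcd1234example \
    --instance-type t2.micro \
    --key-name my-key-pair \
    --security-groups sg-0123456789abcdef0
```

Because launch configurations are immutable, changing (say) the instance type means running this command again with a new `--launch-configuration-name` and then pointing the Auto Scaling group at the new configuration.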

Scaling plans

With your launch configuration created, the final step is to create one or more scaling plans. Scaling plans describe how the Auto Scaling group should actually scale. There are three scaling mechanisms you can use with your Auto Scaling groups, each described as follows:

  • Manual scaling: Manual scaling is by far the simplest way of scaling your resources. All you need to do is specify a new desired number of instances, or change the minimum or maximum number of instances in an Auto Scaling group, and the rest is taken care of by the Auto Scaling service itself.
  • Scheduled scaling: Scheduled scaling is really helpful when it comes to scaling resources at a particular time and date. This method of scaling is useful when the application's load patterns are highly predictable, and thus you know exactly when to scale up or down. For example, an application that processes a company's payroll is usually load intensive toward the end of each month, so you can schedule the scaling requirements accordingly.
  • Dynamic scaling: Dynamic scaling, or scaling on demand, is used when your application's load is not predictable. With dynamic scaling, you provide a set of scaling policies based on some criteria; for example, scale out the instances in my Auto Scaling group by 10 when the average CPU utilization exceeds 75 percent for a period of 5 minutes. Sounds familiar, right? That's because these dynamic scaling policies rely on Amazon CloudWatch to trigger scaling events. CloudWatch monitors the policy conditions and triggers the auto scaling events when the specified thresholds are breached. In this case, you will require a minimum of two scaling policies: one for scaling in (terminating instances) and one for scaling out (launching instances).
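The three mechanisms above can be sketched with the AWS CLI as follows. The group name, schedule, and adjustment values are all hypothetical placeholders used purely for illustration:

```shell
# Manual scaling: directly set a new desired capacity on the group.
aws autoscaling set-desired-capacity \
    --auto-scaling-group-name my-web-asg \
    --desired-capacity 4

# Scheduled scaling: scale out ahead of a known month-end peak
# (the cron expression here is a placeholder: 18:00 UTC on the 28th).
aws autoscaling put-scheduled-update-group-action \
    --auto-scaling-group-name my-web-asg \
    --scheduled-action-name month-end-peak \
    --recurrence "0 18 28 * *" \
    --desired-capacity 6

# Dynamic scaling: a simple scale-out policy that adds two instances;
# a CloudWatch alarm on average CPU utilization would trigger it.
aws autoscaling put-scaling-policy \
    --auto-scaling-group-name my-web-asg \
    --policy-name scale-out-on-cpu \
    --scaling-adjustment 2 \
    --adjustment-type ChangeInCapacity
```

A matching scale-in policy (with a negative `--scaling-adjustment`) and a second CloudWatch alarm would complete the pair of policies mentioned above.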

Before we go ahead and create our first Auto Scaling activity, we need to understand one additional AWS service that will help us balance and distribute the incoming traffic across our auto scaled EC2 instances. Enter the Elastic Load Balancer!
