Chapter 14. Scaling up and down: auto-scaling and CloudWatch

This chapter covers

  • Creating an auto-scaling group with a launch configuration
  • Using auto-scaling to adapt the number of virtual servers
  • Scaling a synchronous decoupled app behind an ELB
  • Scaling an asynchronous decoupled app using SQS
  • Using CloudWatch alarms to modify an auto-scaling group

Suppose you’re organizing a party to celebrate your birthday. How much food and drink do you need to buy? Calculating the right numbers for your shopping list is difficult:

  • How many people will actually attend? You received several confirmations, but some guests will cancel at short notice or show up unannounced, so the number of guests is uncertain.
  • How much will your guests eat and drink? Will it be a hot day, with everybody drinking a lot? Will your guests be hungry? You need to guess the demand for food and drink based on experiences from previous parties.

Solving the equation is a challenge because there are many unknown factors. Being a good host, you'll order more food and drink than needed so you have a solid buffer, and no guest will go hungry or thirsty for long.

Planning to meet future demand is nearly impossible. To prevent a supply gap, you need to add extra capacity on top of the planned demand.

The same was true when we planned the capacity of our IT infrastructure. When procuring hardware for a data center, we always had to buy hardware based on the demands of the future. There were many uncertainties when making these decisions:

  • How many users would need to be served by the infrastructure?
  • How much storage would the users need?
  • How much computing power would be required to handle their requests?

To avoid supply gaps, we had to order more or faster hardware than needed, causing unnecessary expenses.

On AWS, you can use services on demand, so capacity planning becomes less and less important. You can scale from one server to thousands of servers, and storage can grow from gigabytes to petabytes. Scaling on demand replaces capacity planning; AWS calls this ability elasticity.

Public cloud providers like AWS can offer needed capacity with a short waiting time. AWS serves a million customers, and at that scale, it isn't a problem to provide you with 100 additional virtual servers within minutes if you need them suddenly. This allows you to address another problem: typical traffic patterns, as shown in figure 14.1. Think about the load on your infrastructure during the day versus at night, on a weekday versus the weekend, or before Christmas versus the rest of the year. Wouldn't it be nice if you could add capacity when traffic grows and remove capacity when traffic shrinks? In this chapter, you'll learn how to scale the number of virtual servers based on current load.

Figure 14.1. Typical traffic patterns for a web shop

Scaling the number of virtual servers is possible with auto-scaling groups and scaling policies on AWS. Auto-scaling is part of the EC2 service and helps you scale the number of EC2 instances needed to handle the current load on your system. We introduced auto-scaling groups in chapter 11 to ensure that a single virtual server kept running even if an entire data center suffered an outage. In this chapter, you'll learn how to use a dynamic server pool:

  • Using auto-scaling groups to launch multiple virtual servers of the same kind
  • Changing the number of virtual servers based on CPU load with the help of CloudWatch
  • Changing the number of virtual servers based on a schedule, to be able to adapt to recurring traffic patterns
  • Using a load balancer as an entry point to the dynamic server pool
  • Using a queue to decouple the jobs from the dynamic server pool
Examples are 100% covered by the Free Tier

The examples in this chapter are completely covered by the Free Tier. As long as you don't run the examples for longer than a few days, you won't pay anything. Keep in mind that this applies only if you created a fresh AWS account for this book and nothing else is going on in that account. Try to complete the examples in this chapter within a few days; you'll clean up your account at the end of each example.

There are two prerequisites for being able to scale your application horizontally, which means increasing and decreasing the number of virtual servers based on the current workload:

  • The servers you want to scale need to be stateless. You can achieve stateless servers by storing data with the help of a service like RDS (SQL database), DynamoDB (NoSQL database), or S3 (object store) instead of storing data on local or network-attached disks that are only available to a single server.
  • An entry point to the dynamic server pool is needed to be able to distribute the workload across multiple servers. Servers can be decoupled synchronously with a load balancer or asynchronously with a queue.

We introduced the concept of stateless servers in part 3 of this book and explained decoupling in chapter 12. In this chapter, you'll return to stateless servers and work through examples of both synchronous and asynchronous decoupling.

14.1. Managing a dynamic server pool

Imagine that you need to provide a scalable infrastructure to run a web application, such as a blogging platform. You need to launch uniform virtual servers when the number of requests grows and terminate virtual servers when the number of requests shrinks. To adapt to the current workload in an automated way, you need to be able to launch and terminate virtual servers automatically. The configuration and deployment of the blogging platform needs to be done during bootstrapping, without human interaction.

AWS offers a service to manage such a dynamic server pool, called auto-scaling groups. Auto-scaling groups help you to

  • Run a desired number of virtual servers that can be adjusted dynamically
  • Launch, configure, and deploy uniform virtual servers

As figure 14.2 shows, auto-scaling consists of three parts:

Figure 14.2. Auto-scaling consists of an auto-scaling group and a launch configuration, launching and terminating uniform virtual servers.

  • A launch configuration that defines the size, image, and configuration of virtual servers
  • An auto-scaling group that specifies how many virtual servers need to be running based on the launch configuration
  • Scaling policies that adjust the desired number of servers in the auto-scaling group

Because the auto-scaling group references a launch configuration, you need to create a launch configuration before you can create an auto-scaling group. If you use a template, as you will in this chapter, this dependency will be resolved by CloudFormation automatically.

If you want multiple servers to handle a workload, it’s important to start identical virtual servers to build a homogeneous foundation. You use a launch configuration to define and configure new virtual servers. Table 14.1 shows the most important parameters for a launch configuration.

Table 14.1. Launch configuration parameters

Name | Description | Possible values
ImageId | Image from which to start a virtual server | ID of an Amazon Machine Image (AMI)
InstanceType | Size for new virtual servers | Instance type (such as t2.micro)
UserData | User data for the virtual server, used to execute a script during bootstrapping | BASE64-encoded string
KeyName | Name of the SSH key pair | Name of an EC2 key pair
AssociatePublicIpAddress | Associates a public IP address with the virtual server | True or false
SecurityGroups | Attaches security groups to new virtual servers | List of security group names
IamInstanceProfile | Attaches an IAM instance profile linked to an IAM role | Name or Amazon Resource Name (ARN) of an IAM instance profile
SpotPrice | Requests spot instances instead of on-demand instances, bidding up to this maximum price | Maximum price for the spot instance per hour (such as 0.10)
EbsOptimized | Enables EBS optimization for the EC2 instance, offering dedicated throughput to EBS volumes | True or false
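In a CloudFormation template, these parameters map onto an AWS::AutoScaling::LaunchConfiguration resource. The following minimal sketch uses placeholder values throughout: the AMI ID, key pair name, and the WebServerSecurityGroup resource are assumptions, not part of the book's template.

```json
"LaunchConfiguration": {
  "Type": "AWS::AutoScaling::LaunchConfiguration",
  "Properties": {
    "ImageId": "ami-1ecae776",
    "InstanceType": "t2.micro",
    "KeyName": "mykey",
    "AssociatePublicIpAddress": true,
    "SecurityGroups": [{"Ref": "WebServerSecurityGroup"}],
    "UserData": {"Fn::Base64": {"Fn::Join": ["", [
      "#!/bin/bash -ex\n",
      "yum install -y httpd24\n"
    ]]}}
  }
}
```

Every server started from this launch configuration is identical, which is exactly what a homogeneous dynamic server pool requires.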

After you create a launch configuration, you can create an auto-scaling group referencing it. The auto-scaling group defines the maximum, minimum, and desired number of virtual servers. Desired means this number of servers should be running. If the current number of servers is below the desired number, the auto-scaling group will add servers. If the current number of servers is above the desired number, servers will be terminated.

The auto-scaling group also monitors whether EC2 instances are healthy and replaces broken instances. Table 14.2 shows the most important parameters for an auto-scaling group.

Table 14.2. Auto-scaling group parameters

Name | Description | Possible values
DesiredCapacity | Desired number of healthy virtual servers | Integer
MaxSize | Maximum number of virtual servers; upper scaling limit | Integer
MinSize | Minimum number of virtual servers; lower scaling limit | Integer
Cooldown | Minimum time span between two scaling actions | Number of seconds
HealthCheckType | How the auto-scaling group checks the health of virtual servers | EC2 (health of the instance) or ELB (health check of the instance performed by a load balancer)
HealthCheckGracePeriod | Period during which the health check is paused after the launch of a new instance, to wait until the instance is fully bootstrapped | Number of seconds
LaunchConfigurationName | Name of the launch configuration used to start new virtual servers | Name of a launch configuration
LoadBalancerNames | Load balancers at which auto-scaling registers new instances automatically | List of load-balancer names
TerminationPolicies | Policies used to determine which instance is terminated first | OldestInstance, NewestInstance, OldestLaunchConfiguration, ClosestToNextInstanceHour, or Default
VPCZoneIdentifier | Subnets in which to launch EC2 instances | List of subnet identifiers of a VPC
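These parameters map onto an AWS::AutoScaling::AutoScalingGroup resource. A minimal sketch follows; the LaunchConfiguration, SubnetA, and SubnetB resource names are assumptions for illustration:

```json
"AutoScalingGroup": {
  "Type": "AWS::AutoScaling::AutoScalingGroup",
  "Properties": {
    "LaunchConfigurationName": {"Ref": "LaunchConfiguration"},
    "MinSize": "2",
    "MaxSize": "4",
    "DesiredCapacity": "2",
    "Cooldown": "60",
    "HealthCheckType": "EC2",
    "HealthCheckGracePeriod": "120",
    "VPCZoneIdentifier": [{"Ref": "SubnetA"}, {"Ref": "SubnetB"}]
  }
}
```

Because two subnets in different availability zones are listed, the group spreads its instances evenly across both zones.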

If you specify multiple subnets with the help of VPCZoneIdentifier for the auto-scaling group, EC2 instances will be evenly distributed among these subnets and thus among availability zones.

Avoid unnecessary scaling with a cooldown and a grace period

Be sure to define reasonable Cooldown and HealthCheckGracePeriod values. The tendency is to specify short Cooldown and HealthCheckGracePeriod periods. But if your Cooldown period is too short, you’ll scale up and down too early. If your HealthCheckGracePeriod is too short, the auto-scaling group will launch a new instance because the previous instance isn’t bootstrapped quickly enough. Both will launch unnecessary instances and cause unnecessary expense.

You can’t edit a launch configuration. If you need to make changes to a launch configuration, follow these steps:

1.  Create a new launch configuration.

2.  Edit the auto-scaling group, and reference the new launch configuration.

3.  Delete the old launch configuration.
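With the CLI, the same three steps might look like the following sketch. The configuration names webapp-v1 and webapp-v2, the group name webapp, and the AMI ID are placeholders:

```shell
# Step 1: create a replacement launch configuration
$ aws autoscaling create-launch-configuration \
  --launch-configuration-name webapp-v2 \
  --image-id ami-1ecae776 --instance-type t2.micro

# Step 2: point the auto-scaling group at the new configuration
$ aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name webapp \
  --launch-configuration-name webapp-v2

# Step 3: remove the old, now-unused configuration
$ aws autoscaling delete-launch-configuration \
  --launch-configuration-name webapp-v1
```

Note that running instances keep the old configuration; only newly launched instances use webapp-v2.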

Fortunately, CloudFormation does this for you when you make changes to a launch configuration in a template. The following listing shows how to set up such a dynamic server pool with the help of a CloudFormation template.

Listing 14.1. Auto-scaling for a web app with multiple EC2 instances

Auto-scaling groups are a useful tool if you need to start multiple virtual servers of the same kind across multiple availability zones.

14.2. Using metrics and schedules to trigger scaling

So far in this chapter, you’ve learned how to use an auto-scaling group and a launch configuration to launch virtual servers. You can change the desired capacity of the auto-scaling group manually, and new instances will be started or old instances will be terminated to reach the new desired capacity.

To provide a scalable infrastructure for a blogging platform, you need to increase and decrease the number of virtual servers in the dynamic server pool automatically by adjusting the desired capacity of the auto-scaling group with scaling policies.

Many people surf the web during their lunch break, so you might need to add virtual servers every day between 11:00 AM and 1:00 PM. You also need to adapt to unpredictable load patterns—for example, if articles hosted on your blogging platform are shared frequently through social networks.

Figure 14.3 illustrates two different ways of changing the number of virtual servers:

Figure 14.3. Triggering auto-scaling based on CloudWatch alarms or schedules

  • Using a CloudWatch alarm to increase or decrease the number of virtual servers based on a metric (such as CPU usage or number of requests on the load balancer)
  • Defining a schedule to increase or decrease the number of virtual servers according to recurring load patterns (such as decreasing the number of virtual servers at night)

Scaling based on a schedule is less complex than scaling based on a CloudWatch metric because it’s difficult to find a metric to scale on reliably. On the other hand, scaling based on a schedule is less precise.

14.2.1. Scaling based on a schedule

When operating a blogging platform, you might notice recurring load patterns:

  • Many people seem to read articles during their lunch break, between 11:00 AM and 1:00 PM.
  • Requests to your registration page increase heavily after you run a TV advertisement in the evening.

You can react to patterns in the utilization of your system with different types of scheduled scaling actions:

  • One-time-only actions, created using the start-time parameter
  • Recurring actions, created using the recurrence parameter

You can create both types of scheduled scaling actions with the help of the CLI. The command shown in the next listing creates a scheduled scaling action that sets the desired capacity of the auto-scaling group called webapp to 4 on January 1, 2016 at 12:00 UTC. Don't try to run this command now; you haven't yet created an auto-scaling group named webapp to play with.

Listing 14.2. Scheduling a one-time scaling action
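A sketch of such a command, using the put-scheduled-update-group-action subcommand (the scheduled-action name is a placeholder):

```shell
$ aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name webapp \
  --scheduled-action-name scale-up-once \
  --start-time 2016-01-01T12:00:00Z \
  --desired-capacity 4
```

Because only start-time is given, the action fires once and never again.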

You can also schedule recurring scaling actions using cron syntax. The next listing sets the desired capacity of an auto-scaling group to 2 every day at 20:00 UTC. Again, don't try to run this command now; you haven't yet created an auto-scaling group named webapp.

Listing 14.3. Scheduling a recurring scaling action that runs at 20:00 UTC every day
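A sketch of the recurring variant; the recurrence parameter takes a cron expression, and the scheduled-action name is a placeholder:

```shell
$ aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name webapp \
  --scheduled-action-name scale-down-nightly \
  --recurrence "0 20 * * *" \
  --desired-capacity 2
```

The cron expression "0 20 * * *" means minute 0, hour 20, every day of every month.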

Recurrence is defined in Unix cron syntax format as shown here:

* * * * *
| | | | |
| | | | +- day of week (0 - 6) (0 Sunday)
| | | +--- month (1 - 12)
| | +----- day of month (1 - 31)
| +------- hour (0 - 23)
+--------- min (0 - 59)

You could add another scheduled recurring scaling action to add capacity in the morning that you removed during the night. Use scheduled scaling actions whenever the load on your infrastructure is predictable. For example, internal systems may be mostly needed during work hours, or marketing actions may go live at a certain time.

14.2.2. Scaling based on CloudWatch metrics

Predicting the future is a hard task. Traffic will increase or decrease beyond known patterns from time to time. For example, if an article published on your blogging platform is heavily shared through social media, you need to be able to react to unplanned load changes and scale the number of servers.

You can adapt the number of EC2 instances to handle the current workload with the help of CloudWatch and scaling policies. CloudWatch helps monitor virtual servers and other services on AWS. Typically, a service publishes usage metrics to CloudWatch, helping you to evaluate the available capacity. To trigger scaling based on the current workload, you use metrics, alarms, and scaling policies. Figure 14.4 illustrates.

Figure 14.4. Triggering auto-scaling based on a CloudWatch metric and alarm

An EC2 instance publishes several metrics to CloudWatch by default: CPU, network, and disk utilization are the most important. Unfortunately, there is currently no metric for a virtual server’s memory usage. You can use these metrics to scale the number of virtual servers if a bottleneck is reached. For example, you can add servers if the CPU is working to capacity.

The following parameters describe a CloudWatch metric:

  • Namespace —Defines the source of the metric (such as AWS/EC2)
  • Dimensions —Defines the scope of the metric (such as all virtual servers belonging to an auto-scaling group)
  • MetricName —Unique name of the metric (such as CPUUtilization)

A CloudWatch alarm is based on a CloudWatch metric. Table 14.3 explains the alarm parameters in detail.

Table 14.3. Parameters for a CloudWatch alarm that triggers scaling based on CPU utilization of all virtual servers belonging to an auto-scaling group

Context | Name | Description | Possible values
Condition | Statistic | Statistical function applied to the metric | Average, Sum, Minimum, Maximum, SampleCount
Condition | Period | Defines a time-based slice of values from the metric | Seconds (multiple of 60)
Condition | EvaluationPeriods | Number of periods to evaluate when checking for an alarm | Integer
Condition | Threshold | Threshold for the alarm | Number
Condition | ComparisonOperator | Operator comparing the result of the statistical function against the threshold | GreaterThanOrEqualToThreshold, GreaterThanThreshold, LessThanThreshold, LessThanOrEqualToThreshold
Metric | Namespace | Source of the metric | AWS/EC2 for metrics from the EC2 service
Metric | Dimensions | Scope of the metric | Depends on the metric; references the auto-scaling group for a metric aggregated over all associated servers
Metric | MetricName | Name of the metric | For example, CPUUtilization
Action | AlarmActions | Actions to trigger when the threshold is reached | Reference to a scaling policy

The following listing creates an alarm that increases the number of virtual servers with the help of auto-scaling if the average CPU utilization of all virtual servers belonging to the auto-scaling group exceeds 80%.

Listing 14.4. CloudWatch alarm based on CPU load of an auto-scaling group
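Such an alarm can be sketched in CloudFormation as follows. The AutoScalingGroup and ScalingUpPolicy resource names are assumptions; three 60-second periods above 80% average CPU utilization trigger the scaling action:

```json
"CPUHighAlarm": {
  "Type": "AWS::CloudWatch::Alarm",
  "Properties": {
    "Namespace": "AWS/EC2",
    "MetricName": "CPUUtilization",
    "Dimensions": [{
      "Name": "AutoScalingGroupName",
      "Value": {"Ref": "AutoScalingGroup"}
    }],
    "Statistic": "Average",
    "Period": "60",
    "EvaluationPeriods": "3",
    "Threshold": "80",
    "ComparisonOperator": "GreaterThanThreshold",
    "AlarmActions": [{"Ref": "ScalingUpPolicy"}]
  }
}
```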

If the threshold is reached, the CloudWatch alarm triggers an action. To connect the alarm with the auto-scaling group, you need a scaling policy. A scaling policy defines the scaling action executed by the CloudWatch alarm.

Listing 14.5 creates a scaling policy with CloudFormation. The scaling policy is linked to an auto-scaling group. There are three different options to adjust the desired capacity of an auto-scaling group:

  • ChangeInCapacity —Increases or decreases the number of servers by an absolute number
  • PercentChangeInCapacity —Increases or decreases the number of servers by a percentage
  • ExactCapacity —Sets the desired capacity to a specified number
Listing 14.5. Scaling policy that will add one server when triggered
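A sketch of such a scaling policy in CloudFormation; the AutoScalingGroup resource name is an assumption. ChangeInCapacity with an adjustment of 1 adds one server each time the linked alarm fires:

```json
"ScalingUpPolicy": {
  "Type": "AWS::AutoScaling::ScalingPolicy",
  "Properties": {
    "AdjustmentType": "ChangeInCapacity",
    "ScalingAdjustment": "1",
    "Cooldown": "60",
    "AutoScalingGroupName": {"Ref": "AutoScalingGroup"}
  }
}
```

A second policy with a ScalingAdjustment of -1, triggered by a low-CPU alarm, would scale back down.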

You can define alarms on many different metrics. You’ll find an overview of all namespaces, dimensions, and metrics that AWS offers at http://mng.bz/8E0X. You can also publish custom metrics—for example, metrics directly from your application like thread pool usage, processing times, or user sessions.

Scaling based on CPU load with virtual servers offering burstable performance

Some virtual servers, such as instance family t2, offer burstable performance. These virtual servers offer a baseline CPU performance and can burst performance for a short time based on credits. If all credits are spent, the instance operates at the baseline. For a t2.micro instance, baseline performance is 10% of the performance of the underlying physical CPU.

Using virtual servers with burstable performance can help you react to load spikes. You save credits in times of low load and spend credits to burst performance in times of high load. But scaling the number of virtual servers with burstable performance based on CPU load is tricky because your scaling strategy must take into account whether your instances have enough credits to burst performance. Consider searching for another metric to scale (such as number of sessions) or using an instance type without burstable performance.

Many times it’s a good idea to scale up faster than you scale down. Consider adding two servers instead of one every 5 minutes but only scaling down one server every 10 minutes. Also, test your scaling policies by simulating real-world traffic. For example, replay an access log as fast as your servers can handle the requests. But keep in mind that servers need some time to start; don’t expect that auto-scaling can double your capacity within a few seconds.

You’ve learned how to use auto-scaling to adapt the number of virtual servers to the workload. Time to bring this into action.

14.3. Decoupling your dynamic server pool

If you need to scale the number of virtual servers running your blogging platform based on demand, auto-scaling groups can help you provide the needed number of uniform virtual servers, and a scaling schedule or CloudWatch alarms will increase or decrease the desired number of servers automatically. But how can users reach the servers in the dynamic server pool to browse the hosted articles? Where should the HTTP request be routed?

Chapter 12 introduced the concept of decoupling: synchronous decoupling with the help of ELB, and asynchronous decoupling with the help of SQS. Decoupling allows you to route requests or messages to one of multiple servers; sending requests to a single fixed server is no longer possible in a dynamic server pool. If you want to use auto-scaling to grow and shrink the number of virtual servers, you need to decouple your servers, because the interface that's reachable from outside the system must stay the same no matter how many servers are working behind the load balancer or message queue. Figure 14.5 shows how to build a scalable system based on synchronous or asynchronous decoupling.

Figure 14.5. Decoupling allows you to scale the number of virtual servers dynamically.

A decoupled and scalable application requires stateless servers. A stateless server stores any shared data remotely in a database or storage system. The following two examples implement the concept of a stateless server:

  • WordPress blog —Decoupled with ELB, scaled with auto-scaling and CloudWatch based on CPU utilization, and data outsourced to RDS and S3
  • URL2PNG taking screenshots of URLs —Decoupled with SQS (queue), scaled with auto-scaling and CloudWatch based on queue length, data outsourced to DynamoDB and S3

14.3.1. Scaling a dynamic server pool synchronously decoupled by a load balancer

Answering HTTP(S) requests is a synchronous task. If a user wants to use your web application, the web server has to answer the corresponding requests immediately. When using a dynamic server pool to run a web application, it’s common to use a load balancer to decouple the servers from user requests. A load balancer forwards HTTP(S) requests to multiple servers, acting as a single entry point to the dynamic server pool.

Suppose your company is using a corporate blog to publish announcements and interact with the community. You’re responsible for the hosting of the blog. The marketing department complains about page speed in the evening, when traffic reaches its daily peak. You want to use the elasticity of AWS by scaling the number of servers based on the current workload.

Your company uses the popular blogging platform WordPress for its corporate blog. Chapters 2 and 9 introduced a WordPress setup based on EC2 instances and RDS (MySQL database). In this last chapter of the book, we’ll complete the example by adding the ability to scale.

Figure 14.6 shows the final, extended WordPress example. The following services are used for this highly available scaling architecture:

Figure 14.6. WordPress running on multiple virtual servers, storing data on RDS but media files on the disks of virtual servers

  • EC2 instances running Apache to serve WordPress, a PHP application
  • RDS offering a MySQL database that’s highly available through Multi-AZ deployment
  • S3 to store media files such as images and videos, integrated with a WordPress plug-in
  • ELB to synchronously decouple the web servers from visitors
  • Auto-scaling and CloudWatch to scale the number of web servers based on the current CPU load of all running virtual servers

So far, the WordPress example can’t scale based on current load and contains a pitfall: WordPress stores uploaded media files in the local file system as shown in figure 14.6. As a result, the server isn’t stateless. If you upload an image for a blog post, it’s only available on a single server.

This is a problem if you want to run multiple servers to handle the load. Other servers won't be able to serve the uploaded image and will deliver a 404 (Not Found) error. To fix that, you'll install a WordPress plug-in called amazon-s3-and-cloudfront that stores and delivers media files with the help of S3. You're outsourcing the state of the server, as you did with the MySQL database running on RDS. Figure 14.7 shows the improved version of the WordPress setup.

Figure 14.7. Auto-scaling web servers running WordPress, storing data on RDS and S3, decoupled, with a load balancer scaling based on load

As usual, you’ll find the code in the book’s code repository on GitHub: https://github.com/AWSinAction/code. The CloudFormation template for the WordPress example is located in /chapter14/wordpress.json.

Execute the following command to create a CloudFormation stack that spins up the scalable WordPress setup. Replace $BlogID with a unique ID for your blog (such as awsinaction-andreas), $AdminPassword with a random password, and $AdminEMail with your e-mail address:

$ aws cloudformation create-stack --stack-name wordpress \
  --template-url https://s3.amazonaws.com/awsinaction/chapter14/wordpress.json \
  --parameters ParameterKey=BlogID,ParameterValue=$BlogID \
  ParameterKey=AdminPassword,ParameterValue=$AdminPassword \
  ParameterKey=AdminEMail,ParameterValue=$AdminEMail \
  --capabilities CAPABILITY_IAM

It will take up to 10 minutes for the stack to be created. This is a perfect time to grab some coffee or tea. Log in to the AWS Management Console, and navigate to the AWS CloudFormation service to monitor the progress of the CloudFormation stack named wordpress. Meanwhile, you have time to look through the most important parts of the CloudFormation template, shown in the following two listings.

Listing 14.6. Scalable and highly available WordPress setup (part 1 of 2)
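The heart of the template is an auto-scaling group wired to the load balancer. A sketch of that part, with assumed resource names (LaunchConfiguration, LoadBalancer, SubnetA, SubnetB):

```json
"AutoScalingGroup": {
  "Type": "AWS::AutoScaling::AutoScalingGroup",
  "Properties": {
    "LaunchConfigurationName": {"Ref": "LaunchConfiguration"},
    "LoadBalancerNames": [{"Ref": "LoadBalancer"}],
    "HealthCheckType": "ELB",
    "HealthCheckGracePeriod": "600",
    "MinSize": "2",
    "MaxSize": "4",
    "DesiredCapacity": "2",
    "VPCZoneIdentifier": [{"Ref": "SubnetA"}, {"Ref": "SubnetB"}]
  }
}
```

With HealthCheckType set to ELB, an instance that fails the load balancer's health check is terminated and replaced automatically.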

The scaling policies and CloudWatch alarms follow in the next listing.

Listing 14.7. Scalable and highly available WordPress setup (part 2 of 2)

Follow these steps after the CloudFormation stack reaches the state CREATE_COMPLETE to create a new blog post containing an image:

1.  Select the CloudFormation stack wordpress and switch to the Outputs tab.

2.  Open the link shown for key URL with a modern web browser.

3.  Search for the Log In link in the navigation bar and click it.

4.  Log in with username admin and the password you specified when creating the stack with the CLI.

5.  Click Posts in the menu at left.

6.  Click Add New.

7.  Type in a title and text, and upload an image to your post.

8.  Click Publish.

9.  Move back to the blog by entering the URL from step 1 again.

Now you’re ready to scale. We’ve prepared a load test that will send 10,000 requests to the WordPress setup in a short amount of time. New virtual servers will be launched to handle the load. After a few minutes, when the load test is finished, the additional virtual servers will disappear. Watching this is fun; you shouldn’t miss it.

Note

If you plan to do a big load test, consider the AWS Acceptable Use Policy at https://aws.amazon.com/aup and ask for permission before you begin (see also https://aws.amazon.com/security/penetration-testing).

Simple HTTP load test

We’re using a tool called Apache Bench to perform a load test of the WordPress setup. The tool is part of the httpd-tools package available from the Amazon Linux package repositories.

Apache Bench is a basic benchmarking tool. You can send a specified number of HTTP requests by using a specified number of threads. We’re using the following command for the load test, to send 10,000 requests to the load balancer using two threads. $UrlLoadBalancer is replaced by the URL of the load balancer:

$ ab -n 10000 -c 2 $UrlLoadBalancer

Update the CloudFormation stack with the following command to start the load test:

$ aws cloudformation update-stack --stack-name wordpress \
  --template-url https://s3.amazonaws.com/awsinaction/chapter14/wordpress-loadtest.json \
  --parameters ParameterKey=BlogID,UsePreviousValue=true \
  ParameterKey=AdminPassword,UsePreviousValue=true \
  ParameterKey=AdminEMail,UsePreviousValue=true \
  --capabilities CAPABILITY_IAM

Watch for the following things to happen, with the help of the AWS Management Console:

1.  Open the CloudWatch service, and click Alarms at left.

2.  When the load test starts, the alarm called wordpress-CPUHighAlarm-* will reach the ALARM state after a few minutes.

3.  Open the EC2 service and list all EC2 instances. Watch for two additional instances to launch. At the end, you’ll see five instances total (four web servers and the server running the load test).

4.  Go back to the CloudWatch service and wait until the alarm named wordpress-CPULowAlarm-* reaches the ALARM state.

5.  Open the EC2 service and list all EC2 instances. Watch for the two additional instances to disappear. At the end, you’ll see three instances total (two web servers and the server running the load test).

The entire process will take about 20 minutes.

You’ve watched auto-scaling in action: your WordPress setup can now adapt to the current workload. The problem with pages loading slowly in the evening is solved.

Cleaning up

Execute the following commands to delete all resources corresponding to the WordPress setup, remembering to replace $BlogID:

$ aws s3 rb s3://$BlogID --force
$ aws cloudformation delete-stack --stack-name wordpress

14.3.2. Scaling a dynamic server pool asynchronously decoupled by a queue

Decoupling a dynamic server pool in an asynchronous way offers an advantage if you want to scale based on your workload: because requests don’t need to be answered immediately, you can put requests into a queue and scale the number of servers based on the length of the queue. This gives you a very accurate metric to scale, and no requests will be lost during a load peak because they’re stored in a queue.

Imagine that you’re developing a social bookmark service where users can save and share their bookmarks. Offering a preview that shows the website behind a link is an important feature. But the conversion from URL to PNG is slow during the evening when most users add new bookmarks to your service. Customers are dissatisfied that previews don’t show up immediately.

To handle the peak load in the evening, you want to use auto-scaling. To do so, you need to decouple the creation of a new bookmark and the process of generating a preview of the website. Chapter 12 introduced an application called URL2PNG that transforms a URL into a PNG image. Figure 14.8 shows the architecture, which consists of an SQS queue for asynchronously decoupling and S3 to store generated images. Creating a bookmark will trigger the following process:

Figure 14.8. Auto-scaling virtual servers that convert URLs into images, decoupled by an SQS queue

1.  A message is sent to an SQS queue containing the URL and the unique ID of the new bookmark.

2.  EC2 instances running a Node.js application poll the SQS queue.

3.  The Node.js application loads the URL and creates a screenshot.

4.  The screenshot is uploaded to an S3 bucket, and the object key is set to the unique ID.

5.  Users can download the screenshot of the website directly from S3 with the help of the unique ID.

A CloudWatch alarm is used to monitor the length of the SQS queue. If the length of the queue reaches the limit of five, a new virtual server is started to handle the workload. If the queue length is less than five, another CloudWatch alarm decreases the desired capacity of the auto-scaling group.

The code is in the book’s code repository on GitHub at https://github.com/AWSinAction/code. The CloudFormation template for the URL2PNG example is located in /chapter14/url2png.json.

Execute the following command to create a CloudFormation stack that spins up the URL2PNG application. Replace $ApplicationID with a unique ID for your application (such as url2png-andreas):

$ aws cloudformation create-stack --stack-name url2png \
--template-url https://s3.amazonaws.com/awsinaction/chapter14/url2png.json \
--parameters ParameterKey=ApplicationID,ParameterValue=$ApplicationID \
--capabilities CAPABILITY_IAM

It will take up to five minutes for the stack to be created. Log in to the AWS Management Console, and navigate to the AWS CloudFormation service to monitor the progress of the CloudFormation stack named url2png.

The CloudFormation template is similar to the template you used to create the synchronously decoupled WordPress setup. The following listing shows the main difference: the CloudWatch alarm monitors the length of the SQS queue instead of CPU usage.

Listing 14.8. Monitoring the length of the SQS queue

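The full listing is in the template on GitHub. As a sketch of what such an alarm looks like in a CloudFormation template, the high-queue alarm might be defined as follows; the resource name and the referenced queue and scaling policy are illustrative assumptions, not necessarily the names used in the template.

```json
{
  "HighQueueAlarm": {
    "Type": "AWS::CloudWatch::Alarm",
    "Properties": {
      "AlarmDescription": "Queue contains too many messages",
      "Namespace": "AWS/SQS",
      "MetricName": "ApproximateNumberOfMessagesVisible",
      "Dimensions": [
        {"Name": "QueueName", "Value": {"Fn::GetAtt": ["SQSQueue", "QueueName"]}}
      ],
      "Statistic": "Sum",
      "Period": 300,
      "EvaluationPeriods": 1,
      "Threshold": 5,
      "ComparisonOperator": "GreaterThanThreshold",
      "AlarmActions": [{"Ref": "ScalingUpPolicy"}]
    }
  }
}
```

The key difference from the WordPress setup is the metric: `ApproximateNumberOfMessagesVisible` from the `AWS/SQS` namespace instead of CPU utilization from `AWS/EC2`.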
You’re ready to scale. We’ve prepared a load test that will quickly generate 250 messages for the URL2PNG application. New virtual servers will be launched to handle the load. After a few minutes, when the load test is finished, the additional virtual servers will disappear.

Update the CloudFormation stack with the following command to start the load test:

$ aws cloudformation update-stack --stack-name url2png \
--template-url https://s3.amazonaws.com/awsinaction/chapter14/url2png-loadtest.json \
--parameters ParameterKey=ApplicationID,UsePreviousValue=true \
--capabilities CAPABILITY_IAM

Watch for the following things to happen, with the help of the AWS Management Console:

1.  Open the CloudWatch service and click Alarms at left.

2.  When the load test starts, the alarm called url2png-HighQueueAlarm-* will reach the ALARM state after a few minutes.

3.  Open the EC2 service and list all EC2 instances. Watch for an additional instance to launch. At the end, you’ll see three instances total (two workers and the server running the load test).

4.  Go back to the CloudWatch service and wait until the alarm named url2png-LowQueueAlarm-* reaches the ALARM state.

5.  Open the EC2 service and list all EC2 instances. Watch for the additional instance to disappear. At the end, you’ll see two instances total (one worker and the server running the load test).

The entire process will take about 15 minutes.

You’ve watched auto-scaling in action. The URL2PNG application can now adapt to the current workload, and the problem with slowly generated screenshots for new bookmarks is solved.

Cleaning up

Execute the following commands to delete all resources corresponding to the URL2PNG setup, remembering to replace $ApplicationID:

$ aws s3 rb s3://$ApplicationID --force
$ aws cloudformation delete-stack --stack-name url2png

14.4. Summary

  • You can use auto-scaling to launch multiple virtual servers the same way by using a launch configuration and an auto-scaling group.
  • EC2, SQS, and other services publish metrics to CloudWatch (CPU utilization, queue length, and so on).
  • A CloudWatch alarm can change the desired capacity of an auto-scaling group. This allows you to increase the number of virtual servers based on CPU utilization or other metrics.
  • Servers need to be stateless if you want to scale them according to your current workload.
  • Synchronous decoupling with the help of a load balancer or asynchronous decoupling with a message queue is necessary in order to distribute load among multiple virtual servers.