CHAPTER SIX

Autoscaling

In Chapter 5, we talked about installing hardware in your on-premises datacenter and deploying it to production. Use of the public cloud obviates the need for this. Further, it obviates the need to maintain the infrastructure and thereby boosts product development agility, which is crucial in an increasingly competitive landscape. For instance, in 2008, Netflix kicked off its migration from on-premises datacenters to Amazon Web Services (AWS), and in February 2016, it announced the completion of the migration of its streaming service to the cloud.1 Migration to the cloud eliminated the cycles spent on hardware procurement and datacenter maintenance, and resulted in higher development agility.

That said, the use of a cloud service at large scale is much more expensive than the use of an in-house datacenter. This calls for techniques that minimize the cost overhead associated with the use of a public cloud without sacrificing its various benefits, such as elasticity. Autoscaling allows capacity to be scaled up automatically when it is needed and scaled back down when it is not. An enterprise can set a minimum required capacity across Availability Zones (AZs)—isolated locations in a given geographic region—to ensure quick accessibility. This helps deliver the best availability and performance for a given cost. For cases in which reserved capacity is purchased, its cost can be amortized by scheduling jobs of noncritical and batch services during off-peak hours.

NOTE

This part of the discussion is based on Arun’s experience at Netflix.

The Challenge

Capitalizing on the elasticity of the cloud efficiently is nontrivial. Specifically, you must be wary of the following:

  • Aggressive scale-down can potentially adversely affect latency and throughput (in the worst case, the service might become unavailable). Higher latency would degrade the experience of the end users. Further, from a corporate standpoint, lower throughput would adversely impact the bottom line (this holds in general for any end-user facing service).

  • Aggressive scale-up can result in overprovisioning, thereby ballooning the footprint on the cloud. Of course, higher operational costs would adversely affect the bottom line.

Figure 6-1 illustrates these caveats. Additionally, efficient exploitation of elasticity of the cloud across multiple applications can contain the overall footprint.

Figure 6-1. Excess versus unserved demand

In the long run, on-demand usage is much more expensive than the use of reserved instances.2 Consequently, it is critical to develop novel techniques to exploit elasticity of the cloud systematically.

For the remainder of this chapter, we discuss the various aspects of autoscaling using Amazon EC2 as a reference public cloud. The key underlying concepts are, however, applicable for autoscaling on any public cloud.

Autoscaling on Amazon EC2

Amazon’s Auto Scaling service lets you launch or terminate EC2 instances automatically (within defined minimum and maximum group sizes) based on user-defined policies, schedules, and health checks. You can use Amazon’s CloudWatch for real-time monitoring of EC2 instances. Metrics such as CPU utilization, latency, and request counts are provided automatically by CloudWatch. Further, you can use CloudWatch to access up-to-the-minute statistics, view graphs, and set alarms (defined here):

Definition 1

An Amazon CloudWatch alarm is an object that watches over a single metric. An alarm can change state depending on the value of the metric. An action is invoked when an alarm changes state and remains in that state for a number of time periods.

You can configure a CloudWatch alarm to send a message to autoscaling whenever a specific metric reaches a threshold value. When the alarm sends the message, autoscaling executes the associated policy on an Auto Scaling group (ASG) to scale the group up or down. Note that an Auto Scaling action is invoked only when the specified metric remains above the threshold value for a number of time periods. This ensures that a scaling action is not triggered by a momentary spike in the metric.

Definition 2

A policy is a set of instructions that tells Auto Scaling how to respond to CloudWatch alarm messages.

Separate policies are instituted for autoscaling up and autoscaling down. The two key parameters associated with an Auto Scaling Policy are the following:

  • ScalingAdjustment: The number of instances by which to scale. AdjustmentType determines the interpretation of this number (e.g., as an absolute number or as a percentage of the existing ASG size). A positive increment adds to the current capacity and a negative value removes from the current capacity.

  • AdjustmentType: This specifies whether the ScalingAdjustment is an absolute number or a percentage of the current capacity. Valid values are ChangeInCapacity or PercentChangeInCapacity (described later in the chapter).

An autoscaling action, say scale-up, usually takes a while to take effect. In light of this, you can specify a cooldown period (defined momentarily) to ensure that a new autoscaling event is triggered only after the previous autoscaling event has completed and taken effect.

Definition 3

Cooldown is the period of time after autoscaling initiates a scaling activity during which no other scaling activity can take place. A cooldown period allows the effect of a scaling activity to become visible in the metrics that originally triggered the activity. This period is configurable and gives the system time to perform and adjust to any new scaling activities (such as scale-in and scale-out) that affect capacity.
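To make the preceding definitions concrete, here is a minimal sketch, using the Python AWS SDK (boto3), that wires a CloudWatch alarm to a simple scale-up policy with a cooldown. The ASG name, metric, threshold, and periods are illustrative assumptions, not values prescribed in this chapter.

    import boto3

    autoscaling = boto3.client("autoscaling")
    cloudwatch = boto3.client("cloudwatch")

    # Scale-up policy: add 3 instances, then enforce a 5-minute cooldown
    # before another scaling activity can start.
    policy = autoscaling.put_scaling_policy(
        AutoScalingGroupName="my-service-asg",   # hypothetical ASG name
        PolicyName="scale-up-on-high-cpu",
        AdjustmentType="ChangeInCapacity",
        ScalingAdjustment=3,
        Cooldown=300,
    )

    # Alarm: average CPU above 60% for 5 consecutive 1-minute periods
    # invokes the policy defined above.
    cloudwatch.put_metric_alarm(
        AlarmName="my-service-high-cpu",
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "AutoScalingGroupName", "Value": "my-service-asg"}],
        Statistic="Average",
        Period=60,
        EvaluationPeriods=5,
        Threshold=60.0,
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=[policy["PolicyARN"]],
    )

Requiring five consecutive periods above the threshold mirrors the note above: a single momentary spike in the metric does not trigger a scaling action.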

On AWS, autoscaling can also be carried out in a temporal fashion, referred to as scheduled scaling. In particular, scaling based on a schedule allows you to scale an application in response to predictable load changes. For instance, if traffic begins to increase on Friday evening and remains high until Sunday evening, you can schedule scaling activities based on the predictable traffic pattern of the web application. To create a scheduled scaling action, you must specify the start time of the scaling action and the new minimum, maximum, and desired sizes for the scaling action. At the specified time, autoscaling updates the group with the minimum, maximum, and desired sizes specified by the scaling action.
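As a sketch of a scheduled scaling action (again with boto3; the group name, sizes, and recurrence are illustrative assumptions), the following raises a group's bounds every Friday evening; a companion action would restore the weekday sizes on Sunday evening.

    import boto3

    autoscaling = boto3.client("autoscaling")

    # Recurring action: every Friday at 18:00 UTC, raise the group's
    # minimum, maximum, and desired sizes ahead of the weekend traffic bump.
    autoscaling.put_scheduled_update_group_action(
        AutoScalingGroupName="my-service-asg",   # hypothetical ASG name
        ScheduledActionName="friday-evening-scale-up",
        Recurrence="0 18 * * 5",                 # cron syntax, evaluated in UTC
        MinSize=12,
        MaxSize=40,
        DesiredCapacity=24,
    )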

At Netflix, we employed scaling by policy, wherein a given cluster was scaled up or down based on the incoming requests per second (RPS) of a given application. We used incoming RPS as the metric to drive autoscaling because it is independent of the application and directly relates to throughput.
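CloudWatch does not know an application's RPS out of the box, so the service must publish it (or a load-balancer metric must be used as a proxy). A minimal sketch of publishing a per-node RPS value as a custom metric with boto3 follows; the namespace, metric name, and dimension are illustrative assumptions.

    import boto3

    cloudwatch = boto3.client("cloudwatch")

    def publish_rps(asg_name, rps_per_node):
        """Publish the per-node request rate as a custom CloudWatch metric."""
        cloudwatch.put_metric_data(
            Namespace="MyService",               # hypothetical namespace
            MetricData=[{
                "MetricName": "RPSPerNode",
                "Dimensions": [{"Name": "AutoScalingGroupName", "Value": asg_name}],
                "Value": rps_per_node,
                "Unit": "Count/Second",
            }],
        )

    publish_rps("my-service-asg", 212.0)

An alarm on this metric can then drive scale-up and scale-down policies in the same way as the CPU-based example shown earlier.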

Design Guidelines

In this section, we detail the various design guidelines underlying the algorithms for autoscaling discussed later in this chapter.

Avoiding the ping-pong effect

During a scale-up event, new nodes are added to a given ASG. As a consequence, the RPS per node drops. If the RPS per node drops below the threshold specified for scaling down the ASG, a scale-down event is triggered. This results in alternating scale-up and scale-down events, as illustrated in Figure 6-4 (a), referred to as the ping-pong effect. At Netflix, we observed that ping-ponging can potentially result in higher latency and, in the worst case, can cause violation of the Service-Level Agreement (SLA) of the service.

Thus, when defining autoscaling policies, it is imperative to ensure that a policy is not susceptible to the ping-pong effect. The desired autoscaling profile is exemplified by Figure 6-4 (b).

Figure 6-4. (a) Illustrating ping-pong effect (b) Desired autoscaling profile (Y-axis corresponds to the number of nodes in the ASG and X-axis corresponds to time)

Be proactive, not reactive

As mentioned earlier, applications such as the recommendation engine at Netflix take a long time to start. This can be ascribed to a variety of reasons; for example, loading of metadata of Netflix subscribers and precomputation of certain features. For such applications, it is critical to trigger the scale-up event in a proactive fashion, not reactively.

Let us consider the scenario shown in Figure 6-5. The solid arrow in the figure corresponds to the need to scale up a given ASG, as mandated by the SLA and increasing traffic. However, owing to the long application startup time, the scale-up is triggered proactively, as signified by the dashed arrow in the figure. The proactive approach ensures that the ASG is sufficiently provisioned by the time the latency approaches the SLA and that the SLA is never violated!

Figure 6-5. Illustration of scaling in a proactive fashion. Solid arrow signifies the need to scale up (as governed by SLA of the application at hand) and the dashed arrow signifies the corresponding autoscaling event (governed by the start up of the application)

Aggressive upwards, conservative downwards

Delivering the best user experience is critical for business. Thus, you might want to employ an aggressive scale-up policy so as to be able to handle a more-than-expected increase in traffic. Also, an aggressive scale-up approach provides a buffer for an increase in traffic during the cooldown period. In contrast, you might want to employ a conservative scale-down policy so as to be able to handle a slower (than the historical trend) ramp-down of traffic. Aggressive scale-down might accidentally result in under-provisioning, thereby adversely affecting latency and throughput.

Scalability Analysis

Determining the threshold for scale-up is an integral step in defining an autoscaling policy. A low threshold will result in under-utilization of the instances in the ASG; conversely, a high threshold can result in higher latency, thereby degrading the user experience. To this end, load testing is carried out to determine the throughput corresponding to the SLA of the application (see Figure 6-6).

Figure 6-6. Trade-off between latency and throughput (load)
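As a minimal sketch of how the SLA-compliant throughput might be read off load-test results, the function below returns the highest per-node RPS whose measured latency still meets the SLA; the sample measurements and the 100 ms SLA are made up for illustration.

    # Load-test results: (RPS per node, measured p99 latency in ms); illustrative values.
    load_test = [(50, 38), (100, 41), (150, 47), (200, 58), (250, 83), (300, 160)]

    def t_sla(results, sla_ms):
        """Maximum RPS per node at which the measured latency still meets the SLA."""
        feasible = [rps for rps, latency in results if latency <= sla_ms]
        if not feasible:
            raise ValueError("no load level satisfies the SLA")
        return max(feasible)

    T = t_sla(load_test, sla_ms=100)   # -> 250 RPS per node
    T_U = 0.90 * T                     # scale-up threshold, as used later in the chapter
    T_D = 0.50 * T_U                   # scale-down threshold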

Properties

Each scale-up event should satisfy the following:

Property 1

RPS per node after scale up should be more than the scale-down threshold (TD).

Property 1 ensures that scale-up does not induce a ping-pong effect. Likewise, each scale-down event should satisfy the following:

Property 2

RPS per node after scale-down should be less than the scale-up threshold (TU).

Akin to Property 1, Property 2 ensures that scale-down does not induce a ping-pong effect.

Autoscaling by Fixed Amount

In this section, we present a technique for scaling an ASG up/down by a fixed number of instances and as per the guidelines laid out earlier. AdjustmentType for the scaling policy is set to ChangeInCapacity, which is defined as follows:

ChangeInCapacity

This AdjustmentType is used to increase or decrease the capacity by a fixed amount on top of the existing capacity. For instance, let’s assume that the capacity of a given ASG is three and that ChangeInCapacity is set to five. When the policy is executed, autoscaling will add five more instances to the ASG.

Algorithm 1 (shown in the following section) details the parameters and the steps to determine the scaling thresholds (for both scaling up and scaling down). The scale-down value D and the scale-up value U are inputs to the algorithm. The constants 0.90 and 0.50 used in defining TU, TD were determined empirically so as to minimize the impact on user experience and contain ASG under-utilization. Loop L1 in Algorithm 1 corresponds to scaling up an ASG as the incoming traffic increases. Loop L2 in Algorithm 1 scales down an ASG as the incoming traffic decreases.

Algorithm 1—autoscaling up/down by a fixed amount

Input

An application with a specified SLA.

Parameters

D Scale down value

U Scale up value

TD Scale down threshold (RPS per node)

TU Scale up threshold (RPS per node)

Nmin Minimum number of nodes in the ASG

Let T(SLA) return the maximum RPS per node for the specified SLA.

TU ← 0.90 × T(SLA)
TD ← 0.50 × TU

Let Nc and RPSn denote the current number of nodes and the RPS per node, respectively.

L1: /* Scale Up (if RPSn > TU) */
repeat
    N*c ← Nc
    Nc ← Nc + U
    RPSn ← RPSn × N*c / Nc
until RPSn > TU

L2: /* Scale Down (if RPSn < TD) */
repeat
    N*c ← Nc
    Nc ← max(Nmin, Nc − D)
    RPSn ← RPSn × N*c / Nc
until RPSn < TD or Nc = Nmin
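A plain Python rendering of one scale-up and one scale-down step of Algorithm 1 is sketched below; the variable names follow the algorithm, and everything else (including the example numbers) is illustrative.

    def scale_up_fixed(n_c, rps_n, u, t_u):
        """One scale-up event: add U nodes when the RPS per node exceeds TU."""
        if rps_n <= t_u:
            return n_c, rps_n              # threshold not breached; no action
        n_old = n_c
        n_c = n_c + u
        rps_n = rps_n * n_old / n_c        # per-node RPS after adding nodes
        return n_c, rps_n

    def scale_down_fixed(n_c, rps_n, d, t_d, n_min):
        """One scale-down event: remove D nodes when the RPS per node falls below TD."""
        if rps_n >= t_d or n_c <= n_min:
            return n_c, rps_n
        n_old = n_c
        n_c = max(n_min, n_c - d)
        rps_n = rps_n * n_old / n_c
        return n_c, rps_n

    # Example in the spirit of Table 6-1: 6 nodes at 1740 RPS total, U = 3, TU = 230
    print(scale_up_fixed(6, 1740 / 6, u=3, t_u=230))   # -> (9, 193.33...)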

Illustration of Algorithm 1

For a better understanding of Algorithm 1, let’s walk through a case study. The parameters of the algorithm are listed before Tables 6-1 and 6-2. Initially, RPSASG = 500 and Nc = 6. As RPSASG increases to 1540, RPSn approaches TU. An autoscaling up-event is triggered, thereby adding 3 (= U) nodes to the ASG. As RPSASG increases subsequently, further autoscaling up-events are triggered. Note that all the entries in the last column (New RPSn) of Table 6-1 satisfy Property 1.

During scale-down, the initial RPSASG = 5000 and Nc = 18. As RPSASG decreases to 3240, RPSn approaches TD. An autoscaling down-event is triggered, thereby removing 2 (= D) nodes from the ASG. Note that all the entries in the last column (New RPSn) of Table 6-2 satisfy Property 2.

Illustration of Algorithm 1 (D = 2, U = 3, TD = 180, TU = 230)

Table 6-1. Scale Up
# Nodes (current) | Nodes added | RPSASG | RPSn   | Total nodes | New RPSn
6                 | 0           | 500    | 83.33  | 6           |
6                 | 3           | 1740   | 290.00 | 9           | 193.33
9                 | 3           | 2610   | 290.00 | 12          | 217.50
12                | 3           | 3480   | 290.00 | 15          | 232.00
15                | 3           | 4350   | 290.00 | 18          | 241.67
18                | 3           | 5220   | 290.00 | 21          | 248.57
Table 6-2. Scale Down
# Nodes (current) | Nodes removed | RPSASG | RPSn   | Total nodes | New RPSn
18                |               | 5000   | 277.78 | 18          |
18                | 2             | 3240   | 180.00 | 16          | 202.50
16                | 2             | 2880   | 180.00 | 14          | 205.71
14                | 2             | 2520   | 180.00 | 12          | 210.00
12                | 2             | 2160   | 180.00 | 10          | 216.00
10                | 2             | 1800   | 180.00 | 8           | 225.00

Scaling by Percentage

In this section, we present a technique for scaling an ASG up or down by a percentage of its current capacity, as per the guidelines laid out earlier. AdjustmentType for the scaling policy is set to PercentChangeInCapacity, which is defined here:

PercentChangeInCapacity

This AdjustmentType is used to increase or decrease the capacity by a percentage of the desired capacity. For instance, let’s assume that an ASG has 15 instances and a scale-up policy of type PercentChangeInCapacity with the adjustment set to 15. When the policy is run, autoscaling will increase the ASG size by two (15 percent of 15 is 2.25, which is rounded down to 2).

Note that if the PercentChangeInCapacity adjustment computes to a value between 0 and 1, autoscaling rounds it up to 1. If it computes to a value greater than 1, autoscaling rounds it down to the nearest integer (for example, 12.7 becomes 12).
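This rounding rule can be captured in a small helper; the sketch below mirrors the rule as just described (for positive, scale-up adjustments) and is not AWS code.

    import math

    def effective_adjustment(current_capacity, percent):
        """Instances implied by a positive PercentChangeInCapacity adjustment."""
        raw = current_capacity * percent / 100.0
        if 0 < raw < 1:
            return 1                   # values between 0 and 1 are rounded up to 1
        return math.floor(raw)         # values greater than 1 are rounded down

    print(effective_adjustment(15, 15))   # 2.25 -> 2 (the example above)
    print(effective_adjustment(6, 10))    # 0.60 -> 1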

Algorithm 2 details the parameters and the steps to determine the scaling thresholds (for both scaling up and scaling down). The scale-down value D and the scale-up value U (note that both are percentages) are inputs to the algorithm. The constants 0.90 and 0.50 used in defining TU and TD were determined empirically so as to minimize the impact on user experience and contain ASG under-utilization. Loop L1 in Algorithm 2 corresponds to scaling up an ASG as the incoming traffic increases. Loop L2 in Algorithm 2 scales down an ASG as the incoming traffic decreases.

Algorithm 2—autoscaling up/down by a percentage of current capacity

Input

An application with a specified SLA.

Parameters

D Scale down percentage value

U Scale up percentage value

TD Scale down threshold (RPS per node)

TU Scale up threshold (RPS per node)

Nmin Minimum number of nodes in the ASG

Let T(SLA) return the maximum RPS per node for the specified SLA.

TU ← 0.90 × T(SLA)
TD ← 0.50 × TU

Let Nc and RPSn denote the current number of nodes and the RPS per node, respectively.

L1: /* Scale Up (if RPSn > TU) */
repeat
    N*c ← Nc
    Nc ← Nc + max(1, Nc × U/100)
    RPSn ← RPSn × N*c / Nc
until RPSn > TU

L2: /* Scale Down (if RPSn < TD) */
repeat
    N*c ← Nc
    Nc ← max(Nmin, Nc − max(1, Nc × D/100))
    RPSn ← RPSn × N*c / Nc
until RPSn < TD or Nc = Nmin

Illustration of Algorithm 2

For a better understanding of Algorithm 2, let’s again walk through a case study. The parameters of the algorithm are mentioned before Tables 6-3 and 6-4. Nmin is set to 1. Initially, RPSASG = 500 and Nc = 6. As RPSASG increases to 1540 and RPSn approaches TU, an autoscaling up-event is triggered, thereby adding 1 (= max(1, 6 × 10/100)) node to the ASG. As RPSASG increases subsequently, further autoscaling up-events are triggered. Note that all the entries in the last column (New RPSn) of Table 6-3 satisfy Property 1.

During scale-down, the initial RPSASG = 5000 and Nc = 18. As RPSASG decreases to 4140, RPSn approaches TD. An autoscaling down-event is triggered, thereby removing 1 (= ⌊max(1, 18 × 8/100)⌋) node from the ASG. Note that all the entries in the last column (New RPSn) of Table 6-4 satisfy Property 2.

Illustration of Algorithm 2 (D = 8, U = 10, Nmin = 1, TD = 230, TU = 290)

Table 6-3. Scale Up
# Nodes (current) | Nodes added | RPSASG | RPSn   | Total nodes | New RPSn
6                 | 0           | 500    | 83.33  | 6           |
6                 | 1           | 1740   | 290.00 | 7           | 248.57
7                 | 1           | 2030   | 290.00 | 8           | 253.75
8                 | 1           | 2320   | 290.00 | 9           | 257.78
9                 | 1           | 2610   | 290.00 | 10          | 261.00
10                | 1           | 2900   | 290.00 | 11          | 263.64
11                | 1           | 3190   | 290.00 | 12          | 265.83
12                | 1           | 3480   | 290.00 | 13          | 267.69
13                | 1           | 3770   | 290.00 | 14          | 269.29
14                | 1           | 4060   | 290.00 | 15          | 270.67
15                | 1           | 4350   | 290.00 | 16          | 271.88
16                | 1           | 4640   | 290.00 | 17          | 272.94
17                | 1           | 4930   | 290.00 | 18          | 273.89
18                | 1           | 5220   | 290.00 | 19          | 274.74
Table 6-4. Scale Down
# Nodes (current) | Nodes removed | RPSASG | RPSn   | Total nodes | New RPSn
18                |               | 5000   | 277.78 | 18          |
18                | 1             | 4140   | 230.00 | 17          | 243.53
17                | 1             | 3910   | 230.00 | 16          | 244.38
16                | 1             | 3680   | 230.00 | 15          | 245.33
15                | 1             | 3450   | 230.00 | 14          | 246.43
14                | 1             | 3220   | 230.00 | 13          | 247.69
13                | 1             | 2990   | 230.00 | 12          | 249.17
12                | 1             | 2760   | 230.00 | 11          | 250.91
11                | 1             | 2530   | 230.00 | 10          | 253.00
10                | 1             | 2300   | 230.00 | 9           | 255.56
9                 | 1             | 2070   | 230.00 | 8           | 258.75
8                 | 1             | 1840   | 230.00 | 7           | 262.86
7                 | 1             | 1610   | 230.00 | 6           | 268.33

Upon comparing the illustrations of Algorithms 1 and 2, we note that the threshold values TU and TD are higher in the case of the latter. This boosts hardware utilization and reduces the footprint on the cloud.

Startup Time Aware Scaling

In this section, we extend Algorithm 2 to guide autoscaling for applications with long startup times. Long application startup times can be ascribed to a variety of reasons; for example, loading of metadata. As discussed earlier, in the presence of long startup times, autoscaling up needs to be done proactively. For this, we employ the following steps:

  • For a historical time–series of RPS in production, determine the change in RPS over every Astart minutes, where Astart denotes the application startup time. This would yield a time–series with these data points:

    • RPSAstart − RPS0

    • RPSAstart+1 − RPS1

    • RPSAstart+2 − RPS2

    • RPSAstart+3 − RPS3

    • ...

    where RPSt denotes the RPS at time t. The derived time–series, referred to as rolling RPS change, captures the change in RPS in any window of width Astart minutes.

  • Compute the 99th percentile of the rolling time–series, denoted by RRPS.

  • Compute γ = TU − RRPS. The parameter γ is the effective threshold for scale-up. The use of the 99th percentile of the rolling RPS change time–series is consistent with the Aggressive Upwards guideline outlined earlier. (These steps are sketched in code after this list.)
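A minimal sketch of the three steps above, assuming the historical RPS series is available as a per-minute Python list; the nearest-rank percentile below is a simplification.

    def rolling_rps_change(rps_series, a_start):
        """Change in RPS over every window of width a_start minutes."""
        return [rps_series[t + a_start] - rps_series[t]
                for t in range(len(rps_series) - a_start)]

    def percentile(values, p):
        """Nearest-rank percentile (0 < p <= 100)."""
        ranked = sorted(values)
        k = max(0, int(round(p / 100.0 * len(ranked))) - 1)
        return ranked[k]

    def effective_scale_up_threshold(rps_series, a_start, t_u):
        """gamma = TU - RRPS, where RRPS is the 99th percentile of the rolling change."""
        r_rps = percentile(rolling_rps_change(rps_series, a_start), 99)
        return t_u - r_rps

    # For the Figure 6-7 example, a_start = 30 and TU = 14 yield gamma of about 12.9.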

The top of Figure 6-7 shows an example RPS time–series (with one-minute granularity) of an application. The startup time of the application was 30 minutes. The corresponding rolling RPS time–series is shown at the bottom of Figure 6-7. The 99th percentile of the rolling time–series is 1.109.

Figure 6-7. Rolling change in RPSn for an application startup time of 30 minutes

Algorithm 3 details the parameters and the steps to determine the scaling thresholds (for both scaling up and scaling down). The scale-down value D and the scale-up value U are inputs to the algorithm. The constants 0.90 and 0.50 used in defining TU and TD were determined empirically so as to minimize the impact on user experience and contain ASG under-utilization. Loop L1 in Algorithm 3 corresponds to scaling up an ASG as the incoming traffic increases. Loop L2 in Algorithm 3 scales down an ASG as the incoming traffic decreases. Unlike the scale-up threshold, the scale-down threshold TD need not be adjusted, because applications do not induce a long delay during termination of instances on Amazon’s EC2.

Algorithm 3—application start up aware autoscaling up/down by a percentage of current capacity

Input

An application with a specified SLA.

Parameters

D Scale down percentage value

U Scale up percentage value

Astart Application start up time (mins)

TD Scale down threshold (RPS per node)

TU Scale up threshold (RPS per node)

Nmin Minimum number of nodes in the ASG

Let T(SLA) return the maximum RPS per node for the specified SLA.

TU ← 0.90 × T(SLA)
TD ← 0.50 × TU

Let Nc and RPSn denote the current number of nodes and the RPS per node, respectively.

Transform the RPS time series into a rolling Astart (min) change time series.
Let RRPS denote the 99th percentile of the rolling time series.
Let γ = TU − RRPS.

L1: /* Scale Up (if RPSn > γ) */
repeat
    N*c ← Nc
    Nc ← Nc + max(1, Nc × U/100)
    RPSn ← RPSn × N*c / Nc
until RPSn > TU

L2: /* Scale Down (if RPSn < TD) */
repeat
    N*c ← Nc
    Nc ← max(Nmin, Nc − max(1, Nc × D/100))
    RPSn ← RPSn × N*c / Nc
until RPSn < TD or Nc = Nmin

Illustration of Algorithm 3

For a better understanding of Algorithm 3, let’s walk through one more case study (refer to Tables 6-5 and 6-6). The RPS and the rolling RPS change time–series for the application are shown at the top and bottom of Figure 6-7, respectively. The parameters of the algorithm are mentioned before Tables 6-5 and 6-6. Nmin is set to 1. Initially, RPSASG = 800, Nc = 170, and γ = 12.9. As RPSASG increases to 2193, RPSn approaches γ. An autoscaling up-event is triggered, thereby adding 25 (= ⌊max(1, 170 × 15/100)⌋) nodes to the ASG. As RPSASG increases subsequently, further autoscaling up-events are triggered. Note that all the entries in the last column (New RPSn) of Table 6-5 satisfy Property 1.

During scale-down, the initial RPSASG = 4400 and Nc = 389. As RPSASG decreases to 3890, RPSn approaches TD. An autoscaling down-event is triggered, thereby removing 38 (= ⌊max(1, 389 × 10/100)⌋) nodes from the ASG. Note that all the entries in the last column (New RPSn) of Table 6-6 satisfy Property 2.

Illustration of Algorithm 3 (D = 10, U = 15, Nmin = 1, Astart = 30, RRPS = 1.1, TD = 10, TU = 14)

Table 6-5. Scale Up
# Nodes (current) | Nodes added | RPSASG | RPSn  | γ = TU − RRPS | Total nodes | New RPSn
170               | 0           | 800    | 4.71  | 12.9          | 170         |
170               | 25          | 2193   | 12.90 | 12.9          | 195         | 11.25
195               | 29          | 2515.5 | 12.90 | 12.9          | 224         | 11.23
224               | 33          | 2889.6 | 12.90 | 12.9          | 257         | 11.24
257               | 38          | 3315.3 | 12.90 | 12.9          | 295         | 11.24
295               | 44          | 3805.5 | 12.90 | 12.9          | 339         | 11.23
339               | 50          | 4373.1 | 12.90 | 12.9          | 389         | 11.24

(At 389 nodes, the next scale-up event would trigger at RPSASG = 5018.1.)
Table 6-6. Scale Down
# Nodes (current) | Nodes removed | RPSASG | RPSn  | Total nodes | New RPSn
389               |               | 4400   | 11.31 | 389         |
389               | 38            | 3890   | 10.00 | 351         | 11.08
351               | 35            | 3510   | 10.00 | 316         | 11.11
316               | 31            | 3160   | 10.00 | 285         | 11.09
285               | 28            | 2850   | 10.00 | 257         | 11.09
257               | 25            | 2570   | 10.00 | 232         | 11.08
232               | 23            | 2320   | 10.00 | 209         | 11.10

(At 209 nodes, the next scale-down event would trigger at RPSASG = 2090.)

Potpourri

There have been cases wherein the CPU utilization on production nodes spiked without any increase in traffic. This can happen owing to a variety of accidental events. To handle such cases, instituting add-on scale-up policies (i.e., besides the scale-up policy based on RPS), as exemplified in Figure 6-8, helps to mitigate the impact on the end users.

Figure 6-8. Add-on policies to check “meltdown”

In July 2015, AWS introduced new scaling policies with steps. For example, you can specify different responses for different levels of average CPU utilization, say <50%, [50%, 60%), [60%, 80%), and ≥80%. Further, if you create multiple-step scaling policies for the same resource (perhaps based on CPU utilization and inbound network traffic) and both of them fire at approximately the same time, autoscaling will look at both policies and choose the one that results in the change of the highest magnitude.
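A sketch of such a step scaling policy with boto3 follows; the ASG name, CPU bands, and adjustments are illustrative assumptions, with the bands corresponding to an alarm threshold of 50% average CPU utilization.

    import boto3

    autoscaling = boto3.client("autoscaling")

    # Step policy: the further the metric is above the alarm threshold (50% CPU),
    # the larger the capacity increase.
    autoscaling.put_scaling_policy(
        AutoScalingGroupName="my-service-asg",    # hypothetical ASG name
        PolicyName="stepped-scale-up-on-cpu",
        PolicyType="StepScaling",
        AdjustmentType="PercentChangeInCapacity",
        MetricAggregationType="Average",
        EstimatedInstanceWarmup=300,
        StepAdjustments=[
            # [50%, 60%): grow by 10%; [60%, 80%): grow by 20%; >= 80%: grow by 40%
            {"MetricIntervalLowerBound": 0, "MetricIntervalUpperBound": 10, "ScalingAdjustment": 10},
            {"MetricIntervalLowerBound": 10, "MetricIntervalUpperBound": 30, "ScalingAdjustment": 20},
            {"MetricIntervalLowerBound": 30, "ScalingAdjustment": 40},
        ],
    )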

In certain scenarios, you might want to protect certain instances in an ASG from termination. For example, an instance might be handling a long-running work task, perhaps pulled from an SQS queue; protecting the instance from termination avoids wasted work. In a similar vein, an instance might serve a special purpose within the group; for example, it could be the master node of a Hadoop cluster, or a “canary” that flags the entire group of instances as up and running. To this end, you can use the instance protection feature offered by AWS. In most cases, at least one instance in an ASG should be left unprotected; if all of the instances are protected, no scale-in action can be taken.
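A minimal sketch of enabling scale-in protection for a specific instance with boto3; the ASG name and instance ID are placeholders.

    import boto3

    autoscaling = boto3.client("autoscaling")

    # Protect the special-purpose instance (e.g., a master node) from scale-in;
    # the rest of the group remains eligible for termination.
    autoscaling.set_instance_protection(
        AutoScalingGroupName="my-service-asg",      # hypothetical ASG name
        InstanceIds=["i-0123456789abcdef0"],        # placeholder instance ID
        ProtectedFromScaleIn=True,
    )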

Leading companies such as Netflix and Facebook have been using autoscaling to improve cluster performance and service availability, and to reduce costs. In a post on its engineering blog, Facebook shared the following:

...a particular type of web server at Facebook consumes about 60 watts of power when it’s idle (0 RPS, or requests-per-second). The power consumption jumps to 130 watts when it runs at low-level CPU utilization (small RPS). But when it runs at medium-level CPU utilization, power consumption increases only slightly to 150 watts. Therefore, from a power-efficiency perspective, we should try to avoid running a server at low RPS and instead try to run at medium RPS.


Besides RPS, other metrics have been used for autoscaling: CPU utilization, memory usage, disk I/O bandwidth, network link load, peak workload, jobs in progress, service rate, the number of concurrent users, the number of active connections, jitter, delay, and the average response time per request. Regression modeling has been employed to predict the amount of resources needed for the actual workload and, possibly, to retract overprovisioned resources. Likewise, several other approaches have been employed for resource-demand prediction, such as prediction based on changes in the request arrival rate (i.e., the slope of the workload). You can employ sensitivity analysis to characterize the different types of inputs and determine the types of resources that have the highest impact on the throughput (or the performance metric of interest) of the application. Subsequently, you can set up multiple autoscaling rules based on one or more resource types.

In recent years, the use of containers has received wide attention. Amazon EC2 Container Service (ECS), Google Container Engine, and Microsoft Azure Container Service are the most popular public container services. Multi-AZ clusters make the ECS infrastructure highly available, thereby providing a safeguard against potential zone failure. The AZ-aware ECS scheduler manages, scales, and distributes tasks across the cluster, thus making the architecture highly available. On AWS, akin to EC2 instances, autoscaling policies can also be defined for ECS instances. You can use the approaches discussed earlier in this chapter in the context of autoscaling container instances as well.

Advanced Approaches

Given the importance of exploiting the elasticity of the cloud in the best possible fashion, several advanced techniques have been proposed for autoscaling, in both industry and academia. For instance, Facebook employed classic control theory and a proportional-integral (PI) controller to achieve fast reaction times. Netflix developed two prediction algorithms—one of which is an augmented linear regression–based algorithm, the other based on the Fast Fourier Transform (FFT). One of the key highlights of the Netflix approach is that it is predictive. Specifically, the approach learns the request pattern from historical data and subsequently drives the scale-up or scale-down action. Both approaches have been deployed in production environments. Given that no comparative analysis has been presented, it is difficult to assess how these techniques fare against the techniques proposed previously.

Many other approaches for autoscaling have been proposed based on control theory, queuing theory, fuzzy logic, neural networks, reinforcement learning, support vector machines, wavelet transform, regression splines, pattern matching, Kalman filters, sliding window, proportional thresholding, second-order regression, histograms, time–series models, the secant method, voting systems, and look-ahead control. In most cases, these techniques “learn” from past traffic patterns and resource usage and hence are unable to adapt to any new pattern that might appear as a result of the dynamic nature of web traffic.

NOTE

For more information, refer to the section “Readings”.

In practice, the applicability of autoscaling approaches based on the preceding techniques is limited owing to a wide variety of reasons. For instance, reinforcement learning–based approaches require a long time to learn and adapt only to slowly changing conditions; therefore, you cannot apply such techniques to real applications that usually experience sudden traffic bursts. In a similar vein, queuing theory–based approaches impose strong assumptions that are typically not valid for real, complex systems. Besides, such approaches are intended for stationary scenarios, and hence you will need to recalculate the queuing model whenever the conditions of the application change. In the case of control theory–based approaches, determining the gain parameters is nontrivial.

Summary

At times, we observe spikes in incoming traffic. This can happen due to a variety of reasons. For instance, at the end of events such as the Super Bowl, you would observe (as expected) a sudden rise in incoming traffic (e.g., in the number of tweets). Figure 6-9 presents an example traffic profile with spikes. State-of-the-art autoscaling techniques do not fare well against such spikes.

Figure 6-9. Spikes in load in production

Akin to the preceding, “burstiness” in the workload at finer timescales (on the order of seconds) can potentially adversely affect the efficacy of autoscaling techniques, as demonstrated in Figure 6-10.

Figure 6-10. A bursty workload

In the under-provisioning scenario, fine-scale burstiness can potentially cause an increased queuing effect and a high request defection rate, thereby resulting in increased SLA violations. On the other hand, in the over-provisioning scenario, fine-scale burstiness can potentially result in reduced resource utilization at the application server. Thus, as a community, we need to build support for fine-grained monitoring and develop more agile, adaptive policies to guarantee effective elasticity under fine-scale burstiness.

Autoscaling a service down independently of the traffic upstream can potentially result in meltdowns. Thus, it is critical to develop autoscaling techniques that capture the interaction between different services in a Service-Oriented Architecture (SOA). Outages in the cloud and in datacenters (see “Resources”) have become increasingly frequent. One way to minimize the impact of outages is to extend the SOA to span multiple Infrastructure as a Service (IaaS) vendors. This would in turn call for extending the techniques proposed in this chapter to be vendor-aware.

Readings

  1. http://docs.rightscale.com/cm/dashboard/manage/arrays/arrays_actions.html#set-up-autoscaling-using-voting-tags

  2. A. Ilyushkin et al. (2017). An Experimental Performance Evaluation of Autoscaling Policies for Complex Workflows.

  3. A. V. Papadopoulos et al. (2016). PEAS: A Performance Evaluation Framework for Auto-Scaling Strategies in Cloud Applications.

  4. M. Grechanik et al. (2016). Enhancing Rules For Cloud Resource Provisioning Via Learned Software Performance Models.

  5. C. Qu et al. (2016). A reliable and cost-efficient auto-scaling system for web applications using heterogeneous spot instances.

  6. L. Zheng et al. (2015). How to Bid the Cloud.

  7. A. N. Toosi et al. (2015). SipaaS: Spot instance pricing as a Service framework and its implementation in OpenStack.

  8. W. Guo et al. (2015). Bidding for Highly Available Services with Low Price in Spot Instance Market.

  9. S. Islam et al. (2015). Evaluating the impact of fine-scale burstiness on cloud elasticity.

  10. V. R. Messias et al. (2015). Combining time series prediction models using genetic algorithm to autoscaling Web applications hosted in the cloud infrastructure.

  11. M. Beltran. (2015). Defining an Elasticity Metric for Cloud Computing Environments.

  12. A. Y. Nikravesh et al. (2015). Towards an autonomic auto-scaling prediction system for cloud resource provisioning.

  13. M. Barati and S. Sharifian. (2015). A hybrid heuristic-based tuned support vector regression model for cloud load prediction.

  14. P. Padala et al. (2014). Scaling of Cloud Applications Using Machine Learning.

  15. T. Lorido-Botran et al. (2014). A Review of Auto-scaling Techniques for Elastic Applications in Cloud Environments.

  16. H. Alipour et al. (2014). Analyzing auto-scaling issues in cloud environments.

  17. H. Fernandez et al. (2014). Autoscaling Web Applications in Heterogeneous Cloud Infrastructures.

  18. N. R. Herbst et al. (2013). Self-adaptive workload classification and forecasting for proactive resource provisioning.

  19. E. Barrett et al. (2012). Applying reinforcement learning towards automating resource allocation and application scalability in the cloud.

  20. D. Villegas et al. (2012). An analysis of provisioning and allocation policies for infrastructure-as-a-service clouds.

  21. S. Islam et al. (2012). How a consumer can measure elasticity for cloud platforms.

  22. S. Islam et al. (2012). Empirical Prediction Models for Adaptive Resource Provisioning in the Cloud.

  23. R. Han et al. (2012). Lightweight Resource Scaling for Cloud Applications.

  24. X. Dutreilh et al. (2011). Using Reinforcement Learning for Autonomic Resource Allocation in Clouds: towards a fully automated workflow.

  25. N. Roy et al. (2011). Efficient Autoscaling in the Cloud Using Predictive Models for Workload Forecasting.

  26. W. Iqbal et al. (2011). Adaptive resource provisioning for read intensive multi-tier applications in the cloud.

  27. M. Mao and M. Humphrey. (2011). Auto-scaling to minimize cost and meet application deadlines in cloud workflows.

  28. Zhiming Shen et al. (2011). Cloudscale: Elastic resource scaling for multi-tenant cloud systems.

  29. P. Lama and X. Zhou. (2010). Autonomic Provisioning with Self-Adaptive Neural Fuzzy Control for End-to-end Delay Guarantee.

  30. E. Caron et al. (2010). Forecasting for Cloud computing on-demand resources based on pattern matching.

  31. Z. Gong et al. (2010). PRESS: PRedictive elastic resource scaling for cloud systems.

  32. S. Meng et al. (2010). Tide: Achieving self-scaling in virtualized datacenter management middleware.

  33. S. Yi et al. (2010). Reducing Costs of Spot Instances via Checkpointing in the Amazon Elastic Compute Cloud.

  34. H. C. Lim et al. (2009). Automated control in cloud computing: Challenges and opportunities.

  35. E. Kalyvianaki et al. (2009). Self-adaptive and self-configured CPU resource provisioning for virtualized servers using Kalman filters.

  36. B. Urgaonkar et al. (2008). Agile dynamic provisioning of multi-tier internet applications.

Resources

  1. “Delta Meltdown Reflects Problems with Aging Technology.” (2016). http://on.wsj.com/2wCcsPq.

  2. “Southwest Outage, Canceled Flights Cost an Estimated $54M.” (2016). https://bloom.bg/2wsoJp4

  3. “WhatsApp apologises as service crashes on New Year’s Eve: Users worldwide unable to connect as messaging app goes offline.” (2015). http://dailym.ai/2vtE6J3.

  4. “Google Docs Outage Further Saps Friday Productivity.” (2015). http://on.wsj.com/2vtEbMR.

  5. “Slack outage cues massive freakout, but it’s significant for more than that.” (2015). http://mashable.com/2015/11/23/slack-down-reactions/.

  6. “AWS Outage.” (2012). http://aws.amazon.com/message/67457/.

  7. “Twitter Is Down, Again.” (2012). http://tcrn.ch/2wK5Ihz.

  8. “Twitter Outage.” (2012). http://bit.ly/twitter-outage-2012.

  9. “Google Talk Is Down: Worldwide Outage Since 6:50 AM EDT.” (2012). http://tcrn.ch/2vjNIqH.

  10. “AWS Outage.” (2011). http://aws.amazon.com/message/65648/.

  11. “Twitter Outage.” (2011). http://status.twitter.com/post/2369720246/streaming-outage.

  12. “Time is Money: The Value of On-Demand.” by Joe Weinman (2011). http://joeweinman.com/Resources/Joe_Weinman_Time_Is_Money.pdf

  13. “Lightning Strike Triggers Amazon EC2 Outage.” (2009). http://www.datacenterknowledge.com/archives/2009/06/11/lightning-strike-triggers-amazon-ec2-outage/.

  14. “Outage for Amazon Web Services.” (2009). http://www.datacenterknowledge.com/archives/2009/07/19/outage-for-amazon-web-services/.

  15. “Brief Power Outage for Amazon Data Center.” (Dec. 2009). http://www.datacenterknowledge.com/archives/2009/12/10/power-outage-for-amazon-data-center/.

  16. “Major Outage for Amazon S3 and EC2.” (Feb. 2008). http://www.datacenterknowledge.com/archives/2008/02/15/major-outage-for-amazon-s3-and-ec2/.

  17. “Amazon EC2 Outage Wipes Out Data.” (Oct. 2007). http://www.datacenterknowledge.com/archives/2007/10/02/amazon-ec2-outage-wipes-out-data/.

  18. “List of web host service outages.” http://bit.ly/list-host-outages.

1 “Completing the Netflix Cloud Migration.” (2016) https://media.netflix.com/en/company-blog/completing-the-netflix-cloud-migration

2 We encourage you to compare the prices of Reserved and On-Demand instances on AWS at https://aws.amazon.com/ec2/pricing/.

