CHAPTER TWO

Setting Goals for Capacity

You wouldn’t begin mixing concrete before you know what you’re building. Similarly, you shouldn’t begin planning for capacity before you determine the requirements for performance, availability, and reliability of a site or mobile app. As Chapter 1 mentions, the requirements change as the end user’s expectations evolve. The requirements are also a function of the technology; for example, use of virtual machines (VMs) versus use of containers. Consequently, capacity planning is not a one-time process, but a continuous one.

Capacity planning involves a lot of assumptions related to why your enterprise needs the capacity. Some of these assumptions are obvious, whereas others are not. For example, if you don’t know that you should be serving the pages in less than three seconds, you’re going to have a tough time determining how many servers will be needed to satisfy that requirement. More important, it will be even tougher to determine how many servers you would need to add as the traffic grows.


Common sense, right? Yes, but it’s amazing how many organizations don’t take the time to assemble a rudimentary list of operational requirements. Waiting until users complain about slow responses or time-outs isn’t a good strategy.

Establishing the acceptable speed or reliability of each part of the site can be a considerable undertaking, but it will pay off when you’re planning for growth and need to know what standard you should maintain. This chapter shows you how to understand the different types of requirements the management and customers will force you to deal with, and how architectural design helps with this planning.

Different Kinds of Requirements and Measurements

Now that we’re talking about requirements—which might be set by others, external to your group—we can look at the different types you would need to deal with. The managers, the end users, and the clients running websites all have varying objectives and measure success in different ways. Even in the serverless context, you must specify the Service-Level Agreement (SLA) or Service-Level Objectives (SLO) requirements for the service being used so that you can meet your own performance targets or SLAs (discussed later in this chapter). Ultimately, these requirements, or capacity goals, are interrelated and can be distilled into the following:

  • Performance, availability, and reliability

    • External service monitoring

    • Business requirements

    • User expectations

  • Capacity

    • System metrics

    • Resource ceilings

External Service Monitoring

A site or mobile app should be available not only to your colleagues performing tests on the website from a facility down the road, but also to real visitors who might be located on other continents with slow connections. This is particularly important given that the speed of mobile networks varies significantly around the globe, which is illustrated in Figure 2-1.

Figure 2-1. Speed of mobile networks around the globe (source: https://opensignal.com/reports/2016/08/global-state-of-the-mobile-network/)

Likewise, the density of feature phones and the different generations of smartphones—say, iPhone 4 versus iPhone 5 versus iPhone 6 versus iPhone 7—varies significantly around the globe. Users of older-generation phones are far more sensitive to performance than users of newer phones. To this end, companies such as Facebook are testing stripped-down versions of their apps for lower-end phones.

Some large companies choose to have site performance (and availability) constantly monitored by services such as Catchpoint, Keynote, Gomez, and SOASTA. These commercial services deploy worldwide networks of machines that constantly request the web pages and record the response time, along with a variety of other metrics such as the following (a minimal probe sketch follows the list):

  • Domain Name Server (DNS) time

  • Secure Sockets Layer (SSL) time

  • Wire time

  • Wait time

  • Time to first byte (TTFB)

  • Page load time

  • Time to render start

  • Above-the-fold (AFT) time

  • Document completion time
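None of this is magic; here’s a minimal sketch of how a probe node might sample a few of these metrics (DNS time, TTFB, and page load time) for a single URL. The URL, timeout, and function name are our own illustrative choices, not any vendor’s API:

    """A minimal probe sketch: sample DNS time, TTFB, and page load time
    for one URL, roughly the way an external monitoring node would."""

    import socket
    import time
    import urllib.request
    from urllib.parse import urlparse

    def probe(url: str) -> dict:
        host = urlparse(url).hostname

        # DNS time: resolve the hostname on its own.
        t0 = time.perf_counter()
        socket.getaddrinfo(host, 443)
        dns_time = time.perf_counter() - t0

        # TTFB and full page load time, as seen from this probe location.
        t1 = time.perf_counter()
        with urllib.request.urlopen(url, timeout=10) as resp:
            resp.read(1)                      # first byte received
            ttfb = time.perf_counter() - t1
            resp.read()                       # drain the rest of the body
        page_load = time.perf_counter() - t1

        return {"dns_s": dns_time, "ttfb_s": ttfb, "page_load_s": page_load}

    if __name__ == "__main__":
        print(probe("https://www.example.com/"))

A real service would, of course, run such probes from many locations on a schedule and keep the history, which is precisely what you pay the vendors for.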

The service providers keep track of all these metrics and build handy-dandy dashboards to evaluate how site performance and uptime appear from many locations around the world. Because the aforementioned third parties are deemed “objective,” the statistics they report can be used to enforce or guide SLAs arranged with partner companies or sites (we talk more about SLAs later). Keynote and Gomez can be considered enterprise-level services. There are also plenty of low-cost alternatives, including Pingdom, SiteUptime, and Alertra. Having visibility into the aforementioned metrics—preferably in the form of intelligent alerts—can help expose potential capacity issues. In light of the explosion in the number of metrics being collected nowadays, visual analysis to detect anomalies is no longer practical; it is also error prone, and a large number of false positives results in alert fatigue. Hence, you should carry out anomaly detection algorithmically.
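Algorithmic detection need not be elaborate to be useful; here is a minimal sketch that flags response-time samples deviating more than three standard deviations from a trailing window (the window size and threshold are arbitrary assumptions, not recommendations):

    """A minimal anomaly-detection sketch for a response-time series."""

    from collections import deque
    from statistics import mean, stdev

    def detect_anomalies(samples, window=60, threshold=3.0):
        """Yield (index, value) for samples that look anomalous."""
        history = deque(maxlen=window)
        for i, value in enumerate(samples):
            if len(history) >= 10:                     # need a minimal baseline
                mu, sigma = mean(history), stdev(history)
                if sigma > 0 and abs(value - mu) > threshold * sigma:
                    yield i, value
            history.append(value)

    # Example: response times in milliseconds with one obvious spike.
    times = [210, 205, 220, 215, 208, 212, 206, 218, 211, 209, 950, 214]
    print(list(detect_anomalies(times)))               # -> [(10, 950)]

Production-grade detectors account for seasonality and trend as well, but the principle is the same: let the data, not a human staring at a dashboard, raise the alert.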


It’s important to understand exactly what these services measure and how to interpret the numbers they generate. Because most of them are networks of machines rather than people, it’s essential to be aware of how those web pages are being requested. Some things to consider when you’re looking at a monitoring service include the following:

  • Are they simulating human users?

    The intent of the users has a direct impact on their interaction with a website or a mobile app. For instance, in the context of search, the following types of intent have been reported in prior research:

    • Navigational

    • Informational

    • Commercial

    • Transactional

    In a similar vein, how users interact with the content—primarily images and video—on a web page or in an app has direct ramifications on their online experiences. User settings such as whether Location Services, Notifications, and Limit Ad Tracking (a feature in iOS 10) are turned on or off determine the amount of data that an app can collect, which in turn affects the responsiveness of a mobile app.

    Thus, it is critical to understand how the monitoring services model behavior and how reflective their model is of your typical end user.

  • Are they caching objects like a normal web browser would? Why or why not?

    Recall that all modern browsers support caching. Having said that, cache size differs between a desktop browser and a mobile browser. Further, caching is not ubiquitous in the mobile app world. In the context of user acquisition (UA), you should plan for the worst-case scenario; that is, assume a cold cache.

  • Can you determine how much time is spent due to network transfer versus server time, in the aggregate as well as for each object?

  • Can you determine whether a failure or unexpected wait time is due to geographic network issues, server-side issues, or measurement failures?

    For example, guaranteeing consistency—the desired level differs on a case-by-case basis—in a distributed database or a distributed warehouse (as exemplified by Google’s Spanner and Mesa, respectively) often induces a performance penalty or can affect availability. Likewise, an unexpected wait time might stem from an issue with one or more network load balancers.

To address these questions, you would need to go through the documentation of the monitoring service. If it’s not clear from the documentation, reach out to the architect or technical lead of the service. If the monitoring service tests in a manner representative of how your users actually visit the site, you have good reason to trust the numbers. Also keep in mind that the metrics you use for capacity planning or site performance measurement might ultimately find their way onto an executive dashboard somewhere, viewed by a nontechnical audience.

CFOs, CTOs, business development folks, and even CEOs can become addicted to qualitative assessments of operations. This can be a double-edged sword. On the one hand, you are being transparent about failures, which can help when you’re attempting to justify expenditures and organizational changes to support capacity. On the other hand, you are also giving a frequently obsessive crowd more to obsess about, so when there are anomalies in this data—for example, long response times due to failures or a sudden surge in traffic—you should be prepared to explain what they mean.

SLAs

So, what exactly is an SLA? It’s an instrument that makes business people comfortable, much like insurance. But in broader, less anxious terms, an SLA is a metric that defines how a service should operate within agreed-upon boundaries. It puts some financial muscle into the metric by establishing a schedule of credits for meeting goals, or possibly penalties if the service does not achieve them. With websites, SLAs cover mostly availability and performance. Thus, SLAs directly influence architectural design and the capacity planning process.


Some SLAs guarantee that a service will be available for a preestablished percentage of time, such as 99.99 percent. What this means is that 0.01 percent of the time, the service can be unavailable, and it will still be within the bounds of the SLA. Other SLAs require that demand for a service stay within reasonable limits; request rate limits or storage and upload limits are typical parameters.

For example, you might find a web hosting company that uses verbiage similar to the following in its “Terms of Service” document:

Acme Hosting, Inc. will use commercially reasonable efforts to make the SuperHostingPlan available with a monthly uptime percentage (defined below) of at least 99.9% during any monthly billing cycle. In the event Acme Hosting, Inc. does not meet this commitment, you will be eligible to receive a service credit as described here:

Monthly uptime percentage    Service credit
Between 99% and 99.9%        1 day credit
Less than 99%                1 week credit

Looks pretty reassuring, doesn’t it? The problem is, 99.9 percent uptime stretched over a month isn’t as great a number as you might think:

  • 30 days = 720 hours = 43,200 minutes

  • 99.9 percent of 43,200 minutes = 43,156.8 minutes

  • 43,200 minutes – 43,156.8 minutes = 43.2 minutes

This means for 43.2 minutes every month, this service can go down without penalty. If the site generates $3,000 worth of sales every minute, you could easily calculate how much money any amount of downtime will cost (along with the less measurable consequence of disgruntled customers). Table 2-1 shows percentages of uptime on a yearly basis.

Table 2-1. SLA percentages and acceptable downtimes
Uptime SLA Downtime per year
90.0 percent 36 days, 12 hours
95.0 percent 18 days, 6 hours
99.0 percent 87 hours, 36 minutes
99.50 percent 43 hours, 48 minutes
99.90 percent 8 hours, 45 minutes, 36 seconds
99.99 percent 52 minutes, 33 seconds
99.999 percent 5 minutes, 15 seconds
99.9999 percent 32 seconds
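The downtime figures in Table 2-1, like the monthly math above, are easy to reproduce; here is a short sketch that also estimates the naive revenue exposure of downtime using the $3,000-per-minute example from the text (the SLA levels and period lengths are the same ones used above):

    """Reproduce the downtime math behind Table 2-1 and estimate the naive
    monthly revenue exposure at $3,000 of sales per minute."""

    MINUTES_PER_YEAR = 365 * 24 * 60      # 525,600
    MINUTES_PER_MONTH = 30 * 24 * 60      # 43,200, as in the example above

    def allowed_downtime_minutes(sla_percent: float, period_minutes: int) -> float:
        return period_minutes * (1 - sla_percent / 100.0)

    for sla in (99.0, 99.5, 99.9, 99.99, 99.999):
        yearly = allowed_downtime_minutes(sla, MINUTES_PER_YEAR)
        monthly = allowed_downtime_minutes(sla, MINUTES_PER_MONTH)
        exposure = monthly * 3000          # naive revenue exposure per month
        print(f"{sla:>7}%  {yearly:9.1f} min/yr  {monthly:6.1f} min/mo  ~${exposure:,.0f}")

As the next paragraphs argue, treat the dollar figure as an upper bound rather than a literal loss.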

The term five-nines is commonly heard in discussions about SLAs and availability. It refers to 99.999 percent availability, and it is used in marketing literature at least as much as in technical literature. Five-nines is usually taken to mean that the site or system is highly available. Table 2-1 also includes uptime percentages other than the ones discussed earlier; it’s not uncommon for operations folks to speak of 95 percent, two-nines, or four-nines as well.

These SLA availability numbers aim to provide not only a level of confidence in a website’s service, but also to imply that downtime can be equated to lost revenue. We don’t believe that this is actually accurate, even though the straight math seems to bear it out. If the service is unavailable for 10 minutes and it normally produces $3,000 of revenue every minute, you might assume the business has lost $30,000. In reality, customers might just pick up where they left off and buy what they were in the process of buying when the outage occurred. The business might instead be spending extra money on the customer service side to make up for an outage that ends up having no impact on earnings.

NOTE

For further discussion of the impact of performance on revenue, refer to http://blog.catchpoint.com/2017/01/06/performance-impact-revenue-real/.

The point is that although an estimate of the financial impact of an outage might be neither precise nor accurate, the importance of availability should be clear.

Business Capacity Requirements

The use of web services is becoming more and more prevalent in today’s Web 3.0 mashup-y and mobile app world. Although most web services and platforms offer open APIs (e.g., Facebook’s Graph API and Twitter’s Streaming APIs) for individual application developers to build upon, business-to-business relationships depend on them, as well. Therefore, companies usually tie revenue streams to having unfettered access to an API. This could mean a business relationship relies on a certain level of availability, or performance of the API, measured in a percentage uptime (such as 99.99 percent) and/or an agreed-upon rate of API requests.

Let’s assume that you have built an API that returns postal codes for various inputs. You might allow only one API call per minute for a regular or noncommercial user, but a shipping company might enter into a contract permitting it to call the API up to 10 times per second. In the context of social media, the user base and traffic numbers are much higher. For example, the statistics in Figure 2-2 were posted by Jan Koum (cofounder of WhatsApp) on Facebook on February 1, 2016.

Figure 2-2. WhatsApp user base and traffic as of February 1, 2016 (source: http://bit.ly/2vLGoEs)

Likewise, as of April 26, 2017, Instagram has more than 700 million Monthly Active Users (MAUs),1 over 400 million Daily Active Users (DAUs), and more than 95 million photos and videos uploaded daily. Supporting such high—and, more important, growing—traffic calls for very systematic capacity planning. Website capacity planning is as much about justifying capital expenditures as it is about technical issues, such as scaling, architectures, software, and hardware. Because capacity concerns can have such a large impact on business operations, they should be considered early in the development process.
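Rate limits like those in the postal-code example are commonly enforced with a token bucket per client; here is a minimal sketch (the tier names and limits mirror the example above and are purely illustrative, not any particular framework’s API):

    """A minimal token-bucket sketch for per-client API rate limits: one call
    per minute for a regular user, ten calls per second for a contracted
    shipping company."""

    import time

    class TokenBucket:
        def __init__(self, rate_per_sec: float, burst: float):
            self.rate = rate_per_sec          # tokens added per second
            self.capacity = burst             # maximum bucket size
            self.tokens = burst
            self.last = time.monotonic()

        def allow(self) -> bool:
            now = time.monotonic()
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

    # One bucket per client, sized by contract tier.
    buckets = {
        "regular_user": TokenBucket(rate_per_sec=1 / 60, burst=1),   # 1 call per minute
        "shipping_co":  TokenBucket(rate_per_sec=10, burst=10),      # 10 calls per second
    }

    def handle_request(client_id: str) -> str:
        bucket = buckets[client_id]
        return "200 OK" if bucket.allow() else "429 Too Many Requests"

In production you would keep the buckets keyed by API key in a shared store rather than in process memory, but the capacity-planning point stands: the agreed-upon call rates are exactly the numbers you feed into your forecasts.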

User Expectations

Obviously, the end goal of capacity planning is a smooth and speedy experience for the users. User expectations vary depending on what type of application they are using and even what portion of the application they are interacting with. For example, the expectation for speed when searching for vacation packages on a travel site is different than it is for loading the checkout page.


It is well known that the perceived performance of a website or mobile app directly affects user engagement. To this end, several metrics have been proposed to quantify perceived performance:

  • First paint

  • Render start

  • DOM interactive

  • Speed index

  • AFT time

  • Object rendering time (aka Hero image timing)

  • Critical resources

Besides the aforementioned metrics, another alternative is to use the User Timing “Standard Mark Names” such as “mark_fully_loaded,” “mark_fully_visible,” “mark_above_the_fold,” and “mark_time_to_user_action.” Different metrics capture different aspects of the user experience; hence, you should not look for a one-size-fits-all metric.

NOTE

For links to the discussions of the aforementioned metrics, refer to “Resources”.

It’s possible to have plenty of capacity but a slow website nonetheless, and, in the worst case, the service can be unavailable. This is not uncommon today, when the content of a web page is predominantly high-quality images or videos—key drivers of user engagement and conversion, and potential sources of performance drag. In a similar vein, with the increasing use of third-party services, a web page can become unavailable in spite of having ample capacity.

NOTE

Designing fast and highly available web pages is beyond the scope of this book, but you can find a lot of great information in Steve Souders’ excellent book High Performance Web Sites (O’Reilly) and in Ilya Grigorik’s High Performance Browser Networking (O’Reilly), which is also available for free at https://hpbn.co/. To learn how to mitigate the performance impact of rendering high-quality images and videos, check out Colin Bendell et al.’s High Performance Images (O’Reilly). To learn how to achieve high availability, refer to Lee Atchison’s Architecting for Scale: High Availability for Your Growing Applications (O’Reilly).

Even though capacity is only one part of making the end-user experience fast, that experience is still one of the real-world metrics that we’ll want to measure and track in order to make capacity forecasts. For example, when serving static web content, you might reach an intolerable amount of latency at high volumes before any system-level metrics (CPU, disk, memory) raise a red flag. Again, this can have more to do with the construction of the web page than the capacity of the servers sending the content. But because capacity is one of the more expensive pieces to change, it warrants investigation. Perceived slowness of a web page could be the result of a page that is simply too heavy, and not from a lack of capacity. (This is one of the fundamentals of Souders’ book.) It’s a good idea to determine whether this is the case when any user-perceived slowness is analyzed. The problem can be solved by either adding capacity or changing the page weight. The former can sometimes involve more cost than the latter.

Determining the root cause of the perceived slowness of a web page is, relatively speaking, easier in the context of monolithic architectures than in the context of Service-Oriented Architecture (SOA) or microservice architecture (MSA) (Chapter 1 discusses these briefly). Services such as Twitter and Netflix comprise hundreds of microservices. In an MSA, each microservice typically provides a specific piece of functionality; at Twitter, for example, there are different microservices for recommending “Who To Follow” and for surfacing relevant tweets for the “While You Were Away” product feature. The complex interactions between the different microservices result in cascading performance issues, which adversely affects Mean Time to Resolution (MTTR). Further, the use of third-party vendors for key services—such as managed DNS, content delivery, content acceleration, ad serving, analytics, behavioral targeting, content optimization, and widgets—makes diagnosis of performance issues even more challenging.

When John was at Flickr, tens of thousands of photos were served per second. Each photo server could serve a known and specific rate of images before reaching its maximum. The maximum was not defined in terms of disk I/O, CPU, or memory, but in terms of how many images could be served without the “time to serve” for each image exceeding the specified amount of time.

Architecture Decisions

The architecture is the basic layout of how all of the backend pieces—both hardware and software—are joined. Its design plays a crucial role in your ability to plan and manage capacity.

NOTE

Designing the architecture can be a complex undertaking, but there are a couple of great books available on the subject: Cal Henderson’s Building Scalable Web Sites (O’Reilly) and Theo Schlossnagle’s Scalable Internet Architectures (Pearson). In the mobile context, take a look at Maximiliano Firtman’s High Performance Mobile Web (O’Reilly). For developing high-performance iOS and Android apps, read Gaurav Vaish’s High Performance iOS Apps (O’Reilly) and Doug Sillars’ High Performance Android Apps (O’Reilly).

The architecture affects nearly every part of performance, reliability, and management. Establishing a good architecture almost always makes capacity planning easier.

Providing Measurement Points

Both for measurement purposes and for rapid response to changing conditions, the architecture should be designed such that you can easily split it into parts that perform discrete tasks. In an ideal world, each component of the backend should have a single job to do, though it could still do multiple jobs well if needed. At the same time, its effectiveness at each job should be easy to measure. To this end, MSA has gained momentum in recent years. In particular, microservices are a way of developing and composing software systems such that they are built out of small, independent components that interact with one another over the network. By limiting dependencies on other parts of the system, microservices can be changed much more quickly (as compared to their monolithic counterparts) in response to a bug or a feature request. With the increasing containerization of MSAs—exemplified by support for containers in public clouds such as Amazon Web Services (AWS), Microsoft’s Azure, Google Cloud Platform (GCP), and IBM Bluemix—you can take advantage of the built-in support for monitoring in containers to measure task-level metrics.

NOTE

For references to information on containers, go to the section “Readings”.

For instance, let’s look at a simple database-driven web application. To get the most bang for our buck, we have our web server and our database residing on the same hardware server. This means that all of the moving parts share the same hardware resources, as shown in Figure 2-3.

Figure 2-3. A simple, single-server web application architecture

Suppose that you have configured measurements for both system and application-level statistics for the server. You can measure the system statistics of this server via sar or rrdtool, and application-level metrics such as web resource requests or database queries-per-second.
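If sar or rrdtool isn’t already wired up, even a small collector makes the point; here is a sketch using the psutil library (our choice for illustration; any system-stats source works) alongside a hypothetical application-level request counter:

    """A minimal metrics-collector sketch for the single-server setup in
    Figure 2-3: system statistics via psutil alongside an application-level
    counter. app_request_counter is a hypothetical callable returning the
    cumulative number of requests (or queries) served so far."""

    import time
    import psutil

    def collect(app_request_counter, interval=60):
        psutil.cpu_percent()                      # prime the CPU counter
        prev_requests = app_request_counter()
        prev_disk = psutil.disk_io_counters()
        while True:
            time.sleep(interval)
            requests = app_request_counter()
            disk = psutil.disk_io_counters()
            sample = {
                "requests_per_sec": (requests - prev_requests) / interval,
                "cpu_percent": psutil.cpu_percent(),
                "mem_percent": psutil.virtual_memory().percent,
                "disk_read_mb_s": (disk.read_bytes - prev_disk.read_bytes) / interval / 2**20,
                "disk_write_mb_s": (disk.write_bytes - prev_disk.write_bytes) / interval / 2**20,
            }
            print(sample)                         # in practice, ship to a metrics store
            prev_requests, prev_disk = requests, disk

The collector deliberately records the application metric and the system metrics in the same sample, which is what makes the per-component analysis in the rest of this section possible.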

The difficulty with the setup in Figure 2-3 is that you can’t easily distinguish which system statistics correspond to which pieces of the architecture. Therefore, you can’t answer basic questions that are likely to arise, such as:

  • Is the disk utilization the result of the web server sending out a lot of static content from the disk, or are the database’s queries being disk-bound?

  • How much of the filesystem cache, CPU, memory, and disk utilization is being consumed by the web server, and how much is being used for the database?

With careful research, you can make some estimates about which daemon is using which resource. In the best case, the resource demands of the different daemons don’t contend with one another. For example, the web server might be bound mostly by CPU and not need much memory, whereas the database might be memory-bound without using much CPU. But even in this ideal scenario, if usage continues to grow, resource contention will eventually grow enough to warrant splitting the architecture across different hardware components (Figure 2-4). The split enables performance isolation between the various services. At that point, you would really like to know how much CPU, cache, disk space, bus bandwidth, and so on each daemon actually needs.

Figure 2-4. Separation of web server and database

Splitting the nodes in this fashion makes it easier to understand the capacity demands, given that the resources on each server are now dedicated to each piece of the architecture. It also means that you can measure each server and its resource demands more distinctly. You could come to conclusions with the single-component configuration, but with less ease and accuracy. Of course, this division of labor also produces performance gains, such as preventing frontend client-side traffic from interfering with database traffic, but let’s forget about performance for the moment.

If we’re recording system- and application-level statistics, we can quantify what each unit of capacity means in terms of usage. With this new architecture, we can answer a few questions that we couldn’t before:

Database server

How do increases in database queries-per-second affect the following?

  • Disk utilization

  • I/O wait (percent of time the database waits due to network or disk operations)

  • RAM usage

  • CPU usage

Web server

How do increases in web server requests-per-second affect the following?

  • Disk utilization

  • I/O Wait

  • RAM usage

  • CPU usage

Being able to answer these questions is key to establishing how (and when) we would want to add more capacity to each piece.
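One way to begin answering them is to fit the application metric against each system metric over the same time window; here is a minimal sketch, with invented sample data, relating database queries-per-second to CPU utilization:

    """A minimal sketch relating an application metric (queries per second)
    to a system metric (CPU utilization) with a least-squares fit. The
    sample data is invented purely for illustration."""

    def least_squares(xs, ys):
        n = len(xs)
        mean_x, mean_y = sum(xs) / n, sum(ys) / n
        slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
                 / sum((x - mean_x) ** 2 for x in xs))
        return slope, mean_y - slope * mean_x

    # Paired samples collected at the same timestamps on the database server.
    queries_per_sec = [120, 180, 240, 300, 360, 420]
    cpu_percent     = [22,  30,  41,  49,  60,  68]

    slope, intercept = least_squares(queries_per_sec, cpu_percent)
    print(f"each additional query/sec costs ~{slope:.3f}% CPU")

    # Rough extrapolation: the query rate at which this box reaches 80% CPU.
    print(f"~{(80 - intercept) / slope:.0f} queries/sec puts the box at 80% CPU")

A straight line is only a first approximation—real systems often degrade nonlinearly near saturation—but even this rough fit tells you which resource to watch and roughly where the trouble starts, which leads directly into the next topic.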

Resource Ceilings

Now that you have a good idea of what’s required for each piece of this simple architecture, you can get a sense for whether you would want different hardware configurations.

For instance, back in our days at Flickr, for the most part, our MySQL database installations happened to be disk-bound, so there was no compelling reason to buy two quad-core CPUs for each database box. Instead, we spent money on more disk spindles and memory to help with filesystem performance and caching. We knew this to be our ideal database hardware configuration—for our database. We had different configurations for our image serving machines, our web servers, and our image processing machines; all according to what in-box resources they relied on most.

The last piece we’re missing in this discussion on architecture is what drives capacity forecasting: resource ceilings. The questions posed earlier regarding the effect of usage on resources point to an obvious culmination: when will the database or web server die?

Each server in our example possesses a finite amount of the following hardware resources:

  • Disk throughput

  • Disk storage

  • CPU

  • RAM

  • Network

High loads will bump against the limits of one or more of those resources. Somewhere just below that critical level is where you would want to set the ceiling for each piece of the architecture. The ceiling is the critical level of a particular resource (or resources) that cannot be crossed without failure or violation of one or more SLAs. Armed with the current ceilings, you can begin to assemble the capacity plan. In practice, different services exert different resource pressures. Owing to this, public clouds offer a wide range of instance types; for example, as of November 26, 2016, AWS EC2 supports more than 40 different instance types and Google Compute Engine supports more than 15 instance types. We talk more about examples of ceilings in Chapter 3.
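Once a ceiling and a growth rate are known, the forecast itself is simple arithmetic; here is a minimal sketch under an assumed compounding-growth model (all of the numbers are illustrative, not measurements from the text):

    """A minimal forecasting sketch: given the current peak, a ceiling, and an
    observed growth rate, estimate when the ceiling will be reached."""

    import math

    def weeks_until_ceiling(current_peak, ceiling, weekly_growth_rate):
        # Solve current_peak * (1 + r) ** w = ceiling for w.
        if current_peak >= ceiling:
            return 0.0
        return math.log(ceiling / current_peak) / math.log(1 + weekly_growth_rate)

    # Example: the database peaks at 340 queries/sec, the ceiling (with a
    # safety margin) is 500 queries/sec, and peak traffic grows ~3% per week.
    weeks = weeks_until_ceiling(current_peak=340, ceiling=500, weekly_growth_rate=0.03)
    print(f"ceiling reached in roughly {weeks:.0f} weeks")   # about 13 weeks

The answer matters less as a precise date than as a deadline: it tells you how long you have to order, rack, and burn in the next round of capacity, or to re-architect before the ceiling arrives.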

As you can see, changing the architecture in simple ways can help you understand what the capacity is being used for. When thinking about architecture design, keep in mind that the division of labor and the “small pieces, loosely joined” theory can go a long way toward giving you clues about how the site is being used. We touch more on architecture decisions throughout the book, and particularly in Chapter 3.

Hardware Decisions (Vertical, Horizontal, and Diagonal Scaling)

Choosing the right hardware for each component of the architecture can greatly affect costs. At the very least, when it comes to servers, you should have a basic idea (gleaned from measurement and usage patterns) of where you would want to invest money. Before perusing a vendor’s current pricing, be aware of what it is that you’re trying to achieve. Will this server be required to do a lot of CPU work? Will it need to perform a lot of memory work? Is it a network-bound gateway?

Today, the difference between horizontal and vertical scaling architectures is quite well known in the industry, but it’s worth reviewing in order to put capacity planning into context (Figure 2-5):

Figure 2-5. Illustration of Vertical versus Horizontal scaling
  • Being able to scale horizontally means having an architecture that allows for adding capacity by simply adding similarly functioning nodes to the existing infrastructure—for instance, a second web server to share the burden of website visits. Under horizontal scaling, only the resources required to address the bottlenecks of a service need to be scaled. Horizontal scaling is the typical choice in the case of MSAs.

  • Being able to scale vertically is the capability of adding capacity by increasing the resources internal to a server, such as CPU, memory, disk, and network. Vertical scaling is the typical choice in the case of monolithic architectures.

Since the emergence of tiered and shared-nothing architectures, horizontal scaling has been widely recognized for its advantages over vertical scaling as it pertains to web applications. Being able to scale horizontally means designing an application to handle various levels of database abstraction and distribution. You can find great approaches to horizontal application development techniques in the aforementioned books by Henderson and Schlossnagle.

The danger of relying solely on vertical scaling is that as you continue to upgrade components of a single computer, the cost rises dramatically. You also introduce the risk of a single point of failure (SPOF). Horizontal scaling involves the more complex issue of increasing the potential failure points as you expand the size of the server farm. In addition, you inherently introduce challenges surrounding any synchronization needed between the nodes. For example, guaranteeing strong consistency in a distributed database or a distributed warehouse requires synchronization between the various nodes. Likewise, in the context of multithreaded execution, thread synchronization is often (but not always) needed to guarantee correctness.

Diagonal scaling (a term coined by Arun) is the process of vertically scaling the horizontally scaled nodes that an enterprise already has in the infrastructure. Over time, CPU power and RAM become faster, cheaper, and cooler, and disk storage becomes larger and less expensive. Thus, it can be cost effective to keep some vertical scaling as part of the plan, but applied to horizontal nodes.

What this all boils down to is that for all the nodes bound on CPU or RAM, you can “upgrade” to fewer servers with more CPU and RAM. For disk-bound boxes, it also can mean that you might be able to replace them with fewer machines that have more disk spindles.

As an example, let’s consider an upgrade that we did while working at Yahoo! Initially, we had 67 dual-CPU, 4 GB RAM, single SATA drive web servers. For the most part, our frontend layer was CPU-bound, handling requests from client browsers, making backend database calls, and taking photo uploads. These 67 machines were equipped with Intel Xeon 2.80 GHz CPUs running Apache and PHP. When it was time to add capacity, we decided to try the new Quad Core CPU boxes. We found the dual-quad core machines had roughly three times the processing power of the existing dual-CPU boxes. With 8 CPU cores of Intel Xeon L5320 1.86 GHz CPUs, we were able to replace 67 existing boxes with only 18 new boxes. Figure 2-6 illustrates how much the server load average (across the entire cluster) dropped as a result.

Figure 2-6 shows the reduction in load average when the 67 machines were removed from the production pool and the 18 new boxes were allowed to take over for the same production load. This certainly makes for a very dramatic-looking graph, but load average might not be the best metric to illustrate this diagonal scaling exercise.

Figure 2-6. Load average drop by replacing 67 boxes with 18 higher-capacity boxes

Figure 2-7 represents the same time period as Figure 2-6, except that it details the number of Apache requests-per-second when the older servers were replaced. Each line on the graph represents a single server, making it clear when the newer servers took over. Note that the number of Apache requests-per-second actually went up by as much as 400 after the replacement, implying that the older machines were very close to their own bottlenecks.

Figure 2-7. Serving more traffic with fewer servers

Table 2-2 shows what this meant in terms of resources.

Table 2-2. Comparing server architectures
Servers  CPU          RAM   Disk             Power (kW) at 60 percent of peak usage
67       2 (2 cores)  4 GB  1 x 80 GB SATA   8.763
18       2 (8 cores)  4 GB  1 x 146 GB SATA  2.332

Based on traffic patterns, if we assume that the servers are working at an average of about 60 percent of their peak, this means that we’re using roughly 30 percent of the electrical power we were using previously. We’ve also saved 49U of rack space because each server needs only 1U of space. That’s more than one full, standard 42U rack emptied as a result of diagonal scaling. Not bad.
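The consolidation arithmetic behind Table 2-2 is worth spelling out; here is a short sketch that reproduces the figures from this example:

    """The diagonal-scaling arithmetic from this example: what replacing 67
    boxes with 18 higher-capacity boxes meant for consolidation, power, and
    rack space. The figures come from Table 2-2 and the surrounding text."""

    old_boxes, new_boxes = 67, 18
    old_power_kw, new_power_kw = 8.763, 2.332       # at ~60 percent of peak usage

    print(f"consolidation: {old_boxes / new_boxes:.1f} old boxes per new box")   # 3.7
    print(f"power draw: {new_power_kw / old_power_kw:.0%} of the old cluster")   # 27%
    print(f"rack space freed: {old_boxes - new_boxes}U")                         # 49U

Running the same arithmetic before committing to a purchase—boxes, power, and rack units, not just raw CPU—is exactly the kind of justification exercise that capacity planning exists to support.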

Disaster Recovery

Disaster Recovery pertains to restoring business operations (along with other resources such as data, which we won’t consider in this book) after a natural or human-induced catastrophe. By catastrophe, we are not implying the failure of a single server, but a complete outage that’s usually external to the operation of the website infrastructure.

NOTE

Recent examples of outages include Delta’s IT outage (August 2016), the Southwest outage (July 2016), the Slack outage (June 2016), an AWS outage (June 2016), the Google Cloud Platform outage (April 2016), the Twitter outage (January 2016), the Verizon outage (January 2016), and another AWS outage (September 2015). For a detailed list of outages in recent years, refer to the section “Resources” in Chapter 6.

Examples of such disasters include datacenter power or cooling outages as well as physical disasters such as earthquakes. It also can include incidents, such as construction accidents or explosions that affect the power, cooling, or network connectivity relied upon by the site. Regardless of the cause, the effect is the same: your enterprise can’t serve the website or mobile app. Continuing to serve traffic under failure conditions is obviously an important part of web operations and architecture design. Contingency planning clearly involves capacity management. Disaster Recovery (DR) is only one part of what is termed Business Continuity Planning (BCP), which is the larger logistical plan to ensure continuity of business in the face of different failure event scenarios.

In most cases, the solution is to deploy complete architectures in two (or more) separate physical locations, which means multiplying the infrastructure costs. It also means multiplying the number of nodes the enterprise needs to manage, duplicating all of the data replication, code, and configuration deployment, and multiplying all of the monitoring and measurement applications by the number of datacenters you deploy.

Clearly, DR plans raise both financial and technical concerns. DR and BCP are large topics in and of themselves and are beyond the scope of this book. If this topic is of particular interest to you, there are many books available—for example, Susan Snedakar’s Business Continuity and Disaster Recovery Planning for IT Professionals (O’Reilly)—dedicated specifically to this subject.

Readings

  1. T. Ruotsalo et al. (2015). Interactive Intent Modeling: Information Discovery Beyond Search.

  2. J. C. Corbett et al. (2013). Spanner: Google’s Globally Distributed Database.

  3. A. Gupta et al. (2016). Mesa: A Geo-Replicated Online Data Warehouse for Google’s Advertising System.

  4. D. E. Eisenbud et al. (2016). Maglev: A Fast and Reliable Software Network Load Balancer.

Resources

  1. “Benchmarking Cassandra Scalability on AWS—Over a million writes per second.” (2011) http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html.

  2. “Mobile vs Desktop: 13 Essential User Behaviors.” (2016) http://bit.ly/mobile-vs-desktop-13.

  3. “Keywords Are Dead! Long Live User Intent!” (2013) http://bit.ly/keywords-are-dead.

  4. “Measuring Perceived Performance.” (2016) http://bit.ly/measuring-perceived.

  5. “A Practical Guide to SLAs.” (2016) http://bit.ly/sla-practical-guide.

  6. “The Very Real Performance Impact on Revenue.” (2017) http://blog.catchpoint.com/2017/01/06/performance-impact-revenue-real/.

  7. “Performance Impact of Third Party Components.” (2016) http://blog.catchpoint.com/2016/09/23/third-party-performance-impact/.

  8. “Speed Index.” https://sites.google.com/a/webpagetest.org/docs/using-webpagetest/metrics/speed-index.

  9. “Above the Fold Time: Measuring Web Page Performance Visually.” (2011) http://bit.ly/above-the-fold-time.

  10. “Hero Image Custom Metrics.” (2015) http://bit.ly/hero-image.

  11. “Critical Metric: Critical Resources.” (2016) http://bit.ly/crit-met-crit-res.
