2. Distribute Your Work

When you hear the word distribute you might immediately think of grid computing—the concept of dividing tasks into small chunks of work that can be farmed out to two or more computers, each of which performs a piece of the task necessary for the final answer. If you’re interested in that topic, you should see Chapters 28 and 30 of The Art of Scalability. In this chapter we discuss how you can distribute your data and application services across multiple systems to ensure you have the ability to scale to meet your customers’ demands.

The concept of distributing work can be analogized to painting a picket fence. Let’s say you and your four friends want to play baseball, but you’ve been tasked with painting the fence before you can play. If you have 25 pickets (vertical posts) that you need to paint white and each picket takes 1 minute to paint, you could complete this task in 25 minutes (give or take for cleanup and other miscellaneous tasks). Alternatively, your four buddies could each pick up a paintbrush, instead of lying around asking you to hurry up, and help paint. With five people painting 1 picket each per minute, you can be done and on your way to play baseball in just 5 minutes (25 pickets ÷ 5 painters at 1 picket per minute each). The lesson is that the more you can divide up the work, the greater the throughput (work/time) you can achieve, and the greater your scalability.

This chapter discusses scaling databases and services through cloning and replication, separating functionality or services, and splitting similar data sets across storage and application systems. Utilizing these three approaches, you will be able to scale nearly any system or database to a level that approaches infinite scalability. We use the word approaches here as a bit of a hedge, but in our experience across more than a hundred companies and thousands of databases and systems these techniques have yet to fail. To help visualize these three approaches to scale we employ the AKF Scale Cube, a diagram we developed to represent these methods of scaling systems. Figure 2.1 shows the AKF Scale Cube, which is named after our partnership, AKF Partners.

Figure 2.1. AKF Scale Cube

[Image: a cube showing the X axis (horizontal duplication/cloning), Y axis (split different things), and Z axis (split similar things), running from minimal scale at the lower-left front to near-infinite scale at the upper-right back.]

At the heart of the AKF Scale Cube are three simple axes, each with an associated rule for scalability. The cube is a great way to represent the path from minimal scale (lower-left front of the cube) to near infinite scalability (upper-right back corner of the cube). Sometimes, it’s easier to see these three axes without the confined space of the cube. Figure 2.2 shows these three axes along with their associated rules. We cover each of these three rules in this chapter.

Figure 2.2. Three axes of scale

[Image: the three axes of scale and their associated rules: X axis, clone things (Rule 7); Y axis, split different things (Rule 8); Z axis, split similar things (Rule 9).]

Rule 7—Design to Clone Things (X Axis)

The hardest part of a system to scale is almost always the database or persistent storage tier. The origin of this problem can be traced back to Edgar F. Codd’s 1970 paper “A Relational Model of Data for Large Shared Data Banks,”1 which is credited with introducing the concept of the relational database management system (RDBMS). Today’s most popular RDBMSs, such as Oracle, MySQL, and SQL Server, allow, just as the name implies, for relations between data elements. These relationships can exist within or between tables. The tables of most online transaction processing (OLTP) systems are normalized to third normal form,2 meaning that every record of a table has the same fields, that no nonkey field depends on only part of a composite key, and that every nonkey field depends on the key and nothing but the key. Within a table, each piece of data is related to other pieces of data in that table. Between tables there are often relationships, enforced through foreign keys. Most applications depend on the database to support and enforce these relationships because of its ACID properties (see Table 2.1). Requiring the database to maintain and enforce these relationships makes it difficult to split the database across servers without significant engineering effort.

Table 2.1. ACID Properties of Databases

Property       Description
Atomicity      All of the operations of a transaction complete, or none of them do; there are no partial transactions.
Consistency    Each transaction moves the database from one valid state to another; constraints are never left violated.
Isolation      Concurrently executing transactions do not see one another’s intermediate states.
Durability     Once a transaction is committed, its results survive subsequent failures such as a crash or power loss.
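
To make that coupling concrete, here is a minimal sketch of a normalized two-table schema with a foreign key, using Python’s built-in sqlite3 module; the table and column names are illustrative assumptions, not taken from the text.

    import sqlite3

    # A minimal normalized schema with an enforced foreign key relationship.
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("""
        CREATE TABLE customers (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL
        )""")
    conn.execute("""
        CREATE TABLE bookings (
            booking_id  INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
            item        TEXT NOT NULL
        )""")
    # The database now refuses any booking that references a nonexistent customer.

Because the database enforces the reference from bookings to customers, the two tables cannot simply be moved to different servers later; that enforced relationship is what makes splits expensive.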

One technique for scaling databases is to take advantage of the fact that most applications and databases perform significantly more reads than writes. A client of ours that handles booking reservations for customers sees on average 400 searches for every single booking. Each booking is a write and each search is a read, resulting in a 400:1 read-to-write ratio. This type of system can easily be scaled by creating read-only copies (replicas) of the data.

There are a couple ways that you can distribute the read-copy of your data depending on the time sensitivity of the data. Time sensitivity is how fresh or completely correct the read-copy has to be relative to the write copy. Before you scream out that the data has to be instantly, real time, in sync, and completely correct across the entire system, take a breath and appreciate the costs of such a system. While perfectly in sync data is ideal, it costs...a lot. Furthermore, it doesn’t always give you the return that you might expect or desire for that cost.

Let’s go back to our client with the reservation system that has 400 reads for every write. They’re handling reservations for customers, so you might think the data they display would have to be completely in sync. First, you’d be keeping 400 sets of data in sync for the one piece of data the customer actually wants to reserve. Second, just because the read copy is out of sync with the primary transactional database by 3 or 30 or 90 seconds doesn’t mean that it is wrong, only that there is a chance it is wrong. This client has roughly 100,000 pieces of inventory in its system at any one time and books about 10% of them each day. If those 10,000 bookings are evenly distributed across the day, one reservation is made roughly every 8.6 seconds (86,400 seconds / 10,000 bookings). All things being equal, the chance that a customer requests a particular item that has already been taken within the replication window (assuming a 90-second sync of data) is only about 0.01%: roughly 10 bookings occur in any 90-second window, spread across 100,000 items. Of course, even at 0.01% some customers will occasionally select a booking that is already taken, which is not ideal but can be handled in the application by doing a final check against the primary before placing the booking in the cart. Every application’s data needs are different, but this discussion should give you a sense of how to push back on the idea that all data has to be kept in sync in real time.
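
A quick back-of-the-envelope check of that staleness risk, using the same assumed inputs (100,000 items, 10% booked per day, a 90-second replication delay); the numbers are illustrative, not measurements from the client’s system.

    # Staleness risk: chance that the item a customer wants was booked
    # within the replication delay window. Inputs are the assumed figures
    # from the example above, not measured values.
    items_in_system = 100_000
    bookings_per_day = 0.10 * items_in_system                  # 10,000 bookings/day
    seconds_between_bookings = 86_400 / bookings_per_day       # ~8.6 seconds
    bookings_per_sync_window = 90 / seconds_between_bookings   # ~10.4 bookings in 90 s
    chance_already_taken = bookings_per_sync_window / items_in_system
    print(f"{chance_already_taken:.4%}")                       # ~0.0104%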

Now that we’ve covered time sensitivity, let’s discuss the ways to distribute the data. One way is to put a caching tier in front of the database. The application reads from an object cache instead of going back to the database for every query; only when the cached data has expired does the application query the primary transactional database and refresh the cache. Given the availability of numerous excellent open-source key-value stores that can be used as object caches, this is a highly recommended first step.
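
A minimal cache-aside sketch of that read path follows; the class name, TTL, and loader function are illustrative assumptions rather than a prescribed implementation, and any key-value store (memcached, Redis, and the like) would play the role of the in-process dictionary used here.

    import time

    class ObjectCache:
        """Cache-aside: read from the cache, fall back to the database on a miss."""
        def __init__(self, ttl_seconds=90):
            self._store = {}            # key -> (value, expires_at)
            self._ttl = ttl_seconds

        def get(self, key, load_from_db):
            entry = self._store.get(key)
            if entry and entry[1] > time.time():
                return entry[0]                      # cache hit: no database read
            value = load_from_db(key)                # miss or expired: query the primary
            self._store[key] = (value, time.time() + self._ttl)
            return value

The time-to-live is how the time sensitivity discussed earlier gets tuned: a longer TTL means fewer trips to the database but staler data.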

The next step beyond an object cache between the application tier and the database tier is replicating the database itself. Most major relational database systems allow for some type of replication “out of the box.” MySQL, for example, implements replication through the master-slave concept: the master database is the primary transactional database that receives writes, and the slave databases are read-only copies of the master. The master keeps track of updates, inserts, deletes, and so on in a binary log; each slave requests the binary log from the master and replays those commands against its own copy of the data. While this replication is asynchronous, the latency between data being updated on the master and appearing on the slaves can be very low. Often this implementation consists of several slave databases, or read replicas, configured behind a load balancer. The application makes a read request to the load balancer, which passes the request to a read replica in either a round-robin or least-connections manner.
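
A sketch of how the application tier might route statements under this setup, assuming hypothetical connection objects for the master and its replicas; in many deployments the load balancer or the database driver performs this routing instead.

    import itertools

    class ReplicatedDatabase:
        """Send writes to the master; spread reads across read replicas."""
        def __init__(self, master, replicas):
            self._master = master                       # primary transactional database
            self._replicas = itertools.cycle(replicas)  # round-robin over read replicas

        def write(self, sql, params=()):
            return self._master.execute(sql, params)

        def read(self, sql, params=()):
            # Reads may lag the master by the replication delay; that lag is
            # the "time sensitivity" trade-off discussed earlier.
            return next(self._replicas).execute(sql, params)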

We call this type of split an X axis split; it is represented on the AKF Scale Cube in Figure 2.1 as the X axis, Horizontal Duplication. An example that many developers familiar with hosting Web applications will recognize is the Web or application tier of a system, where multiple servers run behind a load balancer, all with the same code. A request comes in to the load balancer, which distributes it to any one of the many Web or application servers to fulfill. The great thing about this distributed model on the application tier is that you can put dozens, hundreds, or even thousands of servers behind load balancers, all running the same code and handling similar requests.

The X axis can be applied to more than just the database. Web servers and application servers can typically be cloned easily, and this cloning allows transactions to be distributed evenly across systems for horizontal scale. Cloning application or Web services tends to be relatively easy to perform and lets us scale the number of transactions processed. Unfortunately, it doesn’t help us scale the data we must manipulate to perform those transactions. In-memory caching of data unique to many customers, or unique to disparate functions, can create a bottleneck that keeps us from scaling these services without significant impact to customer response time. To solve these memory constraints, we’ll look to the Y and Z axes of our scale cube.

Rule 8—Design to Split Different Things (Y Axis)

When you put aside the religious debate between service-oriented (SOA) and resource-oriented (ROA) architectures and look at their underlying premises, they have at least one thing in common: both force architects and engineers to think in terms of separation of responsibilities within their architectures. At a high level, they do this through the concepts of verbs (services) and nouns (resources). Rule 8, our second axis of scale, takes the same approach. Put simply, Rule 8 is about scaling through the separation of distinct and different functions and data within your site. The simple approach to Rule 8 tells us to split up our product by nouns, by verbs, or by a combination of the two.

Let’s split up our site using the verb approach first. If our site is a relatively simple ecommerce site, we might break it into the necessary verbs of signup, login, search, browse, view, add-to-cart, and purchase/buy. The data necessary to perform any one of these transactions can vary significantly from the data necessary for the others. For instance, while signup and login might be argued to need the same data, each has some data that is unique and distinct. Signup probably needs to be capable of checking whether a user’s preferred ID has already been chosen by someone else, while login does not need a complete understanding of every other user’s ID. Signup likely needs to write a fair amount of data to some permanent data store, while login is likely a read-intensive operation that validates a user’s credentials. Signup may require storing a fair amount of personally identifiable information, including credit card numbers, while login likely does not need access to all of that information at the moment a user authenticates.
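
As a sketch of what a verb-oriented split might look like at the routing layer, consider mapping each function to its own pool of servers; the host names and pool sizes below are hypothetical examples, not a prescribed layout.

    # Each verb (function) gets its own pool of servers and, behind it, its own
    # data store sized and tuned for that function. Host names are hypothetical.
    SERVICE_POOLS = {
        "signup": ["signup1.example.com", "signup2.example.com"],
        "login":  ["login1.example.com", "login2.example.com"],
        "search": ["search1.example.com", "search2.example.com"],
        "buy":    ["checkout1.example.com", "checkout2.example.com"],
    }

    def route(verb, request_id):
        """Pick a host from the pool dedicated to this function."""
        pool = SERVICE_POOLS[verb]
        return pool[request_id % len(pool)]   # X axis cloning within each Y axis split

Note that within each pool we remain free to clone servers (Rule 7); the X axis and Y axis splits compose.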

The differences and resulting opportunities for this method of scale become even more apparent when we analyze obviously distinct functions, as is the case with search and login. In the case of login we are mostly concerned with validating the user’s credentials and potentially establishing some notion of session (we’ve chosen the word session rather than state for a reason we explore in Rule 40). Login is concerned with the user and as a result needs to cache and interact with data about that user. Search, on the other hand, is concerned with the hunt for an item and cares most about user intent (expressed as a search string, query, or search terms typically typed into a search box) and the items we have in stock within our catalog. Separating these sets of data allows us to cache more of each within the memory available on our systems and to process transactions faster as a result of higher cache hit ratios. Separating this data within our back-end persistence systems (such as a database) allows us to dedicate more “in memory” space within those systems and respond faster to the clients (application servers) making requests. Both systems respond faster as a result of better utilization of system resources. Clearly we can now scale these systems more easily and with fewer memory constraints. Moreover, the Y axis adds transaction scalability by splitting up transactions in the same fashion as Rule 7, the X axis of scale.

Hold on! What if we want to merge information about the user and our products such as in the case of recommending products? Note that we have just added another verb—recommend. This gives us another opportunity to perform a split of our data and our transactions. We might add a recommendation service that asynchronously evaluates past user purchase behavior with users of similar purchase behaviors. This in turn may populate data in either the login function or the search function for display to the user when he or she interacts with the system. Or it can be a separate synchronous call made from the user’s browser to be displayed in an area dedicated to the result of the recommend call.

Now how about using nouns to split things? Again using our ecommerce example, we might identify certain resources upon which we will ultimately take actions (rather than the verbs that represent the actions themselves). We may decide that our ecommerce site is made up of a product catalog, product inventory, user account information, marketing information, and so on. Using the noun approach, we split up our data by these categories and then define a set of high-level primitives, such as create, read, update, and delete actions, against each resource.
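
Sketched in code, the noun-oriented split might look like the following; the class and resource names are illustrative assumptions, and in a real system each instance would be a separately deployed service with its own data store.

    class ResourceService:
        """One instance per resource (noun), each owning its own data store."""
        def __init__(self, name):
            self.name = name
            self._rows = {}          # stand-in for the service's dedicated store

        # High-level primitives exposed by every resource service.
        def create(self, key, value):
            self._rows[key] = value

        def read(self, key):
            return self._rows.get(key)

        def update(self, key, value):
            self._rows[key] = value

        def delete(self, key):
            self._rows.pop(key, None)

    catalog   = ResourceService("product_catalog")
    inventory = ResourceService("product_inventory")
    accounts  = ResourceService("user_accounts")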

While Y axis splits are most useful in scaling data sets, they are also useful in scaling code bases. Because services or resources are now split, the actions we perform and the code necessary to perform them are split up as well. This means that very large engineering teams developing complex systems can become experts in subsets of those systems and don’t need to worry about or become experts on every other part of the system. And of course because we have split up our services, we can also scale transactions fairly easily.

Rule 9—Design to Split Similar Things (Z Axis)

Often referred to as sharding or podding, Rule 9 is about taking one data set or service and partitioning it into several pieces. These pieces are often of equal size but may differ in size when there is value in having unequal shards. One reason to have unequal shards is to enable application rollouts that limit risk: a new release affects a small customer segment first, and then increasingly large segments as you become confident that you have identified and resolved the major problems.

Sharding is often accomplished by keying on something we know about the requester or customer. Let’s say that we are a software-as-a-service provider of timecard and attendance tracking. We track time and attendance for the employees of each of our clients, many of which are enterprise-class customers with more than 1,000 employees. We might determine that we can easily partition, or shard, our solution by company, meaning that each company could have its own dedicated Web, application, and database servers. Because we also want to leverage the cost efficiencies of multitenancy, we want multiple small companies to coexist within a single shard. Really big companies with many employees might get dedicated hardware, while smaller companies cohabit shards with other small companies. We have leveraged the relationship between employees and companies to create scalable partitions that let us employ smaller, cost-effective hardware and scale horizontally (we discuss horizontal scale further in Rule 10).
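
A sketch of how that customer-keyed shard assignment might look; the shard count, employee threshold, and naming are assumptions made for illustration.

    SHARED_SHARD_COUNT = 8          # multitenant shards for smaller companies (assumed)
    DEDICATED_THRESHOLD = 10_000    # employee count above which a company gets its own shard (assumed)

    def shard_for_company(company_id: int, employee_count: int) -> str:
        """Map a company to a shard: big tenants get dedicated shards,
        smaller tenants share a fixed pool of multitenant shards."""
        if employee_count >= DEDICATED_THRESHOLD:
            return f"dedicated-{company_id}"
        return f"shared-{company_id % SHARED_SHARD_COUNT}"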

Maybe we are a provider of advertising services for mobile phones. In this case, we very likely know something about the end user’s device and carrier. Both of these create compelling characteristics by which we can partition our data. If we are an ecommerce player, we might split users by their geography to make more efficient use of our available inventory in distribution centers. Or maybe we create partitions of data that allow us to evenly distribute users based on the recency, frequency, and monetization of their purchases. Or, if all else fails, maybe we just use some modulus or hash of a user identification (userid) number that we’ve assigned the user at signup.
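
The modulus or hash approach in that last sentence is the simplest to sketch; the shard count and hashing choice below are assumptions, and the essential property is only that the same userid always lands on the same shard.

    import hashlib

    NUM_SHARDS = 16   # assumed shard count

    def shard_for_user(user_id: int) -> int:
        """Deterministically map a userid to one of NUM_SHARDS shards."""
        digest = hashlib.md5(str(user_id).encode()).hexdigest()
        return int(digest, 16) % NUM_SHARDS

One caveat: changing NUM_SHARDS remaps most users, so the shard count (or a consistent-hashing scheme) should be chosen with growth in mind.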

Why would we ever decide to partition similar things? For hypergrowth companies, the answer is easy. The speed with which we can answer any request is at least partially determined by the cache hit ratio of near and distant caches. That speed in turn determines how many transactions we can process on any given system, which in turn determines how many systems we need to handle a given number of requests. In the extreme case, without partitioning, our transactions might become agonizingly slow as we traverse huge amounts of monolithic data to produce a single answer for a single user. Where speed is paramount and the data needed to answer any request is large, designing to split different things (Rule 8) and similar things (Rule 9) becomes a necessity.

Splitting similar things obviously isn’t limited to customers, but customers are the most common and easiest basis for Rule 9 splits within our consulting practice. Sometimes we recommend splitting product catalogs, for instance, but when we split diverse catalogs into categories such as lawn chairs and diapers, we usually classify those as splits of different things. We’ve also helped clients shard their systems along a modulus or hash of a transaction ID. In these cases we really don’t know anything about the requester, but we do have a monotonically increasing number upon which we can act. These types of splits work well for systems that log transactions for future reference, such as a system designed to retain errors for later evaluation.

Summary

We maintain that three simple rules can help you scale nearly everything. There are undoubtedly more ways to scale systems and platforms, but armed with these three rules, few if any scale-related problems will stand in your way:

Scale by cloning— Cloning or duplicating data and services allows you to scale transactions easily.

Scale by splitting different things— Use nouns or verbs to identify data and services to separate. If done properly, both transactions and data sets can be scaled efficiently.

Scale by splitting similar things— Typically these are customer data sets. Set customers up into unique and separated shards or swimlanes (see Chapter 9, “Design for Fault Tolerance and Graceful Failure,” for swimlane definition) to enable transaction and data scaling.

Endnotes

1 Edgar F. Codd, “A Relational Model of Data for Large Shared Data Banks,” 1970, www.eecs.umich.edu/~klefevre/eecs584/Papers/codd_1970.pdf.

2 Wikipedia, “Third normal form,” http://en.wikipedia.org/wiki/Third_normal_form.
