Introduction to the hyper-scale cloud infrastructure

When deploying systems or stacks to the cloud, it is important to understand the scale at which the leading cloud providers operate. The three largest cloud providers have created a footprint of data centers spanning almost every geography. They have circled the globe with high-bandwidth fiber network trunks to provide low-latency, high-throughput connectivity to systems running across their global data center deployments. The scale at which these three top-tier cloud providers operate is so much larger than that of the other players that the industry has adopted a new designation: hypercloud. The following diagram depicts the global footprint of AWS, the largest cloud provider by total compute power (as estimated by Gartner):

Figure 5.1: https://aws.amazon.com/about-aws/global-infrastructure/

In this image, each orange dot represents an autonomous region, and the number within each dot indicates the number of availability zones (AZs) in that region. Green circles represent regions that have been announced but are not yet generally available.

In addition to the fleets of data centers they build and manage, hypercloud providers have deployed high-bandwidth transoceanic and transcontinental fiber networks to effectively manage traffic between their data center clusters. This allows customers of the respective platforms to create distributed applications across multiple geographies with very low latency and cost.

Now let's have a look at an image that captures AWS's global data center and network trunk infrastructure (AWS shared this global schematic publicly for the first time at its annual developer conference, re:Invent, in November 2016, in a presentation by AWS VP and Distinguished Engineer James Hamilton). The image is as follows:

Figure 5.2

Google Cloud Platform (GCP) and Microsoft Azure provide similar geographic coverage. All three hypercloud platforms provide core cloud services from an infrastructure abstraction called a region. A region is a collection of data centers that operate together under strict latency and performance requirements. This in turn allows consumers of the cloud platform to disperse their workloads across the multiple data centers comprising the region. AWS calls these data centers availability zones (AZs), while GCP calls them zones.
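Each provider exposes the region and zone abstraction directly through its APIs. As a minimal sketch (assuming the AWS SDK for Python, boto3, and an arbitrary choice of the us-east-1 region), the zones that make up a region can be enumerated like this:

import boto3

# Hypothetical region choice; credentials are assumed to be configured.
ec2 = boto3.client("ec2", region_name="us-east-1")

# List the availability zones currently available in this region.
response = ec2.describe_availability_zones(
    Filters=[{"Name": "state", "Values": ["available"]}]
)
for zone in response["AvailabilityZones"]:
    print(zone["ZoneName"], zone["ZoneId"])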

Cloud Native Architecture Best Practice: Disperse workloads across multiple zones within a region to make the stack highly available and more durable against hardware or application component failure. This often costs nothing extra to implement, but gives stacks a distinct operational advantage by dispersing compute nodes across multiple isolated data centers.
The following diagram shows three different AZs within one region:
Figure 5.3

As shown in the previous diagram, dispersing compute workloads across multiple AZs within a region reduces the blast radius in the event of a service interruption (whether it's an application or machine failure). This usually adds no extra cost, as the AZs within a region are purpose-built to be highly performant and tightly interconnected with one another.

A key concept to understand and minimize when designing cloud-native systems is the blast radius. We define the blast radius as the applications or ancillary systems that can be affected by the failure of a core design component. This concept can be applied to a data center or a single microservice. The purpose is to understand the dependencies within your own architecture, evaluate your appetite for risk, quantify the cost implications of minimizing your blast radius, and design your stack within these parameters.

Central to minimizing your blast radius in the cloud is effectively utilizing the distributed nature of regions and zones. The major providers offer a series of services that help architects do this effectively: load balancers and auto-scaling groups.

Load balancers are not unique to the cloud, but the major platforms all have native services that provide this functionality. These native services offer much higher levels of availability, as they run across many machines rather than as a virtual load balancer on a single VM operated by the cloud consumer.

Cloud Native Architecture Best Practice: Use the native, cloud-offered load balancers whenever possible. These shift operational maintenance from the cloud consumer to the cloud provider. The cloud user no longer has to worry about maintaining the uptime and health of the load balancer, as that is managed by the CSP. The load balancer runs across a series of containers or a fleet of VMs, meaning that hardware or machine failures are dealt with seamlessly in the background without impact to the user.
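To make this concrete, here is a minimal sketch (again assuming boto3) of provisioning a managed Application Load Balancer whose subnets span two availability zones; the subnet and security group IDs are hypothetical placeholders:

import boto3

elbv2 = boto3.client("elbv2", region_name="us-east-1")

response = elbv2.create_load_balancer(
    Name="web-alb",
    # One subnet per AZ: the provider runs the balancer's nodes in both
    # zones, so a single zone failure does not take the balancer down.
    Subnets=["subnet-aaaa1111", "subnet-bbbb2222"],
    SecurityGroups=["sg-0123456789abcdef0"],
    Scheme="internet-facing",
    Type="application",
)
print(response["LoadBalancers"][0]["DNSName"])

The provider keeps the balancer's underlying nodes healthy and scales them with traffic; the consumer only sees the stable DNS name printed at the end.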

The concept of a load balancer should be familiar to anyone who works in the IT industry. Building on load balancers, the cloud platforms offer services that couple DNS with load balancing. This combination allows users to build globally available applications that extend seamlessly to the geographies of their choosing. Services such as Amazon Route 53 on the AWS platform allow users to engineer latency-based routing rules and geo-restrictions to connect end consumers to the most performant stack (or to preclude availability based on the end users' location). An example of this would be blocking access for users based in Iran, Russia, or North Korea in order to comply with current sanctions laws.

Cloud Native Architecture Best Practice: Use cloud-native Domain Name System (DNS) services (Amazon Route 53, Azure DNS, or GCP Cloud DNS). These services integrate natively with the load balancing services on each platform. Use routing policies such as latency-based routing to build globally available, performant applications that run across multiple regions. Use features such as Geo DNS to route requests from specific geographies (or countries) to specific endpoints.
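As an illustration of latency-based routing, the following sketch (boto3 assumed; the hosted zone ID, domain name, and load balancer endpoints are hypothetical) creates two Route 53 records with the same name, one per region, so that each query is answered with the lowest-latency endpoint:

import boto3

route53 = boto3.client("route53")

# One record per regional stack; Route 53 answers with the lowest-latency one.
endpoints = [
    ("us-east-1", "use1-alb.example.com"),
    ("eu-west-1", "euw1-alb.example.com"),
]

for region, target in endpoints:
    route53.change_resource_record_sets(
        HostedZoneId="Z0000000000EXAMPLE",
        ChangeBatch={
            "Changes": [{
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": "app.example.com",
                    "Type": "CNAME",
                    "TTL": 60,
                    "SetIdentifier": f"app-{region}",  # distinguishes the latency records
                    "Region": region,                   # enables latency-based routing
                    "ResourceRecords": [{"Value": target}],
                },
            }]
        },
    )

A geolocation policy works the same way, with a GeoLocation block in place of the Region field, which is how country-level geo-restriction can be expressed.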

Another important tool in the cloud-native toolbox is the deployment of auto scaling groups (ASGs), a novel service abstraction that allows users to replicate and expand application VMs dynamically based on various alarms or flags. In order to deploy an ASG, a standardized image of the application must first be pre-configured and stored. ASGs must almost always be coupled with load balancers to be used effectively, since traffic from the application consumer must be intelligently routed to an available and performant compute node in the ASG fleet. This can be done in several different ways, including round robin balancing or deploying a queueing system. A basic auto-scaling configuration across two AZs is shown in the following diagram:

Figure 5.4

The ASG in the previous diagram is set to a minimum of two VMs, one in each availability zone (Web Servers #1 and Web Servers #2). Traffic to these nodes is depicted in black. As more users access the application, the ASG reacts by deploying more servers to the ASG across the multiple AZs (Web Servers #3 and Web Servers #4), with the additional traffic depicted in gray.
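A configuration along the lines of Figure 5.4 could be expressed with the AWS APIs roughly as follows (a sketch assuming boto3; the launch template name, subnet IDs, and target group ARN are hypothetical placeholders):

import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

# Build the group from a pre-baked launch template (the stored, standardized
# image of the application mentioned above) and spread it across two AZs.
autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="web-asg",
    LaunchTemplate={"LaunchTemplateName": "web-server-template", "Version": "$Latest"},
    MinSize=2,                      # one VM per AZ, as in the diagram
    MaxSize=8,                      # headroom to scale out under load
    DesiredCapacity=2,
    VPCZoneIdentifier="subnet-aaaa1111,subnet-bbbb2222",  # one subnet per AZ
    TargetGroupARNs=[
        "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/web/abc123"
    ],
    HealthCheckType="ELB",          # replace instances the load balancer marks unhealthy
)

# Scale out when average CPU across the fleet exceeds 60 percent.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-asg",
    PolicyName="cpu-target-tracking",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {"PredefinedMetricType": "ASGAverageCPUUtilization"},
        "TargetValue": 60.0,
    },
)

Because the minimum size is two and the subnets sit in different AZs, the group keeps at least one healthy node in each zone, mirroring the black traffic paths in the diagram; the scaling policy produces the additional gray-traffic servers as demand grows.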

Now we have introduced the critical elements in a resilient cloud system: load balancing, auto-scaling, multi-AZ/region deployments, and global DNS. The auto-scaling feature of this stack can be replicated using elastically expanding/contracting fleets of containers as well.

Cloud Native Architecture Best Practice: When architecting a stack, it is more cost-effective to use smaller, more numerous VMs behind a load balancer. This gives you greater cost granularity and increases redundancy in the overall system. Creating a stateless architecture is also recommended, as it removes dependencies on a single application VM, making session recovery in the event of a failure far simpler.

Let's have a look at a comparison of auto-scaling groups with different VM sizes:

Figure 5.5

There is an advantage to creating auto-scaling groups with smaller VMs. The blue line in the preceding graph represents the demand or traffic for a given application, and each block represents one virtual machine. The VMs in the auto-scaling group deployment on the left have more memory and computing power and a higher hourly cost; the VMs in the deployment on the right have less memory and computing power and a lower hourly cost.

In the preceding graph, we have demonstrated the advantage of using smaller compute nodes in an ASG fleet. Any spaces within a block that appear above the blue line are wasted resources that are paid for. This means that the performance of the system is not ideally tuned to the application's demand, leading to idle resources. By minimizing the VM specs, significant cost savings can be achieved. Furthermore, by reducing the size of the VMs, the application stack becomes more distributed in nature. The failure of one VM or container will not jeopardize the entire health of the stack since there are more VMs in the group to fail over to.
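The effect can be approximated with a back-of-the-envelope calculation. In the sketch below, the hourly demand figures, instance sizes, and prices are purely hypothetical, and the price per unit of capacity is identical for both VM sizes; the difference comes entirely from how much unused headroom each fleet bills for:

import math

demand = [3, 5, 9, 14, 11, 6]   # hypothetical capacity units needed each hour

def fleet_cost(units_per_vm, price_per_vm):
    # Provision whole VMs each hour to cover demand; unused headroom is still billed.
    return sum(math.ceil(d / units_per_vm) * price_per_vm for d in demand)

large = fleet_cost(units_per_vm=8, price_per_vm=0.80)   # few large VMs
small = fleet_cost(units_per_vm=2, price_per_vm=0.20)   # many small VMs

print(f"large VMs: ${large:.2f}  small VMs: ${small:.2f}")
# -> large VMs: $7.20  small VMs: $5.20

With these made-up numbers, the small-VM fleet tracks the demand curve more closely and wastes far less of the capacity above the blue line, which is where the savings in Figure 5.5 come from.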

Utilizing load balancer services with auto scaling groups, dynamic routing, and stateless architectures backed by highly available, performant database services, all dispersed across multiple zones (or groups of data centers), represents a mature, cloud-native architecture.
