Chapter 3
Designing Solutions to Meet Technical Requirements

The Google Cloud Professional Architect exam will test your ability to understand technical requirements that are explicitly stated, as well as implied, in case studies and questions. Technical requirements may specify a particular hardware or software constraint. For example, an application may need to use a MySQL 8.0 database or be able to transmit 1 GB of data between an on-premises data center and the Google Cloud Platform. Technical requirements do not necessarily specify all details that you will need to know. If a question states that a virtual private cloud will have three subnets, then you will have to infer from that statement that the subnets will need to be configured with distinct, nonoverlapping address spaces. It is common for questions about technical requirements to require you to choose among multiple solutions and to understand some unstated implication of the requirement so that you can make a choice among possible solutions.
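The implied requirement in the subnet example can be checked mechanically. The following is a minimal sketch using Python's standard ipaddress module; the CIDR ranges and the helper name overlapping_subnets are hypothetical and chosen only for illustration.

```python
import ipaddress
from itertools import combinations


def overlapping_subnets(cidrs):
    """Return every pair of CIDR ranges in the plan that overlap."""
    networks = [ipaddress.ip_network(c) for c in cidrs]
    return [
        (str(a), str(b))
        for a, b in combinations(networks, 2)
        if a.overlaps(b)
    ]


# Hypothetical address plans for a VPC with three subnets.
good_plan = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
bad_plan = ["10.0.0.0/16", "10.0.2.0/24", "10.1.0.0/24"]

print(overlapping_subnets(good_plan))
print(overlapping_subnets(bad_plan))
```

An empty result means the plan satisfies the unstated requirement that the address spaces are distinct and nonoverlapping.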

In this chapter, we will consider three broad categories of technical requirements.

High availability
Scalability
Reliability

We will use the case studies as jumping-off points for discussing these kinds of requirements. We will consider how each of these factors influences the choices we make about compute, storage, networking, and specialized services.

The most important piece of information to take away from this chapter is that availability, scalability, and reliability are not just important at the component or subsystem level but across the entire application infrastructure. Highly reliable storage systems will not confer high reliability on a system if the networking or compute services are not reliable.

High Availability

High availability is the continuous operation of a system at sufficient capacity to meet the demands of ongoing workloads. Availability is usually measured as the percentage of time that a system is available and responding to requests with latency not exceeding a specified threshold. Table 3.1 shows the amount of allowed downtime at various service-level agreement (SLA) levels. An application with a 99 percent availability SLA can be down for 14.4 minutes per day, while a system with a 99.999 percent availability SLA can be down for less than one second per day without violating the SLA.

TABLE 3.1 Example availability SLAs and corresponding downtimes

Percent Uptime	Downtime/Day	Downtime/Week	Downtime/Month
99.00	14.4 minutes	1.68 hours	7.31 hours
99.90	1.44 minutes	10.08 minutes	43.83 minutes
99.99	8.64 seconds	1.01 minutes	4.38 minutes
99.999	864 milliseconds	6.05 seconds	26.3 seconds
99.9999	86.4 milliseconds	604.8 milliseconds	2.63 seconds
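
The values in Table 3.1 follow directly from the uptime percentage. As a minimal sketch, assuming an average month of 365.25 / 12 days (an assumption that matches the monthly column), the allowed downtime can be computed as follows; the function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_WEEK = 7 * SECONDS_PER_DAY
# Assumption: an average month is 365.25 / 12 days long.
SECONDS_PER_MONTH = 365.25 / 12 * SECONDS_PER_DAY


def allowed_downtime_seconds(uptime_percent, period_seconds):
    """Seconds of downtime permitted in a period at a given uptime percentage."""
    return (100 - uptime_percent) / 100 * period_seconds


for sla in (99.0, 99.9, 99.99, 99.999, 99.9999):
    per_day = allowed_downtime_seconds(sla, SECONDS_PER_DAY)
    per_week = allowed_downtime_seconds(sla, SECONDS_PER_WEEK)
    per_month = allowed_downtime_seconds(sla, SECONDS_PER_MONTH)
    print(f"{sla}%: {per_day:.4f} s/day, {per_week:.4f} s/week, {per_month:.4f} s/month")
```

For example, 99 percent uptime allows 864 seconds per day, which is the 14.4 minutes shown in the first row.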

High availability SLAs, such as these, must account for the fact that hardware and software fail. Individual physical components, such as a disk drive running in a particular disk array, may have a small probability of failing in a one-month period. If you are using thousands of drives, then it is much more likely that at least one of them will fail.
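
To see how quickly the odds change with scale, here is a minimal sketch. It assumes drives fail independently and uses a hypothetical monthly failure probability of 0.1 percent per drive; neither assumption comes from Google documentation.

```python
def prob_at_least_one_failure(p_single, n_drives):
    """Probability that at least one of n independent drives fails.

    Assumes failures are independent, which is a simplification.
    """
    return 1 - (1 - p_single) ** n_drives


# Hypothetical monthly failure probability for a single drive.
P_MONTHLY = 0.001

for n in (1, 100, 1000, 5000):
    print(n, round(prob_at_least_one_failure(P_MONTHLY, n), 4))
```

Under these assumptions, a fleet of 1,000 drives has roughly a 63 percent chance of at least one failure in a month, even though each individual drive is very unlikely to fail.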

When designing high availability applications, you have to plan for failures. Failures can occur at multiple points in an application stack:

An application bug
A service that the application depends on is down
A database disk drive fills up
A network interface card fails
A router is down
A network engineer misconfigures a firewall rule

We can mitigate the risk of hardware failures, in part, with redundancy. Instead of writing data to one disk, we write it to three disks. Rather than have a single server running an application, we create instance groups with multiple servers and load balance workload among them. We install two direct network connections between our data center and GCP, preferably with two different telecommunication vendors. Redundancy is also a key element of ensuring scalability, but maintaining availability additionally requires autohealing or other automated repair mechanisms.
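
The effect of redundancy, and the earlier point that availability matters across the whole stack, can both be illustrated with simple probability. This is a sketch under the assumption that component failures are independent; the availability figures are hypothetical.

```python
from math import prod


def series_availability(component_availabilities):
    """Availability of a system that needs every component to be up."""
    return prod(component_availabilities)


def redundant_availability(single_availability, copies):
    """Availability when any one of several identical copies is sufficient."""
    return 1 - (1 - single_availability) ** copies


# Hypothetical figures: compute, storage, and network each at 99.9 percent.
print(series_availability([0.999, 0.999, 0.999]))
# Two redundant servers, each available 99 percent of the time.
print(redundant_availability(0.99, 2))
```

Chaining components lowers overall availability below that of any single component, while adding redundant copies raises it.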

We compensate for software and configuration errors with software engineering and DevOps best practices. Code reviews, multiple levels of testing, and running new code in a staging environment can help identify bugs before code is released to production. Canary deployments, in which a small portion of a system's workload is routed to a new version of the software, allow us to test code under production conditions without exposing all users to new code. If there is a problem with the new version of software, it will affect only a portion of the users before it is rolled back. Automating infrastructure deployments, by treating infrastructure as code, reduces the need for manual procedures and the chance to make a mistake when entering commands.
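
As an illustration of canary routing, here is a minimal sketch that sends a fixed percentage of users to a new version. The hashing scheme and the function name routes_to_canary are assumptions made for this example; in practice, traffic splitting is usually configured in a load balancer or service mesh rather than written by hand.

```python
import hashlib


def routes_to_canary(user_id: str, canary_percent: int) -> bool:
    """Deterministically assign a user to the canary version.

    Hashing the user ID keeps each user on the same version across requests.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # bucket in 0..99
    return bucket < canary_percent


# Route roughly 5 percent of hypothetical users to the new version.
canary_users = sum(routes_to_canary(f"user-{i}", 5) for i in range(10_000))
print(canary_users)
```

Rolling back amounts to setting the percentage to zero, at which point no user is routed to the new version.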

As you design systems with an eye for high availability, keep in mind the role of redundancy and best practices for software development and DevOps.

Compute Availability

The GCP offers several compute services. We'll consider availability in four of these services.

Compute Engine
Kubernetes Engine
App Engine
Cloud Functions

Each of these services can provide high availability compute resources, but they vary in the amount of effort required to achieve high availability.

High Availability in Compute Engine

High availability in Compute Engine is ensured by several different mechanisms and practices.

Hardware Redundancy and Live Migration

At the physical hardware level, the large number of physical servers in GCP provides redundancy for hardware failures. If a physical server fails, others are available to replace it.

Google also provides live migration, which moves VMs to other physical servers when there is a problem with a physical server or scheduled maintenance occurs. Live migration is also used when network or power systems are down, security patches need to be applied, or configurations need to be modified. Live migration is not available for preemptible VMs, but preemptible VMs are not designed to be highly available. At the time of this writing, VMs with GPUs attached cannot be live migrated. Constraints on live migration may change in the future. The descriptions of Google services here are illustrative and designed to help you learn how to reason about GCP services so you can answer exam questions. For up-to-date details on services, always consult Google Cloud documentation.

Managed Instance Groups

High availability also comes from the use of redundant VMs. Managed instance groups are the best way to create a cluster of VMs, all running the same services in the same configuration. A managed instance group uses an instance template to specify the configuration of each VM in the group. Instance templates specify machine type, boot disk image, and other VM configuration details. If a VM in the instance group fails, another one will be created using the instance template.

Managed instance groups (MIGs) provide other features that help improve availability. A VM may be operating correctly, but the application running on the VM may not be functioning as expected. Instance groups can detect this using an application-specific health check. If a VM instance fails the health check, the managed instance group will kill the failing instance and create a new instance. This feature is known as autohealing.
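
The autohealing behavior can be pictured as a simple reconciliation loop. The following is a toy simulation and not the Compute Engine API; the Instance class, create_from_template, and autoheal are hypothetical names used only to show the idea.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Instance:
    name: str
    healthy: bool = True  # result of the application-specific health check


_instance_ids = count(1)


def create_from_template(template: str) -> Instance:
    """Stand-in for creating a VM from an instance template."""
    return Instance(name=f"{template}-{next(_instance_ids)}")


def autoheal(group, template):
    """Replace every instance that fails its health check.

    Returns the names of the instances that were replaced.
    """
    replaced = []
    for index, instance in enumerate(group):
        if not instance.healthy:
            replaced.append(instance.name)
            group[index] = create_from_template(template)
    return replaced


group = [create_from_template("web") for _ in range(3)]
group[1].healthy = False  # the application on this VM stops responding
print(autoheal(group, "web"))
```

The group keeps the same size throughout: a failing instance is removed and a fresh one built from the template takes its place.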

Managed instance groups use load balancing to distribute workload across instances. If an instance is not available, traffic will be routed to other servers in the instance group. Instance groups can be configured as regional instance groups. This distributes instances across multiple zones. If there is a failure in a zone, the application can continue to run in the other zones.

Multiple Regions and Global Load Balancing

Beyond the regional instance group level, you can further ensure high availability by running your application in multiple regions and using a global load balancer to distribute workload. This would have the added advantage of allowing users to connect to an application instance in the closest region, which could reduce latency. You would have the option of using the HTTP(S), SSL Proxy, or TCP Proxy load balancers for global load balancing.

High Availability in Kubernetes Engine

Kubernetes Engine is a managed Kubernetes service that is used for container orchestration. Kubernetes is designed to provide highly available containerized services. High availability in GKE Kubernetes clusters comes both from Google's technical processes and from the design of Kubernetes itself.

VMs in a GKE Kubernetes cluster are members of a managed instance group, so they have all the high availability features described previously.

Kubernetes continually monitors the state of containers and pods. Pods are the smallest unit of deployment in Kubernetes; they usually have one container, but in some cases a pod may have two or more tightly coupled containers. If pods are not functioning correctly, they will be shut down and replaced. Kubernetes collects statistics, such as the number of desired pods and the number of available pods, which can be reported to Cloud Monitoring.

Kubernetes Engine clusters can be zonal or regional. To improve availability, you can create a regional cluster, in which GKE distributes the underlying VMs across multiple zones within a region. GKE replicates control plane servers and nodes across zones. Control plane servers run several services, including the API server, scheduler, and resource controller, and, when deployed to multiple zones, provide for continued availability in the event of a zone failure.

High Availability in App Engine and Cloud Functions

App Engine and Cloud Functions are fully managed compute services. Users of these services are not responsible for maintaining the availability of the computing resources. The Google Cloud Platform ensures the high availability of these services.

Of course, App Engine and Cloud Functions applications and functions may fail and leave the application unavailable. This is a case where the software engineering and DevOps best practices can help improve availability.

High Availability Computing Requirements in Case Studies

All four case studies have requirements for high availability computing.

In the EHR Healthcare case study, the executive statement includes “We want to use Google Cloud to leverage a scalable, resilient platform that can span multiple environments seamlessly and provide a consistent and stable user experience that positions us for future growth.” While the statement does not explicitly cite high availability, we can assume if the company wants a scalable and resilient platform to build on, then they will want high availability as well. The organization has containerized their customer-facing applications and runs them in Kubernetes, which of course is designed to provide for high availability of services.
In the Helicopter Racing League case study, high availability is needed for both streaming services and predictive analytics. Helicopter Racing League streams races live and so needs highly available services to ensure content is delivered with low latency to all viewers. If a streaming service were unavailable during a race, it would significantly impact viewer experience. Since the company is focusing on improving predictions made during races and providing their partners with access to predictive models, any platform used for creating those predictions will need to be highly available.
In the Mountkirk Games case study, there is no explicit requirement for high availability, but it is implied. Given the nature of online games, users expect to be able to continue to play once they start and until they decide to stop. The case study specifies that a new game platform will run on Kubernetes Engine so that the company can take advantage of autoscaling. In addition, they plan to collect more telemetry and player data noting that “[w]e were able to analyze player behavior and game telemetry in ways that we never could before.” Services must be highly available to ingest, process, and analyze telemetry and player data without losing data or experiencing delays in analysis.
In the TerramEarth case study, the company needs highly available data ingestion and dealer-facing applications. Over two million vehicles are in operation, and that number is growing at 20 percent per year. These vehicles transmit critical data during operation so it must be reliably collected, and any ingestion system will need to scale with the projected growth. The company is also developing APIs for dealers who will likely expect highly available services. TerramEarth developers and contractors also need reliable and scalable workflows and CI/CD pipelines.

Storage Availability

Highly available storage is storage that is available and functional at nearly all times. The storage services can be grouped into the following categories:

Object storage
File and block storage
Database services
Caching

Let's look at availability in each type of storage service.

Availability of Object, File, and Block Storage

Cloud Storage is a fully managed object storage service. Google maintains high availability of the service. As with other managed services, users do not have to do anything to ensure high availability.

Cloud Filestore is another managed storage service. It provides filesystem storage that is available across the network. High availability is ensured by Google.

Persistent disks (PDs) are SSDs and hard disk drives that can be attached to VMs. These disks provide block storage so that they can be used to implement filesystems and database storage. Persistent disks continue to exist even after the VMs shut down. One of the ways in which persistent disks enable high availability is by supporting online resizing. Also, GCP offers both zonal persistent disks and regional persistent disks. Regional persistent disks are replicated in two zones within a region. Persistent disks are further categorized by performance characteristics into several types:

Zonal standard PDs, which are efficient and reliable block storage devices up to 64 TB
Regional standard PDs, which are like zonal standard PDs but replicated across two zones in a region
Zonal balanced PDs, which have higher IOPS rates than standard PDs
Regional balanced PDs, which are like zonal balanced PDs but replicated across two zones in a region
Zonal SSD PDs, which have higher IOPS rates than balanced or standard PDs
Regional SSD PDs, which are like zonal SSD PDs but replicated across two zones in a region
Zonal extreme PDs, which are the highest-performance block storage option

The higher the performance of the persistent disk, the higher the cost. Durability also varies across persistent disk types. Zonal standard persistent disks have better than 99.99 percent durability, while zonal balanced PDs, zonal SSD PDs, and regional standard PDs have better than 99.999 percent durability. Zonal extreme PDs and regional SSD PDs have better than 99.9999 percent durability.

Availability of Databases

GCP users can choose between running database servers in VMs that they manage or using one of the managed database services.

Self-Managed Databases

When running and managing a database, you will need to consider how to maintain availability if the database server or underlying VM fails. Redundancy is the common approach to ensuring availability in databases. How you configure multiple database servers will depend on the database system you are using.

For example, PostgreSQL has several options for using combinations of primary servers, hot standby servers, and warm standby servers. A hot standby server can take over immediately in the event of a primary server failure. A warm standby may be slightly behind in reflecting all transactions. PostgreSQL employs several methods for enabling failover, including the following:

Shared disk, in which case multiple databases share a disk. If the primary server fails, the standby starts to use the shared disk.
Filesystem replication, in which changes in the master server filesystem are mirrored on the failover server's filesystem.
Synchronous multimaster replication, in which each server accepts writes and propagates changes to other servers.

Other database management systems offer similar capabilities. The details are not important for taking the Professional Cloud Architect exam, but it is important to understand how difficult it is to configure and maintain highly available databases. In contrast, if you were using Cloud SQL, you could configure high availability in the console by opting for a high availability configuration.

Managed Databases

GCP offers several managed databases. All have high availability features.

Fully managed and serverless databases, such as Cloud Firestore and BigQuery, are highly available, and Google attends to all of the deployment and configuration details to ensure high availability.

The database servers that require users to specify some server configuration options, such as Cloud SQL and Bigtable, can be made more or less highly available based on the use of regional replication. For example, in Bigtable, regional replication enables primary-primary replication among clusters in different zones. This means that both clusters can accept reads and writes, and changes are propagated to the other cluster. In addition to reads and writes, regional replication in Bigtable replicates other changes, such as updating data, adding or removing column families, and adding or removing tables.

In general, the availability of databases is based on the number of replicas and their distribution. The more replicas and the more they are dispersed across zones, the higher the availability. Keep in mind that as you increase the number of replicas, you will increase costs and possibly latency if all replicas must be updated before a write operation is considered successful. Also consider if the data storage system you choose is available within a zone or across regions.
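
The latency cost of updating every replica can be sketched with a simple model. This example assumes a write is sent to all replicas in parallel and completes once a required number of acknowledgments arrive; it does not describe how any particular GCP database is implemented, and the latency figures are hypothetical.

```python
def write_latency_ms(replica_latencies_ms, acks_required):
    """Latency of a write that waits for a given number of replica acknowledgments.

    Assumes the write is sent to all replicas at the same time.
    """
    if not 1 <= acks_required <= len(replica_latencies_ms):
        raise ValueError("acks_required must be between 1 and the replica count")
    return sorted(replica_latencies_ms)[acks_required - 1]


# Hypothetical replicas: two in nearby zones and one in a distant region.
latencies = [2, 5, 40]
print(write_latency_ms(latencies, acks_required=3))  # wait for every replica
print(write_latency_ms(latencies, acks_required=1))  # wait for the fastest only
```

In this model, waiting for all replicas makes every write as slow as the slowest replica, which is why wider distribution can raise latency as well as availability.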

Availability of Caching

Caching is the practice of storing data in low-latency storage to improve application or database performance. For example, if a particular query is frequently invoked in a database application, the query may respond faster if the data is in memory than if the data were retrieved from a standard persistent disk. Caches are typically optimized for low latency and often come with low durability. Snapshots of the state of a cache may be saved to persistent storage to provide a point of recovery, but such snapshots are not as general purpose as a database table saved to persistent disk.

Cloud Memorystore is a high availability cache service in Google Cloud that supports both Memcached and Redis. This managed cache service can be used to improve availability of data that requires low latency access. Instead of storing data in the memory of a virtual machine or container, which can fail and lose the state of memory, application designers can use Cloud Memorystore to provide high availability of data that requires low latency.
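
A common way for an application to use a cache is to check the cache first and fall back to the database only on a miss. The sketch below uses an in-process dictionary as a stand-in for an external cache such as Cloud Memorystore; the class and function names are hypothetical.

```python
class CachedReader:
    """Check the cache first; on a miss, load from the database and store the result."""

    def __init__(self, load_from_database):
        self._load = load_from_database
        self._cache = {}  # stand-in for an external cache service
        self.misses = 0

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        self.misses += 1
        value = self._load(key)
        self._cache[key] = value
        return value


def slow_query(key):
    """Hypothetical stand-in for a query that reads from persistent disk."""
    return key.upper()


reader = CachedReader(slow_query)
reader.get("player-1")
reader.get("player-1")
print(reader.misses)
```

Only the first request reaches the database; later requests for the same key are served from the cache. Using a managed external cache rather than process memory means this cached state survives the failure of an individual VM or container.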

High Availability Storage Requirements in Case Studies

The Mountkirk Games case study notes that the company plans to offer a global leader board using Cloud Spanner, which provides for both high availability and multiregion to global strongly consistent transactions. Game player data, such as the state of play and possessions and attributes of players, could be stored in a NoSQL database such as Bigtable, which provides both low-latency reads/writes and scalability.

One of the technical requirements is to “store game activity logs in structured files for future analysis,” which is a good candidate for Cloud Storage, which can scale to store log files as needed. When it is time to analyze log data, files can be loaded into BigQuery, a fully managed analytical database, or accessed as external, federated tables stored in Cloud Storage.

TerramEarth needs to store telemetry data ingested in real time. This data is time-series data, which means that each record has a time stamp, identifiers indicating the piece of equipment that generated the data, and a series of metrics. Bigtable is a good option when you need to write large volumes of data in real time at low latency. Bigtable has support for regional replication, which improves availability.

EHR Healthcare uses a combination of relational and NoSQL databases. If the company continues to manage their databases rather than use a managed service, such as Cloud SQL, then they should consider using regional or zonal persistent disks and choose based on their availability requirements.

Helicopter Racing League performs encoding and transcoding in the cloud. If low-latency persistent storage access is important for these processes, then the company should consider extreme PDs or local SSDs if their encoding and transcoding pipelines can tolerate the loss of a zonal or local disk. Object storage is used with Helicopter Racing League's current cloud provider to store content; Cloud Storage could provide the same function in Google Cloud. The focus on building predictive models means the company will need to store large volumes of content, such as all race recordings. Telemetry data from racing helicopters as well as from viewers watching races could be stored in Bigtable, which provides for scalability, low-latency reads and writes, as well as key lookup and range scan lookups. Bigtable could be the source of structured data, such as time-series data, for building machine learning models, while Cloud Storage could store unstructured contents, such as audio and video.

Network Availability

When network connectivity is down, applications are unavailable. There are two primary ways to improve network availability:

Use redundant network connections
Use Premium Tier networking

Redundant network connections can be used to increase the availability of the network between an on-premises data center and Google's data center. One type of connection is a Dedicated Interconnect, which can be used with a minimum of 10 Gbps throughput and does not traverse the public internet. A Dedicated Interconnect is possible when both your network and the Google Cloud network have a point of presence in a common location, such as a data center. When your network does not share a common point of presence with the Google Cloud network, you have the option of using a Partner Interconnect. When using a Partner Interconnect, you provision a network link between your data center and a Google network point of presence. Traffic flows through a telecommunication provider's network from your data center to Google Cloud's network. Traffic does not travel over the internet.

VPNs can also be used when sending data over the internet is not a problem. You should choose among these options based on cost, security, throughput, latency, and availability considerations. Google Cloud offers a high availability VPN, known as HA VPN, which uses redundant connections and offers a 99.99 percent SLA.

Data within the GCP can be transmitted among regions using the public internet or Google's internal network. The latter is available as the Premium Network Tier, which costs more than the Standard Network Tier, which uses the public internet. The internal Google network is designed for high availability and low latency, so the Premium Tier should be considered if global network availability is a concern. Note, if you plan to use global load balancing, you will need to use Premium Tier networking.

High Availability Network Requirements in Case Studies

The case studies do not provide explicit networking requirements other than an implied expectation that the network is always available. An architect should inquire about additional requirements that might determine if Premium Tier networking is required or if multiple network connections between on-premises and Google data centers are needed.

Application Availability

Application availability builds on compute, storage, and networking availability. It also depends on the application itself. Designing software for high availability is beyond the scope of this book, and it is not a subject you will likely be tested on when taking the Professional Cloud Architect exam.

Architects should understand that they can use Cloud Monitoring and Cloud Logging to observe the state of applications so that they can detect problems as early as possible. Applications that are instrumented with custom metrics can provide application-specific details that could be helpful in diagnosing problems with an application.

Scalability

Scalability is the process of adding and removing infrastructure resources to meet workload demands efficiently. Different kinds of resources have different scaling characteristics. Here are some examples:

VMs in a managed instance group scale by adding or removing instances from the group.
Kubernetes scales pods based on load and configuration parameters.
NoSQL databases scale horizontally, but this introduces issues around consistency.
Relational databases can scale horizontally, but that requires server clock synchronization if strong consistency is required among all nodes. Cloud Spanner uses the TrueTime service, which depends on atomic clocks and GPS signals to ensure a low upper bound on the difference in time reported by clocks in a distributed system.

As a general rule, scaling stateless applications horizontally is straightforward. Stateful applications are difficult to scale horizontally, and vertical scaling is often the first choice when stateful applications must scale. Alternatively, stateful applications can move state information out of the individual containers or VMs and store it in a cache, like Cloud Memorystore, or in a database. This makes scaling horizontally less challenging.
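
To illustrate moving state out of individual VMs or containers, here is a minimal sketch in which two workers share an external store. A plain dictionary stands in for a cache or database, and the Worker class and its method are hypothetical.

```python
class Worker:
    """A stateless worker: all session state lives in the shared store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store  # stand-in for a cache or database

    def add_to_cart(self, session_id, item):
        cart = self.store.get(session_id, [])
        cart = cart + [item]
        self.store[session_id] = cart
        return cart


shared_store = {}
worker_a = Worker("a", shared_store)
worker_b = Worker("b", shared_store)

worker_a.add_to_cart("session-1", "sword")
print(worker_b.add_to_cart("session-1", "shield"))
```

Because neither worker holds the cart in its own memory, a load balancer can send each request to any worker, and workers can be added or removed freely.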

Remember that different kinds of resources will scale at different rates. Compute-intensive applications may need to scale compute resources at a faster rate than storage. Similarly, a database that stores large volumes of data but is not often queried may need to scale up storage faster than compute resources. To facilitate efficient scaling, it helps to decouple resources that scale at different rates.

For example, front-end applications often need to scale according to how many users are active on the system and how long requests take to process. Meanwhile, the database server may have enough resources to meet peak demand load without scaling up. When resources are difficult to scale, consider deploying for peak capacity. Relational databases, other than Cloud Spanner, and network interconnects are examples of resources that are difficult to scale. In the case of a non-Spanner relational database, you could scale by running the database on a server with more CPUs and memory. This is vertical scaling, which is limited to the size of available instances. For networks, you could add additional interconnects to add bandwidth between sites. Both of these are disruptive operations compared to scaling a stateless application by adding virtual machines to a cluster, which users might never notice.

Scaling Compute Resources

Compute Engine and Kubernetes Engine support automatic scaling of compute resources. App Engine and Cloud Functions autoscale as well, but their scaling is managed by the Google Cloud Platform.

Scaling Compute in Compute Engine

In Compute Engine, you can scale the number of VMs running your application using managed instance groups, which support autoscaling. Adding VMs to a managed instance group is known as scaling out or scaling up. Removing VMs from a managed instance group is known as scaling in or scaling down. Autoscaling is not available when a managed instance group has a stateful configuration. Unmanaged instance groups do not support autoscaling. Compute Engine autoscaling should not be used by managed instance groups owned by Kubernetes Engine; cluster autoscaling should be used in those cases.

Autoscaling can be configured to scale based on several attributes, including the following:

Average CPU utilization
HTTP load balancing utilization
Cloud Monitoring metrics

The autoscaler collects the appropriate performance data and compares it to targets set in an autoscaling policy. For instance, if you set the target CPU utilization to 80 percent, then the autoscaler will add or remove VMs from the managed instance group to keep the CPU utilization average for the group close to 80 percent.

Autoscalers can make decisions based on multiple metrics. An autoscaler will calculate a recommended number of VMs per metric and then choose the maximum number of VMs recommended.
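
The policy described above can be sketched in a few lines. The proportional formula used for the per-metric recommendation is an assumption made for illustration and is not taken from Google documentation; the metric names and values are hypothetical.

```python
import math


def recommended_size(current_vms, observed, target):
    """Number of VMs needed to bring one metric back to its target.

    Assumption: load spreads evenly, so required capacity is proportional
    to observed / target.
    """
    return math.ceil(current_vms * observed / target)


def autoscaler_recommendation(current_vms, metrics):
    """Take the largest recommendation across all configured metrics.

    metrics maps a metric name to an (observed, target) pair.
    """
    return max(
        recommended_size(current_vms, observed, target)
        for observed, target in metrics.values()
    )


metrics = {
    "cpu_utilization": (0.9, 0.8),  # above target, so scale out
    "lb_utilization": (0.5, 0.6),   # below target
}
print(autoscaler_recommendation(10, metrics))
```

Choosing the maximum means the group is sized for whichever metric is under the most pressure.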

In addition to autoscaling based on metrics, you can also schedule autoscaling based on time using a scaling schedule. A scaling schedule has a capacity, which is the minimum number of required VMs, and a schedule that includes a start time, duration, and recurrence frequency, such as daily or weekly. You can also enable predictive autoscaling to forecast future loads. This works best when an application has a long startup time and the workload varies predictably over days or weeks.
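
One way to picture how a scaling schedule interacts with metric-based autoscaling is as a floor on the group size while the schedule is active. The sketch below is a simplified model, not the Compute Engine configuration format; it assumes each occurrence starts on the hour and lasts no more than 24 hours, and all names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ScalingSchedule:
    start_hour: int        # hour of day at which each occurrence starts
    duration: timedelta    # assumed to be at most 24 hours
    weekdays: frozenset    # days on which it recurs; Monday is 0
    min_vms: int           # capacity: minimum required VMs while active

    def is_active(self, now: datetime) -> bool:
        # Check occurrences that started today or yesterday, so that a
        # window crossing midnight is handled.
        for days_back in (0, 1):
            day = now - timedelta(days=days_back)
            start = day.replace(hour=self.start_hour, minute=0, second=0, microsecond=0)
            if start.weekday() in self.weekdays and start <= now < start + self.duration:
                return True
        return False


def required_vms(metric_recommendation, schedules, now):
    """Largest of the metric-based size and all active schedule capacities."""
    active = [s.min_vms for s in schedules if s.is_active(now)]
    return max([metric_recommendation] + active)


business_hours = ScalingSchedule(8, timedelta(hours=4), frozenset({0, 1, 2, 3, 4}), 20)
print(required_vms(5, [business_hours], datetime(2024, 1, 1, 9, 0)))  # a Monday
```

Outside the scheduled window, the group size falls back to whatever the metric-based recommendation requires.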

Keep in mind that autoscaling is independent of health checks. If you use autohealing and a VM fails a health check, the autohealer will try to re-create the instance that failed.

When adding a VM to a managed instance group, the application running on the VM will take some time to initialize. This initialization time is known as the cooldown period. Autoscalers will use data from VMs in a cooldown period for scale-in decisions but not scale-out decisions. By default, the cooldown period is 60 seconds, but that can be changed.

When scaling in, the autoscaler considers the peak load during the previous 10 minutes, which is known as the stabilization period. The autoscaler ensures there are enough VMs to meet the peak load during the stabilization period.
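
The stabilization period can be modeled as taking the largest recommendation seen in the last 10 minutes. This is a simplified sketch of that idea with timestamps in seconds; the class name and interface are hypothetical.

```python
from collections import deque


class StabilizedAutoscaler:
    """Size the group for the peak recommendation within a trailing window."""

    def __init__(self, window_seconds=600):
        self.window_seconds = window_seconds
        self.history = deque()  # (timestamp, recommended size) pairs

    def target_size(self, now, recommended):
        self.history.append((now, recommended))
        # Drop recommendations that fall outside the stabilization period.
        while self.history[0][0] <= now - self.window_seconds:
            self.history.popleft()
        return max(size for _, size in self.history)


scaler = StabilizedAutoscaler()
print(scaler.target_size(0, 10))    # load peaks
print(scaler.target_size(300, 4))   # load drops, but the peak is still recent
print(scaler.target_size(600, 4))   # the peak has aged out of the window
```

Scale-out takes effect immediately because a higher recommendation becomes the new maximum, while scale-in is delayed until the earlier peak leaves the window.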

Abrupt scale-in events can increase application latency. You can control scale-in operations by specifying a maximum allowed reduction in VMs within a specified time period known as the trailing time window. The trailing time window is the time window the autoscaler monitors for making scaling decisions. The autoscaler does not resize below the peak size less the maximum allowed reduction in VMs.
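
The last sentence above describes a floor on the group size. A minimal sketch of that rule follows, with hypothetical names and values.

```python
def limited_scale_in(recommended, sizes_in_window, max_reduction):
    """Apply a scale-in control to an autoscaler recommendation.

    sizes_in_window holds the group sizes observed during the trailing
    time window; the group may not shrink below the peak of those sizes
    minus the maximum allowed reduction.
    """
    floor = max(sizes_in_window) - max_reduction
    return max(recommended, floor)


# The group peaked at 10 VMs in the window; at most 3 VMs may be removed.
print(limited_scale_in(recommended=2, sizes_in_window=[10, 9, 10], max_reduction=3))
```

Here the autoscaler would like to drop to 2 VMs, but the control holds the group at 7 until the window moves on. Recommendations at or above the floor pass through unchanged.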

Before a VM is removed from a group, it can optionally run a shutdown script to clean up. The shutdown script is run on a best-effort basis.

When an instance is added to the group, it is configured according to the configuration details in the instance template.

Scaling Compute in Kubernetes Engine

Kubernetes is designed to manage containers in a cluster environment. Recall that containers are an isolation mechanism that allows processes on the same operating system to run with isolated resources. Kubernetes does not scale containers directly; instead, autoscaling is based on Kubernetes abstractions.

The smallest computational resource in Kubernetes is a pod. Pods contain containers. Pods run on nodes, which are VMs in managed instance groups. Pods usually contain one container, but they can include more. When pods have more than one container, those containers are usually tightly coupled, such as one container running analysis code while the other container runs ancillary services, such as data cleansing services. Containers in the same pod should have the same scaling characteristics since they will be scaled up and down together.

A deployment specifies updates for pods and ReplicaSets, which are sets of identically configured pods running at some point in time. An application may be run in more than one deployment at a time. This is commonly done to roll out new versions of code. A new deployment can be run in a cluster, and a small amount of traffic can be sent to it to test the new code in a production environment without exposing all users to the new code. This is an example of a canary deployment.

Applications running on a set of pods can be exposed using a Service. A Service provides a stable abstraction for accessing an application running in a deployment, which can have pods and associated IP addresses that change.

Kubernetes can scale the number of nodes in a cluster, and it can scale the number of replicas and pods running a deployment. Kubernetes Engine automatically scales the size of the cluster based on load. If a new pod is created and there are not enough resources in the cluster to run the pod, then the autoscaler will add a node. Nodes exist within node pools, which are nodes with the same configuration. When a cluster is first created, the number and type of nodes created become the default node pool. Other node pools can be added later if needed.

When you deploy applications to Kubernetes clusters, you have to specify how many replicas of an application should run. A replica is implemented as a pod running application containers. Scaling an application is changing the number of replicas to meet the demand.

Kubernetes provides for autoscaling the number of replicas. When using autoscaling, you specify a minimum and maximum number of replicas for your application along with a target that specifies a resource, like CPU utilization, and a threshold, such as 80 percent. Since Kubernetes Engine 1.9, you can specify custom metrics in Cloud Monitoring as a target.
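To see how a target and a threshold turn into a replica count, consider the following sketch. It follows the ratio-based calculation described in the Kubernetes documentation for the Horizontal Pod Autoscaler, but it is a simplification: the function name and parameters are hypothetical, and details such as tolerance bands and stabilization windows are deliberately left out.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas, max_replicas):
    """Estimate the replica count needed to bring a metric back to its target.

    current_metric and target_metric use the same unit, for example
    average CPU utilization as a percentage.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    # Scale the replica count by how far the metric is from its target.
    raw = math.ceil(current_replicas * current_metric / target_metric)
    # Keep the result within the configured minimum and maximum.
    return max(min_replicas, min(max_replicas, raw))
```

With four replicas averaging 95 percent CPU utilization against an 80 percent target, the sketch asks for five replicas. If the calculation exceeds the configured maximum, the maximum is used instead.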

One of the advantages of containerizing applications is that they can be run in Kubernetes Engine, which can automatically scale the number of nodes or VMs in a cluster. It can also scale how those cloud resources are allocated to different services and their deployments.

Scaling Storage Resources

Storage resources are virtualized in GCP, and some are fully managed services, so there are parallels between scaling storage and compute resources.

The least scalable storage system is locally attached SSDs on VMs. Locally attached storage is not considered a persistent storage option. Data will be retained during reboots and live migrations, but it is lost when the VM is terminated or stopped. Local data is lost from preemptible VMs when they are preempted.

Zonal and regional persistent disks and persistent SSDs can currently scale up to 64 TB per VM instance. You should also consider read and write performance when scaling persistent storage. Standard disks have a maximum sustained read rate of 0.75 I/O operations per second (IOPS) per gigabyte and a maximum sustained write rate of 1.5 IOPS per gigabyte. Persistent SSDs have a maximum sustained read and write rate of 30 IOPS per gigabyte. As a general rule, standard persistent disks are well suited for large-volume batch processing when low cost and high storage volume are important. When performance is a consideration, such as when running a database on a VM, persistent SSDs are the better option.
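Because these limits are expressed per gigabyte, the size of a disk determines its sustained performance. The sketch below works through that arithmetic using only the figures quoted above. The names are hypothetical, and the sketch assumes there are no other limits, such as per-instance caps, which it does not model.

```python
import math

# Per-gigabyte sustained IOPS figures as quoted in the text.
IOPS_PER_GB = {
    "standard": {"read": 0.75, "write": 1.5},
    "ssd": {"read": 30.0, "write": 30.0},
}


def sustained_iops(disk_type, size_gb):
    """Return the sustained read and write IOPS for a disk of size_gb."""
    rates = IOPS_PER_GB[disk_type]
    return {op: rate * size_gb for op, rate in rates.items()}


def min_size_gb(disk_type, op, required_iops):
    """Return the smallest disk size in GB that sustains required_iops."""
    return math.ceil(required_iops / IOPS_PER_GB[disk_type][op])
```

Under these assumptions, sustaining 6,000 read IOPS takes a 200 GB persistent SSD but an 8,000 GB standard disk, which illustrates why persistent SSDs are the usual choice for databases.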

Adding storage to a VM is a two-step process. You will need to allocate persistent storage and then issue operating system commands to make the storage available to the filesystem. The commands are operating system specific.

Managed services, such as Cloud Storage and BigQuery, ensure that storage is available as needed. In the case of BigQuery, even if you do not scale storage directly, you may want to consider partitioning data to improve query performance. Partitioning organizes data in a way that allows the query processor to scan smaller amounts of data to answer a query. For example, assume that Mountkirk Games is storing summary data about user sessions in BigQuery. The data includes a date indicating the day that the session data was collected. Analysts typically analyze data at the week and month levels. If the data is partitioned by week or month, the query processor would scan only the partitions needed to answer the query. Data that is outside the date range of the query would not have to be scanned. Since BigQuery charges by the amount of data scanned, this can help reduce costs.
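The effect of partitioning on the amount of data scanned can be illustrated with a small in-memory model. This is only a sketch of the idea and does not use the BigQuery API. The function names, the choice of monthly partitions, and the sample session data are all assumptions made for illustration.

```python
from datetime import date


def month_key(d):
    """Return the (year, month) partition key for a date."""
    return (d.year, d.month)


def partition_by_month(rows):
    """Group (session_date, size_bytes) rows into monthly partition sizes."""
    partitions = {}
    for session_date, size_bytes in rows:
        key = month_key(session_date)
        partitions[key] = partitions.get(key, 0) + size_bytes
    return partitions


def bytes_scanned(partitions, start, end):
    """Sum the sizes of the partitions that overlap the range start..end."""
    first, last = month_key(start), month_key(end)
    return sum(size for key, size in partitions.items() if first <= key <= last)


# Hypothetical session summary rows: (date collected, size in bytes).
rows = [
    (date(2024, 1, 15), 100),
    (date(2024, 1, 20), 50),
    (date(2024, 2, 3), 200),
    (date(2024, 3, 9), 400),
]
partitions = partition_by_month(rows)
february = bytes_scanned(partitions, date(2024, 2, 1), date(2024, 2, 29))
everything = sum(size for _, size in rows)
```

In this model, a query restricted to February reads only the February partition rather than the whole table, so `february` is much smaller than `everything`.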

Network Design for Scalability

Connectivity between on-premises data centers and Google data centers doesn't scale the way storage and compute do. You need to plan ahead for the upper limit of the bandwidth you will need. You should plan for peak capacity, although depending on your provider, you may pay only for the bandwidth used.

Reliability

Reliability is a measure of the likelihood of a system being available and able to meet the needs of the load on the system. When analyzing technical requirements, it is important to look for reliability requirements. As with availability and scalability, these requirements may be explicit or implicit.

Designing for reliability requires that you consider how to minimize the chance of system failures. For example, we employ redundancy to mitigate the risk of a hardware failure leaving a crucial component unavailable. We also use DevOps best practices to manage risks with configuration changes and when managing infrastructure as code. These are the same practices that we employ to ensure availability.

You also need to consider how to respond when systems do fail. Distributed applications are complicated. A single application may depend on multiple microservices, each with a number of dependencies on other services, which may be developed and managed by another team within the organization or may be a third-party service.

Measuring Reliability

There are different ways to measure reliability, but some are more informative than others.

Total system uptime is one measure. This sounds simple and straightforward, but it is not—at least when dealing with distributed systems. Specifically, what measure do you use to determine whether a system is up? If at least one server is available, is a system up? If there was a problem with an instance group in Compute Engine or the pods in a Kubernetes deployment, you may be able to respond to some requests but not others. If your definition of uptime is based on just having one or some percentage of desired VMs or pods running, then this may not accurately reflect user experience regarding reliability.

Rather than focus on the implementation metrics, such as the number of instances available, reliability is better measured as a function of the work performed by the service. The number of requests that are successfully responded to is a good basis for measuring reliability. Successful request rate is the percentage of all application requests that are successfully responded to. This measure has the advantage of being easy to calculate and of providing a good indicator for the user experience.
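The successful request rate is straightforward to compute from request logs. The following sketch assumes that each request is summarized by an HTTP status code and that any 5xx code counts as a failure. Both the function name and that definition of failure are assumptions, since what counts as a successful response depends on the application.

```python
def successful_request_rate(status_codes):
    """Return the percentage of requests that were successfully answered.

    Assumption: a request is successful unless the server returned a
    5xx status code.
    """
    if not status_codes:
        raise ValueError("no requests observed")
    successes = sum(1 for code in status_codes if code < 500)
    return 100.0 * successes / len(status_codes)
```

For instance, 997 successful responses out of 1,000 requests give a successful request rate of 99.7 percent, regardless of how many VMs or pods were running at the time.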

Reliability Engineering

As an architect, you should consider ways to support reliability early in the design stage. This should include the following:

Identifying how to monitor services. Will they require custom metrics?
Considering alerting conditions. How do you balance the need for early indication that a problem may be emerging with the need to avoid overloading DevOps teams with unactionable alerts?
Using existing incident response procedures with the new system. Does this system require any specialized procedures during an incident? For example, if this is the first application to store confidential, personally identifying information, you may need to add procedures to notify the information security team if an incident involves a failure in access controls.
Implementing a system for tracking outages and performing post-mortems to understand why a disruption occurred.

Designing for reliability engineering requires an emphasis on organizational and management issues. This is different from designing for high availability and scalability, which is dominated by technical considerations. As an architect, it is important to remember that your responsibilities include both technical and management aspects of system design.

Summary

Architects are constantly working with technical requirements. Sometimes these requirements are explicitly stated, such as when a line-of-business manager states that the system will need to store 10 TB of data per day or that the data warehouse must support SQL. In other cases, you must infer technical requirements from other statements. If a streaming application must be able to accept late-arriving data, this implies the need to buffer data when it arrives and to specify how long to wait for late data.

Some technical requirements are statements of constraints, such as requiring that a database be implemented using MySQL 8.0. Other technical requirements require architects to analyze multiple business needs to identify specific requirements. Many of these fall into the categories of high availability, scalability, and reliability. Compute, storage, and networking services should be designed to support the levels of availability, scalability, and reliability that the business requires.

Exam Essentials

Understand the differences between availability, scalability, and reliability. High availability is the continuous operation of a system at sufficient capacity to meet the demands of ongoing workloads. Availability is usually measured as a percentage of time that a system is available. Scalability is the process of adding and removing infrastructure resources to meet workload demands efficiently. Reliability is a measure of how likely it is that a system will be available and capable of meeting the needs of the load on the system.
Understand how redundancy is used to improve availability. Compute, storage, and network services all use redundancy combined with autohealing or other forms of autorepair to improve availability. A cluster of identically configured VMs behind a load balancer is an example of using redundancy to improve availability. Making multiple copies of data is an example of redundancy used to improve storage availability. Using multiple direct connections between a data center and Google Cloud is an example of redundancy in networking.
Know that managed services relieve users of many responsibilities for availability and scalability. Managed services in GCP take care of most aspects of availability and scalability. For example, Cloud Storage is highly available and scalable, but users of the service do not have to do anything to enable these capabilities.
Understand how Compute Engine and Kubernetes Engine achieve high availability and scalability. Compute Engine uses managed instance groups, which include instance templates and autoscalers, to achieve high availability and scale to meet application load. Kubernetes is a container orchestration service that provides higher-level abstractions for deploying applications on containers. Pods scale as needed to meet demands of deployments, while clusters can autoscale nodes to meet the demand for pods and associated resources.
Understand that reliability engineering is about managing risk. Designing for reliability requires you to consider how to minimize the chance of system failures. For example, architects employ redundancy to mitigate the risk of a hardware failure leaving a crucial component unavailable. Rather than focus on the implementation metrics, such as the number of instances available, reliability is better measured as a function of the work performed by the service. The number of requests that are successfully responded to is a good basis for measuring reliability.

Review Questions

You are advising a customer on how to improve the availability of a data storage solution. Which of the following general strategies would you recommend?
1. Keeping redundant copies of the data
2. Lowering the network latency for disk writes
3. Using a NoSQL database
4. Using Cloud Spanner
A team of data scientists is analyzing archived data sets. Their statistical model building procedures run in batches. If the model building system is down for up to 30 minutes per day, it does not adversely impact the data scientists' work. What is the minimal percentage availability among the following options that would meet this requirement?
1. 99.99 percent
2. 99.90 percent
3. 99.00 percent
4. 99.999 percent
Your development team has recently triggered three incidents that resulted in service disruptions. In one case, an engineer mistyped a number in a configuration file and in the other cases specified an incorrect disk configuration. What practices would you recommend to reduce the risk of these types of errors?
1. Continuous integration/continuous deployment
2. Code reviews of configuration files
3. Vulnerability scanning
4. Improved access controls
Your company is running multiple VM instances that have not had any downtime in the past several weeks. Recently, several of the physical servers suffered disk failures. The applications running on the servers did not have any apparent service disruptions. What feature of Compute Engine enabled that?
1. Preemptible VMs
2. Live migration
3. Canary deployments
4. Redundant array of inexpensive disks
You have deployed an application on a managed instance group. Occasionally the application experiences an intermittent malfunction and then resumes normal operation. Which of these is a reasonable explanation for what is happening?
1. The application shuts down when the instance group time-to-live (TTL) threshold is reached.
2. The application shuts down when the health check fails.
3. The VM shuts down when the instance group TTL threshold is reached and a new VM is started.
4. The VM shuts down when the health check fails and a new VM is started.
An online gaming company is growing its user base in North America, Europe, and Asia. Executives are concerned that players in Europe and Asia will have a degraded experience if the game backend runs only in North America. What would you suggest to improve latency and game experience for users in Europe and Asia?
1. Use Cloud Spanner to have a globally consistent, horizontally scalable relational database.
2. Create instance groups running the game backend in multiple regions across North America, Europe, and Asia. Use global load balancing to distribute the workload.
3. Use Standard Tier networking to ensure that data sent between regions is routed over the public internet.
4. Use a Cloud Memorystore cache in front of the database to reduce database read latency.
What configuration changes are required to ensure high availability when using Cloud Storage or Cloud Filestore?
1. A sufficiently long TTL must be set.
2. A health check must be specified.
3. Both a TTL and health check must be specified.
4. Nothing. Both are managed services. GCP manages high availability.
The finance director at your company is frustrated with the poor availability of an on-premises finance data warehouse. The data warehouse uses a commercial relational database that only scales by buying larger and larger servers. The director asks for your advice about moving the data warehouse to the cloud and if the company can continue to use SQL to query the data warehouse. What GCP service would you recommend to replace the on-premises data warehouse?
1. Bigtable
2. BigQuery
3. Cloud Datastore
4. Cloud Storage
TerramEarth has determined that it wants to use Cloud Bigtable to store equipment telemetry received from vehicles in the field. It has also concluded that it wants two clusters in different regions. Both clusters should be able to respond to read and write requests. What kind of replication should be used?
1. Primary–hot primary
2. Primary–warm primary
3. Primary–primary
4. Primary read–primary write
Your company is implementing a hybrid cloud computing model. Line-of-business owners are concerned that data stored in the cloud may not be available to on-premises applications. The current network connection is using a maximum of 40 percent of bandwidth. What would you suggest to mitigate the risk of that kind of service failure?
1. Configure firewall rules to improve availability.
2. Use redundant network connections between the on-premises data center and Google Cloud.
3. Increase the number of VMs allowed in Compute Engine instance groups.
4. Increase the bandwidth of the network connection between the data center and Google Cloud.
A team of architects in your company is defining standards to improve availability. In addition to recommending redundancy and code reviews for configuration changes, what would you recommend including in the standards?
1. Use of access controls
2. Use of managed services for all compute requirements
3. Use of Cloud Monitoring to alert on changes in application performance
4. Use of Bigtable to collect performance monitoring data
Why would you want to run long-running, compute-intensive backend computation in a different managed instance group than on web servers supporting a minimal user interface?
1. Managed instance groups can run only a single application.
2. Managed instance groups are optimized for either compute or HTTP connectivity.
3. Compute-intensive applications have different scaling characteristics from those of lightweight user interface applications.
4. There is no reason to run the applications in different managed instance groups.
An instance group is adding more VMs than necessary and then shutting them down. This pattern is happening repeatedly. What would you do to try to stabilize the addition and removal of VMs?
1. Increase the maximum number of VMs in the instance group.
2. Decrease the minimum number of VMs in the instance group.
3. Increase the time autoscalers consider when making decisions.
4. Decrease the cooldown period.
A clothing retailer has just developed a new feature for its customer-facing web application. Customers can upload images of their clothes, create montages from those images, and share them on social networking sites. Images are temporarily saved to locally attached drives as the customer works on the montage. When the montage is complete, the final version is copied to a Cloud Storage bucket. The services implementing this feature run in a managed instance group. Several users have noted that their final montages are not available even though they saved them in the application. No other problems have been reported with the service. What might be causing this problem?
1. The Cloud Storage bucket is out of storage.
2. The locally attached drive does not have a filesystem.
3. The users experiencing the problem were using a VM that was shut down by an autoscaler, and a cleanup script did not run to copy the latest version of the montage to Cloud Storage.
4. The network connectivity between the VMs and Cloud Storage has failed.
Your development team has implemented a new application using a microservices architecture. You would like to minimize DevOps overhead by deploying the services in a way that will autoscale. You would also like to run each microservice in containers. What is a good option for implementing these requirements in Google Cloud Platform?
1. Run the containers in Cloud Functions.
2. Run the containers in Kubernetes Engine.
3. Run the containers in Cloud Dataproc.
4. Run the containers in Cloud Dataflow.
TerramEarth is considering building an analytics database and making it available to equipment designers. The designers require the ability to query the data with SQL. The analytics database manager wants to minimize the cost of the service. What would you recommend?
1. Use BigQuery as the analytics database, and partition the data to minimize the amount of data scanned to answer queries.
2. Use Bigtable as the analytics database, and partition the data to minimize the amount of data scanned to answer queries.
3. Use BigQuery as the analytics database, and use data federation to minimize the amount of data scanned to answer queries.
4. Use Bigtable as the analytics database, and use data federation to minimize the amount of data scanned to answer queries.
Line-of-business owners have decided to move several applications to the cloud. They believe the cloud will be more reliable, but they want to collect data to test their hypothesis. What is a common measure of reliability that they can use?
1. Mean time to recovery
2. Mean time between failures
3. Mean time between deployments
4. Mean time between errors
A group of business executives and software engineers are discussing the level of risk that is acceptable for a new application. Business executives want to minimize the risk that the service is not available. Software engineers note that the more developer time dedicated to reducing risk of disruption, the less time they have to implement new features. How can you formalize the group's tolerance for risk of disruption?
1. Request success rate
2. Uptime of service
3. Latency
4. Throughput
Your DevOps team recently determined that it needed to increase the size of persistent disks used by VMs running a business-critical application. When scaling up the size of available persistent storage for a VM, what other step may be required?
1. Adjusting the filesystem size in the operating system
2. Backing up the persistent disk before changing its size
3. Changing the access controls on files on the disk
4. Updating disk metadata, including labels
