Self-healing infrastructures

Another important paradigm to adopt for cloud-native applications as it relates to scalability and availability are self-healing infrastructures. A self-healing infrastructure is an inherently smart deployment that is automated to respond to known and common failures. Depending on the failure, the architecture is inherently resilient and takes appropriate measures to remediate the error.

The self-healing aspect can apply at the application, system, and hardware levels. The cloud has completely taken responsibility for "hardware self-healing". No such thing technically exists, as we have yet to figure out a way to repair broken hard disks, torched CPUs, or to replace burned-out RAM without human interaction. As cloud consumers, however, the current state of affairs mimics an idealistic future. CSPs deploy people behind the scenes to repair and replace failing hardware resources quickly and furtively. By our strict definition, we have yet to approach self-healing physical infrastructures since human intervention is still needed. However, as cloud consumers, we do not need worry about this, as we are completely divorced from the physical layer.

At the system and application level, we have many methods at our disposal to help build self-healing infrastructures and cloud-native applications. A few examples are given as follows, and we will cover tools to use later in this chapter:

Auto-scaling groups are a perfect example of self healing systems. While we typically associate ASGs with scalability, ASGs can be tuned to discard and reprovision unhealthy VMs with new ones. Sending custom health metrics or heartbeats to your monitoring system is key to utilizing this capability. It's important to architect apps as stateless, since this will allow sessions to be handed across one VM to another within an ASG.
DNS Health Checks available on the cloud platforms allow users to monitor and act upon the health of a specific resource, the status of native monitoring services, and the states of other health checks. You can then intelligently and automatically reroute traffic based on the health of the stack.
Instance (VM) Autorecovery is a feature that CSPs such as AWS provide to automatically recover an unhealthy instance when there is an underlying hardware failure. If there is a loss of network connectivity, loss of system power, software issues on the physical host, or hardware issues on the physical host that impact network reachability, AWS will replicate the instance and notify the user.
Database failover or cluster features are available through managed DB services on the major CSPs. Services such as AWS RDS (Relational Database Service) give users the ability to provision multi-AZ deployments. By using SQL Server Mirroring for SQL; Multi-AZ deployments for Oracle, PostgreSQL, MySQL and MariaDB; and clustering for Amazon's proprietary DB engine called Aurora, these services support highly available DB stack components. In the event of a failure, the DB service automatically fails over to a synchronized standby replica.

Table of Contents for Self-healing infrastructures

Create new playlist

Sign In

Sign Up

Table of Contents for
Self-healing infrastructures