© Julian Soh, Marshall Copeland, Anthony Puca, and Micheleen Harris 2020
J. Soh et al., Microsoft Azure, https://doi.org/10.1007/978-1-4842-5958-0_10

10. Virtual Machines

Julian Soh1, Marshall Copeland2, Anthony Puca3, and Micheleen Harris1
(1)
Washington, WA, USA
(2)
Texas, TX, USA
(3)
Colorado, CO, USA
 

Virtual machines (VMs) are usually where an organization's cloud experience begins, because a few key factors make VMs the easiest cloud service to deploy. Most organizations have used server virtualization for more than a decade at this point, so there is a fair amount of familiarity with provisioning, operations, and management concepts.

VMs are a supported environment since the customer manages everything from the operating system up the stack. Tens of thousands of applications have been validated to run no differently on VMs. Microsoft and third-party vendors have had tools to migrate workloads from VMware, Citrix, Microsoft, and AWS virtualization solutions to Azure for many years.

Azure VMs run on a customized Hyper-V platform. The tools and processes are well used, understood, and trusted. Development and testing, applications running on VMs in the cloud, and extending your datacenter into the cloud are the top use cases for VMs in Azure.

In this chapter, we review some of the VM features released in Azure over the last few years. These new features include high availability, disaster recovery, scalability, monitoring, VM resources, and security. While this is not a complete list, we try to focus on the newer and more popular items that have received a lot of recognition from the global community.

Creating and Managing Virtual Machines

The life cycle of a VM in Azure is simple. Azure VMs can exist in several different states. These states determine whether the VM is available or not and if it is billed. This is important to understand because it directly impacts what you’re billed for. Table 10-1 outlines the VM life cycle states and billing. It’s worth repeating: we’re talking about the billing of the compute, not the underlying storage, which is billed in blocks.
Table 10-1. VM Life Cycle States and Billing

State | Description | Billing
Starting | The VM is booting | Not billed
Running | Normal state of an operational VM | Billed
Stopping | Temporary state between "running" and "stopped" | Billed
Stopped | Virtual hardware is still allocated to the VM, and it remains on the host | Billed (compute charges continue because the hardware remains allocated)
Deallocating | Temporary state | Not billed
Deallocated | The VM has stopped, and virtual hardware is no longer allocated on the host | Not billed

Costs for VMs vary by the series, size, Azure cloud, region, the number of disks added to the VM, whether it is deployed from the Azure Shared Image Gallery (covered later in this chapter), and the amount and type of software preloaded in the VM.

Azure charges an hourly price for VMs based on processing power, memory, and storage capacity. For partial hours, Azure prorates to the minute.

Let's review reserved instances (RI). Azure Reserved Instances help you save money by committing to a one- or three-year consumption plan for workloads such as Azure Compute. The commitment is targeted to specific subscriptions, resource groups, and VM series. Azure Reserved Instances can save you up to 72% on the compute cost of Windows or Linux VMs because of the pre-committed nature of the reservation.

Let's agree to use 730 hours as the number of hours in a month; this equates to 30.42 days, which is the number we use for cost calculations given the mix of 31-, 30-, and 28-day months. If you estimate the number of hours your workload runs, you'll find that roughly 300 hours per month is the usual break-even threshold: if you expect to run that many hours or more, an Azure Reserved Instance is cheaper. You need to evaluate this on a case-by-case basis for each VM series and size because Azure Reserved Instances are discounted differently depending on the series and number of cores.
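To make the break-even idea concrete, here is a back-of-the-envelope calculation using placeholder prices rather than actual Azure rates; substitute the pay-as-you-go rate and effective reservation cost for the specific series and size you are evaluating.

# Hypothetical prices for illustration only; look up real rates in the Azure pricing calculator.
PAYG_RATE=0.10      # pay-as-you-go cost per hour
RI_MONTHLY=30.00    # effective monthly cost of a one-year reservation (paid whether the VM runs or not)

# Break-even hours per month = reservation cost / pay-as-you-go hourly rate
echo "scale=0; $RI_MONTHLY / $PAYG_RATE" | bc    # prints 300

If the workload runs more hours per month than the number printed, the reservation is the cheaper option.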

It is safe to say that if you are running a workload all the time, or for more than half the month, it's a no-brainer that Azure Reserved Instances save you money. We have seen many examples where a three-year Azure Reserved Instance was cheaper than combining a one-year reservation with pay-as-you-go, even though the workload was expected to retire in one or two years.

For more information on Azure reserved instances and how to apply them, please refer to https://docs.microsoft.com/en-us/azure/cost-management-billing/reservations/save-compute-costs-reservations.

Azure VMs have an industry-best monthly service-level agreement (SLA) of 99.9% for a single VM if it is deployed with premium storage. The SLA goes up to 99.95% when you have two or more VMs in an Azure Availability Set, which we cover later in this chapter. If you would like more specifics on any of Azure's service SLAs, please refer to https://azure.microsoft.com/en-us/support/legal/sla/.

Operating Systems (Windows, Linux)

Microsoft Azure supports various Microsoft and Linux distributions. What Microsoft supports as an operating system environment (OSE) in Azure differs from what is supported on-premises, what is available in the Azure Shared Image Gallery, and what is available in the Marketplace. Each location has a unique list of systems that can be deployed, and some Marketplace offerings run hardened, proprietary OSEs, such as those made by firewall vendors for network virtual appliances (NVA).

Azure has many services, such as Azure Security Center and Log Analytics, that can run in a hybrid architecture, which means that they can run on VMs in Azure, as well as VMs and physical servers on-premises or in other cloud providers. This results in a very complicated support architecture. Microsoft supports only 64-bit OSEs in Azure. In early 2020, Microsoft supported the following Windows operating systems.
  • Windows Server 2019

  • Windows Server 2016

  • Windows Server 2016 Core

  • Windows Server 2012 R2 (64-bit)

  • Windows Server 2012 (64-bit)

  • Windows Server 2008 R2 (64-bit)

  • Windows 10 (64-bit)

Microsoft maintains a subset of this list in the Azure Shared Image Gallery so that customers can deploy images of these OSEs that have been patched and updated directly by Microsoft, minimizing the updates needed to bring the VM current after it is deployed. The following OSEs are not available in the Azure Shared Image Gallery.
  • Windows Server 2016 Core

  • Windows Server 2012 (64-bit)

  • Windows Server 2008 R2 (64-bit)

Note

The (64-bit) designation is called out only for operating systems that also shipped a 32-bit version; Azure supports only the 64-bit editions.

Contrary to popular belief, Azure IaaS is not only for Microsoft-based OSEs. Approximately 50% of the VMs in Azure run some version of a Linux distribution; three years ago, that number was 40%. The following lists the Linux distributions available to Azure customers as of spring 2020.
  • CentOS 6.3+, 7.0+, 8.0+

  • CoreOS 494.4.0+

  • Debian 7.9+, 8.2+, 9, 10

  • Oracle Linux 6.4+, 7.0+

  • Red Hat Enterprise Linux (RHEL) 6.7+, 7.1+, 8.0+

  • SUSE Linux Enterprise (SLES)/SLES for SAP 11 SP4, 12 SP1+, 15

  • openSUSE Leap 42.2+

  • Ubuntu 12.04+

Microsoft requires Linux distribution manufacturers to update their images quarterly at a minimum, a more frequent cadence than these distributors' usual version releases. Microsoft also works with these manufacturers to "tune" their kernels for the Azure platform, incorporating new features and performance improvements. Azure-tuned kernels include those from CentOS, Debian, SLES, and Ubuntu. For more information on the specifics of endorsed Linux distributions in Azure, please refer to https://docs.microsoft.com/en-us/azure/virtual-machines/linux/endorsed-distros.

Shared Image Gallery

The Shared Image Gallery is a service managed by Microsoft that replicates images globally and exposes the gallery to customers at the tenant level, allowing RBAC to be used with the Shared Image Gallery across subscriptions. The Shared Image Gallery supports the versioning of images to facilitate change and release management across your cloud infrastructure. The Shared Image Gallery has a few resource types that assist in the management of your enterprise images in the cloud. The following list details these resources and their purpose.
  • Managed image: A basic image made from a generalized VM (generalized with the Windows Sysprep.exe tool; Linux systems use the Microsoft Azure Linux Agent waagent utility) that allows repeated creation of VMs from a single image in storage. Figure 10-1 shows the Sysprep tool user interface. The purpose of managed images is to quickly build VMs with preloaded apps and configurations, minimizing deployment times when the same installs and configurations are needed. For more information on how to use Sysprep to generate generalized installs, please refer to https://docs.microsoft.com/en-us/windows-hardware/manufacture/desktop/sysprep--generalize--a-windows-installation.

../images/336094_2_En_10_Chapter/336094_2_En_10_Fig1_HTML.jpg
Figure 10-1

Sysprep tool for master image preparation

  • Snapshot: A copy of a VHD that is in a deployed state. It is good for capturing a point in time of a VM or a VHD that you may want to return to for regression testing, rollback, cloning a disk, and so forth.

  • Image gallery: A repository integrated into the Azure portal’s virtual machines’ service blade for managing and sharing images.

  • Image definition: A description field where you can include information like the operating system, configurations, and release notes. The Image Definition field has three mandatory parameters for grouping and other management tasks: publisher, offer, and SKU. There are an additional nine optional parameters that may be used for easing resource tracking.

  • Image version: The artifact used to create a VM from the gallery. Multiple versions of an image are supported to make differencing the images as easy as possible.

Azure Shared Image Gallery includes high thresholds: 100 galleries per subscription per region, 10,000 image versions per subscription per region, and deployments of up to 1,000 VM instances in a single virtual machine scale set. It also leverages zone-redundant storage (ZRS) to mitigate entire zone failures in regions with Availability Zones, which we cover later in this chapter. Azure Shared Image Gallery can replicate images from one Azure region to another automatically, making your images available locally to remote regions and workforces. The only billing associated with Shared Image Gallery is the storage and the egress traffic incurred when image replication or downloads leave the Azure region where the gallery is located.
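As a rough Azure CLI sketch of how these resource types fit together, the following commands create a gallery, an image definition with the mandatory publisher, offer, and SKU parameters, and an image version sourced from an existing managed image. The resource group, names, source image, and regions are placeholders, not values from this chapter's environment.

# Create the gallery (placeholder names and location)
az sig create --resource-group myImageRG --gallery-name myGallery --location westus2

# Create an image definition; publisher, offer, and SKU are the three mandatory parameters
az sig image-definition create --resource-group myImageRG --gallery-name myGallery \
  --gallery-image-definition myWin2019Image --publisher Contoso --offer WindowsServer \
  --sku 2019-Datacenter --os-type Windows --os-state Generalized

# Create version 1.0.0 from an existing managed image and replicate it to a second region
az sig image-version create --resource-group myImageRG --gallery-name myGallery \
  --gallery-image-definition myWin2019Image --gallery-image-version 1.0.0 \
  --managed-image myManagedImage --target-regions westus2 eastus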

For more information on the Shared Image Gallery, please refer to https://docs.microsoft.com/en-us/azure/virtual-machines/windows/shared-image-galleries.

Uploading Custom Images

Customers' existing images, or newly created on-premises images, may be uploaded to Azure easily by uploading the VHD file to an Azure managed disk. There is no longer a need to stage the VHD in an Azure Storage account and move it around afterward. Now you merely create an empty Azure managed disk and then upload your VHD to it, provided it isn't over the 32 TB threshold allowed.

To upload your own custom on-premises images to Azure, follow these directions.
  1.  Download and install the latest Azure CLI, which is used for the az commands in the steps that follow.

  2.  Download and install AzCopy, which is used in step 11 to upload the VHD.

  3.  Upload a VHD file. You can easily create one on Microsoft Hyper-V on any Windows OS from Windows 8 up. If you want a ready-made Windows 10 Enterprise VM with Visual Studio installed, download it from https://developer.microsoft.com/en-us/windows/downloads/virtual-machines/.

    The VHD needs to be a fixed-size disk; it cannot be a dynamically expanding virtual hard disk (see Figure 10-2).

     
../images/336094_2_En_10_Chapter/336094_2_En_10_Fig2_HTML.jpg
Figure 10-2

Fixed-size disk type in Microsoft Hyper-V

  4.

    Determine the VHD file size in bytes.

    Note Use the Size value from Windows Explorer, not the “Size on Disk” value.

     
  5.
    Open a command prompt and type az login. This opens a browser to authenticate you to Azure from the CLI. You then see the list of subscriptions that you have access to.
    You have logged in. Now let us find all the subscriptions to which you have access...
    [
        {
           "cloudName": "AzureCloud",
           "homeTenantId": "724243233.86f1.44af.91ah...."
     
  6.
    Run the following command from the Azure CLI with the file size determined in step 4.
    az disk create -n filename.vhd --subscription yoursubscriptionnamehere -g yourresourcegroupnamehere -l westus2 --for-upload --upload-size-bytes 2147484160 --sku standard_lrs --verbose

    Note If you would like to upload a premium SSD or standard SSD, replace standard_lrs with premium_LRS or standardssd_lrs.

     
  7.
    You see a page’s worth of text results, including the following.
    "location": "westus2",
    "name": "filename.vhd"
    "resourceGroup": "yourresourcegroupnamehere"
    "sku": {
          "name": "Standard_LRS"
          "tier": "Standard"
    },
    Command ran in x seconds.
     
  8.

    Look at the disk in the resource group blade in the portal. It is provisioned in a ReadyToUpload disk state, as shown in Figure 10-3.

     
../images/336094_2_En_10_Chapter/336094_2_En_10_Fig3_HTML.jpg
Figure 10-3

Disk in ReadyToUpload state

  9.

    To generate a shared access signature (SAS), which provides secure delegated access to resources in your storage account by securing it with the storage account key and a timeframe that you specify in seconds at the end of the command, continue as follows. Note how the 2 GB VHD is given 24 hours (86,400 seconds) to upload.

    From the command line, run the following:
    az disk grant-access -n filename.vhd --subscription yoursubscriptionnamehere -g yourresourcegroupnamehere --access-level Write --duration-in-seconds 86400
     
  10.

    The command takes several seconds and shows - Running .. while it executes. You should then see the confirmation shown in Figure 10-4.

     
../images/336094_2_En_10_Chapter/336094_2_En_10_Fig4_HTML.jpg
Figure 10-4

SAS generation success

Note

To save you time and aggravation, copy the returned SAS URI (everything from https to the end) and save it in a text editor.

  11.
    Use AzCopy to copy the local VHD to Azure. To do so, run the command built from the path to your VHD and the SAS URI you saved from the previous step. Your result looks like Figure 10-5, except for the time, which is based on your VHD size and Internet bandwidth.
    AzCopy.exe cp "C:\OneDrive\Documents\Book\Azure Second Edition\Chapter 10\data_disk1.vhd" "https://md-impexp-df0jd4vhqf4k.blob.core.windows.net/k5ph55c4qvbc/abcd?sv=2017-04-17&sr=b&si=7fb359dd-c20d-47f4-b671-f6ed8eb2a5be&sig=Py3kYsySaDxhUyStp0yjG4S1q2QpdELW0r%2FPqzRjcnY%3D" --blob-type PageBlob
     
Note

AzCopy logs to C:\Users\%username%\.azcopy, which is useful for troubleshooting.

../images/336094_2_En_10_Chapter/336094_2_En_10_Fig5_HTML.jpg
Figure 10-5

SAS upload completed

  12.
    Revoke access to the VHD in Azure. This is necessary to change the status of the VHD from ReadyToUpload, as shown in step 8, to Unattached. Run the following command to revoke access:
    az disk revoke-access -n filename.vhd --subscription yoursubscriptionnamehere -g yourresourcegroupnamehere
     
  13.

    By browsing the Azure portal to the disk, you see that its status has changed to Unattached, as shown in Figure 10-6. It is ready to attach to a VM or be used as your base image.

     
../images/336094_2_En_10_Chapter/336094_2_En_10_Fig6_HTML.jpg
Figure 10-6

VHD successfully uploaded to Azure

It is important to understand what was done with the shared access signature in this exercise. It temporarily (for a timeframe specified by the administrator) allowed access to an Azure resource group within a subscription and allowed the uploading of a file. Minimizing the timeframe and ensuring that access is revoked when the work is completed is imperative for security reasons: anyone who has the SAS can use it, because the token is embedded in the string after the storage URI.

This is a simple exercise that lets you upload any image you want to Azure very easily; including the az login step, only five commands are needed to seed Azure with your virtual hard disks.

Virtual Machine Disks

Azure virtual machines (VMs) use virtual hard disks (VHDs) to store their OS and data. Disks are provisioned in specific, user-selected sizes and types. Azure disks are completely managed by Azure's fabric. The disks you create on top of this storage infrastructure include ultra disks, premium solid-state drives (SSD), standard SSDs, and standard hard disk drives (HDD). Each VM you deploy in Azure has an OS disk, as well as a temporary disk that is not a managed disk. The temporary disk provides temporary storage for applications and is intended to be used like a swap file, but as an entire disk.

Data on temporary disks is destroyed by certain events, such as redeployment or various types of maintenance. This is important to understand because the data does persist through a reboot, which can lull users into thinking it is a standard disk. Data disks can be attached at image deployment time or anytime afterward, as shown in the sketch that follows. Remember to go into Disk Management within Windows after attaching a data disk to assign a drive letter and present it to the OS.
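For example, a new, empty data disk can be created and attached to an existing VM with a single Azure CLI command; the resource group, VM name, disk name, size, and SKU below are placeholders.

# Create and attach a new 128 GB premium data disk to an existing VM
az vm disk attach --resource-group myRG --vm-name myVM \
  --name myDataDisk01 --new --size-gb 128 --sku Premium_LRS

# The disk must still be initialized, partitioned, and formatted inside the guest OS
# (Disk Management on Windows, or fdisk/parted and mkfs on Linux) before it can be used.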

Azure managed disks are designed to have 99.999% availability. Every block is, at minimum, triple replicated in a locally redundant storage (LRS) container. Think of the underlying Azure Storage container as a type of RAID (redundant array of inexpensive disks). LRS keeps three copies in your Azure datacenter; zone-redundant storage (ZRS) spreads copies across multiple Availability Zones within a single Azure region. Geo-redundant storage with read access (GRS-RA) has three copies in your Azure region's datacenter and three copies in another Azure region's datacenter, with at least 600 miles between the datacenters for geographic fault tolerance. This storage architecture has led to Microsoft having an industry-leading 0% annualized failure rate.

Azure managed disks are integrated with Azure availability sets to ensure that the disks of the VMs in the set are not all placed in the same storage scale unit (stamp), avoiding another single point of failure. If a stamp failed due to a hardware or software fault, only the VMs whose disks are on that stamp would be affected; the other VMs in the availability set would continue to run because their disks are on different stamps.

Azure managed disks support role-based access control (RBAC) to assign permissions to a disk. RBAC permissions can be configured to prevent administrators from performing actions such as exporting the VHD. This is important to those who have strict lockdown procedures in place for virtual workloads. For example, it prevents someone from exporting a domain controller's VHD to brute-force its Security Account Manager (SAM) database or Active Directory (AD) database offline without being detected.

Azure managed disks are automatically encrypted at the storage layer by Azure server-side encryption. This is done to ensure that data in Azure satisfies the various regulatory compliance requirements Microsoft is held to. Customers can also encrypt disks with their own customer-managed key instead of the Azure platform's key. To protect your data at the OS and application level, Microsoft recommends you use Azure Disk Encryption, which is enabled in the Azure portal by going into the VM, selecting Disks, clicking Encryption on the ribbon, and selecting your desired option, as shown in Figure 10-7. This encryption process is integrated with Azure Key Vault and allows you to manage your disk encryption keys.
../images/336094_2_En_10_Chapter/336094_2_En_10_Fig7_HTML.jpg
Figure 10-7

Azure disk encryption on a VM
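If you prefer scripting over the portal workflow shown in Figure 10-7, Azure Disk Encryption can also be enabled from the Azure CLI. This is a minimal sketch assuming an existing Key Vault in the same region as the VM; the names are placeholders.

# Allow the Key Vault to be used for disk encryption (placeholder names)
az keyvault update --name myKeyVault --resource-group myRG --enabled-for-disk-encryption true

# Enable Azure Disk Encryption on the VM's OS and data volumes
az vm encryption enable --resource-group myRG --name myVM \
  --disk-encryption-keyvault myKeyVault --volume-type All

# Check the encryption status
az vm encryption show --resource-group myRG --name myVM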

Image Builder

Azure Image Builder is a new service that allows organizations to build baseline, gold, or master images for large groups of users or services with different requirements. These images can also be easily customized to suit the specific needs of a user or service. Whatever the common denominator for your deployment, it can be built into a master image and patched at intervals to keep it current, streamlining future deployments.

Azure Image Builder integrates with several configuration server solutions, including DSC, Chef, Puppet, Terraform, System Center Configuration Manager (SCCM), File Shares, and so forth. Once the images are created, they can be published to the Azure Shared Image Gallery for administrators to deploy in a streamlined process.

Monitoring the Health of Virtual Machines

Monitoring the health of VMs in Azure is accomplished in several ways. Microsoft provides basic telemetry data of a VM from the Azure portal on the Overview tab of the VM, as shown in Figure 10-8. These performance graphs illustrate CPU usage, network usage, total disk bytes, and disk IOPS. Mouse over any of the graphs, and you’ll see the data for the point on which you are hovering.
../images/336094_2_En_10_Chapter/336094_2_En_10_Fig8_HTML.jpg
Figure 10-8

Azure portal VM monitoring

Azure customers can enable Boot Diagnostics to analyze why a VM is not booting in Azure. This can be done at deployment time in the portal, via Azure Policy or Azure scripting, or after deployment in the portal under the Boot Diagnostics section of the VM (scroll down toward the bottom of the screen pictured in Figure 10-8). Boot Diagnostics, under the Support and Troubleshooting section of the virtual machine, writes the boot data into an Azure Storage account. You can use one storage account for multiple VMs. It is a best practice to keep that storage account local to the region the VMs are in and to name it accordingly for the VMs that are writing to it.
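As a quick scripted alternative to the portal, boot diagnostics can be enabled on an existing VM from the Azure CLI; the VM, resource group, and storage account URI below are placeholders.

# Point boot diagnostics at a storage account in the same region as the VM
az vm boot-diagnostics enable --resource-group myRG --name myVM \
  --storage https://mydiagstorage.blob.core.windows.net/

# Retrieve the serial log when troubleshooting a boot problem
az vm boot-diagnostics get-boot-log --resource-group myRG --name myVM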

It is highly recommended to enable the collection of guest OS diagnostics data. This can be done during VM creation in the Azure portal or by any of the methods described earlier for enabling boot diagnostics. By adding this extension to Windows or Linux VMs, administrators get much more CPU, memory, and disk data. The key piece of monitoring here is visibility into how much RAM the guest VM is using. This is a requirement if the deployed service uses autoscaling, because the Azure autoscaling service needs to know what is in use within the VMs.

Alerts can be created for any of the performance and billing metrics Azure can monitor. These alerts are triggered by a performance counter exceeding or going below a threshold. Azure alerts are configured in the portal, by using templates, or from the Azure CLI. Azure alert rules include dynamic thresholds, where the monitoring alert rule adjusts itself dynamically for you by adapting to the changes of the counter.
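A minimal CLI sketch of an alert rule follows, using a static CPU threshold; the names are placeholders, and dynamic thresholds (described next) can be chosen instead when you define the condition.

# Look up the resource ID of the VM to use as the alert scope (placeholder names)
VM_ID=$(az vm show --resource-group myRG --name myVM --query id --output tsv)

# Alert when average CPU exceeds 80% over a 5-minute window, evaluated every minute
az monitor metrics alert create --name cpu-over-80 --resource-group myRG \
  --scopes "$VM_ID" --condition "avg Percentage CPU > 80" \
  --window-size 5m --evaluation-frequency 1m \
  --description "CPU is running hot on myVM"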

Dynamic thresholds use machine learning (ML) to automatically detect metric patterns and adapt over time as those patterns change. This suppresses alerts that would otherwise fire because of seasonality, such as daily, weekly, or month-end events that naturally create spikes that are normal for that time. Ultimately, dynamic thresholds use the ML algorithm to filter out what is known as noise in the IT monitoring and management field.

Figure 10-9 shows how a dynamic threshold detected an anomaly in the CPU counter.
../images/336094_2_En_10_Chapter/336094_2_En_10_Fig9_HTML.jpg
Figure 10-9

Azure dynamic threshold monitoring

Azure Resource Health is another blade available on a VM that gives administrators insight into whether the Azure platform is experiencing issues that could be affecting the VM.

Azure Service Health is an Azure service that provides a view into the Azure portal. It identifies which service is having issues, what the issues are, their root cause once resolved, and the regions the issue is impacting. It provides time stamps and several other attributes relevant to the issue reported. Azure Service Health is a convenient and time-saving tool to use whenever you think you need to contact Microsoft about an issue you are experiencing.

To get the most robust and detailed visibility into a VM's performance, configuration, and operations, Azure Monitor and its Application Insights subcomponent should be deployed. Azure Monitor is an agent-based solution that uses the Microsoft Monitoring Agent to run within the Windows or Linux VM and send data to Azure Log Analytics workspaces. The data is then analyzed to determine whether application configuration changes are needed to satisfy best practices, whether there are performance issues, which dependencies exist on other systems based on network connections, and so forth.

Application Insights, which is a feature of Azure Monitor, is the evolution of Microsoft's Application Performance Management (APM). Application Insights runs at the .NET or Java EE (formerly J2EE) application layer and detects performance anomalies to help administrators identify where the anomalies are coming from. The application monitored can be on-premises, in Azure, or in any other cloud provider environment. Application Insights integrates with Visual Studio App Center and solves the age-old problem of application developers pointing their finger at the network, and system or network administrators pointing their finger at the application. Figure 10-10 shows Azure Monitor providing basic health reporting in the dashboard of a virtual machine.

Application Insights monitors the following application metrics.
  • Request rates, response times, and failure rates: Learn which pages are most popular, at what times of day, and where your users are. See which pages perform best. If your response times and failure rates are higher when there are more requests, then perhaps you have a resourcing problem.

  • Dependency rates, response times, and failure rates: Learn whether external services are slowing you down.

  • Exceptions: Analyze the aggregated statistics or pick specific instances and drill into the stack trace and related requests. Both server and browser exceptions are reported.

  • Page views and load performance: Reported by your users’ browsers.

  • AJAX calls from web pages: Rates, response times, and failure rates.

  • User and session counts.

  • Performance counters from your Windows or Linux server machines, such as CPU, memory, and network usage.

  • Host diagnostics from Docker or Azure.

  • Diagnostic trace logs from your app so that you can correlate trace events with requests.

  • Custom events and metrics that you write yourself in the client or server code, to track business events such as items sold or games won.

../images/336094_2_En_10_Chapter/336094_2_En_10_Fig10_HTML.jpg
Figure 10-10

Azure Monitor

Figure 10-11 shows Azure Advisor recommendations on the security and configuration of a virtual machine. Azure Advisor provides tailored guidance on the resources monitored by Azure Monitor. The guidance provided by Azure Advisor is prioritized on the severity of the configuration detected and its risk exposure. Azure Advisor uses industry best practices to point out what’s misconfigured or insecure and simultaneously provides actionable steps to remediate the vulnerability. In this case, we see how Azure Advisor can point out vulnerabilities both within the virtual machine and improvements available within the Azure fabric to improve the VM’s availability.
../images/336094_2_En_10_Chapter/336094_2_En_10_Fig11_HTML.jpg
Figure 10-11

Azure Advisor

All the data is presented to the administrator in easy-to-view charts and graphs, with weights applied to the various items discovered and reported so that the most urgent appear at the top. The data collected and reported also surfaces in Azure Security Center, enabling you to focus on the tasks needed to properly secure your environment. In Azure Advisor, recommendations are made to give your workloads the best-performing and most secure configuration.

Securing Virtual Machines

Azure secures VMs in several ways, including deployments based on Azure Blueprints, the Azure fabric, disk encryption, network encryption, and network shielding from various threats, including DDoS attacks and known botnets.

Change tracking, security, and monitoring data in Azure Security Center identify risks in your configuration. Just-in-time (JIT) access in Security Center, the prioritization of actions needed to harden your VMs in Azure Advisor, and more than 1,300 policies enforce the security posture your organization needs for each specific workload.

Azure Security Center's just-in-time access, shown in Figure 10-12, is a service in Azure Security Center that allows customers to lock down the external-facing ports of a VM until an administrator needs access. This reduces risk exposure by not leaving the workload exposed to the world, where users and bad actors can attempt brute-force and other types of attacks to gain access to the workload.

Just-in-time access works at the Azure fabric layer, which allows Azure Security Center to manipulate the Azure network security groups and Azure firewalls discussed in Chapter 2 and Chapter 9. Azure Security Center opens the NSG on the specified port(s) to the IP address(es) listed in the configuration for the specified amount of time. When the time is up, Azure Security Center closes the port on the Azure NSG and firewalls accordingly.
../images/336094_2_En_10_Chapter/336094_2_En_10_Fig12_HTML.jpg
Figure 10-12

JIT VM access configuration

Azure policies enforce the configurations to maintain consistency over time, personnel changes, upgrades, and so forth. Azure Policy behaves much like a Microsoft Group Policy Object (GPO); however, it does not require the Windows system to be part of an Active Directory (AD) domain, and it works on Windows and Linux systems.

Azure Security Center (ASC) enables administrators to quickly assess how secure their various workloads are and prioritizes what they need to address based on a risk score determined by Microsoft's support tickets and cybersecurity teams weighing the importance of certain infrastructure deficiencies. The lessons learned by Microsoft Support and Security have directly influenced how Azure Security Center is designed and the data that surfaces to each customer based on their global security posture experience.

The most common challenges addressed by Azure Security Center are
  • Rapidly changing workloads

  • Increasingly sophisticated attacks

  • Security skills in short supply

Key benefits of Azure Security Center include
  • Strengthens your security posture

  • Protects against threats

  • Helps you get secure faster

Azure Security Center is where administrators can go to check their Secure Score, which is a rank of where they are compared to where they should be from a security point of view based on the services deployed in the subscriptions viewed at the time. It is also where Azure policies, threat protection, regulatory compliance, recommendations, and several other security protections are surfaced in the Azure portal. Figure 10-13 shows the Azure Security Center dashboard.
../images/336094_2_En_10_Chapter/336094_2_En_10_Fig13_HTML.jpg
Figure 10-13

Azure Security Center dashboard

Azure Security Center is covered in more detail in Chapter 2 as part of the native Azure services that help strengthen customers’ security posture. Figure 10-14 shows how Azure Security Center can display the network topology as it provides continuous monitoring of the LAN and WAN and illustrates the relationship between resources talking on each. This is a demonstration of Azure Security Center monitoring over 400 resources across 100 virtual networks across approximately 25 subscriptions, some of which are on different Azure AD tenants also.
../images/336094_2_En_10_Chapter/336094_2_En_10_Fig14_HTML.jpg
Figure 10-14

ASC mapping a network topology

While we’re discussing the security benefits of Microsoft Azure, we should briefly overview Microsoft’s Extended Security Updates (ESU). Microsoft has had a consistent method of providing security updates on their products for decades. Microsoft provides five years of mainstream support followed by another five years of extended support, which includes regular security updates. This provides a total of ten years of security-based support.

Once you exceed the ten-year threshold, you typically have only two options: get no security support or pay for extended security updates, which cost 75% of the license cost each year for up to three years. That's a total of 225% of the cost of the product over three years because it has not been upgraded, for whatever reasons you may have. These frequently include a workload that is retiring or a lack of vendor support for the newer OS or SQL platforms.

Many customers were faced with Microsoft SQL Server 2008 and Windows Server 2008 hitting the end of their respective lives in July 2019 and January 2020. This process repeats for deployments of Windows Server 2012, which reaches end of life in 2023.

You may be thinking, what does Microsoft Azure have to do with the boxed product end-of-life dates? Azure provides a couple of benefits with this issue. First, Azure SQL, the PaaS service, is continuously updated by Microsoft on the back end. This means customers never have to worry about end-of-life support issues with Azure SQL offerings like Azure SQL Database or Azure SQL Managed Instance. Another benefit available only to Azure customers is the ability to move 2008 workloads, whether Windows Server or SQL Server, to Azure and get the extended security updates without having to pay 75% of the license cost annually for the updates. This gives customers three more years to continue running the legacy database and operating systems while they determine an exit strategy.

For more information on the Microsoft Lifecycle Policy, please refer to https://support.microsoft.com/en-us/lifecycle/selectindex.

Troubleshooting

Some services have adjustable limits. When deploying virtual machines in Azure, there are several quotas administrators should be aware of. Your subscription has default quota limits in place that could impact the deployment of many VMs for your project. The default limit is 20 vCPUs (cores) per subscription per region. This may seem, and is, quite small. It is there to prevent administrators from mistakenly creating large deployments without realizing the financial implications of doing so. Limits can easily be raised by filing a no-charge Microsoft support ticket requesting an increase to the quota.
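Before a large deployment, you can check the current quota limits and usage for a region from the Azure CLI, as in this sketch (the region is a placeholder):

# Show per-region compute quotas (vCPU totals, per-series limits) and current usage
az vm list-usage --location westus2 --output table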

Quota increases do not apply to free trial subscriptions. Free subscriptions have several limitations placed on them, including monetary credit limits and restrictions on the number and types of resources available. High-end compute series for VMs do not show up in free subscriptions. If you exceed your monetary credit in a free subscription, all the resources are shut down until the next billing cycle starts. For example, if you exceed your $50, $100, or $150 MSDN credit before 30 days have passed, the resources halt until the 30-day clock resets.

These are common reasons why administrators cannot start or resize virtual machines in Azure. For more information on Azure subscription and service limits, quotas, and constraints, please refer to https://docs.microsoft.com/en-us/azure/azure-resource-manager/management/azure-subscription-service-limits.

Other common issues with virtual machines include uploading a Windows VM as specialized when it is actually generalized, or uploading it as generalized when it is actually specialized. Both generate timeout errors while the VM is stuck at the out-of-box experience (OOBE) screen. If uploading as generalized, the image must be Sysprepped. Collecting Azure VM activity logs can assist in troubleshooting VM deployments.

Improving VM Availability

In the following sections, we review the high availability and business continuity/disaster recovery solutions natively available in Azure. In this chapter, we review the primary services designed to provide these capabilities to virtual machines, not PaaS services, which have resiliency and zone-redundancy options built into them. There are also hundreds of solutions available in the Azure Marketplace from third-party vendors and OEMs that provide high availability and disaster recovery using their own software and licensing.

Availability Zones

Azure Availability Zones are physically separate zones within an Azure region, each with its own power, cooling, and networking. Availability Zones protect workloads in an Azure region from datacenter failure, providing high availability within the region. VMs deployed across Availability Zones carry an SLA of 99.99%.

Availability Zones support two categories of Azure Services.
  • Zonal services: Azure services that can be pinned to a specific Availability Zone, such as a virtual machine or a managed disk.

  • Zone-redundant services: Azure services that are replicated across Availability Zones automatically as part of the service, such as zone-redundant storage or Azure SQL databases.

Note

The numbering of Azure Availability Zones is not consistent across subscriptions. Availability Zone 3 in one subscription might be a different physical zone than Availability Zone 3 in another subscription within the same Azure region. Do not rely on Availability Zone numbering across workloads to mean anything.

Because Azure Availability Zones provide a high-availability service within an Azure region, they support high-speed data transfers, the distances involved being relatively small compared to cross-geography transfers. This means Azure Availability Zones can support synchronous replication of the application and data across zones for high availability. Administrators can then use services such as Azure Site Recovery (covered later in this chapter) for disaster recovery, utilizing asynchronous replication across geographies from one Azure region to another.
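Pinning a VM to a specific Availability Zone from the Azure CLI only requires the --zone parameter; in this sketch, the resource group, VM name, and image are placeholders.

# Create a zonal VM pinned to Availability Zone 1 of the region
az vm create --resource-group myRG --name myZonalVM \
  --image UbuntuLTS --zone 1 --admin-username azureuser --generate-ssh-keys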

Availability Sets

Availability sets are groups of virtual machines deployed across a datacenter's multiple fault domains to mitigate hardware or large-scale outages at the rack or datacenter where the virtual machines reside (see Figure 10-15). A fault domain is the hardware that the compute, storage, and networking run on. Availability sets make sure there is not only redundancy, but also a separation of the systems across hosts, racks, storage, networking equipment, and the power subsystem servicing all of these components. Availability sets allow Microsoft to provide and meet the SLA even if hardware failure occurs. Virtual machines in an availability set are unique, with their own names and configurations, usually driven by the requirements of a commercial off-the-shelf (COTS) application.
../images/336094_2_En_10_Chapter/336094_2_En_10_Fig15_HTML.png
Figure 10-15

Azure availability set architecture
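A minimal sketch of creating an availability set and placing VMs into it with the Azure CLI follows; the names and the fault and update domain counts are placeholders and should match what your region supports.

# Create the availability set with explicit fault and update domain counts
az vm availability-set create --resource-group myRG --name myAvSet \
  --platform-fault-domain-count 2 --platform-update-domain-count 5

# Create two VMs in the set so they land in different fault and update domains
az vm create --resource-group myRG --name myAppVM1 --image UbuntuLTS \
  --availability-set myAvSet --admin-username azureuser --generate-ssh-keys
az vm create --resource-group myRG --name myAppVM2 --image UbuntuLTS \
  --availability-set myAvSet --admin-username azureuser --generate-ssh-keys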

Disaster Recovery

Disaster recovery (DR), also known as business continuity/disaster recovery (BCDR), is frequently confused with high availability (HA). The most basic definition of high availability is no disruption to the service over a long period. Most organizations' administrators define that as running a workload across maintenance windows without interruption. High availability is usually measured in uptime.

Disaster recovery, sometimes referred to as continuous operations or Continuity of Operations (COOP), is the process of planning, defining, and instrumenting how an organization continues to provide essential services or mission-critical functions during an event that disrupts normal business operations. The disruption events planned for can vary from natural disasters to acts of terrorism or war, depending upon what the service is and its necessity after the disaster occurs. The key difference here is that a small outage, which could be anywhere from minutes to days, could be considered acceptable. For this reason, disaster recovery services frequently employ asynchronous replication technologies to replicate data over long distances, away from the primary location that was subjected to the disaster incident.

Microsoft Azure allows customers to replicate from one region to another region that is hundreds of miles or kilometers apart. Depending upon your geography, you’ll find that 300 miles or greater are preferred for the Azure services predefined replication partners. This distance ensures the durability of customer data and workloads in the event of a regional disaster, such as an act of nature.

Because the speed of light over fiber is approximately 124,000 miles per second, synchronous replication is limited to approximately 100 statute miles as a rule of thumb, which works out to about one millisecond of latency per 100 miles, or two milliseconds if you're counting the round trip. Asynchronous replication technology becomes the preferred solution for long distances. Asynchronous replication is very tolerant of high latency and can work over thousands of miles.

Azure Site Recovery

Cloud providers now have numerous disaster recovery as a service (DRaaS) solutions. Azure Site Recovery (ASR) is a disaster recovery solution that protects workloads by allowing Azure to be the target for failover in the event of a disaster for on-premises physical x64 systems, Microsoft Hyper-V virtual machines, VMware virtual machines, and Azure virtual machines, including on-premises Azure Stack virtual machines. Azure Site Recovery protects both Windows and Linux operating systems and workloads by replicating the blocks of data on the source systems to the disks on the targeted systems, which, if they are in Azure, are powered off virtual machines generating no cost.

Azure Site Recovery can fail over workloads to Azure or to another customer location. Failover can be automated or approved as part of a workflow. Failing back to on-premises can also be performed once the disastrous event has been resolved. Figure 10-16 shows how ASR is enabled on a VM in an Azure region in two clicks and how the replication partner is predefined to leverage Microsoft Azure's global network backbone, which has been optimized for this type of traffic between Azure region pairs. The destination Azure region is configurable.
../images/336094_2_En_10_Chapter/336094_2_En_10_Fig16_HTML.jpg
Figure 10-16

Azure site recovery predefined replication

Azure Site Recovery is frequently used as a migration tool, by failing over virtual machines to Azure and then leaving them running there. Azure Site Recovery also supports protecting Amazon Web Services (AWS) virtual machines and those of other cloud providers but lacks the ability to fail back due to the lack of access to the source's infrastructure. Azure Site Recovery replicates data in encrypted form and then stores the data on disks in Azure that are encrypted at rest, as discussed in Chapters 2 and 7.

In relation to backups or disaster recovery, the term recovery point objective (RPO) expresses the amount of data that can be lost, and recovery time objective (RTO) expresses the timeframe expected to restore service. Azure Site Recovery provides continuous replication for Azure VMs, Azure Stack VMs, and VMware VMs, and a replication frequency as low as 30 seconds for Hyper-V VMs.

You can further reduce RTO by integrating with Azure Traffic Manager to automate the DNS resolution of the workloads. We're not saying that this is the timeframe you will get, because that depends on network latency, bandwidth, and the volume of change the protected workloads are experiencing. We're merely stating what the service provides and supports, and what you can expect when using the Microsoft backbone.

Azure Site Recovery is not a backup service. ASR does not restore to a point in time; it continuously tries to mirror the source system to Azure. In doing so, a corrupted file on the source is replicated to Azure as-is, and a changed file cannot be reverted to a previous version. These are tasks for which Azure Backup should be used.

Azure Site Recovery supports the sequencing of multitier apps, which allows administrators to start one workload in an app before another workload that would be dependent on the predecessor. For example, you can start the database server before you start the web server that connects to the database server. Azure Site Recovery also allows administrators to test their DR plans by simulating failovers without impacting the production workloads.

Figure 10-17 illustrates how an n-tier app can be protected, provide automated failover, and client redirection to Azure.
../images/336094_2_En_10_Chapter/336094_2_En_10_Fig17_HTML.png
Figure 10-17

Azure site recovery protection

An entire book could be written on Azure Site Recovery, the DR plans, architectures, and various possibilities from other cloud providers to using it as a migration tool. Several different Azure Site Recovery architectures could be deployed to address several workloads and requirements. For a quick guide on how to set up ASR for your workload, please refer to https://docs.microsoft.com/en-us/azure/site-recovery/azure-to-azure-quickstart. The following list outlines Azure Site Recovery’s capabilities.
  • Simple BCDR solution

  • Azure VM replication

  • On-premises VM replication

  • Workload replication

  • Data resilience

  • RTO and RPO targets

  • Keep apps consistent over failover

  • Testing without disruption

  • Flexible failovers

  • Customized recovery plans

  • BCDR integration

  • Azure automation integration

  • Network integration

Azure Site Recovery is relatively cheap in comparison to other DR solutions on the market. Protecting workloads to your own DR site with Azure Site Recovery can be done for as little as $16 per month per protected instance, but you are still responsible for all the costs associated with the secondary location you're failing over to.

Protecting an instance to Azure costs $25 per month, and you aren't paying for the cost of the VMs in Azure until they're powered on when a failover occurs. There is no charge for ingress traffic to Azure, as discussed in Chapter 9. Whether you are using Azure Site Recovery to replicate your workloads to your own datacenter or to Azure, the first 31 days of protection for each instance are free, which allows customers to try Azure Site Recovery at no cost.

Scale Sets

Scale sets, also known as virtual machine scale sets and frequently confused with availability sets, are built on availability sets but contain identical virtual machines, which makes autoscaling and high availability much easier goals to achieve. Scale sets, illustrated in Figure 10-18, include load-balancing capability built into the service; no third-party solution is required.

Scale sets have APIs that can be called to provide a virtual desktop infrastructure (VDI)-like experience with deployments, imaging, updates, and so forth. For example, a scale set of 100 DS13v2 Ubuntu Linux virtual machines can be deployed in about five minutes, each having its own 4 TB data disk, which is only responsible for approximately 17 seconds of the overall deployment time!

These virtual machines each have 8 cores and 60 GB of RAM, providing 800 cores and 6 TB of memory to the service. This example illustrates the very rapid deployment and scaling capability provided by scale sets, while still incorporating all the resiliency of availability sets, since they include no fewer than five fault domains.
../images/336094_2_En_10_Chapter/336094_2_En_10_Fig18_HTML.png
Figure 10-18

Azure scale set architecture

A sample template to deploy the scale set is at https://github.com/Azure/azure-quickstart-templates/tree/master/201-vmss-public-ip-linux.
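As an alternative to the template, a scale set can also be created directly from the Azure CLI; this sketch uses placeholder names, a small instance count, and a CPU-driven autoscale rule.

# Create a scale set of three identical Ubuntu VMs behind the built-in load balancer
az vmss create --resource-group myRG --name myScaleSet \
  --image UbuntuLTS --instance-count 3 --vm-sku Standard_DS2_v2 \
  --upgrade-policy-mode automatic --admin-username azureuser --generate-ssh-keys

# Add an autoscale profile that keeps between 2 and 10 instances
az monitor autoscale create --resource-group myRG --name myAutoscale \
  --resource myScaleSet --resource-type Microsoft.Compute/virtualMachineScaleSets \
  --min-count 2 --max-count 10 --count 3

# Scale out by two instances when average CPU exceeds 75% over five minutes
az monitor autoscale rule create --resource-group myRG --autoscale-name myAutoscale \
  --condition "Percentage CPU > 75 avg 5m" --scale out 2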

Dedicated Hosts

Azure Dedicated Hosts were recently released as a new service that lets customers run their virtual machines on the same class of hardware Azure uses for everyone else, but without sharing that hardware with other customers or subscriptions. The guests on an Azure Dedicated Host must all reside in the same Azure subscription. Customers may provision dedicated hosts within a region, Availability Zone, and fault domain, but they cannot use scale sets, and only a limited set of VM series is available. Check the Azure products available by region website at https://azure.microsoft.com/en-us/global-infrastructure/services/ for a detailed list of which regions and VM series are available.

Dedicated Hosts support the DSv3, ESv3, and FSv2 series. Dedicated Hosts allow Azure customers to isolate all of their VMs on the host from other customers' workloads while using the same shared network and storage. Maintenance windows may include updates to the Azure platform infrastructure to improve reliability, performance, and security, and to launch new features. Dedicated Hosts give customers the ability to opt in to or out of maintenance windows within an Azure region, for up to 35 days, to more finely control the maintenance impact on their workloads.

Customers are limited to 3000 vCPUs on dedicated hosts per Azure region. Like most Azure Quotas, this is a default setting and can be upgraded by filing a request through the Azure portal. This process usually takes a couple of days to be approved by the Azure provisioning team. When looking at Azure Quotas, it is important to evaluate all the quotas that may be in effect on a resource. For example, you may have a lower quota available for a specific virtual machine size within a certain region, which could be lower than the Dedicated Hosts quota.

Just like on-premises, when using Dedicated Hosts, if high availability is desired, virtual machines must be deployed across two or more dedicated hosts. Azure Availability Zones provide an additional level of fault tolerance. When deploying virtual machines on dedicated hosts in an Availability Zone, all virtual machines deployed to the hosts must be created in the same Availability Zone. Availability Zones are dedicated, isolated parts of an Azure region. Each Availability Zone is made up of one or more datacenters equipped with independent power, cooling, and networking.
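A rough sketch of provisioning a dedicated host and landing a VM on it from the Azure CLI follows; the host group, host SKU, names, zone, and VM size are placeholders, and the VM size must come from the series the host SKU supports (a DSv3-Type1 host runs Dsv3-series VMs).

# Create a host group pinned to Availability Zone 1 with two fault domains
az vm host group create --resource-group myRG --name myHostGroup \
  --zone 1 --platform-fault-domain-count 2

# Create a dedicated host in the group
az vm host create --resource-group myRG --host-group myHostGroup \
  --name myHost1 --sku DSv3-Type1 --platform-fault-domain 0

# Look up the host's resource ID, then create a VM placed on that host
HOST_ID=$(az vm host show --resource-group myRG --host-group myHostGroup \
  --name myHost1 --query id --output tsv)
az vm create --resource-group myRG --name myHostedVM \
  --image UbuntuLTS --size Standard_D4s_v3 --host "$HOST_ID" --zone 1 \
  --admin-username azureuser --generate-ssh-keys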

When dedicated hosts are deployed, customers are billed per host. The number, size, and usage of the virtual machines on the host are no longer factored into billing. Customers only see a bill for the hosts; the virtual machines appear on the bill, but with a price of $0. Storage, networking, and other services or licenses are billed as usual.

The Azure Dedicated Hosts service is different from most cloud services due to this billing model, where the customers are reserving the host entirely for themselves. This model is very similar to colocation hosting , but Microsoft owns the compute assets; customers are renting not only the space of the hardware footprint but the asset itself. A virtual machine’s state has no impact on billing. A dedicated host with no virtual machines generates the same bill as one with dozens of virtual machines.

Proximity Placement Groups

Proximity placement groups are a new Azure resource that allows customers to group resources that have low-latency requirements between each other, minimizing the impact of traversing an Azure region as much as possible. Proximity placement groups enable co-locating VMs that need low latency within the same datacenter in an Azure region. This is important because an Azure region may have dozens of datacenters, and the distance between servers in the same region could still be significant. Proximity placement groups are used with virtual machines, availability sets, or virtual machine scale sets. They provide the following features.
  • Low latency between stand-alone VMs

  • Low latency between VMs in a single availability set or a virtual machine scale set

  • Low latency between stand-alone VMs, VMs in multiple Availability Sets, or multiple scale sets (You can have multiple compute resources in a single placement group to bring together a multitiered application.)

  • Low latency between multiple application tiers using different hardware types

When using virtual machines from different VM series, they’re usually configured with different networking and storage capabilities. They have different hardware architectures, which indicates that they’re likely in different racks. When moving existing virtual machines into a Proximity Placement Group for colocation purposes, Azure administrators should shut the virtual machine down to allow it to be moved across the Azure region infrastructure.

Note

When seeking the lowest latency possible between workloads, place the virtual machines in a proximity placement group and the entire solution in a zone. When seeking the most resilient architecture, place your instances across multiple Availability Zones.

To create a proximity placement group (PPG) using the Azure CLI, try the following.
az group create --name myPPGGroup --location westus
az ppg create \
   -n myPPG \
   -g myPPGGroup \
   -l westus \
   -t standard
Next, place a virtual machine in the proximity placement group.
az vm create \
   -n myVM \
   -g myPPGGroup \
   --image UbuntuLTS \
   --ppg myPPG \
   --generate-ssh-keys \
   --size Standard_D1_v2 \
   -l westus

Finally, it is important to measure the virtual machine's latency to serve as a baseline that can be referenced later. The act of measuring this is referred to as benchmarking. Many tools benchmark system performance across various virtual and physical hardware configurations. Although Ping.exe measures reachability and latency, ideally you want to simulate the workload as closely as possible to achieve the most accurate benchmark.

When using proximity placement groups, it is best to measure between two virtual machines vs. pinging Bing.com, where the results can naturally vary over time. Latency measurements are useful in the following scenarios.
  • Establish a benchmark for network latency between the deployed VMs

  • Compare the effects of changes in network latency after changes are made to the following.
    • Operating system or network stack software, including configuration changes

    • A VM deployment method, such as deploying to an Availability Zone or proximity placement group

    • VM properties, such as Accelerated Networking or size changes

    • A virtual network, routing, or filtering changes

ICMP is frequently blocked by network firewalls and routers. To measure latency, Microsoft recommends two tool options: latte.exe for Windows and SockPerf for Linux.

Using these tools helps ensure that only TCP or UDP payload delivery times are measured, not ICMP (ping) or other packet types that aren't used by applications and don't affect their performance.

Spot Virtual Machines

Spot virtual machines allow customers to capitalize on idle Azure capacity by running workloads at a deeply reduced price. Spot VMs are only available when Azure has spare capacity; when Azure needs the capacity back, it evicts Spot virtual machines. This means that workloads run on Spot virtual machines must tolerate that type of service interruption.

Workloads like batch processing jobs and dev/test environments are good examples of Spot virtual machine use cases. A Spot VM offers no high availability guarantees, and when Azure needs the capacity back, the Azure infrastructure evicts Spot VMs with 30 seconds' notice. Running production workloads that carry any kind of SLA on Spot VMs is not recommended.
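A minimal sketch of requesting a Spot VM from the Azure CLI is shown below; the names are placeholders, --max-price -1 means "pay up to the current pay-as-you-go price rather than a fixed cap," and the eviction policy deallocates rather than deletes the VM when Azure reclaims the capacity.

# Create a Spot VM that is deallocated (not deleted) when Azure evicts it
az vm create --resource-group myRG --name mySpotVM \
  --image UbuntuLTS --priority Spot --max-price -1 \
  --eviction-policy Deallocate --admin-username azureuser --generate-ssh-keys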

Summary

While this chapter reviewed many of the features available to Azure virtual machines, it’s hard not to feel like we barely scratched the surface of what’s available. We hope this chapter provides some clarity around the various options available to make your workloads much more highly available and fault-tolerant.
