7. Hybrid scenarios (Microsoft SQL IaaS)

This final chapter builds on what you've learned in the previous six chapters and discusses the various ways in which you can develop a hybrid environment, leveraging Azure services to complement your on-premises SQL Server environment.

We will explore several Azure licensing and technical offerings, including Azure Hybrid Benefit and Backup to URL. We will discuss the basic principles of disaster recovery, and then provide use cases for Azure VMs running SQL Server on Windows and Linux, known as infrastructure as a service (IaaS). We'll also cover the ways you can keep a workload in sync between your on-premises and Azure environments, and how these relate back to scalability, migration, and disaster recovery scenarios. We’ll finish with a summary of the chapter.

What is Azure Hybrid Benefit?

The majority of organizations that use SQL Server do so in an on-premises environment with SQL Server Standard or Enterprise editions running on physical or virtual machines in a datacenter.

With the advent of Azure cloud services, you might be tempted to explore this new world but feel that your investment in on-premises licensing is keeping you from doing so. With Azure Hybrid Benefit, you can make use of Azure services at a reduced cost, provided you have Software Assurance or an equivalent subscription license.

Note

Software Assurance is not a license. Instead, you can think of it as an additional benefit that either comes with the license or something that you purchase as an add-on as part of your volume licensing deal. You can read the Software Assurance Frequently Asked Questions at this Microsoft documentation1.

This unique offering gives you the opportunity to build scalable, highly available, and disaster-resilient solutions for your organization, without needing to invest in ongoing hardware maintenance, saving you money and resources in the process. Azure Hybrid Benefit even gives you a free asynchronous disaster recovery replica in Azure if you use availability groups. We discuss this scenario later in the chapter, in the Use cases for SQL Server on Azure VMs section.

There are three key scenarios that a hybrid SQL Server infrastructure provides:

  • A fully redundant disaster recovery environment in the cloud
  • Secure offsite backups to Azure Storage using Backup to URL
  • Read scalability across regions with readable secondaries and replication

    Note

    You can read more about Azure Hybrid Benefit in the Frequently Asked Questions available at this Microsoft documentation2.

Now that we've covered Azure Hybrid Benefit, let's dig into the basics of disaster recovery.

What is disaster recovery?

When disaster strikes, your job as a SQL Server database administrator is to ensure business continuity by recovering the data estate to a previously known good state, in as short a time as possible. Disaster recovery is your organization's insurance policy and relies on support from the organization as well.

Note

A disaster is any event that causes an unplanned interruption in business continuity through unrecoverable failure.

At all times, remember that high availability is not disaster recovery. You're considering what happens when high availability could also fail, even if it makes use of the same underlying technology.

A good disaster recovery plan starts with a healthy database. You can make use of the maintenance features inside SQL Server, including maintenance plans with SQL Agent inside SQL Server Management Studio and Azure Data Studio, PowerShell cmdlets, from the command line using SQLCMD, and those provided by third parties, to keep your databases in good health.

Take native backup—always. If you need to perform point-in-time recovery for any reason, then you need to know about the recovery models SQL Server offers (full, bulk-logged, and simple). This, in turn, will help you learn the difference between full, differential, and transaction log backups. Practice backing up and restoring databases, including system databases. Learn when you would need to do a copy-only backup, and how they affect differential backups.

You can back up your data estate using Azure Storage as a target, which we discuss in the Backing up databases to a URL section later in this chapter.

Once you've backed up your database, the only way to prove that your backups can be recovered to a previously known good state is to continuously test that backup. The best way to test a SQL Server backup is by restoring it and running a data consistency check with the DBCC CHECKDB Transact-SQL (T-SQL) command against it. No other method can provide the same peace of mind. If your organization uses file-system-level backups, make sure you also have SQL Server native backups close by.

Finally, and especially for the purposes of this chapter, you need to make sure that your verified backups are copied securely offsite (remember that most databases contain sensitive data) so that if something happens to your on-premises environment, you can get those backups and restore them as fast as possible. Encrypt your backups, even if you don't encrypt your database, and you should encrypt your database.

Note

Please refer to Microsoft Docs3 for information about disaster recovery with Azure SQL Database.

Disaster recovery is defined by your organization in a service-level agreement or SLA. This document may be part of a larger business continuity plan, and it should comprise at least two main points:

  • Recovery point objective (RPO)
  • Recovery time objective (RTO)

We'll talk about each of these in the following sections and give you a refresher on Accelerated database recovery (ADR).

Recovery point objective

When disaster strikes, your goal is to ensure minimal data loss, which is usually measured in seconds or minutes. The RPO defines how much data your organization is prepared to lose should something go wrong. For example, your office building suffers an electrical fire and the server room burns down. If your SLA states that up to 15 minutes of data loss is acceptable to the business, and you have a plan in place to keep your disaster recovery site in sync within a 5- to 10-minute window, then you are probably going to be within the requirements of the SLA.

Even with a hybrid setup, you need to make sure you are taking regular database backups and the backup frequency should be at least one half of the data loss window to allow time for backups to find their way to a secure offsite location. For example, say your RPO is 15 minutes. If your transaction log backups are set to a 15-minute interval and a disaster occurs after a backup has taken place but before those backups were copied securely offsite, your data loss will be greater than 15 minutes, taking you outside the requirements of the SLA.

Recovery time objective

After a disaster has taken down your data estate, the RTO dictates how quickly you need to bring up the environment again, which is usually measured in minutes or hours. It doesn't matter how you achieve this, and generally, budget and resource constraints will define the most cost-effective manner to do so. A major consideration is how you prioritize data recovery. You may have a lot of archive data that is infrequently queried, which does not need to be brought online immediately. Your SLA must define terms for this process as well.

Again, even in a hybrid environment, you must ensure that your database backups can be restored as quickly as possible. It is feasible that a failover could itself fail, so think of your backups as a last resort that must always work. Use a combination of full, differential, and transaction log backups to reduce this recovery window.

Note

You can think of transaction log backups as incremental backups, whereas differential backups can be used as a shortcut to avoid having to restore each transaction log in sequence. Both differential and log backups can be combined with the last full backup to bring your database online.

The DR plan should also make allowances for network-related failover, including DNS propagation. We recommend that you set your time-to-live (TTL) values to between 5 and 15 minutes, ensuring a faster failover if an IP address must be updated.

Accelerated database recovery

ADR is a new feature introduced in SQL Server 2019 that works at the database level. It is not enabled by default because it changes how SQL Server uses the data file. However, its benefits are most apparent during crash recovery, which happens whenever a database is brought online, restored from backup, or when SQL Server starts up. Instead of using the transaction log file or TempDB, ADR makes use of the main data file for keeping track of transaction state and can dramatically reduce rollback times for long-running transactions.

Note

ADR is also available in Azure SQL Database.

ADR increases data file usage, but the trade-off is improved recovery times, which benefits disaster recovery.

Now let's address some licensing benefits around disaster recovery before getting into the next section.

How does licensing influence disaster recovery?

Under the terms of Azure Hybrid Benefit, you are allowed two failover instances for disaster recovery for each primary workload, provided that one of these is located on an Azure VM. For the second server, the only condition is that it must be dedicated for your use. It can be hosted on-premises or on an Azure VM.

The benefit is calculated based on the number of cores licensed for your primary workload and depends on whether your primary workload is located on-premises or on an Azure VM. In both cases, you get one free passive core for high availability / disaster recovery and one free passive core for disaster recovery (asynchronous commit only). If your primary workload is on-premises, you also get one free passive core for DR on SQL Server on an Azure VM (asynchronous commit only).

Note that these benefits apply exclusively to the SQL Server core license. You are responsible for Azure-related costs including Azure Storage, compute, and networking. If you have questions, you can always speak to your Microsoft licensing specialist to determine what rights you have as per your enterprise agreement.

Note

Read more about these benefits on the SQL Server blog4. Additional information on SQL service licensing and disaster recovery benefits can be found at this Microsoft blog5.

In the next section, we'll look at an easy way to get database backups copied securely off-site.

Backing up databases to a URL

Starting with SQL Server 2012 with Service Pack 1 CU 2, you can back up your SQL Server database to an Azure Storage account, also known as Backup to URL. Since SQL Server 2016, you can even back up your database to a URL simultaneously with a regular on-premises backup. This allows you to have both a local and secure offsite backup of your databases from one backup command.

Note

Backup to URL can be executed from the SQL Server Management Studio (SSMS) backup wizard, T-SQL, SQL Server Management Objects (SMO), and PowerShell cmdlets, including third-party PowerShell modules. For more about this feature, visit Microsoft Docs6.

Backup to URL gives you the peace of mind that when disaster strikes, you will be able to access your backups almost immediately. With a solid disaster recovery plan, you can have a standby Azure VM restoring those backups on a regular schedule (see the As a backup-restore target section later in this chapter). You can even configure your Azure Storage account to have a geo-redundant replica in a secondary region, which provides even greater redundancy.

How to back up to a URL

To back up to a URL, you first need an Azure Storage account and container. The container must be set to private (the default) to ensure authorized access only. Then, you will create a Shared Access Signature (SAS) in the Azure portal to allow access to that container. Inside SQL Server, you will create a credential that uses that SAS when performing the actual backup and restore tasks.

Once you have a Storage account, container, and credential, you will then choose a blob type. With the SAS credential, you will choose the block blob type. If your backup exceeds 200 GB in size, you can use backup compression and striping to back up larger databases.

To get the best performance out of Backup to URL, Microsoft recommends using the following arguments:

  • BLOCKSIZE = 65536 (65,536)
  • MAXTRANSFERSIZE = 4194304 (4,194,304)

Now let's dive into the main section of this chapter, namely SQL Server on Azure VMs.

Use cases for SQL Server on Azure VMs

This section will demonstrate ways to leverage Azure VMs running SQL Server in a hybrid scenario, for scalability, migration, and disaster recovery.

As we mentioned previously in this book, SQL Server 2019 runs on both Windows and Linux, with almost complete feature parity between the two operating systems. You will access SQL Server on Linux using the same set of tools, and many of the common tools for scalability, migration, and disaster recovery, including backup and restore, log shipping, consistency checks, transactional replication, and so on. Even availability groups can be leveraged in a hybrid environment using an asynchronous commit.

The following sections present three sample use cases for Azure VMs in a hybrid environment:

  • As a backup-restore target
  • As an availability group replica
  • As a transactional replication subscriber

The important thing to remember here is that a technology or feature doesn't have a single use. Each sample can be used for read scalability, database migration, and disaster recovery. It is worth mentioning that these methods should be considered read-only where your on-premises instance is the single version of the truth. In other words, the data flows in one direction.

As a backup-restore target

You can keep a standby server in sync by restoring transaction log backups for one or more databases on a regular schedule. There is already a feature built into SQL Server called Log Shipping, which uses a shared network location to transfer regular backups between servers on the same network and restores them on a schedule. Using those same principles, you can achieve the same outcomes with an Azure VM as your target. You just need to ensure that the transaction log backups are securely copied to a shared location, where the standby server is waiting to pick them up and restore them in the correct order.

When leveraging an Azure VM as a standby server, you can back up your transaction logs to a URL, which stores those files in your Azure Storage account. After a successful backup, you can generate a T-SQL restore script from the backup history in your on-premises msdb database, and transfer that script to the same storage account (using a synchronization tool like AzCopy inside a PowerShell script, for example). On the Azure VM side, you will retrieve the T-SQL script on a schedule, also using AzCopy, and then run the script that restores the databases directly from Azure Storage.

Backup-restore is a cost-effective way to get your on-premises database to synchronize with a standby server, and it works on any edition of SQL Server. Keep in mind that all connections to the restoring database will be dropped when the next transaction log is restored.

As an availability group replica

While Always On availability groups provide a high availability solution to ensure that your environment has minimal downtime, they are not best suited to sharing a workload between your on-premises Windows instance and an offsite replica (running on Windows or Linux). For this, you would use either one or more clusterless replicas, or an entire distributed availability group, in combination with ADR.

A clusterless availability group replica is kept mostly in sync using the asynchronous commit mode. Depending on network latency and throughput, you may find that your VM is able to keep up with your workload. If it falls behind, you can monitor this according to the requirements of your SLA.

If you want more redundancy, with the understanding that this is still not a high availability scenario, you can build a distributed availability group with multiple nodes in its own availability group. Your on-premises primary replica will send data to the distributed availability group via the forwarder, which is the primary replica in the distributed availability group.

Note

You can read more about distributed availability groups from Microsoft Docs7.

As a transactional replication subscriber

A third option is transactional replication, which shares your workload between your on-premises environment and SQL Server on an Azure VM at the individual table level. You set up a publication and distributor on-premises and have the Azure VM as the subscriber. You define which tables will be synchronized, and let replication keep your databases in sync. This flexibility allows you to point end-users to the new Azure VM for read-only purposes, thereby scaling out your environment without the expense of scaling up your on-premises infrastructure.

The two types of SQL Server replication supported by both Windows and Linux are:

  • Transactional replication: Best for servers that are in constant communication and need to deliver data downstream all the time. Data flows from the primary database (distributor) to one or more secondary databases (subscribers) elsewhere. Remember to ensure that your replication environment is appropriately licensed.
  • Snapshot replication: Used for creating the original snapshot before transactional replication takes over and is useful when you need to perform a refresh of the entire data set.

    Note

    Peer-to-peer transactional replication and merge replication are not supported on Linux. If you require bi-directional sync, your publisher and subscriber should both use Windows Server. For more information about SQL Server replication, refer to this Microsoft documentation8.

SQL Server on Linux supports Active Directory (AD) authentication, which means that AD is supported with replication. Provided that the appropriate network ports are open on the Azure VM firewall (and associated Azure Network Security Group), SQL Server does not care whether the underlying operating system is Windows or Linux.

Note

You can follow a detailed walkthrough for setting up replication on Linux at Microsoft Docs9.

Hybrid scenarios

Taking what you've learned from this book up to this point, including the earlier parts of this chapter, you can start to build a picture in your mind of how SQL Server on an Azure VM can help you with creating a hybrid solution at any scale.

Whether for read-scalability, migration, or disaster recovery, you can run your on-premises SQL Server instance on Windows Server and have the same workload in SQL Server on an Azure VM at the same time.

Keep in mind that with Software Assurance, the free passive SQL Server replica can be used to synchronize with your primary replica (using asynchronous commit and manual failover) and run these maintenance operations:

  • Database consistency checks
  • Full and transaction log backups
  • Monitoring resource usage data

Additionally, you can run disaster recovery testing every 90 days with primary and disaster recovery replicas running simultaneously for brief periods.

Should you desire additional operations for your replica, including synchronous commit and automatic failover, your secondary disaster recovery replica must be appropriately licensed. With Azure Hybrid Benefit, for example, you can have a failover secondary in synchronous commit mode with automatic failover.

Scenario 1: Read scale workloads

You have an on-premises SQL Server database located on the West Coast of the United States and customers all over the world: Johannesburg, Toronto, Seoul, London, and Sydney. Your customers want to query the database for reporting and analysis, but they are complaining about performance introduced by network latency.

As the DBA, you notice performance issues on your production server due to locking and blocking. If you could somehow offload that workload into a read-only copy of the data that is closer to them, you could free up resources in your production environment.

This is a typical read-scale scenario. Using the customer distribution above, you can place Azure VMs running SQL Server on Windows or Linux in the regions closest to your customers, for example, West US (for customers closest to you), South Africa North, Canada Central, Korea South, UK South, and Australia East.

Note

You can see all available Azure regions at this Microsoft documentation10.

You can use all three use cases described in the previous section, with the following provisions:

  • Backup-restore: Customers will be unable to connect during a restore process. To ensure a better user experience, use ADR to speed up the rollback portion of the restore process, and add retry logic to your application code to work around disconnections.
  • Readable secondary replica: Queries on the readable secondary might impact performance on the primary replica and vice versa.
  • Transactional replication: Each table that will be replicated must be configured individually, but the benefit is that you only need to replicate the tables that are needed for read scalability. You are trading more configuration up front at the publisher and subscriber level, against more granular access to the data for your customers.

In fact, you could use a combination of all three, because customers in each location may have different requirements. You have a lot of flexibility with read-scale workloads, and a hybrid Azure infrastructure is well-suited to this scenario.

Scenario 2: Migrating a workload

Assume you are preparing to lift and shift an on-premises database or instance to SQL Server on an Azure VM.

You may think that the easiest way to move a database from one server to another is by detaching the database, moving or copying the files to the new server, and reattaching them; however, if you move the files, there is no way to roll back if the reattach fails. So always (always!) ensure that you have a valid, tested, full SQL Server backup of your database.

Note

Read more about detaching and attaching databases at Microsoft Docs11.

Irrespective of whether you restore or reattach the database, and depending on the version of SQL Server at the other end, the internal system tables will be automatically upgraded to that build. Once the database is brought online again, you can continue using it.

Both backup-restore and detach-attach require downtime. With techniques like log shipping, clusterless availability groups, or even replication, you can reduce the downtime dramatically.

We'll cover ways to keep your primary database up for as long as possible before cutting over to the new server, which also makes it easier to roll back if something goes wrong.

In this scenario, you can use two of the use cases described in the previous section, with the following provisions:

  • Backup-restore: This use case requires some downtime while failing over to the new environment. You should plan a maintenance window to cut over to the new VM and allow time for data consistency checks, DNS migrations (see the note on TTL earlier in this chapter), and updating application connection strings.
  • Distributed availability group: Instead of having a read-only replica, you will create a new distributed availability group. During a planned maintenance window, you will wait for any outstanding transactions to synchronize to the new availability group, then update application connection strings to point to the new availability group listener. The advantage is that the new availability group is already highly available.

SQL Server replication is not recommended for a migration scenario. While peer-to-peer transactional replication and merge replication both offer bi-directional synchronization, those benefits are outweighed by the complexity and lack of flexibility at a database level.

Note

You can use Azure Site Recovery for migrating a workload to Azure, but you must plan your failover during a maintenance window, and your target operating system must match your on-premises environment. You cannot use Azure Site Recovery to fail over from Windows to Linux. Visit Microsoft Docs12 for more information.

Scenario 3: Disaster recovery

Your disaster recovery plan is defined by the organization's SLA, and that SLA is guided by how much money the organization is prepared to spend, along with the resources it has available to perform the recovery. Your solution might be as simple as a PowerShell script that creates an Azure VM on the fly and restores the databases from a shared location like Azure Storage.

Or, as we've demonstrated in this chapter, you can dramatically reduce the RTO by having a standby server ready to go, which is kept as up to date as possible in an automated way, using backup-restore, clusterless replicas, distributed availability groups, Azure Site Recovery, or even replication. This way, when disaster strikes, you have much less to worry about. Even so, it is always worthwhile having a Plan B (where B stands for "backups"), where you can restore your databases no matter what.

Note

As mentioned throughout this chapter, using the same features to achieve different goals is common in SQL Server.

Should an unplanned failure occur and the potential data loss due to latency is within your RPO, you can be up and running on your new server in seconds or minutes, especially with ADR enabled.

In this final scenario, you can use two of the use cases described in the previous section, plus a bonus option:

  • Backup-restore: This can be used to keep your database in sync in the event of a disaster. Whether you use the built-in log shipping configuration or roll your own solution, SQL Server can keep restoring your log backups on a regular schedule, such that the amount of data you can afford to lose (defined by the RPO) is protected by the frequency of log file restores on your Azure Linux VM. With cost-effective monitoring in place, you can keep track of how far behind the target database is. With ADR, recovery time is dramatically reduced, especially for long-running transactions.
  • Clusterless replica or distributed availability group: You can reduce your RTO by having either a distributed availability group or a clusterless availability group replica on standby, keeping in mind that RPO. You can also keep track of the synchronization state using built-in monitoring features of SQL Server. You can even use the same underlying SQL Server instance that you configured for read scalability. The server can be readily failed over to a primary availability group in the event of a disaster. In the case of a distributed availability group, you are failing over to a highly available solution.
  • Azure Site Recovery: This is primarily a disaster recovery solution, where physical servers and VMs running on VMware or Hyper-V can be replicated to a secondary site. Applications and workloads supported include SQL Server. This is the recommended disaster recovery solution for a hybrid infrastructure.

    Note

    You can read more about SQL Server support in Azure Site Recovery on Microsoft Docs13.

Summary

Building a hybrid infrastructure leveraging Azure services that support your on-premises environment is straightforward. Whether for scalability, migration, or disaster recovery, an Azure VM running SQL Server works the same way on Windows or Linux, with the benefit of reduced OS licensing costs when considering Linux. With software assurance and Azure Hybrid Benefit, you can reduce your costs further by leveraging this Azure VM as a standby server using some of the techniques discussed in this chapter.

Chapter links

  1. https://bit.ly/36vt762
  2. https://bit.ly/3d3W9wg
  3. https://bit.ly/2XyofcB
  4. https://bit.ly/2yy72HH
  5. https://bit.ly/2ZyWvqN
  6. https://bit.ly/2LUVUaW
  7. https://bit.ly/2Ac3sna
  8. https://bit.ly/3bWtqbm
  9. https://bit.ly/3c1mSrV
  10. https://bit.ly/2WZvf2S
  11. https://bit.ly/2LXCWQV
  12. https://bit.ly/3gkxkOx
  13. https://bit.ly/2XvLl3u

By Randolph West

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset