Chapter 16
Designing and implementing hybrid and Azure database infrastructure

In this chapter, we examine the options for designing a database infrastructure where some or all your data is hosted in a public cloud, specifically the Microsoft Azure cloud. The chapter begins with an overview of cloud and Azure concepts. We then discuss the considerations for deploying SQL Server-based infrastructure using infrastructure-as-a-service (IaaS), platform-as-a-service (PaaS), or in a hybrid fashion. We end the chapter with a cursory listing of some non-SQL Server data platform services in Azure.

Note

Microsoft Azure consists of multiple “clouds,” including Azure Government and the sovereign clouds in China and Germany. The content of this chapter applies to the public cloud. Service and feature availability may vary in the other environments.

Cloud computing and Microsoft Azure

You have likely already heard or read many different definitions of cloud computing. Rather than add yet one more, we will briefly discuss some key features of cloud computing and how they apply to SQL Server offerings in Azure.

  • Financial accounting. The first concept relates to how expenses are accounted for. With traditional on-premises environments, there is usually a significant initial outlay of capital. This is called capital expenditure, or “CapEx.” Expenses in Azure, on the other hand, generally fall under the category of operational expenditure, or “OpEx.” With OpEx, there is no initial monetary outlay and mostly no long-term financial commitment. The fees you pay are pay-per-use charges and are all-inclusive: hardware, licensing, electricity, monitoring, and more.

    Under some Azure subscription models, you are incentivized for committing to a minimum annual spend in return for a reduced service cost. It is important to note that OpEx might not be cheaper than CapEx overall, depending on how efficiently cloud services are provisioned and used. Those considerations are beyond the scope of this text, but we strongly encourage you to plan early for optimizing your resource allocation.

  • Elasticity. The second concept in cloud computing is elasticity, which means that the resources you provision are not fixed in terms of capacity. In on-premises environments, you provision hardware and software (licensing) sufficient to accommodate peak demand. In Azure, elasticity gives you the ability to scale up and down or out and in as needed to accommodate demand at any given moment.

  • Control. Finally, control also becomes a discussion topic. With on-premises deployments of SQL Server, the DBA team decides which hardware to select, when to apply patches, and when to upgrade to a major new release. If you select an Azure platform-as-a-service offering, it’s the team at Microsoft that makes these decisions. The team announces changes and updates using a variety of channels, and, as a cloud DBA, one of your tasks will include regularly reviewing these announcements. You will need to thoroughly understand your Azure environment to determine which changes or updates will affect your application(s).

Database-as-a-service

Azure provides many types of services, including virtual machines (VMs), web applications, and, of course, Azure SQL Database. Cloud services are often categorized in one of three types: infrastructure-as-a-service (IaaS), platform-as-a-service (PaaS), and software-as-a-service (SaaS). In this book, we often refer to Azure SQL Database as database-as-a-service (DBaaS), which is a specialized type of PaaS.

You may also choose to host SQL Server databases in the cloud using Azure VM images, which can come with a SQL Server version preinstalled. In that case, you are using IaaS. With IaaS, you gain increased control and complete feature parity with on-premises deployments. IaaS also introduces more responsibility for sizing the VM specifications appropriately and managing software updates for both the operating system (OS) and SQL Server.

Managed instance (MI) is a more recent addition to the SQL Server offerings in Azure. MI strikes a balance between the near-complete feature parity of IaaS and the reduced management burden of PaaS. We discuss managed instance in Chapter 18.

Managing Azure: The Azure portal and PowerShell Core

When you are ready to begin using Azure, you will need to deploy, manage, and eventually tear down resources when applications are retired or upgraded. To manage on-premises Microsoft environments, you might use various GUI tools (often based on the Microsoft Management Console) or PowerShell. The primary GUI in Azure is the Azure portal. It will help if you also become comfortable managing your resources using PowerShell. With the advent of PowerShell Core a few years ago, PowerShell scripts can now be executed on a variety of platforms: Linux, macOS, and, of course, Windows.

A third option for managing Azure is the Azure Command-Line Interface (CLI). You can use the Azure CLI across platforms (Windows, macOS, and Linux) and within the portal using Azure Cloud Shell.

Note

PowerShell Core is available cross-platform, but not all PowerShell commands are supported on non-Windows operating systems.

Unlike Windows PowerShell, PowerShell Core is not yet part of the Windows operating system, or indeed any other OS. If you need to install PowerShell Core, refer to the Microsoft Docs at https://docs.microsoft.com/powershell/scripting/install/installing-powershell. The GitHub releases page lists preview releases in addition to stable releases, and the releases shown first might be previews, so scroll down to find the release marked latest.

For managing Azure and Azure SQL Database using PowerShell Core, you should always use the latest Azure PowerShell module. The module is updated frequently, so be sure to regularly check for updates. You can install the PowerShell module using the following PowerShell command, run with Administrator privileges:

Install-Module -Name Az -AllowClobber

Note

The -AllowClobber parameter is generally only necessary if you have the AzureRM module installed in Windows PowerShell. The AzureRM PowerShell module will be maintained through December 2020, but hasn’t received feature updates since December 2018. You should plan to migrate any scripts that use the AzureRM module to the Az module.
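If you have existing scripts written against the AzureRM module, the Az module can ease the transition: it includes the Enable-AzureRmAlias cmdlet, which maps the old AzureRM cmdlet names to their Az equivalents. A minimal sketch:

```powershell
# Map AzureRM cmdlet names (e.g., Get-AzureRmSubscription) to Az cmdlets
# for the current session only; other -Scope values can persist the aliases.
Enable-AzureRmAlias -Scope Process

# Legacy script lines such as this will now resolve to Get-AzSubscription:
Get-AzureRmSubscription
```

Aliases are a stopgap to keep old scripts running; you should still plan to rewrite them against the Az cmdlet names.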

Be patient while the module downloads all its child modules; there are quite a few of them to manage most Azure services.

If you need to update the module, use the following:

Update-Module Az
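Because the Az module is a collection of many child modules, it can be helpful to confirm which version is installed locally and whether a newer one is available in the PowerShell Gallery before updating. One way to check, using cmdlets from the PowerShellGet module:

```powershell
# Version currently installed on this machine
Get-InstalledModule -Name Az | Select-Object Name, Version

# Latest version published to the PowerShell Gallery
Find-Module -Name Az | Select-Object Name, Version
```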

Before you can manage your Azure resources with PowerShell Core, you will need to log in. Use the Login-AzAccount cmdlet, which uses the device login mechanism.

Note

At the time of writing, the device login mechanism, which requires you to copy and paste a code into a website to authenticate, is the only option for the Az module in PowerShell Core. Later, Microsoft will likely implement the modern authentication flow, as is the case with the AzureRM and Az modules in Windows PowerShell.

After logging in, the Login-AzAccount cmdlet outputs some information about the active subscription. If you need to switch the subscription, use the Get-AzSubscription cmdlet to see a list of all subscriptions your account can manage. You can then use the Select-AzSubscription cmdlet to change the active subscription to another one. The commands that follow illustrate this, assuming you have a subscription named “Pay-As-You-Go.”

Get-AzSubscription

# A list of subscriptions is displayed.

# You can copy and paste a subscription name on the next line.

Select-AzSubscription -SubscriptionName 'Pay-As-You-Go'

Note

This section intentionally does not cover managing Azure resources. Managing SQL resources in Azure is discussed in Chapter 17 and Chapter 18.

Finally, Azure exposes a complete REST API that third-party or in-house developed tools can use to perform virtually any action in the Azure environment. Developers may not even need to call the APIs directly because official client libraries are available for many platforms. The use of the REST API is not covered in this book.

Azure governance

Even relatively modest on-premises environments require governance—the organizational processes and procedures by which the environment is managed and responsibilities are delineated. Governance is also a necessity in cloud environments. In this chapter, we can’t delve into all of the governance issues related to cloud operations. We do, however, discuss some features of Azure that allow governance to be formalized.

Azure resources are organized in a hierarchy of containers. The container at the top level is the subscription. The subscription is primarily a billing boundary—all resources in a single subscription appear on a single bill and have the same billing cycle. There are also life cycle consequences: should a subscription be discontinued, all resources within the subscription will stop. (Eventually, the subscription and resources will be deleted.) Security configuration is also associated with the subscription: a subscription trusts a single Azure Active Directory (Azure AD) instance. This means that all user accounts used to manage resources within the subscription must exist within the trusted Azure AD instance. Microsoft accounts, or user accounts from other Azure AD instances, can be added as external users to the trusted instance. An organization can choose to have multiple Azure subscriptions that trust the same Azure AD instance.

As Figure 16-1 illustrates, a single subscription can have many resources of several types, and Azure SQL Database is just one. Resource groups exist to allow organizing these resources by life cycle and to provide a security boundary. Resource groups are logical containers that have a name and a little metadata. The resources in a resource group are deleted if the resource group itself is deleted, hence the life cycle relationship between the resources and the resource group.

The diagram shows several rounded rectangles representing three Azure subscriptions and one Azure Active Directory. There is a double arrow between the Azure Active directory and each subscription, indicating a trust relationship. Inside one of the Azure subscriptions, there are three rounded rectangles representing resource groups. Inside one resource group, there are three rounded rectangles representing individual Azure resources.

Figure 16-1 The container relationship between an Azure subscription, resource groups, and resources.

Using role-based access control, you can grant permissions on a resource group, and those permissions apply to the resources within the group. Configuring permissions this way can be a huge timesaver and increase visibility into permission assignments. This is discussed in more detail in the “Security in Azure SQL Database” section in Chapter 17. Up front, you need to know that the permissions assigned to the Azure SQL Database resource don’t grant permissions to log in to the database itself.
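As a sketch of how a role assignment at the resource group level looks in PowerShell, the following grants a hypothetical user the built-in SQL DB Contributor role on a resource group (the user and group names are placeholders):

```powershell
# Grant the built-in SQL DB Contributor role on an entire resource group.
# The assignment applies to all Azure SQL Database resources in the group,
# but it does not grant the ability to log in to any database.
New-AzRoleAssignment -SignInName 'dba@contoso.com' `
    -RoleDefinitionName 'SQL DB Contributor' `
    -ResourceGroupName 'SQL-RG'
```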

Note

You can move resources between resource groups and subscriptions. When moving resources, the Azure region in which they are located will not change, even if the target resource group’s location is different. The location of a resource group determines only where that resource group’s metadata is stored, not where the actual resources are hosted.
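Moving a resource between resource groups can also be scripted. An illustrative sketch, with placeholder resource names:

```powershell
# Look up the resource, then move it to another resource group.
$resource = Get-AzResource -ResourceGroupName 'Old-RG' -Name 'sql-server-01'
Move-AzResource -DestinationResourceGroupName 'New-RG' `
    -ResourceId $resource.ResourceId
```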

Cloud-first

If you’ve been working with SQL Server for more than a few years, you’ve likely noticed the increased release cadence. This is a direct result of the cloud-first approach in SQL Server product development that Microsoft adopted a few years ago. Cloud-first in this context means that new features are generally first made available in Azure SQL Database as a preview feature. Those preview features are usually opt-in and are closely monitored by the product team. The close monitoring allows the team to quickly identify usage patterns and issues. These features are then included in the next SQL Server release. Examples of features released this way for SQL Server 2019 include the SHORTEST_PATH graph function, intelligent query processing, and UTF-8 support. Previously, Always Encrypted, dynamic data masking, and graph tables were made available in Azure SQL Database prior to their release in SQL Server 2017.

Resource scalability

Scalability is a key feature of cloud computing, and can be considered along a vertical axis of performance (scaling up) or a horizontal axis (scaling out). Scaling up means that additional hardware resources are assigned to a server or to a database. Scaling out means that a database is either broken up into multiple databases, each holding a portion of the data (sharding), or that additional copies of the database are made available for read-only connections, such that those read-only connections are offloaded from the primary database, which will handle the writes.

Generally, scalability with PaaS resources is easier than with IaaS resources. Scaling a VM up or down causes some downtime, but it still is considerably less onerous than moving a database to different hardware as the workload changes. You can scale up or scale down an Azure SQL Database with minimal impact to applications. Depending on the size of the database and the nature of the scale operation, the operation can take several hours to complete. This scaling activity is completed while the database is online, though we must point out that at the end of the scale operation, existing connections are dropped. This benefit of PaaS, among many others, may encourage you to look at PaaS first for your cloud database needs. That is a valid strategy: choose PaaS unless it cannot meet your requirements, in which case you fall back to IaaS.
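As an example of how lightweight a PaaS scale operation is to initiate, this sketch changes the service objective of an Azure SQL Database using the Az.Sql module (resource names and the target tier are placeholders):

```powershell
# Scale a database to the Premium P2 service objective. The database
# stays online during the operation; existing connections are dropped
# when the scale operation completes.
Set-AzSqlDatabase -ResourceGroupName 'SQL-RG' `
    -ServerName 'contoso-sql' `
    -DatabaseName 'SalesDb' `
    -RequestedServiceObjectiveName 'P2'
```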

When managing many databases, each with potentially many different scaling needs, you should also consider running these databases in an elastic pool. An elastic pool is a grouping of databases on a single logical server that share the resources available within the pool. We discuss elastic pools in depth in the “Elastic database pools” section later in this chapter.

Scaling out in Azure SQL Database (recall, that is the PaaS offering) can be accomplished using read-only replicas in service tiers that support them. Read-scale replicas are discussed later in this chapter, along with elastic pools, which are well suited to hosting sharded databases.

Networking in Azure

Networking in Azure is an extensive topic that cannot be fully covered in a single section of this chapter. Therefore, we focus on the aspects of Azure networking that are critical to the successful operation of the various types of SQL Server deployments.

In Azure, a virtual network is a resource type and it’s appropriately called Virtual Network (VNet). A VNet defines an IP range, subnets, and more, as discussed later in this section. In some Azure deployments, you may not find a VNet: many PaaS resources don’t require it and some deployments wouldn’t benefit from having a VNet. However, many IaaS and some PaaS resources require a VNet to be present. These include the virtual machine and managed instance resource types.

You might not be too excited about exposing services to the public Internet. To secure your virtual network, you can use network security groups (discussed a little later) or Azure Firewall. Azure Firewall is a fully managed service that provides granular control over connections to and from the virtual network.

A VNet can be divided into multiple subnets. Each subnet might then have its own route table, network security group, and service endpoints. Brief definitions of some of these networking features follow:

  • Network security group. A network security group combines security rules that define the inbound and outbound traffic that is allowed or denied. A network security group can be applied to a subnet or a single VM network interface. Network security groups can define source and target IP ranges and ports, service tags, or application security groups.

  • Service tag. A service tag allows you to define a rule in a network security group without knowing the details about source or target IP addresses. An example service tag relevant for this book is Sql.KoreaCentral. This service tag defines the IP addresses of the Azure SQL Database service in the Korea Central region of Azure.

  • Application security group. An application security group (ASG) is essentially just a name. An ASG can be assigned to multiple network interfaces, and a network interface can have multiple ASGs assigned. The ASG can then be used instead of a source or destination IP or IP range, thus simplifying the maintenance of security rules as VMs are added or removed from your Azure infrastructure. An example of how ASGs can simplify rules is available at https://docs.microsoft.com/azure/virtual-network/security-overview#application-security-groups.

  • Service endpoint. A service endpoint is a connection between the VNet and Azure services. Using service endpoints increases security because the Azure services deployed to the VNet using a service endpoint are accessible only from the VNet. Service endpoints also provide routing benefits because the traffic between the resources in the VNet and the Azure services does not take the same route as Internet traffic. Not all Azure services support service endpoints, but Azure SQL Database does.
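To make the network security group and service tag concepts concrete, here is a sketch that builds a single outbound rule allowing SQL traffic (port 1433) to the Azure SQL Database service in one region, identified by a service tag rather than explicit IP ranges, and places it in a new network security group. All names are placeholders:

```powershell
# Rule: allow outbound TCP 1433 from the virtual network to the Azure SQL
# service in Korea Central, using the Sql.KoreaCentral service tag.
$rule = New-AzNetworkSecurityRuleConfig -Name 'allow-sql-outbound' `
    -Access Allow -Protocol Tcp -Direction Outbound -Priority 100 `
    -SourceAddressPrefix 'VirtualNetwork' -SourcePortRange '*' `
    -DestinationAddressPrefix 'Sql.KoreaCentral' -DestinationPortRange 1433

# Create a network security group containing the rule; it can then be
# associated with a subnet or a network interface.
New-AzNetworkSecurityGroup -Name 'sql-nsg' -ResourceGroupName 'Net-RG' `
    -Location 'koreacentral' -SecurityRules $rule
```

Note that some Azure SQL Database connection policies also use the 11000–11999 port range for redirected connections, so a production rule set may need to be broader.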

Private Link, which is in preview at the time of writing, is a technique that allows you to assign a private IP address from a VNet to a PaaS service. By enabling Private Link, public endpoints traditionally assigned to PaaS services can be completely ignored. Connections to the service can be established only from the same VNet, a peered VNet, a cross-region VNet-to-VNet connection, Express Route, or VPN. Like service endpoints, only select PaaS services support Private Link. Azure SQL Database supports it, but managed instance does not (at least not yet). An interesting aspect of the Private Link configuration for Azure SQL Database is that both inbound and outbound Internet access are disabled.

Cloud models and SQL Server

Azure offers several ways to consider the management and querying of data in a SQL Server or SQL Server-like environment. This section is divided into three main areas:

  • Infrastructure-as-a-Service. SQL Server running on an Azure Virtual Machine (VM).

  • Platform-as-a-Service. Azure SQL Database, including managed instance, and Azure Synapse Analytics. These services may also be referred to as Database-as-a-Service (DBaaS).

  • Hybrid cloud. Combines the best features of on-premises SQL Server with Azure. Specific SQL Server features that support this are Stretch Database and Backup to URL.

Azure SQL resources, whether IaaS or PaaS, can be managed in a centralized management hub. This centralized view of SQL Server VMs, Azure SQL Database logical servers and databases, and Azure SQL Database managed instances can be used to efficiently manage SQL resources at scale. In addition to managing existing SQL resources, the experience of creating new SQL resources is also streamlined.

In each section, we discuss specific terminology and configuration considerations for running an optimal SQL Server deployment in Azure. Specifics for deploying Azure SQL Database and Managed Instance are covered in the next two chapters.

Infrastructure-as-a-service

Take what you’ve learned in the first three chapters of the book about VMs, and that’s infrastructure-as-a-service (IaaS) in a nutshell, optimized for a SQL Server environment.

As we detail in Chapter 2, “Introducing database server components,” a VM shares physical resources with other VMs. In the case of Azure Virtual Machines, there are some configurations and optimizations that can make your SQL Server implementation perform well, without requiring insight into the other guest VMs.

When creating a SQL Server VM in Azure, you can choose from different templates, which provide a wide range of options for different virtual hardware configurations, OS, and, of course, versions and editions of SQL Server.

Azure VMs are priced according to a time-based usage model, which makes it easier to get started. You pay per minute or per hour, depending on the resources you need, so you can start small and scale upward. In many cases, if performance is not acceptable, moving to better virtual hardware is very easy and requires only a few minutes of downtime.

Azure VM performance optimization

Many of the same rules that we outlined for physical hardware and VMs in Chapter 3, “Designing and implementing an on-premises database infrastructure,” apply also to those Azure VMs used for SQL Server. These include setting Power Saving settings to High Performance, configuring Max Server Memory usage correctly, spreading TempDB over several files, and so on.

When you deploy the VM using one of the Microsoft-provided SQL Server templates, some of these tasks have been done for you. For instance, power-saving settings are set to High Performance already, and TempDB files are configured properly when you configure a new Azure VM running SQL Server 2016 or higher.

Azure VMs have a limited selection of storage options. You don’t have the luxury of custom Direct-Attached Storage or Storage-Area Networks dedicated to your environment. Instead, you can choose from the following options:

  • Standard hard disk drives (HDDs) or one of several solid-state drive (SSD) types

  • Unmanaged or Managed Disks

  • SQL Server data files in Azure Storage

Virtual hard drives

Virtual hard drives (VHDs) in Azure are provided through several storage offerings (note that these values may change after publication):

  • Standard HDD. Designed for backups and low-cost, latency-insensitive workloads. The maximum throughput is 500 MBps, and the maximum IOPS is 2,000.

  • Standard SSD. Suited for lightly used enterprise workloads, including web servers, and development and test environments. There is a 750 MBps maximum throughput, and 6,000 maximum IOPS.

  • Premium SSD. Recommended for production workloads and other performance-sensitive environments, Premium SSD offers 900 MBps maximum throughput and 20,000 maximum IOPS.

  • Ultra SSD. Ideal for transaction-heavy workloads, including large-scale database and analytics environments. Ultra SSDs offer 2,000 MBps maximum throughput and 160,000 maximum IOPS.

Given our previously stated guidance to use the fastest drive possible, SQL Server is going to make better use of Premium SSD or Ultra SSD, because SQL Server’s main performance bottleneck is TempDB, followed closely by transaction log files, all of which are I/O-bound.

It is possible to choose between unmanaged and managed disks:

  • Unmanaged disks. VHDs are managed by you, from creating the storage account and the container, to attaching these VHDs to a VM and configuring any redundancy and scalability.

  • Managed disks. You specify the size and type of drive you need (Premium or Standard), and Azure handles the creation, management, and scalability for you.

Unmanaged disks give you more control over the drive, whereas managed disks are handled automatically, including resiliency and redundancy. Naturally, managed disks have a higher cost associated with them, but this comes with peace of mind.

As of this writing, there are several sizes to choose for Premium Storage, with more offerings being added all the time. Here is a selection of these offerings.

Table 16-1 Premium Storage offerings in Azure Storage

Type   Disk size   IOPS per disk   Throughput per disk
P4     32 GB       120             25 MBps
P6     64 GB       240             50 MBps
P10    128 GB      500             100 MBps
P15    256 GB      1,100           125 MBps
P20    512 GB      2,300           150 MBps
P30    1 TB        5,000           200 MBps
P40    2 TB        7,500           250 MBps
P50    4 TB        7,500           250 MBps
P60    8 TB        16,000          500 MBps
P70    16 TB       18,000          750 MBps
P80    32 TB       20,000          900 MBps

Note

Maximum throughput can be limited by Azure VM bandwidth. Each Azure VM also has a maximum uncached disk throughput. This means that you might not be able to address disk performance issues in SQL Server Azure VMs by adding disks or selecting disks with higher throughput numbers. Instead, you might need to select a larger VM size. In many cases, you can select a larger VM size to get more I/O throughput and memory while keeping the vCPU count the same. Increasing the vCPU count increases the number of SQL Server per-core licenses you need, so if your only bottleneck is I/O, you don’t need to provision additional vCPUs with the commensurate license cost increase.

We recommend a minimum of two P30 drives for SQL Server: the first for transaction logs, and the second for data files and TempDB.

Disk striping options

To achieve better performance (and larger drive volumes) out of your SQL Server VM, you can combine multiple drives into various RAID configurations by using Storage Spaces on Windows-based VMs, or MDADM on Linux-based VMs. Depending on the Azure VM size, you can stripe up to 64 Premium Storage drives together in an array.

An important consideration with RAID is the stripe (or block) size. A 64-KB block size is most appropriate for an OLTP SQL Server environment, as noted previously. However, large data warehousing systems can benefit from a 256-KB stripe size, due to the larger sequential reads from that type of workload.

To read more about the different types of RAID, see Chapter 2.
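As a sketch of how such a stripe set can be built with Storage Spaces on a Windows-based VM (pool and volume names are placeholders; run in an elevated session on the VM itself):

```powershell
# Gather the attached data disks that are eligible for pooling.
$disks = Get-PhysicalDisk -CanPool $true

# Create a storage pool spanning those disks.
New-StoragePool -FriendlyName 'SqlPool' `
    -StorageSubsystemFriendlyName 'Windows Storage*' `
    -PhysicalDisks $disks

# Create a simple (striped) virtual disk with a 64-KB interleave,
# appropriate for OLTP; consider 262144 (256 KB) for data warehousing.
New-VirtualDisk -StoragePoolFriendlyName 'SqlPool' -FriendlyName 'SqlData' `
    -ResiliencySettingName Simple -NumberOfColumns $disks.Count `
    -Interleave 65536 -UseMaximumSize

# Initialize, partition, and format with a 64-KB allocation unit size.
Get-VirtualDisk -FriendlyName 'SqlData' | Get-Disk |
    Initialize-Disk -PassThru |
    New-Partition -AssignDriveLetter -UseMaximumSize |
    Format-Volume -FileSystem NTFS -AllocationUnitSize 65536
```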

Storage account bandwidth considerations

Azure Storage costs are dictated by three factors: bandwidth, transactions, and capacity. Bandwidth is defined as the amount of data egress from the storage account.

For Azure VMs running SQL Server, if the storage account is in the same region as the VM, there is no additional bandwidth cost. If there is any external access to the data, however, such as log shipping to a different location or using the AzCopy tool to synchronize data to another region, there is a cost associated with that.

Drive caching

For SQL Server workloads on Azure VMs, we recommend that you turn on ReadOnly caching on the Premium Storage drives that host data files and TempDB when attaching them to the VM. This increases the IOPS and reduces latency for your environment, and it avoids the risk of data loss that might occur with ReadWrite caching.

For drives hosting transaction logs, do not turn on caching.
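Caching is configured per disk and can be changed after the disks are attached. A sketch using the Az module, with placeholder VM and disk names:

```powershell
# Retrieve the VM object, set ReadOnly caching on the data-file disk,
# leave caching off (None) on the transaction log disk, then apply.
$vm = Get-AzVM -ResourceGroupName 'SQL-RG' -Name 'sqlvm01'
Set-AzVMDataDisk -VM $vm -Name 'sqlvm01-data' -Caching ReadOnly
Set-AzVMDataDisk -VM $vm -Name 'sqlvm01-log' -Caching None
Update-AzVM -ResourceGroupName 'SQL-RG' -VM $vm
```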

SQL Server data files in Azure Storage

Instead of attaching a data drive to your machine running SQL Server, you can use Azure Storage to store your user database files directly, as blobs. This provides migration benefits (data movement is unnecessary), high availability (HA), snapshot backups, and cost savings with storage. For system databases, this feature is neither recommended nor supported. Because performance is critical, especially when accessing storage over a network, you will need to test this offering, especially for heavy workloads.

To get this to work, you need a storage account and container on Azure Storage, a Shared Access Signature, and a SQL Server Credential for each container.
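As a sketch of the first part of that setup, the following generates a Shared Access Signature for a blob container using the Az.Storage module (the account and container names are placeholders). The resulting token is what you would use in the SQL Server credential, created with T-SQL’s CREATE CREDENTIAL ... WITH IDENTITY = 'SHARED ACCESS SIGNATURE':

```powershell
# Build a storage context for the account that will hold the data files.
$key = (Get-AzStorageAccountKey -ResourceGroupName 'SQL-RG' `
    -Name 'sqldatafiles')[0].Value
$ctx = New-AzStorageContext -StorageAccountName 'sqldatafiles' `
    -StorageAccountKey $key

# Generate a SAS token for the container, valid for one year, with the
# read/write/delete/list permissions SQL Server requires.
New-AzStorageContainerSASToken -Name 'data' -Permission rwdl `
    -ExpiryTime (Get-Date).AddYears(1) -Context $ctx
```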

There are some limitations that might affect your decision:

  • FILESTREAM data is not supported, which affects memory-optimized objects, as well. If you want to make use of FILESTREAM or memory-optimized objects, you will need to use locally attached storage.

  • Only .mdf, .ndf, and .ldf extensions are supported.

  • Geo-replication is not supported.

Virtual machine sizing

Microsoft recommends certain types, or series, of Azure VMs for SQL Server workloads, and each of these series of VMs comes with different size options.

It is possible to resize your VM within the same series (going larger is as simple as choosing a bigger size in the Azure portal), and in many cases you can even move across series as the need arises.

You can also downgrade your VM to a smaller size, to scale down after running a resource-intensive process, or if you accidentally overprovisioned your server. Provided that the smaller VM can handle any additional options you might have selected (data drives and network interfaces tend to be the deciding factors), downgrading in the Azure portal is just as simple as upgrading.

Both growing and shrinking the VM size do require downtime, but it usually takes just a few minutes at most.
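Resizing can also be scripted. A sketch with placeholder names; the target size must be available in the VM’s region:

```powershell
# Fetch the VM, request a new size, and apply the change.
# The VM restarts as part of the update, incurring brief downtime.
$vm = Get-AzVM -ResourceGroupName 'SQL-RG' -Name 'sqlvm01'
$vm.HardwareProfile.VmSize = 'Standard_E8s_v3'
Update-AzVM -ResourceGroupName 'SQL-RG' -VM $vm
```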

To quote directly from Microsoft Docs:

“Es, Eas, Ds and Das Series offers the optimum memory to vCPU ratio required for OLTP workload performance. M Series offers the highest memory to vCPU ratio required for mission critical performance and is ideal for data warehouse workloads.” (https://docs.microsoft.com/azure/virtual-machines/windows/sql/virtual-machines-windows-sql-performance#quick-check-list).

However, choosing the right series can be confusing, especially with new sizes and series coming out all the time. The ability to resize VMs makes this decision less stressful.

Note

You pay separately for solid-state drives attached to Azure VMs. Keep this in mind when identifying the right VM for your workload.

Locating TempDB files on the VM

Many Azure VMs come with a temporary drive, which is provisioned automatically for the Windows page file and scratch storage space. The drive is not guaranteed to survive VM restarts and does not survive a VM deallocation.

A fairly common practice with Azure VMs is to place the SQL Server TempDB on this temporary drive because it uses solid-state storage and is theoretically faster than a Standard Storage drive. However, this temporary drive is thinly provisioned. Recall from Chapter 2 that thinly provisioned storage for VMs is shared among all the VMs on a host.

Placing TempDB on this drive may require preparation, and it can result in high latency, especially if other guests using that underlying physical storage have done the same thing. Whether this issue will affect you is determined in part by the VM series. If you created the VM using a SQL Server template from Microsoft, then no preparation is necessary. If you created a VM image yourself or used a generic VM image, then register the VM with the SQL VM resource provider (recommended and covered in the next section) or perform the steps in the blog post at https://cloudblogs.microsoft.com/sqlserver/2014/09/25/using-ssds-in-azure-vms-to-store-sql-server-tempdb-and-buffer-pool-extensions/.

Managing SQL Server virtual machines in the Azure Portal

The Azure Portal provides support for managing virtual machines with SQL Server installed. When creating a virtual machine (VM) from an image that includes SQL Server, the VM will be registered with the SQL VM resource provider. If you create a VM without SQL Server pre-installed and install SQL Server yourself, you can register the VM with the resource provider.

Caution

If you will be using the Azure Hybrid Benefit licensing, which is discussed later in this chapter, you must register the VM with this resource provider. The steps to register a VM where you installed SQL Server yourself are found at https://docs.microsoft.com/azure/virtual-machines/windows/sql/virtual-machines-windows-sql-register-with-resource-provider.

Beyond this compliance requirement, the SQL VM resource provider extends the default VM management options to include automated patching, automated backup, and additional monitoring capabilities. It is important to note that you can take advantage of automated backup and patching, but you are not required to. Both automated management features are discussed in additional detail below. The automated features depend on the SQL Server IaaS Agent Extension, which is installed in the VM when the SQL VM resource provider is installed in Full mode.
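If you installed SQL Server yourself, registering the VM with the SQL VM resource provider can be scripted with the Az.SqlVirtualMachine module. A sketch with placeholder names; the exact parameter set may differ across module versions, so check the cmdlet help before running it:

```powershell
# Register an existing VM with the SQL VM resource provider in Full mode,
# enabling automated backup, automated patching, and license management.
New-AzSqlVM -ResourceGroupName 'SQL-RG' -Name 'sqlvm01' `
    -Location 'koreacentral' -LicenseType 'PAYG' -SqlManagementType 'Full'
```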

Automated backup

There are a few requirements for the automated backup feature. Chief among those is that only specific versions of SQL Server running on specific Windows Server versions are supported. Additionally, the user databases selected for automated backup must be running in the full recovery model.

Automated backup can be configured to retain the backups between 1 and 30 days in an Azure storage account of your choice. The backups can optionally be encrypted. When enabling encryption, a certificate is generated that is protected by a password. The certificate is stored in the same storage account where the backups are kept.

For SQL Server 2016 and later, Automated Backup v2 is available, which offers additional control, including the ability to include the system databases in the backups. The system databases are not required to run in the full recovery model.

In v2 as in v1, the backup schedule can be determined automatically by Azure, in which case the log growth is used to determine the frequency of backups. In addition, v2 allows the administrator to set the frequency of the full and log backups.

Note

In addition to this automated backup feature, which is convenient but somewhat limited, Azure also supports backing up SQL Server using Azure Backup for SQL VMs. A brief overview of this service is available at https://docs.microsoft.com/azure/virtual-machines/windows/sql/virtual-machines-windows-sql-backup-recovery#azbackup.

Automated patching

If you’d like the Azure infrastructure to patch Windows and SQL Server, you can enable automated patching for SQL Server VMs. You choose a day of the week, a start hour (which is in the server’s time zone), and a maximum duration for the maintenance.

Every week at that time, Windows Updates marked as Important will be downloaded and installed while the maintenance window is active. If the maintenance window is too short for all patches to be applied, the remaining patches will be installed during the next scheduled maintenance period. After installing patches, the system may be rebooted, thus incurring downtime.
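The windowing behavior described above can be sketched as follows. This is a minimal illustration of the stated rules only; the actual patch ordering and duration estimates are determined by the Azure platform, and the durations below are made up.

```python
# Sketch of the maintenance-window behavior described above (illustrative only;
# real patch selection and durations are controlled by the Azure platform).

def apply_patches(patch_minutes, window_minutes):
    """Return (installed, deferred) patch durations for one maintenance window.

    Patches are applied in order until the next one would exceed the window;
    everything after that waits for the next weekly window.
    """
    installed, deferred = [], []
    elapsed = 0
    window_closed = False
    for duration in patch_minutes:
        if not window_closed and elapsed + duration <= window_minutes:
            installed.append(duration)
            elapsed += duration
        else:
            window_closed = True
            deferred.append(duration)
    return installed, deferred

# A 60-minute window with four pending patches (estimated minutes each):
installed, deferred = apply_patches([20, 25, 30, 10], 60)
# installed == [20, 25], deferred == [30, 10]
```

Note that once a patch does not fit, the remainder is deferred as a group; this mirrors the "remaining patches will be installed during the next scheduled maintenance period" behavior.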

Note

SQL Server cumulative updates are not included in the automated patching; you are still responsible for installing those.

Because of the possibility of multiple periods of performance impact and downtime each month, automated patching may not be appropriate for production systems that must be highly available. On the other hand, configuring automated patching for dev and test VMs can reduce the burden on administrators, allowing them to focus on the health of production systems.

Platform-as-a-service

With platform-as-a-service (PaaS), you can focus on a task without having to worry about the administration and maintenance surrounding that task, which makes it a lot easier to get up and running.

You can use database-as-a-service (DBaaS), which includes Azure SQL Database, Azure Synapse Analytics, and managed instance to complement or replace your organization’s data platform requirements. This section introduces the currently available PaaS SQL Server offerings in the Microsoft Azure cloud.

  • Complete coverage of Azure SQL Database is found in Chapter 17. Complete coverage of Azure SQL Database managed instance is found in Chapter 18.

Azure SQL Database provides you with a single database or set of databases logically grouped in an elastic pool and the freedom not to concern yourself with resource allocation (CPU, RAM, storage, licensing, OS), installation, and configuration of that database.

You also don’t need to worry about patching and upgrades at the OS or instance level, TempDB, backups, corruption, or redundancy. In fact, the built-in support for database recovery is excellent (including point-in-time restores). You can even add on long-term backup retention to keep your backups for up to 10 years.

Microsoft’s Data Platform is about choosing the right component to solve a particular problem, as opposed to one offering being all things to all people.

Azure SQL Database (including managed instance) and Azure Synapse Analytics are part of a larger vision, taking the strengths of the Database Engine and combining them with other Azure components, breaking the mold of a self-contained or standalone system. The change in mindset is necessary to appreciate it for what it offers, instead of criticizing it for its perceived shortcomings.

Azure SQL Database purchase models

Azure SQL Database charges are calculated based on one of two purchasing models: by Database Transaction Units (DTUs) or by vCore. In this section, you’ll learn about the meaning of DTUs and vCores and how to choose the right purchasing model for your needs.

Note

It is possible to convert from the DTU-based purchasing model to the vCore model and vice versa, without incurring downtime.

Database Transaction Unit

DTUs are likely the Azure SQL Database concept that new adopters struggle with the most. DBAs must understand what a DTU means and come to terms with the fact that this single measure determines the level of performance to expect from your database.

A DTU is a blended measure of hardware resources that are provided for the database. This blend includes CPU, memory, and data and transaction log I/O. An increase in DTU results in a linear increase in each of the hardware resources. Thus, when doubling the DTUs for a database, you are effectively doubling how much CPU, memory, and I/O is assigned to your database. The relative mix of these hardware measures was determined by Microsoft using a benchmark developed for this purpose. This benchmark is called the Azure SQL Database Benchmark. It is designed to be representative of common Online Transaction Processing (OLTP) workloads.

As you’ll learn in Chapter 17, when creating a database using the DTU purchasing model, you specify the number of DTUs for that database by selecting a pricing tier and service objective. Additional differences between the pricing tiers are also discussed in that chapter.

Selecting a DTU pricing tier and service objective

Azure SQL Database offers three service tiers in the DTU purchasing model: basic, standard, and premium.

Because Azure SQL Database is billed by the hour, the selection of a service tier and service objective determines how much you will be charged for your database. However, there are additional considerations. Specific pricing and other details might change by the time the ink on this page has dried; thus, we will discuss some general concepts that you should be aware of and how they would influence your selection of a tier.

Note

You can find current pricing for Azure SQL Database at https://azure.microsoft.com/pricing/details/sql-database/.

The Basic tier provides the lowest available DTUs. You pay significantly less for giving up some availability guarantees and performance. This tier is suitable for development purposes and perhaps very small-scale applications.

The Standard and Premium tiers are the two main choices for production databases. At first glance, you will notice that the Premium tier provides considerably more DTUs and does so at a higher cost per DTU compared to Standard. This is because of architectural differences between these tiers. The database files in Standard tier databases are stored in Azure Storage. This means that the files are not local to the Database Engine. In the Premium tier, they are stored on local solid-state drives (SSDs). This difference in locality of the database files has performance implications, as you might expect. Further, there is also a difference in how intra-region high availability (HA) is handled. HA for Standard tier databases is ensured by using replication of the Azure blobs. In the Premium tier, HA is achieved by using Always On features.

vCore

For some workloads, the DTU purchasing model may not be appropriate. The vCore model allows the administrator to select a combination of hardware generation, number of CPU cores, and memory that is assigned to a single database or elastic pool. This makes it easier to compare the capacity you pay for with on-premises capacity. But be careful about comparing performance with on-premises performance; there are many factors that impact the performance of the Database Engine that aren’t captured by comparing CPU and memory capacity.

Note

The lowest end of the vCore purchasing model is significantly more powerful, and therefore more expensive, than the lowest service objective in the DTU purchasing model. For comparison purposes, a single vCore is roughly equivalent to 100 DTUs in the standard tier and 125 DTUs in the premium tier.
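The rough equivalences above can be captured in a small sizing helper. This is a rule-of-thumb conversion for capacity planning only, not a billing formula, and the ratios come straight from the guidance in this note.

```python
# Rough DTU/vCore equivalence for sizing comparisons: one vCore is roughly
# 100 DTUs in the standard tier and 125 DTUs in the premium tier.
# These are rules of thumb, not billing guarantees.

DTU_PER_VCORE = {"standard": 100, "premium": 125}

def dtus_to_vcores(dtus, tier):
    """Estimate the vCore count comparable to a DTU service objective."""
    return dtus / DTU_PER_VCORE[tier]

def vcores_to_dtus(vcores, tier):
    """Estimate the DTU level comparable to a vCore configuration."""
    return vcores * DTU_PER_VCORE[tier]
```

For example, a standard-tier database running at 400 DTUs maps to roughly 4 vCores, while an 8-vCore configuration is in the neighborhood of 1,000 premium-tier DTUs.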

Another difference is that under the vCore model, no storage is included. You will be billed for the actual storage consumed at the rate of Azure premium storage in the region where your database is hosted.

Like in the DTU model, the vCore model offers different service tiers. In vCore, the service tiers are general purpose, business critical, and hyperscale. The main differences between the general purpose and business critical tiers are found in the use of remote Azure blob storage vs. local solid-state storage (resulting in 10x higher IOPS at the lowest end and about 28x at the highest end) and support for In-Memory OLTP. Both differences are much like the difference between the standard and premium service tiers in the DTU purchasing model. The business critical tier also offers the option to provision additional replicas for high availability.

  • The architecture of the hyperscale service tier is so different that it is covered in the “Hyperscale service tier” subsection below.

  • Read-scale replicas are covered in the “Read scale-out” section below.

Note

vCore is your only option when considering Azure SQL Database managed instance. This is also the case for the hyperscale service tier and serverless compute tier offerings, both of which are currently in preview. The serverless compute tier is discussed later in this section.

Reserved Capacity

For scenarios where you can determine a minimum amount of Azure SQL Database resources you will use, you can reduce your total cost by prepaying for capacity. Capacity is reserved for a specific region, deployment type, performance tier (confusingly, another name for service tier that is used in this context), and a term of one or three years. Once you have reserved this capacity, any database resources that meet those criteria will be billed at the reserved capacity rate, which you have prepaid. But bear in mind that the prepaid rate does not include charges that aren’t ordinarily included in the normal billing rates, notably software, networking, and storage charges.

Note

You do not need to create reserved capacity for all your Azure SQL Database needs if you’re uncertain of your requirements for certain instances, such as dev/test. You may purchase reserved capacity for those instances that you are confident you will need, and additional capacity will be billed at your normal billing rate.

Note

Reserved capacity is available only with the vCore purchasing model and with managed instance.

Serverless compute tier

In preview at the time of writing, the serverless compute tier affords the DBA the ability to select a minimum and maximum number of vCores to assign to a single database. Based on workload, the Azure service fabric automatically scales the number of assigned cores up and down. Combined with the auto-scaling of vCores comes an automatic calculation of the minimum and maximum amount of memory that will be assigned. The maximum storage size is determined by the DBA and billed separately. The vCores are charged on a per-vCore, per-second basis.
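The per-vCore, per-second billing can be illustrated with a toy cost model. The rate below is a made-up placeholder (real rates vary by region and tier), and the clamping to the configured minimum and maximum is the behavior described above in sketch form.

```python
# Illustrative cost model for the serverless compute tier: compute is billed
# per vCore per second, clamped to the configured [min, max] vCore range.
# The rate is a made-up placeholder, not a real Azure price.

RATE_PER_VCORE_SECOND = 0.000145  # hypothetical rate

def serverless_compute_cost(usage_samples, min_vcores, max_vcores):
    """usage_samples: per-second vCore usage readings for the billing period."""
    total = 0.0
    for used in usage_samples:
        # Billing never drops below the configured minimum, nor exceeds the max.
        billed = min(max(used, min_vcores), max_vcores)
        total += billed * RATE_PER_VCORE_SECOND
    return total
```

With a 0.5-vCore minimum and a 4-vCore maximum, three sample seconds of 0.2, 1.5, and 6.0 vCores are billed as 0.5, 1.5, and 4.0 vCore-seconds respectively.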

Note

The compute tier is another level of differentiation in Azure SQL Database, distinct from the service tier and the purchasing model. At the time of writing, the serverless compute tier (as opposed to the provisioned compute tier) is available only for single databases in the vCore purchasing model and the general purpose service tier.

Beyond the auto-scaling of CPU and RAM, the entire database can be paused after a configurable number of hours of inactivity. This brings Azure SQL Database even closer to the ideal of cloud computing: pay exactly for what you need when you need it.

There is, however, a downside to the serverless tier, especially when auto-pausing is enabled. When database activity resumes, there will be a delay due to the required warm-up period. Even when auto-pausing is not enabled, frequent memory trimming and the occasional need for the Azure fabric to load-balance databases between servers may cause delays and dropped connections. The good news is that if serverless turns out to be a poor choice for your database, you can move the database to the provisioned compute tier without downtime.

Hyperscale service tier

The Hyperscale service tier offers significantly larger database sizes combined with other benefits to ensure that managing these very large databases (VLDBs) doesn’t affect availability.

Architecturally, the Hyperscale tier is quite different from the other two service tiers (general purpose and business critical). Hyperscale introduces a very different deployment style for the SQL Server Database Engine: the query processing engine is separated from the storage engine and the log service and these components run on different systems. This radical departure from how SQL Server is deployed allows storage and compute resources to scale completely independently from each other. Figure 16-2 below illustrates the architecture of the Hyperscale tier.

In this diagram, the three components of the hyperscale architecture are drawn as rectangles. At the top are the compute nodes. There are three compute nodes, drawn as cylinders. One compute node is the primary, which supports read and write operations. Two secondary compute nodes support only read operations. Each compute node has a local SSD cache to store the Resilient Buffer Pool Extensions (RBPEX) data. Below the compute nodes are the page servers. There are four page servers, drawn using rounded rectangles, to highlight the fact that multiple page servers can support fewer compute nodes. Each page server also has a local SSD cache to store RBPEX data. To the right of both the compute nodes and page servers is the log service. The log service ingests the writes from the primary compute node, stores them, and distributes them to the page servers and the secondary compute nodes. Below the page servers is the Azure Storage layer, which reads and writes the data as directed by the page servers.

Figure 16-2 High-level overview of the Hyperscale architecture.

More than just size, the new architecture allows for several other impressive feats:

  • Instantaneous backups that require no CPU or storage operations on the compute nodes (by leveraging Azure Storage snapshots)

  • Warm buffer pools after startup, thanks to persisting the buffer pool using an in-memory table that is stored on fast local solid-state storage, a concept known as Resilient Buffer Pool Extensions (RBPEX)

  • A recovery point objective (RPO) of 0 minutes with a recovery time objective (RTO) of less than 10 minutes. Moreover, the RTO is not affected by the size of the database

Choosing between the DTU and vCore purchasing model

The DTU model is a good choice for low-end needs. The monthly cost of the lowest service tier in the DTU model is about 20x less than the cost of the lowest vCore offering. The vCore offering is more attractive for medium-sized databases that nevertheless have significant compute or memory requirements, because storage and compute are scaled independently in the vCore model.

Remember also that it’s possible to switch from the DTU model to the vCore model and back again without downtime. Switching to a different purchase model is subject to the same restrictions you would encounter while scaling down a database. Primarily, you will not be able to scale down to a service tier that allows a maximum database size that is less than the current size of your database.

Differences from SQL Server

SQL Server is a complete, standalone relational database management system designed to create and manage multiple databases and the associated processes around them. It includes a great many tools and features, including a comprehensive job scheduler.

Think of Azure SQL Database, then, as an offering at the database level. Because of this, only database-specific features are available. You can create objects such as tables, views, user-defined functions, and stored procedures, as well as memory-optimized objects. You can write queries, and you can connect an application to it.

What you can’t do with Azure SQL Database is run scheduled tasks directly. Querying other databases is extremely limited. With the exception of Azure SQL Database managed instance, you can’t restore databases from a SQL Server backup. You don’t have access to a file system, so importing data is more complicated. You can’t manage system databases, and in particular, you can’t manage TempDB.

There is currently no support for user-defined SQL Server CLR procedures. However, the native SQL CLR functions, like those necessary to support the hierarchyid and geospatial data types, are available.

On-premises environments typically use only Integrated Authentication to provide single sign-on and simplified login administration. In such environments, SQL authentication is often turned off. Turning off SQL authentication is not supported in Azure SQL Database. Instead of Integrated Authentication, there are several Azure Active Directory authentication scenarios supported. Those are discussed in more detail in Chapter 17.

Azure SQL Database does not support multiple filegroups or files. By extension, several other Database Engine features that use filegroups are unavailable, including FILESTREAM and FileTable.

Database limitations

Azure SQL Database is subject to certain size limitations, such as the maximum database size. The maximum size of a database varies based on the purchase model and the pricing tier. In the DTU purchase model, the size of a database includes only the size of the data; the size of transaction logs is not counted. In the vCore purchase model, you are billed for the actual amount of storage consumed by your database, including the transaction log and backups.

If you are designing an application for the cloud, the size limitations are less of a restriction when deciding to adopt Azure SQL Database. This is because an application designed to operate at cloud-scale should shard its data across several database instances. In addition to overcoming database size limitations, the benefits of sharding also include faster disaster recovery and the ability to locate the data closer to the application if the application runs in different Azure regions.

To provide predictable performance for Azure SQL Database, there are limits to the number of concurrent requests, concurrent logins, and concurrent sessions. These limits differ by service tier and service objective. If any limit is reached, the next connection or query attempt will fail with error code 10928.
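Because error 10928 indicates a resource limit rather than a permanent failure, client applications typically treat it as retryable. The sketch below shows the general pattern with exponential backoff; the exception type and the `number` attribute used to carry the error code are placeholders standing in for whatever your database driver exposes.

```python
# Sketch of client-side retry for connections or queries rejected with error
# 10928 (concurrency limit reached). RuntimeError and its "number" attribute
# are placeholders; a real app would inspect its driver's error object.
import time

RETRYABLE_ERRORS = {10928}

def run_with_retry(operation, retries=3, base_delay=1.0):
    """Call operation(), retrying with exponential backoff on retryable errors."""
    for attempt in range(retries + 1):
        try:
            return operation()
        except RuntimeError as e:
            error_number = getattr(e, "number", None)
            if error_number in RETRYABLE_ERRORS and attempt < retries:
                time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
                continue
            raise
```

A short backoff gives the service time to finish other requests, after which the retried connection or query typically succeeds.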

One final limitation to be aware of is that a single server has an upper limit on the total DTUs it can host as well as on the total number of databases. For a large deployment, this might require distributing databases across servers. We recommend against operating at or near this limit because overall performance can become suboptimal. As of this writing, you should limit the number of databases per server to around 1,000.

Note

Any limitations discussed in this chapter are subject to change frequently, so be sure to review current limitations before deciding whether Azure SQL Database is right for you.

The resource limits, whether Database Transaction Units (DTUs) or vCores, force us to rethink how we make the best use of the service. It is easy to waste money on Azure SQL Database when a database requires a high DTU service level at certain periods during the day or week but is idle at other times.

It is possible to scale an Azure SQL database up and down as needed, but if this happens regularly, it makes the usage, and therefore the cost, unpredictable. Elastic pools (see the upcoming section on this) are a great way to get around this problem by averaging out resource usage over multiple databases with elastic DTUs or vCores.

Other SQL Server services

In addition to the Database Engine, an on-premises deployment of SQL Server might include:

  • SQL Server Agent to schedule maintenance tasks or other activities

  • SQL Server Integration Services to load or extract data

  • SQL Server Analysis Services to support analytical workloads

  • SQL Server Reporting Services to provide report functionality

These services are not included in Azure SQL Database. Instead, comparable functionality is often available through separate Azure services. A complete discussion of the available alternatives in Azure is beyond the scope of this book. The descriptions that follow are intended to merely name some of the alternatives and their high-level uses and direct you to an online starting point to learn more:

  • SQL Server Agent. To schedule recurring tasks for Azure SQL Database instances, DBAs should consider using Azure Automation, which is a service that makes it possible to reliably run potentially long-running PowerShell scripts. You can use Azure Automation to automate management of any Azure or third-party cloud service, including Azure SQL Database. In addition, there is a gallery of reusable scripts available.

    If you only require execution of T-SQL statements in your recurring tasks and you don’t need complex job branching logic between steps, elastic database jobs might also be a suitable replacement. They are covered in the upcoming “Elastic database jobs” section.

  • SQL Server Integration Services. Instead of SQL Server Integration Services (SSIS), use Azure Data Factory to perform tasks such as extracting data from various sources, transforming it by using a range of services, and finally publishing it to data stores for consumption by business intelligence tools or applications. Azure Data Factory includes an SSIS-compatible integration runtime, which enables running SSIS packages in Azure.

    You can use SQL Server Integration Services to extract data from and load data to Azure SQL Database, and you can use Data Factory to extract data from and load data to on-premises data stores. The decision about which service to use depends largely on where most of your data resides, which services you plan on using for transforming the data, and whether you allow a cloud service to connect to your on-premises environment using a gateway service.

  • SQL Server Reporting Services. Microsoft Power BI is a recommended cloud-native replacement. Power BI is a powerful tool to create interactive visualizations using data from various sources. You can embed Power BI dashboards and reports in applications. You can also access them directly using a web browser or mobile app. Power BI Premium supports paginated reports. If you must have SQL Server Reporting Services, you can run it in a VM on Azure, but you’ll need to pay for or provide a SQL Server license.

  • SQL Server Analysis Services. To replace SQL Server Analysis Services, there are several alternative Azure services. Foremost, there is Azure Analysis Services. It is built on SQL Server Analysis Services and, as such, existing tabular models can be migrated from on-premises SQL Server Analysis Services deployments to the cloud. Alternatively, Power BI also includes tabular model functionality that might replace SSAS.

Note

Some of the limitations described in this section can be addressed by using Azure SQL Database managed instance. We cover managed instance in detail in Chapter 18.

That said, Azure SQL Database is not going to completely replace SQL Server. Some systems are not designed to be moved into this type of environment, and there’s nothing wrong with that. Microsoft will continue to release a standalone SQL Server product. On the other hand, Azure SQL Database is perfect for supporting web applications that can scale up as the user base grows. For new development, you can enjoy all the benefits of not having to maintain a database server, at a predictable cost.

Read scale-out

If you’re thinking that a read scale-out replica is a secondary database instance that provides read-only access to your data, you would be correct. Provisioning these replicas in Azure SQL Database is supported in the following service tiers: premium, business critical, and hyperscale. In the premium and business critical tiers, it is also enabled by default when creating a new database.

Note

Some features are not available on the replica database. These features are the Query Store, Extended Events, SQL Profiler, and, crucially perhaps in regulated environments, Audit. It is possible to disable the Read Scale-Out feature.

An application indicates its intent for a read-only or read-write connection using the ApplicationIntent property in the connection string. Specifically, a value of ReadOnly will cause the connection gateway to direct the connection to a read-only replica, if available. Not specifying ApplicationIntent or specifying a value of ReadWrite will direct the connection to the primary replica.
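The routing described above is driven entirely by the connection string. The helper below sketches how an application might build one for reporting (read-only) versus transactional (read-write) work; the server and database names are placeholders.

```python
# Building connection strings that request either a read-only replica or the
# primary, per the ApplicationIntent routing described above. Server and
# database names are placeholders.

def build_connection_string(server, database, read_only=False):
    intent = "ReadOnly" if read_only else "ReadWrite"
    parts = [
        f"Server=tcp:{server},1433",
        f"Database={database}",
        "Encrypt=True",
        f"ApplicationIntent={intent}",
    ]
    return ";".join(parts)

# A reporting workload asks the gateway for a read-only replica:
reporting = build_connection_string(
    "myserver.database.windows.net", "Sales", read_only=True)
# A transactional workload is routed to the primary:
oltp = build_connection_string("myserver.database.windows.net", "Sales")
```

Keeping the intent in the connection string, rather than in code paths, means the gateway does the routing with no application changes beyond configuration.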

Elastic pools

As noted earlier, Azure SQL Database has limits on its size and resources. Like Azure VMs, Azure SQL Database is pay-per-usage. Depending on what you need, you can spend a lot or a little on your databases.

Elastic pools increase the scalability and efficiency of your Azure SQL Database deployment by providing the ability for several databases in a pool to share resources, whether priced according to DTUs or vCores.

Without using elastic pools, each database might be provisioned with enough resources to handle its peak load. This can be inefficient. By grouping databases with different peak load times in an elastic pool, the Azure fabric will automatically balance the resources assigned to each database depending on their load. You can set limits to ensure that a single database does not starve the other databases in the pool of resources.

The best use case for elastic pools is one in which databases in the pool have low average resource utilization, with spikes that occur from time to time. This might be due to a reporting-type query that runs once a day, or a help desk system that can experience a lot of traffic at certain times of the day, for instance.

The elastic pool evens out these spikes over the full billing period, giving you a more predictable cost for an unpredictable workload across multiple databases.

Note

An elastic pool is tied to a single logical server. It’s not possible to pool databases hosted on various logical servers in a single overarching pool.

Multitenant architecture

Azure SQL databases in an elastic pool give you the ability to provision new databases with predictable growth and associated costs. Management of these databases is much easier in this environment because the administrative burden is lowered. Performance is also predictable because the pool is based on the combined resource utilization.

However, not all scenarios will benefit from the use of elastic pools. The most beneficial are those in which databases experience their peak load at separate times. You need to monitor the usage patterns of each customer and plan which elastic database pool they work best in.

Database consolidation

In an on-premises environment, database consolidation means finding a powerful enough server to handle the workload of many databases, each with its own workload pattern. Similarly, with elastic database pools, the number of databases is limited by the pool’s size. For example, in the DTU pricing model, the current maximum pool size is 4,000 eDTUs in a Premium pool. This means that you can operate up to 160 databases (at 25 DTUs each) in that single pool, sharing their resources.
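The consolidation arithmetic in the example above is simple enough to express directly. The function below is just that arithmetic; the 4,000 eDTU pool size and 25 DTU per-database cap are the figures quoted in this section.

```python
# Consolidation arithmetic from the example above: how many databases fit in
# an elastic pool when each database is capped at a fixed DTU level.

def max_databases_in_pool(pool_edtus, per_database_dtu_cap):
    """Upper bound on databases that can share a pool at the given cap."""
    return pool_edtus // per_database_dtu_cap

# A 4,000 eDTU premium pool with databases capped at 25 DTUs each:
limit = max_databases_in_pool(4000, 25)
# limit == 160
```

In practice the real constraint is the pool's database-count limit and the overlap of peak loads, so this is an upper bound, not a recommendation.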

On the vCore side, the maximum pool size for a Business Critical tier gives you as many as 80 virtual cores, along with 408 GB of RAM and 4 TB data storage.

Combined with autoscale settings, depending on resource boundaries, consolidation makes a lot of sense for an organization with many small databases, just as it does with an on-premises SQL Server instance.

Elastic database tools

Scale-out in Azure SQL Database is also achieved using the elastic database tools client library, which is available for .NET and Java applications. Applications developed using this library can save their data in different Azure SQL databases based on data-dependent routing rules while retaining the ability to run SELECT queries across all shards. This is a popular model for SaaS applications because you can assign each SaaS customer its own database.

Elastic database query

Perhaps the most surprising limitation in Azure SQL Database is that support for cross-database queries is very limited. This means that it is not possible to write a query that uses a three-part or four-part object name to reference a database object in another database or on another server. Consequently, semantic search is not available.

Elastic database query, which is a preview feature at the time of writing, aims to provide a solution. For both vertically and horizontally partitioned databases, elastic database query can provide a way to write T-SQL statements that are executed across databases. This is made possible by leveraging the external data sources and external tables feature.

There are some limitations, however. On the Standard tier, the first query can take several minutes to run because the functionality to run the query needs to be loaded first. Additionally, access is currently read-only for external data.

Performance does improve on the higher tiers as costs increase, but this is not meant to replicate home-grown systems that have many databases tightly bound together.
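To make the external data source and external table mechanism concrete, the sketch below generates the T-SQL involved in a vertically partitioned setup. All object names and the credential are placeholders, and the column list is purely illustrative; consult the elastic query documentation for the full syntax and options.

```python
# Sketch of the T-SQL behind an elastic database query setup for vertical
# partitioning: an external data source pointing at another Azure SQL database,
# plus an external table mapped to a table there. All names are placeholders.

def elastic_query_ddl(data_source, server, remote_db, credential, table):
    return f"""
CREATE EXTERNAL DATA SOURCE {data_source}
WITH (TYPE = RDBMS,
      LOCATION = '{server}',
      DATABASE_NAME = '{remote_db}',
      CREDENTIAL = {credential});

CREATE EXTERNAL TABLE dbo.{table} (
    CustomerId INT NOT NULL,
    CustomerName NVARCHAR(100) NOT NULL
)
WITH (DATA_SOURCE = {data_source});
""".strip()

ddl = elastic_query_ddl(
    "RemoteSales", "myserver.database.windows.net",
    "SalesDb", "SalesCred", "Customers")
```

Once the external table exists, queries against `dbo.Customers` in the local database are transparently executed against the remote database, which is what enables cross-database T-SQL despite the lack of three-part naming.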

Elastic database jobs

SQL Server has the ability to run the same query against multiple databases from a list of registered servers, or Central Management Server. In a similar manner, the elastic database jobs feature gives you the ability to run a script written in Transact-SQL (T-SQL) against a group of databases. For example, if your organization offers its customers a SaaS solution, each customer may receive their own database instance. Then, the schemas of all databases must be kept synchronized.

An elastic database job can target the databases that are part of a custom group of databases. A job is inserted into a job database, which is a standalone Azure SQL database on a logical server referred to as the agent server. An Azure resource called an Elastic Job Agent is created, pointed at the agent server and job database. Job credentials, which allow the agent to connect to the target databases, are stored in the job database, as is the list of target databases itself. Elastic database jobs can consist of multiple steps and can have a rudimentary schedule with simple recurrence rules. Finally, the results can be stored in the job database, and diagnostics are available from the Elastic Job Agent blade in the Azure Portal.

Note

Some of the use cases for which you use the SQL Server Agent on-premises might be addressed with elastic database jobs. For other use cases, you’ll need to use other Azure functions for automating tasks. See the “Other SQL Server services” section for a brief intro to Azure Automation.

Sharding databases with Split-Merge

Azure SQL Database is designed for SaaS scenarios because you can start off small and grow your system as your customer base grows. This introduces several interesting challenges, including, for example, what happens when a database reaches its maximum resource limit and size.

Sharding is a technique by which data is partitioned horizontally across multiple nodes, either to improve performance or the resiliency of an application. In the context of Azure SQL Database, sharding refers to distributing data across more than one database when it grows too large. (If this sounds like table partitioning, you’re mostly right.)

It is all very well to add more databases (shards) to support your application, but how do you distribute your data evenly across those new databases?

The Split-Merge tool can move data from constrained databases to new ones while maintaining data integrity. It runs as an Azure web service, which means there is an associated cost. The tool uses shard map management, backed by a metadata store (an additional standalone Azure SQL database), to decide which data segments (shardlets) go into which database (shard), and it is completely customizable.

Hybrid cloud with Azure

Azure SQL Database is not designed to completely replace SQL Server. Many thousands of organizations all over the world are quite happy with the security, performance, and low latency offered by hosting their environment on-premises but would like to make use of certain components in the cloud.

The most common implementation of a hybrid cloud is with Azure Active Directory (Azure AD). Instead of having to manage user accounts in two places (on-premises and in Azure, for Microsoft Office 365, for example), you can synchronize your Active Directory Domain Services (AD DS) with Azure AD and manage it all in one place.

Mixing your on-premises and Azure environments, in whichever way you do it, falls under the definition of a hybrid cloud, and Microsoft has some interesting ways of helping you achieve this, especially around the data platform and SQL Server.

Azure Hybrid Benefit

Users can obtain significant savings on SQL Server VM or Azure SQL Database list prices if they have active Windows or SQL Server core licenses with Software Assurance. Exactly which type of licenses can be converted or re-used and how they map to Azure offerings is covered in detail in the Azure Hybrid Benefit FAQ found at https://azure.microsoft.com/pricing/hybrid-benefit/faq/.

Azure Hybrid Benefit is available for SQL Server VMs, Azure SQL Database using the vCore purchase model, and Azure SQL Database managed instance. In other words, Azure SQL Database in the DTU purchase model is not eligible for discounted rates using Azure Hybrid Benefit.

Note

Azure Hybrid Benefit is a feature unique to Azure. It is different from the Software Assurance license mobility benefit, which may be used to reassign SQL Server core licenses to third-party providers offering unmanaged services, such as virtual machines or hosted SQL Server.

Keeping cold data online and queryable by using Stretch Database

In larger organizations, it can be expensive to maintain historic (cold) data in a SQL Server database, when you consider not only the storage, but also the associated maintenance and administration costs. Additionally, data retention laws require organizations to store data for several years, which can be cumbersome.

Stretch Database is designed to balance the need to keep cold data online with the cost of storing and managing that data locally. Because less data needs to be backed up and restored locally, it can also reduce the recovery time objective (RTO) and recovery point objective (RPO) mandated in a service-level agreement (SLA).

Because historic data can account for a large percentage of an existing database, removing cold data from an on-premises database can significantly reduce the storage, time, and other resources for necessary tasks like backups, index and statistics maintenance, data consistency checks, and so on, while still making it available to be queried.

Stretch Database is activated at the instance level, but you move rows at a database and table level. If your cold data is already stored in a separate archive table, you can move the entire table. If your cold data is in a large table that also contains active (hot) data, you can set up a filter to move only the older rows.
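As a sketch of that setup, enabling Stretch with a filter might look like the following. The table, column, and function names are hypothetical, and the database itself is assumed to have already been linked to Azure (via ALTER DATABASE with REMOTE_DATA_ARCHIVE, or the wizard). The filter must be a schema-bound inline table-valued function that returns a row only for rows eligible for migration:

```sql
-- Enable Stretch Database at the instance level.
EXEC sp_configure 'remote data archive', 1;
RECONFIGURE;
GO

-- Schema-bound inline function: returns a row only for cold rows.
CREATE FUNCTION dbo.fn_StretchFilter (@SaleDate datetime2)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN SELECT 1 AS is_eligible
       WHERE @SaleDate < CONVERT(datetime2, '2015-01-01');
GO

-- Migrate only rows matching the filter to Azure.
ALTER TABLE dbo.SalesHistory
SET (REMOTE_DATA_ARCHIVE = ON (
        FILTER_PREDICATE = dbo.fn_StretchFilter(SaleDate),
        MIGRATION_STATE = OUTBOUND));
```

You can track progress of the outbound migration with the sys.dm_db_rda_migration_status dynamic management view.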

No application changes are needed, but there are some considerable limitations to using Stretch Database:

  • Uniqueness is not enforced in the migrated data (including for primary key and unique constraints), and you cannot stretch tables that are referenced by a foreign key constraint or that contain check or default constraints.

  • You cannot perform any data modification on migrated data (updates and deletes), and you cannot insert rows directly (they must be moved in by the Stretch Database functionality).

  • You cannot perform data modification on the data that is eligible for migration.

  • You cannot create indexes on views that refer to Stretch-configured tables, and you cannot move tables that are used in existing indexed views.

  • Filters on indexes are not propagated on migrated data, which can cause issues with unique filtered indexes.

  • Tables cannot take part in replication, change tracking, or change data capture.

  • Tables cannot contain more than 1,023 columns or 998 indexes.

There are also limits to the type of data and indexes that can participate in a Stretch Database:

  • text, ntext, and image data types are not permitted.

  • timestamp (rowversion), xml, and sql_variant types are not permitted.

  • FILESTREAM and, by extension, FileTable are not permitted.

  • CLR (Common Language Runtime, or .NET) data types are not permitted (including the built-in types geography, geometry, and hierarchyid).

  • Computed columns are not permitted.

  • Full-text, XML, and spatial indexes are not permitted. Tables cannot be referenced by indexed views.

Finally, the costs for Stretch Database are based on both storage and compute models. This means that you will pay for storage even if you never query the data. If you query the data, you will also pay for the compute costs (priced according to a Database Stretch Unit, or DSU) as well as any data transfer.

These limitations exist for good reasons, but they might be enough cause to consider alternatives. For ease of use, however, Stretch Database works as advertised.

Note

The Data Migration Assistant (DMA) tool can provide recommendations related to Stretch Database. Guidance on using the DMA for Stretch Database is available in the Microsoft Docs at https://docs.microsoft.com/sql/sql-server/stretch-database/stretch-database-databases-and-tables-stretch-database-advisor.

Automated backups with SQL Server Managed Backups

With SQL Server on Azure VMs, you can automate SQL Server native backups that write directly to Azure Blob Storage. (This works with an on-premises version of SQL Server, as well, but latency can be an issue.)

By default, the schedule depends on the transaction workload, so a server that is idle will have fewer transaction log backups than a busy server. This reduces the total number of backup files required to restore a SQL Server database in a disaster recovery (DR) scenario.

You can also use advanced options to define a schedule. However, you must set this up before turning on Managed Backups to avoid unwanted backup operations. Additionally, the retention period is customizable, with the maximum being 30 days.

You can configure these backups at the instance or database level, providing much needed flexibility for smaller database environments that would not ordinarily have a full-time database administrator on hand.
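A minimal sketch of enabling managed backup for a single database follows. The database name and container URL are hypothetical, and a SQL Server credential holding the container's shared access signature is assumed to exist already:

```sql
-- Enable managed backup for one database, writing to Azure Blob Storage.
-- Assumes a credential for the container's SAS token has been created.
USE msdb;
GO
EXEC managed_backup.sp_backup_config_basic
    @enable_backup = 1,
    @database_name = 'WideWorldImporters',
    @container_url = 'https://mystorageaccount.blob.core.windows.net/backups',
    @retention_days = 30;
```

Omitting @database_name configures managed backup at the instance level instead, so that new databases are backed up automatically.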

You can fully encrypt backups through SQL Server backup encryption, and Azure Blob Storage is encrypted by default (for data at rest).

There is an associated cost with the Azure Storage container required to store these database backups, but when the retention period is reached, older files will be cleared out, keeping the costs consistent. If you were building your own custom backup solution, you would incur similar costs anyway, and there is a good chance the managed backup storage costs will be lower.

Azure SQL Database Edge

For the Internet of Things (IoT), a lot of effort is expended in getting sensors and other devices to report their data to a SQL Server instance or Azure SQL Database in the cloud. Azure SQL Database Edge (in preview) is a repackaged SQL Server instance that can run on low-power and IoT edge computing devices, with either 64-bit ARM or Intel x64 CPUs. For example, imagine running it on a Raspberry Pi computer.

Edge includes the full SQL Server Database Engine running on Linux, along with some analytics features including data streaming and time-series support.

Azure Stack

This is Microsoft’s version of an edge and hybrid cloud, in which you can install certain Azure services on-premises, on Microsoft-approved hardware. This brings the power of Azure to your own datacenter.

After you have developed the solutions that best suit your organization, you can deploy your applications and solutions to the Azure region that makes the most sense, or just keep them on-premises, hosted on Azure Stack.

You can expose Azure SQL databases as a service by using the SQL Server resource provider. This gives your users the ability to create databases without having to provision a VM every time. Think of it as an on-premises version of Azure SQL Database.

Keep in mind that certain features like elastic pools, and scaling databases, are not available at this time.

Private networking between on-premises and Azure

Many organizations want to ensure a secure channel between their environments, whether between Azure VNets or between their on-premises network and Azure. You can achieve this by way of a virtual private network (VPN).

A VPN encrypts traffic over any network (including the Internet) through a tunnel that it creates. All traffic that travels through that tunnel is secure, which means that bad actors will not be able to monitor it. However, encrypting that traffic carries a performance overhead, which makes the connection slightly slower.

There are two main ways that Azure implements connections between your on-premises environment and Azure itself. One of these is through a traditional VPN service over the Internet (site-to-site), and the other is through a dedicated connection that does not use the public Internet (Azure ExpressRoute).

Site-to-site VPN

There are two different types of problems that you need to deal with when connecting systems to an Azure VNet: connecting two Azure VNets together and connecting an external network to an Azure VNet.

To connect two Azure VNets together in the same region, you can create a peering network—in other words, no part of the VNet goes out to the Internet—which is priced per gigabyte transferred.

If, instead, you want a VPN gateway, which creates a connection between your on-premises network and an Azure VNet, those are priced according to the maximum bandwidth you require (100 Mbps, 650 Mbps, 1 Gbps, or 1.25 Gbps) and charged at an hourly rate (which, depending on what you need, is also reasonably priced).

Azure ExpressRoute

If the speeds of site-to-site VPNs are not satisfactory, and you want to connect your on-premises network to your Azure VNet, you can use ExpressRoute. With its low latency, ExpressRoute expands your existing network to the virtually limitless services available in Azure, depending on your budget, of course. According to Microsoft Docs:

ExpressRoute is “…excellent for scenarios like periodic data migration, replication for business continuity, disaster recovery, and other high-availability strategies.” (https://azure.microsoft.com/services/expressroute)

This type of bandwidth gives you the flexibility of moving entire VMs from on-premises to Azure, for test environments and migrations. Your customers can use Azure web services that take data from your on-premises environment without ever going over the public Internet.

You can also use it for creating a DR site, using SQL Server Log Shipping. Perhaps you want to extend your availability group to the cloud, which you can do by using a distributed availability group (see Chapter 2). Using ExpressRoute, you can treat Azure as an extension of your own network, as illustrated in Figure 16-3.

This diagram shows the various ways Azure can connect internally, within an Azure region as well as from Azure to an on-premises network. On the left is an Azure region containing two Azure VNets connected with a Site-To-Site virtual private network, by way of a peering network. The first Azure VNet is also connected to an on-premises network with a virtual private network Gateway via the Internet, which can also be achieved with ExpressRoute

Figure 16-3 Azure virtual networks can connect to an on-premises network in various ways

Cloud security

Although we didn’t include a comprehensive discussion of cloud security in this book, in this section we provide a few considerations for DBAs considering a cloud migration.

First, many of the SQL Server offerings in the cloud support the same security features as on-premises. This is true for SQL Server in Azure VMs and largely true for Azure SQL Database and managed instance. Specific security considerations for the PaaS offerings are included in their respective chapters: Chapter 17 and Chapter 18.

An important aspect of managing security in Azure is the Azure Security Center. Found in the Azure Portal, it provides an integrated view of the security posture for all your Azure resources. The Security Center also provides metrics and recommendations that help you improve the security of your resources, including the SQL resources. Some of the features of Security Center are available in the paid standard tier, while many are included with the cost of your existing resources.

Among the SQL recommendations you might receive from Security Center are enabling auditing on Azure SQL databases or enabling vulnerability assessment on a SQL Server VM.

Deploying SQL resources in Azure means that you’ll need to understand security in Azure in general. As a DBA, you’ll need to know who has access to view and especially manage the SQL resources. Permissions to manage resources can be given at the account, subscription, resource group, or individual resource level. You’ll need to know how to monitor changes to the Azure resources. Further, you’ll need to learn about networking in Azure.

Other data services in Azure

Azure offers more data services than we cover in this book. We mention these here briefly because in today’s multi-platform environments, there is more to our world than SQL Server. These different offerings each have their strengths and weaknesses. Deploying a modern application may involve mixing these different services to create a single solution to a business need.

The most closely related offering is Azure Synapse Analytics, formerly known as Azure SQL Data Warehouse. Built on SQL Server, it is a PaaS service optimized for parallel query processing and large data volumes. Compared with Azure SQL Database, however, Azure Synapse can scale storage and compute independently by leveraging Azure Storage and a massively parallel processing engine that manages multiple SQL compute nodes via a control node. This makes it suitable for storing enormous amounts of data that might only occasionally need to be processed.

Azure also offers third-party fully managed database platforms, namely Azure Database for MariaDB, Azure Database for MySQL, and Azure Database for PostgreSQL. These offerings are all relational databases offered as PaaS.

Outside of the realm of relational DBMS, Microsoft’s Cosmos DB is noteworthy. This is a globally distributed database service that supports multiple data models, including key-value, document, graph, and column-family. Cosmos DB wraps multiple APIs in a single service.

Another data offering is Azure Data Lake Storage. The underlying storage is provided by Azure Storage and is augmented with hierarchical namespace functionality, meaning that folder and other file system semantics are native to the solution while benefitting from Azure Storage’s features for scale and disaster recovery. Data Lake can store unstructured data in addition to relational data.

The final data platform offering we’ll include is Azure Databricks. Azure Databricks is built on Apache Spark and tightly integrates with Azure to provide a complete environment where team members with various roles on data science projects can collaborate.
