5. Performance

The previous chapters have given you an understanding of SQL Server, the overall benefits of the various capabilities of SQL Server on Azure VMs, and how to get started with SQL Server on Azure VMs.

In this chapter, we will discuss SQL Server performance best practices and how you can achieve the best performance for your SQL Server workload. This chapter will be divided into three main parts:

  • Performance best practices
  • How to optimize SQL Server on Linux
  • Azure BlobCache

Let's begin by taking a look at the best practices to follow in order to get the most from your SQL Server.

Performance best practices

Performance tuning can be a broad and complex topic, and many factors can affect performance; however, migrating to Azure VMs can help simplify the process. If you have already deployed an Azure VM, the Azure portal lets you use Azure Monitor for VMs. It provides insights into the health and performance of your Windows or Linux VMs by monitoring their processes and dependencies on other resources. It can monitor not only Azure VMs, but also VMs on other cloud providers and on-premises, giving you a holistic view. You get pre-defined performance charts that show trends and a dependency map, all built into the Azure portal. In addition to Azure Monitor for VMs, all VMs have basic monitoring enabled in the Azure portal. This basic monitoring shows the average CPU, the total network traffic, the total disk bytes, and the average disk operations per second. Data can be shown for the past 1, 6, or 12 hours or the past 1, 7, or 30 days, and can be found in the Overview tab of the VM.

When planning a new deployment, upgrade, or migration for the SQL Server workload, one of the first considerations is the size of the server that will handle the workload. Regardless of the server environment, whether it's physical or virtual, on-premises or on the cloud, the amount of CPU, memory, and storage will always be factors that influence the deployment. Azure VMs help here by having specific VM types for different workload needs:

  • General purpose: Balanced CPU-to-memory ratio. Good for small- to mid-size databases.
  • Compute optimized: High CPU-to-memory ratio. Good for ETL servers.
  • Memory optimized: High memory-to-CPU ratio. Great for relational database servers, medium to large caches, and in-memory analytics.
  • Memory optimized constrained vCPUs: The vCPU count can be constrained to one half or one quarter of the original VM size. This reduces the SQL Server licensing cost while keeping the memory, storage, and I/O bandwidth of the larger size, giving you an even higher memory-to-vCPU ratio.
  • Storage optimized: High disk throughput and I/O. Ideal for big data, SQL, NoSQL, data warehousing, and large transactional databases.
  • High performance compute: Designed for leadership-class performance, MPI scalability, and cost efficiency for a variety of real-world HPC workloads.

Whatever your workload, there is a VM type for you. There are other considerations for performance as well, such as the configuration of SQL Server and the overall maintenance of the system.

In the following sections, we will begin by discussing these factors, starting with VM storage.

Virtual machine storage

Azure VMs are an excellent choice for running SQL Server workloads. Chapter 2, Getting Started with SQL Server in Azure Virtual Machines, introduced deploying Azure VMs, and Chapter 4, SQL Server on Linux in Azure Virtual Machines, covered the various Linux and Windows distributions and their advantages. Those chapters also explained the available disk types; the key takeaway is that production SQL Server workloads should run on premium SSDs or ultra disks. Most SQL Server workloads are I/O intensive, so production workloads require faster storage. Premium SSDs are designed for production and performance-sensitive workloads. For very intense I/O workloads, ultra disks are a better choice, as they deliver higher throughput, higher IOPS, and lower latency; consider them when you have a very transaction-heavy workload. Of CPU, memory, and storage, storage is typically the slowest component, which makes faster storage all the more valuable.

When it comes to selecting a VM, the size and the type matter for numerous reasons. The number of disks a VM can have is tied to the size of the VM, as are its overall IOPS and throughput limits. Organizations often struggle to select a VM size that can handle their I/O needs. It is entirely possible to build a storage configuration that can deliver more IOPS and throughput than the VM itself supports; in that case, the VM throttles the I/O once it hits its ceiling. In this situation, the VM has to be scaled up to a larger size that supports higher IOPS and throughput before storage performance improves. If you are configuring the VM through the Azure portal and select a storage configuration that exceeds the throughput limit of the VM, a message is displayed to warn you about this limitation.

When sizing a VM for an existing workload that is being migrated to Azure, it is common to try to match the existing memory, CPU, and storage capacity. What you actually need in order to properly size an Azure VM is a baseline. A baseline is simply a measured point of reference. When working with SQL Server, baselines are important so that you can detect a change in behavior or performance. Common metrics to capture include CPU and memory utilization, disk latencies, and batch requests per second. You need to know what normal behavior looks like so that you can tell when things are different. A baseline of existing I/O and throughput usage is critical for knowing the maximum throughput the Azure VM will need to handle.
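
A lightweight way to start a baseline is to sample a few counters from the sys.dm_os_performance_counters DMV on a schedule and store the results. The sketch below pulls two of the counters mentioned above; note that Batch Requests/sec is a cumulative counter, so you need two samples a known interval apart to turn it into a rate.

    -- Minimal baseline sample: page life expectancy and cumulative batch requests.
    -- Batch Requests/sec is cumulative, so capture it twice and divide the delta
    -- by the elapsed seconds to get the actual per-second rate.
    SELECT SYSDATETIME()        AS sample_time,
           RTRIM(counter_name)  AS counter_name,
           cntr_value
    FROM sys.dm_os_performance_counters
    WHERE (counter_name = 'Page life expectancy' AND object_name LIKE '%Buffer Manager%')
       OR (counter_name = 'Batch Requests/sec'   AND object_name LIKE '%SQL Statistics%');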

You can use Performance Monitor¹ to capture physical disk metrics; however, the sys.dm_io_virtual_file_stats dynamic management function provides the key information needed to calculate overall throughput by returning I/O statistics for data and log files. The num_of_bytes_read and num_of_bytes_written columns return the number of bytes read from and written to each file. If you capture these results twice over a short period of time, you can calculate the read and write MB/s and determine whether the VM size and storage you have chosen can support the workload. Many organizations skip this step when migrating SQL Server workloads to the cloud and then experience slower overall performance as a result.
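
As a sketch of that calculation, the following query returns the cumulative read and write bytes per database file; run it twice a known interval apart and divide the byte deltas by the elapsed seconds to get MB/s.

    -- Cumulative I/O per database file since the instance last started.
    -- Capture twice, subtract, and divide by the interval to get read/write MB/s.
    SELECT DB_NAME(vfs.database_id) AS database_name,
           mf.physical_name,
           vfs.num_of_bytes_read,
           vfs.num_of_bytes_written
    FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
    JOIN sys.master_files AS mf
      ON mf.database_id = vfs.database_id
     AND mf.file_id     = vfs.file_id
    ORDER BY vfs.num_of_bytes_read + vfs.num_of_bytes_written DESC;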

As an alternative sizing method, you can use the Azure Migrate: Server Assessment tool. Azure Migrate collects real-time sample points for a month and analyzes the data to identify the ideal VM size based on those metrics.

It is common for on-premises VMs with lower logical core counts to have very high throughput to the local storage area network (SAN). In a public cloud scenario, constraints must be in place to prevent one VM from consuming all the storage throughput on its host. It makes sense that the available throughput and I/O on the host are allocated based on the size of the VM: the larger the VM, the more resources it is assigned. This must be a consideration for performance when planning a migration to Azure.

To handle the I/O requirements, you should only consider Azure premium SSDs or ultra disks for your production SQL Server environment.

Memory

If you ask a Database Administrator (DBA) how much memory a SQL Server instance needs, a common response is "more" or "all of it". The reason it is good to allocate SQL Server plenty of memory is that the buffer pool stores recently read data in memory. If that data is accessed again while it is still in memory, SQL Server doesn't have to go back to disk to retrieve it, which is a slow operation. Reads served from memory are called logical reads, while reads that have to go to disk are physical reads.
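
If you want to see what is currently sitting in the buffer pool, a quick sketch is to count pages per database in sys.dm_os_buffer_descriptors (each page is 8 KB); be aware that this query can be expensive on servers with very large amounts of memory.

    -- Approximate buffer pool usage per database (8 KB per page).
    SELECT CASE WHEN database_id = 32767 THEN 'ResourceDB'
                ELSE DB_NAME(database_id) END AS database_name,
           COUNT(*) * 8 / 1024               AS buffer_pool_mb
    FROM sys.dm_os_buffer_descriptors
    GROUP BY database_id
    ORDER BY buffer_pool_mb DESC;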

For SQL Server workloads that need high memory-to-CPU ratios, you should choose the memory-optimized type of VM. In many cases, you may need even more memory than the standard memory-optimized VMs offer. For those situations, the memory-optimized constrained vCPU VMs offer the best price for performance. They allow you to select a much larger vCPU and memory size but only use (and license) one half or one quarter of the available vCPUs. This gives you higher memory, storage, and I/O bandwidth without the higher cost of SQL Server licenses.

Having more memory also opens up the ability to take advantage of memory-optimized tables, a feature known as In-Memory OLTP. In-Memory OLTP is designed to optimize the performance of transaction processing, data ingestion, data loading, and various transient data scenarios. It is just what it sounds like: it allows you to create memory-optimized tables and store their data in memory. It gains its performance by removing lock and latch contention between transactions, making transaction execution much more efficient. For very volatile or high-volume transaction processing, In-Memory OLTP can drastically improve performance. Learn more about In-Memory OLTP here².
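
As a minimal sketch (the database name SalesDB, the filegroup name, the F:\Data path, and the table are all hypothetical), a memory-optimized table requires a MEMORY_OPTIMIZED_DATA filegroup and the MEMORY_OPTIMIZED = ON option:

    -- Add a memory-optimized filegroup and its container folder to the database
    -- (SalesDB, imoltp_fg, and the F:\Data path are assumptions).
    ALTER DATABASE SalesDB ADD FILEGROUP imoltp_fg CONTAINS MEMORY_OPTIMIZED_DATA;
    ALTER DATABASE SalesDB
        ADD FILE (NAME = 'imoltp_data', FILENAME = 'F:\Data\imoltp_data')
        TO FILEGROUP imoltp_fg;
    GO

    USE SalesDB;
    GO

    -- A durable memory-optimized table with a hash index sized for the expected row count.
    CREATE TABLE dbo.SessionState
    (
        SessionId UNIQUEIDENTIFIER NOT NULL
            PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
        Payload   VARBINARY(8000)  NOT NULL,
        ExpiresAt DATETIME2        NOT NULL
    )
    WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);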

Another SQL Server feature that can be utilized is columnstore indexes. Columnstore indexes provide a different way of storing data that can improve performance for certain types of queries. A columnstore is data that is physically stored in columns and logically organized into rows and columns. The columnstore index slices the table into rowgroups and then compresses them. With the high level of data compression, typically around 10 times, you can drastically reduce storage cost and relieve I/O bottlenecks in your environment. Columnstore indexes can be beneficial in both OLTP and data warehouse environments. Columnstore objects are cached in the columnstore object pool instead of the SQL Server buffer pool; this pool is limited to 32 GB in Standard Edition, whereas there is no limit in Enterprise Edition.
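
For example, a columnstore index can be added with a single statement; dbo.FactSales and its columns are hypothetical names to adjust to your schema.

    -- Analytics over an OLTP table: a nonclustered columnstore index on selected columns.
    CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_FactSales
        ON dbo.FactSales (OrderDateKey, ProductKey, Quantity, SalesAmount);

    -- Data warehouse pattern: make the columnstore the table's primary storage instead.
    -- CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSales ON dbo.FactSales;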

If a system starts suffering from memory pressure, organizations can easily scale an Azure VM up or down to another size that has more or less memory as needed. This ability to scale as workloads change is one of the biggest benefits of Azure VMs and can save organizations a ton of money. To have that benefit on-premises, you have to own hardware that you may only need a few times per year, whereas in Azure, you can scale as needed. I've worked with numerous clients over the years who have a seasonal business. Open enrollment for healthcare, registration for universities, online retail on Black Friday, and so on require extra compute during the busy season. They can scale up, pay for the additional compute during that season, and then scale back down. For example, a customer may have a D16s_v3/D16as_v4 with 16 vCPUs and 64 GiB of RAM. Their busy season is approaching and they know their workload will increase by 40-50%. They could simply scale to a D32s_v3/D32as_v4, which would double their vCPUs from 16 to 32, their memory from 64 GiB to 128 GiB, and their temporary storage from 128 to 256 GiB. Since they are staying within the same VM series, all that is needed is a restart. If the resize is to a different series, or the hardware cluster hosting the VM does not support the new size, the VM will be moved to a new host, which can take more time.

For existing SQL Server instances that are being migrated to Azure, we strongly recommend analyzing the current memory utilization before assuming that you need the same memory post-migration. If data is staying in the buffer pool for excessive amounts of time, you could consider a VM with less memory. For example, I had a customer who was on a 4 vCPU on-premises VM that was allocated 128 GB of memory. I was measuring page life expectancy (how long data pages stay in memory) in days, not minutes or hours. They were not using In-Memory OLTP and it was a dedicated machine just for SQL Server. We were able to migrate to a VM with less memory, saving the customer a lot of money.

At the same time, if you are experiencing memory pressure, tuning and optimizing should be considered before migrating to decrease the memory pressure. Properly sizing the Azure VM is necessary for better SQL Server performance, but also to ensure that you are not overspending on resources that you don't need.

CPU

Selecting the number of processors can have a direct impact on SQL Server performance if you underestimate the amount of CPU needed. If you overestimate it, you won't see a performance improvement, but you will pay for more SQL Server licenses than you need. Workloads change over time, and at some point more CPUs may be needed to handle the workload. Microsoft has published a list³ of the Azure compute units (ACUs) for the different VM families, which gives you a quick reference for comparing CPU performance across Azure SKUs.

By running SQL Server on Azure VMs, you can easily scale to a server with more vCores, which typically also includes more memory and more I/O throughput. Don't forget about those constrained vCPU options if you have a small vCore requirement but a need for larger memory. However, if you have a high compute need, compute-optimized VMs offer a higher core count. For low CPU latency and fast clock speeds, consider the Eav4 VM series, which features the AMD EPYC™ processor.

Properly sizing the Azure VM lays the foundation for a SQL Server instance that performs well. The number of CPUs needed, the amount of memory required, the overall storage capacity, and the storage I/O are all factors that determine the type and size of Azure VM you need. Each factor, by itself, can force you into a certain VM size or type. Gathering these requirements early on goes a long way toward a well-performing SQL Server and a successful deployment.

SQL Server configuration

TempDB is a system database that is utilized by many processes for storing work tables, temporary tables, spills, row versions, and much more. TempDB is unique due to its characteristics: there is only one TempDB database for the entire SQL Server instance, it is recreated when the SQL Server service is restarted, and, technically, it cannot be backed up. Nevertheless, TempDB is mission-critical for the instance and can become a point of contention. For that reason, TempDB needs to be properly sized for the instance and requires more than a single data file. In most gallery images, there is a single data file and a log file for TempDB. With TempDB being utilized by the entire instance, contention can develop on the pages related to Page Free Space (PFS), Global Allocation Map (GAM), and Shared Global Allocation Map (SGAM). These are pages 1, 2, and 3 in the data file. A new PFS page occurs every 64 MB, and a new GAM and SGAM page every 4 GB.

To alleviate the potential for contention on these pages, you need more data files. The generally accepted rule is one equally sized data file per core, up to eight cores. If you have more than eight cores, start with eight data files of equal size. If contention is still an issue with eight files, add more equally sized data files in increments of four. You should also set the initial size of the data and log files to the size TempDB grows to after a typical workload. For example, if you have a 4 vCore server and TempDB grows to 8 GB after a normal workload, you would create four TempDB data files with an initial size of 2 GB each and make sure each is set to auto grow by the same fixed size.
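
A sketch of that example follows; the logical file names other than tempdev, and the T:\TempDB path, are assumptions to adjust to your environment. The configured sizes become the sizes TempDB starts with after each restart.

    -- Sketch for a 4 vCore VM whose TempDB settles at roughly 8 GB: four 2 GB data
    -- files with a matching fixed autogrowth. File names (other than tempdev) and
    -- the T:\TempDB path are assumptions.
    USE master;
    GO
    ALTER DATABASE tempdb MODIFY FILE (NAME = 'tempdev', SIZE = 2048MB, FILEGROWTH = 64MB);
    ALTER DATABASE tempdb ADD FILE
        (NAME = 'tempdev2', FILENAME = 'T:\TempDB\tempdev2.ndf', SIZE = 2048MB, FILEGROWTH = 64MB);
    ALTER DATABASE tempdb ADD FILE
        (NAME = 'tempdev3', FILENAME = 'T:\TempDB\tempdev3.ndf', SIZE = 2048MB, FILEGROWTH = 64MB);
    ALTER DATABASE tempdb ADD FILE
        (NAME = 'tempdev4', FILENAME = 'T:\TempDB\tempdev4.ndf', SIZE = 2048MB, FILEGROWTH = 64MB);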

The default auto growth increment for TempDB on SQL Server 2017 and SQL Server 2019 is 64 MB, and it can be changed to a higher value if needed. Beginning with SQL Server 2016, when TempDB grows, all TempDB data files grow at the same time; prior to SQL Server 2016, this was controlled using trace flag 1117. Also beginning with SQL Server 2016, objects in TempDB are allocated full, uniform extents (eight pages at a time) rather than single pages from mixed extents; prior to SQL Server 2016, this behavior was enabled using trace flag 1118.

A common practice is to take advantage of the local SSD that is part of every Azure VM for storing TempDB. VMs have varying sizes for this local disk, so make sure the local SSD is large enough before using it. Memory- and storage-optimized VMs offer higher-capacity local SSD storage than general-purpose VMs. Depending on TempDB utilization, isolating TempDB on its own premium or ultra SSD may be needed. In some extreme cases, placing individual TempDB data files on separate disks may also be needed to further distribute I/O. See Appendix A for information on SQL Server configuration with OLTP workloads.
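
If you do place TempDB on the local SSD, a minimal sketch looks like the following. D:\ is typically the temporary drive on Windows Azure VMs, but verify the drive letter, make sure the folder exists (and is recreated at startup, since the local SSD is wiped when the VM is deallocated), and note that the change only takes effect after the SQL Server service restarts.

    -- Repoint the primary TempDB data file and the log file at a folder on the local SSD
    -- (D:\TempDB is an assumption). Takes effect at the next SQL Server service restart.
    ALTER DATABASE tempdb MODIFY FILE (NAME = 'tempdev', FILENAME = 'D:\TempDB\tempdev.mdf');
    ALTER DATABASE tempdb MODIFY FILE (NAME = 'templog', FILENAME = 'D:\TempDB\templog.ldf');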

Dynamic management views (DMVs) and Query Store

SQL Server on an Azure VM is still SQL Server, regardless of Linux or Windows. Standard query tuning is still required to make SQL Server run as well as possible. This means monitoring for queries that are consuming the most resources. Common approaches are looking for long-running queries, queries that are executed the most, and those consuming the most CPU and disk I/O.

SQL Server 2005 introduced DMVs as well as dynamic management functions (DMFs). Every version of SQL Server since has introduced new and improved DMVs and DMFs to help manage and support SQL Server. Some common DMVs and DMFs for performance tuning include the following:

  • sys.dm_exec_requests and sys.dm_exec_sessions, for what is running right now
  • sys.dm_exec_query_stats and sys.dm_exec_sql_text, for the performance of cached queries
  • sys.dm_os_wait_stats, for the waits the instance has encountered
  • sys.dm_io_virtual_file_stats, for file-level I/O latency and throughput

There are numerous categories of DMVs, and DMVs can be joined with other tables and DMVs to create robust queries. Most DBAs have a collection of DMV scripts to use to collect system information on their SQL Server instances, as well as for troubleshooting performance issues when they arise.
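
As an example of that pattern, the following sketch joins sys.dm_exec_query_stats with the sys.dm_exec_sql_text function to list the cached statements that have consumed the most CPU since their plans were cached:

    -- Top cached statements by total CPU time (worker time is in microseconds).
    SELECT TOP (10)
           qs.total_worker_time / 1000 AS total_cpu_ms,
           qs.execution_count,
           qs.total_logical_reads,
           SUBSTRING(st.text,
                     (qs.statement_start_offset / 2) + 1,
                     ((CASE qs.statement_end_offset
                           WHEN -1 THEN DATALENGTH(st.text)
                           ELSE qs.statement_end_offset
                       END - qs.statement_start_offset) / 2) + 1) AS statement_text
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    ORDER BY qs.total_worker_time DESC;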

SQL Server Query Store provides insight into query plan choice and performance. It automatically captures a history of queries, plans, and runtime statistics and retains them for review. You can read more about Query Store in Chapter 3, Hero capabilities of SQL Server on Azure VMs.
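
A minimal sketch, assuming a user database named SalesDB: enable Query Store, then query its catalog views for the heaviest CPU consumers it has recorded.

    -- Enable Query Store (SQL Server 2016+); READ_WRITE captures new queries and plans.
    ALTER DATABASE SalesDB SET QUERY_STORE = ON (OPERATION_MODE = READ_WRITE);
    GO

    -- Top queries by total CPU recorded in Query Store (avg_cpu_time is in microseconds).
    SELECT TOP (10)
           q.query_id,
           qt.query_sql_text,
           agg.executions,
           agg.total_cpu_time_us
    FROM (
            SELECT p.query_id,
                   SUM(rs.count_executions)                   AS executions,
                   SUM(rs.avg_cpu_time * rs.count_executions) AS total_cpu_time_us
            FROM sys.query_store_plan AS p
            JOIN sys.query_store_runtime_stats AS rs ON rs.plan_id = p.plan_id
            GROUP BY p.query_id
         ) AS agg
    JOIN sys.query_store_query AS q ON q.query_id = agg.query_id
    JOIN sys.query_store_query_text AS qt ON qt.query_text_id = q.query_text_id
    ORDER BY agg.total_cpu_time_us DESC;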

How to optimize SQL Server on Linux

There are several operating system changes that should be made for SQL Server deployments on Linux, and some of them are specific to certain distributions.

An administrator should disable the last accessed date/time attribute (atime) on any filesystem that is used to store SQL Server data and/or log files. The atime mount option causes a write operation after every read access, which generates a large amount of unnecessary I/O. Change the mount option to noatime to reduce disk I/O.

In order to deal with large amounts of memory, Linux uses Transparent Huge Pages (THP). THP automates managing, creating, and working with huge pages. Manually trying to manage huge pages would be very difficult, so, for SQL Server on Linux, leave THP enabled.

Memory should be managed so that SQL Server does not starve the underlying operating system. At the same time, SQL Server should be configured to use as much memory as possible without causing problems for the operating system. By default, SQL Server on Linux uses only 80% of the physical memory. If the remaining 20% is too significant and wasteful, you can adjust the limit through the memory.memorylimitmb setting: use the mssql-conf utility to set memory.memorylimitmb to the number of megabytes SQL Server should use. You should also have a swap file configured to avoid memory issues.

Azure BlobCache

When configuring the VM or adding storage, you can set the caching policy for each disk. Azure BlobCache provides a multi-tier caching capability by using the VM's memory and local SSD. It is only available for premium disks and is configured by default for the SQL Server images in the Azure Marketplace. If you've self-installed SQL Server, you'll have to configure the disk caching yourself.

There are three options for disk caching: ReadWrite, None, and ReadOnly.

For SQL Server workloads, ReadWrite caching should not be used, as it can lead to data consistency issues with SQL Server.

None should be used for SQL Server log files. SQL Server writes to the log sequentially, so there is no benefit to using ReadOnly caching in that scenario.

SQL Server data files should take advantage of the ReadOnly caching option. With ReadOnly, reads are served from the cache, which is held in the VM's memory and on the local SSD. Memory is fast, and the local SSD performs better than remote SSD storage. Even more crucially, reads served from the cache do not count toward the disk's IOPS and throughput limits, which allows you to achieve higher overall throughput from the VM. ReadOnly caching provides lower read latency, higher throughput, and higher overall read IOPS compared to remote premium SSD storage.

If you are taking advantage of ReadOnly caching for SQL Server, reserve the local temporary SSD for caching, and do not place TempDB on this disk. Leave the I/O for caching and place TempDB on its own premium or ultra SSD.

Summary

For great performance with SQL Server on an Azure VM, you need to first size the VM for your workload, paying attention to CPU, memory, and storage I/O capabilities. You can easily select the right configuration for your workload with the vast array of Azure VMs available. Many have been configured to provide the best price for performance for SQL Server workloads. By leveraging the highest generation VM sizes with Azure BlobCache and ultra disks, you can run just about any size SQL Server workload on an Azure VM. To know what size VM you need, you'll need a baseline, and to have SQL Server run as smoothly as possible, you'll want to ensure that you've configured it properly for your workload. In addition to SQL Server configuration settings, there are key changes to make in Linux environments, especially setting noatime to reduce wasteful I/O.

Chapter links

  1. https://bit.ly/2X3b8Ba
  2. https://bit.ly/2X2nwRV
  3. https://bit.ly/36yDV3r

By Tim Radney
