© Julian Soh, Marshall Copeland, Anthony Puca, and Micheleen Harris 2020
J. Soh et al., Microsoft Azure, https://doi.org/10.1007/978-1-4842-5958-0_17

17. Azure Data Services

Julian Soh1, Marshall Copeland2, Anthony Puca3, and Micheleen Harris1
(1)
Washington, WA, USA
(2)
Texas, TX, USA
(3)
Colorado, CO, USA
 

Digital transformation is a term that describes how technology has changed our lives, how we interact with each other, and how organizations make business decisions. It is all made possible by the abundance of data available today. There is not only more data than ever, but also the emergence of entirely new types of data.

The volume of data has exploded exponentially in the last 30 years, especially since the early 2000s as more Internet-connected devices were adopted by organizations as well as consumers.

When the iPhone debuted in 2007, it ushered in the era of the smartphone. Entire countries that did not have landline infrastructure leapfrogged developed countries in the adoption of cellular technologies. Applications have become untethered from computers and moved into the hands of mobile users, who create social media content that is shared and reshared throughout the world. Social media unleashed different types of content, such as video, conversation streams (streaming data), and photos.

Then came the Internet of Things (IoT), connecting devices to the Internet and sending data 24 hours a day, seven days a week. Smart homes and digital assistants monitor homes and place orders via verbal commands, or because an IoT refrigerator knows that something is running low and needs reordering. Organizations automate production lines and supply chains based on the conditions that IoT devices track at every stage. You may have taken for granted the ability to track the delivery of an item you ordered from an Internet vendor as it traverses a shipping company like FedEx, or multinational carriers that seamlessly hand off a package from one shipper to another as it crosses international borders. This is all content and data that was not available before data was globally connected by the public cloud.

To say that these examples barely scratch the surface in terms of new content that never existed before is a huge understatement. There are countless use cases in every industry and vertical, for consumers and enterprises alike, that generate rich content that now drives business decisions and touches our personal lives daily. The abundance of data in this age of digital transformation means that we need new tools, skills, and strategies to effectively harness the power of data-driven decision making. But before we can make sense of the data, we first need to store and manage it efficiently. For that, we need to better understand data as it exists today.

Data Trends

Data trends fall into four topical categories.
  • Data types and volume

  • Data analysis trends

  • Data roles

  • Data platform as a service (data PaaS)

Data Types and Volume

When we use the term data in the IT context, we automatically think of databases, SQL, tables, indexes, fields, and so forth. That is the traditional definition of data, also known as structured data.

However, not all data today falls neatly into columns with fields and indexes. That is certainly not the case with videos or photos, aside from the metadata of such content, but you cannot afford not to classify them as data. Take, for example, security video footage or videos captured by law enforcement body cameras. This content is used in line-of-business (LoB) applications and sometimes in court for legal proceedings. Therefore, it represents new data types that need to be handled differently. We call this content unstructured data.

Data also changes over time. Take a retailer, for example, and look at one type of product that a retailer might sell, such as a backpack. This simple product can easily fit into a structured database with fields like size, color, and whether it is weatherproof. Then along came a trend that introduced the single-strap backpack. Should the retailer add a field for the number of straps, or perhaps an indicator field for a single strap, and apply it to the entire table? More recently there is the trend of smart backpacks with charging capabilities for electronic devices, followed by airport-security-friendly (e.g., TSA-friendly) backpacks. The bottom line is that new features and trends are introduced to products all the time. So instead of adding new fields to the table that affect the entire product class, it may be better to have those fields present only when applicable. This is called semi-structured data.
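
To make the idea concrete, here is a minimal C# sketch of how two backpack records might be represented as schemaless documents. The attribute names and values are hypothetical; the point is that each record carries only the fields that apply to it.

using System.Collections.Generic;

// A minimal sketch of semi-structured product data; the attributes are hypothetical.
var classicBackpack = new Dictionary<string, object>
{
    ["sku"] = "BP-1001",
    ["color"] = "black",
    ["sizeLiters"] = 28,
    ["weatherproof"] = true
};

var smartBackpack = new Dictionary<string, object>
{
    ["sku"] = "BP-2045",
    ["color"] = "gray",
    ["sizeLiters"] = 22,
    ["straps"] = 1,               // single-strap style
    ["usbChargingPorts"] = 2,     // "smart" charging feature
    ["tsaFriendly"] = true        // airport-security-friendly design
};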

There is also a better way to define certain data relationships than traditional keys and columns. For example, it is often easier to map out relationships between data points by representing them as vertices and edges. This approach is widely used in social media to map the relationships between people. A database that manages this kind of data is known as a graph database.

By now, you are probably no longer surprised that you always seem to need more storage space than you purchased. Not too long ago, we measured consumer storage in gigabytes, but today we commonly refer to it in terabytes. That is because video takes up a lot of space, your chat history with shared and reshared content is a space hog, and you need backups of precious digital photos. The type of content and its social aspect drive the volume of that content.

The introduction of different data types (semi-structured, unstructured, and graph) requires different approaches to storing and handling data. The footprint of such data contributes significantly to the volume of data that needs to be managed today.

Data Analysis Trends

You may have heard of data-driven decision making. Data science is one of the fastest-growing professions. Colleges are introducing data science curriculums and degrees, and online and distance learning programs aimed at professionals already in the workforce are seeing high enrollments.

At the same time, tools such as Databricks and technologies that support big data, like Hadoop clusters, are rapidly being deployed to support modern data analysis so organizations can take advantage of data-driven decision making. Why all this investment? Because data-driven decision making leads to better customer service, lower operational costs, and, in many cases, improved and even saved lives. Knowing what, when, and why to carry out certain actions is always better than guessing. The simple rule of thumb is that more data improves the quality of decision making.

The problem is that more data also requires more compute and memory, and decision making may be cyclical yet still bound to a rigid timeframe. In other words, it is wasteful to deploy and maintain a big data analysis infrastructure if the activity is carried out only once annually, but it is equally pointless if the right decision is made too late.

Modern Data Roles

With new and modern data come new and redefined roles. We briefly mentioned data scientists. These are the professionals who take massive amounts of data, often from disparate data sources, and conduct data exploration, experiments, and analysis to find correlations in the data.

Making the data accessible so data scientists can do their jobs is the role of the modern data engineer. Data engineers should not be confused with database administrators, although we are seeing a natural transition of professionals from the database administrator role to that of the data engineer. We have also seen data scientists serve as data engineers, especially in smaller organizations.

In either case, the role of the data engineer is well defined. It is a role that makes data available in a usable form so data scientists can do their analysis. Making data available involves moving, copying, cleaning, consolidating, merging, and updating data into a central and easily accessible location. The role has its roots in the ETL (extract-transform-load) days.

Data engineers need to know the right repository for each type of data. For example, as discussed, unstructured data should not be stored in structured databases. The data engineer ensures that the most efficient and appropriate storage medium is used for the different types of data.

Data engineers also need to ensure the security of the data. In recent years, many privacy policies have had a worldwide impact, even though they originate from specific jurisdictions. One good example is the General Data Protection Regulation (GDPR), a European Union (EU) law aimed at protecting EU citizens' privacy. Data security and access restrictions need to be in place to protect data belonging to EU citizens, but since we are in a global economy, EU citizens are no longer only on the European continent. Even small organizations are multinational if they do business on the Internet. Therefore, GDPR is a regulation with global impact that everyone needs to adhere to. Using this single regulation as an example, all data engineers need to ensure that GDPR data privacy, security, and notification requirements are in place. Throw in other requirements, such as those governing personally identifiable information (PII), the Health Insurance Portability and Accountability Act (HIPAA), Internal Revenue Service Publication 1075 (IRS 1075), and so forth. While there may be overlapping requirements, the data engineer needs to ensure that they are all adhered to.

What a data engineer should not need to do is ensure that the technology platform is always available and healthy. For example, the data engineer should not need to patch the SQL Database or Cosmos DB engine every month. The data engineer should not need to ensure that the geo-redundant capability is working, just that it is in place if there is a requirement for geo-redundancy of the data in accordance with the organization's business continuity and disaster recovery (BCDR) plan.

Data Platform as a Service

Data PaaS is the consumption of database services as a PaaS; it is a narrower definition of PaaS that relates specifically to data. The term is sometimes incorrectly used interchangeably with data as a service (DaaS). We define DaaS as data made available for consumption.

A data engineer deploying the right type of database to store data, without having to manage the VM, the database engine, or the software, is an example of data PaaS.

Once data has been cleaned, merged, consolidated, or (to use the popular term) wrangled to the point where data scientists can analyze it, or where business users can browse data repositories with a tool like Power BI and explore the data, then it is considered data as a service. In Microsoft Azure, Azure SQL Database and Cosmos DB are two examples of data PaaS. Azure Data Catalog is an example of a data PaaS that is designed to deliver DaaS to the organization.

You have read in previous chapters about the benefits of PaaS in non-data-specific services, like Azure Cognitive Services and Azure Web Apps. PaaS delivers the same benefits to the data realm: better redundancy, better resiliency, and a better overall SLA. But remember our discussion regarding GDPR, PII, and the numerous other regulatory requirements?

Data PaaS offerings in Azure already adhere to these legal requirements. Where certifications are required, Microsoft already has the relevant documentation and certifications in place, so data engineers and organizations adopting data PaaS can be confident that the proper security requirements are met.

The other consideration when comparing data PaaS offerings to traditional on-premises databases is latency. A common misconception about data PaaS is that network latency affects data exchange because the data PaaS in the cloud is in a remote location. Depending on the architecture and the use case, this is often not the case.

If an application remains on-premises and pulls data in a traditional client-server approach, there is latency. This should be addressed with an application modernization effort, or, at the very least, by combining the lessons learned in Chapter 12 from using web apps as PaaS integrated with data PaaS.

The other use case is in the field of data-driven decision making and the work of data scientists. If the analytical tools are also PaaS, like Azure Databricks, then there should not be any latency issues. Of course, copying data from on-premises to Azure takes time from a transfer perspective, but this is usually a batch process by the very nature of a data warehouse's role, which is Online Analytical Processing (OLAP) as opposed to Online Transaction Processing (OLTP). Furthermore, recall the discussion related to data volume: the ability to store large amounts of data and then assign sufficient memory and compute to run the analysis on a consumption basis, rather than as a capital expense, yields significant savings.

Azure Data Services

Now that we have introduced the industry trends driving how data is used and the data engineer and data scientist roles in this modern era, we can focus on the actual data PaaS offerings in Azure.

The data PaaS in Azure is called Azure data services. It is a family of data-specific solutions that address the replication, manipulation, and storage of the different types of data, securely and at scale.

Like all PaaS, Azure data services have a financially backed service-level agreement that focuses on high availability. The published availability of Azure data services is summarized in Table 17-1.
Table 17-1
Azure Data Services Availability SLA

Uptime Level    Uptime Hours per Year    Downtime Hours per Year
99.9%           8,751.24                 ~8.76
99.99%          8,759.12                 ~0.88
99.999%         8,759.91                 ~0.09
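
The downtime figures follow directly from applying the uptime percentage to the 8,760 hours in a year. For example, at 99.9% availability, the maximum allowable downtime is 8,760 × (1 − 0.999) ≈ 8.76 hours per year; at 99.99%, it drops to roughly 53 minutes.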

Another key difference between on-premises databases and Azure data services PaaS is that the cost of using Azure data services already factors in all licensing costs. In fact, software licensing is not even an applicable line item for Azure PaaS offerings such as Azure data services.

You can explore Azure data services at https://azure.microsoft.com/en-us/services/#databases, but this only shows the different databases. However, data services include technologies that are not databases in nature, such as IoT services that require IoT-specific data services (see https://azure.microsoft.com/en-us/services/#iot) and all the analytics tools, including Azure Data Factory and Azure Data Catalog (see https://azure.microsoft.com/en-us/services/#analytics). There are also elements of Azure data services that support hybrid integration of on-premises and Azure-based architectures, such as Azure SQL Database managed instance, which you can explore at https://azure.microsoft.com/en-us/services/#hybrid. Essentially, Azure data services have a significant footprint across multiple technologies because data is a very fundamental layer utilized by many different architectures and initiatives.

In this chapter, we focus on the core Azure data services as it applies to databases. Specifically, we look at
  • Azure SQL Database

  • Azure Synapse

  • Azure Cosmos DB

Remember that there are other databases available in Azure as well, including proprietary non-Microsoft databases and open source databases, such as Azure Database for MySQL and Azure Database for PostgreSQL.

Last, we also look at data storage technologies that support Azure data services, particularly around extract-transform-load (ETL) and extract-load-transform (ELT) processes, such as Azure Blob Storage and Azure Data Lake Storage.

Azure SQL Database

Azure SQL Database should be the most familiar to audiences that have been using SQL Server. As the name implies, it is SQL databases, not SQL servers, provisioned in Azure. Of course, the underlying technology that supports Azure SQL Database is SQL Server, but as a PaaS, you do not need to provision a VM, install the SQL Server engine, patch it, and test it before using it. You provision the SQL databases you need and start populating them with tables and data. It is the quickest and most time-saving way to get access to a database.

There are three deployment models for Azure SQL Database.
  • Azure SQL Database

  • Azure SQL Database managed instance

  • Azure SQL Database Elastic Pool

../images/336094_2_En_17_Chapter/336094_2_En_17_Fig1_HTML.png
Figure 17-1

Three flavors of Azure SQL Database

Azure SQL Database is the most common and recommended way of consuming databases as a PaaS. Each deployment of Azure SQL Database is a deployment of a single scalable database.

Hands-on with Azure SQL Database

Azure SQL Database is best for modern cloud applications that need a scalable database that is fully managed, has built-in high availability, and promises the lowest administrative overhead.

Note

One unique aspect of Azure SQL Database as a PaaS is that deploying it requires pinning the database to a new or existing virtual SQL Server instance. This may sound confusing based on your knowledge of how PaaS works, but if you think about the way the SQL engine works, it makes sense. Traditionally, some capabilities are delivered at the server level (e.g., access security to ports used by SQL) and other capabilities at the database level. Therefore, to continue providing server-level capabilities for Azure SQL Database, a virtual instance of SQL Server is created. Rest assured that you still do not need to patch or manage this SQL Server. The hands-on section in this exercise clarifies any confusion concerning this topic.

  1. 1.

    As with all the other exercises throughout this book, we assume you already have an Azure subscription. Go to the Azure portal at https://portal.azure.com.

     
  2. 2.

    Click Create a resource and type SQL Database in the search box, and then select SQL Database in the results, as seen in Figure 17-2.

     
../images/336094_2_En_17_Chapter/336094_2_En_17_Fig2_HTML.jpg
Figure 17-2

Selecting SQL Database from the results of the available service

  1. 3.

    Click the Create button.

     
  2. 4.

    Select the subscription, an existing or new resource group, and a name, and then click Create new to create a new database server to host this database. In the future, when you create another Azure SQL Database, you can use this same database server. Unlike the traditional SQL Server licensing model, there is no SQL core or CPU licensing associated with spinning up a server to support Azure SQL databases, but there are compute and storage charges incurred based on what you select. For example, the charges for the server supporting the Azure SQL databases in Figure 17-3 are for a 2 vCore, 32GB server. Visit https://docs.microsoft.com/en-us/azure/sql-database/sql-database-resource-limits-database-server to learn more about the capacity limitations; as an example, you can reuse this single database server deployment for up to 5,000 Azure SQL Databases.

     
../images/336094_2_En_17_Chapter/336094_2_En_17_Fig3_HTML.jpg
Figure 17-3

Deploying an Azure SQL Database instance

  1. 5.

    In the New server pane, provide a name for the server and the admin login credentials. The admin credentials you enter here are the SQL DBA credentials. Select the region in which to place this database server and click OK.

     
  2. 6.

    Select No for SQL elastic pool for now; we discuss elastic pools later in this chapter. Click Configure database in the Compute + storage section.

     
  3. 7.

    Take note of the different scale options: General Purpose, Hyperscale, and Business Critical. Look at the characteristics, such as the IOPS and latency for each option, as seen in Figure 17-4. Click each box to see other details. In an actual deployment, select the option that meets the performance requirements of your database. For now, once you have explored the different options, select General Purpose.

     
../images/336094_2_En_17_Chapter/336094_2_En_17_Fig4_HTML.jpg
Figure 17-4

Different deployment options for Azure SQL Database based on the desired performance

  1. 8.

    Next, you have the option to select Provisioned or Serverless as the compute tier for your database. As its name implies, the Provisioned tier permanently allocates compute resources to the Azure SQL Databases hosted on this server, and you are billed hourly for as long as the server exists. The Serverless tier is based on dynamically scaled and assigned resources. During peak times, more resources may be assigned, within the limits set by the sliders at the bottom of the page. During off-peak times, resources are deallocated to save money. When there is no activity in the databases, all resources are deallocated and the database pauses until a request comes in, so there might be a slight delay after a period of database inactivity. If your application has unpredictable usage and can tolerate some latency, Serverless is the most cost-effective option.

     
  2. 9.

    For this exercise, select Provisioned.

     
  3. 10.

    Click Change configuration in the Hardware Configuration section. Take note of the different configuration types available. You see a compute optimized vs. memory optimized configuration option. Based on your application requirements, select a configuration that best meets your needs. For this exercise, accept the default Gen5 configuration and click OK at the bottom of the screen.

     
  4. 11.

    Use the sliders at the bottom of the screen to allocate the number of vCores and the space. You can always modify this later.

    Note In this exercise, we are explicitly defining the vCores and storage to allocate to this server. There is another option, called the DTU billing option. DTU stands for database transaction unit. DTUs are a unit of measure that bundles compute, storage, and IOPS. It is an option for customers who do not want the hassle of sizing servers and prefer a simple pre-configured option. Addressing pricing models is outside the scope of this book, but they are easily understood by visiting https://docs.microsoft.com/en-us/azure/sql-database/sql-database-purchase-models or by discussing them with your Microsoft representative. While on this topic, note that if you already own traditional SQL Server licenses, you may obtain credit for the Azure SQL Databases you spin up in Azure.

     
  5. 12.

    Click Apply.

     
  6. 13.

    Click Next : Networking. You will not be adding a private endpoint for this exercise; this topic was covered in Chapter 9. Note that Azure SQL Database is a PaaS to which you can assign a private IP address as a hosted endpoint if needed.

     
  7. 14.

    Click Next : Additional settings. Accept the default options for this exercise and skip creating tags. Click Review + create.

     
  8. 15.
    At this point, Azure deploys the database and a new database server instance to host it. When the deployment is complete, go to the Azure SQL Database resource. In the Overview pane, you can see the server it is hosted on, along with other important information like the status and location of the database. The database server name is represented as a fully qualified URL. Take note of this URL and click the server name, as referenced in Figure 17-5, which is also a link. This takes you to the database server resource.
    ../images/336094_2_En_17_Chapter/336094_2_En_17_Fig5_HTML.jpg
    Figure 17-5

    Overview pane of an Azure SQL Database resource

     
  1. 16.
    Under the Security section of the menu on the left, click Firewalls and virtual networks, as referenced in Figure 17-6.
    ../images/336094_2_En_17_Chapter/336094_2_En_17_Fig6_HTML.jpg
    Figure 17-6

    Configuring the firewall to allow access to this server and databases

     
  1. 17.
    The Azure portal has detected the IP address of your computer, so if you click Add client IP, as referenced in Figure 17-7, it adds this address to the firewall to allow network-level access to the server. This is an important step if you wish to manage your Azure SQL Database using SQL Server Management Studio (SSMS). You can also manually add ranges of IP addresses that you want to allow to connect to this database server. In fact, you may have to, especially if you are behind network address translation that masks your IP address.
    ../images/336094_2_En_17_Chapter/336094_2_En_17_Fig7_HTML.jpg
    Figure 17-7

    Adding a client IP address to Firewall ACL

     
  1. 18.

    Click Save and wait for the portal to update firewall rules.

     

At this point, you can access the Azure SQL Database that is hosted on this database server.

Assuming you are using SSMS as your client administrative tool, launch SSMS and connect to the server using the full URL of the server and the SQL DBA login you created in step 5. Click Connect, as referenced in Figure 17-8. If your firewall is properly configured, as outlined in step 17, you are connected to the database server.
../images/336094_2_En_17_Chapter/336094_2_En_17_Fig8_HTML.jpg
Figure 17-8

Connecting to the database server to manage an Azure SQL Database using SSMS

In the future, if you create additional Azure SQL Databases but reuse the same database server, you see those databases in SSMS as well. Furthermore, you can choose to create a new database from within SSMS, and that creates a new Azure SQL Database instead of going through the portal.

We did not go into extensive detail on the use of SSMS or other tools because we assume that you already know about database administration and access. This exercise was designed to show you the provisioning of Azure SQL Database and its important features, and to demonstrate that tools like SSMS that are compatible with traditional SQL Server databases continue to work.

Applications that reference the database server need to use the new URL. However, if a CNAME record exists in DNS for the database URL that applications are configured to use, and that CNAME is repointed to the database server hosting Azure SQL Database, you likely would not need to make any changes to the applications themselves.
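
If you prefer to verify connectivity from application code rather than SSMS, the following minimal C# sketch does the same thing using the Microsoft.Data.SqlClient NuGet package. The server URL, database name, and credentials are placeholders for the values you created earlier; this is an illustration, not part of the exercise.

using System;
using Microsoft.Data.SqlClient; // NuGet package: Microsoft.Data.SqlClient

class AzureSqlConnectionTest
{
    static void Main()
    {
        // Placeholders: substitute the server URL from the Overview pane,
        // the database name, and the SQL DBA credentials from step 5.
        const string connectionString =
            "Server=tcp:<your-server>.database.windows.net,1433;" +
            "Initial Catalog=<your-database>;" +
            "User ID=<sql-admin>;Password=<your-password>;" +
            "Encrypt=True;TrustServerCertificate=False;Connection Timeout=30;";

        using var connection = new SqlConnection(connectionString);
        connection.Open();

        // Run a trivial query to confirm the connection works.
        using var command = new SqlCommand("SELECT @@VERSION;", connection);
        Console.WriteLine(command.ExecuteScalar());
    }
}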

Azure SQL Managed Instance

Azure SQL Managed Instance is another deployment model of Azure SQL Database PaaS but with greater integration with a hybrid architecture.

Figure 17-9 depicts the difference in the access model between Azure SQL Database and Azure SQL Managed Instance, especially with the introduction of Azure Private Link for PaaS.
../images/336094_2_En_17_Chapter/336094_2_En_17_Fig9_HTML.jpg
Figure 17-9

Differences in access models

Both Azure SQL offerings provide all the benefits of PaaS. For Azure SQL Databases, access to the database that has been provisioned on the virtual server instance is via a URL that is associated with a public endpoint (<host>.database.windows.net), as we saw in the previous exercise. With Private Link, that endpoint can instead resolve to a private IP address owned by the customer organization, thereby further shielding the endpoint from potential exploitation.

With Azure SQL Managed Instance, the database PaaS is tied to a virtual network (VNet), which is inherently private and isolated. This VNet is then connected to the on-premises network via a Site-to-Site VPN or ExpressRoute. We covered hybrid networking in Chapter 9.

Azure SQL Managed Instance is a new deployment option that provides native VNet integration for customers that have security requirements that mirror traditional on-premises SQL server deployments and are not exposed to the public Internet. It is also a model for applications that need to remain on-premises and access the database via a high-speed, dedicated link to Azure like ExpressRoute, where latency is minimized or eliminated.

Another important differentiator is that Azure SQL Managed Instance has nearly 100% compatibility with the latest SQL Server deployed on-premises. In fact, the capabilities in Azure SQL Managed Instance exceed those of SQL Server on-premises. As such, Azure SQL Managed Instance makes it likely that customers can lift-and-shift on-premises databases that serve on-premises applications with minimal disruption.

Deploying an Azure SQL Managed Instance is very similar to deploying Azure SQL Database, with the exception that you select an existing VNet or configure a new one to deploy Azure SQL Managed Instance to, and then provide an IP address that is in that VNet’s virtual IP range.

Elastic Pools

The third Azure SQL Database deployment type is the use of elastic pools. An Azure SQL Database elastic pool is a method of housing multiple Azure SQL Databases on a single Azure SQL Database server while letting Azure balance the server's capacity among them at a set price.

Elastic pools are most beneficial to software as a service (SaaS) application providers. In a SaaS delivery model, application providers may deploy a separate database for each SaaS customer organization using a common application. Customer organizations differ in size and have different peak usage. Traditionally, SaaS providers deploy infrastructure that can handle the peak utilization of all their customers to ensure that SLAs are met. This is often a costly strategy because there will be times when the compute and IOPS sit idle. The solution is to house these customer databases on an elastic database server so that all the databases share the same resources on the server. It is up to the server to allocate and scale the right amount of resources to each database in real time based on current and historical consumption. This leads to predictable budgets for the SaaS provider without compromising the ability to meet published SLAs.

Hands-on with Elastic Pools

  1. 1.

    In the Azure portal, go to the Azure SQL Database that you deployed from the first exercise and click the name of the database server in the Overview pane, or you can directly go to the database server resource.

     
  2. 2.
    In the Overview pane, click New elastic pool, as referenced in Figure 17-10.
    ../images/336094_2_En_17_Chapter/336094_2_En_17_Fig10_HTML.jpg
    Figure 17-10

    Adding a new elastic pool

     
  1. 3.

    Provide a name for the elastic pool then click Review + create.

     
  2. 4.

    Note the estimated monthly cost for the Database server, and then click Create.

     
  3. 5.

    Once the elastic pool is deployed, click Go to resource.

     
  4. 6.
    Select Configure, and then click Add databases. Select the Azure SQL Database created in the first exercise to add to the elastic pool, and then click Apply, as seen in Figure 17-11.
    ../images/336094_2_En_17_Chapter/336094_2_En_17_Fig11_HTML.jpg
    Figure 17-11

    Adding an existing database to the elastic pool

     
  1. 7.

    Click Save at the top of the pane.

     
  2. 8.

    After the changes have been implemented, click Overview. Click Create database.

     
  3. 9.

    Give the database a name and note that since you are creating this database from within the elastic pool pane, the options for the server are grayed out.

     
  4. 10.

    Click Review + create. Notice that the estimated cost for this database is stated as Included in pool, which means that the database server is scaled so that it remains within the budget as set when we defined the elastic pool’s vCore and storage.

     
  5. 11.

    Click Create.

     
  6. 12.

    After the database has been created, return to the elastic pool resource, and click Configure.

     
  7. 13.

    At the bottom of the Configure pane, you see the average CPU, peak CPU, and storage that the two databases are using. Over time, you can tune the vCores and storage of the elastic pool so that the shared resources are sufficiently and efficiently allocated to all the databases in the pool automatically.

     
To close out the discussion on elastic pools, we have two other Azure SQL Databases in our lab environment. They were expected to have very low utilization, as referenced in Figure 17-12, so we used a small DTU purchasing model that costs approximately $15/month for each database.
../images/336094_2_En_17_Chapter/336094_2_En_17_Fig12_HTML.jpg
Figure 17-12

Two databases purchased separately via the DTU purchasing models

Repeating steps 6 and 7 from the previous exercise, we added these two databases to the elastic pool because we know there is enough capacity to accommodate them. By doing so, we saved $30/month since both databases are now absorbed into the cost of the elastic pool.

Hands-on Tuning and Monitoring Azure SQL Databases

One of the benefits of using PaaS in Azure is the built-in performance tuning and monitoring capabilities. In this exercise, we explore these capabilities using the database we created earlier.

To observe the results of tuning, we first set up the test environment by downloading and installing HammerDB, which is a free load testing tool for databases.

Setting up the Test Environment

  1. 1.

    From the client machine that is conducting the test against Azure SQL Database, download HammerDB from www.hammerdb.com/download.html and install it. Take note of the directory in which you are installing HammerDB because you need to go to that directory to launch the application. The installation process may not create an icon on your desktop or Start menu. Make sure that this client machine’s IP address is already added to the firewall for Azure SQL Database, which it would be if you are using the same machine against the same Azure SQL Database server in the first exercise.

     
  2. 2.
    After installation is complete, go to the installed directory and launch HammerDB by double-clicking and executing hammerdb.bat, as seen in Figure 17-13.
    ../images/336094_2_En_17_Chapter/336094_2_En_17_Fig13_HTML.jpg
    Figure 17-13

    Locating and executing hammerdb.bat

     
  1. 3.

    By default, HammerDB is configured for Oracle. Double-click SQL Server in the Benchmark panel and choose TPC-C, which stands for Transaction Processing Performance Council – Benchmark C.

     
  2. 4.

    Expand Schema Build and double-click Options.

     
  3. 5.
    Enter the information as you see in Figure 17-14. This creates a database named tpcc in Azure SQL Database and populates it with a test schema and data. Pay close attention to the ODBC driver version, or the tool may be unable to connect to Azure SQL Database. Leave the number of Warehouses at 1 (though guidelines suggest 10 to 100 warehouses per CPU). Leave Virtual Users to Build Schema at 1.
    ../images/336094_2_En_17_Chapter/336094_2_En_17_Fig14_HTML.jpg
    Figure 17-14

    Editing the HammerDB connection properties for Azure SQL Database

     
  1. 6.

    Click OK.

     
  2. 7.

    Double-click Build under Schema Build and start the build process.

     
  3. 8.
    After the build process is complete, the Azure SQL Database server has a new database named tpcc. You can verify that the database exists via SSMS or from the Azure portal, together with tables and sample data, as seen in Figure 17-15.
    ../images/336094_2_En_17_Chapter/336094_2_En_17_Fig15_HTML.jpg
    Figure 17-15

    New tpcc database with sample data

     
  1. 9.

    Next, expand Driver Script in the Benchmark panel and double-click Options.

     
  2. 10.
    Configure the settings, as seen in Figure 17-16. Click OK when done.
    ../images/336094_2_En_17_Chapter/336094_2_En_17_Fig16_HTML.jpg
    Figure 17-16

    Configuring the test driver script

     
  1. 11.

    Double-click Load under Driver Script and activate the script.

     
  2. 12.

    Expand the Virtual Users section and double-click Options.

     
  3. 13.
    Create virtual users, as seen in Figure 17-17, and then click OK.
    ../images/336094_2_En_17_Chapter/336094_2_En_17_Fig17_HTML.jpg
    Figure 17-17

    Creating Virtual users that load test the tpcc Azure SQL Database

     
  1. 14.
    Expand Autopilot and double-click Options. Configure Autopilot settings, as seen in Figure 17-18. We are going to ramp each stage of the test to get the results as we incrementally increase the load (number of users) according to the Active Virtual User Sequence. Then click OK.
    ../images/336094_2_En_17_Chapter/336094_2_En_17_Fig18_HTML.jpg
    Figure 17-18

    Configuring Autopilot to run the load test

     

Leave HammerDB running for now because we are going to switch our attention to the Azure portal to look at performance statistics and turn on Automatic Tuning. We then come back to HammerDB to start the load testing.

Performance Monitoring and Automatic Tuning

First, we look at automatic tuning. Automatic tuning is a service that continuously monitors database usage and automatically tunes the database for optimal performance. This is based on machine learning and AI capabilities within Azure.
  1. 1.

    In the Azure portal, go to the Azure SQL Database server that was created along with the Azure SQL Database from the first exercise.

     
  2. 2.
    In the Overview pane, select the Features tab, and then select Automatic tuning, as seen in Figure 17-19.
    ../images/336094_2_En_17_Chapter/336094_2_En_17_Fig19_HTML.jpg
    Figure 17-19

    Accessing the automatic tuning options for Azure SQL Database

     
  1. 3.

    You see the three Automatic Tuning options to optimize database performance: force plan, create index, and drop index. By default, the database server inherits the Azure defaults, which turn on force plan and turn off create index and drop index. Azure SQL Databases hosted on this database server inherit these settings by default, but you can modify the options at the database level as well: go to the specific Azure SQL Database to stop inheriting the server settings and set its own Automatic Tuning options.

     
  2. 4.

    Turn on Create Index.

     
  3. 5.

    Click Overview, and at the bottom of the pane, under SQL database, you should see the tpcc database that was created by HammerDB. Click tpcc to select it. The portal should take you to the tpcc Azure SQL Database resource.

     
  4. 6.

    Scroll down to Intelligent Performance and click Automatic tuning. Notice you have the same options as in step 3, but this is at the database level. Leave the settings as they are because you are going to use the server’s settings from step 3.

     
  5. 7.
    Click Performance recommendation, as referenced in Figure 17-20. Azure tells you that the database is too new, and therefore there are no recommendations at this time.
    ../images/336094_2_En_17_Chapter/336094_2_En_17_Fig20_HTML.jpg
    Figure 17-20

    Database performance recommendations

     

Now that you have turned on Automatic tuning, you are ready to start load testing with HammerDB.

Load testing Azure SQL Database Server

  1. 1.
    Return to HammerDB and double-click Autopilot in the Benchmark panel. This starts the load testing process as virtual users are ramped up. You can stop the test at any time by clicking the Stop button, as seen in Figure 17-21, but the intent is for the script to run in its entirety.
    ../images/336094_2_En_17_Chapter/336094_2_En_17_Fig21_HTML.jpg
    Figure 17-21

    Running the load test in Autopilot mode

     
  1. 2.

    Click the Autopilot tab to see the progress of the test. In Figure 17-22, the Autopilot test phase completed using 49 virtual users, and the results are 28,333 TPM and 12,276 NOPM. Autopilot is also preparing to re-run the test with 65 virtual users. Notice that this follows the settings from Figure 17-18 in step 14 when configuring HammerDB.

     
  2. 3.

    The test, as configured in this exercise, takes up to an hour to run. When it is done, you can scroll through the Autopilot window and gather all the TPM and NOPM data to determine performance at different user loads.

     

Analyzing Performance

  1. 1.
    Go back to the Azure portal and look at the Performance overview for the tpcc database. You should see tuning activity, top queries, and CPU consumption, as seen in Figure 17-22.
    ../images/336094_2_En_17_Chapter/336094_2_En_17_Fig22_HTML.jpg
    Figure 17-22

    Performance overview for Azure SQL Database

     
Note

You may have to run the HammerDB load test several times before Automatic Tuning has enough information to start implementing a tuning activity, so you may not see any Performance recommendations either. If that is the case, re-run Autopilot in HammerDB to generate additional load history.

  1. 2.
    Click Performance recommendation. If there is enough information for Automatic Tuning to initiate an action, it is shown in this pane. Figure 17-23 shows a Create Index action resulting from Automatic tuning. The index is created, executed, and then validated. Refresh the page to see the different stages of the automatic tuning.
    ../images/336094_2_En_17_Chapter/336094_2_En_17_Fig23_HTML.jpg
    Figure 17-23

    Automatic Tuning in action (enlarged view)

     
  1. 3.
    Explore all the performance details by clicking Query Performance Insight. You find interesting reports like longest-running queries and queries taking up the most resources. You can also create custom queries. You can click the reports and drill down to the details to get more granular information on the performance of the queries, as seen in Figure 17-24.
    ../images/336094_2_En_17_Chapter/336094_2_En_17_Fig24_HTML.jpg
    Figure 17-24

    Drilling down to look at the performance at the query level

     

Next Steps: Self-Guided Assignment

Before we leave this exercise, create a table in the tpcc database that we will use in Chapter 19.

Launch SQL Server Management Studio (SSMS) and connect to the Azure SQL Database. Create this table in the tpcc database using the following SQL statement.
CREATE TABLE Users (
    PersonID int,
    LastName varchar(255),
    FirstName varchar(255),
    email varchar(255),
    ip varchar(255),
    gender varchar(255),
    comments varchar(max) null
);
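
As a quick, optional check that the table is in place (and a preview of how an application might write to it), the following C# sketch inserts one sample row with a parameterized query. The connection string and sample values are placeholders; it assumes the Microsoft.Data.SqlClient NuGet package, as in the earlier connection example.

using System;
using Microsoft.Data.SqlClient; // NuGet package: Microsoft.Data.SqlClient

// Sketch only: insert one sample row into the Users table created above.
const string connectionString = "<your-tpcc-connection-string>";
const string insertSql =
    "INSERT INTO Users (PersonID, LastName, FirstName, email, ip, gender, comments) " +
    "VALUES (@id, @last, @first, @mail, @ip, @gender, @comments);";

using var connection = new SqlConnection(connectionString);
connection.Open();

using var command = new SqlCommand(insertSql, connection);
command.Parameters.AddWithValue("@id", 1);
command.Parameters.AddWithValue("@last", "Doe");
command.Parameters.AddWithValue("@first", "Jane");
command.Parameters.AddWithValue("@mail", "jane.doe@example.com");
command.Parameters.AddWithValue("@ip", "203.0.113.10");
command.Parameters.AddWithValue("@gender", "female");
command.Parameters.AddWithValue("@comments", DBNull.Value); // nullable column
command.ExecuteNonQuery();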

This concludes our exploration of Azure SQL Database. For a detailed discussion and a much deeper dive on Azure SQL Database, a forked copy of Microsoft’s extensive documentation on Azure SQL Database is on our GitHub repo at https://github.com/harris-soh-copeland-puca/azure-docs/tree/master/articles/sql-database.

Azure Cosmos DB

For the longest time, Microsoft was associated only with SQL Server. With Azure, Microsoft has not only embraced other database engines, including open source engines, it has also recognized the need for modern databases that are better suited to handling semi-structured and unstructured data. Therefore, in 2017, Microsoft announced the release of Azure Cosmos DB.

Azure Cosmos DB is Microsoft’s proprietary database service in the cloud that is
  • Globally distributed

  • Multimodel

  • Schema agnostic (which is why it is a better fit for semi-structured and unstructured data)

  • Horizontally scalable

  • Extremely low latency (millisecond response times) backed by high-performance solid-state drives (SSDs)

  • Configurable database consistency levels

Cosmos DB can generally be classified as a NoSQL database at scale.

Use Cases for Azure Cosmos DB

Now that you know the key characteristics of Azure Cosmos DB, we can explore the use cases that make Azure Cosmos DB the preferred data source. The commonly shared requirements among the use cases are the need for low latency, data availability on a global footprint, and data that does not adhere to a strict schema.

Internet of Things (IoT)

IoT devices are usually deployed for real-time monitoring and take the form of sensors scattered across various locations. The kind of data generated by IoT devices is often referred to as streaming data and comes in high-frequency bursts. A low-latency database that supports real-time analytics is a key requirement for IoT deployments. An IoT solution generally uses Cosmos DB front-ended by Azure IoT Hub or Azure Event Hubs. The IoT hub or event hub can buffer incoming data streams at a rate of millions of events per second before funneling the data to Azure Cosmos DB for ad hoc querying.

Retail

Referring to our discussion at the beginning of this chapter, one of the data trends is the increase in semi-structured data types. The rise and expansion of eCommerce in the retail space is one area that is driving this data trend. Like the backpack product example, attributes of products that retailers carry change frequently over time. Being schema-agnostic makes Cosmos DB a great option for a product database.

If a retailer has a worldwide footprint, which is pretty much any retailer that sells on the Internet, then a globally accessible database is desired for performance purposes. In this use case, the ability to configure a strong consistency level that ensures site visitors always read the latest globally committed write may be important, so the latest inventory level or order is always what is read.
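
As an illustration of the schema-agnostic product catalog idea (this is not part of this chapter's hands-on exercise), the following sketch uses the Microsoft.Azure.Cosmos .NET SDK to create a container partitioned by category and write a product document. The endpoint, key, names, and product attributes are all hypothetical placeholders.

using System;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos; // NuGet package: Microsoft.Azure.Cosmos

class ProductCatalogSketch
{
    static async Task Main()
    {
        // Placeholders: use the URI and primary key from your Cosmos DB account.
        using var client = new CosmosClient("<account-endpoint-uri>", "<primary-key>");

        // Create (or reuse) a database and a container partitioned by category.
        Database database = await client.CreateDatabaseIfNotExistsAsync("retail");
        Container products = await database.CreateContainerIfNotExistsAsync("products", "/category");

        // Schema-agnostic: this document carries attributes other products may not have.
        var smartBackpack = new
        {
            id = "BP-2045",
            category = "backpacks",
            color = "gray",
            straps = 1,
            usbChargingPorts = 2,
            tsaFriendly = true
        };

        await products.CreateItemAsync(smartBackpack, new PartitionKey("backpacks"));
        Console.WriteLine("Product document written.");
    }
}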

Web and Social

Social media and websites are generally global services, so naturally the globally distributed nature of Azure Cosmos DB makes it an ideal choice. Furthermore, data representing social interactions and relationships is better represented as a graph with vertices and edges. Azure Cosmos DB, being a multimodel database, can be deployed with the Gremlin API so the data can be modeled as a graph.

Hands-on: Deploying Azure Cosmos DB

The comprehensive documentation for Azure Cosmos DB is at https://docs.microsoft.com/en-us/azure/cosmos-db/.
  1. 1.

    In the Azure portal, click Create a resource.

     
  2. 2.

    Type Cosmos in the search box and select Azure Cosmos DB in the search results, and then click Create.

     
  3. 3.

    Select the subscription that will house the Cosmos DB account, as referenced in Figure 17-25, and select an existing resource group or create a new one for this Cosmos DB deployment.

     
  4. 4.

    Provide an account name, which must be globally unique. For this exercise, we named our Cosmos DB bot-conversations-cosmos.

     
  5. 5.
    Click the API drop-down menu and review the options. Select Core (SQL).
    ../images/336094_2_En_17_Chapter/336094_2_En_17_Fig25_HTML.jpg
    Figure 17-25

    Deploying Azure Cosmos DB with MongoDB API

     
  1. 6.

    Turn off the Notebooks option and select a location closest to you.

     
  2. 7.

    For account type, select Non-Production. This setting only customizes the user interface in the portal by immediately enabling Geo-Redundancy and Multi-region Writes. It does not affect the SLA or the cost of the service instance, aside from the costs that come with Geo-Redundancy and Multi-region Writes being enabled.

     
  3. 8.

    Select the latest version.

     
  4. 9.

    Leave Availability Zones disabled, and then click Review + create.

     
  5. 10.

    Click Create.

     
  6. 11.

    Once the database is deployed, click Go to resource.

     
You have now deployed an instance of Azure Cosmos DB. Before populating it with data, let’s explore some of the database features mentioned earlier.
  1. 1.
    Select the Replicate data globally menu option located under Settings. You should see a check mark on a map indicating where your instance of Azure Cosmos DB was deployed, as seen in Figure 17-26. It should be the location you selected in the previous exercise.
    ../images/336094_2_En_17_Chapter/336094_2_En_17_Fig26_HTML.jpg
    Figure 17-26

    Azure Cosmos DB global geo-replication

     
  1. 2.

    Other locations where you can replicate Azure Cosmos DB to are indicated by the pins on the map. Hover over a pin to identify the location.

     
  2. 3.

    Click a pin to select a location you wish to replicate Azure Cosmos DB in. For this exercise, we selected the Southeast Asia region.

     
  3. 4.

    Once you have selected a region to replicate the database in, by default, the new region is read-only. To enable read and write to the region, click Enable for Multi-region writes.

     
  4. 5.

    Click Save. Azure Cosmos DB starts replicating the database to the region that you have selected. In our exercise, the database is being replicated in Southeast Asia.

     
  5. 6.

    To disable geo-replication, click the newly replicated region to deselect it, and then click Save again.

     

You have seen how easy it is to replicate to multiple locations throughout the world. Let's now use Azure Cosmos DB to store data.

Hands-on: Using Azure Cosmos DB to Store Bot Conversation History

Remember Joe the Bot from Chapter 15? We are going to add more capabilities to Joe with the help of Cosmos DB. What we do in this exercise is store conversation history. So, what are the benefits of storing bot conversation history?

Conversation history is a good source for analytics. For example, we can send conversation history to Azure Cognitive Services for sentiment analysis to determine how satisfied customers are with their engagement with Joe. We can also mine conversation history to determine the top issues being asked about. From a more traditional standpoint, keeping conversation history may be a regulatory requirement or may be helpful in audits. There are many good reasons why conversation history should be stored.

Bot conversation history is stored in JSON format, and storing it is a built-in capability of the Bot Framework, known as the Bot Framework State Service. Conversations are stored as transactions in a memory object, which we write to storage at the end of the session, in this case to Cosmos DB.
  1. 1.

    Launch Visual Studio and open the project for Joe the Bot from Chapter 15. If you would rather pull the completed project from GitHub, go to https://github.com/harris-soh-copeland-puca/JoeTheBot and clone the TrackConversation feature branch.

     
  2. 2.

    Add the Microsoft.Bot.Builder.Azure NuGet package.

     
  3. 3.

    In Solution Explorer, right-click the folder named Bots. Click Add and select Class .

     
  4. 4.
    Name the class TrackConversations.cs and add the following code to the TrackConversation class.
        public class TrackConversation
        {
            // Tracking what users have said to the bot
            public List<string> ConversationList { get; } = new List<string>();
            // The number of conversational turns that have occurred
            public int TurnNumber { get; set; } = 0;
            // Create concurrency control where this is used.
            public string ETag { get; set; } = "*";
        }
     
  5. 5.
    Open the QnABot.cs file and add the following headers to the top of the file.
    using Microsoft.Bot.Builder.Azure;
    using System.Linq;
     
  6. 6.
    In the QnABot class, add the following code containing the values needed to connect to Cosmos DB, as seen in Figure 17-27. You can get the Cosmos DB information from the Azure portal, as seen in Figure 17-28.
    public class QnABot<T> : ActivityHandler where T : Microsoft.Bot.Builder.Dialogs.Dialog
        {
            protected readonly BotState ConversationState;
            protected readonly Microsoft.Bot.Builder.Dialogs.Dialog Dialog;
            protected readonly BotState UserState;
            //Cosmos DB connection properties
            private const string CosmosServiceEndpoint = "<CosmosDB_URI>";
            private const string CosmosDBKey = "<CosmosDBKey>";
            private const string CosmosDBDatabaseName = "<CosmosDBName>";
            private const string CosmosDBCollectionName = "bot-storage";
            // Create Cosmos DB  Storage.
            private static readonly CosmosDbStorage query = new CosmosDbStorage(new CosmosDbStorageOptions
            {
                AuthKey = CosmosDBKey,
                CollectionId = CosmosDBCollectionName,
                CosmosDBEndpoint = new Uri(CosmosServiceEndpoint),
                DatabaseId = CosmosDBDatabaseName,
            });
            //End Cosmos DB connection properties
    ../images/336094_2_En_17_Chapter/336094_2_En_17_Fig27_HTML.jpg
    Figure 17-27

    Insert Cosmos DB connection properties in the project

    ../images/336094_2_En_17_Chapter/336094_2_En_17_Fig28_HTML.jpg
    Figure 17-28

    Keys for Cosmos DB from the Azure portal

     
  1. 7.
    In QnABot.cs, modify the OnMessageActivityAsync method so it looks like this.
    protected override async Task OnMessageActivityAsync(ITurnContext<IMessageActivity> turnContext, CancellationToken cancellationToken)
            {    // Run the Dialog with the new message Activity.
                await Dialog.RunAsync(turnContext, ConversationState.CreateProperty<DialogState>(nameof(DialogState)), cancellationToken);
                {
                    // preserve user input.
                    var conversation = turnContext.Activity.Text;
                    // make empty local logitems list.
                    TrackConversation logItems = null;
                    // see if there are previous messages saved in storage.
                    try
                    {
                        string[] conversationList = { "ConversationLog" };
                        logItems = query.ReadAsync<TrackConversation>(conversationList).Result?.FirstOrDefault().Value;
                    }
                    catch
                    {
                        // Inform the user an error occurred.
                        await turnContext.SendActivityAsync("Sorry, something went wrong reading your stored messages!");
                    }
                    // If no stored messages were found, create and store a new entry.
                    if (logItems is null)
                    {
                        // add the current utterance to a new object.
                        logItems = new TrackConversation();
                        logItems.ConversationList.Add(conversation);
                        // set initial turn counter to 1.
                        logItems.TurnNumber++;
                        // Create Dictionary object to hold received user messages.
                        var changes = new Dictionary<string, object>();
                        {
                            changes.Add("ConversationLog", logItems);
                        }
                        try
                        {
                            // Save the user message to your Storage.
                            await query.WriteAsync(changes, cancellationToken);
                        }
                        catch
                        {
                            // Inform the user an error occurred.
                            await turnContext.SendActivityAsync("Sorry, something went wrong storing your message!");
                        }
                    }
                    // Else, our Storage already contained saved user messages, add new one to the list.
                    else
                    {
                        // add new message to list of messages to display.
                        logItems.ConversationList.Add(conversation);
                        // increment turn counter.
                        logItems.TurnNumber++;
                        // Create Dictionary object to hold new list of messages.
                        var changes = new Dictionary<string, object>();
                        {
                            changes.Add("ConversationLog", logItems);
                        };
                        try
                        {
                            // Save new list to your Storage.
                            await query.WriteAsync(changes, cancellationToken);
                        }
                        catch
                        {
                            // Inform the user an error occurred.
                            await turnContext.SendActivityAsync("Sorry, something went wrong storing your message!");
                        }
                    }
                }
            }
     
  2. 8.

    Debug the bot and test it with the Bot Emulator by carrying out a conversation, as you did in Chapter 15.

     
  3. 9.

    After a few lines of conversation, end your session and go to the Azure portal.

     
  4. 10.

    Go to the Cosmos DB resource and click Data Explorer.

     
  5. 11.

    Expand the database, followed by the container, and finally, Items.

     
  6. 12.
    Click ConversationLog in the Items list. You should see the conversation that you carried out with Joe the Bot, as seen in Figure 17-29.
    ../images/336094_2_En_17_Chapter/336094_2_En_17_Fig29_HTML.jpg
    Figure 17-29

    Viewing the tracked conversation in Cosmos DB

     

This concludes our discussion on and hands-on implementation of Azure Cosmos DB. Documentation on Azure Cosmos DB is on GitHub at https://github.com/harris-soh-copeland-puca/azure-docs/tree/master/articles/cosmos-db.

Summary

In this chapter, we only had time to cover Azure SQL Database and Azure Cosmos DB. We selected these two databases because they are the most prominent deployments in many of the projects we encounter today. However, keep in mind that Azure data services are very extensive and provide many other databases, including open source options. We also omitted Azure Synapse from this chapter because it is not possible to cover it in the limited space available; data warehousing and analytics is a specialized topic that is better served by other resources, but in Chapter 16, we introduced advanced analytics using machine learning, deep learning, and Azure Databricks. In Chapter 19, you are introduced to data engineering, where we take a deeper look at data pipelines, data manipulation, and the concepts of a modern data warehouse, and where we spend more time discussing Azure Synapse.

Now that you know about the different data services in Azure, especially Azure SQL Database, in the next chapter, let’s talk about strategies, tools, and best practices for migrating on-premises databases to Azure.
