© Pranab Mazumdar, Sourabh Agarwal, Amit Banerjee 2016

Pranab Mazumdar, Sourabh Agarwal and Amit Banerjee, Pro SQL Server on Microsoft Azure, DOI 10.1007/978-1-4842-2083-2_3

3. Microsoft Azure Storage

Pranab Mazumdar, Sourabh Agarwal1 and Amit Banerjee1

(1)Bangalore, Karnataka, India

Microsoft Azure Storage is a cloud storage system that gives customers the flexibility to store huge amounts of data for any duration. A unique aspect of the data stored here is that you can access it from anywhere, at any time. It is also a utility-based storage system (i.e., you pay for what you use and what you store). In Microsoft Azure Storage, data is durable thanks to local replication and to geographic replication that enables disaster recovery. The storage consists of blobs (user files), tables (structured storage), and queues (messaging). The data is highly durable, available, and massively scalable, and it is exposed through REST APIs as well as client libraries for .NET, Java, Node.js, Python, PHP, and Ruby.

Azure Storage Service

There are several types of Microsoft Azure Storage services (see Figure 3-1):

  • Blob storage service

  • Table storage

  • Queue storage

  • File storage

Figure 3-1. Microsoft Azure Data Storage concepts and REST protocols to access blobs, tables, and queues

Blob Storage

A blob is essentially a file in the cloud. It can contain text or binary data, such as a document, a media file, or an application installer. In short, blob storage is a simple interface to store and retrieve files in the cloud (see Figure 3-1). The following list provides some common uses of blobs:

  • Data sharing. Customers can share documents, pictures, videos, and music.

  • Big Data insights. Customers can store lots of raw data in the cloud and run MapReduce jobs against it to gain insights.

  • Backups. Many customers store backups of their on-premises data in the cloud.

The blob storage service contains the following key concepts:

  • Storage accounts. Microsoft Azure provides storage in different locations around the world. You need to create a storage account to access the storage services and host your data. Once you create your storage account, you can create containers and store blobs in them, create tables and put entities into those tables, and create queues and store messages in those queues. (A short PowerShell sketch after this list illustrates these steps.)

  • Containers. A container provides a grouping for a set of blobs. A storage account can contain more than one container, and each container can contain multiple blobs.

  • Blob. A blob can contain text data or binary data, such as a document, a media file, or an application installer. Microsoft Azure Storage offers three types of blobs:

    • Block blob. As the name suggests, a block blob comprises blocks, each of which is identified by a block ID. You create or modify a block blob by writing a set of blocks and committing them by their block IDs. Each block can be a different size, up to a maximum of 4MB, and a single block blob can contain up to 50,000 blocks, giving a maximum block blob size of 200GB. A block blob no larger than 64MB can be uploaded in a single write operation; if an upload exceeds the single-blob size configured in the storage client, it is broken into smaller chunks (blocks).

    • Page blob. This is a collection of 512-byte pages optimized for random read and write operations. You need to initialize the page blob and set the maximum size to which it can grow. To add or modify the contents of a page blob, you write to an offset and range that align to 512-byte page boundaries. Writes are committed immediately. The maximum size of a page blob is 1TB.

    • Append blob. This is a block-based blob designed specifically for append operations; it does not expose block IDs. When an append blob is modified, blocks are added at the end by the Append Block operation. You cannot update or delete existing blocks. Each block can be a different size, up to a maximum of 4MB, and an append blob can include up to 50,000 blocks.
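To make these concepts concrete, here is a minimal PowerShell sketch using the classic Azure.Storage cmdlets available at the time of writing. It builds a storage context, creates a container, and uploads a local file as a block blob; the account name, key, and file path are placeholders rather than values from this chapter.

# A minimal sketch, assuming the Azure.Storage PowerShell module; all names and the key are placeholders
$ctx = New-AzureStorageContext -StorageAccountName "mystorageaccount" -StorageAccountKey "<account-key>"

# Create a container to group related blobs
New-AzureStorageContainer -Name "documents" -Context $ctx

# Upload a local file into the container as a block blob
Set-AzureStorageBlobContent -File "C:\data\report.docx" -Container "documents" -Blob "report.docx" -BlobType Block -Context $ctx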

Table Storage

Azure table storage stores large amounts of structured, non-relational data. Figure 3-2 shows the following key components of table storage:

  • Storage account. As discussed earlier, all access to Azure storage is done through the storage accounts.

  • Table. A collection of entities; entities in the same table can have different sets of properties.

  • Entity. Like a row in a database, an entity is a set of properties and can be up to 1MB in size.

  • Properties. A name-value pair. An entity can contain up to 252 custom properties to store data, in addition to the system properties (a sample entity follows this list).
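The JSON payload below is a hypothetical illustration of how a single entity might look when transferred by the table service's REST API in JSON format. PartitionKey, RowKey, and Timestamp are the system properties; the remaining property names are made up for illustration.

{
  "PartitionKey": "Customer",
  "RowKey": "0001",
  "Timestamp": "2016-05-01T10:15:00Z",
  "Name": "Contoso Ltd.",
  "City": "Bangalore",
  "CreditLimit": 50000
}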

Figure 3-2. Windows table storage components

Queue Storage

Queue storage is used to store large numbers of messages that can be accessed using HTTP or HTTPS. It is used to process data asynchronously and helps pass messages between an Azure web role and an Azure worker role.

Queues provide reliable messaging for your application. They can be used to hand off asynchronous tasks, such as a task that a web role sends to a worker role for background processing. This allows the web and worker roles to scale independently (see Figure 3-3).
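The following is a minimal PowerShell sketch of this queue workflow, again using the classic Azure.Storage cmdlets; the queue name, message text, and storage context values are placeholders. The message itself is manipulated through the underlying .NET storage client types exposed by the queue object.

# A minimal sketch; reuse a context built with New-AzureStorageContext
$ctx = New-AzureStorageContext -StorageAccountName "mystorageaccount" -StorageAccountKey "<account-key>"

# Create the queue that the web role writes to and the worker role reads from
$queue = New-AzureStorageQueue -Name "orders" -Context $ctx

# Enqueue a message
$msg = New-Object Microsoft.WindowsAzure.Storage.Queue.CloudQueueMessage("process order 42")
$queue.CloudQueue.AddMessage($msg)

# A worker role would later dequeue and then delete the message
$received = $queue.CloudQueue.GetMessage()
$queue.CloudQueue.DeleteMessage($received)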

Figure 3-3. Windows queue storage concepts

File Storage

Many of us still rely on legacy applications and couldn't function without them, and many of those applications depend on SMB file shares. With the Microsoft Azure file storage service, you get cloud-based SMB file shares, which helps if you decide to migrate legacy applications that rely on file shares to Azure. An on-premises application can also access data in a file share using the file storage REST API. Common file storage uses include the following:

  • Migrating on-premises applications that depend on file shares.

  • Storing shared application files such as config files.

  • Storing diagnostic data such as logs.

  • Storing tools and utilities.

As Figure 3-4 shows, the file service can be used to store diagnostic logs and application config files. You can create directories within a share and organize your files as needed.
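A minimal PowerShell sketch of that layout follows, using the classic Azure.Storage cmdlets; the share, directory, and file names are placeholders.

# A minimal sketch; all names are placeholders
$ctx = New-AzureStorageContext -StorageAccountName "mystorageaccount" -StorageAccountKey "<account-key>"

# Create an SMB file share and a directory for diagnostic logs
$share = New-AzureStorageShare -Name "appfiles" -Context $ctx
New-AzureStorageDirectory -Share $share -Path "logs"

# Upload a config file to the share root and a log file into the logs directory
Set-AzureStorageFileContent -Share $share -Source "C:\app\web.config" -Path "web.config"
Set-AzureStorageFileContent -Share $share -Source "C:\app\logs\trace.log" -Path "logs/trace.log"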

Figure 3-4. File storage concepts

Design Decisions

Microsoft Azure Storage was designed around key requirements that emerged from customer engagements:

  • Strong consistency. For enterprise customers, consistency is very important when moving their line-of-business applications to the cloud. It is extremely important for them to perform conditional reads, writes, and deletes for optimistic concurrency control. Microsoft Azure Storage therefore provides all three properties described by the CAP theorem: strong consistency, high availability, and partition tolerance.

  • Global and scalable namespace. With Microsoft Azure Storage, you can store massive amounts of data and access it consistently from anywhere across the globe. Microsoft Azure leverages DNS as part of the namespace and breaks the namespace into three parts: a storage account name, a partition name, and an object name (http(s)://AccountName.<Service>.core.windows.net/PartitionName/ObjectName). A worked example follows this list.

  • Disaster recovery. With Microsoft Azure Storage, the data is stored across multiple data centers that are globally dispersed. This is to ensure that customers’ data is protected at any cost against natural disasters like earthquakes, fires, storms, etc.

  • Scalability. The storage needs to be scalable; it should be capable of scaling automatically and being load balanced based on peak traffic demand.

  • Multi-tenancy. Many customers, depending on their need, could be served from the same shared storage, thus reducing the cost of storage.
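To make the namespace breakdown concrete, here is an illustrative example with a hypothetical account named mystorageaccount:

https://mystorageaccount.blob.core.windows.net/photos/vacation.jpg

AccountName   = mystorageaccount (locates the storage cluster that holds the account's data)
PartitionName = photos/vacation.jpg (for blobs, the full blob name acts as the partition name)
ObjectName    = used when a partition holds many objects; for tables it corresponds to an entity's row key, and for queues to an individual message within the queue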

The following are important characteristics of Premium Storage:

  • Durability. Premium Storage is built on Locally Redundant Storage (LRS) technology, which stores replicas of the data within the same region. This ensures durability of data for enterprise workloads. Writes are confirmed back to the application only when they have been durably replicated by the LRS system.

  • New "DS" series VMs. The new DS series of Virtual Machines supports Premium Storage data disks. You can leverage a new sophisticated caching capability that enables extremely low latency for read operations.

  • Linux support. With Linux Integration Services 4.0, Microsoft has enabled support for even more Linux flavors. Several distributions have been validated with Microsoft Azure Premium Storage, such as Ubuntu (versions 12.04, 14.04, 14.10, and 15.04), SUSE 12, etc.

Azure Storage Architecture Internals

A storage stamp is a cluster of racks of storage nodes on the Microsoft Azure Fabric, where each rack is a separate fault domain with redundant networking and power (see Figure 3-5).

Figure 3-5. Storage stamps architecture

All reads and writes go to these clusters of storage. The target is to keep each storage stamp at about 70% utilization in terms of capacity, transactions, and bandwidth; the remaining headroom yields better seek times and higher throughput and provides resiliency against rack failures within a stamp.

The Location Service manages all storage stamps. It handles account allocation and account load balancing, and it manages geo-replication across these stamps. The Location Service itself is distributed across two geographic locations for its own disaster recovery.

Replication Engine

There are two replication engines: intra-stamp replication (stream layer) and inter-stamp replication (partition layer).

Intra-stamp replication provides synchronous replication and ensures that all data is durable within the stamp. It keeps enough copies of the data across different fault domains to survive disk, node, or rack failures. This is done by the stream layer and is on the critical path of the customer's write request. A success is returned to the client only once the data has been replicated within the stamp.

Inter-stamp replication provides asynchronous replication by lazily replicating data across stamps in the background. This is object-level replication, where either whole objects or the recent changes to them are replicated. It is used to keep data in two locations for disaster recovery and to migrate an account's data between stamps. It provides geo-redundancy against geographic disasters while making efficient use of network bandwidth across the stamps.

Layers Within a Storage Stamp

There are three layers within a storage stamp (see Figure 3-6):

  • Stream layer/Distributed File System (DFS) layer. This is the lowest layer and is responsible for handling the disks and storing data on them. That is, it stores the bits on disk and replicates the data across many servers to keep it durable within a storage stamp. Think of this as a distributed file system layer within a storage stamp. Data is stored in files called extents, which are replicated three times within the stamp across upgrade domains (UD) and fault domains (FD). This is an append-only filesystem, so data is never overwritten; data gets appended to the end of an extent.

  • Partition layer. This layer understands the data abstractions: it knows what a blob is, what a table entity is, what a message is, and how to perform transactions against those objects. It thus ensures transaction ordering and optimistic concurrency, stores object data on top of the stream layer, and caches object data to reduce disk I/O. This layer is also responsible for partitioning all objects within a stamp and for maintaining a massively scalable index over all blobs, table entities, and queue messages. A partition master takes this index and breaks it into range partitions based on the load on the index. The objects are broken into disjoint ranges based on partition name values and served from different partition servers (i.e., the master manages which partition server serves which partition range for blobs, tables, and queues). This is done to load balance the TPS traffic against the big index.

  • Front-end layer. Provides the REST protocol for blobs, tables, and queues. It is used for authentication and authorization, and for logging and gathering metrics.

Figure 3-6. Dynamic load balancing in the partition layer

You may wonder how this structured storage is provided on top of an append-only filesystem, i.e., how blobs and tables can be updated when the filesystem only allows appends.

At a high level, the partition layer treats the data abstractions as logs (streams). Any update to the data is appended to the log, which means it is appended to the last extent of the log; think of the log as a linked list of extents. The append happens only to the last extent; all prior extents are sealed and are never appended to again. The data is then committed and success is returned to the client. In parallel, recent updates are kept in memory and, in the background and off the critical write path, are lazily checkpointed and merged into a B+ tree. Therefore, there are logs that are used to commit the updates, and checkpoints plus the B+ tree that are used to find the latest version of your data.

Maintaining Availability/Consistency for Read Requests

Since all replicas are maintained bitwise identical, the data can be read from any replica.

For read availability, if high latency is noticed while a read request is being processed, another read request is issued in parallel against a different replica. Whichever request returns first is used to serve the data to the client.

Load Balancing of Partition Layer

Load balancing attempts to balance the TPS, i.e., to load balance the index. In Figure 3-6, the master continuously monitors the load on the partition servers as well as on each partition. If it finds that one of the partitions is too hot, meaning that the partition is getting far too many requests to process, the master can quickly decide to take that range partition and split it into smaller range partitions (one of the ways to deal with such issues).

Once a partition is moved to a different partition server that is comparatively less loaded, the master updates the partition map. The front end can then forward requests for that specific range to the new server, which eventually balances the transactions per second. An important point to note is that no data moves (in the DFS layer) during this whole process; only the partition range index is updated so that the partition map knows which partition server to forward the requests to.

Load Balancing of the DFS Layer

For load balancing at the DFS layer, the load on the storage nodes is monitored and used to determine which replica the data will be read from. In parallel, there is a process that issues parallel reads in the background when a request exceeds the 95th-percentile latency.

For writing, the load is monitored on the nodes and on the replicas being appended to. When a node appears overloaded, the system seals the extent being written and starts appending to a new extent, placed on less loaded nodes, at the end of the log.

Load Balancing of DFS Capacity

The replicas are moved around to ensure that all nodes have disks with some free space. The DFS layer is an append-only system, which means it never writes anything in place; it is therefore important to have free space to which it can always append. This way, hot spots are avoided on storage nodes. In addition, if there is a bad node or disk, or if a rack is lost, the system must be able to re-replicate the affected extents quickly across the remaining nodes and disks. With available disk space, this can be done quickly.

Durability Offerings with Azure Storage

There are three types of durability offered with Azure Storage (a short PowerShell sketch after the list shows how each is selected when you create a storage account):

  • LRS (Locally Redundant Storage). Stores three replicas of the data within a single zone in a single region. All three replicas are in the same zone, providing durability despite node, rack, or disk failure.

  • ZRS (Zone Redundant Storage). Available for block blobs only. Stores three replicas of the data across multiple zones (facilities); these are usually kept in the same region, but that is not mandatory, as the replicas can also be placed in a different region. It provides durability beyond LRS because the data remains durable against zone-related failures (such as a fire in a facility in a zone).

  • GRS (Geo-Redundant Storage). Stores six replicas: three in the primary region and three in a geographically dispersed secondary region. This provides additional durability to protect data against major regional catastrophic disasters such as storms, tornados, earthquakes, or hurricanes.
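As referenced above, a rough sketch of selecting these options with the classic (Service Management) New-AzureStorageAccount cmdlet follows; the account names are placeholders, and the -Type value chooses the durability offering.

# A minimal sketch; account names are placeholders
New-AzureStorageAccount -StorageAccountName "mylrsaccount" -Location "East US" -Type "Standard_LRS"
New-AzureStorageAccount -StorageAccountName "myzrsaccount" -Location "East US" -Type "Standard_ZRS"
New-AzureStorageAccount -StorageAccountName "mygrsaccount" -Location "East US" -Type "Standard_GRS"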

Azure Premium Storage

Microsoft introduced Azure Premium Storage to deliver high-performance, low-latency disk support for virtual machines running I/O-intensive workloads. It uses solid-state drives (SSDs) to store the data. If your workload needs high throughput and you want to take advantage of the speed and performance of these disks, Premium Storage should be your choice.

For Azure Virtual Machine workloads that need consistent high IO performance and low latency, Premium Storage is appropriate. In order to host IO intensive workloads like OLTP, Big Data, and Data Warehousing on platforms like SQL Server, MongoDB, Cassandra, and others, Premium Storage is a good choice.

With Premium Storage, your application can store up to 64TB of data per VM, achieve up to 80,000 IOPS (input/output operations per second) per VM, and reach 2,000MB/s of disk throughput per VM.

There are quite a few points you need to keep in mind in order to use Premium Storage :

  • You need a Premium Storage account. Premium Storage accounts can be created using the storage REST APIs version 2014-02-14 or later (Storage and Service Management), Azure PowerShell version 0.8.10 or later, or the Azure portal ( https://portal.azure.com ). When you create a Premium Storage account using PowerShell, you need to specify the type parameter as Premium_LRS. For example: New-AzureStorageAccount -StorageAccountName "testpremiumaccount" -Location "East US" -Type "Premium_LRS".

  • Not all regions support Premium Storage. Currently, the Central US, East and West US, Northern and Western Europe, East and West Japan, Southeast Asia, and Eastern Australia support it.

  • It only supports Azure page blobs, as it is used to hold persistent disks that can be used for Azure Virtual Machines.

  • It is locally redundant and keeps three copies of data in same region.

  • In order to use Premium Storage, you need to provision either GS-series or DS-series of VMs.

  • For all Premium data disks, the default disk caching policy is read-only and Premium operating system disks attached to the VM are set to read-write.

  • For Premium Storage accounts, the IOPS depend on the size of the disk. At present there are three types of premium disks: P10, P20, and P30. See Table 3-1 for IOPS and throughput specifications.

    Table 3-1. Premium Disk Storage Limits

    Disk Type               P10        P20        P30
    Disk size               128GB      512GB      1TB
    IOPS (per disk)         500        2,300      5,000
    Throughput (per disk)   100MB/s    150MB/s    200MB/s

Using PowerShell, you can create a VM that uses Premium Storage. The following code snippets/cmdlets can be used to try this out; note that they are sample cmdlets and should be adapted to your environment. There are MSDN links that you can refer to as well; see https://msdn.microsoft.com/en-us/library/mt607148.aspx .

Creating an Azure VM Using Premium Storage and PowerShell

First, create the Premium Storage account:

C:> New-AzureRmStorageAccount -ResourceGroupName "TestResourceGroup" -Name "teststorageaccount" -Location "East US" -Type "Premium_LRS"

The following code uses only ARM (Azure Resource Manager) cmdlets; it can be used to create the VM:

# Set values for existing resource group and storage account names
$rgName="RGServers"
$locName="East US"
$saName="Testserverssa"


# Set the existing virtual network and subnet index
$vnetName="XYZ"
$subnetIndex=0
$vnet=Get-AzureRmVirtualNetwork -Name $vnetName -ResourceGroupName $rgName


# Create the NIC
$nicName="Test-NIC"
$domName="TestDom"
$pip=New-AzureRmPublicIpAddress -Name $nicName -ResourceGroupName $rgName -DomainNameLabel $domName -Location $locName -AllocationMethod Dynamic
$nic=New-AzureRmNetworkInterface -Name $nicName -ResourceGroupName $rgName -Location $locName -SubnetId $vnet.Subnets[$subnetIndex].Id -PublicIpAddressId $pip.Id


# Specify the name, size, and existing availability set
$vmName="TestVM"
$vmSize="Standard_A3"
$avName="Test_AS"
$avSet=Get-AzureRmAvailabilitySet –Name $avName –ResourceGroupName $rgName
$vm=New-AzureRmVMConfig -VMName $vmName -VMSize $vmSize -AvailabilitySetId $avset.Id


# Add 200GB data disk
$diskSize=200
$diskLabel="TestStorage"
$diskName="Test-DISK01"
$storageAcc=Get-AzureRmStorageAccount -ResourceGroupName $rgName -Name $saName
$vhdURI=$storageAcc.PrimaryEndpoints.Blob.ToString() + "vhds/" + $vmName + $diskName  + ".vhd"
Add-AzureRmVMDataDisk -VM $vm -Name $diskLabel -DiskSizeInGB $diskSize -VhdUri $vhdURI -CreateOption empty


# Specify the image and local administrator account, and then add the NIC
$pubName="MicrosoftWindowsServer"
$offerName="WindowsServer"
$skuName="2012-R2-Datacenter"
$cred=Get-Credential -Message "Type the name and password of the local administrator account."
$vm=Set-AzureRmVMOperatingSystem -VM $vm -Windows -ComputerName $vmName -Credential $cred -ProvisionVMAgent -EnableAutoUpdate
$vm=Set-AzureRmVMSourceImage -VM $vm -PublisherName $pubName -Offer $offerName -Skus $skuName -Version "latest"
$vm=Add-AzureRmVMNetworkInterface -VM $vm -Id $nic.Id


# Specify the OS disk name and create the VM
$diskName="OSDisk"
$storageAcc=Get-AzureRmStorageAccount -ResourceGroupName $rgName -Name $saName
$osDiskUri=$storageAcc.PrimaryEndpoints.Blob.ToString() + "vhds/" + $vmName + $diskName  + ".vhd"
$vm=Set-AzureRmVMOSDisk -VM $vm -Name $diskName -VhdUri $osDiskUri -CreateOption fromImage
New-AzureRmVM -ResourceGroupName $rgName -Location $locName -VM $vm
Reference: https://azure.microsoft.com/en-in/documentation/articles/virtual-machines-windows-create-powershell/

Inside Premium Storage

Premium Storage disks are implemented as page blobs in Azure Storage. Every uncached write operation is replicated to SSDs on three servers in different racks (fault domains). The data in Premium Storage is stored on SSD drives, thereby providing higher throughput; the disks used are different from the ones you get with Standard Storage. There is a component called the blob cache that runs on the servers hosting these VMs and takes advantage of the RAM and local SSDs of those servers to deliver higher throughput and lower latency. It is enabled for both premium and standard disks and can be configured for read-only or read-write caching. For read-only caching, data is written synchronously to the cache and to Azure Storage. For disks with write caching enabled, data is written back to Azure Storage when the VM requests it, either through a disk flush or by specifying the write-through flag on an I/O (forced unit access).

The VMs can take advantage of this caching architecture and can provide extremely high throughput and low latency, thereby boosting performance tremendously.
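As a further illustration of this caching behavior (a sketch building on the ARM script above; the account, container, and disk names are placeholders), the -Caching parameter of Add-AzureRmVMDataDisk selects the host caching policy when a Premium data disk is attached:

# Attach a Premium Storage data disk with read-only host caching (sketch; names are placeholders)
$diskUri = "https://mypremiumaccount.blob.core.windows.net/vhds/datadisk01.vhd"
Add-AzureRmVMDataDisk -VM $vm -Name "datadisk01" -VhdUri $diskUri -DiskSizeInGB 512 -Lun 0 -CreateOption Empty -Caching ReadOnly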

Azure Storage Best Practices

It is important for all developers to build their applications using best practices. Experience from customer engagements has taught us some key lessons about dealing with Azure Storage and getting the maximum performance out of it.

Performance Enhancement Using Blobs

Performance is important in any application, and applications built on blobs are no exception. This section looks at a few important points to keep in mind when you store files in blobs and read from and write to them.

A single blob can be read or written at up to a maximum of 60MB/s, and a single blob supports up to 500 requests per second. If you have multiple clients that need to read the same blob and you might exceed these limits, you should consider using a CDN to distribute the blob.

Try to match your read size with your write size and avoid reading small ranges from blobs with large blocks. The following properties are used to control read and write size: CloudBlockBlob.StreamMinimumReadSizeInBytes and StreamWriteSizeInBytes.
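As a rough sketch of how these properties can be set from PowerShell (they live on the underlying .NET blob object, which is reachable through the ICloudBlob property of the object returned by Get-AzureStorageBlob; the container and blob names are placeholders):

# Match write size to read size on a block blob reference (sketch; names are placeholders)
$blob = Get-AzureStorageBlob -Container "documents" -Blob "report.docx" -Context $ctx
$blob.ICloudBlob.StreamWriteSizeInBytes = 4MB
$blob.ICloudBlob.StreamMinimumReadSizeInBytes = 4MB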

You can upload the contents of a folder either concurrently or in parallel. Uploading concurrently means multiple workers upload different blobs. Uploading in parallel means multiple workers upload different blocks of the same blob.

Uploading multiple blobs concurrently executes faster than uploading multiple blocks of the same blob in parallel. This is because uploading multiple blocks of a single blob in parallel affects a single partition and will be limited by the partition performance targets, whereas uploading multiple blobs concurrently works against many different partitions and will more likely be limited by the virtual machine's bandwidth.

Copy/Move Blobs

You can use the storage REST APIs to copy blobs across storage accounts; a client application can instruct the storage service to copy a blob from any other storage account. The copy happens asynchronously and uses spare bandwidth, so there is no guarantee on when it will complete. If you need control over the timing, it is better to download the blob to a VM and then upload it to the destination; during the copy, make sure the VM is in the same region as the storage accounts.

AzCopy

AzCopy, a utility released by the Azure Storage team, can be used to transfer blobs to and from storage accounts. The transfer rates it achieves are very high, and it is recommended for bulk uploads, downloads, and other copy scenarios. (See https://azure.microsoft.com/en-us/documentation/articles/storage-use-azcopy/ for more information.)
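As an illustration only (the folder path, account, container, and key are placeholders), a typical AzCopy command line of that era to bulk-upload a local folder to a blob container looked like this:

AzCopy /Source:C:\data\backups /Dest:https://mystorageaccount.blob.core.windows.net/backups /DestKey:<account-key> /S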

Choosing the Right Kind of Blob

Azure Storage offers several types of blobs; for most workloads the choice comes down to page blobs versus block blobs (append blobs, covered earlier, target append-only scenarios). You need to choose the right kind of blob for your use case, as this can have a large impact on performance and scalability. Block blobs are appropriate when you want to upload large amounts of data, such as photos or video, to blob storage. Page blobs are appropriate if the application needs to perform random writes on the data.

Use the article at the following URL as a checklist when you're working with Microsoft Azure Storage. It is a comprehensive collection of important, documented practices and will help boost the performance of your application. See https://azure.microsoft.com/en-us/documentation/articles/storage-performance-checklist/ .

Performance Enhancement Using Tables

The following list provides a few tips for working with tables, another service that can be used effectively in many scenarios, as discussed in this chapter.

  • Scalability. The system usually performs load balancing as your traffic increases, but if your traffic has sudden bursts, you may not be able to get this volume of throughput immediately. If your pattern has bursts, you should expect to see throttling and/or timeouts during the burst as the storage service automatically load balances your table. The recommendation is to slowly ramp up, which gives the system time to load balance appropriately.

  • JSON (JavaScript Object Notation). A popular and concise format for REST protocols. OData supports both AtomPub and JSON; a drawback of Atom is its verbosity, which you do not need when working with the table service. The table service supports JSON in addition to the XML-based AtomPub format for transferring table data. Using JSON can reduce payload sizes by as much as 75% and can significantly improve the performance of your application.

  • Nagle off. Nagle's algorithm is widely implemented across TCP/IP networks to improve network performance, but it is not optimal for highly interactive systems that send many small requests. For Azure Storage, Nagle's algorithm has a negative impact on the performance of requests to the table and queue services, and you should disable it if possible (see the sketch after this list).

  • Tables and partitions. How you represent your data is very important, as it has a huge impact on the performance of the table service. Tables are divided into partitions. Every entity stored in a partition shares the same partition key and has a unique row key to identify it within that partition. The benefit is that you can update entities in the same partition in a single atomic batch transaction of up to 100 storage operations, and you can query data within a single partition more efficiently than data that spans partitions. Because a partition supports atomic batch transactions, access to the entities stored in a single partition cannot be load balanced across servers. As a developer, you should therefore use the following techniques:

    • Data that your client application frequently updates or queries should be in the same partition. This may be because your application is aggregating writes, or because you want to take advantage of atomic batch operations.

    • Data that your client application does not frequently update/query in the same atomic transaction should be in separate partitions.

    • Avoid hot partitions, which are partitions that receive far more traffic than the others. If your partitioning scheme results in a single partition whose data is used far more often than that of the other partitions, expect to see throttling as that partition approaches its scalability target. It is better to make sure that your partitioning scheme results in no single partition approaching the scalability target.
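As referenced in the Nagle item above, the following is a minimal PowerShell sketch of disabling Nagle's algorithm through the standard .NET ServicePointManager (it must run before the first request is sent; the account URI is a placeholder):

# Disable Nagle for all subsequent connections in this process (sketch)
[System.Net.ServicePointManager]::UseNagleAlgorithm = $false

# Or disable it only for the table endpoint of a specific storage account
$tableUri = New-Object System.Uri("https://mystorageaccount.table.core.windows.net")
$sp = [System.Net.ServicePointManager]::FindServicePoint($tableUri)
$sp.UseNagleAlgorithm = $false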

Once you store data in the Microsoft Azure Storage Services, it is important that you understand the best practices that can help you retrieve your data. The next section covers these important practices.

Querying Data Best Practices


As a best practice for querying data, a general rule of thumb is to avoid scans. If you do have to scan, organize the data so that unnecessary data scans are avoided. The main query types are listed below, and sample filter expressions follow the list.

  • Point queries. Try to use these types of queries as much as possible as they return only one entity by specifying the partition key and the row key.

  • Partition query. These queries are less performant than point queries and should be used carefully. They retrieve sets of data that share common partition keys and typically you specify a range of row key values or a range of values for some entity property in addition to a partition key.

  • Table queries. These queries retrieve a set of entities that do not share a common partition key. They are not efficient and should be avoided as much as possible.

  • Query density. An important factor in query efficiency is the number of entities returned relative to the number of entities scanned. A low query density can cause the table service to throttle your application, so it should be avoided. The following are some ways to avoid it:

    • Filter the data so that the query returns only the data your application will consume. The performance of your application will increase because of the smaller network payload and the smaller number of entities your application must process.

    • Use projection to limit the returned data set to the properties your client application needs.

    • Denormalize your data. Store the data the way your queries will consume it; this minimizes the number of entities that a query must scan to find the data the client needs, rather than having to scan large numbers of entities.
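To illustrate the difference between these query types, here are hypothetical OData filter expressions against a table whose partition key is a region name and whose row key is an order number (the property names are made up):

Point query (most efficient):
$filter=(PartitionKey eq 'Sales') and (RowKey eq '00042')

Partition query (one partition, a range of row keys):
$filter=(PartitionKey eq 'Sales') and (RowKey ge '00001') and (RowKey lt '00100')

Table query/scan (no partition key, least efficient):
$filter=(Amount gt 100)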

Summary

Microsoft Azure Storage services have everything an application developer needs to build a robust, scalable cloud-based solution. With their many useful features, they provide an excellent platform on which to host your application and store its data securely. You need not worry much about a DR strategy, as replication is largely taken care of for you. And with the Premium Storage offering, throughput increases and you get a tremendous boost in application performance.
