Architecture
This section offers a high-level description of Data Reduction Pools and deduplication architecture.
1.1 What Data Reduction Pools are
Data Reduction Pools (DRP) represent a significant enhancement to the storage pool concept. Until now, the virtualization layer has been essentially a simple mapping layer that performs lookups between virtual and physical extents. With the introduction of data reduction technology, compression, and deduplication, an uncomplicated way to stay thin has become much more of a requirement.
Data Reduction Pools increase the capacity utilization of existing infrastructure by employing new efficiency functions, reducing storage costs. The pools enable you to automatically de-allocate (not to be confused with deduplicate) and reclaim the capacity of thin-provisioned volumes that contain deleted data. In addition, for the first time, this reclaimed capacity can be reused by other volumes.
With a new log-structured pool implementation, Data Reduction Pools help deliver more consistent performance from compressed volumes. Data Reduction Pools also support compression of all volumes in a system, potentially extending the benefits of compression to all data in a system.
Traditional storage pools have a fixed allocation unit of an extent, and that does not change with Data Reduction Pools. However, features such as Thin Provisioning and IBM Real-time Compression™ (RtC) use smaller allocation units and manage that allocation with their own metadata structures, described as binary trees or log-structured arrays (LSA).
In order to “stay thin”, you need to be able to reclaim capacity that is no longer used or, in the case of an LSA (where all writes go to new capacity), garbage collect the old, overwritten data blocks. This reclamation also needs to happen at the smaller allocation unit size (kilobytes) within each extent.
Figure 1-1 shows the DRP mirroring structure.
Figure 1-1 New Data Reduction Pool volume mirroring structure
 
Note: Use volume mirroring to clone data to a new DRP, because DRP does not support migrate commands.
1.1.1 DRP Volume types
DRP technology enables you to create five types of volumes:
Fully allocated
This type provides no storage efficiency, but the best performance, and is available for migration.
Thin
This type provides storage efficiency, but no compression or deduplication.
Thin and Compressed
This type provides storage efficiency with compression, and this combination provides the best performance numbers.
Thin and Deduplication
This type provides storage efficiency, but without compression.
Thin, Compressed, and Deduplication
This type provides storage efficiency with maximum capacity savings.
Among the storage-efficient options, DRP thin and compressed volumes provide the best performance numbers because of the new compression implementation, which provides better load balancing and more consistent performance than the RACE implementation. Overall, this combination is second only to fully allocated volumes in performance, followed by volumes that are thin, compressed, and deduplicated.
Figure 1-2 shows the types of volumes in the DRP pools.
Figure 1-2 Volume types
There are four main characteristics that make up the IBM Data Reduction Pool design:
Fine-grained allocation of data blocks
The ability to free back unused (unmapped or overwritten) capacity at a fine grain
Consistent, predictable performance
Optimized performance for solid-state storage, such as Flash
A Data Reduction Pool, at its core, uses a log-structured array (LSA) to allocate capacity. Therefore, the volume that you create from the pool to present to a host application consists of a directory that stores the allocation of blocks within the capacity of the pool.
All writes to Data Reduction Pools are completed to the host at the upper cache layer, while reads have to go through the lower cache layer. The heart of the new DRP functionality is the new implementation of the log-structured array, which works with the lower cache, virtualization, IBM Easy Tier®, and RAID layers, and understands what works best for each of these components.
A log-structured array allows a tree-like directory to define the physical placement of data blocks independently of their size and logical location. Each logical block device has a range of Logical Block Addresses (LBAs), starting from 0 and ending with the block address that fills the capacity. As data is written, the LSA allocates it sequentially and maintains a directory that maps each LBA to the physical address within the array.
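The following minimal Python sketch illustrates this idea. It is illustrative only: the class and structures are invented for clarity, and the real DRP implementation manages 8 KB grains, compression, and separate metadata volumes.

# Minimal, illustrative sketch of a log-structured array (LSA).
class LogStructuredArray:
    def __init__(self):
        self.log = []          # append-only physical data blocks
        self.directory = {}    # LBA -> index into the log
        self.free = set()      # stale log entries awaiting garbage collection

    def write(self, lba, data):
        """Append the new data and repoint the directory entry."""
        if lba in self.directory:
            # Overwrite: the old location is not rewritten in place; it is
            # simply marked as free (garbage) for later collection.
            self.free.add(self.directory[lba])
        self.directory[lba] = len(self.log)
        self.log.append(data)

    def read(self, lba):
        """Look up the physical location of the latest data for this LBA."""
        return self.log[self.directory[lba]]

    def unmap(self, lba):
        """Host UNMAP: release the capacity used by this LBA."""
        if lba in self.directory:
            self.free.add(self.directory.pop(lba))

lsa = LogStructuredArray()
lsa.write(0, b"first")
lsa.write(0, b"second")        # an overwrite appends; the old block becomes garbage
assert lsa.read(0) == b"second"
lsa.unmap(0)                   # the capacity is now reclaimable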
 
Note: The LSA always appends new data to the end of the array. When data is overwritten, the old location and the capacity it used need to be marked as free. UNMAP functions can also request that capacity that is no longer needed be freed. Compressed overwrites can result in a different amount of capacity being used, and deduplication might find new duplicates when data is rewritten.
Figure 1-3 shows an IBM Spectrum Virtualize I/O stack structure.
Figure 1-3 Data Reduction Pools/LSA is located in the IBM Spectrum Virtualize I/O stack
1.1.2 What is in a Data Reduction Pool
As shown in Figure 1-4, the user sees a sample of four volumes in a Data Reduction Pool. Internally, there are four directory volumes, one journal volume (per I/O group), one customer data volume (per I/O group), and one reverse lookup volume (per I/O group).
Figure 1-4 shows the view of DRP.
Figure 1-4 Both the front-end and back-end view of DRP
Each internal volume type has very specific I/O patterns and uses its own percentage of the total capacity of the pool, as shown in Table 1-1.
Table 1-1 I/O Patterns per Internal Volumes
Customer Data volumes: 98% of pool capacity. Large sequential write pattern and short random read pattern.
Directory volumes: 1% of pool capacity. Short 4 KB random read and write pattern.
Journal volumes: Less than 1% of pool capacity. Large sequential write I/O; only read for recovery scenarios (T3 and so on).
Reverse Lookup volumes: Less than 1% of pool capacity. Short, semi-random read/write pattern.
1.1.3 Allocation block size
The allocation size of these blocks is now 8 KB. Previously, thin-provisioned volumes used a 32 KB allocation unit, and RACE Compression wrote 32 KB of compressed data. Here are some key reasons behind the 8 KB allocation:
UNMAP requests as small as 8 KB can be catered for.
The addressability of data in the pool is at an 8 KB (uncompressed) boundary, compared to 32 KB compressed with previous RACE compression.
All random read requests are of 8 KB size (or less if compressed), which is ideal for Flash Storage.
With a common metadata access size served by lower cache, performance is much more consistent.
Figure 1-5 shows the DRP Compression I/O Amplification, where K is kilobytes.
Figure 1-5 I/O Amplification of space allocated (assumes 50% Compression rate)
 
Note: Writes to already allocated space will drive the need for Garbage Collection (GC). The cost of GC depends on the amount of valid data in the extent that has not been over-written.
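As a simple illustration of the 8 KB allocation unit, the following sketch (a hypothetical calculation, not taken from the product) shows which 8 KB grains a host request touches, and why only grains that are completely covered by an UNMAP request can be deallocated.

GRAIN = 8 * 1024  # the DRP allocation unit of 8 KB

def grains_touched(offset_bytes, length_bytes):
    """Return the first and last 8 KB grain that a host request overlaps."""
    first = offset_bytes // GRAIN
    last = (offset_bytes + length_bytes - 1) // GRAIN
    return first, last

def grains_freed_by_unmap(offset_bytes, length_bytes):
    """Only grains fully covered by the UNMAP range can be deallocated."""
    first = -(-offset_bytes // GRAIN)                   # round the start up
    last = (offset_bytes + length_bytes) // GRAIN       # round the end down
    return max(0, last - first)

# A 16 KB unmap that starts on a grain boundary frees two grains; the same
# 16 KB shifted by 4 KB frees only the single fully covered grain.
print(grains_freed_by_unmap(0, 16 * 1024))         # -> 2
print(grains_freed_by_unmap(4 * 1024, 16 * 1024))  # -> 1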
1.2 RACE versus DRP
RACE uses variable input/fixed output, intermittently having to wait or pause to see whether more I/O is coming for a volume, as shown in Figure 1-6 on page 7. The RACE minimum block size to read from the backend is 32 KB, and RACE pushes at least 4 to 8 times more data through the decompression hardware than DRP for a truly random workload. In contrast, DRP compression uses fixed input/variable output.
As shown in Figure 1-7 on page 7, the DRP maximum block size to read from the backend is 8 KB (although it is typically 4 KB or less). Using 8 KB input sizes causes a small loss in compression ratio, but lowers latency because all workloads are handled in the same predictable small block size. Handling host I/O at this small grain size enables DRP to provide the following functionality:
Provide fine-grained allocation of block data
Free unused capacity at a fine grain
Give consistent, predictable performance
Optimize performance for Flash storage
Figure 1-6 RACE compression I/O stack
Figure 1-7 shows the DRP compression I/O stack.
Figure 1-7 DRP compression I/O stack
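The following sketch illustrates the fixed-input/variable-output approach, using zlib purely as a stand-in for the compression hardware: every chunk handed to the compressor is a fixed 8 KB, and the size of the compressed output varies with how compressible the data is.

import os
import zlib

CHUNK = 8 * 1024  # DRP compresses data in fixed 8 KB input chunks

def compress_fixed_input(data: bytes):
    """Split the data into fixed 8 KB input chunks and compress each one.
    The compressed output size varies with the data (variable output)."""
    sizes = []
    for start in range(0, len(data), CHUNK):
        chunk = data[start:start + CHUNK].ljust(CHUNK, b"\0")  # pad the tail
        sizes.append(len(zlib.compress(chunk)))
    return sizes

# Highly compressible data produces small outputs; random-looking data
# compresses poorly, but the input handed to the compressor is always 8 KB.
print(compress_fixed_input(b"A" * 32 * 1024))       # four small outputs
print(compress_fixed_input(os.urandom(32 * 1024)))  # four outputs close to 8 KB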
Here are some of the key differences in DRP when compared to RACE:
CPU
 – Data reduction uses the same threads as the main I/O process
 – No separate compression CPU utilization
 – No dedicated CPU cores for compression
Memory
 – Data reduction shares memory with the main I/O process
 – 1 GB memory taken from cache when data reduction is enabled
Compression hardware
 – Shared with existing RtC compression and compression for IP replication
 – New DRP compression achieves up to 4.8 GBps per node (compression card limit)
1.2.1 Benefits
There are many advantages to Data Reduction Pools, including:
Designed to be highly scalable to support hardware with more cores and more memory
Tightly integrated compression shares available cores with other processes for greater efficiency
Optimization for flash storage through conversion of random write I/Os into larger sequential writes
No limit on the number of compressed volumes enables greater use of compression (up to 5x as many volumes) and so more compression benefit and reduced storage cost
Up to 3x better throughput for compressed data, enabling its use with a wider range of data types.
Ability to release and reuse storage in response to server needs, reducing overall storage required
Designed for future data reduction technologies
Separation of metadata and user data improves cache effectiveness
Compression integrated within I/O stack
Shared resource design
Active/Active: Mirrored non-volatile metadata means significantly improved failover/failback response times due to no revalidation of metadata
No limit on the number of compressed volumes
Space reclamation: Unmap
Designed for deduplication
Smaller 8 KB chunk means less compression bandwidth for small I/Os
Metadata and user data separated, better use of cache prefetch/destage
On average 1.8 x I/O amplification on host I/O: Much more predictable latency compared to RACE
Able to use maximum compression bandwidth
Comprestimator support
Table 1-2 shows DRP disk limitations.
Table 1-2 DRP Disk Limitations
Extent size 1 GB: Volume size 128 TB; with 4 I/O groups, 512 TB
Extent size 2 GB: Volume size 256 TB; with 4 I/O groups, 1 PB
Extent size 4 GB: Volume size 512 TB; with 4 I/O groups, 2 PB
Extent size 8 GB: Volume size 1 PB; with 4 I/O groups, 4 PB
1.3 Data Reduction Pools and unmap
DRPs support end-to-end unmap functionality. Unmap is the process by which hosts free space. A host can issue a small unmap for a file, or a large unmap when, for example, a volume that is part of a data store is deleted on the host; either results in the freeing of all the capacity allocated within that unmap range. Similarly, deleting a volume at the DRP level frees all of its capacity back to the pool.
When a Data Reduction Pool is created, the system monitors the pool for reclaimable capacity from host unmap operations. This capacity can be reclaimed by the system and redistributed into the pool. Create volumes that use thin provisioning or compression within the Data Reduction Pool to maximize space within the pool.
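The sketch below shows one way to picture this bookkeeping. It is an illustration only, with invented names and sizes, not the product's internal design: host unmaps accumulate as reclaimable capacity, and that capacity returns to the pool's free extents once garbage collection has processed it.

EXTENT = 1024 * 1024 * 1024   # a 1 GB extent (one of the supported sizes)
GRAIN = 8 * 1024              # the 8 KB allocation unit

class PoolCapacity:
    def __init__(self, extents):
        self.free_extents = extents
        self.reclaimable_bytes = 0    # freed by unmap, not yet collected

    def host_unmap(self, length_bytes):
        """A host unmap (or volume delete) makes capacity reclaimable."""
        self.reclaimable_bytes += (length_bytes // GRAIN) * GRAIN

    def garbage_collect_extent(self):
        """When a whole extent of reclaimable space has been compacted,
        it is handed back to the pool as a free extent."""
        if self.reclaimable_bytes >= EXTENT:
            self.reclaimable_bytes -= EXTENT
            self.free_extents += 1

pool = PoolCapacity(extents=100)
pool.host_unmap(512 * 1024 * 1024)   # 512 MB freed by the host
pool.host_unmap(512 * 1024 * 1024)   # another 512 MB
pool.garbage_collect_extent()        # a whole 1 GB extent returns to the pool
print(pool.free_extents, pool.reclaimable_bytes)  # -> 101 0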
1.4 Data Reduction Pools with Easy Tier
As mentioned earlier, a DRP uses an LSA. RACE Compression has used a form of LSA since its introduction in 2011, which means that garbage collection is a normal task that must run all the time. An LSA always appends new writes to the end of the allocated space: even if data already exists and the write is an overwrite, the new data is not written in place. Rather, the new write is appended at the end, and the old data is marked as needing garbage collection. This behavior allows for the following functionality:
Writes to a DRP volume are always sequential, so we can build all of the 8 KB chunks into a larger 256 KB chunk and destage the writes from cache, either as full stride writes or as a large 256 KB sequential stream of writes.
This should give the best performance both in terms of RAID on backend systems, and on Flash, where it is easier for the Flash device to also garbage collect on a larger boundary.
We can start to record metadata about how frequently certain areas of a volume are over-written.
We can then bin sort the chunks into a heat map in terms of rewrite activity, and group commonly rewritten data onto a single extent. This ensures that Easy Tier operates correctly not only for read data, but also for write data when data reduction is in use. Previously, writes to compressed volumes held lower value to the Easy Tier algorithms because writes were always made to a new extent, so the previous heat was lost.
Now, we can maintain the heat over time and ensure that frequently rewritten data gets grouped together. This also aids the garbage collection at the virtualize level and also on Flash, where it is likely that large contiguous areas will end up being garbage collected together.
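The sketch below shows the rewrite-heat idea in miniature. The grouping policy and data structures are simplified assumptions, not the actual Easy Tier implementation: rewrite counts are kept per chunk, the chunks are bin-sorted by heat, and frequently rewritten chunks are grouped onto the same extent.

from collections import Counter

CHUNKS_PER_EXTENT = 4   # toy value; a real extent holds many 8 KB chunks

# Count how often each chunk (identified here by its logical address) is rewritten.
rewrite_heat = Counter()
for lba in [0, 1, 0, 2, 0, 1, 3, 4, 0, 1]:   # sample stream of overwrites
    rewrite_heat[lba] += 1

# Bin-sort the chunks by heat so that frequently rewritten data is grouped together.
hot_to_cold = [lba for lba, _ in rewrite_heat.most_common()]
extents = [hot_to_cold[i:i + CHUNKS_PER_EXTENT]
           for i in range(0, len(hot_to_cold), CHUNKS_PER_EXTENT)]

# Hot extents are good candidates for the fastest tier, and they tend to be
# garbage collected together, as described above.
print(extents)   # e.g. [[0, 1, 2, 3], [4]]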
1.5 Garbage collection
DRP has built-in services for garbage collection (GC) of unused blocks. Many smaller unmaps therefore end up enabling a much larger chunk (an extent) to be freed back to the pool. If the storage behind the IBM SAN Volume Controller (SVC) supports unmap, an unmap command is passed to the backend storage, which is equally important with today's Flash backend systems, especially when they implement some form of data reduction themselves.
Trying to fill small holes is very inefficient: too many I/Os would be needed to keep reading and rewriting the directory. Therefore, GC waits until an extent contains many small holes, then moves the remaining data out of the extent, compacting and rewriting it. When an extent is empty, it can be freed back to the virtualization layer (and to the backend with UNMAP), or new data (or rewrites) can start to be written into it.
The reverse lookup metadata volume tracks extent usage, or more importantly the holes created by overwrites or unmaps. GC looks for the extents with the most unused space. After all valid data has been moved out of a whole extent, the extent can be freed back to the set of unused extents in the pool, or it can be reused for newly written data.
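The following sketch (invented structure, illustrative only) shows that selection step: the reverse lookup information is used to find the extent with the most unused space, its remaining valid data is moved and compacted elsewhere, and the emptied extent is freed back to the pool.

def pick_gc_candidate(extents):
    """Choose the extent with the most unused (overwritten or unmapped) space."""
    return max(extents, key=lambda e: e["unused_kb"])

def collect(extent, destination):
    """Move the remaining valid data elsewhere, then free the whole extent."""
    destination["valid_kb"] += extent["valid_kb"]
    extent["valid_kb"] = 0
    extent["unused_kb"] = 0
    extent["state"] = "free"     # can be reused, or unmapped to the backend

extents = [
    {"id": 0, "valid_kb": 900_000, "unused_kb": 100_000, "state": "in_use"},
    {"id": 1, "valid_kb": 50_000,  "unused_kb": 950_000, "state": "in_use"},
    {"id": 2, "valid_kb": 0,       "unused_kb": 0,       "state": "free"},
]

victim = pick_gc_candidate(e for e in extents if e["state"] == "in_use")
collect(victim, destination=extents[0])
print([e["state"] for e in extents])   # extent 1 is now free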
1.6 Data Reduction Pools with deduplication
Deduplication can be configured with thin-provisioned and compressed volumes in Data Reduction Pools for added capacity savings. The deduplication process identifies unique chunks of data, or byte patterns, and stores a signature of the chunk for reference when writing new data chunks.
If the new chunk’s signature matches an existing signature, the new chunk is replaced with a small reference that points to the stored chunk. The same byte pattern might occur many times resulting in the amount of data that must be stored being greatly reduced.
Duplicate matches are found by using SHA-1 hashes that are created for each 8 KB aligned region of client data written to a deduplicated copy. The matches are detected when the data is written. For Data Reduction Pools, deduplicated data can be handled in two separate ways: it can be grouped into 256 KB blocks and written to storage, or it can be passed as 8 KB chunks and compressed first.
Deduplication has specific I/O characteristics in the handling of data and data copies. When a matching fingerprint is found, the metadata is updated to point to the metadata of the existing copy of the data. Each copy of the data can have up to 255 8 KiB virtual chunks referring to it. Each virtual 8 KiB chunk can track up to 3 versions of data. I/O performance takes precedence over finding duplicate copies of data. Host I/Os smaller than 8 KiB do not attempt to find duplicates.
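The sketch below shows the basic fingerprint flow in simplified form (the real implementation keeps its fingerprints in the pool's metadata volumes and enforces the reference limits described above): an SHA-1 hash is computed for each 8 KB chunk as it is written, and a chunk whose hash is already known is stored as a reference to the existing copy.

import hashlib

CHUNK = 8 * 1024

class DedupStore:
    def __init__(self):
        self.fingerprints = {}   # SHA-1 digest -> location of the stored chunk
        self.data = []           # stored unique chunks

    def write_chunk(self, chunk: bytes):
        """Return the location of the chunk, storing it only if it is new."""
        digest = hashlib.sha1(chunk).digest()
        if digest in self.fingerprints:
            return self.fingerprints[digest]      # duplicate: reference the existing copy
        location = len(self.data)
        self.data.append(chunk)
        self.fingerprints[digest] = location
        return location

store = DedupStore()
a = store.write_chunk(b"\x00" * CHUNK)
b = store.write_chunk(b"\x00" * CHUNK)   # identical pattern: no new capacity used
c = store.write_chunk(b"\x01" * CHUNK)
print(a, b, c, len(store.data))          # -> 0 0 1 2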