Deduplication / Compression with VMware vSphere 5.1
This chapter provides information about Advanced Single Instance Storage (A-SIS) deduplication and the benefits of enabling it. It also guides you step-by-step on how to set it up for a VMware vSphere 5.1 environment. It includes the following topics:
13.1 A-SIS deduplication overview
N series deduplication is a technology that can reduce the physical storage required to store data. Any typical data that might be stored in a disk volume has a certain amount of redundancy. It occurs in the form of identical data strings written to the volume multiple times. At a high level, the N series system can reduce the storage cost of this data. It does so by examining it and eliminating the inherent redundancies, as shown in Figure 13-1.
Figure 13-1 A-SIS savings
N series deduplication is managed at the volume level. Individual volumes can be configured to take advantage of deduplication, depending on the nature of the data in the volume. N series deduplication operates at the block level, which gives it a high level of efficiency. During the deduplication process, fingerprints of individual blocks within a volume are compared to each other. When duplicate blocks are found, the system updates pointer files within the file system to reference one of the duplicate blocks. The others are deleted to reclaim free space.
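The fingerprint comparison described above can be illustrated with a short Python sketch. It is not the A-SIS implementation. The 4 KB block size, the use of SHA-256 as the fingerprint, and the names deduplicate and restore are assumptions chosen for illustration, and the sketch trusts the fingerprint alone instead of verifying that matching blocks are identical.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for this sketch


def deduplicate(data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into blocks and keep each distinct block only once.

    Returns (unique_blocks, pointers). pointers[i] is the index in
    unique_blocks of the block stored at logical position i.
    """
    unique_blocks = []         # blocks that are physically kept
    index_by_fingerprint = {}  # fingerprint -> index in unique_blocks
    pointers = []              # logical block number -> physical index

    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        fingerprint = hashlib.sha256(block).digest()  # assumed fingerprint
        if fingerprint not in index_by_fingerprint:
            index_by_fingerprint[fingerprint] = len(unique_blocks)
            unique_blocks.append(block)
        # Duplicate blocks only add a pointer, not a second copy.
        pointers.append(index_by_fingerprint[fingerprint])
    return unique_blocks, pointers


def restore(unique_blocks, pointers) -> bytes:
    """Rebuild the original data by following the pointers."""
    return b"".join(unique_blocks[i] for i in pointers)


if __name__ == "__main__":
    data = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE + b"A" * BLOCK_SIZE
    blocks, ptrs = deduplicate(data)
    print(len(ptrs), "logical blocks stored as", len(blocks), "physical blocks")
    assert restore(blocks, ptrs) == data
```

In this toy input, four of the five logical blocks are identical, so only two physical blocks are kept and the remaining positions are served by pointers.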
The deduplication process does not occur at the time the data is written, so the performance impact of deduplication is low. It runs on a predetermined schedule or can be started manually at any time. During times when the storage system is busy or is accepting new write operations, the only impact is the lightweight fingerprinting process, and the total impact on system performance is imperceptible. The more I/O-intensive deduplication process can then be scheduled to run during a period of low activity.
The amount of space saved by deduplication varies depending on the nature of the data being deduplicated. Results of anywhere between 10% and 90% space savings can be seen, but 50% or more is common.
13.2 Storage consumption on virtualized environments
Although any type of data can be effectively deduplicated by N series deduplication, the data in virtualized environments has several unique characteristics that make deduplication especially effective.
For example, when a virtual disk is created, a file equal to the size of the virtual disk is created in a datastore. This virtual disk file consumes space equal to its size regardless of how much data is stored in the virtual disk. Any allocated but unused space (sometimes called white space) is identical redundant space on the disk and a prime candidate for deduplication.
Another unique characteristic of that data is related to the way that virtual machines are created. A common deployment method is to create templates and then deploy new virtual machines by cloning the template. The result is virtual machines that have a high level of similarity in their data.
In a traditional deployment, each new virtual machine takes new storage. N series deduplication can help to reduce the amount of storage required to store the virtual machine images. When two or more virtual machines are stored in the same datastore, any common data between them can be deduplicated. (The common data includes operating system binary files, application binary files, and free space.) In some cases, that data can be deduplicated down to the equivalent of a single copy of it.
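To make the effect of cloning and white space concrete, the following Python sketch builds a few toy virtual disks and counts how many distinct blocks a datastore would need to hold. Everything in it is hypothetical: the function names, the block counts, the 4 KB block size, and the assumption that unused space is zero-filled. It illustrates the reasoning only and does not predict real savings.

```python
import hashlib

BLOCK = 4096  # assumed block size


def make_guest(guest_id: int, os_blocks: int, unique_blocks: int, free_blocks: int):
    """Build a toy virtual disk as a list of fixed-size blocks."""
    # Operating system and application blocks are identical in every clone.
    disk = [f"os-{n}".encode().ljust(BLOCK, b"\0") for n in range(os_blocks)]
    # A small amount of data is specific to each guest.
    disk += [f"guest-{guest_id}-{n}".encode().ljust(BLOCK, b"\0")
             for n in range(unique_blocks)]
    # Allocated but unused space (white space), assumed to be zero-filled.
    disk += [bytes(BLOCK)] * free_blocks
    return disk


def datastore_savings(guests):
    """Return (logical_blocks, physical_blocks, percent_saved)."""
    logical = sum(len(disk) for disk in guests)
    distinct = {hashlib.sha256(block).digest() for disk in guests for block in disk}
    physical = len(distinct)
    return logical, physical, 100.0 * (logical - physical) / logical


if __name__ == "__main__":
    guests = [make_guest(i, os_blocks=40, unique_blocks=2, free_blocks=58)
              for i in range(5)]
    logical, physical, saved = datastore_savings(guests)
    print(f"{logical} logical blocks, {physical} physical blocks, {saved:.1f}% saved")
```

With five clones of 100 blocks each, the shared operating system blocks and the zero-filled free space collapse to one copy each, which is why cloned guests deduplicate so well.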
13.3 When to run deduplication
As mentioned previously, the N series deduplication process does not occur at the time that the data is written to the storage device. However, it can be run any time the administrator desires after the data was written. The deduplication process can be resource-intensive, and it is a best practice to run it during a period of low activity.
The options to start the deduplication process are flexible. It can be started automatically on a fixed schedule, started automatically after a defined amount of new data has been written to the volume (20% by default), or started manually by the administrator at any time.
Consider running the deduplication process manually when a significant amount of data must be deduplicated, for instance, after provisioning new virtual machines.
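The three start options can be summarized as a small decision function. This Python sketch is only a model of the behavior described in this section. The function name, the mode values "schedule" and "auto", and the hour-based schedule are hypothetical and do not correspond to N series configuration syntax. The 20% threshold is the default mentioned above.

```python
from typing import Iterable, Optional


def start_reason(mode: str,
                 hour: int,
                 scheduled_hours: Iterable[int],
                 new_data_fraction: float,
                 threshold: float = 0.20,
                 manual_request: bool = False) -> Optional[str]:
    """Return why deduplication would start now, or None if it would not.

    mode is "schedule" (fixed schedule) or "auto" (start when enough new
    data was written). Both names are assumptions made for this sketch.
    """
    if manual_request:
        # The administrator can start the process at any time.
        return "manual"
    if mode == "schedule" and hour in set(scheduled_hours):
        return "schedule"
    if mode == "auto" and new_data_fraction >= threshold:
        return "threshold"
    return None


if __name__ == "__main__":
    # Nightly schedule at 01:00, checked at 01:00.
    print(start_reason("schedule", 1, [1], 0.05))
    # Threshold mode with 25% new data since the last run.
    print(start_reason("auto", 14, [], 0.25))
    # Manual start right after provisioning new virtual machines.
    print(start_reason("schedule", 14, [1], 0.60, manual_request=True))
```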
13.4 The effect of snapshots in deduplicated volumes
Although snapshots can be used in deduplicated volumes, you must take note of one operational difference. The deduplication process can identify and deduplicate redundant blocks that are in a snapshot. However, the block reclamation process cannot return blocks to free space while the snapshots exist. Because of this behavior, you might experience lower than expected space savings when deduplicating data in a volume that has snapshots.
When all of the snapshots that were taken before the deduplication process are deleted, the deduplicated blocks are reclaimed as free space. As a result, you might want to deduplicate new data before any snapshots are taken.
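The interaction between snapshots and block reclamation can be modeled with plain sets. The following Python sketch is a simplified illustration, not a description of the internal data structures. The block numbers, the snapshot names, and the function name are hypothetical.

```python
def reclaimable_blocks(released_by_dedup: set, snapshots: dict) -> set:
    """Return the released blocks that no snapshot still references.

    released_by_dedup: block numbers that deduplication no longer needs
                       in the active file system.
    snapshots:         snapshot name -> set of block numbers it references.
    """
    held_by_snapshots = set().union(*snapshots.values())
    return released_by_dedup - held_by_snapshots


if __name__ == "__main__":
    released = {10, 11, 12, 13}
    snapshots = {"snap_a": {10, 11, 20}, "snap_b": {12}}

    # While the snapshots exist, only block 13 returns to free space.
    print(sorted(reclaimable_blocks(released, snapshots)))

    # After all older snapshots are deleted, every released block is freed.
    print(sorted(reclaimable_blocks(released, {})))
```

The model shows why deduplicating new data before taking snapshots gives the full savings immediately: no snapshot holds the duplicate blocks at the time they are released.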
13.5 Enabling deduplication on a volume
This section explains how to set up deduplication on an N series for use with VMware hosts. It also provides information about storage reduction after enabling it for Network File System (NFS) and iSCSI volumes.
13.5.1 Setting up deduplication on a volume
In this section, you go step-by-step through the process to set up deduplication. This scenario is based on the creation of five identical guests of 10 GB each on both the NFS datastore and the iSCSI datastore. The size of the iSCSI LUN and of the NFS share is 120 GB each.
The deduplication process
Figure 13-2 shows the original sizes of the NFS and VMFS datastores where the clones are running.
Figure 13-2 NFS size on the vCenter management console before deduplication
Example 13-1 shows the size of the NFS share as viewed on the N series command line.
Example 13-1 NFS size on the N series CLI
N6070A> df -g /vol/VMWare_NAS
Filesystem total used avail capacity Mounted on
/vol/VMWare_NAS/ 120GB 50GB 69GB 42% /vol/VMWare_NAS/
/vol/VMWare_NAS/.snapshot 0GB 0GB 0GB ---% /vol/VMWare_NAS/.snapshot
Example 13-2 shows the size of the FCP LUN as viewed on the N series command line.
Example 13-2 LUN size on the N series CLI
N6070A> df -g /vol/LUNdedup
Filesystem total used avail capacity Mounted on
/vol/LUNdedup/ 120GB 50GB 69GB 42% /vol/LUNdedup/
/vol/LUNdedup/.snapshot 0GB 0GB 0GB ---% /vol/LUNdedup/.snapshot
To enable deduplication on a volume, enter the sis on <vol_name> command as follows:
For an NFS volume, enter the command as shown in Example 13-3.
Example 13-3 Enabling deduplication
N6070A> sis on /vol/VMWare_NAS
SIS for "/vol/VMWare_NAS" is enabled.
Already existing data could be processed by running "sis start -s /vol/VMWare_NAS".
For an FCP volume, follow these steps:
a. Set the fractional reserve to 0 (Example 13-4).
Example 13-4 Setting the fractional reserve
itsotuc3> vol options /vol/LUNdedup fractional_reserve 0
b. Enable deduplication on the FCP volume (Example 13-5).
Example 13-5 Enabling deduplication on the FCP volume
N6070A> sis on /vol/LUNdedup
SIS for "/vol/LUNdedup" is enabled.
Already existing data could be processed by running "sis start -s /vol/LUNdedup".
c. Check the status (Example 13-6).
Example 13-6 Checking the status
N6070A> sis status
Path State Status Progress
/vol/VMWare_NAS Enabled Idle Idle for 00:50:29
/vol/LUNdedup Enabled Idle Idle for 00:00:37
Deduplicating existing data
You can start the deduplication process at any time by using the sis start <vol> command. The default behavior of the command deduplicates only data that was written since deduplication was turned on for the volume.
To deduplicate data that was written before deduplication was enabled, proceed as follows:
To start the deduplication process, use the sis start -s <vol_name> command (Example 13-7).
Example 13-7 Starting the deduplication process
N6070A> sis start -s /vol/VMWare_NAS
The file system will be scanned to process existing data in /vol/VMWare_NAS.
This operation may initialize related existing metafiles.
Type y when asked to proceed with the scan, and the operation will start (Example 13-8).
Example 13-8 Accepting to proceed with deduplication
Are you sure you want to proceed (y/n)? y
The SIS operation for "/vol/VMWare_NAS" is started.
N6070A> Fri Nov 16 00:52:58 CET [N6070A:wafl.scan.start:info]: Starting SIS volume scan on volume VMWare_NAS.
Example 13-9 shows how to start the deduplication process on a SAN volume, typing y when asked to proceed:
Example 13-9 Starting the deduplication process on a SAN volume
N6070A> sis start -s /vol/LUNdedup
The file system will be scanned to process existing data in /vol/LUNdedup.
This operation may initialize related existing metafiles.
Are you sure you want to proceed (y/n)? y
The SIS operation for "/vol/LUNdedup" is started.
N6070A> Fri Nov 16 00:57:00 CET [N6070A:wafl.scan.start:info]: Starting SIS volume scan on volume LUNdedup.
13.5.2 Deduplication results
To check the progress of the deduplication process, use the sis status command, as shown in Example 13-10. If the status is Active, the deduplication process is still ongoing. If the status is Idle, deduplication is completed.
Example 13-10 Checking status
N6070A> sis status
Path State Status Progress
/vol/VMWare_NAS Enabled Active 47 GB Scanned
/vol/LUNdedup Enabled Active 11 GB Scanned
You might see some intermediate results while the deduplication runs, as shown in Example 13-11. Wait until the process finishes.
Example 13-11 Intermediate results
N6070A> sis status
Path State Status Progress
/vol/VMWare_NAS Enabled Active 5480 MB (11%) Done
/vol/LUNdedup Enabled Active 35 GB Scanned
When the process is completed, all volumes will be listed as Idle, as shown in Example 13-12.
Example 13-12 Process completed
N6070A> sis status
Path State Status Progress
/vol/VMWare_NAS Enabled Idle Idle for 00:07:38
/vol/LUNdedup Enabled Idle Idle for 00:11:42
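If you collect the sis status output in a script, the check for completion can be automated. The following Python sketch parses text in the layout shown in Examples 13-10 through 13-12. It assumes whitespace-separated columns and paths that start with /vol/, and the function names are hypothetical. How the output is captured from the storage system is outside the scope of the sketch.

```python
def parse_sis_status(output: str) -> dict:
    """Parse `sis status` text into {path: {state, status, progress}}."""
    volumes = {}
    for line in output.splitlines():
        # Split into at most four fields so that the progress text,
        # which contains spaces, stays in one piece.
        fields = line.split(None, 3)
        if len(fields) == 4 and fields[0].startswith("/vol/"):
            path, state, status, progress = fields
            volumes[path] = {"state": state, "status": status,
                             "progress": progress.strip()}
    return volumes


def all_idle(volumes: dict) -> bool:
    """True when at least one volume is listed and every volume is Idle."""
    return bool(volumes) and all(v["status"] == "Idle" for v in volumes.values())


if __name__ == "__main__":
    # Text copied from Example 13-11.
    sample = """N6070A> sis status
Path State Status Progress
/vol/VMWare_NAS Enabled Active 5480 MB (11%) Done
/vol/LUNdedup Enabled Active 35 GB Scanned
"""
    volumes = parse_sis_status(sample)
    for path, info in volumes.items():
        print(path, info["status"], "-", info["progress"])
    print("Deduplication finished:", all_idle(volumes))
```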
You can view the space savings from the vSphere client or on the storage controller. On the storage controller, use the df -gs command, as shown in Example 13-13.
Example 13-13 N series node
N6070A> df -gs /vol/VMWare_NAS
Filesystem used saved %saved
/vol/VMWare_NAS/ 2GB 47GB 94%
The space savings of NFS volumes are available immediately and can be observed from both the storage controller and the vSphere client. The NFS example started with a total of 50 GB being used, which is reduced to 2 GB, for a total savings of 94% as reported in Example 13-13.
The savings displayed on the N series node match what is shown on the ESXi management console. Figure 13-3 shows that 117.02 GB of space is available on the NFS share.
Figure 13-3 Savings display
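The figures reported by df -gs can be cross-checked with simple arithmetic. The following Python sketch parses output in the layout of Example 13-13 and recomputes the percentage, assuming that %saved is calculated as saved / (used + saved). That formula and the function names are assumptions made for this sketch. Because the command displays whole gigabytes, a value recomputed from the displayed figures can differ from the reported percentage by a point or two.

```python
def parse_df_gs(output: str) -> list:
    """Parse `df -gs` text into a list of per-volume records."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) == 4 and fields[0].startswith("/vol/"):
            rows.append({
                "filesystem": fields[0],
                "used_gb": int(fields[1][:-2]),       # strip the "GB" suffix
                "saved_gb": int(fields[2][:-2]),
                "reported_pct": int(fields[3].rstrip("%")),
            })
    return rows


def percent_saved(used_gb: float, saved_gb: float) -> float:
    """Assumed formula: share of the logical data that is not stored."""
    logical = used_gb + saved_gb
    return 0.0 if logical == 0 else 100.0 * saved_gb / logical


if __name__ == "__main__":
    # Text copied from Example 13-13.
    sample = """N6070A> df -gs /vol/VMWare_NAS
Filesystem used saved %saved
/vol/VMWare_NAS/ 2GB 47GB 94%
"""
    for row in parse_df_gs(sample):
        logical = row["used_gb"] + row["saved_gb"]
        recomputed = percent_saved(row["used_gb"], row["saved_gb"])
        print(f'{row["filesystem"]}: about {logical} GB logical, '
              f'{row["reported_pct"]}% reported, '
              f'{recomputed:.1f}% recomputed from the rounded values')
```

The sum of used and saved space (about 49 GB) is close to the 50 GB that the volume held before deduplication, which is a quick sanity check that the savings account for all of the original data.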
13.5.3 Deduplication of LUNs
Deduplication is effective on both NFS and LUNs. However, as a default behavior, a LUN on the N series storage system reserves its space in the volume where it resides. Deduplication cannot reduce this reservation. To reclaim the space savings, the LUN reservation must be disabled. This option is set on each LUN individually and can be set in the GUI or by using the lun set reservation command.
After deduplication is complete, you can use the free space gained to store new data, either by creating a LUN in the same volume and connecting it as a new datastore, or by shrinking the existing volume and using the saved space to grow other volumes or create new volumes.
To disable space reservation for the LUN, run the lun set reservation <lun_path> disable command (Example 13-14).
Example 13-14 Setting LUN reservation
N6070A> lun set reservation /vol/LUNdedup/iSCSI_dedup disable
Now the storage savings can be seen as shown in Example 13-15.
Example 13-15 Storage savings displayed
N6070A> df -gs /vol/LUNdedup
Filesystem used saved %saved
/vol/LUNdedup/ 2GB 48GB 95%
 
Space allocation on the VMFS file system: Deduplication reduces the amount of physical storage that the LUN consumes on the storage device. However, it does not change the logical allocation of space within the VMFS file system. This situation is unlike an NFS datastore, where space savings are shown immediately and new data can be written to the datastore. For VMFS file systems, deduplication cannot change the total amount of data that can be stored in a VMFS datastore.
Unlike NFS, the FCP savings are not apparent when you check the VMware vCenter management console, as seen in Figure 13-3 on page 216.
 