Planning the VMware vSphere storage system design
Planning carefully is essential to any new storage installation, and choosing the proper equipment and software, with the best settings for your installation, can be a challenge. A well-thought-out design and planning effort before the implementation helps you get the most out of your investment today and protects it for the future. This planning includes the throughput capability, capacity, and resources that are necessary to handle the expected volume of traffic.
This chapter provides guidelines needed to assist in the planning of storage systems for a VMware vSphere environment.
3.1 VMware vSphere ESXi Server storage structure: Disk virtualization
In addition to the disk virtualization that is offered by a SAN, VMware further abstracts the disk subsystem from the guest operating system (OS). It is important to understand this structure to make sense of the options for best practices when connecting VMware vSphere ESXi hosts to a SAN-attached subsystem.
3.1.1 Local storage
The disks that the vSphere ESXi host uses for its boot partition are usually local disks that have a partition and file structure akin to the Linux file hierarchy. Local disks can be internal storage devices inside your ESXi host or external storage devices that are connected directly to the host through various protocols. vSphere ESXi supports a range of internal and external local storage devices (disks), including SCSI, IDE, SATA, USB, and SAS storage systems. Because local storage devices cannot be shared across multiple hosts, the recommendation is to use them only for storing template or ISO files.
3.1.2 Networked storage
Networked storage consists of external storage systems that your ESXi host uses to store virtual machine files remotely. Typically, the host accesses these systems over a high-speed storage network. Networked storage devices are shared. Datastores on networked storage devices can be accessed by multiple hosts concurrently. IBM DS Storage Systems attached to vSphere ESXi hosts support the following networked storage technologies.
Fibre Channel (FC) storage
FC storage stores virtual machine files remotely on an FC storage area network (SAN). An FC SAN is a high-speed network that connects your hosts to high-performance storage devices. The network uses FC protocol to transport SCSI traffic from virtual machines to the FC SAN devices. To connect to the FC SAN, your host needs to be equipped with FC host bus adapters (HBAs) and FC (fabric) switches to route storage traffic.
Figure 3-1 on page 31 shows a host with an FC HBA connected to a fibre array (storage) through a SAN fabric switch. The LUN from a storage array becomes available to the host. The virtual machine access to the LUNs is accomplished through a Virtual Machine File System (VMFS) datastore.
Figure 3-1 vSphere ESXi basic FC storage configuration
Internet Small Computer System Interface (iSCSI)
iSCSI is an industry-standard protocol that enables the transmission of SCSI block commands over an existing IP network by using TCP/IP. The virtual machine files are stored remotely on storage with iSCSI capabilities. iSCSI SANs use Ethernet connections between host servers and high-performance storage subsystems.
An iSCSI SAN uses a client/server architecture. The client (the vSphere ESXi host), called the iSCSI initiator, operates on your host. It initiates iSCSI sessions by issuing SCSI commands and transmitting them, encapsulated in the iSCSI protocol, to a server (the storage system), which is known as the iSCSI target. The iSCSI target represents a physical storage system on the network and responds to the initiator’s commands by transmitting the required iSCSI data.
VMware supports the following types of iSCSI initiators.
Hardware iSCSI adapter
A hardware iSCSI adapter is a third-party adapter that offloads iSCSI and network processing from your host. Hardware iSCSI adapters are divided into categories:
Dependent hardware iSCSI adapter: Depends on VMware networking and iSCSI configuration and management interfaces that are provided by VMware.
Independent hardware iSCSI adapter: Implements its own networking and iSCSI configuration and management interfaces.
Software iSCSI adapter
A software iSCSI adapter is VMware code built into the VMkernel. It allows your host to connect to the iSCSI storage device through standard network adapters. The software iSCSI adapter handles iSCSI processing while communicating with the network adapter. With the software iSCSI adapter, you can use iSCSI technology without purchasing specialized hardware.
Figure 3-2 on page 32 shows supported vSphere ESXi iSCSI initiators and a basic configuration.
Figure 3-2 vSphere ESXi iSCSI supported initiators and basic configuration
For more information about iSCSI and FC storage basics, see the IBM System Storage DS5000 Series Hardware Guide, SG24-8023.
3.1.3 SAN disk usage
VMware vSphere continues to emphasize support for SAN-based disks. SAN disk is used on vSphere Server in the following manner:
After the IBM Midrange Storage Subsystem is configured with arrays, logical drives, and storage partitions, these logical drives are presented to the vSphere Server.
Two options exist for using these logical drives within vSphere Server:
Option 1 Formatting these disks with the VMFS: This option is most common because several features require that the virtual disks are stored on VMFS volumes.
Option 2 Passing the disk through to the guest OS as a raw disk: No further virtualization occurs; instead, the OS writes its own file system onto that disk directly as though it is in a stand-alone environment without an underlying VMFS structure.
VMFS volumes house the virtual disks that the guest OS sees as its real disks. These virtual disks are in the form of a virtual disk file with the extension .vmdk.
The guest OS either reads from or writes to the virtual disk file (.vmdk) or writes through the vSphere ESXi abstraction layer to a raw disk. In both cases, the guest OS treats the disks as though they are real.
Figure 3-3 on page 33 shows logical drives to vSphere VMFS volumes.
Figure 3-3 Logical drives to vSphere VMFS volumes
3.1.4 Disk virtualization with VMFS volumes and .vmdk files
The VMware vSphere Virtual Machine File System (VMFS) is the file system created by VMware specifically for the vSphere Server environment. It is designed to handle very large disks (LUNs) and store the virtual machine (.vmdk) files. VMFS volumes store these types of information:
Virtual machine disk files (.vmdk)
Virtual machine configuration files (.vmx)
Memory images from suspended virtual machines
Snapshot files for .vmdk files that are created manually or that are set to a disk mode of non-persistent, undoable, or append
The virtual machine .vmdk files represent what is seen as a physical disk by the guest OS. These files have many distinct benefits over physical disks (although several of these functions are available through the advanced functions of an IBM Midrange Storage System):
They are portable, so they can be copied from one vSphere ESXi host to another, either to move a virtual machine to a new ESXi host or to create backup or test environments. When copied, they retain all of the structure of the original disk. If the file is the virtual machine’s boot disk, it also includes all of the hardware drivers that are necessary to make it run on another vSphere ESXi host (although the .vmx configuration file must also be replicated to complete the virtual machine).
They are easily resized (by using vmkfstools or the vSphere Client) if the virtual machine needs more disk space. This option presents a larger disk to the guest OS, which then requires a volume expansion tool to access the additional space.
They can be mapped and remapped on a single vSphere ESXi host to keep multiple copies of a virtual machine’s data. Many more .vmdk files can be stored for access by a vSphere host than are represented by the number of virtual machines that are configured.
3.1.5 VMFS access mode: Public mode
Public mode is the default mode for VMware ESXi Server and the only option for VMware ESX 3.x and above.
VMFS-3 partitions also allow multiple vSphere Servers to access the VMFS volume concurrently and use file locking to prevent contention on the .vmdk files.
Introduced with vSphere 5, VMFS-5 provides the same file locking mechanism as VMFS-3 to prevent contention on the .vmdk files.
 
Note: Starting with VMFS-3, there is no longer a shared mode. Virtual machine clustering now occurs with raw device mapping (RDM) in physical or virtual compatibility mode.
3.1.6 vSphere Server .vmdk modes
vSphere Server has two modes of operation for .vmdk file disks. The mode can be set from within the vSphere ESXi Server management user interface during the creation of the .vmdk files, or afterward by editing an individual virtual machine’s settings. The modes are listed:
Persistent: Similar to normal physical disks in a server. vSphere Server writes immediately to a persistent disk.
Non-persistent: Changes that are made after a virtual machine is powered on are lost when that virtual machine is powered off (soft reboots do not count as being powered off).
3.1.7 Specifics of using SAN arrays with vSphere ESXi Server
Using a SAN with a vSphere ESXi Server host differs from traditional SAN usage in various ways, which we discuss in this section.
Sharing a VMFS across vSphere ESXi Servers
The vSphere Virtual Machine File System (VMFS), shown in Figure 3-4 on page 35, is designed for concurrent access from multiple physical machines and enforces the appropriate access controls on virtual machine files.
vSphere VMFS can perform these functions:
Coordinate access to virtual disk files: ESXi Server uses file-level locks, which the VMFS Distributed Lock Manager manages. This feature prevents a virtual machine from being powered on by multiple servers at the same time.
Coordinate access to VMFS internal file system information (metadata): vSphere ESXi Server coordinates accurate shared data.
Figure 3-4 VMFS across ESXi hosts
Metadata updates
A VMFS holds files, directories, symbolic links, RDMs, and so on, and the corresponding metadata for these objects. Metadata is accessed each time the attributes of a file are accessed or modified. These operations include, but are not limited to, the following operations:
Creating, growing, or locking a file
Changing a file’s attributes
Powering a virtual machine on or off
Creating or deleting a VMFS datastore
Expanding a VMFS datastore
LUN display and rescan
A SAN is dynamic. The LUNs that are available to a certain host can change based on several factors:
New LUNs created on the SAN storage arrays
Changes to LUN masking
Changes in SAN connectivity or other aspects of the SAN
The VMkernel discovers LUNs when it boots, and those LUNs are then visible in the vSphere Client. If changes are made to the LUNs, you must rescan to see those changes.
3.1.8 Host types
Every LUN has a slightly different behavior depending on the type of host that is accessing it. The host type determines how the storage subsystem controllers work with each operating system on the hosts to which they are connected. For VMware hosts, a special host type is available: VMware. If you are using the default host group, ensure that the default host type is also VMware.
 
Note: If you change the host type while the storage subsystem and host are running, you need to follow these guidelines:
The controllers do not need to be rebooted after the change of host type.
The host must be rebooted.
Change the host type under low I/O conditions.
3.1.9 Levels of indirection
If you are used to working with traditional SANs, the levels of indirection can initially be confusing for these reasons:
You cannot directly access the virtual machine operating system that uses the storage. With traditional tools, you can monitor only the VMware ESXi Server operating system, but not the virtual machine operating system. Use the vSphere Client to monitor virtual machines.
Each virtual machine is configured by default with one virtual hard disk and one virtual SCSI controller during installation. You can modify the SCSI controller type and SCSI bus sharing characteristics by using the vSphere Client to edit the virtual machine settings. You can also add hard disks and virtual SCSI controllers to your virtual machine.
The HBA that is visible to the SAN administration tools is part of the VMware vSphere ESXi Server, not the virtual machine.
The VMware vSphere ESXi Server system performs multipathing for you. The VMkernel multipathing plug-in that ESXi provides by default is the VMware Native Multipathing Plug-in (NMP). The NMP is an extensible module that performs the following tasks:
 – Manages physical path claiming and unclaiming
 – Registers and unregisters logical devices
 – Associates physical paths with logical devices
 – Processes I/O requests to logical devices
 – Supports management tasks, such as abort or reset of logical devices
3.2 Choosing the IBM Midrange Storage Subsystem for a VMware implementation
Unfortunately, there is no single straightforward answer to the question of which storage subsystem to choose. All of the IBM Midrange Storage Systems provide excellent functionality when attached to VMware vSphere Servers. The answer depends on the specific requirements that the vSphere Server is intended to address and the expectations that must be met in terms of performance, availability, capacity, growth, and so on.
One thing is certain: the sizing requirements for capacity and performance do not change when a vSphere Server is being considered instead of a group of individual physical servers. Some consolidation of SAN requirements can be achieved, for example, where resources are under-utilized, while other requirements remain unchanged. Consolidation is often possible in the number of physical HBAs that are required and, therefore, also in the number of SAN switch ports that are needed to connect those HBAs. Because both of these items come at a considerable cost, any reduction can represent significant savings. It is also common to find low bandwidth utilization of HBAs and SAN switch ports in a non-consolidated environment, which adds to the potential for consolidating these items.
Conversely, it is common for individual physical disk utilization to be high; therefore, reducing the number of physical disks is often inappropriate. As in all SAN implementations, consider both the immediate requirements of the project and the possibilities for reasonable future growth.
3.3 Overview of IBM Midrange Storage Systems
In this section, we provide a brief overview of the IBM Midrange Storage Systems to help you decide which storage subsystem is best suited for your VMware environment. For detailed descriptions of IBM Midrange Storage Systems, see these books:
IBM System Storage DS5000 Series Hardware Guide, SG24-8023
IBM System Storage DS5000 Series Implementation and Best Practices Guide, SG24-8024
IBM System Storage DS3500 Introduction and Implementation Guide, SG24-7914
IBM System Storage DCS3700 Introduction and Implementation Guide, SG24-8037
3.3.1 Positioning the IBM Midrange Storage Systems
The IBM DS storage family is suitable for a broad range of business needs. From entry-level IBM System Storage DS3000 series, to the midrange IBM System Storage DS5000 series, to high-performance IBM System Storage DS8000® series, the IBM DS storage family covers the needs of small businesses all the way up to dealing with large enterprise requirements.
The IBM Midrange Storage Systems, also referred to as the IBM System Storage DS3000 and DS5000 series, are designed to meet the demanding open-systems requirements of today and tomorrow, while establishing a new standard for lifecycle longevity with field-replaceable host interface cards. Seventh-generation architecture delivers relentless performance, real reliability, multidimensional scalability, and unprecedented investment protection.
The IBM System Storage DS3000 and DS5000 series consists of the following storage systems:
DS3500 Express:
 – DS3512 Express Single Controller Storage System (1746A2S)
 – DS3512 Express Dual Controller Storage System (1746A2D)
 – DS3524 Express Single Controller Storage System (1746A4S)
 – DS3524 Express Dual Controller Storage System (1746A4D)
 – DS3524 Express DC Dual Controller Storage System (1746T4D)
DCS3700 (1818-80C)
DS5020 Express Disk System (1814-20A)
Targeted at growing midrange sites requiring reliability, efficiency, and performance value
DS5100 Disk System (1818-51A)
Targeted at cost-conscious midrange sites requiring high-end functionality and pay-as-you-grow scalability
DS5300 Disk System (1818-53A)
Targeted at environments with compute-intensive applications and large-scale virtualization/consolidation implementations
Figure 3-5 on page 38 shows the positioning of the products within the DS3000 and DS5000 Midrange series.
Figure 3-5 Product positioning within the Midrange DS3000 and DS5000 series
For more information about the positioning and the characteristics of each of the family members of the IBM Midrange System Storage, see the IBM System Storage DS5000 Series Hardware Guide, SG24-8023, and IBM System Storage DS5000 Series Implementation and Best Practices Guide, SG24-8024.
3.4 Storage subsystem considerations
This section presents several important application-specific considerations.
3.4.1 Segment size
The segment size that we discuss in the following section refers to the data partitions of your VMware installation. It is recommended that you separate your OS partitions from your data partitions. Base the segment size on the type and expected I/O size of the data. Store sequentially read data on logical drives with small segment sizes and with dynamic prefetch enabled so that blocks are read ahead dynamically. For the procedure to choose the appropriate disk segment size, see “Calculating optimal segment size” on page 39.
Oracle
Little Oracle I/O is truly sequential in nature, except for processing redo logs and archive logs. A full-table scan can cause Oracle to read from all over the disk drive; Oracle calls this type of read a scattered read. Oracle’s sequential read is used to access a single index entry or a single piece of data. Use small segment sizes for an Online Transaction Processing (OLTP) environment with little or no need for read-ahead data. Use larger segment sizes for a Decision Support System (DSS) environment where you perform full table scans through a data warehouse.
Remember three important things when considering block size:
Set the database block size lower than or equal to the disk drive segment size. If the segment size is set at 2 KB and the database block size is set at 4 KB, it takes two I/O operations to fill the block, which results in performance degradation.
Make sure that the segment size is an even multiple of the database block size. This practice prevents partial I/O operations from being needed to fill a block.
Set the parameter db_file_multiblock_read_count appropriately. Normally, you want to set the db_file_multiblock_read_count as shown:
segment size = db_file_multiblock_read_count * DB_BLOCK_SIZE
You can also set db_file_multiblock_read_count so that the result of the previous calculation is smaller than, but divides evenly into, the segment size. For example, if you have a segment size of 64 KB and a block size of 8 KB, you can set db_file_multiblock_read_count to 4, which equals a value of 32 KB and divides evenly into the 64 KB segment size.
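As a minimal illustration of this calculation, the following Python sketch derives db_file_multiblock_read_count from the segment size and the database block size. The function name and the 64 KB / 8 KB values are example choices, not recommendations for a specific workload.

# Derive Oracle's db_file_multiblock_read_count so that one multiblock read
# maps onto one full segment (an even fraction of the segment also works).
def multiblock_read_count(segment_size_kb, db_block_size_kb):
    if segment_size_kb % db_block_size_kb != 0:
        raise ValueError("segment size should be an even multiple of the block size")
    return segment_size_kb // db_block_size_kb

# A 64 KB segment with an 8 KB block size gives a count of 8 (one full segment
# per multiblock read); a count of 4 reads 32 KB, which still divides evenly
# into the 64 KB segment.
print(multiblock_read_count(64, 8))   # 8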
SQL Server
For SQL Server, the page size is fixed at 8 KB. SQL Server uses an extent size of 64 KB (eight 8-KB contiguous pages). For this reason, set the segment size to 64 KB. Read “Calculating optimal segment size” on page 39.
Exchange server
Set the segment size to 64 KB or multiples of 64. See “Calculating optimal segment size”.
Calculating optimal segment size
The IBM term segment size refers to the amount of data that is written to one disk drive in an array before writing to the next disk drive in the array. For example, in a RAID 5, 4+1 array with a segment size of 128 KB, the first 128 KB of the LUN storage capacity is written to the first disk drive and the next 128 KB to the second disk drive. For a RAID 1, 2+2 array, 128 KB of an I/O is written to each of the two data disk drives and to the mirrors. If the I/O size is larger than the number of disk drives times 128 KB, this pattern repeats until the entire I/O is completed.
For very large I/O requests, the optimal segment size for a RAID array is one that distributes a single host I/O across all data disk drives.
The formula for optimal segment size is:
LUN segment size = LUN stripe width ÷ number of data disk drives
For RAID 5, the number of data disk drives is equal to the number of disk drives in the array minus 1, for example:
RAID5, 4+1 with a 64 KB segment size => (5-1) * 64KB = 256 KB stripe width
For RAID 1, the number of data disk drives is equal to the number of disk drives divided by 2, for example:
RAID 10, 2+2 with a 64 KB segment size => (2) * 64 KB = 128 KB stripe width
For small I/O requests, the segment size must be large enough to minimize the number of segments (disk drives in the LUN) that must be accessed to satisfy the I/O request, that is, to minimize segment boundary crossings. For IOPS environments, set the segment size to 256 KB or larger so that the stripe width is at least as large as the median I/O size.
When using a logical drive manager to collect multiple storage system LUNs into a Logical Volume Manager (LVM) array or volume group (VG), the I/O stripe width is allocated across all of the segments of all of the data disk drives in all of the LUNs. The adjusted formula is shown:
LUN segment size = LVM I/O stripe width / (# of data disk drives/LUN * # of LUNs/VG)
To learn the terminology so that you can understand how data in each I/O is allocated to each LUN in a logical array, see the vendor documentation for the specific LVM.
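The following Python sketch ties the two formulas in this section together. It is illustrative only; the function names are ours, and the drive counts and stripe widths are the example values used above.

# Optimal segment size = stripe width / number of data disk drives.
# For RAID 5 the data drives are (total - 1); for RAID 1/10 they are total / 2.
def data_drives(raid_level, total_drives):
    if raid_level == "RAID5":
        return total_drives - 1
    if raid_level in ("RAID1", "RAID10"):
        return total_drives // 2
    raise ValueError("unsupported RAID level in this example")

def lun_segment_size_kb(stripe_width_kb, raid_level, total_drives, luns_per_vg=1):
    # With an LVM volume group, the host I/O stripe is spread across the data
    # drives of every LUN in the group, so the segment size shrinks accordingly.
    return stripe_width_kb / (data_drives(raid_level, total_drives) * luns_per_vg)

print(lun_segment_size_kb(256, "RAID5", 5))    # 64.0 KB for a 4+1 RAID 5 array
print(lun_segment_size_kb(128, "RAID10", 4))   # 64.0 KB for a 2+2 RAID 10 array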
 
Best practice: For most implementations, set the segment size of VMware data partitions to 256 KB.
3.4.2 Midrange Storage Systems cache features
On the DS3500, DCS3700, and DS5000 series, there are two cache features that are worth describing. These features are the permanent cache backup and the cache mirroring.
The permanent cache backup feature provides a cache hold-up and de-staging mechanism to save cache and processor memory to a permanent device. This feature replaces the reliance on batteries in older models to keep the cache alive for a period of time when power is interrupted.
With this feature, cache data is retained permanently in a power outage. This function is accomplished by using flash drives. The batteries power the controllers only until the data in the cache is written to the flash drives. When the storage subsystem is powered back up, the contents are reloaded to cache and flushed to the logical drives.
When you turn off the storage subsystem, it does not shut down immediately. The storage subsystem writes the contents of cache to the flash drives before powering off. Depending on the amount of cache, the storage subsystem might take up to several minutes to actually power off.
 
Note: When upgrading cache, dual inline memory modules (DIMMs) need to be upgraded together with flash drives.
The other feature used for cache protection is the dedicated write cache mirroring. When this feature is enabled, all cache is mirrored between controllers. In the case of a controller failure, write cache is not lost because the other controller mirrored the cache. When write cache mirroring is enabled, there is no impact to performance.
3.4.3 Enabling cache settings
Always enable read cache. Enabling read cache allows the controllers to process data from the cache if it was read before and thus the read is significantly faster. Data remains in the read cache until it is flushed.
Enable write cache to let the controllers acknowledge writes as soon as the data reaches the cache instead of waiting for the data to be written to the physical media. For other storage systems, a trade-off exists between data integrity and speed. The IBM DS3500, DCS3700, and DS5000 storage subsystems are designed to store data in both controller caches before the write is acknowledged. To protect data integrity, cache mirroring must be enabled to permit dual controller cache writes.
Enable write-cache mirroring to prevent the cache being lost in a controller failure.
Whether you need cache prefetch depends on the type of data that is stored on the logical drives and how that data is accessed. If the data is accessed randomly (by way of table spaces and indexes), disable prefetch. Disabling prefetch prevents the controllers from reading ahead segments of data that most likely will not be used, unless your logical drive segment size is smaller than the requested read size. If the data is accessed sequentially, cache prefetch might increase performance because the data can be staged in cache before it is read.
3.4.4 Aligning file system partitions
Align partitions to the stripe width. Calculate the stripe width with the following formula:
segment_size / block_size * num_drives
For example, using this formula, a 4+1 RAID 5 array with a 512 KB segment size and 512-byte blocks gives 512 KB / 512 bytes * 4 drives = 4096 blocks.
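To make this arithmetic concrete, the following Python sketch computes the stripe width in blocks and checks whether a partition starting offset falls on a stripe boundary. The function names and the starting offsets are example values only.

# Stripe width, expressed in blocks: segment_size / block_size * num_data_drives
def stripe_width_blocks(segment_size_bytes, block_size_bytes, num_data_drives):
    return segment_size_bytes // block_size_bytes * num_data_drives

def is_aligned(start_block, width_blocks):
    # A partition is aligned when its starting block is a multiple of the stripe width.
    return start_block % width_blocks == 0

width = stripe_width_blocks(512 * 1024, 512, 4)   # 4+1 RAID 5, 512 KB segment
print(width)                    # 4096 blocks
print(is_aligned(4096, width))  # True: the partition starts on a stripe boundary
print(is_aligned(63, width))    # False: the classic misaligned 63-block offset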
3.4.5 Premium features
Premium features, such as FlashCopy and VolumeCopy, are available for both the virtual drive and for the raw device mapping (RDM) device. For virtual drives, VMware has tools to provide these functions. For RDM devices, the IBM Midrange Storage Subsystem provides the following premium features:
VolumeCopy
FlashCopy
Enhanced Remote Mirroring
Storage Partitioning
3.4.6 Considering individual virtual machines
Before you can effectively design your array and logical drives, you must determine the primary goals of the configuration: performance, reliability, growth, manageability, or cost. Each goal has positives, negatives, and trade-offs. With the goals determined for your environment, follow the guidelines discussed in this chapter to implement them. To get the best performance from the IBM storage subsystem, you must know the I/O characteristics of the files to be placed on the storage system. After you know the I/O characteristics of the files, you can set up a correct array and logical drive to support these files.
Web servers
Web server storage workloads typically contain random small writes. RAID 5 provides good performance. It has the advantage of protecting the system from one drive loss and has a lower cost by using fewer disk drives.
Backup and file read applications
The IBM Midrange Storage Systems perform very well for a mixed workload. There are ample resources, such as IOPS and throughput, to support backups of virtual machines and not affect the other applications in a virtual environment. Addressing performance concerns for individual applications takes precedence over backup performance.
However, there are applications that read large files sequentially. If performance is important, consider using RAID 10. If cost is also a concern, RAID 5 protects from disk drive loss with the fewest disk drives.
Databases
Databases are classified as one of the following categories:
Frequently updated databases: If your database is frequently updated and if performance is a major concern, your best choice is RAID 10, even though RAID 10 is the most expensive because of the number of disk drives and expansion drawers. RAID 10 incurs the least I/O overhead per write and provides the highest performance from the IBM storage systems.
Low-to-medium updated databases: If your database is updated infrequently or if you must maximize your storage investment, choose RAID 5 for the database files. RAID 5 lets you create large storage logical drives with minimal redundancy of disk drives.
Remotely replicated environments: If you plan to remotely replicate your environment, carefully segment the database. Segment the data on smaller logical drives and selectively replicate these logical drives. Segmenting limits WAN traffic to only what is absolutely needed for database replication. However, if you use large logical drives in replication, initial establish times are larger and the amount of traffic through the WAN might increase, leading to a poor database performance. The IBM premium features, Enhanced Remote Mirroring (ERM), VolumeCopy, and FlashCopy, are useful for replicating remote environments.
3.4.7 Determining the best RAID level for logical drives and arrays
In general, RAID 5 works best for sequential large I/Os (> 256 KB), and RAID 5 or RAID 1 works best for small I/Os (< 32 KB). For I/O sizes in between, the RAID level can be dictated by other application characteristics. Table 3-1 on page 43 shows the I/O size and optimal RAID level.
Table 3-1 I/O size and optimal RAID level
I/O size                        RAID level
Sequential, large (>256 KB)     RAID 5
Small (<32 KB)                  RAID 5 or RAID 1
Between 32 KB and 256 KB        RAID level does not depend on I/O size
RAID 5 and RAID 1 have similar characteristics for read environments. For sequential writes, RAID 5 typically has an advantage over RAID 1 because RAID 1 must duplicate each host write request to its mirror. This duplication of data typically puts a strain on the drive-side channels of the RAID hardware. RAID 5 is challenged most by random writes, which can generate multiple disk drive I/Os for each host write. Different RAID levels can be tested by using the DS Storage Manager Dynamic RAID Migration feature, which allows the RAID level of an array to be changed while maintaining continuous access to data.
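The following Python helper restates Table 3-1 as code. It is a sketch only; the thresholds come directly from the table, and the function name is ours.

# Suggest a RAID level from the dominant I/O size, following Table 3-1.
def suggest_raid_level(io_size_kb, sequential=False):
    if sequential and io_size_kb > 256:
        return "RAID 5"
    if io_size_kb < 32:
        return "RAID 5 or RAID 1"
    # Between 32 KB and 256 KB, the choice is driven by other application
    # characteristics, not by I/O size.
    return "RAID level does not depend on I/O size"

print(suggest_raid_level(512, sequential=True))   # RAID 5
print(suggest_raid_level(8))                      # RAID 5 or RAID 1
print(suggest_raid_level(128))                    # RAID level does not depend on I/O size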
Table 3-2 shows the RAID levels that are most appropriate for specific file types.
Table 3-2 Best RAID level for file type
File type                  RAID level         Comments
Oracle Redo logs           RAID 10            Multiplex with Oracle
Oracle Control files       RAID 10            Multiplex with Oracle
Oracle Temp datafiles      RAID 10, RAID 5    Performance first; drop and re-create on disk drive failure
Oracle Archive logs        RAID 10, RAID 5    Determined by performance and cost requirements
Oracle Undo/Rollback       RAID 10, RAID 5    Determined by performance and cost requirements
Oracle datafiles           RAID 10, RAID 5    Determined by performance and cost requirements
Oracle executables         RAID 5
Oracle Export files        RAID 10, RAID 5    Determined by performance and cost requirements
Oracle Backup staging      RAID 10, RAID 5    Determined by performance and cost requirements
Exchange database          RAID 10, RAID 5    Determined by performance and cost requirements
Exchange log               RAID 10, RAID 5    Determined by performance and cost requirements
SQL Server log file        RAID 10, RAID 5    Determined by performance and cost requirements
SQL Server data files      RAID 10, RAID 5    Determined by performance and cost requirements
SQL Server Tempdb file     RAID 10, RAID 5    Determined by performance and cost requirements
Use RAID 0 arrays only for high-traffic data that does not need any redundancy protection for device failures. RAID 0 is the least used RAID format but provides for high-speed I/O without the additional redundant disk drives for protection.
Use RAID 1 for the best performance and to provide data protection by mirroring each physical disk drive. Create RAID 1 arrays with the most disk drives possible (30 maximum) to achieve the highest performance.
Use RAID 5 to create arrays with either 4+1 disk drives or 8+1 disk drives to provide the best performance and to reduce RAID overhead. RAID 5 offers good read performance at a reduced cost of physical disk drives compared to a RAID 1 array.
 
Note: If protection for a two drive failure is needed, use RAID 6. It has the same performance as RAID 5 but uses an extra drive for additional protection.
Use RAID 10 (RAID 1+0) to combine the best features of the data mirroring of RAID 1 with the data striping of RAID 0. RAID 10 provides fault tolerance and better performance compared to other RAID options. A RAID 10 array can sustain multiple disk drive failures as long as the failed drives do not include both members of a single mirrored pair.
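To illustrate the capacity trade-offs behind these recommendations, the following Python sketch applies the standard RAID capacity arithmetic. It is not a DS-specific sizing tool, and the 600 GB drive size is an example value.

# Usable capacity for an array of identical drives, by RAID level.
def usable_capacity_gb(raid_level, drives, drive_size_gb):
    if raid_level == "RAID0":
        data_drives = drives            # striping only, no redundancy
    elif raid_level in ("RAID1", "RAID10"):
        data_drives = drives // 2       # every drive is mirrored
    elif raid_level == "RAID5":
        data_drives = drives - 1        # one drive of capacity used for parity
    elif raid_level == "RAID6":
        data_drives = drives - 2        # two drives of capacity used for parity
    else:
        raise ValueError("unsupported RAID level")
    return data_drives * drive_size_gb

print(usable_capacity_gb("RAID5", 5, 600))    # 2400 GB from 5 drives (4+1)
print(usable_capacity_gb("RAID6", 6, 600))    # 2400 GB from 6 drives
print(usable_capacity_gb("RAID10", 8, 600))   # 2400 GB from 8 drives (4+4)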
3.4.8 Server consolidation considerations
There is a misconception that simply adding up the amount of storage required for the servers that will be attached to a SAN is sufficient to size the SAN. Understanding performance and capacity requirements is always important, but it is even more relevant in a VMware environment because server consolidation is also part of the equation. Figure 3-6 demonstrates a consolidation of four physical servers into a single VMware ESXi Server to explain the considerations.
Figure 3-6 Unrealistic storage consolidation
In Figure 3-6, an attempt is made to take the capacity requirement that is calculated from the four existing servers and use that as a guide to size a single RAID 5 array to host all four virtual environments.
It is unlikely that assigning a single RAID 5 LUN to the vSphere Server host in this way supplies enough disk performance to service the virtual machines adequately.
 
Note: While the following guidelines help to increase the performance of a VMware ESXi Server environment, it is important to realize that the overhead of the VMware ESXi Server virtualization layer still exists. In cases where 100% of the native or non-virtualized performance is required, an evaluation of the practicality of a VMware environment must occur.
An assessment of the performance of the individual environments can show that there is room for consolidation with smaller applications. The larger applications (mail or DB) require that similar disk configurations are given to them in a SAN environment as they had in the previous physical environment.
Figure 3-7 illustrates that a certain amount of storage consolidation might indeed be possible without ignoring the normal disk planning and configuration rules that apply for performance reasons. Servers with a small disk I/O requirement can be candidates for consolidation onto fewer LUNs; however, servers that have I/O-intensive applications require disk configurations that are similar to those of their physical counterparts. It might not be possible to make precise decisions about how best to configure the RAID array types, and about which virtual machine disks must be hosted on them, until after the implementation. In an IBM Midrange Storage Systems environment, it is safe to change several of these options later through the advanced dynamic functions that are available on the storage subsystems.
Figure 3-7 Potential realistic storage consolidation
These changes might include the following actions:
Adding more disks (capacity) to an array by using the Dynamic Capacity Expansion function (before creating VMFS datastores on the LUN), and joining two VMFS volumes together in a volume set
Changing the array type from RAID 5 to RAID 10 by using the Dynamic RAID-Level Migration function
Changing the segment sizing to better match the application by using the Dynamic Segment Sizing function
 
Note: Dynamic Volume Expansion is not supported for VMFS-formatted LUNs.
3.4.9 VMware ESX Server storage configurations
There are many ways to implement VMware ESXi Servers that are attached to IBM Midrange Storage Systems. Variants range from the number of HBAs, switches, and paths that are available for a VMware ESXi Server, to multiple VMware ESXi Servers sharing access to logical drives on the IBM Midrange Storage Systems.
Configuring according to a common base of settings allows for growth from one configuration to another with minimal impact. It is therefore recommended that you review all of the configurations with your growth plan in mind (as much as possible) so that best practices can be applied from the initial installation and remain valid as the final configuration develops over time.
This principle correlates with the installation and configuration details that we give throughout this document. Compile the settings that need to be made into a common set for all configurations with additional minimal changes listed for specific configurations as required.
At the time of writing, DS Storage Manager software is not available for VMware ESX Server operating systems. Therefore, to manage DS5000 Storage Subsystems with your VMware ESXi Server host, you must install the Storage Manager client software (SMclient) on a Microsoft Windows or Linux management workstation, which can be the same workstation that you use for the browser-based VMware ESXi Server Management interface.
VMware ESXi Server restrictions
Certain VMware ESXi server restrictions exist for storage.
SAN and connectivity restrictions
In this section, we discuss SAN and connectivity restrictions for storage:
VMware ESXi Server hosts support out-of-band managed DS5000 configurations only. In-band (host-agent) managed configurations are not supported.
VMware ESXi Server hosts can support multiple host bus adapters (HBAs) and DS5000 devices. However, there is a restriction on the number of HBAs that can be connected to a single DS5000 Storage Subsystem. You can configure up to two HBAs per partition and up to two partitions per DS5000 Storage Subsystem. Additional HBAs can be added for additional DS5000 Storage Subsystems and other SAN devices, up to the limits of your specific subsystem platform.
When you use two HBAs in one VMware ESXi Server, LUN numbers must be the same for each HBA that is attached to the DS5000 Storage Subsystem.
Single HBA configurations are allowed, but each single HBA configuration requires that both controllers in the DS5000 are connected to the HBA through a switch, and both controllers must be within the same SAN zone as the HBA.
 
Important: A single HBA configuration can lead to the loss of access to data in a path failure.
Single-switch configurations are allowed, but each HBA and DS5000 controller combination must be in a separate SAN zone.
Other storage devices, such as tape devices or other disk storage, must be connected through separate HBAs and SAN zones.
Partitioning restrictions
In this section, we discuss partitioning restrictions for storage:
The maximum number of partitions per VMware ESXi Server host, per DS5000 Storage Subsystem, is two.
All logical drives that are configured for VMware ESXi Server must be mapped to a VMware ESXi Server host group.
 
Note: Set the host type of all your VMware ESXi Servers to VMware. If you are using the default host group, ensure that the default host type is VMware.
Assign LUNs to the VMware ESXi Server starting with LUN number 0.
Do not map an access (UTM) LUN (LUN ID 31) to any of the VMware ESXi Server hosts or host groups. Access (UTM) LUNs are used only with in-band managed DS5000 configurations, which VMware ESXi Server does not support at this time.
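As a simple illustration of the LUN numbering rules above, the following Python sketch checks a planned mapping before it is applied. The function name and the LUN lists are hypothetical and are used only as an example.

# Check a planned ESXi host-group mapping against the partitioning rules:
# LUN numbering starts at 0, and the access (UTM) LUN 31 is never mapped
# to an ESXi host or host group.
def check_esxi_lun_mapping(lun_numbers):
    problems = []
    if 31 in lun_numbers:
        problems.append("LUN 31 (access/UTM LUN) must not be mapped to ESXi hosts")
    if lun_numbers and min(lun_numbers) != 0:
        problems.append("LUN numbering for the ESXi host should start at LUN 0")
    return problems

print(check_esxi_lun_mapping([0, 1, 2]))    # [] - the mapping follows the rules
print(check_esxi_lun_mapping([1, 2, 31]))   # both rules are violated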
Failover restrictions
In this section, we discuss failover restrictions for storage:
You must use the VMware ESXi Server failover driver for multipath configurations. Other failover drivers, such as RDAC, are not supported in VMware ESXi Server configurations.
The default failover policy for all DS5000 Storage Subsystems is now most recently used (MRU).
Use the VMware host type in VMware ESXi Server configurations (2.0 and higher).
The VMware host type automatically disables AVT/ADT.
Other restrictions
In this section, we discuss other restrictions for storage:
Dynamic Volume Expansion is not supported for VMFS-formatted LUNs.
Recommendation: Do not boot your system from a SATA device.
Cross-connect configuration for VMware vSphere ESXi
A cross-connect storage area network (SAN) configuration is required when VMware vSphere ESXi hosts are connected to IBM Midrange Storage Systems. Each host bus adapter (HBA) in a vSphere ESXi host must have a path to each of the controllers in the DS storage subsystem. Figure 3-8 on page 48 shows the cross connections for VMware server configurations.
Figure 3-8 Cross-connect configuration for vSphere ESXi connections
A single path to both controllers can lead to either unbalanced logical drive ownership or thrashing under certain conditions. The ownership of all logical drives can be forced to one of the controllers. Depending on which path the VMware ESXi Server finds first, the single active controller on that path can be forced to assume ownership of all LUNs, even those for which that controller is not the preferred owner. This process limits the storage performance for the VMware ESXi Server.
In configurations that involve multiple VMware ESXi Servers that are attached to the IBM DS Midrange Storage Systems, the behavior is exacerbated. When one VMware ESXi Server performs LUN discovery, it can lead to thrashing or bouncing logical drive ownership between the controllers.
To avoid these problems, VMware advises that you set up four paths between the server and the storage system. At least two vSphere ESXi host HBA ports must be used and both HBA ports must see both controllers.
A loss of one of the paths can lead to less than optimal performance because logical drives owned by the controller on the lost path are transferred to the other controller with the surviving path.
If performance is also a concern, consider adding additional connections from one of the storage system’s available host ports to the switch.
To preserve logical drive ownership, each controller is cross-connected to the other switch. The disadvantage of this type of switching is that the additional storage system host ports are consumed for the zone and cannot be used to address other performance concerns. If you want to prevent logical drive ownership transfers, consider using additional controller-to-switch connections in multiple zones.
The previous recommendations prevent thrashing but do not sufficiently address performance concerns. Only one of the paths can be active, because the first HBA port that the vSphere ESXi host configured is used to communicate with both controllers. To maximize performance, you must spread the load between more paths.
3.4.10 Configurations by function
This section discusses different configurations that are available when using multiple vSphere hosts.
A vSphere VMFS volume can be set as one of these modes:
A VMFS volume that is visible to only one vSphere ESXi host. We call this mode independent VMFS mode. When you have multiple vSphere ESXi hosts, independent VMFS volumes can be set up through LUN masking (partitioning). This type of configuration is rarely needed and is not recommended. It might be implemented when there is a requirement to separate the vSphere hosts’ virtual machines, for example, when two companies or departments share a SAN infrastructure but need to retain their own servers and applications.
A VMFS volume that is visible to multiple vSphere ESXi hosts. This mode is the default and is called public VMFS.
A VMFS volume that is visible to multiple vSphere ESXi hosts and stores virtual disks (.vmdk) for split virtual clustering. This VMFS mode is called shared VMFS.
Public VMFS might be implemented for the following reasons:
vSphere high availability (HA) using two (or more) vSphere ESXi hosts with shared LUNs, allowing one vSphere ESXi host to restart the workload of the other vSphere ESXi host if needed. With public VMFS, virtual machines can be run on any host, ensuring a level of application availability in a hardware failure on the vSphere hosts.
This is possible, as multiple vSphere Servers have access to the same VMFS volumes, and a virtual machine can be started from potentially any vSphere Server host (although not simultaneously). It is important to understand that this approach does not protect against .vmdk file corruption or failures in the storage subsystem unless the .vmdk file is somehow replicated elsewhere.
vSphere vMotion allows a running virtual machine to be migrated from one vSphere host to another without being taken offline. In scenarios where a vSphere Server needs to be taken down for maintenance, the virtual machines can be moved without being shut down, even while they continue to receive workload requests.
vSphere Storage vMotion allows you to relocate virtual machine disk files between and across shared storage locations, maintaining continuous service availability.
Clustering is another method to increase the availability of the environment, and it is supported by VMware vSphere only with Microsoft Clustering Services (MSCS) on Windows guests. Clustering can not only transfer the workload with minimal interruption during maintenance, but it can also achieve near-continuous application availability in an OS crash or hardware failure, depending on which of the following configurations is implemented:
 – Local virtual machine cluster increases availability of the OS and application. Many server failures relate to software failure; therefore, implementing this configuration can help reduce software downtime. However, this configuration does not increase hardware availability, which might need to be considered when designing the solution.
 – Split virtual machine cluster increases availability of the OS, application, and vSphere ESXi host hardware by splitting the cluster nodes across two vSphere ESXi hosts. In an OS or vSphere ESXi host hardware failure, the application can fail over to the surviving vSphere host/virtual machine cluster node.
 – Physical/virtual machine (hybrid) cluster increases availability of the OS, application, and server hardware where one node is a dedicated physical server (non-ESX), and the other node is a virtual machine. Implementations of this kind are likely to occur where the active node of the cluster requires the power of a dedicated physical server (that is, four or more processors, or more than 3.6-GB memory) but where the failover node can be less powerful, yet remain for availability purposes.
The physical/virtual machine (hybrid) cluster might also be implemented where a number of dedicated physical servers are used as active nodes of multiple clusters failing over to their passive cluster nodes that all exist as virtual machines on a single vSphere Server. Because it is unlikely that all active nodes fail simultaneously, the vSphere ESXi host might only need to take up the workload of one cluster node at a time, therefore, reducing the expense of replicating multiple cluster nodes on dedicated physical servers. However, the physical server (that is, not the vSphere Server) can only have a non-redundant SAN connection (a single HBA and a single storage controller); therefore, we do not actively advocate the use of this solution.
Configuration examples
The examples in this section show the configuration options that are available when multiple vSphere hosts attach to shared storage partitions.
High availability
The configuration in Figure 3-9 shows multiple vSphere Servers connected to the same IBM Midrange Storage Subsystem with a logical drive (LUN) shared between the servers. (This configuration can have more than just two vSphere ESXi hosts.)
Figure 3-9 Multiple servers sharing a storage partition configuration sample
vSphere vMotion
The configuration for vSphere vMotion functions the same as the configuration in the preceding high availability (HA) section.
Clustering (guest OS level)
 
Note: Guest Clustering is only supported by VMware using Microsoft Clustering Services (MSCS) on Windows guests, and only in a two node per cluster configuration.
There are many ways to implement MSCS with VMware vSphere ESXi, depending upon the level of requirements for high availability and whether physical servers are included in the mix.
In the following sections, we review the ways that MSCS might be implemented.
A local virtual machine cluster configuration is shown in Figure 3-10. VMFS volumes are used with the access mode set to public for all of the virtual machine disks. This design requires that both virtual machines must run on top of the same ESXi physical host. Therefore, you lose the hardware redundancy provided by VMware HA functions described previously.
Figure 3-10 Local virtual machine cluster
A split virtual machine cluster configuration is shown in Figure 3-11 on page 52. VMFS volumes are used with the access mode set to public for all virtual machine .vmdk files (OS boot disks). Raw volumes are used for the cluster shares. The cluster shares can be .vmdk files on shared VMFS volumes, but limitations make using raw volumes easier to implement. There are other caveats about VMware function availability when using MSCS clustering functions with virtual machines.
Figure 3-11 Split virtual machine cluster
For more information about vSphere ESXi and Microsoft Cluster Service (MSCS) implementation and support on ESX/ESXi, see VMware KB Article 1004617.
3.4.11 Zoning
Zoning for a VMware vSphere ESXi Server environment is essentially the same as a non-ESXi environment. It is considered a good practice to separate the traffic for stability and management reasons. Zoning follows your standard practice where, in reality, it is likely that multiple servers with different architectures (and potentially different cable configurations) are attached to the same IBM Midrange Storage Subsystem. In this case, additional hosts are added to the appropriate existing zones, or separate zones are created for each host.
A cross-connect SAN configuration is required when vSphere ESXi hosts are connected to IBM Midrange Storage Systems. Each HBA in a vSphere ESXi host must have a path to each of the controllers in the DS Storage Subsystem.
Figure 3-12 on page 53 shows a sample configuration with multiple switches and multiple zones.
Figure 3-12 Multiple switches with multiple zones
For more information about zoning the SAN switches, see Implementing an IBM b-type SAN with 8 Gbps Directors and Switches, SG24-6116, or Implementing an IBM/Cisco SAN, SG24-7545.
 