Designing an N series solution
This chapter addresses the issues to consider when sizing an IBM System Storage N series storage system for your environment, including configuration limits. A complete explanation is beyond the scope of this book, so only high-level planning considerations are presented.
14.1 Primary issues that affect planning
You need to answer the following questions during the planning process:
Which model IBM System Storage N series to use
How much storage is required on the IBM System Storage N series
Which optional features are wanted
What your future expansion requirements are
To begin the planning process, use the following IBM solution planning tools:
14.1.1 IBM Capacity Magic
This tool is used to calculate physical and effective storage capacity. It supports the IBM DS6000™, DS8000, IBM V7000, and N series models.
This tool is available to the following users:
IBM staff
IBM Business Partners
14.1.2 IBM Disk Magic
This tool is used to estimate disk subsystem performance. It supports the IBM XIV, DS8000, DS6000, DS5000, DS4000®, SAN Volume Controller, V7000, V7000U, and SAN-attached N Series.
This tool is available to the following users:
IBM staff
IBM Business Partners
 
Restriction: The Disk Magic and Capacity Magic tools are licensed for use by IBM staff and IBM Business Partners only.
14.2 Performance and throughput
The performance required from the storage subsystem is driven by the number of client systems that rely on the IBM System Storage N series, and the applications running on those systems. Keep in mind that performance involves a balance of all of the following factors:
Performance of a particular IBM System Storage N series model
Number of disks used for a particular workload
Type of disks used
How close to full capacity the disks are running
Number of network interfaces in use
Protocols used for storage access
Workload mix (reads versus writes versus lookups)
 – Protocol choice
 – Percentage mix of read and write operations
 – Percentage mix of random and sequential operations
 – I/O sizes
 – Working set sizes for random I/O
 – Latency requirements
 – Background tasks running on the storage system (for example, SnapMirror)
 
Tip: Always size a storage system to have reserve capacity beyond what is expected to be its normal workload.
14.2.1 Capacity requirements
A key measurement of a storage system is the amount of storage that it provides. Vendors and installers of storage systems generally deal with raw storage capacities. Users, however, are generally only concerned with available capacity. Ensuring that the gap is bridged between raw capacity and usable capacity minimizes surprises both at installation time and in the future.
Particular care is required when specifying storage capacity, because disk vendors, array vendors, and client workstations often use different nomenclature to describe the same capacity. Storage vendors usually specify disk capacity in “decimal” units, whereas desktop operating systems usually work in “binary” units. These units are often used in confusingly similar or incorrect ways.
Although this might seem to be a subtle difference, it can rapidly compound in large networks and cause the storage to be significantly over-provisioned or under-provisioned. In situations where capacity must be accurately provisioned, this discrepancy can cause an outage or even data loss. For example, if a client OS supports a maximum LUN size of 2 TB (decimal), it might fail if presented with a LUN of 2 TB (binary).
To add to the confusion, these suffixes have traditionally been applied in different ways across different technologies. For example, network bandwidth is always decimal (100 Mbps = 100 x 10^6 bits). Memory is always binary, but is not usually shown as “GiB” (4 GB = 4 x 2^30 bytes).
Table 14-1 shows a comparison of the two measurements.
Table 14-1 Decimal versus binary measurement
Name (ISO)   Suffix (ISO)   Value (bytes)   Approx. Difference   Value (bytes)   Suffix (IEC)   Name (IEC)
kilobyte     kB             10^3            2%                   2^10            KiB            kibibyte
megabyte     MB             10^6            5%                   2^20            MiB            mebibyte
gigabyte     GB             10^9            7%                   2^30            GiB            gibibyte
terabyte     TB             10^12           9%                   2^40            TiB            tebibyte
petabyte     PB             10^15           11%                  2^50            PiB            pebibyte
Some systems use a third option, where they define 1 GB as 1000 x 1024 x 1024 bytes.
This conversion between binary and decimal units accounts for most of the capacity that seems to be "lost" when calculating the usable capacity of an N series design. The two methods describe the same capacity, a bit like measuring the same distance in kilometers or miles, so nothing is really lost unless the wrong suffix is used.
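As an informal illustration only (not part of any IBM planning tool), the following Python sketch converts a vendor-stated decimal capacity to the binary capacity that most operating systems report:

def decimal_to_binary_gb(decimal_gb):
    """Convert a decimal capacity (GB, 10^9 bytes) to binary GB (GiB, 2^30 bytes)."""
    total_bytes = decimal_gb * 10**9
    return total_bytes / 2**30

# A "2 TB" (2000 GB decimal) disk as reported by a binary-based operating system:
print(int(decimal_to_binary_gb(2000)))   # 1862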
For more information, see the following website:
 
Remember: This document uses decimal values exclusively, so 1 MB = 10^6 bytes.
Raw capacity
Raw capacity is determined by taking the number of disks connected and multiplying by their capacity. For example, 24 disks (the maximum in the IBM System Storage N series disk shelves) times 2 TB per drive is a raw capacity of approximately 48,000 GB, or 48 TB.
Usable capacity
Usable capacity is determined by factoring out the portion of the raw capacity that goes to support the infrastructure of the storage system. This capacity includes space used for operating system information, disk drive formatting, file system formatting, RAID protection, spare disk allocation, mirroring, and the Snapshot protection mechanism.
The following example shows where the storage goes in the example 24 x 2 TB drive system. Capacity is typically consumed in the following areas:
Disk ownership: In an N series dual controller (active/active) cluster, the disks are assigned to one controller or the other.
In the example 24 disk system, the disks are split evenly between the two controllers (12 disks each).
Spare disks: It is good practice to allocate spare disk drives to every system. These drives are used if a disk drive fails so that the data on the failed drive can automatically be rebuilt without any operator intervention or downtime.
The minimum acceptable practice would be to allocate one spare drive, per drive type, per controller head. In the example, that would be two disks because it is a two-node cluster.
 
RAID: When a drive fails, it is the RAID information that allows the lost data to be recovered.
 – RAID-4: Protects against a single disk failure in any RAID group, and requires that one disk is reserved for RAID parity information (not user data).
Because disk capacities have increased greatly over time, with a corresponding increase in the risk of an error during the RAID rebuild, do not use RAID-4 for production use.
The remaining 11 drives (per controller), divided into 2 x RAID-4 groups, require two disks to be reserved for RAID-4 parity, per controller.
 – RAID-DP: Protects against a double disk failure in any RAID group, and requires that two disks be reserved for RAID parity information (not user data).
With the IBM System Storage N series, the maximum protection against loss is provided by using the RAID-DP facility. RAID-DP has many thousands of times better availability than traditional RAID-4 (or RAID-5), often for little or no additional capacity.
The remaining 11 drives (per controller), allocated to 1 x RAID-DP group, require two disks to be reserved for RAID-DP parity, per controller.
 
The RAID groups are combined to create storage aggregates that then have volumes (also called file systems) or LUNs allocated on them.
Normal practice would be to treat the nine remaining disks (per controller) as data disks, thus creating a single large aggregate on each controller.
All 24 available disks are now allocated:
Spare disk drive: 2 (1 per controller)
RAID parity disks: 4 (2 per controller)
Data disks: 18 (9 per controller)
About 25% of the raw capacity is used by hardware protection. This amount varies depending on the ratio of data disks to protection disks. The remaining usable capacity becomes less deterministic from this point because of ever increasing numbers of variables, but a few firm guidelines are still available.
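As a rough restatement of this arithmetic (using only the assumed values from the example above), the following Python sketch tallies the disk roles and the resulting protection overhead:

# Example allocation for a two-node cluster with 24 x 2 TB drives,
# RAID-DP and one spare per controller (values assumed from the text).
disks_total = 24
controllers = 2
disks_per_controller = disks_total // controllers             # 12

spares_per_controller = 1
parity_per_controller = 2                                      # RAID-DP
data_per_controller = (disks_per_controller
                       - spares_per_controller
                       - parity_per_controller)                # 9

protection_disks = controllers * (spares_per_controller + parity_per_controller)
print("Data disks:", controllers * data_per_controller)        # 18
print("Protection overhead: {:.0%}".format(protection_disks / disks_total))  # 25%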
Right-sizing
A commonly misunderstood capacity overhead is the one imposed by the right-sizing process. This overhead is caused by three main factors:
Block leveling
 – Disks from different batches (or vendors) can contain a slightly different number of addressable blocks. Therefore, the N series controller assigns a common maximum capacity across all drives of the same basic type. For example, this process makes all “1 TB” disks exactly equal.
 – Block leveling imposes a negligible capacity overhead because disks of the same type are already similar.
Decimal to binary conversion
 – Because disk vendors measure capacity in decimal units and array vendors usually work in binary units, the stated usable capacity differs.
 – However, no capacity is really lost because both measurements refer to the same number of bytes. For example, 1000 GB decimal = 1,000,000,000,000 bytes = 931 GB binary.
Checksums for data integrity
 – Fibre Channel (FC) disks natively use 520 byte sectors, of which only 512 bytes are used to store user data. The remaining 8 bytes per sector are used to store a checksum value. This imposes a minimal capacity overhead.
 – SATA disks natively use 512 byte sectors, all of which are used to store user data. Therefore, one sector in every nine is reserved to store the checksum value (an 8/9 scheme). This imposes a higher capacity overhead than for FC disks.
Table 14-2 Right-sized disk capacities
Disk Type          Capacity (decimal GB)   Capacity (binary GB)   Checksum Type                        Right-sized Cap. (binary GB)
FC                 72                      68                     512/520 Block (approximately 2.4%)   66
FC                 144                     136                    512/520 Block (approximately 2.4%)   132
FC                 300                     272                    512/520 Block (approximately 2.4%)   265
FC                 600                     -                      512/520 Block (approximately 2.4%)   -
SATA (or NL-SAS)   500                     465                    8/9 512 Block (approximately 11.1%)  413
SATA (or NL-SAS)   750                     698                    8/9 512 Block (approximately 11.1%)  620
SATA (or NL-SAS)   1000                    931                    8/9 512 Block (approximately 11.1%)  827
SATA (or NL-SAS)   2000                    1862                   8/9 512 Block (approximately 11.1%)  1655
SATA (or NL-SAS)   3000                    2794                   8/9 512 Block (approximately 11.1%)  2483
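The following Python sketch gives a rough estimate of right-sized capacity using the approximate checksum overheads quoted in Table 14-2. It is an approximation only; the actual right-sized values are fixed per drive model by Data ONTAP:

def right_sized_gb(decimal_gb, checksum_overhead):
    """Estimate right-sized capacity in binary GB.
    checksum_overhead: about 0.024 for FC (512/520 blocks),
    about 0.111 for SATA (8/9 scheme), as quoted in Table 14-2."""
    binary_gb = decimal_gb * 10**9 / 2**30        # decimal-to-binary conversion
    return binary_gb * (1 - checksum_overhead)    # subtract the checksum reservation

print(int(right_sized_gb(2000, 0.111)))   # approximately 1655, matching the SATA row above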
Effect of the aggregate
When the disks are added to an aggregate, they are automatically assigned to RAID groups. Although this process can be tuned manually, there is no separate step to create RAID groups within the N series platform.
The aggregate might impose some capacity overhead, depending on the DOT version:
DOT 8.1
 – In the latest versions of Data ONTAP, the default aggregate snapshot reserve is 0%
 – Do not change this setting unless you are using a MetroCluster or SyncMirror configuration. In those cases, change it to 5%
DOT 7.x
 – In earlier versions of Data ONTAP, the aggregate had a default aggregate snapshot reserve of 5%. However, the modern administration tools (such as NSM) use a default of 0%
 – The default was typically only used in a MetroCluster or SyncMirror configuration. In all other cases, it can safely be changed to 0%
Effect of the WAFL file system
Another factor that affects capacity is imposed by the file system. The Write Anywhere File Layout (WAFL) file system used by the IBM System Storage N series has less effect than many file systems, but the effect still exists. Generally, WAFL reserves space equal to 10% of the formatted capacity of a drive. This reserve is used to provide consistent performance as the file system fills up, because the reserved space increases the probability of the system locating contiguous blocks on disk.
As a result, the example 2000 GB (decimal) disk drives are down to only a little under 1500 GB (binary) before any user data is put on them. If you take the nine data drives per controller and allocate them to a single large volume, the resulting capacity is approximately 13,400 GB (binary) (Figure 14-1).
Figure 14-1 Example of raw (decimal) to usable (binary) capacity
The example in Figure 14-1 is for a small system. The ratio of usable to raw capacity varies depending on factors such as RAID group size, disk type, and space efficiency features that can be applied later. Examples of these features include thin provisioning, deduplication, compression, and Snapshot backup.
Effect of Snapshot protection
Finally, consider the effect of Snapshot protection on capacity. Snapshot is a built-in capability that keeps space free until it is actually used. However, using Snapshot affects the apparent usable capacity of the storage system. It is common to run a storage system with 20% of space reserved for Snapshot use. To the user, this space seems to be unavailable. The amount allocated for this purpose can be easily adjusted when necessary to a lower or higher value.
Running with this 20% setting further reduces the 13,400 GB usable storage to approximately 10,700 GB (binary). Whether you consider the snapshot reserve as being overhead or just part of the usable capacity depends on your requirements.
To return to reconciling usable storage to raw storage, this example suggests that either 65% or 55% of raw capacity is available for storing user data. The percentage depends on how you classify the snapshot reserve. In general, larger environments tend to result in a higher ratio of raw to usable capacity.
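To tie the preceding steps together, the following Python sketch walks the example 24 x 2 TB system from right-sized data disks down to snapshot-adjusted usable capacity. The overhead percentages are the rounded values used in this chapter, not exact Data ONTAP figures:

# Approximate capacity walkthrough for the example two-node, 24 x 2 TB system.
data_disks_per_controller = 9
right_sized_gb = 1655          # 2 TB SATA drive after right-sizing (Table 14-2)
wafl_reserve = 0.10            # WAFL space reservation
snap_reserve = 0.20            # typical volume snapshot reserve

usable_per_controller = data_disks_per_controller * right_sized_gb * (1 - wafl_reserve)
after_snap = usable_per_controller * (1 - snap_reserve)
print("Usable per controller:  {:,.0f} GB".format(usable_per_controller))   # ~13,400 GB
print("After snapshot reserve: {:,.0f} GB".format(after_snap))              # ~10,700 GB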
 
Attention: When introducing the N series gateway in a pre-existing environment, note that the final usable capacity is different from that available on the external disk system before being virtualized.
14.2.2 Other effects of Snapshot
It is important to understand the potential effect of creating and retaining Snapshots on both the N series controller and any associated servers and applications. The Snapshots also need to be coordinated with the attached servers and applications to ensure data integrity.
The effect of Snapshots is determined by these factors:
N series controller:
 – Negligible effect on the performance of the controller: The N series snapshots use a redirect-on-write design. This design avoids most of the performance effect normally associated with Snapshot creation and retention (as seen in traditional copy-on-write snapshots on other platforms).
 – Incremental capacity is required to retain any changes: Snapshot technology optimizes storage because only changed blocks are retained. For file access, the change rate is typically in the 1–5% range. For database applications, it might be similar. However, in some cases it might be as high as 100%.
Server (SAN-attached):
 – Minor effect on the performance of the server when the Snapshot is created (to ensure file system and LUN consistency).
 – Negligible ongoing effect on performance to retain the Snapshots
Application (SAN or NAS attached):
 – Minor effect on the performance of the application when the snapshot is created (to ensure data consistency). This effect depends on the snapshot frequency. Once per day, or multiple times per day might be acceptable, but more frequent Snapshots can have an unacceptable effect on application performance.
 – Negligible ongoing effect on performance to retain the Snapshots
Workstation (NAS attached):
 – No effect on the performance of the workstation. Frequent Snapshots are possible because the NAS file system consistency is managed by the N series controller.
 – Negligible ongoing effect on performance to retain the Snapshots
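As a rough way to reason about the incremental capacity needed to retain Snapshots, the following Python sketch multiplies the active data size by a daily change rate and a retention period. The input values are hypothetical, and the simple linear model ignores how changes overlap between Snapshots:

def snapshot_space_gb(active_data_gb, daily_change_rate, retention_days):
    """Simple linear estimate of space consumed by retained daily snapshots.
    Actual usage depends on how changed blocks overlap across snapshots."""
    return active_data_gb * daily_change_rate * retention_days

# 5,000 GB of file data, 2% daily change rate, 14 daily snapshots retained:
print("{:,.0f} GB".format(snapshot_space_gb(5000, 0.02, 14)))   # 1,400 GB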
14.2.3 Capacity overhead versus performance
There is considerable commercial pressure to make efficient use of the physical storage media. However, there are also times when using more disk spindles is more efficient.
Consider an example where 100 TB is provisioned on two different arrays:
100% raw-to-usable efficiency requires 100 x 1 TB disks, with each disk supporting perhaps 80 IOPS, for a total of 8000 physical IOPS.
50% raw-to-usable efficiency requires 200 x 1 TB disks, with each disk supporting perhaps 80 IOPS, for a total of 16,000 physical IOPS.
Obviously this is a simplistic example. Much of the difference might be masked behind the controller’s fast processor and cache memory. But it is important to consider the number of physical disk spindles when designing for performance.
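The following Python sketch restates the arithmetic of this example; the 80 IOPS per spindle and 1 TB disk size are only the illustrative figures used above:

def spindles_and_iops(provisioned_tb, raw_to_usable_efficiency,
                      disk_tb=1, iops_per_disk=80):
    """Back-of-envelope spindle count and aggregate physical IOPS
    for a given provisioned capacity (illustrative values only)."""
    disks = provisioned_tb / (disk_tb * raw_to_usable_efficiency)
    return int(disks), int(disks * iops_per_disk)

print(spindles_and_iops(100, 1.0))   # (100, 8000)
print(spindles_and_iops(100, 0.5))   # (200, 16000)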
14.2.4 Processor utilization
Generally, a high processor load on a storage controller is not, on its own, a good indicator of a performance problem. This is partly because of the averaging that occurs on multi-core, multi-processor hardware, and partly because the system might be running low-priority housekeeping tasks while otherwise idle (such tasks are preempted to service client I/O).
One of the benefits of Data ONTAP 8.1 is that it takes better advantage of the modern multi-processor controller hardware.
The optimal initial plan would be for 50% average utilization, with peak periods of 70% processor utilization. In a two-node storage cluster, this configuration allows the cluster to fail-over to a single node with no performance degradation.
If the processors are regularly running at a much higher utilization (for example, 90%), then performance might still be acceptable. However, expect some performance degradation in a fail-over scenario because 90% + 90% adds up to a 180% load on the remaining controller.
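A minimal sketch of this failover arithmetic, assuming a two-node active/active cluster in which the surviving controller must absorb both workloads:

def failover_load(node_a_util, node_b_util):
    """Combined utilization the surviving controller must absorb when
    its partner fails in a two-node active/active cluster."""
    return node_a_util + node_b_util

print("{:.0%}".format(failover_load(0.50, 0.50)))   # 100%: at the 50% planning target
print("{:.0%}".format(failover_load(0.90, 0.90)))   # 180%: overcommitted, expect degradation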
14.2.5 Effects of optional features
A few optional features affect early planning. Most notably, heavy use of the SnapMirror option can use large amounts of processor resources. These resources are directly removed from the pool available for serving user and application data. This process results in what seems to be an overall reduction in performance. SnapMirror can affect available disk I/O bandwidth and network bandwidth as well. Therefore, if heavy, constant use of SnapMirror is planned, adjust these factors accordingly.
14.2.6 Future expansion
Many of the resources of the storage system can be expanded dynamically. However, you can make this expansion easier and less disruptive by planning for possible future requirements from the start.
Adding disk drives is one simple example. The disk drives and shelves themselves are all hot-pluggable, and can be added or replaced without service disruption. But if, for example, all available space in a rack is used by completely full disk shelves, how does a disk drive get added?
Where possible, a good practice from the beginning is to try to avoid fully populating disk shelves. It is much more flexible to install a new storage system with two half-full disk shelves attached to it rather than a single full shelf. The added cost is generally minimal, and is quickly recovered the first time additional disks are added.
Similar consideration can be given to allocating network resources. For instance, if a storage system has two available gigabit Ethernet interfaces, it is good practice to install and configure both interfaces from the beginning. Commonly, one interface is configured for actual production use and one as a standby in case of failure. However, it is also possible (given a network environment that supports this) to configure both interfaces to be in use and provide mutual failover protection to each other. This arrangement provides additional insurance because both interfaces are constantly in use. Therefore, you will not find that the standby interface is broken when you need it at the time of failure.
Overall, it is valuable to consider how the environment might change in the future and to engineer in flexibility from the beginning.
14.2.7 Application considerations
Different applications and environments put different workloads on the storage system. This section addresses a few considerations that are best addressed early in the planning and installation phases.
Home directories and desktop serving
This is a traditional application for network-attached storage solutions. Because many clients are attached to one or more servers, there is little possibility to effectively plan and model in advance of actual deployment. But a few common sense considerations can help:
This environment is generally characterized by the use of Network File System (NFS) or Common Internet File System (CIFS) protocols.
It is generally accessed by using Ethernet with TCP/IP as the primary access mechanism.
The mix of reading and writing is heavily tipped towards the reading side. Uptime requirements are generally less than those for enterprise application situations, so scheduling downtime for maintenance is not too difficult.
In this environment, the requirements for redundancy and maximum uptime are sometimes reduced. The importance of data writing throughput is also lessened. More important is the protection offered by Snapshot facilities to protect user data and provide for rapid recovery in case of accidental deletion or corruption. For example, email viruses can disrupt this type of environment more readily than an environment that serves applications like Oracle or SAP.
Load balancing in this environment often takes the form of moving specific home directories from one storage system to another, or moving client systems from one subnet to another. Effective prior planning is difficult. The best planning takes into account that the production environment is dynamic, and therefore flexibility is key.
It is especially important in this environment to install with maximum flexibility in mind from the beginning. This environment also tends to use many Snapshot images to maximize the protection offered to the user.
Enterprise applications
Enterprise applications were previously the domain of direct-attached storage (DAS) architectures, but it is becoming much more common to deploy them on SAN or NAS storage systems. These environments have significantly different requirements than the home directory environment. It is common for the emphasis to be on performance, uptime, and backup rather than on flexibility and individual file recovery.
Commonly, these environments use a block protocol such as iSCSI or FCP because they mimic DAS more closely than NAS technologies. However, increasingly the advantages and flexibility provided by NAS solutions have been drawing more attention. Rather than being designed to serve individual files, the configuration focuses on LUNs or the use of files as though they were LUNs. An example would be a database application that uses files for its storage instead of LUNs. At its most fundamental, the database application does not treat I/O to files any differently than it does to LUNs. This configuration allows you to choose the deployment that provides the combination of flexibility and performance required.
Enterprise environments are usually deployed with their storage systems clustered. This configuration minimizes the possibility of a service outage caused by a failure of the storage appliance. In clustered environments, there is always the opportunity to spread workload across at least two active storage systems. Therefore, getting good throughput for the enterprise application is generally not difficult.
This assumes that the application administrator has a good idea of where the workloads are concentrated in the environment so that beneficial balancing can be accomplished. Clustered environments always have multiple I/O paths available, so it is important to balance the workload across these I/O paths and across server heads.
For mission-critical environments, it is important to plan for the worst-case scenario. That is, running the enterprise when one of the storage systems fails and the remaining single unit must provide the entire load. In most circumstances, the mere fact that the enterprise is running despite a significant failure is viewed as positive. However, there are situations in which the full performance expectation must be met even after a failure. In this case, the storage systems must be sized accordingly.
Block protocols with iSCSI or FCP are also common. The use of a few files or LUNs to support the enterprise application means that the distribution of the workload is relatively easy to install and predict.
Microsoft Exchange
Microsoft Exchange has a number of parameters that affect the total storage required on the N series; a simple sizing sketch follows this list. The following are examples of those parameters:
Number of instances
With Microsoft Exchange, you can specify how many instances of an email or document are saved. The default is 1. If you elect to save multiple instances, take this into consideration for storage sizing.
Number of logs kept
Microsoft Exchange uses a 5 MB log size. The data change rate determines the number of logs generated per day for recovery purposes. A highly active Microsoft Exchange server can generate up to 100 logs per day.
Number of users
This number, along with mailbox limit, user load, and percentage concurrent access, has a significant effect on the sizing.
Mailbox limit
The mailbox limit usually represents the quota assigned to users for their mailboxes. If you have multiple quotas for separate user groups, this limit represents the average. This average, multiplied by the number of users, determines the initial storage space required for the mailboxes.
I/O load per user
For a new installation, it is difficult to determine the I/O load per user, but you can estimate the load by grouping the users. Engineering and development tend to have a high workload because of drawings and technical documents. Legal might also have a high workload because of the size of legal documents. Normal staff usage, however, consists of smaller, more frequent transactions. Use the following formula to calculate the usage:
IOPS/Mailbox = (average disk transfers/sec) / (number of mailboxes)
Concurrent users
Typically, an enterprise’s employees do not all work in the same time zone or location. Estimate the number of concurrent users for the peak period, which is usually the time when the most employees have daytime operations.
Number of storage groups
Because a storage group cannot span N series storage systems, the number of storage groups affects sizing. There is no recommendation on the number of storage groups per IBM System Storage N series storage system. However, the number and type of users per storage group helps determine the number of storage groups per storage system.
Volume type
Are FlexVols or traditional volumes used? The type of volume used affects both performance and capacity.
Drive type
Earlier, this chapter addressed the storage capacity effect of drive type. For Microsoft Exchange, the drive type and performance characteristics are also significant, especially with a highly used Exchange server. In an active environment, use smaller drives with higher performance characteristics, such as higher RPM, and Fibre Channel rather than SATA.
Read-to-write ratio
The typical read-to-write ratio is 70% to 30%.
Growth rate
Industry estimates place data storage growth rates at 50% or higher. Size for at least two years into the future.
Deleted mailbox cache space
Microsoft Exchange allows for a time-specified retention of items even after a mailbox is deleted. The storage effect of this feature must also be sized on the N series.
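As a rough illustration only, the following Python sketch combines several of the parameters above. Every input value below is a hypothetical assumption; the actual sizing should be done with the IBM planning tools:

# Hypothetical Exchange sizing example; all input values are assumptions.
mailboxes = 2000
mailbox_quota_gb = 2               # average mailbox quota
disk_transfers_per_sec = 1200      # measured or estimated at peak
logs_per_day = 100                 # highly active server, 5 MB per log
annual_growth = 0.5                # size for at least two years ahead

mailbox_store_gb = mailboxes * mailbox_quota_gb
iops_per_mailbox = disk_transfers_per_sec / mailboxes
log_space_gb = logs_per_day * 5 / 1024
two_year_capacity_gb = mailbox_store_gb * (1 + annual_growth) ** 2

print("Initial mailbox store: {:,.0f} GB".format(mailbox_store_gb))      # 4,000 GB
print("IOPS per mailbox:      {:.2f}".format(iops_per_mailbox))          # 0.60
print("Log space per day:     {:.1f} GB".format(log_space_gb))           # ~0.5 GB
print("Two-year capacity:     {:,.0f} GB".format(two_year_capacity_gb))  # 9,000 GB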
14.2.8 Backup servers
Protecting and archiving critical corporate data is increasingly important. Deploying servers for this purpose is becoming more common, and these configurations call for their own planning guidelines.
A backup server generally is not designed to deliver high transactional performance. Data center managers rely on the backup server being available to receive the backup streams when they are sent. Often the backup server is an intermediate repository for data before it goes to backup tape and ultimately offsite. But frequently the backup server takes the place of backup tapes.
The write throughput of a backup server is frequently the most important factor to consider in planning. Another important factor is the number of simultaneous backup streams that a single server can handle. The more effective the write throughput and the greater the number of simultaneous threads, the more rapidly backup processes complete. The faster the processes complete, the sooner that production servers are taken out of backup mode and returned to full performance.
Each IBM System Storage N series platform has different capabilities in each of these areas. The planning process must take these characteristics into account to ensure that the backup server is capable of the workload expected.
14.2.9 Backup and recovery
In addition to backup servers, all storage systems must be backed up. Generally, the goal is to have the backup process occur at a time and in a way that minimizes the effect on overall production. Therefore, many backup processes are scheduled to run during off-hours. However, all of these backups run more or less at the same time. Therefore, the greatest I/O load put on the storage environment is frequently during these backup activities, instead of during normal production.
IBM System Storage N series storage systems have a number of backup mechanisms available. With prior planning, you can deploy an environment that provides maximum protection against failure while also optimizing the storage and performance capabilities.
Keep in mind the following issues:
Storage capacity used by Snapshots
How much extra storage must be available for Snapshots to use?
Networking bandwidth used by SnapMirror
In addition to the production storage I/O paths, SnapMirror needs bandwidth to duplicate data to the remote system (a simple estimation sketch follows this list).
Number of possible simultaneous SnapMirror threads
How many parallel backup operations can be run at the same time before some resource runs out? Resources to consider include processor cycles, network throughput, maximum parallel threads (which is platform-dependent), and the amount of data that requires transfer.
Frequency of SnapMirror operations
The more frequently data is synchronized, the fewer the number of changes each time. More frequent operations result in background operations running almost all the time.
Rate at which stored data is modified
Data that does not change much (for example, archive repositories) does not need to be synchronized as often, and each operation takes less time.
Use and effect of third-party backup facilities (for example, IBM Tivoli Storage Manager)
Each third-party backup tool has its unique I/O effects that must be accounted for.
Data synchronization requirements of enterprise applications
Certain applications, such as IBM DB2®, Oracle, and Microsoft Exchange, must be quiesced and flushed before performing backup operations. This process ensures the consistency of the backed-up data images.
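The following Python sketch relates the data change rate and SnapMirror update frequency to the average network bandwidth that each update needs. It is an assumption-laden approximation, not a replacement for the planning tools:

def snapmirror_bandwidth_mbps(data_gb, daily_change_rate, updates_per_day,
                              transfer_window_hours):
    """Average bandwidth (Mbps) needed to ship one SnapMirror update
    within the given transfer window. Assumes changes are spread evenly
    across updates; real transfers are usually burstier."""
    changed_gb = data_gb * daily_change_rate / updates_per_day
    bits = changed_gb * 8 * 10**9                       # decimal GB to bits
    return bits / (transfer_window_hours * 3600) / 10**6

# 10,000 GB of data, 3% daily change, hourly updates, 15-minute transfer window:
print("{:.0f} Mbps".format(snapmirror_bandwidth_mbps(10000, 0.03, 24, 0.25)))   # ~111 Mbps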
14.2.10 Resiliency to failure
Like all data processing equipment, storage devices sometimes fail. Most often the failure is of small, noncritical components that have redundancy, such as disks, networks, fans, and power supplies. These failures generally have only a small effect (usually none at all) on the production environment. But unforeseen problems can cause rare and infrequent outages of entire storage systems. The most common issues are software problems that occur inside the storage system or infrastructure errors (such as DNS or routing tables) that prevent access to the storage system. If a storage system is running but cannot be accessed, the effect on the enterprise is effectively the same as it being out of service.
Designing 100% reliable configurations is difficult, time-consuming, and costly. Generally, strike a compromise that minimizes the likelihood of error while providing a mechanism to get the server back into service as quickly as possible. In other words, accept the fact that failures will occur, but have a plan ready and practiced to recover when they do.
Spare servers
Some enterprises keep spare equipment around in case of failure. Generally, this is the most expensive solution and is only practical for the largest enterprises.
An often overlooked similar situation is the installation of new servers. Additional or replacement equipment is always being brought into most data environments. Bringing this equipment in a bit early and using it as spare or test equipment is a good practice. Storage administrators can practice new procedures and configurations, and test new software without having to do so on production equipment.
Local clustering
The decision to use the high availability features of the IBM System Storage N series is determined by the availability and service level agreements that cover the data and applications running on the IBM System Storage N series storage systems. If it is determined that an Active/Active configuration is needed, it affects sizing. Rather than sizing for all data, applications, and clients serviced by one IBM System Storage N series node, the workload is instead divided over two or more nodes.
Failover performance
Another aspect of an Active/Active configuration is failover performance. For example, you might have determined that the data, application, or clients require constant availability of the IBM System Storage N series, and therefore use an Active/Active configuration. However, you might have sized for normal operations on each node and not for failover. So what was originally a normal workload for a single node has now doubled.
You also must consider the service level agreement for response time, data access, and application performance. How long can your customers work within a degraded performance environment? If the answer is not long at all, the initial sizing of each node also must take failover workload into consideration. Because failover operation is infrequent and usually remedied quickly, it is difficult to justify these additional standby resources unless maintaining optimum performance is critical. An example is a product ordering system with the data storage or application on an IBM System Storage N series storage system. Any effect on the ability to place an order affects sales.
Software upgrades
IBM regularly releases minor upgrades and patches for the Data ONTAP software. Less frequently there are also major release upgrades, such as version 8.1.
You need to be aware of the new software versions for these reasons:
Patches address recently corrected software flaws
Minor upgrades will bundle multiple patches together, and might introduce new features
Major upgrades generally introduce significant new features
To remain informed of new software releases, subscribe to the relevant sections at the IBM automatic support notification website at:
Upgrades for Data ONTAP, along with mechanisms for implementing the upgrade are available on the web at:
Be sure that you understand the recommendations from the vendor and the risks. Use all the available protection tools such as Snapshots and mirrors to provide a fallback in case the upgrade introduces more problems than it solves. And whenever possible, perform incremental unit tests on an upgrade before putting an upgrade into critical production.
Testing
As storage environments become ever more complex and critical, the need for customer-specific testing increases in importance. Work with your storage vendors to determine an appropriate and cost-effective approach to testing solutions to ensure that your storage configurations are running optimally.
Even more important is that testing of disaster recovery procedures become a regular and ingrained process for everyone involved with storage management.
14.3 Summary
This chapter provided only a high-level set of guidelines for planning. Consideration of the issues addressed maximizes the likelihood for a successful initial deployment of an IBM System Storage N series storage system. Other sources of specific planning templates exist or are under development. Locate them by using web search queries.
Deploying a network of storage systems is not greatly challenging, and most customers can successfully deploy it themselves by following these guidelines. Because of the simplicity that appliances provide, if a mistake is made in the initial deployment, corrective actions are generally not difficult or overly disruptive. For many years customers have iterated their storage system environments into scalable, reliable, and smooth-running configurations. So getting it correct the first time is not nearly as critical as it was before the introduction of storage appliances.
If storage system planners and architects remember to keep things simple and flexible, success in deploying an IBM System Storage N series system can be expected.