TS7700 usage considerations
In this chapter, we provide a general overview of the information and considerations that are needed to choose the best configuration for your needs.
The TS7700 offers a great variety of configuration options, features, functions, and implementation parameters. Many of the options serve different purposes and their interactions with other options can affect how they contribute to achieving your business goals.
Some environments require these features and functions; in others, they only raise the level of complexity.
There is no “one configuration and implementation fits all”. Therefore, you need a plan to build an environment to meet your requirements.
This chapter summarizes what to consider during the planning phase, especially when introducing new features. We also offer general suggestions for the day-to-day operation of a TS7700 environment.
 
3.1 Introduction
Since the first days of tape usage, the world has changed dramatically. Not only has the amount of stored data increased; the sources of data, the legal requirements on data, and the technical possibilities for data management have also grown dramatically.
With each new release of the Virtual Tape System (VTS), IBM has delivered new features to support our most demanding client needs. Some of these functions are needed in your environment; others are not:
Some features are totally independent of all others; others are not.
Certain features have a strong impact on the behavior of your environment, for example, performance or data availability.
Specific features influence your setup of the environment.
Some features can be overruled by Override settings.
So, while these functions might be necessary to support different client requirements, they might not be required in all use cases. In fact, they might only add complexity to the solution or they might result in unexpected behaviors. Therefore, understanding the available features and how they complement, interact, and affect each other will help you plan an optimal and successful implementation.
3.1.1 A short look at history
At first, data was measured in megabytes. Terabytes of data were hosted only by a small number of mainframe clients. You always knew the location of your data. When you mounted a tape (by operator or by robot), you could be sure that your data was written directly to that specific tape. The ownership of physical tapes was clear. If you wanted two tapes, you needed duplex writing from the host. If you wanted to relocate specific data to a different storage location, you moved that data to a specific tape. Your batch planners ensured that, if multifile volumes were used, the files belonged to the same application, and that the same rules (duplexing and moving) applied. Sharing resources between multiple IBM MVS™ systems or even clients was mostly not desired or needed. Another important aspect was that users did not expect fast read response times for data on tape.
3.1.2 Challenges of today’s businesses
The amount of data is increasing tremendously. The growth comes not only from keeping “existing” data for longer periods (legal requirements, audit, and compliance) or from higher requirements for disaster recovery scenarios (multiple copies). The real drivers of the growth are new sources of data, for example:
Emails.
Social networks and their global implications.
Web shops record not only your actual purchases, but also your interests and buying patterns. Based on this information, you receive personalized information by email or other communication channels.
Documents that previously existed only on paper and now must also be stored digitally.
Digital media.
All of this data must be stored and protected somewhere, and it must be quickly accessible. User expectations for response time have changed.
Due to cost pressure, businesses are enforcing a tiered storage environment. Older or infrequently used data must reside on less expensive storage, while frequently accessed data must stay on primary storage, which allows very fast access. Applications such as Content Manager, Hierarchical Storage Manager (HSM), or output archivers are rule-based and can move data from one storage tier to another. If you are considering using such applications, plan the tier concept carefully.
3.1.3 Challenges of technology progress
Technology progress brings its own challenges, such as many new options to meet many different client needs. For example, the TS7700 has many options for where data can be located and where and how it must be replicated. Investing some time in choosing the right set of rules will help you meet your requirements.
Also, the TS7700 itself decides which workload to prioritize. Depending on cluster availability in the grid, the current workload, or other storage conditions, the copy queues might be delayed. In addition, the TS7700 automates many decisions to provide the most value. This dynamic behavior can sometimes result in unexpected behaviors or delays. Understanding how your environment behaves, including where the data is located during an outage or disaster, is key to a successful implementation. For example:
During a mount, a remote Tape Volume Cache (TVC) was chosen over a local TVC.
Copies are intentionally delayed according to configuration parameters, yet they were expected to complete sooner.
Copy Export sets do not include all the expected content since the export was initiated from a cluster that was not configured to receive a replica of all the content.
One reaction might be to configure your environment with synchronous and immediate copies to all locations or to set all overrides. This approach likely increases capacity and bandwidth requirements without achieving the desired result. Planning and working with your IBM account team so that the optimal settings can be configured will help eliminate any unexpected behaviors.
Other features, such as scratch allocation assistance (SAA) and device allocation assistance (DAA), might affect your drive allocation methodology, and certain customization parameters must always be used if you are a Geographically Dispersed Parallel Sysplex (GDPS) user.
So, it is essential for you to understand these mechanisms to choose the best configuration and customize your environment. You need to understand the interactions and dependencies to plan for a successful implementation.
 
Note: There is no “one solution that fits all”. Do not introduce complexity when it is not required. Let IBM help you look at your data profile and requirements so that the best solution can be implemented for you.
3.2 Gather your business requirements
There are several types of business requirements that you need to take into account. Use the following lists as a starting point.
Requirements from the data owners, application administrators, and the applications
How important is the data? Consider multiple copies, Copy Consistency Points, retention requirements, and business recovery time expectations.
How often will the data be accessed, and what retrieval times are expected? Consider sizing and performance.
How will the application react if the tape environment is not available? Consider high availability (HA) and disaster recovery (DR) planning and copy consistency.
How will the application react if specific data is not available? Consider HA and DR planning and copy consistency.
How much storage for the data will be needed? Size future growth.
What are the performance expectations during an outage or disaster event?
Although it is sometimes difficult to get information from the owners of the data and the owners of the applications, you can start with the information that you can determine from your existing tape environment and then verify it with the application and data owners. Some of the requirements are documented in the service level agreements (SLAs) that are in place with the lines of business.
Requirements from the IT department
Support of the general IT strategy (data center strategy and DR site support)
Sharing of a TS7700 environment between multiple logical partitions (LPARs)/sysplexes (definition of logical pools, physical pools, and performance)
Sharing of a TS7700 in a multi-tenancy environment (logical pools, physical pools, selective device access control (SDAC), export and migration capabilities, and performance)
Support of the automation concepts (monitoring and validation)
Environmental requirements (cooling and space)
Financial requirements
Multiple platforms required (System z Operating Systems)
Monitoring and automation capabilities to identify issues and degradations
Floor space requirements
Network infrastructure
Power availability
Depending on your overall IT strategy, the application requirements and data owner requirements are used to select an appropriate TS7700 configuration. If you have a dual-site strategy, plan for at least a two-cluster grid spread across the data centers.
If your approach is that each data center can host the total workload, plan your environment accordingly. Consider the possible outage scenarios and verify whether any potential degradations for certain configurations can be tolerated by the business until the full equipment is available again.
In a two-cluster environment, there is always a trade-off between availability and data protection (a non-zero recovery point). If data protection has the highest priority, the workload must be stopped in a service preparation situation, because only one valid copy is available at that time, which is unacceptable. If availability is rated higher, you accept the risk that some data might be lost in a DR situation.
However, more advanced TS7700 configurations, for example a grid of three or more clusters, can be implemented so that availability and data protection are treated as equally important.
3.3 What type of data do you store in your TS7700 environment
Depending on your type of data, you have multiple configuration choices.
We start with a general view before we look closer at the specific types of data.
3.3.1 Environment: Source of data
Depending on the system that creates the data, you have different requirements. Assume that you have all four of the following types of systems that create data:
Sandbox system: Used to verify new operating and subsystem versions
Development system: Used to develop new applications
User Acceptance Test (UAT) system: Used for integration and performance testing
Production system
Consider the following guidelines:
Data from a sandbox system (regardless of whether it is backup or active data) might not need multiple copies because you can re-create the information from other sources (new installation, and so on).
Application data from a development system might not need multiple copies in different storage pools or data centers because the data can be re-created from production systems.
Application code from a development system likely needs multiple copies because that data might not be re-created from elsewhere.
Data associated with UAT can be migrated quickly to physical tape because its response time might not be as critical as that of a production workload.
Not all production or backup workloads targeting the TS7700 might be replicated. Perhaps, you have those workloads managed differently for DR needs, or you do not need that workload in a DR event. These non-replicated workloads can optionally be Copy Exported as a DR alternative if replication is not feasible.
Data from your sandbox, test, UAT, or production system might share the same tape environment, but it can be treated differently. That is important for sizing, upgrades, and performance considerations as well.
 
Note: Plan your environments and the general rules for different types of environments. Understand the amount of data that these environments host today.
Next, we look closely at the data.
3.3.2 Backup data, active data, and archive data
In general, data from different applications has different requirements for your tape environment. Your tape processing environment may be all virtual, all physical, or a combination of the two.
Backup data
The data on tape is only a backup. Under normal conditions, it will not be accessed again. It is accessed only when problems occur, such as DASD hardware failures, logical database failures, site outages, and so on.
Expiration
The expiration period is typically short or medium term.
Availability requirements
If the tape environment is not available for a short period of time, the application workload can still run without impact. However, while the solution is unavailable, backups to tape cannot be processed.
Retrieval requirements
Physical tape recall can be tolerated in most environments.
Multiple copies
Depending on your overall environment, a single copy (not located in the same place as the primary data) might be acceptable, perhaps on physical tape. However, physical media may fail or a storage solution or its site may experience an outage. Therefore, a second or additional copy is likely needed. These copies may exist on additional media within the same location or ideally at a distance from the initial copy.
If you use multiple copies, a Copy Consistency Point of Deferred might suffice, depending on your requirements.
Active data on tape
The data is stored only on tape; it is not also kept on DASD. If the data needs to be accessed, it is read from the tape environment.
Expiration
The expiration depends on your application.
Availability requirements
When the tape environment is not available, your original workload might be severely affected.
Retrieval requirements
Physical tape recalls might not be tolerated, depending on your data source (sandbox, test, or production) or the type of application.
Multiple copies
Although tape is the primary source, a single copy is not suggested, because even a single media failure will result in data loss. Multiple copies must be stored in different locations to be prepared for a data center loss or outage. In a stand-alone environment, dual copies are suggested.
Depending on the recovery point objectives (RPO) of the data, choose an appropriate Consistency Point Policy. For example, Synchronous Mode replication is a popular choice for these workloads because it can achieve a zero point RPO at sync point granularity.
Archive data on tape
Archive data on tape is also active data. However, archive data is usually stored for a long time. Expiration periods of 10 - 30 years to satisfy regulatory requirements are common. Sometimes, Logical Write Once Read Many (LWORM) data is required.
Expiration
The expiration depends on your application, but it is usually many years.
Availability requirements
Archive data is usually seldom accessed for read. If the tape environment is not available, your original workload might still be affected because you cannot write new archive data.
Retrieval requirements
Physical tape recalls might be tolerated.
Multiple copies
Although tape is the primary source, a single copy is not suggested, because even a single media failure will result in data loss. Store multiple copies in different locations to be prepared for a data center loss. In a stand-alone environment, dual copies are suggested.
Depending on the criticality of the data, choose an appropriate Copy Consistency Point Policy.
Remember that archive data sometimes must be kept for 10 - 30 years. During such long time periods, technology progresses, and data migration to newer technology may need to take place. If your archive data resides on physical tapes in a TS7740, you must also consider the life span of physical tape cartridges. Some vendors suggest that you replace their cartridges every five years; other vendors, such as IBM, offer tape cartridges that have longer lifetimes.
If you are using a TS7740 and you store archive data in the same storage pools as normal data, there is a slight chance that, due to the reclaim process, the number of stacked volumes containing only archive data will increase. These cartridges may then not be accessed (either for reads or for reclaim processing) for a long period of time, so media failures might go undetected. If you have more than one copy of the data, the data can still be accessed. However, you have no direct control over where this data is stored on the stacked volumes, and the same condition might occur in other clusters, too.
Therefore, consider storing data with such long expiration dates in a specific stacked volume pool. Then, you can plan regular migrations (for example, on a 5 - 10 year cycle) to another stacked volume pool. You might also decide to store this data in the common data pool.
3.3.3 DB2 archive log handling
With DB2, you have many choices about how to handle your DB2 archive logs. You can place both of them on DASD and perhaps rely on a later migration to tape through DFSMShsm or an equivalent application. You can write one archive log to DASD and the other to tape. Or, you can write both directly to tape.
Depending on your choice, the tape environment is more or less critical to your DB2 application. This depends also on the number of active DB2 logs you have defined in your DB2 environment. In some environments, due to peak workload, logs are switched every two minutes. If all DB2 active logs are used and they cannot be archived to tape, DB2 will stop processing.
Scenario
You have a four-cluster grid, spread over two sites. A TS7720 and a TS7740 are at each site. You store one DB2 archive log directly on tape and the other archive log on disk. Your requirement is to have two copies on tape:
Using the TS7720 can improve your recovery (no recalls from physical tape needed).
Having a consistency point of R, N, R, N provides two copies, stored in both TS7720s. As long as one TS7720 is available, DB2 archive logs can be stored to tape. However, if one TS7720 is not available, you have only one copy of the data. In a DR situation where one of the sites is not usable for a long period of time, you may want to change your policies to replicate this workload additionally to the local TS7740.
If the TS7720 enters the “Out of cache resources” state, new data and replications to that cluster will be put on hold. To avoid this situation, consider having this workload also target the TS7740 and allow the Automatic Removal policy to free up space in the TS7720. Until the “Out of cache resources” state is resolved, you may have fewer copies than expected within the grid.
If one TS7720 is not available, all mounts have to be executed on the other TS7720 cluster.
In the unlikely event that both TS7720s are not reachable, DB2 will stop working as soon as all DB2 logs on the disk are consumed.
Having a consistency point of R, N, R, D provides you with three copies, which are stored in both TS7720s and in the TS7740 of the second location. That exceeds your original requirement, but in an outage of any component, you still have two copies. In a loss of the primary site, you do not need to change your DB2 settings because two copies are still written. In an “Out of Cache resources” condition, the TS7720 can remove the data from cache because there is still an available copy in the TS7740.
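The difference between these two policies can be reasoned about with simple copy counting. The following Python sketch is purely conceptual: the cluster names, policy strings, and functions are illustrative and are not part of any TS7700 interface. It counts how many copies each policy produces in this four-cluster grid and how many copies remain reachable when one cluster is unavailable (it ignores the timing difference that deferred copies lag behind the RUN copies).

# Conceptual sketch only: copy counting for the DB2 archive log scenario.
# Cluster order and policy letters follow the example above; nothing here is a
# TS7700 API, it is just arithmetic over the copy policy.

CLUSTERS = ["TS7720-site1", "TS7740-site1", "TS7720-site2", "TS7740-site2"]

def copy_locations(policy):
    """Clusters that eventually hold a copy: 'R' and 'D' produce one, 'N' does not."""
    return [c for c, p in zip(CLUSTERS, policy) if p in ("R", "D")]

def copies_during_outage(policy, failed_cluster):
    """Copies still reachable while one cluster is unavailable."""
    return [c for c in copy_locations(policy) if c != failed_cluster]

for policy in (["R", "N", "R", "N"], ["R", "N", "R", "D"]):
    print(policy,
          "-> total copies:", len(copy_locations(policy)),
          "| copies left if TS7720-site1 is down:",
          len(copies_during_outage(policy, "TS7720-site1")))

Running the sketch shows the trade-off described above: [R,N,R,N] yields two copies and only one survives a TS7720 outage, whereas [R,N,R,D] yields three copies and two survive the same outage.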
 
Note: Any application with the same behavior can be treated similarly.
3.3.4 Object access method: Object processing
You can use the object access method (OAM) to store and transition object data in a storage hierarchy. Objects can reside on disk (in DB2 tables, the IBM System z® file system (zFS), or Network File System (NFS) mountable file systems), on optical devices, or on tape storage devices. You can choose how long an object is stored on disk before it is migrated to tape. Remember that an object moved to tape is active data. Users accessing data on tape (in particular on a TS7740) may have to wait for their document until it is recalled from physical media. Because the data remains in cache, a TS7720 may be a better fit.
The same considerations that were described for DB2 archive log handling apply here.
In OAM, you can define up to two backup copies. Backup copies of your object data (maintained by OAM) are in addition to any copies of your primary data that are maintained by the TS7700. Determine the copy policies for your primary data and any additional OAM backup copies that may be needed. The backups maintained by OAM are used only if the primary object is not accessible. The backup copies may reside on native tape, in a TS7720, or in a TS7740.
3.3.5 Batch processing: Active data
Data that is created by batch processing and not also stored on disk is likewise considered active data. The access requirements of this data determine whether the data resides on a TS7720, a TS7740, or both. For example, very active data that needs quick access is ideal for a TS7720. Rarely accessed data that does not demand quick access times can reside on a TS7740. Data that becomes less important over time can also use the TS7720 auto-removal policies to benefit from both technologies.
Assume that you have the same configuration as the DB2 archive log example:
With a Copy Consistency Point policy of [N,R,N,R], your data will be stored only on the TS7740s (fast access is not critical).
With a Copy Consistency Point policy of [R,N,R,N], your data will be stored only on the TS7720s (fast access is critical).
With a Copy Consistency Point policy of [R,D,R,D], your copy will reside on the TS7720s first and then also on the TS7740s, allowing the older data to age off of the TS7720s via the auto-removal policy.
3.4 Features and functions
Based on the gathered requirements, you can now decide which features and functions you want to use in your TS7700 environment.
3.4.1 Stand-alone versus grid environments
Consider a stand-alone cluster in the following conditions:
You do not need a high availability or an electronic DR solution
You can handle the effect to your application in a cluster outage
In a data center loss, a data loss is tolerable or a recovery from Copy Export tapes is feasible (time and DR site)
You can plan outages for microcode loads or upgrade reasons
If you cannot tolerate any of these items, consider implementing a grid environment.
3.4.2 Sharing a TS7700
Sharing a TS7700’s resources is supported in most use cases. Whether the environment includes different applications within a common sysplex, independent sysplexes, or System z Operating Systems, the TS7700 can be configured to provide shared access. The TS7700 can also be shared between multiple tenants.
Because the TS7700 is policy-managed, each independent workload can be treated differently with respect to how the data is managed within the TS7700. For example, different workloads can reside on independent physical volume pools within a TS7740. Or, different workloads can use different replication requirements. All applications within a Parallel Sysplex can use the same logical device ranges and logical volume pools, which simplifies resource sharing. When independent sysplexes are involved, device ranges and volume ranges are normally independent, but are still allowed to share the disk cache and physical tape resources.
Of all the sharing use cases, the majority share the FICON channels into the TS7700. Although the channels can also be physically partitioned, it is not necessary because each FICON channel has access to all device and volume ranges within the TS7700.
However, there are still considerations:
The TVC is used in common. You cannot define a physical limit to the amount of space that a client uses in the TVC. However, through policy management, you can use preference groups differently in a TS7740, or configure the removal policies differently, to give more TVC priority to some workloads than to others.
Define the scratch categories that the different systems will use. The scratch categories are specified in the DEVSUPxx parmlib member.
Decide which VOLSER ranges the different systems will use. This is typically handled through the tape management system; for DFSMSrmm, it is handled through the PRTITION and OPENRULE parameters. A sketch after this list illustrates checking scratch category and VOLSER assignments for overlaps.
Another main item to consider is how the drives will be managed across the different systems and which systems will share which drives. This is typically handled through a tape device sharing product.
Storage management subsystem (SMS) constructs and constructs on the TS7700 must match. If not, new constructs in SMS will lead to new constructs in the TS7700 that are created with default parameters. To avoid the uncontrolled buildup of constructs in the TS7700, SMS must be controlled by a single department.
SMS constructs used by different workloads need to use unique names when the TS7700 behavior is expected to be different. This will allow each unique application’s behavior to be tuned within the TS7700. If the behavior is common across all shared workloads, the same construct names can be used.
Ensure that the single defined set of constructs within the TS7700 are configured with a behavior that is acceptable to all users. If not, different constructs must be used for those customers.
Control of the TS7700 Management Interfaces and the TS3500 GUI must be restricted to a single department that controls the entire environment. Control must not be given to an individual customer.
Review the IBM RACF® statements for Devserv and Library commands on all LPARs. These commands must be protected. In a multiple client environment, the use of Library commands must be restricted.
When independent sysplexes are involved, the device ranges and corresponding volume ranges can be further protected from cross-sysplex access through the Selective Device Access Control feature.
When device partitioning is used, consider assigning the same number of devices per cluster per sysplex in a grid configuration so that the availability of the grid is equal across all sysplexes.
Override policies set in the TS7700 apply to the whole environment and cannot be enabled or disabled by an LPAR or client.
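To make the partitioning items in this list more concrete (the scratch categories and the VOLSER ranges mentioned earlier), the following Python sketch shows the kind of cross-check that is worth running before you update DEVSUPxx and your tape management system definitions. The sysplex names, category values, and VOLSER prefixes are made up for illustration; this is not a TS7700 or DFSMSrmm interface.

# Conceptual sketch: validate that per-sysplex scratch categories and VOLSER
# prefixes do not collide. The values below are invented for illustration;
# substitute your own planned assignments.

sysplexes = {
    "PLEXA": {"scratch_category": 0x0012, "volser_prefixes": ["A0", "A1"]},
    "PLEXB": {"scratch_category": 0x0022, "volser_prefixes": ["B0", "B1"]},
}

def find_conflicts(plan):
    conflicts = []
    names = list(plan)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if plan[a]["scratch_category"] == plan[b]["scratch_category"]:
                conflicts.append(f"{a} and {b} share scratch category "
                                 f"{plan[a]['scratch_category']:#06x}")
            shared = set(plan[a]["volser_prefixes"]) & set(plan[b]["volser_prefixes"])
            if shared:
                conflicts.append(f"{a} and {b} share VOLSER prefixes {sorted(shared)}")
    return conflicts

for finding in find_conflicts(sysplexes) or ["No overlaps found."]:
    print(finding)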
For additional considerations, see the IBM Redbooks publication Guide to Sharing and Partitioning IBM Tape Library Data, SG24-4409.
Note: Some parameters can be updated by the Library Host command. This command changes the cluster behavior, not only for the LPAR where the command was executed, but for all LPARs that use this cluster.
Ensure that only authorized personnel can use the Library Host command.
If you share a library for multiple customers, establish regular performance and resource usage monitoring. See 3.5, “Operation aspects: Monitoring and alerting” on page 122.
3.4.3 Tape volume cache selection
Depending on your Copy Consistency Policies, the cluster where the virtual tape mount occurred is not necessarily the cluster that is selected as the TVC. When a TVC other than the local TVC is chosen, this is referred to as a remote mount. Plan the Copy Consistency Policy so that you are aware of where your data resides at any point in time.
TVC selection may also influence the Copy Export. See 11.5, “Copy Export overview and Considerations” on page 789.
3.4.4 Consistency policy
Define the consistency policy for each Management Class. For a detailed discussion, see 2.3.5, “Copy Consistency Points” on page 54.
Here are several general considerations:
When a cluster is assigned a policy of 'N', this cluster will not be the target of a replication activity:
 – This cluster cannot be chosen as the TVC (it can be chosen as the mount point)
 – If only 'N' clusters are available, any mount that uses that Management Class will fail.
A consistency point of [D, D, D, D] means that the selected TVC will be treated as RUN, and the additional copies will be performed asynchronously. For a scratch selection, the mount point cluster is normally chosen as the TVC, although it is not required. Copy Override settings can be used to prefer that it also acts as the TVC.
A consistency point of [D, R, D, D] means that Cluster 1 will be preferred as the TVC, even if the other cluster is chosen as the mount point. Therefore, the 'R' location will be preferred, which can result in a remote mount when the mount point is not the same as the 'R' location. This may be done intentionally in order to create a remote version as the initial instance of a volume.
If you do not care which TVC is chosen and you prefer a balanced grid, use [D, D, D, D].
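These considerations can be summarized in a small decision sketch. The following Python fragment is a deliberate simplification of the actual TS7700 selection logic, which also weighs cluster families, overrides, cache residency, and workload; it only encodes the rule that 'N' clusters are never TVC candidates and that a mount fails if no other cluster is available.

# Simplified illustration of TVC candidate selection. The real TS7700 logic
# considers much more than this; the sketch only encodes the 'N' rule above.

def tvc_candidates(policy, available_clusters):
    """Clusters eligible as TVC: available and not marked 'N' in the policy."""
    return [idx for idx, setting in enumerate(policy)
            if setting != "N" and idx in available_clusters]

policy = ["D", "R", "D", "D"]        # Cluster 1 ('R') is preferred as the TVC
available = {0, 2, 3}                # Cluster 1 is down in this example

candidates = tvc_candidates(policy, available)
if not candidates:
    print("Mount fails: only 'N' clusters (or no clusters) are available.")
else:
    print("Eligible TVC clusters:", candidates)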
3.4.5 Override policies
Override policies overrule the explicit definitions of Copy policies.
 
Note: Synchronous mode is not subject to override policies.
The Override policies are cluster-based. They cannot be influenced by the attached hosts. With Override policies, you can influence the TVC selection and whether a copy needs to be present in that cluster.
Copy Count Override allows the client to define, for this cluster, that at least a specified number of copies (two or more) exist at RUN time, without requiring that particular clusters hold a copy at that point in time.
If you use Copy Count Override, the grid configuration and the available bandwidth between locations will likely determine which RUN copies meet the count criteria. Therefore, the limited number of copies may end up at the closest locations rather than at longer distances. Remember this if you use this override.
3.4.6 Cluster family
Cluster families can be introduced to help with TVC selection or replication activity. You might want to use them for the following conditions:
You have an independent group or groups of clusters serving a common purpose within a larger grid.
You have one or more groups of clusters with limited bandwidth between the group and other clusters in the grid.
Cluster families provide two essential features:
During mounts, clusters within the same family as the mount point cluster are preferred for TVC selection.
During replication, the clusters in a family cooperate and distribute the replication workload that is inbound to the family, which provides the best utilization of the limited network bandwidth outside of the family.
Therefore, grid configurations with three or more clusters may benefit from cluster families.
3.4.7 Expire Time (resource efficient) versus legacy implementation
If the tape management system changes a virtual volume from private to scratch, the category of this virtual volume is changed in the TS7700. If you do not specify an expiration time, the data is kept on this volume, even though it is scratch, until the volume is used for another scratch mount and the data is overwritten from the host.
A scratch category can have a defined expiration time, which allows the contents of volumes that are returned to scratch to be automatically deleted after a grace period has passed. The grace period can be configured from one hour to many years. Volumes in the scratch category are then either expired with time or reused, whichever comes first. If physical tape is present, the space utilized by the deleted logical volume becomes a candidate for reclamation.
After the volume is deleted or reused, the content that was previously present is no longer accessible. An inadvertent return to scratch can therefore result in loss of data, so a longer expiration grace period is suggested so that return-to-scratch mistakes can be corrected within your host environment. In addition, a hold option can be configured to prevent access and reuse during the grace period. This provides a window of time in which a host-initiated mistake can be corrected, allowing the volume to be moved back to a private state while retaining the previously written content.
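This life cycle can be illustrated with a short sketch. The following Python function is conceptual only; the parameter names and the hold flag are illustrative and do not correspond to the actual TS7700 settings interface. It shows how the return-to-scratch time, the expiration grace period, and the optional hold determine whether the previous content of a scratched volume is still recoverable.

from datetime import datetime, timedelta

# Conceptual sketch of the scratch expire and hold behavior described above.
# Attribute names are illustrative only; they do not map to real TS7700 settings.

def volume_state(returned_to_scratch, expire_time, expire_hold, now=None):
    """Classify a scratched volume's data within its expiration grace period."""
    now = now or datetime.now()
    grace_ends = returned_to_scratch + expire_time
    if now < grace_ends:
        # Data is still present; with the hold enabled, the volume cannot be
        # reused either, so a mistaken return to scratch can still be undone.
        return "held (recoverable)" if expire_hold else "recoverable until reused"
    return "expired: data deleted and volume eligible for reuse"

print(volume_state(datetime(2024, 1, 1), timedelta(days=7), expire_hold=True,
                   now=datetime(2024, 1, 5)))   # held (recoverable)
print(volume_state(datetime(2024, 1, 1), timedelta(days=7), expire_hold=False,
                   now=datetime(2024, 1, 10)))  # expired: data deleted ...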
3.4.8 Data security: Encryption
Depending on your legal requirements and your type of business, data encryption might be mandatory.
Consider the following information:
If you use the TS7740 Copy Export feature and encrypt the export pool, you must ensure that you can decrypt the tapes at the export location. You need access to an external key manager that has the appropriate keys available.
Ensure that the recovery site can read your encrypted tapes (same or compatible drives that can read the exported media format).
TVC encryption for data at rest in disk cache can only be enabled against the entire cache repository.
Both physical tape and TVC encryption can be enabled at any time. After TVC encryption is enabled, it cannot be disabled without a rebuild of the disk cache repository.
3.4.9 Allocation assistance
Allocation assistance is functionality built into z/OS and the TS7700 that allows both private and scratch mounts to be more efficient when choosing a device within a grid configuration where the same sysplex is connected to two or more clusters in an active configuration.
If you use the allocation assistance, the allocation routine in z/OS will be influenced by additional information from the grid environment. Several aspects are honored to find the best mount point in a grid for this mount. For more information, see 2.3.15, “Allocation assistance” on page 61.
Depending on your configuration, your job execution scheduler, and any automatic allocation managers you might use, the allocation assist function might provide value to your environment.
Currently, this function is only supported in a JES2 environment. A statement of direction to support JES3 exists.
If you use any dynamic tape manager, such as the IBM Automatic Tape Allocation Manager, plan the introduction of SAA and DAA carefully. Some dynamic tape managers manage devices in an offline state. Because allocation assist functions assume online devices, issues can surface.
Therefore, consider keeping some drives always online to a specific host and leave only a subset of drives to the dynamic allocation manager. Or, discontinue working with a dynamic tape allocation manager.
Included in the z/OS operating system, automatic tape switching (ATS) STAR works with online devices and is compatible with DAA and SAA.
 
Note: In a cluster outage, the number of virtual drives is decreased. Plan the distribution of your virtual drives to host LPARs to reflect these outage conditions.
3.5 Operation aspects: Monitoring and alerting
To ensure that your TS7700 environment works as expected and to be notified of any potential issues or trends, perform these steps:
Check for the introduction of new messages into your automation and alerting tool.
Use automation to trap on alerts of interest that are surfaced to the hosts.
Gather long-term statistics through tools, such as VEHSTATS and BVIR, to retain data for trending.
Periodically monitor your environment through host commands, tools, or the management interface for any existing issues.
Analyze any changes in the workload profile or in the behavior of the grid environment to ensure that the overall configuration operates as expected and to determine whether changes must be made.
3.5.1 Handling messages
With each new feature or microcode release, new messages might be introduced. Usually, they are described in the PTF description or mentioned in the messages and codes books.
Identify all new messages for the TS7700 (usually CBRxxxx) and review them. Evaluate the meanings to understand how they relate to your business.
Identify the appropriate action that an operator or your automation tool must take. Introduce the new messages into your automation tool (with the appropriate action) or route them for human intervention.
3.5.2 Regularly scheduled performance monitoring
Regularly scheduled performance monitoring enables you to perform these tasks:
See trends in your workload profile and react before shortages occur
Identify arising performance issues
Provide a database with comparison data in case you need it for a performance issue
The TS7700 keeps performance data for the last 90 days. You need to store this information only if you want a longer period for trend analysis or as baseline performance numbers. In that case, set up regular Bulk Volume Information Retrieval (BVIR) runs and keep the data. Check this data on a regular basis to see usage trends, especially for shortage conditions.
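As an example of the kind of trend check that can be automated on top of this data, the following Python sketch flags when a simple linear trend over periodic cache utilization samples would cross a shortage threshold within the next quarter. It is entirely illustrative and does not parse real BVIR or VEHSTATS output; the sample values are made up.

# Illustrative trend check on periodic utilization samples (for example, weekly
# cache utilization percentages extracted from your BVIR or VEHSTATS history).

def weeks_until_threshold(samples, threshold=90.0):
    """Fit a simple linear trend and estimate weeks until the threshold is reached."""
    n = len(samples)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
             / sum((x - mean_x) ** 2 for x in xs))
    if slope <= 0:
        return None  # utilization is flat or shrinking
    return (threshold - samples[-1]) / slope

weekly_cache_pct = [61.0, 62.5, 64.0, 66.0, 67.5, 69.5]  # made-up weekly samples
weeks = weeks_until_threshold(weekly_cache_pct)
if weeks is not None and weeks < 13:
    print(f"Warning: cache utilization may reach 90% in about {weeks:.0f} weeks.")
else:
    print("No near-term cache shortage indicated by the trend.")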
3.5.3 Optionally: Check your data
In addition to gathering performance numbers, the TS7700 environment also allows you to run audits against the TS7700 environment. These audits verify whether all clusters have received copies of the logical volumes; they can optionally take the configured copy policies into account, or ignore them and assume that all volumes must be copied everywhere.
Consider running audit trails after major changes in the environment, such as joins, merges, or removal of clusters. Also, you can run audit trails if you have introduced new Copy Consistency Policies or override policies.