Cluster partitioning management update
From Version 7.1 forward, PowerHA SystemMirror provides more split and merge policies. Split and merge policies are important features in PowerHA SystemMirror because they protect customers’ data consistency and keep applications running stably through cluster split scenarios and other unstable situations. They are vital for customer environments.
This chapter describes split and merge policies.
This chapter covers the following topics:
10.1 Introduction to cluster partitioning
During normal operation, cluster nodes regularly exchange messages, commonly called heartbeats, to determine the health of each other. Figure 10-1 depicts a healthy two-node PowerHA cluster.
Figure 10-1 A healthy two-node PowerHA Cluster with heartbeat messages exchanged
When both the active and backup nodes fail to receive heartbeat messages, each node falsely declares the other node to be down, as shown in Figure 10-2. When this happens, the backup node attempts to take over the shared resources, including shared data volumes. As a result, both nodes might write to the shared data and cause data corruption.
Figure 10-2 Cluster that is partitioned when nodes fail to communicate through heartbeat message exchange
When a set of nodes fails to communicate with the remaining set of nodes in a cluster, the cluster is said to be partitioned. This is also known as node isolation, or more commonly, split brain.
 
Note: As two-node clusters are by far the most common PowerHA cluster configuration, we introduce cluster partitioning concepts in the following sections in the context of a two-node cluster. These basic concepts can be applied similarly to clusters with more than two nodes and are further elaborated where necessary.
10.1.1 Causes of a partitioned cluster
Loss of all heartbeats can be caused by one of the following situations:
When all communication paths between the nodes fail (as shown in Figure 10-2 on page 316).
Here is an example scenario based on a real-world experience:
a. A cluster had two communication paths for heartbeat, the network and repository disk. The PowerHA network heartbeat mode was configured as multicast.
b. One day, a network configuration change was made that disabled the multicast network communication. As a result, network heartbeating no longer worked. But, system administrators were unaware of this problem because they did not monitor the PowerHA network status. The network heartbeat failure was left uncorrected.
c. The cluster continued to operate with heartbeat through the repository disk.
d. Some days later, the repository disk failed and the cluster was partitioned.
One of the nodes is sick but not dead.
One node cannot send or receive heartbeat messages for a period, but resumes sending and receiving heartbeat messages afterward.
Another possible scenario is:
a. There is a cluster with nodes in separate physical hosts with dual Virtual I/O Servers (VIOSs).
b. Due to some software or firmware defect, one node cannot perform I/O through the VIOSs for a period but resumes I/O afterward. This causes an intermittent loss of heartbeats through all communication paths between the nodes.
c. When the duration of I/O freeze exceeds the node failure detection time, the nodes declare each other as down and the cluster is partitioned.
Although increasing the number of communication paths for heartbeating can minimize the occurrence of cluster partitioning due to communication path failure, the possibility cannot be eliminated completely.
10.1.2 Terminology
Here is the terminology that is used throughout this chapter:
Cluster split When the nodes in a cluster fail to communicate with each other for a period, each node declares the other node as down. The cluster is split into partitions. A cluster split is said to have occurred.
Split policy A PowerHA split policy defines the behavior of a cluster when a cluster split occurs.
Cluster merge When the communication between the partitions of a split cluster is restored, the partitions attempt to rejoin each other into a single cluster. A cluster merge is said to have occurred.
Merge policy A PowerHA merge policy defines the behavior of a cluster when a cluster merge occurs.
Quarantine policy A PowerHA quarantine policy defines how a standby node isolates or quarantines an active node or partition from the shared data to prevent data corruption when a cluster split occurs.
Critical resource group When multiple resource groups (RGs) are configured in a cluster, the RG that is considered as most important or critical to the user is defined as the Critical Resource Group for a quarantine policy. For more information, see 10.3.1, “Active node halt quarantine policy” on page 328.
Standard cluster A standard cluster is a traditional PowerHA cluster.
Stretched cluster A stretched cluster is a PowerHA V7 cluster with nodes that are in sites within the same geographic location. All cluster nodes are connected to the same active and backup repository disks in a common storage area network (SAN).
Linked cluster A linked cluster is a PowerHA V7 cluster with nodes that are in sites in different geographic locations. Nodes in each site have their own active and backup repository disks. The active repository disks in the two sites are kept in sync by Cluster Aware AIX (CAA).
10.2 PowerHA cluster split and merge policies (before PowerHA V7.2.1)
This section provides an introduction to PowerHA split and merge policies before PowerHA for AIX V7.2.1.
For more information, see the following IBM Redbooks:
IBM PowerHA SystemMirror for AIX Cookbook, SG24-7739
IBM PowerHA SystemMirror V7.2 for IBM AIX Updates, SG24-8278
10.2.1 Split policy
Before PowerHA V7.1, when a cluster split occurred, the backup node tried to take over the resources of the primary node, which resulted in a split-brain situation.
The PowerHA split policy was first introduced in PowerHA V7.1.3 with two options:
None
This is the default option where the primary and backup nodes operate independently of each other after a split occurs, resulting in the same behavior as earlier versions during a split-brain situation.
Tie breaker
This option is applicable only to clusters with sites configured. When a split occurs, the partition that fails to acquire the SCSI reservation on the tie-breaker disk has its nodes restarted. For a two-node cluster, one node is restarted, as shown in Figure 10-3 on page 319.
 
Note: EMC PowerPath disks are not supported as tie-breaker disks.
Figure 10-3 Disk tie-breaker split policy
PowerHA V7.2 added the following options to the split policy:
Manual option
Initially, this option was applicable only to linked clusters; starting with PowerHA V7.2.1, it is available for all cluster types. When a split occurs, each node waits for input from the user at the console to choose whether to continue running cluster services or restart the node.
NFS support for the tie-breaker option
When a split occurs, the partition that fails to acquire a lock on the tie-breaker NFS file has its nodes restarted. For a two-node cluster, one node is restarted, as shown in Figure 10-4.
Figure 10-4 NFS tie-breaker split policy
 
Note: PowerHA V7.2.1 running on (or migrated to) AIX 7.2.1 supports the split and merge functions for all types of PowerHA clusters.
10.2.2 Merge policy
Before PowerHA V7.1, the default action when a merge occurs is to halt one of the nodes based on a predefined algorithm, such as halting the node with the highest node ID. There is no guarantee that the active node is not the one that is halted. The intention is to minimize the possibility of data corruption after a split-brain situation occurs.
The PowerHA merge policy was first introduced in PowerHA V7.1.3 with two options:
Majority
This is the default option. The partition with the highest number of nodes remains online. If each partition has the same number of nodes, then the partition that has the lowest node ID is chosen. The partition that does not remain online is restarted, as specified by the chosen action plan. This behavior is similar to previous versions, as shown in Figure 10-5.
Figure 10-5 Default merge policy: Halt one of the nodes
Tie breaker
Each partition attempts to acquire a SCSI reserve on the tie-breaker disk. The partition that cannot reserve the disk is restarted, or has its cluster services restarted, as specified by the chosen action plan. If this option is selected, the split-policy configuration must also use the tie-breaker option.
PowerHA V7.2 added the following options to the merge policy:
Manual option
This option is applicable only to linked clusters. When a merge occurs, each node waits for input from the user at the console to choose whether to continue running cluster services or restart the node.
Priority option
This policy indicates that the highest priority site continues to operate when a cluster merge event occurs. The sites are assigned a priority based on the order in which they are listed in the site list. The first site in the site list is the highest priority site. This policy is available only for linked clusters.
NFS support for the tie-breaker option
When a merge occurs, the partition that fails to acquire a lock on the tie-breaker NFS file has its nodes restarted. If this option is selected, the split-policy configuration must also use the tie-breaker option.
10.2.3 Configuration for the split and merge policy
Complete the following steps:
1. In the SMIT interface, select Custom Cluster Configuration → Cluster Nodes and Networks → Initial Cluster Setup (Custom) → Configure Cluster Split and Merge Policy → Split Management Policy, as shown in Figure 10-6.
Figure 10-6 Configuring the cluster split and merge policy
2. Select TieBreaker. The manual option is not available if the cluster you are configuring is not a linked cluster. Select either Disk or NFS as the tie breaker, as shown in Figure 10-7.
Figure 10-7 Selecting the tie-breaker disk
Disk tie-breaker split and merge policy
Select the disk to be used as the tie-breaker disk and synchronize the cluster. Figure 10-8 shows selecting hdisk3 as the tie-breaker device.
Figure 10-8 Tie-breaker disk split policy
Figure 10-9 shows the result after confirming the configuration.
Figure 10-9 Tie-breaker disk successfully added
Before configuring a disk as a tie breaker, you can check its current reservation policy by using the AIX command devrsrv, as shown in Example 10-1.
Example 10-1 The devrsrv command shows no reserve
root@testnode1[/]# devrsrv -c query -l hdisk3
Device Reservation State Information
==================================================
Device Name : hdisk3
Device Open On Current Host? : NO
ODM Reservation Policy : NO RESERVE
Device Reservation State : NO RESERVE
When cluster services are started for the first time after a disk tie breaker is configured on a node, the reservation policy of the tie-breaker disk is set to PR_exclusive with a persistent reserve key, as shown in Example 10-2.
Example 10-2 The devrsrv command shows PR_exclusive
root@testnode1[/]# devrsrv -c query -l hdisk3
Device Reservation State Information
==================================================
Device Name : hdisk3
Device Open On Current Host? : NO
ODM Reservation Policy : PR EXCLUSIVE
ODM PR Key Value : 8477804151029074886
Device Reservation State : NO RESERVE
Registered PR Keys : No Keys Registered
PR Capabilities Byte[2] : 0x15 CRH ATP_C PTPL_C
PR Capabilities Byte[3] : 0xa1 PTPL_A
PR Types Supported : PR_WE_AR PR_EA_RO PR_WE_RO PR_EA PR_WE PR_EA_AR
For a detailed description of how SCSI-3 PR (Persistent Reserve) of a tie-breaker disk works, refer to “SCSI reservation” in Appendix A of the IBM PowerHA SystemMirror V7.2 for IBM AIX Updates, SG24-8278.
When the Tie Breaker option of the split policy is selected, the merge policy is automatically set with the same tie-breaker option.
NFS tie-breaker split and merge policy
This section describes the NFS tie-breaker split and merge policy tasks.
NFS server that is used for a tie breaker
The NFS server that is used for the tie breaker is connected to a physical network other than the service networks that are configured in PowerHA. A logical choice is the management network that usually exists in all data center environments.
To configure the NFS server, complete the following steps:
1. Add /etc/hosts entries for the cluster nodes, for example:
172.16.25.31 testnode1
172.16.15.32 testnode2
2. Configure the NFS domain by running the following command:
chnfsdom powerha
3. Start nfsrgyd by running the following command:
startsrc –s nfsrgyd
4. Add an NFS file system for storing the tie-breaker files, as shown in Figure 10-10.
Figure 10-10 Adding a directory for NFS export
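The SMIT panel in Figure 10-10 results in an export entry on the NFS server. A hand-edited equivalent in /etc/exports (a sketch that is based on the exports shown in Example 10-3; verify the option syntax against your AIX level) looks like the following:

```
# /etc/exports entry on the NFS server (sketch):
# export the per-cluster tie-breaker directory over NFSv4, with sys
# security, read/write access, and root access for the cluster nodes
/tiebreakers/redbookcluster -vers=4,sec=sys,rw,root=testnode1:testnode2
```

After editing /etc/exports manually, run exportfs -a to export the new directory.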
Here the NFS server is used as tie breaker for two clusters, redbookcluster and RBcluster, as shown in Example 10-3 on page 325.
Example 10-3 Directories exported
[root@atsnim:/]#exportfs
/software -vers=3,public,sec=sys:krb5p:krb5i:krb5:dh,rw
/pha -vers=3:4,sec=sys:krb5p:krb5i:krb5:dh,rw
/docs -vers=3,public,sec=sys:krb5p:krb5i:krb5:dh,rw
/sybase -sec=sys:krb5p:krb5i:krb5:dh,rw,root=172.16.0.0
/leilintemp -sec=sys:none,rw
/powerhatest -sec=sys:krb5p:krb5i:krb5:dh,rw,root=testnode1
/tiebreakers/redbookcluster -vers=4,sec=sys,rw,root=testnode1:testnode2
/tiebreakers/RBcluster -vers=4,sec=sys,rw,root=testnode3:testnode4
On each PowerHA node
Complete the following tasks:
1. Add an entry for the NFS server to /etc/hosts:
10.1.1.3 tiebreaker
2. Configure the NFS domain by running the following command:
chnfsdom powerha
3. Start nfsrgyd by running the following command:
startsrc –s nfsrgyd
4. Add the NFS tie-breaker directory to be mounted, as shown in Figure 10-11.
Figure 10-11 NFS directory to mount
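Equivalently to the SMIT panel in Figure 10-11, the NFS mount can be predefined in a stanza in /etc/filesystems on each node. The following is a sketch that uses the server host name and export path from the earlier examples (PowerHA performs the actual mount when the policy is active, so the stanza sets mount = false):

```
/tiebreaker:
        dev             = /tiebreakers/redbookcluster
        vfs             = nfs
        nodename        = tiebreaker
        mount           = false
        options         = vers=4
```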
Configuring PowerHA on one of the PowerHA nodes
Complete the following steps:
1. Configure the PowerHA tie-breaker split/merge policy, as shown in Figure 10-12.
Figure 10-12 NFS tie-breaker split policy
a. Enter the host name of the NFS server that exports the tie-breaker directory, for example, tiebreaker. Add the IP entry for this host name to /etc/hosts on each node.
b. Enter the full path name of the local mount point for mounting the NFS tie-breaker directory, for example, /tiebreaker.
c. Enter the full path name of the directory that is exported from the NFS server, in this case /tiebreaker.
Figure 10-13 on page 327 shows an example of the NFS tie-breaker configuration.
Figure 10-13 NFS tie-breaker configuration
2. Synchronize the cluster.
When cluster services are started on each node, tie-breaker files are created on the NFS server, as shown in Example 10-4.
Example 10-4 NFS tie-breaker files created
[root@tiebreaker:/]#ls -Rl /tiebreakers
total 0
drwxr-xr-x 3 root system 256 Nov 23 20:50 RBcluster
drwxr-xr-x 2 root system 256 May 27 2016 lost+found
drwxr-xr-x 3 root system 256 Nov 23 20:51 redbookcluster
 
/tiebreakers/RBcluster:
total 0
-rwx------ 1 root system 0 Nov 23 20:50 PowerHA_NFS_Reserve
drwxr-xr-x 2 root system 256 Nov 23 20:50 PowerHA_NFS_ReserveviewFilesDir
 
/tiebreakers/RBcluster/PowerHA_NFS_ReserveviewFilesDir:
total 16
-rwx------ 1 root system 257 Nov 23 20:50 testnode3view
-rwx------ 1 root system 257 Nov 23 20:50 testnode4view
 
/tiebreakers/redbookcluster:
total 0
-rwx------ 1 root system 0 Nov 23 20:51 PowerHA_NFS_Reserve
drwxr-xr-x 2 root system 256 Nov 23 20:51 PowerHA_NFS_ReserveviewFilesDir
 
/tiebreakers/redbookcluster/PowerHA_NFS_ReserveviewFilesDir:
total 16
-rwx------ 1 root system 257 Nov 23 20:51 testnode1view
-rwx------ 1 root system 257 Nov 23 20:51 testnode2view
10.3 PowerHA quarantine policy
This section introduces the PowerHA quarantine policy. For more information about PowerHA quarantine policies, see IBM PowerHA SystemMirror V7.2 for IBM AIX Updates, SG24-8278.
Quarantine policies were first introduced in PowerHA V7.2. A quarantine policy isolates the previously active node that was hosting a critical RG after a cluster split event or node failure occurs. The quarantine policy ensures that application data is not corrupted or lost.
There are two quarantine policies:
1. Active node halt
2. Disk fencing
10.3.1 Active node halt quarantine policy
When an RG is online on a cluster node, the node is said to be the active node for that RG. The backup or standby node for the RG is a cluster node where the RG comes online when the active node fails or when the RG is manually moved over.
With the active node halt policy (ANHP), in the event of a cluster split, the standby node for a critical RG attempts to halt the active node before taking over the RG and any other related RGs. This task is done by issuing a command to the HMC, as shown in Figure 10-14.
Figure 10-14 Active node halt process
If the backup node fails to halt the active node, for example, because of a communication failure with the HMC, the RG is not taken over. This policy prevents application data corruption due to the same RGs being online on more than one node at the same time.
Now, let us elaborate why we need to define a critical RG.
In the simplest configuration of a two-node cluster with one RG, there is no ambiguity as to which node can be halted by the ANHP in the event of a cluster split. But, when there are multiple RGs in a cluster, it is not as simple:
In a mutual takeover cluster configuration, different RGs are online on each cluster node and the nodes back up each other. An active node for one RG also is a backup or standby node for another RG. When a cluster split occurs, which node halts?
When a cluster with multiple nodes and RGs is partitioned or split, some of the nodes in each partition might have RGs online; that is, there are active nodes in each partition. Which partition can have its nodes halted?
Having nodes halt one another, bringing down the cluster as a whole, is not wanted.
PowerHA V7.2 introduces the Critical Resource Group so that a user can define which RG is the most important one when multiple RGs are configured. The ANHP can then use the critical RG to determine which node is halted or restarted. The node or the partition with the critical RG online is halted or restarted and quarantined, as shown in Figure 10-14 on page 328.
10.3.2 Disk fencing quarantine
With this policy, the backup node fences off the active node from the shared disks before taking over the active node’s resources, as shown in Figure 10-15. This action prevents application data corruption by preventing the RG from coming online on more than one node at a time. As with the ANHP, the user must also define the Critical Resource Group for this policy.
Because this policy only fences off disks from the active node without halting or restarting it, it is configured together with a split and merge policy.
Figure 10-15 Disk fencing quarantine
10.3.3 Configuration of quarantine policies
In the SMIT interface, select Custom Cluster Configuration → Cluster Nodes and Networks → Initial Cluster Setup (Custom) → Configure Cluster Split and Merge Policy → Quarantine Policy, as shown in Figure 10-16.
Figure 10-16 Active node halt policy
Active node halt
This task consists of the following steps:
1. Configure the HMC so that the cluster nodes can run HMC commands remotely without the need to specify a password.
2. Add the public keys (id_rsa.pub) of cluster nodes to the authorized_keys2 in the .ssh directory on the HMC.
3. Configure the HMC to be used for halting nodes when the split occurs, as shown in Figure 10-17, Figure 10-18, and Figure 10-19 on page 332.
Figure 10-17 Active node halt policy HMC configuration
Figure 10-18 HMC definition for active node halt policy
Figure 10-19 Add HMC for active node halt policy
4. Configure the ANHP and specify the Critical Resource Group, as shown in Figure 10-20, Figure 10-21 on page 333, and Figure 10-22 on page 333.
Figure 10-20 Configure active node halt policy
Figure 10-21 Critical resource group for active node halt policy
Figure 10-22 Critical resource group add success
Disk fencing
Similar to the ANHP, a critical RG must be selected to go along with it, as shown in Figure 10-23 and Figure 10-24.
Figure 10-23 Disk fencing quarantine policy
Figure 10-24 Disk fencing critical resource group
The current setting of the quarantine policy can be checked by using clmgr, as shown in Example 10-5.
Example 10-5 The clmgr command displaying the current quarantine policy
root@testnode1[/]#clmgr query cluster | grep -i quarantine
QUARANTINE_POLICY="fencing"
 
Important: The disk fencing quarantine policy cannot be enabled or disabled if cluster services are active.
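Because the policy cannot be changed while cluster services are active, the change is typically made with the cluster stopped. The following command-line sketch is an assumption: the QUARANTINE_POLICY attribute name is taken from the clmgr query output in Example 10-5, and you should verify that your clmgr level accepts it on modify:

```
# stop cluster services on all nodes before changing the policy
clmgr stop cluster
# switch the quarantine policy to disk fencing (attribute name assumed
# from the clmgr query output; verify on your clmgr level)
clmgr modify cluster QUARANTINE_POLICY=fencing
# propagate the change across the cluster
clmgr sync cluster
```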
When cluster services are started on a node after enabling the Disk Fencing quarantine policy, the reservation policy and state of the shared volumes are set to PR Shared with the PR keys of both nodes registered. This action can be observed by using the devrsrv command, as shown in Example 10-6.
Example 10-6 Query reservation policy
root@testnode3[/]#clmgr query cluster | grep -i cluster_name
CLUSTER_NAME="RBcluster"
 
root@testnode3[/]#clmgr query nodes
testnode4
testnode3
 
root@testnode3[/]#clmgr query resource_group
rg
root@testnode3[/]#clmgr query resource_group rg | grep -i volume
VOLUME_GROUP="vg1"
root@testnode3[/]#lspv
hdisk0 00f8806f26239b8c rootvg active
hdisk2 00f8806f909bc31a caavg_private active
hdisk3 00f8806f909bc357 vg1 concurrent
hdisk4 00f8806f909bc396 vg1 concurrent
 
root@testnode3[/]#clRGinfo
-----------------------------------------------------------------------------
Group Name State Node
-----------------------------------------------------------------------------
rg ONLINE testnode3
OFFLINE testnode4
 
root@testnode3[/]#devrsrv -c query -l hdisk3
Device Reservation State Information
==================================================
Device Name : hdisk3
Device Open On Current Host? : YES
ODM Reservation Policy : PR SHARED
ODM PR Key Value : 4503687425852313
Device Reservation State : PR SHARED
PR Generation Value : 15
PR Type : PR_WE_AR (WRITE EXCLUSIVE, ALL REGISTRANTS)
PR Holder Key Value : 0
Registered PR Keys : 4503687425852313 9007287053222809
PR Capabilities Byte[2] : 0x15 CRH ATP_C PTPL_C
PR Capabilities Byte[3] : 0xa1 PTPL_A
PR Types Supported : PR_WE_AR PR_EA_RO PR_WE_RO PR_EA PR_WE PR_EA_AR
 
root@testnode3[/]#devrsrv -c query -l hdisk4
Device Reservation State Information
==================================================
Device Name : hdisk4
Device Open On Current Host? : YES
ODM Reservation Policy : PR SHARED
ODM PR Key Value : 4503687425852313
Device Reservation State : PR SHARED
PR Generation Value : 15
PR Type : PR_WE_AR (WRITE EXCLUSIVE, ALL REGISTRANTS)
PR Holder Key Value : 0
Registered PR Keys : 4503687425852313 9007287053222809
PR Capabilities Byte[2] : 0x15 CRH ATP_C PTPL_C
PR Capabilities Byte[3] : 0xa1 PTPL_A
PR Types Supported : PR_WE_AR PR_EA_RO PR_WE_RO PR_EA PR_WE PR_EA_AR
 
root@testnode4[/]#lspv
hdisk0 00f8806f26239b8c rootvg active
hdisk2 00f8806f909bc31a caavg_private active
hdisk3 00f8806f909bc357 vg1 concurrent
hdisk4 00f8806f909bc396 vg1 concurrent
 
root@testnode4[/]#devrsrv -c query -l hdisk3
Device Reservation State Information
==================================================
Device Name : hdisk3
Device Open On Current Host? : YES
ODM Reservation Policy : PR SHARED
ODM PR Key Value : 9007287053222809
Device Reservation State : PR SHARED
PR Generation Value : 15
PR Type : PR_WE_AR (WRITE EXCLUSIVE, ALL REGISTRANTS)
PR Holder Key Value : 0
Registered PR Keys : 4503687425852313 9007287053222809
PR Capabilities Byte[2] : 0x15 CRH ATP_C PTPL_C
PR Capabilities Byte[3] : 0xa1 PTPL_A
PR Types Supported : PR_WE_AR PR_EA_RO PR_WE_RO PR_EA PR_WE PR_EA_AR
 
root@testnode4[/]#devrsrv -c query -l hdisk4
Device Reservation State Information
==================================================
Device Name : hdisk4
Device Open On Current Host? : YES
ODM Reservation Policy : PR SHARED
ODM PR Key Value : 9007287053222809
Device Reservation State : PR SHARED
PR Generation Value : 15
PR Type : PR_WE_AR (WRITE EXCLUSIVE, ALL REGISTRANTS)
PR Holder Key Value : 0
Registered PR Keys : 4503687425852313 9007287053222809
PR Capabilities Byte[2] : 0x15 CRH ATP_C PTPL_C
PR Capabilities Byte[3] : 0xa1 PTPL_A
PR Types Supported : PR_WE_AR PR_EA_RO PR_WE_RO PR_EA PR_WE PR_EA_AR
The PR Shared reservation policy uses the SCSI-3 reservation of type WRITE EXCLUSIVE, ALL REGISTRANTS, as shown in Example 10-7 on page 337. Only nodes that are registered can write to the shared volumes. When a cluster split occurs, the standby node ejects the PR registration of the active node on all shared volumes of the affected RGs. In Example 10-7, the only registrations that are left on hdisk3 and hdisk4 are those of testnode4, effectively fencing off testnode3 from the shared volumes.
 
Note: Only a registered node can eject the registration of other nodes.
Example 10-7 WRITE EXCLUSIVE, ALL REGISTRANTS PR type
root@testnode4[/]#devrsrv -c query -l hdisk3
Device Reservation State Information
==================================================
Device Name : hdisk3
Device Open On Current Host? : YES
ODM Reservation Policy : PR SHARED
ODM PR Key Value : 9007287053222809
Device Reservation State : PR SHARED
PR Generation Value : 15
PR Type : PR_WE_AR (WRITE EXCLUSIVE, ALL REGISTRANTS)
PR Holder Key Value : 0
Registered PR Keys : 9007287053222809
PR Capabilities Byte[2] : 0x15 CRH ATP_C PTPL_C
PR Capabilities Byte[3] : 0xa1 PTPL_A
PR Types Supported : PR_WE_AR PR_EA_RO PR_WE_RO PR_EA PR_WE PR_EA_AR
 
root@testnode4[/]#devrsrv -c query -l hdisk4
Device Reservation State Information
==================================================
Device Name : hdisk4
Device Open On Current Host? : YES
ODM Reservation Policy : PR SHARED
ODM PR Key Value : 9007287053222809
Device Reservation State : PR SHARED
PR Generation Value : 15
PR Type : PR_WE_AR (WRITE EXCLUSIVE, ALL REGISTRANTS)
PR Holder Key Value : 0
Registered PR Keys : 9007287053222809
PR Capabilities Byte[2] : 0x15 CRH ATP_C PTPL_C
Node testnode3 is registered again on hdisk3 and hdisk4 after it successfully rejoins testnode4 to form a cluster. To rejoin, you must restart cluster services on testnode3.
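When scripting health checks around disk fencing, a node's fencing state can be inferred from whether its PR key is still listed in the devrsrv output. The following helper function is our own sketch (not part of PowerHA or AIX); it parses devrsrv -c query output that is piped into it:

```shell
# pr_key_state: read `devrsrv -c query -l hdiskN` output on stdin and
# report whether the given PR key is still registered on the disk.
pr_key_state() {
    key="$1"
    # the "Registered PR Keys" line lists every key that is registered
    if grep '^Registered PR Keys' | grep -qw "$key"; then
        echo "REGISTERED"    # the node can still write to the disk
    else
        echo "EJECTED"       # the node has been fenced off
    fi
}

# hypothetical usage on a live node:
#   devrsrv -c query -l hdisk3 | pr_key_state 4503687425852313
```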
10.4 Changes in split and merge policies in PowerHA V7.2.1
This section provides a list of changes that are associated with the split and merge policies that are introduced in PowerHA V7.2.1 for AIX 7.2.1:
Split and merge policies are configurable for all cluster types when AIX is at Version 7.2.1, as summarized in Table 10-1.
Table 10-1 Split and merge policies for all cluster types
Cluster type  Split policy (pre AIX 7.2.1)   Merge policy (pre AIX 7.2.1)      Split and merge policy (AIX 7.2.1)
Standard      Not supported                  Not supported                     None-Majority, TB (Disk)-TB (Disk), TB (NFS)-TB (NFS), Manual-Manual
Stretched     None, TieBreaker               Majority, TieBreaker              None-Majority, TB (Disk)-TB (Disk), TB (NFS)-TB (NFS), Manual-Manual
Linked        None, TieBreaker, Manual       Majority, TieBreaker, Manual      None-Majority, TB (Disk)-TB (Disk), TB (NFS)-TB (NFS), Manual-Manual
Split and merge policies are configured as a whole instead of separately. These options can also vary slightly based on the exact AIX level.
The action plan for the split and merge policy is configurable.
An entry is added to the Problem Determination Tools menu for starting cluster services on a merged node after a cluster split.
Changes were added to clmgr for configuring the split and merge policy.
10.4.1 Configuring the split and merge policy by using SMIT
The split and merge policies are now configured as a whole, as shown in Figure 10-25, instead of separately, as described in 10.2.3, “Configuration for the split and merge policy” on page 321.
Figure 10-25 Configuring the split handling policy
All three options, None, Tie Breaker, and Manual, are now available for all cluster types, which includes standard, stretched, and linked clusters.
Before PowerHA V7.2.1, the split policy had a default setting of None, the merge policy had a default setting of Majority, and the default action was Reboot (Figure 10-26). This behavior has not changed.
Figure 10-26 Split and merge action plan menu
For the Tie Breaker option, the action plan for split and merge is now configurable as follows (Figure 10-27):
Reboot.
This is the default option, and it was the only behavior before the action plan became configurable. The nodes of the losing partition are restarted when a cluster split occurs.
Disable applications auto-start and reboot.
On a split event, the nodes on the losing partition are restarted, and the RGs are not brought online automatically after the restart.
Disable Cluster Services Auto-Start and Reboot.
Upon a split event, the nodes on the losing partition are restarted. The cluster services (CAA/RSCT/PowerHA) are not started after the restart. After the split condition is healed, select Start CAA on Merged Node from SMIT to enable the cluster services and bring the cluster to a stable state.
 
Note: If you specify the Split-Merge policy as None-None, the action plan is not implemented and a restart does not occur after the cluster split and merge events. This option is available only if your environment is running IBM AIX 7.2 with Technology Level 1 or later.
Figure 10-27 Disk tie breaker split and merge action plan
Similarly, Figure 10-28 shows the NFS TieBreaker policy SMIT window.
Figure 10-28 NFS tie breaker split and merge action plan
10.4.2 Configuring the split and merge policy by using clmgr
The clmgr utility has the following changes for the split and merge policy configuration (Figure 10-29):
Added a none option to the merge policy.
There is a local and remote quorum directory.
Added disable_rgs_autostart and disable_cluster_services_autostart options to the action plan.
Figure 10-29 The clmgr split and merge options
The Split/Merge policy of none/none can be configured only by using clmgr, as shown in Example 10-8. There is no SMIT option to configure this option.
Example 10-8 The clmgr modify split/merge policy to none
# clmgr modify cluster SPLIT_POLICY=none MERGE_POLICY=none
The PowerHA SystemMirror split and merge policies have been updated.
Current policies are:
Split Handling Policy : None
Merge Handling Policy : None
Split and Merge Action Plan : Reboot
The configuration must be synchronized to make this change known across the cluster.
10.4.3 Starting cluster services after a split
If the split-merge action plan of disabling cluster services auto-start is chosen in the configuration, then on a split event the losing partition nodes are restarted without bringing the cluster services online until the services are manually enabled.
You must enable the cluster services after a split situation is healed. Until you do so, the cluster services are not running on the losing partition nodes even after the networks rejoin. After the re-enablement, the losing partition nodes join the existing CAA cluster. To enable the services, run smitty sysmirror and select Problem Determination Tools → Start CAA on Merged Node, as shown in Figure 10-30.
Figure 10-30 Starting Cluster Aware AIX on the merged node
10.4.4 Migration and limitation
Multiple split or merge situations cannot be handled at one time. For example, in the case of an asymmetric topology (AST), where some nodes have visibility to both islands, the nodes do not form a clean split. In such cases, a split event is not generated when AST halts a node to correct the asymmetry.
With the NFS tiebreaker split policy configured, if the tie breaker group leader (TBGL) node is restarted, then all other nodes in the winning partition are restarted. No preemption is supported in this case.
Tie-breaker disk preemption does not work in the case of a TBGL hard restart or power off.
The merge events are not available in a stretched cluster with versions earlier than AIX 7.2.1, as shown in Figure 10-31.
Figure 10-31 Split merge policies pre- and post-migration
10.5 Considerations for using split and merge quarantine policies
A split and merge policy is used for deciding which node or partition can be restarted when a cluster split occurs. A quarantine policy is used for fencing off, or quarantining, the active node from shared disks when a cluster split occurs. Both types of policies are designed to prevent data corruption in the event of cluster partitioning.
The quarantine policy does not require additional infrastructure resources, but the split and merge policy does. Select the appropriate policy or combination of policies that suits your data center environment.
For example, instead of using the disk tie-breaker split and merge policy, which requires one tie-breaker disk per cluster, you might want to use a single NFS server as a tie breaker for multiple clusters (Figure 10-32) to minimize resource requirements. This is a tradeoff between resources and effectiveness.
Figure 10-32 Using a single NFS server as a tie breaker for multiple clusters
If you want only to prevent the possibility of data corruption with minimal configuration, and can accept the manual intervention that might be required in the event of a cluster split, you can use the disk fencing quarantine policy. Again, this is a tradeoff. Figure 10-33 presents a comparison summary of these policies.
Figure 10-33 Comparison summary of split and merge policies
10.6 Split and merge policy testing environment
Figure 10-34 shows the topology of testing scenarios in this chapter.
Figure 10-34 Testing scenario for the split and merge policy
Our testing environment is a single PowerHA standard cluster. It includes two AIX LPARs with the host names PHA170 and PHA171. Each node has two network interfaces: one is used for communication with the HMCs and the NFS server, and the other is used in the PowerHA cluster. Each node has three FC adapters: the first is used for rootvg, the second for shared user data access, and the third for tie-breaker access.
The PowerHA cluster is a basic configuration with the specific configuration option for different split and merge policies.
10.6.1 Basic configuration
Table 10-2 shows the PowerHA cluster’s attributes. This is a basic two-node PowerHA standard cluster.
Table 10-2 PowerHA cluster’s configuration
Component
PHA170
PHA171
Cluster name
PHA_cluster
Cluster type: Standard Cluster or No Site Cluster (NSC)
Network interface
en0: 172.16.51.170
Netmask: 255.255.255.0
Gateway: 172.16.51.1
en1: 172.16.15.242
en0: 172.16.51.171
Netmask: 255.255.255.0
Gateway: 172.16.51.1
en1: 172.16.15.243
Network
net_ether_01 (172.16.51.0/24)
CAA
Unicast
Repository disk: hdisk1
Shared VG
sharevg:hdisk2
Service IP
172.16.51.172 PHASvc
Resource Group
Resource Group testRG:
Startup Policy: Online On Home Node Only
Fallover Policy: Fallover To Next Priority Node In The List
Fallback Policy: Never Fallback
Participating Nodes: PHA170 PHA171
Service IP Label: PHASvc
Volume Group: sharevg
10.6.2 Specific hardware configuration for some scenarios
This section describes the specific hardware configurations for some scenarios.
Split and merge policy is tie breaker (disk)
In this scenario, add one shared disk, hdisk3, to act as the tie breaker.
Split and merge policy is tie breaker (NFS)
In this scenario, add one Network File System (NFS) server to act as the tie breaker.
Quarantine policy is active node halt policy
In this scenario, add two HMCs that are used to shut down the relevant LPARs in case of a cluster split scenario.
The following sections contain the detailed PowerHA configuration of each scenario.
10.6.3 Initial PowerHA service status for each scenario
Each scenario starts from the same PowerHA and CAA service status. We show that status once in this section instead of repeating it in each scenario.
PowerHA configuration
Example 10-9 shows the basic PowerHA configuration as displayed by the cltopinfo command.
Example 10-9 PowerHA basic configuration that is shown with the cltopinfo command
# cltopinfo
Cluster Name: PHA_Cluster
Cluster Type: Standard
Heartbeat Type: Unicast
Repository Disk: hdisk1 (00fa2342a1093403)
 
There are 2 node(s) and 1 network(s) defined
 
NODE PHA170:
Network net_ether_01
PHASvc 172.16.51.172
PHA170 172.16.51.170
 
NODE PHA171:
Network net_ether_01
PHASvc 172.16.51.172
PHA171 172.16.51.171
 
Resource Group testRG
Startup Policy Online On Home Node Only
Fallover Policy Fallover To Next Priority Node In The List
Fallback Policy Never Fallback
Participating Nodes PHA170 PHA171
Service IP Label PHASvc
Volume Group sharevg
PowerHA service
Example 10-10 shows the PowerHA nodes status from each PowerHA node.
Example 10-10 PowerHA nodes status in each scenario before a cluster split
# clmgr -cv -a name,state,raw_state query node
# NAME:STATE:RAW_STATE
PHA170:NORMAL:ST_STABLE
PHA171:NORMAL:ST_STABLE
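As a small sketch, the same check can be scripted. The snippet below parses the sample output from Example 10-10, embedded in a variable so that no live cluster is needed; the parsing approach is ours, not a PowerHA tool.

```shell
# Confirm that both nodes report ST_STABLE by parsing sample output of
# `clmgr -cv -a name,state,raw_state query node` (Example 10-10); the
# output is embedded so the sketch runs without a live cluster.
output='PHA170:NORMAL:ST_STABLE
PHA171:NORMAL:ST_STABLE'

# Any node whose raw state is not ST_STABLE needs attention before a test.
unstable=$(printf '%s\n' "$output" | awk -F: '$3 != "ST_STABLE" {print $1}')

if [ -z "$unstable" ]; then
    echo "all nodes stable"
else
    echo "check these nodes: $unstable"
fi
```

On a live system, the `output` variable would instead be filled from the real clmgr command.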
Example 10-11 shows the PowerHA RG status from each PowerHA node. The RG (testRG) is online on the PHA170 node.
Example 10-11 PowerHA Resource Group status in each scenario before the cluster split
# clRGinfo -v
 
Cluster Name: PHA_Cluster
 
Resource Group Name: testRG
Startup Policy: Online On Home Node Only
Fallover Policy: Fallover To Next Priority Node In The List
Fallback Policy: Never Fallback
Site Policy: ignore
Node State
---------------------------------------------------------------- ---------------
PHA170 ONLINE
PHA171 OFFLINE
CAA service status
Example 10-12 shows the CAA configuration with the lscluster -c command.
Example 10-12 Showing the CAA cluster configuration with the lscluster -c command
# lscluster -c
Cluster Name: PHA_Cluster
Cluster UUID: 28bf3ac0-b516-11e6-8007-faac90b6fe20
Number of nodes in cluster = 2
Cluster ID for node PHA170: 1
Primary IP address for node PHA170: 172.16.51.170
Cluster ID for node PHA171: 2
Primary IP address for node PHA171: 172.16.51.171
Number of disks in cluster = 1
Disk = hdisk1 UUID = 58a286b2-fe51-5e39-98b1-43acf62025ab cluster_major = 0 cluster_minor = 1
Multicast for site LOCAL: IPv4 228.16.51.170 IPv6 ff05::e410:33aa
Communication Mode: unicast
Local node maximum capabilities: SPLT_MRG, CAA_NETMON, AUTO_REPOS_REPLACE, HNAME_CHG, UNICAST, IPV6, SITE
Effective cluster-wide capabilities: SPLT_MRG, CAA_NETMON, AUTO_REPOS_REPLACE, HNAME_CHG, UNICAST, IPV6, SITE
Local node max level: 50000
Effective cluster level: 50000
Example 10-13 shows the CAA configuration with the lscluster -d command.
Example 10-13 CAA cluster configuration
# lscluster -d
Storage Interface Query
 
Cluster Name: PHA_Cluster
Cluster UUID: 28bf3ac0-b516-11e6-8007-faac90b6fe20
Number of nodes reporting = 2
Number of nodes expected = 2
 
Node PHA170
Node UUID = 28945a80-b516-11e6-8007-faac90b6fe20
Number of disks discovered = 1
hdisk1:
State : UP
uDid : 33213600507680284001D5800000000005C8B04214503IBMfcp
uUid : 58a286b2-fe51-5e39-98b1-43acf62025ab
Site uUid : 51735173-5173-5173-5173-517351735173
Type : REPDISK
 
Node PHA171
Node UUID = 28945a3a-b516-11e6-8007-faac90b6fe20
Number of disks discovered = 1
hdisk1:
State : UP
uDid : 33213600507680284001D5800000000005C8B04214503IBMfcp
uUid : 58a286b2-fe51-5e39-98b1-43acf62025ab
Site uUid : 51735173-5173-5173-5173-517351735173
Type : REPDISK
 
Note: For production environments, configure additional backup repository disks.
PowerHA V7.2 supports up to six backup repository disks. It also supports automatic repository disk replacement in the event of repository disk failure. For more information, see IBM PowerHA SystemMirror V7.2 for IBM AIX Updates, SG24-8278.
Example 10-14 and Example 10-15 show output from the PHA170 and PHA171 nodes with the lscluster -m command. The current heartbeat channel is the network.
Example 10-14 CAA information from node PHA170
# hostname
PHA170
# lscluster -m
Calling node query for all nodes...
Node query number of nodes examined: 2
 
Node name: PHA171
Cluster shorthand id for node: 2
UUID for node: 28945a3a-b516-11e6-8007-faac90b6fe20
State of node: UP
Reason: NONE
Smoothed rtt to node: 7
Mean Deviation in network rtt to node: 3
Number of clusters node is a member in: 1
CLUSTER NAME SHID UUID
PHA_Cluster 0 28bf3ac0-b516-11e6-8007-faac90b6fe20
SITE NAME SHID UUID
LOCAL 1 51735173-5173-5173-5173-517351735173
 
Points of contact for node: 1
-----------------------------------------------------------------------
Interface State Protocol Status SRC_IP->DST_IP
-----------------------------------------------------------------------
tcpsock->02 UP IPv4 none 172.16.51.170->172.16.51.171
Example 10-15 CAA information from node PHA171
# hostname
PHA171
# lscluster -m
Calling node query for all nodes...
Node query number of nodes examined: 2
 
Node name: PHA170
Cluster shorthand id for node: 1
UUID for node: 28945a80-b516-11e6-8007-faac90b6fe20
State of node: UP
Reason: NONE
Smoothed rtt to node: 7
Mean Deviation in network rtt to node: 3
Number of clusters node is a member in: 1
CLUSTER NAME SHID UUID
PHA_Cluster 0 28bf3ac0-b516-11e6-8007-faac90b6fe20
SITE NAME SHID UUID
LOCAL 1 51735173-5173-5173-5173-517351735173
 
Points of contact for node: 1
-----------------------------------------------------------------------
Interface State Protocol Status SRC_IP->DST_IP
-----------------------------------------------------------------------
tcpsock->01 UP IPv4 none 172.16.51.171->172.16.51.170
Example 10-16 shows the current heartbeat devices that are configured in the testing environment. No SAN-based heartbeat device is configured.
Example 10-16 CAA interfaces
# lscluster -g
Network/Storage Interface Query
 
Cluster Name: PHA_Cluster
Cluster UUID: 28bf3ac0-b516-11e6-8007-faac90b6fe20
Number of nodes reporting = 2
Number of nodes stale = 0
Number of nodes expected = 2
 
Node PHA171
Node UUID = 28945a3a-b516-11e6-8007-faac90b6fe20
Number of interfaces discovered = 2
Interface number 1, en0
IFNET type = 6 (IFT_ETHER)
NDD type = 7 (NDD_ISO88023)
MAC address length = 6
MAC address = FA:9D:66:B2:87:20
Smoothed RTT across interface = 0
Mean deviation in network RTT across interface = 0
Probe interval for interface = 990 ms
IFNET flags for interface = 0x1E084863
NDD flags for interface = 0x0021081B
Interface state = UP
Number of regular addresses configured on interface = 1
IPv4 ADDRESS: 172.16.51.171 broadcast 172.16.51.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPv4 MULTICAST ADDRESS: 228.16.51.170
Interface number 2, dpcom
IFNET type = 0 (none)
NDD type = 305 (NDD_PINGCOMM)
Smoothed RTT across interface = 750
Mean deviation in network RTT across interface = 1500
Probe interval for interface = 22500 ms
IFNET flags for interface = 0x00000000
NDD flags for interface = 0x00000009
Interface state = UP RESTRICTED AIX_CONTROLLED
 
Node PHA170
Node UUID = 28945a80-b516-11e6-8007-faac90b6fe20
Number of interfaces discovered = 2
Interface number 1, en0
IFNET type = 6 (IFT_ETHER)
NDD type = 7 (NDD_ISO88023)
MAC address length = 6
MAC address = FA:AC:90:B6:FE:20
Smoothed RTT across interface = 0
Mean deviation in network RTT across interface = 0
Probe interval for interface = 990 ms
IFNET flags for interface = 0x1E084863
NDD flags for interface = 0x0161081B
Interface state = UP
Number of regular addresses configured on interface = 1
IPv4 ADDRESS: 172.16.51.170 broadcast 172.16.51.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPv4 MULTICAST ADDRESS: 228.16.51.170
Interface number 2, dpcom
IFNET type = 0 (none)
NDD type = 305 (NDD_PINGCOMM)
Smoothed RTT across interface = 594
Mean deviation in network RTT across interface = 979
Probe interval for interface = 15730 ms
IFNET flags for interface = 0x00000000
NDD flags for interface = 0x00000009
Interface state = UP RESTRICTED AIX_CONTROLLED
 
Note: To identify physical FC adapters that can be used in the PowerHA cluster as the SAN-based heartbeat, go to the IBM Knowledge Center.
At the time of writing, there is no plan to support this feature for all 16-Gb FC adapters.
Shared file system status
Example 10-17 shows that the /sharefs file system is mounted on the PHA170 node because the RG is online on that node.
Example 10-17 Shared file system status
(0) root @ PHA170: /
# df
Filesystem 512-blocks Free %Used Iused %Iused Mounted on
...
/dev/sharelv 1310720 1309864 1% 4 1% /sharefs
10.7 Scenario: Default split and merge policy
This section shows a scenario with the default split and merge policy.
10.7.1 Scenario description
Figure 10-35 shows the topology of the default split and merge scenario.
Figure 10-35 Topology of the default split and merge scenario
This scenario keeps the default configuration for the split and merge policy and does not set the quarantine policy. To simulate a cluster split, break the network communication between the two PowerHA nodes, and disable the repository disk access from the PHA170 node.
After a cluster split occurs, restore communications to generate a cluster merge event.
10.7.2 Split and merge configuration in PowerHA
In this scenario, no specific parameters must be set for the split and merge policy because the default policy is used. The clmgr command can display the current policy, as shown in Example 10-18.
Example 10-18 The clmgr command displays the current split/merge settings
# clmgr view cluster SPLIT-MERGE
SPLIT_POLICY="none"
MERGE_POLICY="majority"
ACTION_PLAN="reboot"
<...>
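For scripting, these settings can be read individually. The following sketch parses the sample output above; the get_attr helper is our own illustration, not a PowerHA command, and the output is embedded so the sketch runs without a live cluster.

```shell
# Read individual attributes from sample `clmgr view cluster SPLIT-MERGE`
# output (Example 10-18).
output='SPLIT_POLICY="none"
MERGE_POLICY="majority"
ACTION_PLAN="reboot"'

# Print the value of one KEY="value" attribute, without the quotes.
get_attr() {
    printf '%s\n' "$output" | awk -F'"' -v key="$1" '$0 ~ "^" key "=" { print $2 }'
}

echo "split policy: $(get_attr SPLIT_POLICY)"    # prints: split policy: none
```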
Complete the following steps:
1. To change the current split and merge policy from the default by using SMIT, use the fast path of smitty cm_cluster_sm_policy_chk. Otherwise, run smitty sysmirror and select Custom Cluster Configuration → Cluster Nodes and Networks → Initial Cluster Setup (Custom) → Configure Cluster Split and Merge Policy → Split and Merge Management Policy. Example 10-19 shows the window where you select the None option.
Example 10-19 Split handling policy
Split Handling Policy
Move cursor to desired item and press Enter.
None
TieBreaker
Manual
After pressing Enter, the menu shows the policy, as shown in Example 10-20.
Example 10-20 Split and merge management policy
Split and Merge Management Policy
 
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
[Entry Fields]
Split Handling Policy None
Merge Handling Policy Majority +
Split and Merge Action Plan Reboot
2. Keep the default values and upon pressing Enter, you see the summary that is shown in Example 10-21.
Example 10-21 Successful setting of the split and merge policy
Command: OK stdout: yes stderr: no
 
Before command completion, additional instructions may appear below.
 
The PowerHA SystemMirror split and merge policies have been updated.
Current policies are:
Split Handling Policy : None
Merge Handling Policy : Majority
Split and Merge Action Plan : Reboot
The configuration must be synchronized to make this change known across the cluster.
3. Synchronize the cluster. After the synchronization operation is complete, the cluster can be activated.
10.7.3 Cluster split
Before simulating a cluster split, check the status, as described in 10.6.3, “Initial PowerHA service status for each scenario” on page 347.
In this case, we sever all communications between the two nodes at 21:55:23.
Steps of CAA and PowerHA on PHA170 node
The following events occur:
21:55:23: All communication between the two nodes is broken.
21:55:23: The PHA170 node marks REP_DOWN for the repository disk.
21:55:33: The PHA170 node CAA marks ADAPTER_DOWN for the PHA171 node.
21:56:02: The PHA170 node CAA marks NODE_DOWN for the PHA171 node.
21:56:02: PowerHA triggers the split_merge_prompt split event.
21:56:11: PowerHA triggers the split_merge_prompt quorum event.
Afterward, the PHA170 node keeps its current PowerHA service status.
Steps of CAA and PowerHA on PHA171 node
The following events occur:
21:55:23: All communication between the two nodes is broken.
21:55:33: The PHA171 node CAA marks ADAPTER_DOWN for the PHA170 node.
21:56:02: The PHA171 node CAA marks NODE_DOWN for the PHA170 node.
21:56:02: PowerHA triggers the split_merge_prompt split event.
21:56:07: PowerHA triggers the split_merge_prompt quorum event.
 
Note: The log file of the CAA service is /var/adm/ras/syslog.caa.
Then, PHA171 takes over the RG while the RG is still online on the PHA170 node.
 
Note: The duration from REP_DOWN or ADAPTER_DOWN to NODE_DOWN is 30 seconds. This duration is controlled by the CAA parameter node_timeout. Display its value by running the following command:
# clctrl -tune -L node_timeout
Here is the output:
NAME DEF MIN MAX UNIT SCOPE
ENTITY_NAME(UUID) CUR
--------------------------------------------------------------------------------
node_timeout 20000 10000 600000 milliseconds c n
PHA_Cluster(28bf3ac0-b516-11e6-8007-faac90b6fe20) 30000
--------------------------------------------------------------------------------
To change this value, either run the PowerHA clmgr command or use the SMIT menu:
From the SMIT menu, run smitty sysmirror, select Custom Cluster Configuration → Cluster Nodes and Networks → Manage the Cluster → Cluster heartbeat settings, and then change the Node Failure Detection Timeout parameter.
To use the clmgr command, run the following command:
clmgr modify cluster HEARTBEAT_FREQUENCY=<the value to set, in seconds; the default is 30>
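The conversion between the two units, and the valid range, can be sketched as follows; the function name is ours, and the limits mirror the MIN and MAX columns of the clctrl output above.

```shell
# Convert a Node Failure Detection Timeout given in seconds (the unit
# used by `clmgr modify cluster HEARTBEAT_FREQUENCY=...`) into the CAA
# node_timeout value in milliseconds, rejecting values outside the
# 10000-600000 ms range reported by `clctrl -tune -L node_timeout`.
# The function name is ours, for illustration only.
to_node_timeout_ms() {
    ms=$(( $1 * 1000 ))
    if [ "$ms" -lt 10000 ] || [ "$ms" -gt 600000 ]; then
        echo "value out of range (10-600 seconds)" >&2
        return 1
    fi
    echo "$ms"
}

to_node_timeout_ms 30    # the default setting: prints 30000
```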
Displaying the resource group status from the PHA170 node after the cluster split
Example 10-22 shows that the PHA170 node cannot get the PHA171 node’s status.
Example 10-22 Resource group unknown status post split
# hostname
PHA170
# clmgr -cv -a name,state,raw_state query node
# NAME:STATE:RAW_STATE
PHA170:NORMAL:ST_RP_RUNNING
PHA171:UNKNOWN:UNKNOWN
Example 10-23 shows that the RG is online on the PHA170 node.
Example 10-23 Resource group still online PHA170 post split
# hostname
PHA170
# clRGinfo
Node State
---------------------------------------------------------------- ---------------
PHA170 ONLINE
PHA171 OFFLINE
Example 10-24 shows that the VG sharevg is varied on, and the file system /sharefs is mounted on the PHA170 node and is writable.
Example 10-24 Volume group still online PHA170 post split
# hostname
PHA170
# lsvg sharevg
VOLUME GROUP: sharevg VG IDENTIFIER: 00fa4b4e00004c0000000158a8e55930
VG STATE: active PP SIZE: 32 megabyte(s)
VG PERMISSION: read/write TOTAL PPs: 29 (928 megabytes)
MAX LVs: 256 FREE PPs: 8 (256 megabytes)
 
# df
Filesystem 512-blocks Free %Used Iused %Iused Mounted on
...
/dev/sharelv 1310720 1309864 1% 4 1% /sharefs
Displaying the resource group status from the PHA171 node after the cluster split
Example 10-25 shows that the PHA171 node cannot get the PHA170 node’s status.
Example 10-25 Resource group warning and unknown on PHA171
# hostname
PHA171
# clmgr -cv -a name,state,raw_state query node
# NAME:STATE:RAW_STATE
PHA170:UNKNOWN:UNKNOWN
PHA171:WARNING:WARNING
Example 10-26 shows that the RG is also online on the PHA171 node.
Example 10-26 Resource group online PHA171 post split
# hostname
PHA171
# clRGinfo
Node State
---------------------------------------------------------------- ---------------
PHA170 OFFLINE
PHA171 ONLINE
Example 10-27 shows that the VG sharevg is varied on and the file system /sharefs is mounted on the PHA171 node, and it is also writable.
Example 10-27 Sharevg online on PHA171 post split
# hostname
PHA171
# lsvg sharevg
VOLUME GROUP: sharevg VG IDENTIFIER: 00fa4b4e00004c0000000158a8e55930
VG STATE: active PP SIZE: 32 megabyte(s)
VG PERMISSION: read/write TOTAL PPs: 29 (928 megabytes)
MAX LVs: 256 FREE PPs: 8 (256 megabytes)
<...>
 
# df
Filesystem 512-blocks Free %Used Iused %Iused Mounted on
<...>
/dev/sharelv 1310720 1309864 1% 4 1% /sharefs
As Example 10-24 and Example 10-27 show, the /sharefs file system is mounted on both nodes in writable mode. Applications on the two nodes can write at the same time, which is risky and can easily result in data corruption.
 
Note: This situation must always be avoided in a production environment.
 
10.7.4 Cluster merge
After the cluster split occurred, the RG came online on the PHA171 node while it was still online on the PHA170 node. When the PowerHA cluster heartbeat communication was restored at 22:24:08, a PowerHA merge event was triggered.
The default merge policy is Majority and the action plan is Reboot. However, in our case, the rule in the cluster merge event is:
The node that has the lower node ID survives, and the other node is restarted by RSCT.
This rule is also introduced in 10.2.2, “Merge policy” on page 320.
Example 10-28 shows how to display a PowerHA node's node ID. PHA170 has the lower ID, so it is expected that the PHA171 node is restarted.
Example 10-28 How to show a node ID for PowerHA nodes
# ./cl_query_hn_id
CAA host PHA170 with node id 1 corresponds to PowerHA node PHA170
CAA host PHA171 with node id 2 corresponds to PowerHA node PHA171
 
# lscluster -c
Cluster Name: PHA_Cluster
Cluster UUID: 28bf3ac0-b516-11e6-8007-faac90b6fe20
Number of nodes in cluster = 2
Cluster ID for node PHA170: 1
Primary IP address for node PHA170: 172.16.51.170
Cluster ID for node PHA171: 2
Primary IP address for node PHA171: 172.16.51.171
Number of disks in cluster = 1
Disk = hdisk1 UUID = 58a286b2-fe51-5e39-98b1-43acf62025ab cluster_major = 0 cluster_minor = 1
Multicast for site LOCAL: IPv4 228.16.51.170 IPv6 ff05::e410:33aa
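Under this rule, the survivor can be predicted by picking the lowest node ID. The following sketch does this against the sample output above; the data is embedded so the snippet runs offline, and the parsing is our own illustration.

```shell
# Predict the surviving node under the default merge policy by picking
# the lowest CAA node id from sample cl_query_hn_id output
# (Example 10-28).
output='CAA host PHA170 with node id 1 corresponds to PowerHA node PHA170
CAA host PHA171 with node id 2 corresponds to PowerHA node PHA171'

survivor=$(printf '%s\n' "$output" |
    awk '{ id = $7 + 0; if (min == "" || id < min) { min = id; node = $NF } }
         END { print node }')
echo "expected survivor: $survivor"    # prints: expected survivor: PHA170
```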
Example 10-29 shows that the PHA171 node was rebooted at 22:25:02.
Example 10-29 Display error report with the errpt -c command
# hostname
PHA171
# errpt -c
A7270294 1127222416 P S cluster0 A merge has been detected.
78142BB8 1127222416 I O ConfigRM ConfigRM received Subcluster Merge event
F0851662 1127222416 I S ConfigRM The sub-domain containing the local node
9DEC29E1 1127222416 P O cthags Group Services daemon exit to merge doma
9DBCFDEE 1127222516 T O errdemon ERROR LOGGING TURNED ON
69350832 1127222516 T S SYSPROC SYSTEM SHUTDOWN BY USER
 
# errpt -aj 69350832
LABEL: REBOOT_ID
IDENTIFIER: 69350832
 
Date/Time: Sun Nov 27 22:25:02 CST 2016
Sequence Number: 701
Machine Id: 00FA23424C00
Node Id: PHA171
Class: S
Type: TEMP
WPAR: Global
Resource Name: SYSPROC
 
Description
SYSTEM SHUTDOWN BY USER
 
Probable Causes
SYSTEM SHUTDOWN
 
Detail Data
USER ID
0
0=SOFT IPL 1=HALT 2=TIME REBOOT
0
TIME TO REBOOT (FOR TIMED REBOOT ONLY)
0
PROCESS ID
13959442
PARENT PROCESS ID
4260250
PROGRAM NAME
hagsd
PARENT PROGRAM NAME
srcmstr
10.7.5 Scenario summary
With the default split and merge policy, when a cluster split happens, the RG is online on both PowerHA nodes. This is a risky situation that can result in data corruption. Careful planning must be done to avoid this scenario.
10.8 Scenario: Split and merge policy with a disk tie breaker
This section describes the split and merge policy scenario with a disk tie breaker.
10.8.1 Scenario description
Figure 10-36 is the reference topology for this scenario.
Figure 10-36 Split and merge topology scenario
In this scenario, one new shared disk, hdisk3, is added to be used as the disk tie breaker.
 
Note: When using a tie-breaker disk for split and merge recovery handling, the disk must also be supported by the devrsrv command. This command is part of the AIX operating system.
At the time of writing, the EMC PowerPath disks are not supported for use as a tie-breaker disk.
Note: Set the reserve_policy attribute of the tie-breaker disk to no_reserve with the chdev command before starting the PowerHA service on both nodes. Otherwise, the tie-breaker policy cannot take effect in a cluster split event.
10.8.2 Split and merge configuration in PowerHA
Complete the following steps:
1. The fast path to set the split and merge policy is smitty cm_cluster_sm_policy_chk. The whole path is running smitty sysmirror and selecting Custom Cluster Configuration → Cluster Nodes and Networks → Initial Cluster Setup (Custom) → Configure Cluster Split and Merge Policy → Split and Merge Management Policy.
Example 10-30 shows the window to select the split handling policy; in this case, TieBreaker is selected.
Example 10-30 TieBreaker split handling policy
Split Handling Policy
Move cursor to desired item and press Enter.
None
TieBreaker
Manual
2. After pressing Enter, select the Disk option, as shown in Example 10-31.
Example 10-31 Select Tiebreaker
Select TieBreaker Type
Move cursor to desired item and press Enter.
Disk
NFS
F1=Help F2=Refresh F3=Cancel
Esc+8=Image Esc+0=Exit Enter=Do
3. Pressing Enter shows the disk tie breaker configuration window, as shown in Example 10-32. The merge handling policy is TieBreaker too, and you cannot change it. Also, keep the default action plan as Reboot.
Example 10-32 Disk tiebreaker configuration
Disk TieBreaker Configuration
 
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
[Entry Fields]
Split Handling Policy TieBreaker
Merge Handling Policy TieBreaker
* Select Tie Breaker [] +
Split and Merge Action Plan Reboot
4. In the Select Tie Breaker field, press F4 to list the disks that can be used for the disk tie breaker, as shown in Example 10-33. We select hdisk3.
Example 10-33 Select tie breaker disk
Select Tie Breaker
Move cursor to desired item and press Enter.
None
hdisk3 (00fa2342a10932bf) on all cluster nodes
F1=Help F2=Refresh F3=Cancel
Esc+8=Image Esc+0=Exit Enter=Do
/=Find n=Find Next
5. Press Enter to display the summary, as shown in Example 10-34.
Example 10-34 Select the disk tie breaker status
Command: OK stdout: yes stderr: no
 
Before command completion, additional instructions may appear below.
 
hdisk3 changed
The PowerHA SystemMirror split and merge policies have been updated.
Current policies are:
Split Handling Policy : Tie Breaker
Merge Handling Policy : Tie Breaker
Tie Breaker : hdisk3
Split and Merge Action Plan : Reboot
The configuration must be synchronized to make this change known across the cluster.
6. Synchronize the cluster. After the synchronization operation is complete, the cluster can be activated.
7. Run the clmgr command to query the current split and merge policy, as shown in Example 10-35.
Example 10-35 Display the newly set split and merge policies
# clmgr view cluster SPLIT-MERGE
SPLIT_POLICY="tiebreaker"
MERGE_POLICY="tiebreaker"
ACTION_PLAN="reboot"
TIEBREAKER="hdisk3"
<...>
After the PowerHA service start completes, the reserve_policy of this disk is changed to PR_exclusive, and a reserve key value is generated for this disk on each node. At this point, the disk is not reserved by either node. Example 10-36 shows the result from the two nodes.
Example 10-36 Reserve_policy on each node
(127) root @ PHA170: /
# lsattr -El hdisk3|egrep "PR_key_value|reserve_policy"
PR_key_value 2763601723737305030 Persistant Reserve Key Value True+
reserve_policy PR_exclusive Reserve Policy True+
 
# devrsrv -c query -l hdisk3
Device Reservation State Information
==================================================
Device Name : hdisk3
Device Open On Current Host? : NO
ODM Reservation Policy : PR EXCLUSIVE
ODM PR Key Value : 2763601723737305030
Device Reservation State : NO RESERVE
Registered PR Keys : No Keys Registered
PR Capabilities Byte[2] : 0x11 CRH PTPL_C
PR Capabilities Byte[3] : 0x80
PR Types Supported : PR_WE_AR PR_EA_RO PR_WE_RO PR_EA PR_WE PR_EA_AR
 
 
(0) root @ PHA171: /
# lsattr -El hdisk3|egrep "PR_key_value|reserve_policy"
PR_key_value 6664187022250383046 Persistant Reserve Key Value True+
reserve_policy PR_exclusive Reserve Policy True+
 
# devrsrv -c query -l hdisk3
Device Reservation State Information
==================================================
Device Name : hdisk3
Device Open On Current Host? : NO
ODM Reservation Policy : PR EXCLUSIVE
ODM PR Key Value : 6664187022250383046
Device Reservation State : NO RESERVE
Registered PR Keys : No Keys Registered
PR Capabilities Byte[2] : 0x11 CRH PTPL_C
PR Capabilities Byte[3] : 0x80
PR Types Supported : PR_WE_AR PR_EA_RO PR_WE_RO PR_EA PR_WE PR_EA_AR
10.8.3 Cluster split
Before simulating a cluster split, check the current cluster status. For more information, see 10.6.3, “Initial PowerHA service status for each scenario” on page 347.
When the tie-breaker split and merge policy is enabled, the rule is that the TBGL node has a higher priority to reserve the tie-breaker device than the other nodes. If the TBGL node reserves the tie-breaker device successfully, then the other nodes are restarted.
For this scenario, Example 10-37 shows that the PHA171 node is the current TBGL. So, it is expected that the PHA171 node reserves the tie-breaker device and the PHA170 node is restarted. Any RG on the PHA170 node is taken over by the PHA171 node.
Example 10-37 Display the tiebreaker group leader
# lssrc -ls IBM.ConfigRM|grep Group
Group IBM.ConfigRM:
GroupLeader: PHA171, 0xdc7bf2c9d20096c6, 2
TieBreaker GroupLeader: PHA171, 0xdc7bf2c9d20096c6, 2
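For scripting, the TBGL can be extracted from this output. The sketch below uses the sample lines above, embedded so it runs without a live cluster; the parsing approach is ours.

```shell
# Extract the tie breaker group leader (TBGL) node name from sample
# `lssrc -ls IBM.ConfigRM` output (Example 10-37).
output='Group IBM.ConfigRM:
GroupLeader: PHA171, 0xdc7bf2c9d20096c6, 2
TieBreaker GroupLeader: PHA171, 0xdc7bf2c9d20096c6, 2'

tbgl=$(printf '%s\n' "$output" |
    awk -F': ' '/^TieBreaker GroupLeader/ { split($2, a, ","); print a[1] }')
echo "TBGL is $tbgl"    # prints: TBGL is PHA171
```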
In this case, we broke all communication between the two nodes at 01:36:12.
Result and log on the PHA170 node
The following events occur:
01:36:12: All communication between the two nodes is broken.
01:36:22: The PHA170 node CAA marks ADAPTER_DOWN for the PHA171 node.
01:36:52: The PHA170 node CAA marks NODE_DOWN for the PHA171 node.
01:36:52: PowerHA triggers the split_merge_prompt split event.
01:36:57: PowerHA triggers the split_merge_prompt quorum event.
01:37:00: The PHA170 node restarts.
Example 10-38 shows output of the errpt command on the PHA170 node. The PHA170 node restarts at 01:37:00.
Example 10-38 PHA170 restart post split
C7E7362C 1128013616 T S cluster0 Node is heartbeating solely over disk or
4D91E3EA 1128013616 P S cluster0 A split has been detected.
2B138850 1128013616 I O ConfigRM ConfigRM received Subcluster Split event
DC73C03A 1128013616 T S fscsi1 SOFTWARE PROGRAM ERROR
<...>
C62E1EB7 1128013616 P H hdisk1 DISK OPERATION ERROR
<...>
B80732E3 1128013716 P S ConfigRM The operating system is being rebooted t
 
# errpt -aj B80732E3|more
---------------------------------------------------------------------------
LABEL: CONFIGRM_REBOOTOS_E
IDENTIFIER: B80732E3
 
Date/Time: Mon Nov 28 01:37:00 CST 2016
Sequence Number: 1620
Machine Id: 00FA4B4E4C00
Node Id: PHA170
Class: S
Type: PERM
WPAR: Global
Resource Name: ConfigRM
 
Description
The operating system is being rebooted to ensure that critical resources are
stopped so that another sub-domain that has operational quorum may recover
these resources without causing corruption or conflict.
 
Probable Causes
Critical resources are active and the active sub-domain does not have
operational quorum.
 
Failure Causes
Critical resources are active and the active sub-domain does not have
operational quorum.
 
Recommended Actions
After node finishes rebooting, resolve problems that caused the operational
quorum to be lost.
 
Detail Data
DETECTING MODULE
RSCT,PeerDomain.C,1.99.22.299,23992
ERROR ID
Result and log on the PHA171 node
The following events occur:
01:36:12: All communication between the two nodes is broken.
01:36:22: The PHA171 node CAA marks ADAPTER_DOWN for the PHA170 node.
01:36:52: The PHA171 node CAA marks NODE_DOWN for the PHA170 node.
01:36:52: PowerHA triggers a split_merge_prompt split event.
01:37:04: PowerHA triggers a split_merge_prompt quorum event, and then PHA171 takes over the RG.
01:37:15: PowerHA completes the RG takeover operation.
As shown by the time stamps in Example 10-38 on page 364, PHA170 restarts at 01:37:00, and PHA171 starts the takeover of the RG at 01:37:04. There is no opportunity for both nodes to mount the /sharefs file system at the same time, so data integrity is maintained.
The PHA171 node holds the tie-breaker disk during a cluster split
Example 10-39 shows that the tiebreaker disk is reserved by the PHA171 node after the cluster split event happens.
Example 10-39 Tiebreaker disk reservation from PHA171
# hostname
PHA171
 
# lsattr -El hdisk3|egrep "PR_key_value|reserve_policy"
PR_key_value 6664187022250383046 Persistant Reserve Key Value True+
reserve_policy PR_exclusive Reserve Policy True+
 
# devrsrv -c query -l hdisk3
Device Reservation State Information
==================================================
Device Name : hdisk3
Device Open On Current Host? : NO
ODM Reservation Policy : PR EXCLUSIVE
ODM PR Key Value : 6664187022250383046
Device Reservation State : PR EXCLUSIVE
PR Generation Value : 152
PR Type : PR_WE_RO (WRITE EXCLUSIVE, REGISTRANTS ONLY)
PR Holder Key Value : 6664187022250383046
Registered PR Keys : 6664187022250383046 6664187022250383046
PR Capabilities Byte[2] : 0x11 CRH PTPL_C
PR Capabilities Byte[3] : 0x81 PTPL_A
PR Types Supported : PR_WE_AR PR_EA_RO PR_WE_RO PR_EA PR_WE PR_EA_AR
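A script can check for this state. The following sketch classifies the reservation state from the sample output above; the parsing is our own illustration, not a PowerHA tool, and the output lines are embedded so the sketch runs offline.

```shell
# Determine from sample `devrsrv -c query` output (Example 10-39)
# whether the tie-breaker disk currently holds a persistent reservation.
output='Device Name                     :  hdisk3
Device Reservation State        :  PR EXCLUSIVE
PR Holder Key Value             :  6664187022250383046'

state=$(printf '%s\n' "$output" |
    awk -F':' '/Device Reservation State/ { gsub(/^ +| +$/, "", $2); print $2 }')

if [ "$state" = "PR EXCLUSIVE" ]; then
    echo "tie-breaker disk is reserved"
else
    echo "tie-breaker disk is free"
fi
```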
10.8.4 How to change the tie breaker group leader manually
To change the TBGL manually, restart the current TBGL node. For example, if the PHA170 node is the current TBGL, restart the PHA170 node to make PHA171 the TBGL. During this restart, the TBGL role moves to the PHA171 node. After PHA170 comes back, the group leader does not change again until PHA171 is shut down or restarted.
10.8.5 Cluster merge
After the PHA170 node restart completes, restore all communication between the two nodes. To re-enable the tiebreaker disk on the PHA170 node after the FC link is restored, run the cfgmgr command. The paths of the tiebreaker disk then return to the active status, as shown in Example 10-40.
Example 10-40 Path status post split
# hostname
PHA170
 
# lspath -l hdisk1
Missing hdisk1 fscsi1
Missing hdisk1 fscsi1
 
-> After running the cfgmgr command
# lspath -l hdisk1
Enabled hdisk1 fscsi1
Enabled hdisk1 fscsi1
Within 1 minute of the repository disk being enabled, the CAA services start automatically. You can monitor the process by viewing the /var/adm/ras/syslog.caa log file.
Using the lscluster -m command, check whether the CAA service started. When ready, start the PowerHA service with the smitty clstart or clmgr start node PHA170 command.
You can also bring the CAA services and PowerHA services online together manually by running the following command:
clmgr start node PHA170 START_CAA=yes
During the start of the PowerHA services, the tie breaker device reservation is released on the PHA171 node automatically. Example 10-41 shows the device reservation state after the PowerHA service starts.
Example 10-41 Disk reservation post merge
# hostname
PHA171
 
# devrsrv -c query -l hdisk3
Device Reservation State Information
==================================================
Device Name : hdisk3
Device Open On Current Host? : NO
ODM Reservation Policy : PR EXCLUSIVE
ODM PR Key Value : 6664187022250383046
Device Reservation State : NO RESERVE
Registered PR Keys : No Keys Registered
PR Capabilities Byte[2] : 0x11 CRH PTPL_C
PR Capabilities Byte[3] : 0x81 PTPL_A
PR Types Supported : PR_WE_AR PR_EA_RO PR_WE_RO PR_EA PR_WE PR_EA_AR
10.8.6 Scenario summary
If you set the disk tie breaker as the split and merge policy for the PowerHA cluster, then when a cluster split occurs, the TBGL has a higher priority to reserve the tie breaker device. The other nodes restart, and the RGs are brought online on the surviving node.
During the cluster merge process, the tiebreaker reservation is automatically released.
10.9 Scenario: Split and merge policy with the NFS tie breaker
This section describes the split and merge scenario with the NFS tie-breaker policy.
10.9.1 Scenario description
Figure 10-37 shows the topology of this scenario.
Figure 10-37 Split and merge topology scenario with the NFS tie breaker
In this scenario, there is one NFS server. Each PowerHA node has one network interface, en1, which is used to communicate with the NFS server. The NFS tie breaker requires NFS protocol version 4.
10.9.2 Setting up the NFS environment
On the NFS server, complete the following steps:
1. Edit /etc/hosts and add the PowerHA nodes definition, as shown in Example 10-42.
Example 10-42 Add nodes to NFS server /etc/hosts
cat /etc/hosts
<...>
172.16.15.242 PHA170_hmc
172.16.15.243 PHA171_hmc
2. Create the directory for export by running the following command:
mkdir -p /nfs_tiebreaker
3. Configure the NFS domain by running the following command:
chnfsdom nfs_local_domain
4. Start the nfsrgyd service by running the following command:
startsrc -s nfsrgyd
5. Change the NFS version 4 root location to / by running the following command:
chnfs -r /
6. Add the /nfs_tiebreaker directory to the export list by running the following command:
/usr/sbin/mknfsexp -d '/nfs_tiebreaker' '-B' -v '4' -S 'sys,krb5p,krb5i,krb5,dh' -t 'rw' -r 'PHA170_hmc,PHA171_hmc'
Alternatively, you can run smitty nfs, as shown in Example 10-43.
Example 10-43 NFS add directory to export
Add a Directory to Exports List
 
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
[TOP] [Entry Fields]
* Pathname of directory to export [/nfs_tiebreaker]
Anonymous UID [-2]
Public filesystem? no +
* Export directory now, system restart or both both +
Pathname of alternate exports file []
Allow access by NFS versions [4] +
External name of directory (NFS V4 access only) []
Referral locations (NFS V4 access only) []
Replica locations []
Ensure primary hostname in replica list yes +
Allow delegation? []
Scatter none +
* Security method 1 [sys,krb5p,krb5i,krb5,dh] +
* Mode to export directory read-write +
Hostname list. If exported read-mostly []
Hosts & netgroups allowed client access []
Hosts allowed root access [PHA170_hmc,PHA171_hmc]
You can verify that the directory is exported by viewing the /etc/exports file, as shown in Example 10-44.
Example 10-44 The /etc/exports file
# cat /etc/exports
/nfs_tiebreaker -vers=4,sec=sys:krb5p:krb5i:krb5:dh,rw,root=PHA170_hmc:PHA171_hmc
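The server-side steps above can be consolidated into one script. The following is a sketch under the assumption that it runs on an AIX NFS server; with DRYRUN=1 (the default here) it only prints the commands, so it can be reviewed safely anywhere before running it for real with DRYRUN=0.

```shell
#!/bin/sh
# Consolidated sketch of server-side steps 2 through 6 of this section.
# DRYRUN=1 prints the commands instead of running them, which lets you
# review the sequence off an AIX NFS server.
DRYRUN=${DRYRUN:-1}

run() {
    if [ "$DRYRUN" -eq 1 ]; then
        echo "$*"          # dry run: print the command only
    else
        "$@"               # live run: execute it
    fi
}

run mkdir -p /nfs_tiebreaker
run chnfsdom nfs_local_domain
run startsrc -s nfsrgyd
run chnfs -r /
run /usr/sbin/mknfsexp -d /nfs_tiebreaker -B -v 4 \
    -S sys,krb5p,krb5i,krb5,dh -t rw -r PHA170_hmc,PHA171_hmc
```

Running the script with the default DRYRUN=1 prints each command in order so that it can be compared against the steps above before execution.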
On the NFS clients and PowerHA nodes, complete the following tasks:
Edit /etc/hosts and add the NFS server definition, as shown in Example 10-45.
Example 10-45 NFS clients /etc/hosts
# hostname
PHA170
# cat /etc/hosts
...
172.16.51.170 PHA170
172.16.51.171 PHA171
172.16.51.172 PHASvc
172.16.15.242 PHA170_hmc
 
172.16.15.222 nfsserver
Now, verify that the new NFS mount point can be mounted on all the nodes, as shown in Example 10-46.
Example 10-46 Mount the NFS directory
(0) root @ PHA170: /
# mount -o vers=4 nfsserver:/nfs_tiebreaker /mnt
 
# df|grep mnt
nfsserver:/nfs_tiebreaker 786432 429256 46% 11704 20% /mnt
 
# echo "test.." > /mnt/1.out
# cat /mnt/1.out
test..
# rm /mnt/1.out
 
# umount /mnt
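The manual write/read/remove check in Example 10-46 can be wrapped in a small helper. This is a sketch only; the directory argument is whatever mount point you are validating (for example, /mnt after the NFS mount), and the probe file name is illustrative.

```shell
#!/bin/sh
# Sketch: verify that a directory (for example, an NFS mount point) is
# usable for reads and writes, mirroring the manual test in Example 10-46.
verify_rw_dir() {
    dir="$1"
    probe="$dir/.tb_probe.$$"
    echo "test.." > "$probe" || return 1           # can we write?
    [ "$(cat "$probe")" = "test.." ] || { rm -f "$probe"; return 1; }
    rm -f "$probe"                                 # clean up the probe file
}

verify_rw_dir /tmp && echo "read/write check passed"
```

Run the helper against the mounted directory on every node before configuring the NFS tie breaker.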
10.9.3 Setting the NFS split and merge policies
When the NFS configuration finishes, configure PowerHA by completing the following steps:
1. The fast path to set the split and merge policy is smitty cm_cluster_sm_policy_chk. The full path is to run smitty sysmirror and select Custom Cluster Configuration → Cluster Nodes and Networks → Initial Cluster Setup (Custom) → Configure Cluster Split and Merge Policy → Split and Merge Management Policy.
2. Select TieBreaker, as shown in Example 10-30 on page 361. After pressing Enter, select the NFS option, as shown in Example 10-47.
Example 10-47 NFS TieBreaker
 
Select TieBreaker Type
Move cursor to desired item and press Enter.
Disk
NFS
F1=Help F2=Refresh F3=Cancel
Esc+8=Image Esc+0=Exit Enter=Do
3. After pressing Enter, the NFS tiebreaker configuration panel opens, as shown in Example 10-48. The merge handling policy is set to NFS as well, and it cannot be changed. Also, keep the default action plan of Reboot.
Example 10-48 NFS TieBreaker configuration menu
NFS TieBreaker Configuration
 
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
[Entry Fields]
Split Handling Policy NFS
Merge Handling Policy NFS
* NFS Export Server [nfsserver]
* Local Mount Directory [/nfs_tiebreaker]
* NFS Export Directory [/nfs_tiebreaker]
Split and Merge Action Plan Reboot
After pressing Enter, the NFS TieBreaker configuration summary is displayed, as shown in Example 10-49.
Example 10-49 NFS TieBreaker configuration summary
Command: OK stdout: yes stderr: no
 
Before command completion, additional instructions may appear below.
 
The PowerHA SystemMirror split and merge policies have been updated.
Current policies are:
Split Handling Policy : NFS
Merge Handling Policy : NFS
NFS Export Server :
nfsserver
Local Mount Directory :
/nfs_tiebreaker
NFS Export Directory :
/nfs_tiebreaker
Split and Merge Action Plan : Reboot
The configuration must be synchronized to make this change known across the cluster.
The configuration is added to the HACMPsplitmerge ODM database, as shown in Example 10-50.
Example 10-50 HACMPsplitmerge ODM
# odmget HACMPsplitmerge
 
HACMPsplitmerge:
id = 0
policy = "split"
value = "NFS"
 
HACMPsplitmerge:
id = 0
policy = "merge"
value = "NFS"
 
HACMPsplitmerge:
id = 0
policy = "action"
value = "Reboot"
 
HACMPsplitmerge:
id = 0
policy = "nfs_quorumserver"
value = "nfsserver"
 
HACMPsplitmerge:
id = 0
policy = "local_quorumdirectory"
value = "/nfs_tiebreaker"
 
HACMPsplitmerge:
id = 0
policy = "remote_quorumdirectory"
value = "/nfs_tiebreaker"
4. Synchronize the cluster. After the synchronization operation completes, the cluster can be activated.
Upon cluster start, the PowerHA nodes mount the NFS export automatically, as shown in Example 10-51.
Example 10-51 NFS mount on both nodes
# clcmd mount|egrep -i "node|nfs"
NODE PHA171
node mounted mounted over vfs date options
nfsserver /nfs_tiebreaker /nfs_tiebreaker nfs4 Dec 01 08:50 vers=4,fg,soft,retry=1,timeo=10
 
NODE PHA170
node mounted mounted over vfs date options
nfsserver /nfs_tiebreaker /nfs_tiebreaker nfs4 Dec 01 08:50 vers=4,fg,soft,retry=1,timeo=10
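The HACMPsplitmerge stanzas shown in Example 10-50 can also be checked programmatically. The following sketch parses odmget-style stanza output with awk; the embedded sample text mirrors Example 10-50, and on a live node you would pipe the real `odmget HACMPsplitmerge` output instead.

```shell
#!/bin/sh
# Sketch: extract a value from "odmget HACMPsplitmerge" style stanzas.
# The sample text below mirrors Example 10-50.
stanzas='HACMPsplitmerge:
        id = 0
        policy = "split"
        value = "NFS"

HACMPsplitmerge:
        id = 0
        policy = "merge"
        value = "NFS"'

get_policy() {
    # print the value that follows the requested policy name
    echo "$stanzas" | awk -v p="$1" '
        $1 == "policy" && $3 == "\"" p "\"" { want = 1; next }
        want && $1 == "value" { gsub(/"/, "", $3); print $3; exit }'
}

get_policy split    # -> NFS
```

This kind of check is useful in a post-configuration validation script that confirms the split and merge policies were synchronized as intended.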
10.9.4 Cluster split
If you enable the tie breaker split and merge policy, then in a cluster split scenario the TBGL node has a higher priority than the other nodes to reserve the tie-breaker device. The node adds its node name to the PowerHA_NFS_Reserve file, gets the reservation, and locks it. In this scenario, the file is in the /nfs_tiebreaker directory.
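The idea behind the reserve file can be illustrated with a first-writer-wins sketch. This is a simplification, not the PowerHA implementation: the file name here is borrowed from the scenario, the path is illustrative, and a real implementation needs an atomic test-and-set over NFSv4 rather than the racy check shown.

```shell
#!/bin/sh
# Simplified sketch of the first-writer-wins idea behind the NFS tie
# breaker: the first node to record its name in the reserve file holds
# the reservation. Illustration only; not the PowerHA implementation.
RESERVE=${RESERVE:-/tmp/PowerHA_NFS_Reserve.demo}

try_reserve() {
    node="$1"
    if [ ! -s "$RESERVE" ]; then        # file empty: reservation is free
        echo "$node" > "$RESERVE"
    fi
    # the holder is whichever name the file now contains
    [ "$(cat "$RESERVE")" = "$node" ]
}

: > "$RESERVE"                          # start with a free reservation
try_reserve PHA171 && echo "PHA171 holds the reservation"
try_reserve PHA170 || echo "PHA170 lost the race and must restart"
```

The node whose name lands in the file survives; the loser follows the configured action plan (here, Reboot).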
In our case, the PHA171 node is the current TBGL, as shown in Example 10-52 on page 373. So, it is expected that the PHA171 node survives and the PHA170 node restarts. The RG on the PHA170 node is taken over by the PHA171 node.
Example 10-52 NFS Tiebreaker groupleader
# lssrc -ls IBM.ConfigRM|grep Group
Group IBM.ConfigRM:
GroupLeader: PHA171, 0xdc7bf2c9d20096c6, 2
TieBreaker GroupLeader: PHA171, 0xdc7bf2c9d20096c6, 2
In this case, we broke all communication between both nodes at 07:23:49.
Result and log on the PHA170 node
The following events occur:
07:23:49: All communication between the two nodes is broken.
07:23:59: The PHA170 node CAA marks ADAPTER_DOWN for the PHA171 node.
07:24:29: The PHA170 node CAA marks NODE_DOWN for the PHA171 node.
07:24:29: PowerHA triggers the split_merge_prompt split event.
07:24:35: PowerHA triggers the split_merge_prompt quorum event.
07:24:38: The PHA170 node is restarted by RSCT.
Example 10-53 shows the output of the errpt command on the PHA170 node. This node restarts at 07:24:38.
Example 10-53 Errpt on PHA170
C7E7362C 1128072416 T S cluster0 Node is heartbeating solely over disk or
4D91E3EA 1128072416 P S cluster0 A split has been detected.
2B138850 1128072416 I O ConfigRM ConfigRM received Subcluster Split event
<...>
A098BF90 1128072416 P S ConfigRM The operational quorum state of the acti
AB59ABFF 1128072416 U U LIBLVM Remote node Concurrent Volume Group fail
421B554F 1128072416 P S ConfigRM The operational quorum state of the acti
AB59ABFF 1128072416 U U LIBLVM Remote node Concurrent Volume Group fail
B80732E3 1128072416 P S ConfigRM The operating system is being rebooted t
 
# errpt -aj B80732E3
LABEL: CONFIGRM_REBOOTOS_E
IDENTIFIER: B80732E3
 
Date/Time: Mon Nov 28 07:24:38 CST 2016
Sequence Number: 1839
Machine Id: 00FA4B4E4C00
Node Id: PHA170
Class: S
Type: PERM
WPAR: Global
Resource Name: ConfigRM
 
Description
The operating system is being rebooted to ensure that critical resources are
stopped so that another sub-domain that has operational quorum may recover
these resources without causing corruption or conflict.
 
Probable Causes
Critical resources are active and the active sub-domain does not have
operational quorum.
 
Failure Causes
Critical resources are active and the active sub-domain does not have
operational quorum.
 
Recommended Actions
After node finishes rebooting, resolve problems that caused the operational
quorum to be lost.
 
Detail Data
DETECTING MODULE
RSCT,PeerDomain.C,1.99.22.299,23992
ERROR ID
REFERENCE CODE
Result and log on the PHA171 node
The following events occur:
07:23:49: All communication between the two nodes is broken.
07:24:02: The PHA171 node CAA marks ADAPTER_DOWN for the PHA170 node.
07:24:32: The PHA171 node CAA marks NODE_DOWN for the PHA170 node.
07:24:32: PowerHA triggers a split_merge_prompt split event.
07:24:42: PowerHA triggers a split_merge_prompt quorum event.
07:24:43: PowerHA starts to bring the RG online on the PHA171 node.
07:25:03: PowerHA completes the RG online operation.
The time stamps in Example 10-53 on page 373 show that PHA170 restarts at 07:24:38 and PHA171 starts to take over the RGs at 07:24:43. There is no window in which both nodes can mount the /sharefs file system at the same time, so data integrity is maintained.
Example 10-54 shows that the PHA171 node wrote its node name into the PowerHA_NFS_Reserve file successfully.
Example 10-54 NFS file that is written with the node name
# hostname
PHA171
 
# pwd
/nfs_tiebreaker
 
# ls -l
total 8
-rw-r--r-- 1 nobody nobody 257 Nov 28 07:24 PowerHA_NFS_Reserve
drwxr-xr-x 2 nobody nobody 256 Nov 28 04:06 PowerHA_NFS_ReserveviewFilesDir
 
# cat PowerHA_NFS_Reserve
PHA171
10.9.5 Cluster merge
After CAA services start successfully, the PowerHA_NFS_Reserve file is cleaned up in preparation for the next cluster split event. Example 10-55 shows that the size of the PowerHA_NFS_Reserve file changes to zero after the CAA service is restored.
Example 10-55 NFS file zeroed out after the CAA is restored
# ls -l
total 0
-rw-r--r-- 1 nobody nobody 0 Nov 28 09:05 PowerHA_NFS_Reserve
drwxr-xr-x 2 nobody nobody 256 Nov 28 09:05 PowerHA_NFS_ReserveviewFilesDir
10.9.6 Scenario summary
When the NFS tiebreaker is set as the split and merge policy and a cluster split occurs, the TBGL has a higher priority to acquire the NFS reservation. The other nodes restart, and the RGs are brought online on the surviving node.
During the cluster merge process, the NFS tiebreaker reservations are released automatically.
10.10 Scenario: Split and merge policy is manual
This section presents a split and merge manual policy scenario.
10.10.1 Scenario description
Figure 10-38 shows the topology of this scenario.
Figure 10-38 Manual split merge cluster topology
10.10.2 Split and merge configuration in PowerHA
The fast path to set the split and merge policy is smitty cm_cluster_sm_policy_chk. The full path is running smitty sysmirror and then selecting Custom Cluster Configuration → Cluster Nodes and Networks → Initial Cluster Setup (Custom) → Configure Cluster Split and Merge Policy → Split and Merge Management Policy.
We select Manual for the split handling policy, as shown in Example 10-56.
Example 10-56 Manual split policy
Split Handling Policy
Move cursor to desired item and press Enter.
None
TieBreaker
Manual
After pressing Enter, the configuration panel opens, as shown in Example 10-57.
Example 10-57 Manual split and merge configuration menu
Split and Merge Management Policy
 
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
 
[Entry Fields]
Split Handling Policy Manual
Merge Handling Policy Manual
Notify Method []
Notify Interval (seconds) []
Maximum Notifications []
Split and Merge Action Plan Reboot
When you select Manual as the split handling policy, the merge handling policy is also set to Manual. This setting is required and cannot be changed.
There are other options that can be changed. Table 10-3 shows the context-sensitive help for these items. This scenario keeps the default values.
Table 10-3 Information table to help explain the split handling policy
Name
Context-sensitive help (F1)
Associated list (F4)
Notify Method
A method to be invoked, in addition to a message to /dev/console, to inform the operator of the need to choose which site continues after a split or merge. The method is specified as a path name followed by optional parameters. When invoked, the last parameter is either split or merge to indicate the event.
None.
Notify Interval (seconds)
The frequency of the notification (time, in seconds, between messages) to inform the operator of the need to choose which site continues after a split or merge.
10..3600
Default is 30s, and then increases in frequency.
Maximum Notifications
The maximum number of times that PowerHA SystemMirror prompts the operator to choose which site continues after a split or merge.
3..1000
Default is infinite.
Split and Merge Action Plan
1. Reboot: Nodes in the losing partition restart.
2. Disable Applications Auto-Start and Reboot: Nodes in the losing partition restart. The RGs cannot be brought online until the merge finishes.
3. Disable Cluster Services Auto-Start and Reboot: Nodes in the losing partition restart. CAA does not start. After the split condition is healed, you must run clenablepostsplit to bring the cluster back to a stable state.
1. Reboot.
2. Disable Applications Auto-Start and Reboot.
3. Disable Cluster Services Auto-Start and Reboot.
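A custom Notify Method can be sketched from the contract that Table 10-3 describes: PowerHA invokes the method with its optional parameters followed by split or merge as the last argument. The script below is a hypothetical example; the function name and log path are illustrative choices, not part of PowerHA.

```shell
#!/bin/sh
# Hypothetical Notify Method sketch, assuming the contract from
# Table 10-3: the event type ("split" or "merge") arrives as the last
# positional parameter. The log path is an illustrative choice.
LOG=${LOG:-/tmp/sm_notify.log}

notify_method() {
    # capture the last positional parameter (the event type)
    for last; do :; done
    case "$last" in
    split|merge)
        echo "cluster $last detected; operator decision required" >> "$LOG"
        ;;
    *)
        echo "unexpected event argument: $last" >> "$LOG"
        ;;
    esac
}

notify_method split    # logs a split notification
```

In practice, the method body could page an operator or raise a ticket instead of writing to a log file; the path to the script is what you enter in the Notify Method field.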
Example 10-58 shows the summary after confirming the manual policy configuration.
Example 10-58 Manual split merge configuration summary
Command: OK stdout: yes stderr: no
 
Before command completion, additional instructions may appear below.
 
The PowerHA SystemMirror split and merge policies have been updated.
Current policies are:
Split Handling Policy : Manual
Merge Handling Policy : Manual
Notify Method :
Notify Interval (seconds) :
Maximum Notifications :
Split and Merge Action Plan : Reboot
The configuration must be synchronized to make this change known across the cluster.
The PowerHA clmgr command provides an option to display the cluster split and merge policy, as shown in Example 10-59.
Example 10-59 The clmgr output of split merge policies enabled
# clmgr view cluster SPLIT-MERGE
SPLIT_POLICY="manual"
MERGE_POLICY="manual"
ACTION_PLAN="reboot"
TIEBREAKER=""
NOTIFY_METHOD=""
NOTIFY_INTERVAL=""
MAXIMUM_NOTIFICATIONS=""
DEFAULT_SURVIVING_SITE=""
APPLY_TO_PPRC_TAKEOVER="n"
Synchronize the cluster. After the synchronization operation completes, the cluster can be activated.
10.10.3 Cluster split
Before simulating a cluster split, check its status, as described in 10.6.3, “Initial PowerHA service status for each scenario” on page 347.
In this case, we broke all communication between both nodes at 21:43:33.
Result and log on the PHA170 node
The following events occur:
21:43:33: All communication between the two nodes is broken.
21:43:43: The PHA170 node CAA marks ADAPTER_DOWN for the PHA171 node.
21:44:13: The PHA170 node CAA marks NODE_DOWN for the PHA171 node.
21:44:13: The PowerHA triggers a split_merge_prompt split event.
Then, every console on the PHA170 node receives the message that is shown in Example 10-60.
Example 10-60 Manual split console confirmation message on the PHA170
Broadcast message from root@PHA170 (tty) at 21:44:14 ...
 
A cluster split has been detected.
You must decide if this side of the partitioned cluster is to continue.
To have it continue, enter
 
/usr/es/sbin/cluster/utilities/cl_sm_continue
 
To have the recovery action - Reboot - taken on all nodes on this partition, enter
 
/usr/es/sbin/cluster/utilities/cl_sm_recover
LOCAL_PARTITION 1 PHA170 OTHER_PARTITION 2 PHA171
Also, the hacmp.out log of the PHA170 node records the split notification prompt, as shown in Example 10-61.
Example 10-61 The hacmp.out log shows a split notification
Fri Dec 2 21:44:13 CST 2016 cl_sm_prompt (19136930): EVENT START: split_merge_prompt split LOCAL_PARTITION 1 PHA170 OTHER_PARTITION 2 PHA171 1
Fri Dec 2 21:44:14 CST 2016 cl_sm_prompt (19136930): split = Manual merge = Manual which = split split = Manual merge = Manual which = split
Fri Dec 2 21:44:14 CST 2016 cl_sm_prompt (19136930): Received a split notification for which a manual response is required.
Fri Dec 2 21:44:14 CST 2016 cl_sm_prompt (19136930): In manual for a split notification with Reboot
Result and log on the PHA171 node
The following events occur:
21:43:33: All communication between the two nodes is broken.
21:43:43: The PHA171 node CAA marks ADAPTER_DOWN for the PHA170 node.
21:44:13: The PHA171 node CAA marks NODE_DOWN for the PHA170 node.
21:44:13: PowerHA triggers the split_merge_prompt split event.
Every console on the PHA171 node also receives a message, as shown in Example 10-62.
Example 10-62 Manual split console confirmation message on PHA171
Broadcast message from root@PHA171 (tty) at 21:44:13 ...
 
A cluster split has been detected.
You must decide if this side of the partitioned cluster is to continue.
To have it continue, enter
 
/usr/es/sbin/cluster/utilities/cl_sm_continue
 
To have the recovery action - Reboot - taken on all nodes on this partition, enter
 
/usr/es/sbin/cluster/utilities/cl_sm_recover
LOCAL_PARTITION 2 PHA171 OTHER_PARTITION 1 PHA170
 
Note: When the cl_sm_continue command is run on one node, this node continues to survive and takes over the RG if needed. Typically, this command is run on only one of the nodes.
When the cl_sm_recover command is run on one node, this node restarts. Typically, you do not want to run this command on both nodes.
In this scenario, we run the cl_sm_recover command on the PHA170 node, as shown in Example 10-63, and then the cl_sm_continue command on the PHA171 node.
Example 10-63 Running cl_sm recover on PHA170
# date
Fri Dec 2 21:44:25 CST 2016
/usr/es/sbin/cluster/utilities/cl_sm_recover
Resource Class Action Response for ResolveOpQuorumTie
Example 10-64 is the output of the errpt -c command. The PHA170 node restarts after running the cl_sm_recover command.
Example 10-64 The errpt output from the PHA170 post manual split
errpt -c
4D91E3EA 1202214416 P S cluster0 A split has been detected.
2B138850 1202214416 I O ConfigRM ConfigRM received Subcluster Split event
A098BF90 1202214416 P S ConfigRM The operational quorum state of the acti
<...>
B80732E3 1202214416 P S ConfigRM The operating system is being rebooted t
<...>
9DBCFDEE 1202214616 T O errdemon ERROR LOGGING TURNED ON
69350832 1202214516 T S SYSPROC SYSTEM SHUTDOWN BY USER
<...>
The ConfigRM service log that is shown in Example 10-65 indicates that this node restarted at 21:44:48.
Example 10-65 ConfigRM service log from PHA170
[32] 12/02/16 _CFD 21:44:48.386539 !!!!!!!!!!!!!!!!! PeerDomainRcp::haltOSExecute (method=1). !!!!!!!!!!!!!!!!!!!!!
[28] 12/02/16 _CFD 21:44:48.386540 ConfigRMUtils::log_error() Entered
[32] 12/02/16 _CFD 21:44:48.386911 logerr: In File=../../../../../src/rsct/rm/ConfigRM/PeerDomain.C (Version=1.99.22.299 Line=23992) :
CONFIGRM_REBOOTOS_ER
The operating system is being rebooted to ensure that critical resources are stopped so that another sub-domain that has operational quorum may recover these resources without causing corruption or conflict.
 
Note: To generate the IBM.ConfigRM service logs, run the following commands:
# cd /var/ct/IW/log/mc/IBM.ConfigRM
# rpttr -o dct trace.* > ConfigRM.out
Then, check the ConfigRM.out file to get the relevant logs.
After the PHA170 node restarts, run the cl_sm_continue command on the PHA171 node, as shown in Example 10-66.
Example 10-66 The cl_sm_continue command on the PHA171 node
# date
Fri Dec 2 21:45:08 CST 2016
# /usr/es/sbin/cluster/utilities/cl_sm_continue
Resource Class Action Response for ResolveOpQuorumTie
Then, the PHA171 node continues and proceeds to acquire the RG, as shown in the cluster.log file in Example 10-67.
Example 10-67 Cluster.log file from the PHA171 acquiring the resource group
Dec 2 21:45:26 PHA171 local0:crit clstrmgrES[10027332]: Fri Dec 2 21:45:26 Removing 1 from ml_idx
Dec 2 21:45:26 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT START: split_merge_prompt quorum YES@SEQ@145@QRMNT@9@DE@11@NSEQ@8@OLD@1@NEW@0
Dec 2 21:45:26 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: split_merge_prompt quorum YES@SEQ@145@QRMNT@9@DE@11@NSEQ@8@OLD@1@NEW@
0 0
Dec 2 21:45:27 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT START: node_down PHA170
Dec 2 21:45:27 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: node_down PHA170 0
Dec 2 21:45:27 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT START: rg_move_release PHA171 1
Dec 2 21:45:27 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT START: rg_move PHA171 1 RELEASE
Dec 2 21:45:27 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: rg_move PHA171 1 RELEASE 0
Dec 2 21:45:27 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: rg_move_release PHA171 1 0
Dec 2 21:45:28 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT START: rg_move_fence PHA171 1
Dec 2 21:45:28 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: rg_move_fence PHA171 1 0
Dec 2 21:45:30 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT START: rg_move_fence PHA171 1
Dec 2 21:45:30 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: rg_move_fence PHA171 1 0
Dec 2 21:45:30 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT START: rg_move_acquire PHA171 1
Dec 2 21:45:30 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT START: rg_move PHA171 1 ACQUIRE
Dec 2 21:45:30 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT START: acquire_takeover_addr
Dec 2 21:45:31 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: acquire_takeover_addr 0
Dec 2 21:45:33 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: rg_move PHA171 1 ACQUIRE 0
Dec 2 21:45:33 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: rg_move_acquire PHA171 1 0
Dec 2 21:45:33 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT START: rg_move_complete PHA171 1
Dec 2 21:45:34 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: rg_move_complete PHA171 1 0
Dec 2 21:45:36 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT START: node_down_complete PHA170
Dec 2 21:45:36 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: node_down_complete PHA170 0
10.10.4 Cluster merge
In this case, the PHA170 node restarts. After the restart completes and the heartbeat channel is restored, you can merge the PowerHA cluster.
The steps are similar to the ones that are described in 10.8.5, “Cluster merge” on page 366.
10.10.5 Scenario summary
If you want to decide manually which partition survives when a cluster split occurs, use the Manual split and merge policy.
10.11 Scenario: Active node halt policy quarantine
This section presents a scenario for an ANHP quarantine.
10.11.1 Scenario description
Figure 10-39 shows the topology of this scenario.
Figure 10-39 Active node halt policy quarantine
There are two HMCs in this scenario. Each HMC has two network interfaces: One is used to connect to the server’s FSP adapter, and the other one is used to communicate with the PowerHA nodes. In this scenario, one node tries to shut down another node through the HMC by using the ssh protocol.
The two HMCs provide high availability for HMC operations. If one HMC fails, PowerHA uses the other HMC to continue operations.
10.11.2 HMC password-less access configuration
Add the HMCs host names and their IP addresses into the /etc/hosts file on the PowerHA nodes:
172.16.15.55 HMC55
172.16.15.239 HMC239
Example 10-68 shows how to set up the HMC password-less access from the PHA170 node to one HMC.
Example 10-68 The ssh password-less setup of HMC55
# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (//.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in //.ssh/id_rsa.
Your public key has been saved in //.ssh/id_rsa.pub.
The key fingerprint is:
64:f0:68:a0:9e:51:11:dc:e6:c5:fc:bf:74:36:72:cb root@PHA170
The key's randomart image is:
+--[ RSA 2048]----+
| .=+.o |
| o..o++ |
| o oo.+. |
| . o ..o . |
| o S . |
| + = |
| . B o |
| . E |
| |
+-----------------+
 
# KEY=`cat ~/.ssh/id_rsa.pub` && ssh hscroot@HMC55 mkauthkeys -a "$KEY"
Warning: Permanently added 'HMC55' (ECDSA) to the list of known hosts.
hscroot@HMC55's password: -> enter the password here
 
-> check if it is ok to access this HMC without password
# ssh hscroot@HMC55 lshmc -V
"version= Version: 8
Release: 8.4.0
Service Pack: 2
HMC Build level 20160816.1
","base_version=V8R8.4.0
"
Example 10-69 shows how to set up HMC password-less access from the PHA170 node to another HMC.
Example 10-69 The ssh password-less setup of HMC239
# KEY=`cat ~/.ssh/id_rsa.pub` && ssh hscroot@HMC239 mkauthkeys -a "$KEY"
Warning: Permanently added 'HMC239' (ECDSA) to the list of known hosts.
hscroot@HMC239's password: -> enter password here
 
(0) root @ PHA170: /.ssh
# ssh hscroot@HMC239 lshmc -V
"version= Version: 8
Release: 8.4.0
Service Pack: 2
HMC Build level 20160816.1
","base_version=V8R8.4.0
 
Note: The operation that is shown in Example 10-69 on page 383 is also repeated for the PHA171 node.
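When repeating the setup on the second node, it is easy to overwrite an existing key pair by rerunning ssh-keygen. The following sketch generates the key pair only if it is missing and then prints the mkauthkeys command for each HMC in this scenario; printing instead of running the ssh commands keeps the sketch safe to try anywhere.

```shell
#!/bin/sh
# Sketch: create the root ssh key pair only if it does not exist yet,
# then print the mkauthkeys command to run for each HMC (HMC55 and
# HMC239 in this scenario). Repeat runs do not overwrite the key.
ensure_keypair() {
    keyfile="$1"
    mkdir -p "$(dirname "$keyfile")"
    if [ ! -f "${keyfile}.pub" ]; then
        ssh-keygen -t rsa -N "" -f "$keyfile" -q
    fi
}

ensure_keypair "$HOME/.ssh/id_rsa"
KEY=$(cat "$HOME/.ssh/id_rsa.pub")
for hmc in HMC55 HMC239; do
    echo ssh hscroot@"$hmc" mkauthkeys -a "\"$KEY\""
done
```

Copy and run the printed commands once per HMC, entering the hscroot password when prompted, exactly as in Example 10-68 and Example 10-69.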
10.11.3 HMC configuration in PowerHA
Complete the following steps:
1. The SMIT fast path is smitty cm_cluster_quarintine_halt. The full path is to run smitty sysmirror and then select Custom Cluster Configuration → Cluster Nodes and Networks → Initial Cluster Setup (Custom) → Configure Cluster Split and Merge Policy → Split and Merge Management Policy → Quarantine Policy → Active Node Halt Policy.
We choose HMC Configuration, as shown in Example 10-70.
Example 10-70 Active node halt policy HMC configuration
Active Node Halt Policy
 
Move cursor to desired item and press Enter.
 
HMC Configuration
Configure Active Node Halt Policy
2. Select Add HMC Definition, as shown in Example 10-71 and press Enter. Then, the detailed definition menu opens, as shown in Example 10-72 on page 385.
Example 10-71 Adding an HMC
HMC Configuration
 
Move cursor to desired item and press Enter.
 
Add HMC Definition
Change/Show HMC Definition
Remove HMC Definition
Change/Show HMC List for a Node
Change/Show HMC List for a Site
Change/Show Default HMC Tunables
Change/Show Default HMC List
Example 10-72 HMC55 definition
Add HMC Definition
 
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
[Entry Fields]
* HMC name [HMC55]
DLPAR operations timeout (in minutes) []
Number of retries []
Delay between retries (in seconds) []
Nodes [PHA171 PHA170]
Sites []
Check connectivity between HMC and nodes Yes
Table 10-4 shows the help and information list for adding the HMC definition.
Table 10-4 Context-sensitive help and associated list for adding an HMC definition
Name
Context-sensitive help (F1)
Associated list (F4)
HMC name
Enter the host name for the HMC. An IP address is also accepted here. IPv4 and IPv6 addresses are supported.
Yes (single-selection).
Obtained by running the following command:
/usr/sbin/rsct/bin/rmcdomainstatus -s ctrmc -a IP
DLPAR operations timeout (in minutes)
Enter a timeout in minutes for DLPAR commands that are run on an HMC (the -w parameter). This -w parameter exists only on the chhwres command when allocating or releasing resources. It is adjusted according to the type of resources (for memory, 1 minute per gigabyte is added to this timeout). Setting no value means that the default value, which is defined in the Change/Show Default HMC Tunables panel, is used. When -1 is displayed in this field, it indicates that the default value is used.
None. This parameter is not used in an ANHP scenario.
Number of retries
Enter a number of times one HMC command is retried before the HMC is considered as non-responding. The next HMC in the list is used after this number of retries fails. Setting no value means that you use the default value, which is defined in the Change/Show Default HMC Tunables panel. When -1 is displayed in this field, it indicates that the default value is used.
None. The default value is 5.
Delay between retries (in seconds)
Enter a delay in seconds between two successive retries. Setting no value means that you use the default value, which is defined in Change/Show Default HMC Tunables panel. When -1 is displayed in this field, it indicates that the default value is used.
None. The default value is 10s.
3. Add the first HMC, HMC55, for the two PowerHA nodes, and keep the default values for the other items. Upon pressing Enter, PowerHA checks whether the current node can access HMC55 without a password, as shown in Example 10-73.
Example 10-73 HMC connectivity verification
COMMAND STATUS
 
Command: OK stdout: yes stderr: no
 
Before command completion, additional instructions may appear below.
 
 
Checking HMC connectivity between "PHA171" node and "HMC55" HMC : success!
Checking HMC connectivity between "PHA170" node and "HMC55" HMC : success!
4. Then, add another HMC, HMC239, as shown in Example 10-74.
Example 10-74 HMC239 definition
Add HMC Definition
 
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
[Entry Fields]
* HMC name [HMC239]
DLPAR operations timeout (in minutes) []
Number of retries []
Delay between retries (in seconds) []
Nodes [PHA171 PHA170]
Sites []
Check connectivity between HMC and nodes Yes
You can use the clmgr command to show the current HMC settings, as shown in Example 10-75.
Example 10-75 The clmgr command displaying the HMC configurations
(0) root @ PHA170: /
# clmgr query hmc -v
NAME="HMC55"
TIMEOUT="-1" -> "-1" means use the default value
RETRY_COUNT="-1" -> "-1" means use the default value
RETRY_DELAY="-1" -> "-1" means use the default value
NODES="PHA171 PHA170"
STATUS="UP"
VERSION="V8R8.4.0.2"
 
NAME="HMC239"
TIMEOUT="-1"
RETRY_COUNT="-1"
RETRY_DELAY="-1"
NODES="PHA171 PHA170"
STATUS="UP"
VERSION="V8R8.6.0.0"
 
(0) root @ PHA170: /
# clmgr query cluster hmc
DEFAULT_HMC_TIMEOUT="10"
DEFAULT_HMC_RETRY_COUNT="5"
DEFAULT_HMC_RETRY_DELAY="10"
DEFAULT_HMCS_LIST="HMC55 HMC239"
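From these defaults you can estimate how long PowerHA keeps retrying a non-responding HMC before it moves to the next HMC in DEFAULT_HMCS_LIST. A minimal sketch, assuming that the retries are simply spaced DEFAULT_HMC_RETRY_DELAY seconds apart (the exact interaction with the DLPAR timeout may differ):

```shell
# Hypothetical worst-case estimate before PowerHA abandons one HMC
# and fails over to the next HMC in DEFAULT_HMCS_LIST.
RETRY_COUNT=5    # DEFAULT_HMC_RETRY_COUNT
RETRY_DELAY=10   # DEFAULT_HMC_RETRY_DELAY, in seconds

WORST_CASE=$((RETRY_COUNT * RETRY_DELAY))
echo "Worst-case wait before failing over to the next HMC: ${WORST_CASE}s"
```

With the defaults shown above, that is 50 seconds per HMC under this assumption.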
10.11.4 Quarantine policy configuration in PowerHA
Complete the following steps:
1. Use the SMIT fast path smitty cm_cluster_quarintine_halt, or run smitty sysmirror and then select Custom Cluster Configuration → Cluster Nodes and Networks → Initial Cluster Setup (Custom) → Configure Cluster Split and Merge Policy → Split and Merge Management Policy → Quarantine Policy → Active Node Halt Policy.
The panel that is shown in Example 10-76 opens. Select Configure Active Node Halt Policy.
Example 10-76 Configuring the active node halt policy
Active Node Halt Policy
 
Move cursor to desired item and press Enter.
 
HMC Configuration
Configure Active Node Halt Policy
2. The panel that is shown in Example 10-77 opens. Enable the Active Node Halt Policy and set the testRG RG as the Critical Resource Group.
Example 10-77 Enabling the active node halt policy and setting the critical resource group
Active Node Halt Policy
 
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
[Entry Fields]
* Active Node Halt Policy Yes +
* Critical Resource Group [testRG] +
In this scenario, there is only one RG, so we set it as the critical RG. For a description about the critical RG, see 10.3.1, “Active node halt quarantine policy” on page 328.
Example 10-78 shows the summary after pressing Enter.
Example 10-78 Cluster status summary
COMMAND STATUS
 
Command: OK stdout: yes stderr: no
 
Before command completion, additional instructions may appear below.
 
The PowerHA SystemMirror split and merge policies have been updated.
Current policies are:
Split Handling Policy : None
Merge Handling Policy : Majority
Split and Merge Action Plan : Reboot
The configuration must be synchronized to make this change known across the cluster.
Active Node Halt Policy : Yes
Critical Resource Group : testRG
 
Note: If the split and merge policy is tiebreaker or manual, then the ANHP policy does not take effect. Make sure to set the Split Handling Policy to None before setting the ANHP policy.
3. Use the clmgr command to check the current configuration, as shown in Example 10-79.
Example 10-79 Checking the current cluster configuration
# clmgr view cluster|egrep -i "quarantine|critical"
QUARANTINE_POLICY="halt"
CRITICAL_RG="testRG"
 
# clmgr q cluster SPLIT-MERGE
SPLIT_POLICY="none"
MERGE_POLICY="majority"
ACTION_PLAN="reboot"
4. When the HMC and ANHP configuration is complete, verify and synchronize the cluster.
During the verification and synchronization process, the LPAR name and system information of the PowerHA nodes are added into the HACMPdynresop ODM database. They are used when ANHP is triggered, as shown in Example 10-80.
Example 10-80 Information that is stored in the HACMPdynresop
# odmget HACMPdynresop
 
HACMPdynresop:
key = "PHA170_LPAR_NAME"
value = "T_PHA170" -> The LPAR name can differ from the hostname; the hostname is PHA170
 
HACMPdynresop:
key = "PHA170_MANAGED_SYSTEM"
value = "8284-22A*844B4EW" -> This value is System Model * Machine Serial Number
 
HACMPdynresop:
key = "PHA171_LPAR_NAME"
value = "T_PHA171"
 
HACMPdynresop:
key = "PHA171_MANAGED_SYSTEM"
value = "8408-E8E*842342W"
 
Note: You can obtain the LPAR name from AIX by running either uname -L or lparstat -i.
The requirements are as follows:
Hardware firmware level 840 or later
AIX 7.1 TL4 or 7.2 or later
HMC V8 R8.4.0 (PTF MH01559) with a mandatory interim fix (PTF MH01560)
Here is an example output:
(0) root @ PHA170: /
# hostname
PHA170
# uname -L
5 T_PHA170
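The uname -L output is the LPAR number followed by the LPAR name, so the name alone can be extracted with awk. A small sketch, simulated here with the sample output above (on an AIX node you would pipe uname -L directly into the same awk):

```shell
# uname -L prints "<LPAR number> <LPAR name>".
# Simulated with the sample output; on AIX use:
#   LPAR_NAME=$(uname -L | awk '{print $2}')
sample="5 T_PHA170"
LPAR_NAME=$(echo "$sample" | awk '{print $2}')
echo "$LPAR_NAME"
```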
10.11.5 Simulating a cluster split
Before simulating a cluster split, check the cluster’s status, as described in 10.6.3, “Initial PowerHA service status for each scenario” on page 347.
This scenario sets the Split Handling Policy to None and sets the Quarantine Policy to ANHP. The Critical Resource Group is testRG and is online on the PHA170 node at this time. When the cluster split occurs, it is expected that a backup node of this RG (PHA171) takes over the RG. During this process, PowerHA tries to shut down the PHA170 node through the HMC.
In this scenario, we broke all communication between the two nodes at 02:44:04.
The main steps of CAA and PowerHA on the PHA171 node
The following events occur:
02:44:04: All communication between the two nodes is broken.
02:44:17: The PHA171 node CAA marks ADAPTER_DOWN for the PHA170 node.
02:44:47: The PHA171 node CAA marks NODE_DOWN for the PHA170 node.
02:44:47: PowerHA triggers the split_merge_prompt split event.
02:44:52: PowerHA triggers the split_merge_prompt quorum event, and then PHA171 takes over the RG.
02:44:55: In the rg_move_acquire event, PowerHA shuts down PHA170 through the HMC.
02:46:35: The PHA171 node completes the RG takeover.
The main steps of CAA and PowerHA on the PHA170 node
The following events occur:
02:44:04: All communication between the two nodes is broken.
02:44:17: The PHA170 node marks REP_DOWN for the repository disk.
02:44:17: The PHA170 node CAA marks ADAPTER_DOWN for the PHA171 node.
02:44:47: The PHA170 node CAA marks NODE_DOWN for the PHA171 node.
02:44:47: PowerHA triggers a split_merge_prompt split event.
02:44:52: PowerHA triggers a split_merge_prompt quorum event.
02:44:55: The PHA170 node halts.
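The detection intervals in the timelines above (ADAPTER_DOWN about 13 seconds after the break, NODE_DOWN about 43 seconds after it) can be checked by converting the log timestamps to seconds. A small sketch using the timestamps above:

```shell
# Convert an HH:MM:SS timestamp to seconds since midnight,
# then print the interval between the break and NODE_DOWN.
to_secs() {
    echo "$1" | awk -F: '{ print $1*3600 + $2*60 + $3 }'
}
BREAK=$(to_secs 02:44:04)       # all communication broken
NODE_DOWN=$(to_secs 02:44:47)   # CAA marks NODE_DOWN
DELTA=$((NODE_DOWN - BREAK))
echo "NODE_DOWN after ${DELTA}s"
```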
Example 10-81 shows the PowerHA cluster.log file of the PHA171 node.
Example 10-81 PHA171 node cluster.log file information
Dec 3 02:44:47 PHA171 EVENT START: split_merge_prompt split
Dec 3 02:44:47 PHA171 EVENT COMPLETED: split_merge_prompt split
Dec 3 02:44:52 PHA171 local0:crit clstrmgrES[7471396]: Sat Dec 3 02:44:52 Removing 1 from ml_idx
Dec 3 02:44:52 PHA171 EVENT START: split_merge_prompt quorum
Dec 3 02:44:52 PHA171 EVENT COMPLETED: split_merge_prompt quorum
Dec 3 02:44:52 PHA171 EVENT START: node_down PHA170
Dec 3 02:44:52 PHA171 EVENT COMPLETED: node_down PHA170 0
Dec 3 02:44:52 PHA171 EVENT START: rg_move_release PHA171 1
Dec 3 02:44:53 PHA171 EVENT START: rg_move PHA171 1 RELEASE
Dec 3 02:44:53 PHA171 EVENT COMPLETED: rg_move PHA171 1 RELEASE 0
Dec 3 02:44:53 PHA171 EVENT COMPLETED: rg_move_release PHA171 1 0
Dec 3 02:44:53 PHA171 EVENT START: rg_move_fence PHA171 1
Dec 3 02:44:53 PHA171 EVENT COMPLETED: rg_move_fence PHA171 1 0
Dec 3 02:44:55 PHA171 EVENT START: rg_move_fence PHA171 1
Dec 3 02:44:55 PHA171 EVENT COMPLETED: rg_move_fence PHA171 1 0
Dec 3 02:44:55 PHA171 EVENT START: rg_move_acquire PHA171 1
-> At 02:44:58, PowerHA triggered the HMC to shut down the PHA170 node
Dec 3 02:46:28 PHA171 EVENT START: rg_move PHA171 1 ACQUIRE
Dec 3 02:46:28 PHA171 EVENT START: acquire_takeover_addr
Dec 3 02:46:29 PHA171 EVENT COMPLETED: acquire_takeover_addr 0
Dec 3 02:46:31 PHA171 EVENT COMPLETED: rg_move PHA171 1 ACQUIRE 0
Dec 3 02:46:31 PHA171 EVENT COMPLETED: rg_move_acquire PHA171 1 0
Dec 3 02:46:31 PHA171 EVENT START: rg_move_complete PHA171 1
Dec 3 02:46:33 PHA171 EVENT COMPLETED: rg_move_complete PHA171 1 0
Dec 3 02:46:35 PHA171 EVENT START: node_down_complete PHA170
Dec 3 02:46:35 PHA171 EVENT COMPLETED: node_down_complete PHA170 0
Example 10-82 shows the PowerHA hacmp.out file on the PHA171 node. The log indicates that the rg_move_acquire event starts at 02:44:55 and that PowerHA issues the command to shut down the PHA170 node at 02:44:58. This operation occurs in the PowerHA rg_move_acquire event.
Example 10-82 The PHA171 node hacmp.out file
Dec 3 2016 02:44:55 GMT -06:00 EVENT START: rg_move_acquire PHA171 1
<...>
:clhmccmd[hmccmdexec:3707] : Start ssh command at Sat Dec 3 02:44:58 CST 2016
:clhmccmd[hmccmdexec:1] ssh <...> hscroot@HMC55 'chsysstate -m SVRP8-S822-08-SN844B4EW -r lpar -o shutdown --immed -n T_PHA170 2>&1
<...>
 
Note: PowerHA on the PHA171 node shuts down the PHA170 node before acquiring the service IP and varying on the shared VG. Only when this operation completes successfully does PowerHA continue with the other operations. If the operation fails, PowerHA enters the error state and does not continue, so the data in the shared VG is safe.
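The shutdown command that clhmccmd runs over ssh can be reconstructed from the log. As a sketch, the command string is built from the managed-system and LPAR names taken from Example 10-82; substitute your own names when adapting it:

```shell
# Build the HMC shutdown command string that clhmccmd runs over ssh.
# Values are taken from Example 10-82; adjust for your environment.
MANAGED_SYSTEM="SVRP8-S822-08-SN844B4EW"
LPAR_NAME="T_PHA170"
CMD="chsysstate -m $MANAGED_SYSTEM -r lpar -o shutdown --immed -n $LPAR_NAME"
echo "$CMD"
```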
10.11.6 Cluster merge occurs
In this case, the PHA170 node halts after the cluster split occurs. When resolving cluster split issues, start PHA170 manually. After checking that the CAA service is up by running the lscluster -m command, you can start the PowerHA service on the PHA170 node.
The steps are similar to what is described in 10.8.5, “Cluster merge” on page 366.
10.11.7 Scenario summary
In addition to the cluster split and merge policies, PowerHA provides the ANHP quarantine policy to maintain high availability and keep data safe in a cluster split scenario. The policy also takes effect in the case of a sick-but-not-dead node. For more information, see 10.1.1, “Causes of a partitioned cluster” on page 317.
10.12 Scenario: Enabling the disk fencing quarantine policy
This section describes the scenario when disk fencing is enabled as the quarantine policy.
10.12.1 Scenario description
Figure 10-40 shows the topology of this scenario.
Figure 10-40 Topology scenario for the quarantine policy
In this scenario, the quarantine policy is disk fencing. There is one RG (testRG) in this cluster, so this RG is also marked as the Critical Resource Group in the disk fencing configuration.
There is one VG (sharevg) in this RG, and there is one hdisk in this VG. To enable the disk fencing policy, you must set the reserve_policy parameter to no_reserve for all the disks. In our case, hdisk2 is used, so run the following command on each PowerHA node:
chdev -l hdisk2 -a reserve_policy=no_reserve
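If the shared VG spans more than one disk, every disk needs the same setting. The loop below only generates (does not run) the chdev commands; the disk names are placeholders for your environment:

```shell
# Generate (without running) the chdev command for each shared disk
# that must have reserve_policy=no_reserve.
# The disk names below are placeholders; substitute your own disks.
CMDS=$(for disk in hdisk2 hdisk3; do
    echo "chdev -l $disk -a reserve_policy=no_reserve"
done)
echo "$CMDS"
```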
10.12.2 Quarantine policy configuration in PowerHA
This section describes the quarantine policy configuration in a PowerHA cluster.
Ensuring that the active node halt policy is disabled
 
Note: If the ANHP policy is also enabled, in case of a cluster split, ANHP takes effect first.
Complete the following steps:
1. Use the SMIT fast path smitty cm_cluster_quarintine_halt, or run smitty sysmirror and then select Custom Cluster Configuration → Cluster Nodes and Networks → Initial Cluster Setup (Custom) → Configure Cluster Split and Merge Policy → Split and Merge Management Policy → Quarantine Policy → Active Node Halt Policy.
2. Example 10-83 shows the window. Select Configure Active Node Halt Policy.
Example 10-83 Configure the active node halt policy
Active Node Halt Policy
 
Move cursor to desired item and press Enter.
 
HMC Configuration
Configure Active Node Halt Policy
3. Example 10-84 shows where you can disable the ANHP.
Example 10-84 Disable the active node halt policy
Active Node Halt Policy
 
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
 
[Entry Fields]
* Active Node Halt Policy No +
* Critical Resource Group [testRG]
Enabling the disk fencing quarantine policy
Use the SMIT fast path smitty cm_cluster_quarantine_disk_dialog, or you can run smitty sysmirror and select Custom Cluster Configuration → Cluster Nodes and Networks → Initial Cluster Setup (Custom) → Configure Cluster Split and Merge Policy → Split and Merge Management Policy → Quarantine Policy → Disk Fencing.
Example 10-85 on page 393 shows that disk fencing is enabled and the Critical Resource Group is testRG.
Example 10-85 Disk fencing enabled and critical resource group selection
Disk Fencing
 
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
[Entry Fields]
* Disk Fencing Yes +
* Critical Resource Group [testRG]
After pressing Enter, Example 10-86 shows the summary of the split and merge policy setting.
Example 10-86 Split and merge policy setting summary
Command: OK stdout: yes stderr: no
 
Before command completion, additional instructions may appear below.
 
The PowerHA SystemMirror split and merge policies have been updated.
Current policies are:
Split Handling Policy : None
Merge Handling Policy : Majority
Split and Merge Action Plan : Reboot
The configuration must be synchronized to make this change known across the cluster.
Disk Fencing : Yes
Critical Resource Group : testRG
 
Note: If you want to enable only the disk fencing policy, you also must set the split handling policy to None.
Checking the current settings
You can use the clmgr or the odmget command to check the current settings, as shown in Example 10-87 and Example 10-88.
Example 10-87 Checking the current cluster settings
# clmgr view cluster|egrep -i "quarantine|critical"
QUARANTINE_POLICY="fencing"
CRITICAL_RG="testRG"
Example 10-88 Checking the split and merge cluster settings
# odmget HACMPsplitmerge
 
HACMPsplitmerge:
id = 0
policy = "split"
value = "None"
 
HACMPsplitmerge:
id = 0
policy = "merge"
value = "Majority"
 
HACMPsplitmerge:
id = 0
policy = "action"
value = "Reboot"
 
HACMPsplitmerge:
id = 0
policy = "anhp"
value = "No" -->> Important: make sure that ANHP is disabled.
 
HACMPsplitmerge:
id = 0
policy = "critical_rg"
value = "testRG"
 
HACMPsplitmerge:
id = 0
policy = "scsi"
value = "Yes"
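The odmget stanza output above can be reduced to simple policy=value pairs with awk. A sketch, fed here with a here-document copy of two of the stanzas (on a cluster node you would pipe odmget HACMPsplitmerge into the same awk):

```shell
# Reduce HACMPsplitmerge stanzas to policy=value pairs.
# Fed with sample data; on AIX: odmget HACMPsplitmerge | awk ...
PAIRS=$(awk -F'"' '/policy =/ { p=$2 } /value =/ { print p "=" $2 }' <<'EOF'
HACMPsplitmerge:
        policy = "split"
        value = "None"

HACMPsplitmerge:
        policy = "anhp"
        value = "No"
EOF
)
echo "$PAIRS"
```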
Performing a PowerHA cluster verification and synchronization
 
Note: Before you perform a cluster verification and synchronization, check whether the reserve_policy for the shared disks is set to no_reserve.
After the verification and synchronization, you can see that the reserve_policy of hdisk2 changed to PR_shared and that a PR_key_value was generated on each node.
Example 10-89 shows the PR_key_value and reserve_policy setting in the PHA170 node.
Example 10-89 The PR_key_value and reserve_policy settings on PHA170 node
# hostname
PHA170
 
# lsattr -El hdisk2|egrep "PR|reserve_policy"
PR_key_value 0x10001472090686 Persistant Reserve Key Value True+
reserve_policy PR_shared Reserve Policy True+
 
# devrsrv -c query -l hdisk2
Device Reservation State Information
==================================================
Device Name : hdisk2
Device Open On Current Host? : NO
ODM Reservation Policy : PR SHARED
ODM PR Key Value : 4503687439910534
Device Reservation State : NO RESERVE
Registered PR Keys : No Keys Registered
PR Capabilities Byte[2] : 0x11 CRH PTPL_C
PR Capabilities Byte[3] : 0x81 PTPL_A
PR Types Supported : PR_WE_AR PR_EA_RO PR_WE_RO PR_EA PR_WE PR_EA_AR
 
-> [HEX]0x10001472090686 = [DEC]4503687439910534
Example 10-90 shows the PR_key_value and the reserve_policy setting on the PHA171 node.
Example 10-90 PR_key_value and reserve_policy settings on the PHA171 node
# hostname
PHA171
 
# lsattr -El hdisk2|egrep "PR|reserve_policy"
PR_key_value 0x20001472090686 Persistant Reserve Key Value True+
reserve_policy PR_shared Reserve Policy True+
 
# devrsrv -c query -l hdisk2
Device Reservation State Information
==================================================
Device Name : hdisk2
Device Open On Current Host? : NO
ODM Reservation Policy : PR SHARED
ODM PR Key Value : 9007287067281030
Device Reservation State : NO RESERVE
Registered PR Keys : No Keys Registered
PR Capabilities Byte[2] : 0x11 CRH PTPL_C
PR Capabilities Byte[3] : 0x81 PTPL_A
PR Types Supported : PR_WE_AR PR_EA_RO PR_WE_RO PR_EA PR_WE PR_EA_AR
 
-> [HEX]0x20001472090686 = [DEC]9007287067281030
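As the annotations above show, the PR_key_value reported by lsattr (hexadecimal) and the ODM PR Key Value reported by devrsrv (decimal) are the same key in different bases. printf confirms the conversion:

```shell
# lsattr shows PR_key_value in hex; devrsrv shows it in decimal.
# Convert the hex keys from Example 10-89 and Example 10-90 to decimal.
K_PHA170=$(printf '%d' 0x10001472090686)
K_PHA171=$(printf '%d' 0x20001472090686)
echo "PHA170 key: $K_PHA170"
echo "PHA171 key: $K_PHA171"
```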
10.12.3 Simulating a cluster split
Before simulating a cluster split, check the cluster’s status, as described in 10.6.3, “Initial PowerHA service status for each scenario” on page 347.
This scenario sets the split handling policy to None and the quarantine policy to disk fencing. The Critical Resource Group is testRG and is online on the PHA170 node at this time. When the cluster split occurs, it is expected that the backup node of this RG (PHA171) takes over the RG. During this process, PowerHA on the PHA171 node fences the PHA170 node out of the shared disk while keeping its own access, which keeps the data safe.
In this case, we broke all communication between the two nodes at 04:14:12.
Main steps of CAA and PowerHA on the PHA171 node
The following events occur:
04:14:12: All communication between the two nodes is broken.
04:14:24: The PHA171 node CAA marks ADAPTER_DOWN for the PHA170 node.
04:14:54: The PHA171 node CAA marks NODE_DOWN for the PHA170 node.
04:14:54: PowerHA triggers a split_merge_prompt split event.
04:15:04: PowerHA triggers a split_merge_prompt quorum event, and then PHA171 takes over the RG.
04:15:07: In the rg_move_acquire event, PowerHA preempts the PHA170 node from Volume Group sharevg.
04:15:14: The PHA171 node completes the RG takeover.
Example 10-91 shows the output of the PowerHA cluster.log file.
Example 10-91 PowerHA cluster.log output
Dec 3 04:14:54 PHA171 EVENT START: split_merge_prompt split
Dec 3 04:15:04 PHA171 EVENT COMPLETED: split_merge_prompt split
Dec 3 04:15:04 PHA171 local0:crit clstrmgrES[19530020]: Sat Dec 3 04:15:04 Removing 1 from ml_idx
Dec 3 04:15:04 PHA171 EVENT START: split_merge_prompt quorum
Dec 3 04:15:04 PHA171 EVENT COMPLETED: split_merge_prompt quorum
Dec 3 04:15:04 PHA171 EVENT START: node_down PHA170
Dec 3 04:15:04 PHA171 EVENT COMPLETED: node_down PHA170 0
Dec 3 04:15:05 PHA171 EVENT START: rg_move_release PHA171 1
Dec 3 04:15:05 PHA171 EVENT START: rg_move PHA171 1 RELEASE
Dec 3 04:15:05 PHA171 EVENT COMPLETED: rg_move PHA171 1 RELEASE 0
Dec 3 04:15:05 PHA171 EVENT COMPLETED: rg_move_release PHA171 1 0
Dec 3 04:15:05 PHA171 EVENT START: rg_move_fence PHA171 1
Dec 3 04:15:05 PHA171 EVENT COMPLETED: rg_move_fence PHA171 1 0
Dec 3 04:15:07 PHA171 EVENT START: rg_move_fence PHA171 1
Dec 3 04:15:07 PHA171 EVENT COMPLETED: rg_move_fence PHA171 1 0
Dec 3 04:15:07 PHA171 EVENT START: rg_move_acquire PHA171 1
-> At 04:15:07, PowerHA preempted the PHA170 node from volume group sharevg, and then continued
Dec 3 04:15:08 PHA171 EVENT START: rg_move PHA171 1 ACQUIRE
Dec 3 04:15:08 PHA171 EVENT START: acquire_takeover_addr
Dec 3 04:15:08 PHA171 EVENT COMPLETED: acquire_takeover_addr 0
Dec 3 04:15:10 PHA171 EVENT COMPLETED: rg_move PHA171 1 ACQUIRE 0
Dec 3 04:15:10 PHA171 EVENT COMPLETED: rg_move_acquire PHA171 1 0
Dec 3 04:15:10 PHA171 EVENT START: rg_move_complete PHA171 1
Dec 3 04:15:11 PHA171 EVENT COMPLETED: rg_move_complete PHA171 1 0
Dec 3 04:15:13 PHA171 EVENT START: node_down_complete PHA170
Dec 3 04:15:14 PHA171 EVENT COMPLETED: node_down_complete PHA170 0
Example 10-92 shows the output of the PowerHA hacmp.out file. It indicates that PowerHA triggers the preempt operation in the cl_scsipr_preempt script.
Example 10-92 PowerHA hacmp.out file output
Dec 3 2016 04:15:07 GMT -06:00 EVENT START: rg_move_acquire PHA171 1
...
:cl_scsipr_preempt[85] PR_Key=0x10001472090686
:cl_scsipr_preempt[106] : Node PHA170 is down, preempt PHA170 from the Volume Groups,
:cl_scsipr_preempt[107] : which are part of any Resource Group.
:cl_scsipr_preempt[109] odmget HACMPgroup
:cl_scsipr_preempt[109] sed -n $'/group =/{ s/.*"\(.*\)"/\1/; h; } /nodes =/{ /[ "]PHA170[ "]/{ g; p; } }'
:cl_scsipr_preempt[109] ResGrps=testRG
:cl_scsipr_preempt[109] typeset ResGrps
:cl_scsipr_preempt[115] clodmget -n -q group='testRG and name like *VOLUME_GROUP' -f value HACMPresource
:cl_scsipr_preempt[115] VolGrps=sharevg
:cl_scsipr_preempt[115] typeset VolGrps
:cl_scsipr_preempt[118] clpr_ReadRes_vg sharevg
Number of disks in VG sharevg: 1
hdisk2
:cl_scsipr_preempt[120] clpr_verifyKey_vg sharevg 0x20001472090686
Number of disks in VG sharevg: 1
hdisk2
:cl_scsipr_preempt[124] : Node PHA170 is down, preempting that node from Volume Group sharevg.
:cl_scsipr_preempt[126] clpr_preempt_abort_vg sharevg 0x10001472090686
Number of disks in VG sharevg: 1
hdisk2
...
Main steps of CAA and PowerHA on the PHA170 node
The following events occur:
04:14:12: All communication between the two nodes is broken.
04:14:21: The PHA170 node CAA marks ADAPTER_DOWN for the PHA171 node.
04:14:51: The PHA170 node CAA marks NODE_DOWN for the PHA171 node.
04:14:51: PowerHA triggers the split_merge_prompt split event.
04:14:56: Removing 2 from ml_idx.
04:14:56: PowerHA triggers a split_merge_prompt quorum event.
04:14:58: EVENT START: node_down PHA171.
04:14:58: EVENT COMPLETED: node_down PHA171.
No other events occur on the PHA170 node.
After some time, at 04:15:16, the /sharefs file system is fenced out: the application on the PHA170 node can no longer write to it, but it can still read from it.
Example 10-93 shows the AIX error log (errpt output) of the PHA170 node.
Example 10-93 AIX error log of the PHA170 node
PHA170:
4D91E3EA 1203041416 P S cluster0 A split has been detected.
2B138850 1203041416 I O ConfigRM ConfigRM received Subcluster Split event
...
A098BF90 1203041416 P S ConfigRM The operational quorum state of the acti
4BDDFBCC 1203041416 I S ConfigRM The operational quorum state of the acti
AB59ABFF 1203041416 U U LIBLVM Remote node Concurrent Volume Group fail
AB59ABFF 1203041416 U U LIBLVM Remote node Concurrent Volume Group fail
...
65DE6DE3 1203041516 P S hdisk2 REQUESTED OPERATION CANNOT BE PERFORMED
E86653C3 1203041516 P H LVDD I/O ERROR DETECTED BY LVM
EA88F829 1203041516 I O SYSJ2 USER DATA I/O ERROR
65DE6DE3 1203041516 P S hdisk2 REQUESTED OPERATION CANNOT BE PERFORMED
65DE6DE3 1203041516 P S hdisk2 REQUESTED OPERATION CANNOT BE PERFORMED
E86653C3 1203041516 P H LVDD I/O ERROR DETECTED BY LVM
52715FA5 1203041516 U H LVDD FAILED TO WRITE VOLUME GROUP STATUS AREA
F7DDA124 1203041516 U H LVDD PHYSICAL VOLUME DECLARED MISSING
CAD234BE 1203041516 U H LVDD QUORUM LOST, VOLUME GROUP CLOSING
E86653C3 1203041516 P H LVDD I/O ERROR DETECTED BY LVM
52715FA5 1203041516 U H LVDD FAILED TO WRITE VOLUME GROUP STATUS AREA
CAD234BE 1203041516 U H LVDD QUORUM LOST, VOLUME GROUP CLOSING
65DE6DE3 1203041516 P S hdisk2 REQUESTED OPERATION CANNOT BE PERFORMED
65DE6DE3 1203041516 P S hdisk2 REQUESTED OPERATION CANNOT BE PERFORMED
E86653C3 1203041516 P H LVDD I/O ERROR DETECTED BY LVM
E86653C3 1203041516 P H LVDD I/O ERROR DETECTED BY LVM
78ABDDEB 1203041516 I O SYSJ2 META-DATA I/O ERROR
78ABDDEB 1203041516 I O SYSJ2 META-DATA I/O ERROR
65DE6DE3 1203041516 P S hdisk2 REQUESTED OPERATION CANNOT BE PERFORMED
E86653C3 1203041516 P H LVDD I/O ERROR DETECTED BY LVM
C1348779 1203041516 I O SYSJ2 LOG I/O ERROR
B6DB68E0 1203041516 I O SYSJ2 FILE SYSTEM RECOVERY REQUIRED
Example 10-94 shows detailed information about event EA88F829.
Example 10-94 Showing event EA88F829
LABEL: J2_USERDATA_EIO
IDENTIFIER: EA88F829
 
Date/Time: Mon Dec 3 04:15:16 CST 2016
Sequence Number: 12629
Machine Id: 00FA4B4E4C00
Node Id: PHA170
Class: O
Type: INFO
WPAR: Global
Resource Name: SYSJ2
 
Description
USER DATA I/O ERROR
 
Probable Causes
ADAPTER HARDWARE OR MICROCODE
DISK DRIVE HARDWARE OR MICROCODE
SOFTWARE DEVICE DRIVER
STORAGE CABLE LOOSE, DEFECTIVE, OR UNTERMINATED
 
Recommended Actions
CHECK CABLES AND THEIR CONNECTIONS
INSTALL LATEST ADAPTER AND DRIVE MICROCODE
INSTALL LATEST STORAGE DEVICE DRIVERS
IF PROBLEM PERSISTS, CONTACT APPROPRIATE SERVICE REPRESENTATIVE
 
Detail Data
JFS2 MAJOR/MINOR DEVICE NUMBER
0064 0001
FILE SYSTEM DEVICE AND MOUNT POINT
/dev/sharelv, /sharefs
Example 10-95 shows the output of the devrsrv command on the PHA170 node. It indicates that hdisk2 is held through the 9007287067281030 PR key, which belongs to the PHA171 node.
Example 10-95 The devrsrv command output of the PHA170 node
# hostname
PHA170
 
# devrsrv -c query -l hdisk2
Device Reservation State Information
==================================================
Device Name : hdisk2
Device Open On Current Host? : YES
ODM Reservation Policy : PR SHARED
ODM PR Key Value : 4503687439910534
Device Reservation State : PR SHARED
PR Generation Value : 34
PR Type : PR_WE_AR (WRITE EXCLUSIVE, ALL REGISTRANTS)
PR Holder Key Value : 0
Registered PR Keys : 9007287067281030 9007287067281030
PR Capabilities Byte[2] : 0x11 CRH PTPL_C
PR Capabilities Byte[3] : 0x81 PTPL_A
PR Types Supported : PR_WE_AR PR_EA_RO PR_WE_RO PR_EA PR_WE PR_EA_AR
Example 10-96 shows the output of the devrsrv command on the PHA171 node.
Example 10-96 The devrsrv command output of the PHA171 node
# hostname
PHA171
# devrsrv -c query -l hdisk2
Device Reservation State Information
==================================================
Device Name : hdisk2
Device Open On Current Host? : YES
ODM Reservation Policy : PR SHARED
ODM PR Key Value : 9007287067281030
Device Reservation State : PR SHARED
PR Generation Value : 34
PR Type : PR_WE_AR (WRITE EXCLUSIVE, ALL REGISTRANTS)
PR Holder Key Value : 0
Registered PR Keys : 9007287067281030 9007287067281030
PR Capabilities Byte[2] : 0x11 CRH PTPL_C
PR Capabilities Byte[3] : 0x81 PTPL_A
PR Types Supported : PR_WE_AR PR_EA_RO PR_WE_RO PR_EA PR_WE PR_EA_AR
 
Note: From the above description, you can see that the PHA171 node takes over the RG, the data in the /sharefs file system is safe, and the service IP is attached to the PHA171 node. However, the service IP is still online on the PHA170 node, so there is a risk of an IP conflict. You must perform some manual operations to avoid this risk, such as rebooting the PHA170 node manually.
10.12.4 Simulating a cluster merge
Restarting or shutting down the PHA170 node is one method to avoid a service IP conflict.
In this scenario, restart the PHA170 node and restore all communication between the two nodes. After checking that the CAA service is up by running the lscluster -m command, start the PowerHA service on the PHA170 node.
During the start of the PowerHA service, in the node_up event, PowerHA on the PHA170 node resets the reservation for the shared disks.
Example 10-97 shows the output of the PowerHA cluster.log file on the PHA170 node.
Example 10-97 PowerHA cluster.log file on the PHA170 node
Dec 3 04:41:05 PHA170 local0:crit clstrmgrES[10486088]: Sat Dec 3 04:41:05 HACMP: clstrmgrES: VRMF fix level in product ODM = 0
Dec 3 04:41:05 PHA170 local0:crit clstrmgrES[10486088]: Sat Dec 3 04:41:05 CLSTR_JOIN_AUTO_START - This is the normal start request
Dec 3 04:41:18 PHA170 user:notice PowerHA SystemMirror for AIX: EVENT START: node_up PHA170
-> PowerHA reset the reservation for the shared disks
Dec 3 04:41:20 PHA170 user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: node_up PHA170 0
Dec 3 04:41:22 PHA170 user:notice PowerHA SystemMirror for AIX: EVENT START: node_up_complete PHA170
Dec 3 04:41:22 PHA170 user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: node_up_complete PHA170 0
Example 10-98 shows the output of the node_up event on the PHA170 node. The log indicates that PowerHA registers its key on the shared disks of sharevg.
Example 10-98 The node_up event output of the PHA170 node
Dec 3 2016 04:41:18 GMT -06:00 EVENT START: node_up PHA170
...
:node_up[node_up_scsipr_init:122] clpr_reg_res_vg sharevg 0x10001472090686
Number of disks in VG sharevg: 1
hdisk2
:node_up[node_up_scsipr_init:123] (( 0 != 0 ))
:node_up[node_up_scsipr_init:139] : Checking if reservation succeeded
:node_up[node_up_scsipr_init:141] clpr_verifyKey_vg sharevg 0x10001472090686
Number of disks in VG sharevg: 1
hdisk2
:node_up[node_up_scsipr_init:142] RC1=0
:node_up[node_up_scsipr_init:143] (( 0 == 1 ))
:node_up[node_up_scsipr_init:149] (( 0 == 0 ))
:node_up[node_up_scsipr_init:153] : Reservation success
Example 10-99 shows that the PR key value of the PHA170 node is registered to hdisk2. Thus, the cluster is ready for the next cluster split event.
Example 10-99 PHA170 PR key value
# hostname
PHA171
 
# devrsrv -c query -l hdisk2
Device Reservation State Information
==================================================
Device Name : hdisk2
Device Open On Current Host? : YES
ODM Reservation Policy : PR SHARED
ODM PR Key Value : 9007287067281030
Device Reservation State : PR SHARED
PR Generation Value : 38
PR Type : PR_WE_AR (WRITE EXCLUSIVE, ALL REGISTRANTS)
PR Holder Key Value : 0
Registered PR Keys : 4503687439910534 9007287067281030
9007287067281030 4503687439910534
PR Capabilities Byte[2] : 0x11 CRH PTPL_C
PR Capabilities Byte[3] : 0x81 PTPL_A
PR Types Supported : PR_WE_AR PR_EA_RO PR_WE_RO PR_EA PR_WE PR_EA_AR
10.12.5 Scenario summary
In addition to the cluster split and merge policies, PowerHA provides a disk fencing quarantine policy to maintain high availability and keep data safe in cluster split scenarios. It also takes effect in the case of a sick-but-not-dead node. For more information, see 10.1.1, “Causes of a partitioned cluster” on page 317.