Cluster partitioning management update
From Version 7.1 forward, PowerHA SystemMirror provides more split and merge policies. Split and merge policies are important features in PowerHA SystemMirror because they protect customers’ data consistency and keep applications running stably through cluster split scenarios and other unstable situations. They are vital for customer environments.
This chapter describes split and merge policies.
This chapter covers the following topics:
10.1 Introduction to cluster partitioning
During normal operation, cluster nodes regularly exchange messages, commonly called heartbeats, to determine the health of each other. Figure 10-1 depicts a healthy two-node PowerHA cluster.
Figure 10-1 A healthy two-node PowerHA Cluster with heartbeat messages exchanged
When both the active and backup nodes fail to receive heartbeat messages, each node falsely declares the other node to be down, as shown in Figure 10-2. When this happens, the backup node attempts to take over the shared resources, including shared data volumes. As a result, both nodes might write to the shared data and cause data corruption.
Figure 10-2 Cluster that is partitioned when nodes fail to communicate through heartbeat message exchange
When a set of nodes fails to communicate with the remaining set of nodes in a cluster, the cluster is said to be partitioned. This is also known as node isolation, or more commonly, split brain.
 
Note: As two-node clusters are by far the most common PowerHA cluster configuration, we introduce cluster partitioning concepts in the following sections in the context of a two-node cluster. These basic concepts can be applied similarly to clusters with more than two nodes and are further elaborated where necessary.
10.1.1 Causes of a partitioned cluster
Loss of all heartbeats can be caused by one of the following situations:
When all communication paths between the nodes fail (as shown in Figure 10-2 on page 316).
Here is an example scenario based on a real-world experience:
a. A cluster had two communication paths for heartbeat, the network and repository disk. The PowerHA network heartbeat mode was configured as multicast.
b. One day, a network configuration change was made that disabled the multicast network communication. As a result, network heartbeating no longer worked. But, system administrators were unaware of this problem because they did not monitor the PowerHA network status. The network heartbeat failure was left uncorrected.
c. The cluster continued to operate with heartbeat through the repository disk.
d. Some days later, the repository disk failed and the cluster was partitioned.
One of the nodes is sick but not dead.
One node cannot send or receive heartbeat messages for a period, but resumes sending and receiving heartbeat messages afterward.
Another possible scenario is:
a. There is a cluster with nodes in separate physical hosts with dual Virtual I/O Servers (VIOSs).
b. Due to some software or firmware defect, one node cannot perform I/O through the VIOSs for a period but resumes I/O afterward. This causes an intermittent loss of heartbeats through all communication paths between the nodes.
c. When the duration of I/O freeze exceeds the node failure detection time, the nodes declare each other as down and the cluster is partitioned.
Although increasing the number of communication paths for heartbeating can minimize the occurrence of cluster partitioning due to communication path failure, the possibility cannot be eliminated completely.
10.1.2 Terminology
Here is the terminology that is used throughout this chapter:
Cluster split When the nodes in a cluster fail to communicate with each other for a period, each node declares the other node as down. The cluster is split into partitions. A cluster split is said to have occurred.
Split policy A PowerHA split policy defines the behavior of a cluster when a cluster split occurs.
Cluster merge When the communication between the partitions of a split cluster is restored, the partitions attempt to rejoin each other into a single cluster. A cluster merge is said to have occurred.
Merge policy A PowerHA merge policy defines the behavior of a cluster when a cluster merge occurs.
Quarantine policy A PowerHA quarantine policy defines how a standby node isolates or quarantines an active node or partition from the shared data to prevent data corruption when a cluster split occurs.
Critical resource group When multiple resource groups (RGs) are configured in a cluster, the RG that is considered as most important or critical to the user is defined as the Critical Resource Group for a quarantine policy. For more information, see 10.3.1, “Active node halt quarantine policy” on page 328.
Standard cluster A standard cluster is a traditional PowerHA cluster.
Stretched cluster A stretched cluster is a PowerHA V7 cluster with nodes that are in sites within the same geographic location. All cluster nodes are connected to the same active and backup repository disks in a common storage area network (SAN).
Linked cluster A linked cluster is a PowerHA V7 cluster with nodes that are in sites in different geographic locations. Nodes in each site have their own active and backup repository disks. The active repository disks in the two sites are kept in sync by Cluster Aware AIX (CAA).
10.2 PowerHA cluster split and merge policies (before PowerHA V7.2.1)
This section provides an introduction to PowerHA split and merge policies before PowerHA for AIX V7.2.1.
For more information, see the following IBM Redbooks:
IBM PowerHA SystemMirror for AIX Cookbook, SG24-7739
IBM PowerHA SystemMirror V7.2 for IBM AIX Updates, SG24-8278
10.2.1 Split policy
Before PowerHA V7.1, when a cluster split occurred, the backup node tried to take over the resources of the primary node, which resulted in a split-brain situation.
The PowerHA split policy was first introduced in PowerHA V7.1.3 with two options:
None
This is the default option where the primary and backup nodes operate independently of each other after a split occurs, resulting in the same behavior as earlier versions during a split-brain situation.
Tie breaker
This option is applicable only to clusters with sites configured. When a split occurs, the partition that fails to acquire the SCSI reservation on the tie-breaker disk has its nodes restarted. For a two-node cluster, one node is restarted, as shown in Figure 10-3 on page 319.
 
Note: EMC PowerPath disks are not supported as tie-breaker disks.
Figure 10-3 Disk tie-breaker split policy
PowerHA V7.2 added the following options to the split policy:
Manual option
Initially, this option was applicable only to linked clusters; starting with PowerHA V7.2.1, it is available for all cluster types. When a split occurs, each node waits for input from the user at the console to choose whether to continue running cluster services or restart the node.
NFS support for the tie-breaker option
When a split occurs, the partition that fails to acquire a lock on the tie-breaker NFS file has its nodes restarted. For a two-node cluster, one node is restarted, as shown in Figure 10-4.
Figure 10-4 NFS tie-breaker split policy
 
Note: PowerHA V7.2.1 running on (or migrated to) AIX 7.2.1 supports the split and merge functions for all types of PowerHA clusters.
10.2.2 Merge policy
Before PowerHA V7.1, the default action when a merge occurs is to halt one of the nodes based on a predefined algorithm, such as halting the node with the highest node ID. There is no guarantee that the active node is not the one that is halted. The intention is to minimize the possibility of data corruption after a split-brain situation occurs.
The PowerHA merge policy was first introduced in PowerHA V7.1.3 with two options:
Majority
This is the default option. The partition with the highest number of nodes remains online. If each partition has the same number of nodes, then the partition that has the lowest node ID is chosen. The partition that does not remain online is restarted, as specified by the chosen action plan. This behavior is similar to previous versions, as shown in Figure 10-5.
Figure 10-5 Default merge policy: Halt one of the nodes
Tie breaker
Each partition attempts to acquire a SCSI reserve on the tie-breaker disk. The partition that cannot reserve the disk is restarted, or has its cluster services restarted, as specified by the chosen action plan. If this option is selected, the split-policy configuration must also use the tie-breaker option.
PowerHA V7.2 added the following options to the merge policy:
Manual option
This option is applicable only to linked clusters. When a merge occurs, each node waits for input from the user at the console to choose whether to continue running cluster services or restart the node.
Priority option
This policy indicates that the highest priority site continues to operate when a cluster merge event occurs. The sites are assigned a priority based on the order in which they are listed in the site list. The first site in the site list is the highest priority site. This policy is available only for linked clusters.
NFS support for the tie-breaker option
When a merge occurs, the partition that fails to acquire a lock on the tie-breaker NFS file has its nodes restarted. If this option is selected, the split-policy configuration must also use the tie-breaker option.
10.2.3 Configuration for the split and merge policy
Complete the following steps:
1. In the SMIT interface, select Custom Cluster Configuration → Cluster Nodes and Networks → Initial Cluster Setup (Custom) → Configure Cluster Split and Merge Policy → Split Management Policy, as shown in Figure 10-6.
Figure 10-6 Configuring the cluster split and merge policy
2. Select TieBreaker. The manual option is not available if the cluster you are configuring is not a linked cluster. Select either Disk or NFS as the tie breaker, as shown in Figure 10-7.
Figure 10-7 Selecting the tie-breaker disk
Disk tie-breaker split and merge policy
Select the disk to be used as the tie-breaker disk and synchronize the cluster. Figure 10-8 shows selecting hdisk3 as the tie-breaker device.
Figure 10-8 Tie-breaker disk split policy
Figure 10-9 shows the result after confirming the configuration.
Figure 10-9 Tie-breaker disk successfully added
Before configuring a disk as a tie breaker, you can check its current reservation policy by using the AIX command devrsrv, as shown in Example 10-1.
Example 10-1 The devrsrv command shows no reserve
root@testnode1[/]# devrsrv -c query -l hdisk3
Device Reservation State Information
==================================================
Device Name : hdisk3
Device Open On Current Host? : NO
ODM Reservation Policy : NO RESERVE
Device Reservation State : NO RESERVE
When cluster services are started for the first time after a disk tie breaker is configured on a node, the reservation policy of the tie-breaker disk is set to PR_exclusive with a persistent reserve key, as shown in Example 10-2.
Example 10-2 The devrsrv command shows PR_exclusive
root@testnode1[/]# devrsrv -c query -l hdisk3
Device Reservation State Information
==================================================
Device Name : hdisk3
Device Open On Current Host? : NO
ODM Reservation Policy : PR EXCLUSIVE
ODM PR Key Value : 8477804151029074886
Device Reservation State : NO RESERVE
Registered PR Keys : No Keys Registered
PR Capabilities Byte[2] : 0x15 CRH ATP_C PTPL_C
PR Capabilities Byte[3] : 0xa1 PTPL_A
PR Types Supported : PR_WE_AR PR_EA_RO PR_WE_RO PR_EA PR_WE PR_EA_AR
For a detailed description of how SCSI-3 PR (Persistent Reserve) of a tie-breaker disk works, refer to “SCSI reservation” in Appendix A of the IBM PowerHA SystemMirror V7.2 for IBM AIX Updates, SG24-8278.
When the Tie Breaker option of the split policy is selected, the merge policy is automatically set with the same tie-breaker option.
NFS tie-breaker split and merge policy
This section describes the NFS tie-breaker split and merge policy tasks.
NFS server that is used for a tie breaker
The NFS server that is used for the tie breaker is connected to a physical network other than the service networks that are configured in PowerHA. A logical choice is the management network that usually exists in all data center environments.
To configure the NFS server, complete the following steps:
1. Add /etc/hosts entries for the cluster nodes, for example:
172.16.25.31 testnode1
172.16.15.32 testnode2
2. Configure the NFS domain by running the following command:
chnfsdom powerha
3. Start nfsrgyd by running the following command:
startsrc –s nfsrgyd
4. Add an NFS file system for storing the tie-breaker files, as shown in Figure 10-10.
Figure 10-10 Adding a directory for NFS export
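The SMIT panel in Figure 10-10 results in an export entry on the NFS server. A hand-edited equivalent in /etc/exports (a sketch that is based on the exports shown in Example 10-3; verify the option syntax against your AIX level) looks like the following:

```
# /etc/exports entry on the NFS server (sketch):
# export the per-cluster tie-breaker directory over NFSv4, with sys
# security, read/write access, and root access for the cluster nodes
/tiebreakers/redbookcluster -vers=4,sec=sys,rw,root=testnode1:testnode2
```

After editing /etc/exports manually, run exportfs -a to export the new directory.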
Here the NFS server is used as tie breaker for two clusters, redbookcluster and RBcluster, as shown in Example 10-3 on page 325.
Example 10-3 Directories exported
[root@atsnim:/]#exportfs
/software -vers=3,public,sec=sys:krb5p:krb5i:krb5:dh,rw
/pha -vers=3:4,sec=sys:krb5p:krb5i:krb5:dh,rw
/docs -vers=3,public,sec=sys:krb5p:krb5i:krb5:dh,rw
/sybase -sec=sys:krb5p:krb5i:krb5:dh,rw,root=172.16.0.0
/leilintemp -sec=sys:none,rw
/powerhatest -sec=sys:krb5p:krb5i:krb5:dh,rw,root=testnode1
/tiebreakers/redbookcluster -vers=4,sec=sys,rw,root=testnode1:testnode2
/tiebreakers/RBcluster -vers=4,sec=sys,rw,root=testnode3:testnode4
On each PowerHA node
Complete the following tasks:
1. Add an entry for the NFS server to /etc/hosts:
10.1.1.3 tiebreaker
2. Configure the NFS domain by running the following command:
chnfsdom powerha
3. Start nfsrgyd by running the following command:
startsrc –s nfsrgyd
4. Add the NFS tie-breaker directory to be mounted, as shown in Figure 10-11.
Figure 10-11 NFS directory to mount
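Equivalently to the SMIT panel in Figure 10-11, the NFS mount can be predefined in a stanza in /etc/filesystems on each node. The following is a sketch that uses the server host name and export path from the earlier examples (PowerHA performs the actual mount when the policy is active, so the stanza sets mount = false):

```
/tiebreaker:
        dev             = /tiebreakers/redbookcluster
        vfs             = nfs
        nodename        = tiebreaker
        mount           = false
        options         = vers=4
```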
Configuring PowerHA on one of the PowerHA nodes
Complete the following steps:
1. Configure the PowerHA tie-breaker split/merge policy, as shown in Figure 10-12.
Figure 10-12 NFS tie-breaker split policy
a. Enter the host name of the NFS server that exports the tie-breaker directory, for example, tiebreaker. Add the IP entry for this host name to /etc/hosts on each node.
b. Enter the full path name of the local mount point for mounting the NFS tie-breaker directory, for example, /tiebreaker.
c. Enter the full path name of the directory that is exported from the NFS server, in this case /tiebreaker.
Figure 10-13 on page 327 shows an example of the NFS tie-breaker configuration.
Figure 10-13 NFS tie-breaker configuration
2. Synchronize the cluster.
When cluster services are started on each node, tie-breaker files are created on the NFS server, as shown in Example 10-4.
Example 10-4 NFS tie-breaker files created
[root@tiebreaker:/]#ls -Rl /tiebreakers
total 0
drwxr-xr-x 3 root system 256 Nov 23 20:50 RBcluster
drwxr-xr-x 2 root system 256 May 27 2016 lost+found
drwxr-xr-x 3 root system 256 Nov 23 20:51 redbookcluster
 
/tiebreakers/RBcluster:
total 0
-rwx------ 1 root system 0 Nov 23 20:50 PowerHA_NFS_Reserve
drwxr-xr-x 2 root system 256 Nov 23 20:50 PowerHA_NFS_ReserveviewFilesDir
 
/tiebreakers/RBcluster/PowerHA_NFS_ReserveviewFilesDir:
total 16
-rwx------ 1 root system 257 Nov 23 20:50 testnode3view
-rwx------ 1 root system 257 Nov 23 20:50 testnode4view
 
/tiebreakers/redbookcluster:
total 0
-rwx------ 1 root system 0 Nov 23 20:51 PowerHA_NFS_Reserve
drwxr-xr-x 2 root system 256 Nov 23 20:51 PowerHA_NFS_ReserveviewFilesDir
 
/tiebreakers/redbookcluster/PowerHA_NFS_ReserveviewFilesDir:
total 16
-rwx------ 1 root system 257 Nov 23 20:51 testnode1view
-rwx------ 1 root system 257 Nov 23 20:51 testnode2view
10.3 PowerHA quarantine policy
This section introduces the PowerHA quarantine policy. For more information about PowerHA quarantine policies, see IBM PowerHA SystemMirror V7.2 for IBM AIX Updates, SG24-8278.
Quarantine policies were first introduced in PowerHA V7.2. A quarantine policy isolates the previously active node that was hosting a critical RG after a cluster split event or node failure occurs. The quarantine policy ensures that application data is not corrupted or lost.
There are two quarantine policies:
1. Active node halt
2. Disk fencing
10.3.1 Active node halt quarantine policy
When an RG is online on a cluster node, the node is said to be the active node for that RG. The backup or standby node for the RG is a cluster node where the RG comes online when the active node fails or when the RG is manually moved over.
With the active node halt policy (ANHP), in the event of a cluster split, the standby node for a critical RG attempts to halt the active node before taking over the RG and any other related RGs. This task is done by issuing a command to the HMC, as shown in Figure 10-14.
Figure 10-14 Active node halt process
If the backup node fails to halt the active node, for example, because of a communication failure with the HMC, the RG is not taken over. This policy prevents application data corruption due to the same RGs being online on more than one node at the same time.
Now, let us elaborate why we need to define a critical RG.
In the simplest configuration of a two-node cluster with one RG, there is no ambiguity as to which node can be halted by the ANHP in the event of a cluster split. But, when there are multiple RGs in a cluster, it is not as simple:
In a mutual takeover cluster configuration, different RGs are online on each cluster node and the nodes back up each other. An active node for one RG also is a backup or standby node for another RG. When a cluster split occurs, which node halts?
When a cluster with multiple nodes and RGs is partitioned or split, some of the nodes in each partition might have RGs online; that is, there are active nodes in each partition. Which partition can have its nodes halted?
Having nodes halt one another, bringing down the cluster as a whole, is not wanted.
PowerHA V7.2 introduces the Critical Resource Group so that a user can define which RG is the most important one when multiple RGs are configured. The ANHP can then use the critical RG to determine which node is halted or restarted. The node or the partition with the critical RG online is halted or restarted and quarantined, as shown in Figure 10-14 on page 328.
10.3.2 Disk fencing quarantine
With this policy, the backup node fences off the active node from the shared disks before taking over the active node’s resources, as shown in Figure 10-15. This action prevents application data corruption by preventing the RG from coming online on more than one node at a time. As with the ANHP, the user must also define the Critical Resource Group for this policy.
Because this policy only fences off disks from the active node without halting or restarting it, it is configured together with a split and merge policy.
Figure 10-15 Disk fencing quarantine
10.3.3 Configuration of quarantine policies
In the SMIT interface, select Custom Cluster Configuration → Cluster Nodes and Networks → Initial Cluster Setup (Custom) → Configure Cluster Split and Merge Policy → Quarantine Policy, as shown in Figure 10-16.
Figure 10-16 Active node halt policy
Active node halt
This task consists of the following steps:
1. Configure the HMC so that the cluster nodes can run HMC commands remotely without the need to specify a password.
2. Add the public keys (id_rsa.pub) of cluster nodes to the authorized_keys2 in the .ssh directory on the HMC.
3. Configure the HMC to be used for halting nodes when the split occurs, as shown in Figure 10-17, Figure 10-18, and Figure 10-19 on page 332.
Figure 10-17 Active node halt policy HMC configuration
Figure 10-18 HMC definition for active node halt policy
Figure 10-19 Add HMC for active node halt policy
4. Configure the ANHP and specify the Critical Resource Group, as shown in Figure 10-20, Figure 10-21 on page 333, and Figure 10-22 on page 333.
Figure 10-20 Configure active node halt policy
Figure 10-21 Critical resource group for active node halt policy
Figure 10-22 Critical resource group add success
Disk fencing
Similar to the ANHP, a critical RG must be selected to go along with it, as shown in Figure 10-23 and Figure 10-24.
Figure 10-23 Disk fencing quarantine policy
Figure 10-24 Disk fencing critical resource group
The current setting of the quarantine policy can be checked by using clmgr, as shown in Example 10-5.
Example 10-5 The clmgr command displaying the current quarantine policy
root@testnode1[/]#clmgr query cluster | grep -i quarantine
QUARANTINE_POLICY="fencing"
 
Important: The disk fencing quarantine policy cannot be enabled or disabled if cluster services are active.
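Because the policy cannot be changed while cluster services are active, the change is typically made with the cluster stopped. The following command-line sketch is an assumption: the QUARANTINE_POLICY attribute name is taken from the clmgr query output in Example 10-5, and you should verify that your clmgr level accepts it on modify:

```
# stop cluster services on all nodes before changing the policy
clmgr stop cluster
# switch the quarantine policy to disk fencing (attribute name assumed
# from the clmgr query output; verify on your clmgr level)
clmgr modify cluster QUARANTINE_POLICY=fencing
# propagate the change across the cluster
clmgr sync cluster
```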
When cluster services are started on a node after enabling the Disk Fencing quarantine policy, the reservation policy and state of the shared volumes are set to PR Shared with the PR keys of both nodes registered. This action can be observed by using the devrsrv command, as shown in Example 10-6.
Example 10-6 Query reservation policy
root@testnode3[/]#clmgr query cluster | grep -i cluster_name
CLUSTER_NAME="RBcluster"
 
root@testnode3[/]#clmgr query nodes
testnode4
testnode3
 
root@testnode3[/]#clmgr query resource_group
rg
root@testnode3[/]#clmgr query resource_group rg | grep -i volume
VOLUME_GROUP="vg1"
root@testnode3[/]#lspv
hdisk0 00f8806f26239b8c rootvg active
hdisk2 00f8806f909bc31a caavg_private active
hdisk3 00f8806f909bc357 vg1 concurrent
hdisk4 00f8806f909bc396 vg1 concurrent
 
root@testnode3[/]#clRGinfo
-----------------------------------------------------------------------------
Group Name State Node
-----------------------------------------------------------------------------
rg ONLINE testnode3
OFFLINE testnode4
 
root@testnode3[/]#devrsrv -c query -l hdisk3
Device Reservation State Information
==================================================
Device Name : hdisk3
Device Open On Current Host? : YES
ODM Reservation Policy : PR SHARED
ODM PR Key Value : 4503687425852313
Device Reservation State : PR SHARED
PR Generation Value : 15
PR Type : PR_WE_AR (WRITE EXCLUSIVE, ALL REGISTRANTS)
PR Holder Key Value : 0
Registered PR Keys : 4503687425852313 9007287053222809
PR Capabilities Byte[2] : 0x15 CRH ATP_C PTPL_C
PR Capabilities Byte[3] : 0xa1 PTPL_A
PR Types Supported : PR_WE_AR PR_EA_RO PR_WE_RO PR_EA PR_WE PR_EA_AR
 
root@testnode3[/]#devrsrv -c query -l hdisk4
Device Reservation State Information
==================================================
Device Name : hdisk4
Device Open On Current Host? : YES
ODM Reservation Policy : PR SHARED
ODM PR Key Value : 4503687425852313
Device Reservation State : PR SHARED
PR Generation Value : 15
PR Type : PR_WE_AR (WRITE EXCLUSIVE, ALL REGISTRANTS)
PR Holder Key Value : 0
Registered PR Keys : 4503687425852313 9007287053222809
PR Capabilities Byte[2] : 0x15 CRH ATP_C PTPL_C
PR Capabilities Byte[3] : 0xa1 PTPL_A
PR Types Supported : PR_WE_AR PR_EA_RO PR_WE_RO PR_EA PR_WE PR_EA_AR
 
root@testnode4[/]#lspv
hdisk0 00f8806f26239b8c rootvg active
hdisk2 00f8806f909bc31a caavg_private active
hdisk3 00f8806f909bc357 vg1 concurrent
hdisk4 00f8806f909bc396 vg1 concurrent
 
root@testnode4[/]#devrsrv -c query -l hdisk3
Device Reservation State Information
==================================================
Device Name : hdisk3
Device Open On Current Host? : YES
ODM Reservation Policy : PR SHARED
ODM PR Key Value : 9007287053222809
Device Reservation State : PR SHARED
PR Generation Value : 15
PR Type : PR_WE_AR (WRITE EXCLUSIVE, ALL REGISTRANTS)
PR Holder Key Value : 0
Registered PR Keys : 4503687425852313 9007287053222809
PR Capabilities Byte[2] : 0x15 CRH ATP_C PTPL_C
PR Capabilities Byte[3] : 0xa1 PTPL_A
PR Types Supported : PR_WE_AR PR_EA_RO PR_WE_RO PR_EA PR_WE PR_EA_AR
 
root@testnode4[/]#devrsrv -c query -l hdisk4
Device Reservation State Information
==================================================
Device Name : hdisk4
Device Open On Current Host? : YES
ODM Reservation Policy : PR SHARED
ODM PR Key Value : 9007287053222809
Device Reservation State : PR SHARED
PR Generation Value : 15
PR Type : PR_WE_AR (WRITE EXCLUSIVE, ALL REGISTRANTS)
PR Holder Key Value : 0
Registered PR Keys : 4503687425852313 9007287053222809
PR Capabilities Byte[2] : 0x15 CRH ATP_C PTPL_C
PR Capabilities Byte[3] : 0xa1 PTPL_A
PR Types Supported : PR_WE_AR PR_EA_RO PR_WE_RO PR_EA PR_WE PR_EA_AR
The PR Shared reservation policy uses the SCSI-3 reservation of type WRITE EXCLUSIVE, ALL REGISTRANTS, as shown in Example 10-7 on page 337. Only nodes that are registered can write to the shared volumes. When a cluster split occurs, the standby node ejects the PR registration of the active node on all shared volumes of the affected RGs. In Example 10-7, the only registrations that are left on hdisk3 and hdisk4 are those of testnode4, effectively fencing off testnode3 from the shared volumes.
 
Note: Only a registered node can eject the registration of other nodes.
Example 10-7 WRITE EXCLUSIVE, ALL REGISTRANTS PR type
root@testnode4[/]#devrsrv -c query -l hdisk3
Device Reservation State Information
==================================================
Device Name : hdisk3
Device Open On Current Host? : YES
ODM Reservation Policy : PR SHARED
ODM PR Key Value : 9007287053222809
Device Reservation State : PR SHARED
PR Generation Value : 15
PR Type : PR_WE_AR (WRITE EXCLUSIVE, ALL REGISTRANTS)
PR Holder Key Value : 0
Registered PR Keys : 9007287053222809
PR Capabilities Byte[2] : 0x15 CRH ATP_C PTPL_C
PR Capabilities Byte[3] : 0xa1 PTPL_A
PR Types Supported : PR_WE_AR PR_EA_RO PR_WE_RO PR_EA PR_WE PR_EA_AR
 
root@testnode4[/]#devrsrv -c query -l hdisk4
Device Reservation State Information
==================================================
Device Name : hdisk4
Device Open On Current Host? : YES
ODM Reservation Policy : PR SHARED
ODM PR Key Value : 9007287053222809
Device Reservation State : PR SHARED
PR Generation Value : 15
PR Type : PR_WE_AR (WRITE EXCLUSIVE, ALL REGISTRANTS)
PR Holder Key Value : 0
Registered PR Keys : 9007287053222809
PR Capabilities Byte[2] : 0x15 CRH ATP_C PTPL_C
Node testnode3 is registered again on hdisk3 and hdisk4 after it successfully rejoins testnode4 to form a cluster. To rejoin, you must restart cluster services on testnode3.
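When scripting health checks around disk fencing, a node's fencing state can be inferred from whether its PR key is still listed in the devrsrv output. The following helper function is our own sketch (not part of PowerHA or AIX); it parses devrsrv -c query output that is piped into it:

```shell
# pr_key_state: read `devrsrv -c query -l hdiskN` output on stdin and
# report whether the given PR key is still registered on the disk.
pr_key_state() {
    key="$1"
    # the "Registered PR Keys" line lists every key that is registered
    if grep '^Registered PR Keys' | grep -qw "$key"; then
        echo "REGISTERED"    # the node can still write to the disk
    else
        echo "EJECTED"       # the node has been fenced off
    fi
}

# hypothetical usage on a live node:
#   devrsrv -c query -l hdisk3 | pr_key_state 4503687425852313
```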
10.4 Changes in split and merge policies in PowerHA V7.2.1
This section provides a list of changes that are associated with the split and merge policies that are introduced in PowerHA V7.2.1 for AIX 7.2.1:
Split and merge policies are configurable for all cluster types when AIX is at Version 7.2.1, as summarized in Table 10-1.
Table 10-1 Split and merge policies for all cluster types
Cluster type  Split policy (pre AIX 7.2.1)   Merge policy (pre AIX 7.2.1)      Split and merge policy (AIX 7.2.1)
Standard      Not supported                  Not supported                     None-Majority, TB (Disk)-TB (Disk), TB (NFS)-TB (NFS), Manual-Manual
Stretched     None, TieBreaker               Majority, TieBreaker              None-Majority, TB (Disk)-TB (Disk), TB (NFS)-TB (NFS), Manual-Manual
Linked        None, TieBreaker, Manual       Majority, TieBreaker, Manual      None-Majority, TB (Disk)-TB (Disk), TB (NFS)-TB (NFS), Manual-Manual
Split and merge policies are configured as a whole instead of separately. These options can also vary slightly based on the exact AIX level.
The action plan for the split and merge policy is configurable.
An entry is added to the Problem Determination Tools menu for starting cluster services on a merged node after a cluster split.
Changes were added to clmgr for configuring the split and merge policy.
10.4.1 Configuring the split and merge policy by using SMIT
The split and merge policies are now configured as a whole, as shown in Figure 10-25, instead of separately, as described in 10.2.3, “Configuration for the split and merge policy” on page 321.
Figure 10-25 Configuring the split handling policy
All three options, None, Tie Breaker, and Manual, are now available for all cluster types, which includes standard, stretched, and linked clusters.
Before PowerHA V7.2.1, the split policy had a default setting of None, the merge policy had a default setting of Majority, and the default action was Reboot (Figure 10-26). This behavior has not changed.
Figure 10-26 Split and merge action plan menu
For the Tie Breaker option, the action plan for split and merge is now configurable as follows (Figure 10-27):
Reboot.
This is the default option, and it was the only behavior before the action plan became configurable. The nodes of the losing partition are restarted when a cluster split occurs.
Disable applications auto-start and reboot.
On a split event, the nodes on the losing partition are restarted, and the RGs are not brought online automatically after the restart.
Disable Cluster Services Auto-Start and Reboot.
Upon a split event, the nodes on the losing partition are restarted. The cluster services (CAA/RSCT/PowerHA) are not started after the restart. After the split condition is healed, select Start CAA on Merged Node from SMIT to enable the cluster services and bring the cluster to a stable state.
 
Note: If you specify the Split-Merge policy as None-None, the action plan is not implemented and a restart does not occur after the cluster split and merge events. This option is available only if your environment is running IBM AIX 7.2 with Technology Level 1 or later.
Figure 10-27 Disk tie breaker split and merge action plan
Similarly, Figure 10-28 shows the NFS TieBreaker policy SMIT window.
Figure 10-28 NFS tie breaker split and merge action plan
10.4.2 Configuring the split and merge policy by using clmgr
The clmgr utility has the following changes for the split and merge policy configuration (Figure 10-29):
Added a none option to the merge policy.
There is a local and remote quorum directory.
Added disable_rgs_autostart and disable_cluster_services_autostart options to the action plan.
Figure 10-29 The clmgr split and merge options
The Split/Merge policy of none/none can be configured only by using clmgr, as shown in Example 10-8. There is no SMIT option to configure this option.
Example 10-8 The clmgr modify split/merge policy to none
# clmgr modify cluster SPLIT_POLICY=none MERGE_POLICY=none
The PowerHA SystemMirror split and merge policies have been updated.
Current policies are:
Split Handling Policy : None
Merge Handling Policy : None
Split and Merge Action Plan : Reboot
The configuration must be synchronized to make this change known across the cluster.
10.4.3 Starting cluster services after a split
If the split-merge action plan of disabling cluster services auto-start is chosen in the configuration, then on a split event the losing partition nodes are restarted without bringing the cluster services online until the services are manually enabled.
You must enable the cluster services after a split situation is healed. Until you do so, the cluster services are not running on the losing partition nodes even after the networks rejoin. After the re-enablement, the losing partition nodes join the existing CAA cluster. To enable the services, run smitty sysmirror and select Problem Determination Tools → Start CAA on Merged Node, as shown in Figure 10-30.
Figure 10-30 Starting Cluster Aware AIX on the merged node
10.4.4 Migration and limitation
Multiple split or merge situations cannot be handled at one time. For example, in the case of an asymmetric topology (AST), where some nodes have visibility to both islands, the nodes do not form a clean split. In such cases, a split event is not generated when AST halts a node to correct the asymmetry.
With the NFS tiebreaker split policy configured, if the tie breaker group leader (TBGL) node is restarted, then all other nodes in the winning partition are restarted. No preemption is supported in this case.
Tie-breaker disk preemption does not work in the case of a TBGL hard restart or power off.
The merge events are not available in a stretched cluster with versions earlier than AIX 7.2.1, as shown in Figure 10-31.
Figure 10-31 Split merge policies pre- and post-migration
10.5 Considerations for using split and merge quarantine policies
A split and merge policy is used for deciding which node or partition can be restarted when a cluster split occurs. A quarantine policy is used for fencing off, or quarantining, the active node from shared disks when a cluster split occurs. Both types of policies are designed to prevent data corruption in the event of cluster partitioning.
The quarantine policy does not require additional infrastructure resources, but the split and merge policy does. Select the appropriate policy or combination of policies that suits your data center environment.
For example, instead of using the disk tie-breaker split and merge policy, which requires one tie-breaker disk per cluster, you might want to use a single NFS server as a tie breaker for multiple clusters (Figure 10-32) to minimize resource requirements. This is a tradeoff between resources and effectiveness.
Figure 10-32 Using a single NFS server as a tie breaker for multiple clusters
If you want only to prevent the possibility of data corruption with minimal configuration, and can accept the manual intervention that might be required in the event of a cluster split, you can use the disk fencing quarantine policy. Again, this is a tradeoff. Figure 10-33 presents a comparison summary of these policies.
Figure 10-33 Comparison summary of split and merge policies
10.6 Split and merge policy testing environment
Figure 10-34 shows the topology of testing scenarios in this chapter.
Figure 10-34 Testing scenario for the split and merge policy
Our testing environment is a single PowerHA standard cluster. It includes two AIX LPARs with the host names PHA170 and PHA171. Each node has two network interfaces: one is used for communication with the HMCs and the NFS server, and the other is used in the PowerHA cluster. Each node has three FC adapters: the first is used for rootvg, the second for shared user data access, and the third for tie-breaker access.
The PowerHA cluster is a basic configuration with the specific configuration option for different split and merge policies.
10.6.1 Basic configuration
Table 10-2 shows the PowerHA cluster’s attributes. This is a basic two-node PowerHA standard cluster.
Table 10-2 PowerHA cluster’s configuration
Component
PHA170
PHA171
Cluster name
PHA_cluster
Cluster type: Standard Cluster or No Site Cluster (NSC)
Network interface
en0: 172.16.51.170
Netmask: 255.255.255.0
Gateway: 172.16.51.1
en1: 172.16.15.242
en0: 172.16.51.171
Netmask: 255.255.255.0
Gateway: 172.16.51.1
en1: 172.16.15.243
Network
net_ether_01 (172.16.51.0/24)
CAA
Unicast
Repository disk: hdisk1
Shared VG
sharevg:hdisk2
Service IP
172.16.51.172 PHASvc
Resource Group
Resource Group testRG:
Startup Policy: Online On Home Node Only
Fallover Policy: Fallover To Next Priority Node In The List
Fallback Policy: Never Fallback
Participating Nodes: PHA170 PHA171
Service IP Label: PHASvc
Volume Group: sharevg
10.6.2 Specific hardware configuration for some scenarios
This section describes the specific hardware configurations for some scenarios.
Split and merge policy is tie breaker (disk)
In this scenario, add one shared disk, hdisk3, to act as the tie breaker.
Split and merge policy is tie breaker (NFS)
In this scenario, add one Network File System (NFS) server to act as the tie breaker.
Quarantine policy is active node halt policy
In this scenario, add two HMCs that are used to shut down the relevant LPARs in case of a cluster split scenario.
The following sections contain the detailed PowerHA configuration of each scenario.
10.6.3 Initial PowerHA service status for each scenario
Each scenario starts from the same PowerHA and CAA service status. We show that status once in this section instead of repeating it in each scenario.
PowerHA configuration
Example 10-9 shows the basic PowerHA configuration as displayed by the cltopinfo command.
Example 10-9 PowerHA basic configuration that is shown with the cltopinfo command
# cltopinfo
Cluster Name: PHA_Cluster
Cluster Type: Standard
Heartbeat Type: Unicast
Repository Disk: hdisk1 (00fa2342a1093403)
 
There are 2 node(s) and 1 network(s) defined
 
NODE PHA170:
Network net_ether_01
PHASvc 172.16.51.172
PHA170 172.16.51.170
 
NODE PHA171:
Network net_ether_01
PHASvc 172.16.51.172
PHA171 172.16.51.171
 
Resource Group testRG
Startup Policy Online On Home Node Only
Fallover Policy Fallover To Next Priority Node In The List
Fallback Policy Never Fallback
Participating Nodes PHA170 PHA171
Service IP Label PHASvc
Volume Group sharevg
PowerHA service
Example 10-10 shows the PowerHA nodes status from each PowerHA node.
Example 10-10 PowerHA nodes status in each scenario before a cluster split
# clmgr -cv -a name,state,raw_state query node
# NAME:STATE:RAW_STATE
PHA170:NORMAL:ST_STABLE
PHA171:NORMAL:ST_STABLE
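As a small sketch, the same check can be scripted. The snippet below parses the sample output from Example 10-10, embedded in a variable so that no live cluster is needed; the parsing approach is ours, not a PowerHA tool.

```shell
# Confirm that both nodes report ST_STABLE by parsing sample output of
# `clmgr -cv -a name,state,raw_state query node` (Example 10-10); the
# output is embedded so the sketch runs without a live cluster.
output='PHA170:NORMAL:ST_STABLE
PHA171:NORMAL:ST_STABLE'

# Any node whose raw state is not ST_STABLE needs attention before a test.
unstable=$(printf '%s\n' "$output" | awk -F: '$3 != "ST_STABLE" {print $1}')

if [ -z "$unstable" ]; then
    echo "all nodes stable"
else
    echo "check these nodes: $unstable"
fi
```

On a live system, the `output` variable would instead be filled from the real clmgr command.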
Example 10-11 shows the PowerHA RG status from each PowerHA node. The RG (testRG) is online on the PHA170 node.
Example 10-11 PowerHA Resource Group status in each scenario before the cluster split
# clRGinfo -v
 
Cluster Name: PHA_Cluster
 
Resource Group Name: testRG
Startup Policy: Online On Home Node Only
Fallover Policy: Fallover To Next Priority Node In The List
Fallback Policy: Never Fallback
Site Policy: ignore
Node State
---------------------------------------------------------------- ---------------
PHA170 ONLINE
PHA171 OFFLINE
CAA service status
Example 10-12 shows the CAA configuration with the lscluster -c command.
Example 10-12 Showing the CAA cluster configuration with the lscluster -c command
# lscluster -c
Cluster Name: PHA_Cluster
Cluster UUID: 28bf3ac0-b516-11e6-8007-faac90b6fe20
Number of nodes in cluster = 2
Cluster ID for node PHA170: 1
Primary IP address for node PHA170: 172.16.51.170
Cluster ID for node PHA171: 2
Primary IP address for node PHA171: 172.16.51.171
Number of disks in cluster = 1
Disk = hdisk1 UUID = 58a286b2-fe51-5e39-98b1-43acf62025ab cluster_major = 0 cluster_minor = 1
Multicast for site LOCAL: IPv4 228.16.51.170 IPv6 ff05::e410:33aa
Communication Mode: unicast
Local node maximum capabilities: SPLT_MRG, CAA_NETMON, AUTO_REPOS_REPLACE, HNAME_CHG, UNICAST, IPV6, SITE
Effective cluster-wide capabilities: SPLT_MRG, CAA_NETMON, AUTO_REPOS_REPLACE, HNAME_CHG, UNICAST, IPV6, SITE
Local node max level: 50000
Effective cluster level: 50000
Example 10-13 shows the CAA configuration with the lscluster -d command.
Example 10-13 CAA cluster configuration
# lscluster -d
Storage Interface Query
 
Cluster Name: PHA_Cluster
Cluster UUID: 28bf3ac0-b516-11e6-8007-faac90b6fe20
Number of nodes reporting = 2
Number of nodes expected = 2
 
Node PHA170
Node UUID = 28945a80-b516-11e6-8007-faac90b6fe20
Number of disks discovered = 1
hdisk1:
State : UP
uDid : 33213600507680284001D5800000000005C8B04214503IBMfcp
uUid : 58a286b2-fe51-5e39-98b1-43acf62025ab
Site uUid : 51735173-5173-5173-5173-517351735173
Type : REPDISK
 
Node PHA171
Node UUID = 28945a3a-b516-11e6-8007-faac90b6fe20
Number of disks discovered = 1
hdisk1:
State : UP
uDid : 33213600507680284001D5800000000005C8B04214503IBMfcp
uUid : 58a286b2-fe51-5e39-98b1-43acf62025ab
Site uUid : 51735173-5173-5173-5173-517351735173
Type : REPDISK
 
Note: For production environments, configure additional backup repository disks.
PowerHA V7.2 supports up to six backup repository disks. It also supports automatic repository disk replacement in the event of repository disk failure. For more information, see IBM PowerHA SystemMirror V7.2 for IBM AIX Updates, SG24-8278.
Example 10-14 and Example 10-15 show output from the PHA170 and PHA171 nodes with the lscluster -m command. The current heartbeat channel is the network.
Example 10-14 CAA information from node PHA170
# hostname
PHA170
# lscluster -m
Calling node query for all nodes...
Node query number of nodes examined: 2
 
Node name: PHA171
Cluster shorthand id for node: 2
UUID for node: 28945a3a-b516-11e6-8007-faac90b6fe20
State of node: UP
Reason: NONE
Smoothed rtt to node: 7
Mean Deviation in network rtt to node: 3
Number of clusters node is a member in: 1
CLUSTER NAME SHID UUID
PHA_Cluster 0 28bf3ac0-b516-11e6-8007-faac90b6fe20
SITE NAME SHID UUID
LOCAL 1 51735173-5173-5173-5173-517351735173
 
Points of contact for node: 1
-----------------------------------------------------------------------
Interface State Protocol Status SRC_IP->DST_IP
-----------------------------------------------------------------------
tcpsock->02 UP IPv4 none 172.16.51.170->172.16.51.171
Example 10-15 CAA information from node PHA171
# hostname
PHA171
# lscluster -m
Calling node query for all nodes...
Node query number of nodes examined: 2
 
Node name: PHA170
Cluster shorthand id for node: 1
UUID for node: 28945a80-b516-11e6-8007-faac90b6fe20
State of node: UP
Reason: NONE
Smoothed rtt to node: 7
Mean Deviation in network rtt to node: 3
Number of clusters node is a member in: 1
CLUSTER NAME SHID UUID
PHA_Cluster 0 28bf3ac0-b516-11e6-8007-faac90b6fe20
SITE NAME SHID UUID
LOCAL 1 51735173-5173-5173-5173-517351735173
 
Points of contact for node: 1
-----------------------------------------------------------------------
Interface State Protocol Status SRC_IP->DST_IP
-----------------------------------------------------------------------
tcpsock->01 UP IPv4 none 172.16.51.171->172.16.51.170
Example 10-16 shows the current heartbeat devices that are configured in the testing environment. No SAN-based heartbeat device is configured.
Example 10-16 CAA interfaces
# lscluster -g
Network/Storage Interface Query
 
Cluster Name: PHA_Cluster
Cluster UUID: 28bf3ac0-b516-11e6-8007-faac90b6fe20
Number of nodes reporting = 2
Number of nodes stale = 0
Number of nodes expected = 2
 
Node PHA171
Node UUID = 28945a3a-b516-11e6-8007-faac90b6fe20
Number of interfaces discovered = 2
Interface number 1, en0
IFNET type = 6 (IFT_ETHER)
NDD type = 7 (NDD_ISO88023)
MAC address length = 6
MAC address = FA:9D:66:B2:87:20
Smoothed RTT across interface = 0
Mean deviation in network RTT across interface = 0
Probe interval for interface = 990 ms
IFNET flags for interface = 0x1E084863
NDD flags for interface = 0x0021081B
Interface state = UP
Number of regular addresses configured on interface = 1
IPv4 ADDRESS: 172.16.51.171 broadcast 172.16.51.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPv4 MULTICAST ADDRESS: 228.16.51.170
Interface number 2, dpcom
IFNET type = 0 (none)
NDD type = 305 (NDD_PINGCOMM)
Smoothed RTT across interface = 750
Mean deviation in network RTT across interface = 1500
Probe interval for interface = 22500 ms
IFNET flags for interface = 0x00000000
NDD flags for interface = 0x00000009
Interface state = UP RESTRICTED AIX_CONTROLLED
 
Node PHA170
Node UUID = 28945a80-b516-11e6-8007-faac90b6fe20
Number of interfaces discovered = 2
Interface number 1, en0
IFNET type = 6 (IFT_ETHER)
NDD type = 7 (NDD_ISO88023)
MAC address length = 6
MAC address = FA:AC:90:B6:FE:20
Smoothed RTT across interface = 0
Mean deviation in network RTT across interface = 0
Probe interval for interface = 990 ms
IFNET flags for interface = 0x1E084863
NDD flags for interface = 0x0161081B
Interface state = UP
Number of regular addresses configured on interface = 1
IPv4 ADDRESS: 172.16.51.170 broadcast 172.16.51.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPv4 MULTICAST ADDRESS: 228.16.51.170
Interface number 2, dpcom
IFNET type = 0 (none)
NDD type = 305 (NDD_PINGCOMM)
Smoothed RTT across interface = 594
Mean deviation in network RTT across interface = 979
Probe interval for interface = 15730 ms
IFNET flags for interface = 0x00000000
NDD flags for interface = 0x00000009
Interface state = UP RESTRICTED AIX_CONTROLLED
 
Note: To identify physical FC adapters that can be used in the PowerHA cluster as the SAN-based heartbeat, go to the IBM Knowledge Center.
At the time of writing, there is no plan to support this feature for all 16-Gb FC adapters.
Shared file system status
Example 10-17 shows that the /sharefs file system is mounted on the PHA170 node because the RG is online on that node.
Example 10-17 Shared file system status
(0) root @ PHA170: /
# df
Filesystem 512-blocks Free %Used Iused %Iused Mounted on
...
/dev/sharelv 1310720 1309864 1% 4 1% /sharefs
10.7 Scenario: Default split and merge policy
This section shows a scenario with the default split and merge policy.
10.7.1 Scenario description
Figure 10-35 shows the topology of the default split and merge scenario.
Figure 10-35 Topology of the default split and merge scenario
This scenario keeps the default configuration for the split and merge policy and does not set the quarantine policy. To simulate a cluster split, break the network communication between the two PowerHA nodes, and disable the repository disk access from the PHA170 node.
After a cluster split occurs, restore communications to generate a cluster merge event.
10.7.2 Split and merge configuration in PowerHA
In this scenario, no specific parameters must be set for the split and merge policy because the default policy is used. The clmgr command can display the current policy, as shown in Example 10-18.
Example 10-18 The clmgr command displays the current split/merge settings
# clmgr view cluster SPLIT-MERGE
SPLIT_POLICY="none"
MERGE_POLICY="majority"
ACTION_PLAN="reboot"
<...>
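For scripting, these settings can be read individually. The following sketch parses the sample output above; the get_attr helper is our own illustration, not a PowerHA command, and the output is embedded so the sketch runs without a live cluster.

```shell
# Read individual attributes from sample `clmgr view cluster SPLIT-MERGE`
# output (Example 10-18).
output='SPLIT_POLICY="none"
MERGE_POLICY="majority"
ACTION_PLAN="reboot"'

# Print the value of one KEY="value" attribute, without the quotes.
get_attr() {
    printf '%s\n' "$output" | awk -F'"' -v key="$1" '$0 ~ "^" key "=" { print $2 }'
}

echo "split policy: $(get_attr SPLIT_POLICY)"    # prints: split policy: none
```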
Complete the following steps:
1. To change the current split and merge policy from the default by using SMIT, use the fast path of smitty cm_cluster_sm_policy_chk. Otherwise, run smitty sysmirror and select Custom Cluster Configuration → Cluster Nodes and Networks → Initial Cluster Setup (Custom) → Configure Cluster Split and Merge Policy → Split and Merge Management Policy. Example 10-19 shows the window where you select the None option.
Example 10-19 Split handling policy
Split Handling Policy
Move cursor to desired item and press Enter.
None
TieBreaker
Manual
After pressing Enter, the menu shows the policy, as shown in Example 10-20.
Example 10-20 Split and merge management policy
Split and Merge Management Policy
 
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
[Entry Fields]
Split Handling Policy None
Merge Handling Policy Majority +
Split and Merge Action Plan Reboot
2. Keep the default values and upon pressing Enter, you see the summary that is shown in Example 10-21.
Example 10-21 Successful setting of the split and merge policy
Command: OK stdout: yes stderr: no
 
Before command completion, additional instructions may appear below.
 
The PowerHA SystemMirror split and merge policies have been updated.
Current policies are:
Split Handling Policy : None
Merge Handling Policy : Majority
Split and Merge Action Plan : Reboot
The configuration must be synchronized to make this change known across the cluster.
3. Synchronize the cluster. After the synchronization operation is complete, the cluster can be activated.
10.7.3 Cluster split
Before simulating a cluster split, check the status, as described in 10.6.3, “Initial PowerHA service status for each scenario” on page 347.
In this case, we sever all communications between the two nodes at 21:55:23.
Steps of CAA and PowerHA on PHA170 node
The following events occur:
21:55:23: All communication between the two nodes is broken.
21:55:23: The PHA170 node marks REP_DOWN for the repository disk.
21:55:33: The PHA170 node CAA marks ADAPTER_DOWN for the PHA171 node.
21:56:02: The PHA170 node CAA marks NODE_DOWN for the PHA171 node.
21:56:02: PowerHA triggers the split_merge_prompt split event.
21:56:11: PowerHA triggers the split_merge_prompt quorum event.
Afterward, the PHA170 node keeps its current PowerHA service status.
Steps of CAA and PowerHA on PHA171 node
The following events occur:
21:55:23: All communication between the two nodes is broken.
21:55:33: The PHA171 node CAA marks ADAPTER_DOWN for the PHA170 node.
21:56:02: The PHA171 node CAA marks NODE_DOWN for the PHA170 node.
21:56:02: PowerHA triggers the split_merge_prompt split event.
21:56:07: PowerHA triggers the split_merge_prompt quorum event.
 
Note: The log file of the CAA service is /var/adm/ras/syslog.caa.
Then, PHA171 takes over the RG while the RG is still online on the PHA170 node.
 
Note: The duration from REP_DOWN or ADAPTER_DOWN to NODE_DOWN is 30 seconds. This duration is controlled by the CAA parameter node_timeout. Display its value by running the following command:
# clctrl -tune -L node_timeout
Here is the output:
NAME DEF MIN MAX UNIT SCOPE
ENTITY_NAME(UUID) CUR
--------------------------------------------------------------------------------
node_timeout 20000 10000 600000 milliseconds c n
PHA_Cluster(28bf3ac0-b516-11e6-8007-faac90b6fe20) 30000
--------------------------------------------------------------------------------
To change this value, either run the PowerHA clmgr command or use the SMIT menu:
From the SMIT menu, run smitty sysmirror, select Custom Cluster Configuration → Cluster Nodes and Networks → Manage the Cluster → Cluster heartbeat settings, and then change the Node Failure Detection Timeout parameter.
To use the clmgr command, run the following command:
clmgr modify cluster HEARTBEAT_FREQUENCY=<the value to set, in seconds; the default is 30>
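The conversion between the two units, and the valid range, can be sketched as follows; the function name is ours, and the limits mirror the MIN and MAX columns of the clctrl output above.

```shell
# Convert a Node Failure Detection Timeout given in seconds (the unit
# used by `clmgr modify cluster HEARTBEAT_FREQUENCY=...`) into the CAA
# node_timeout value in milliseconds, rejecting values outside the
# 10000-600000 ms range reported by `clctrl -tune -L node_timeout`.
# The function name is ours, for illustration only.
to_node_timeout_ms() {
    ms=$(( $1 * 1000 ))
    if [ "$ms" -lt 10000 ] || [ "$ms" -gt 600000 ]; then
        echo "value out of range (10-600 seconds)" >&2
        return 1
    fi
    echo "$ms"
}

to_node_timeout_ms 30    # the default setting: prints 30000
```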
Displaying the resource group status from the PHA170 node after the cluster split
Example 10-22 shows that the PHA170 node cannot get the PHA171 node’s status.
Example 10-22 Resource group unknown status post split
# hostname
PHA170
# clmgr -cv -a name,state,raw_state query node
# NAME:STATE:RAW_STATE
PHA170:NORMAL:ST_RP_RUNNING
PHA171:UNKNOWN:UNKNOWN
Example 10-23 shows that the RG is online on the PHA170 node.
Example 10-23 Resource group still online PHA170 post split
# hostname
PHA170
# clRGinfo
Node State
---------------------------------------------------------------- ---------------
PHA170 ONLINE
PHA171 OFFLINE
Example 10-24 shows that the VG sharevg is varied on, and the file system /sharefs is mounted on the PHA170 node and is writable.
Example 10-24 Volume group still online PHA170 post split
# hostname
PHA170
# lsvg sharevg
VOLUME GROUP: sharevg VG IDENTIFIER: 00fa4b4e00004c0000000158a8e55930
VG STATE: active PP SIZE: 32 megabyte(s)
VG PERMISSION: read/write TOTAL PPs: 29 (928 megabytes)
MAX LVs: 256 FREE PPs: 8 (256 megabytes)
 
# df
Filesystem 512-blocks Free %Used Iused %Iused Mounted on
...
/dev/sharelv 1310720 1309864 1% 4 1% /sharefs
Displaying the resource group status from the PHA171 node after the cluster split
Example 10-25 shows that the PHA171 node cannot get the PHA170 node’s status.
Example 10-25 Resource group warning and unknown on PHA171
# hostname
PHA171
# clmgr -cv -a name,state,raw_state query node
# NAME:STATE:RAW_STATE
PHA170:UNKNOWN:UNKNOWN
PHA171:WARNING:WARNING
Example 10-26 shows that the RG is also online on the PHA171 node.
Example 10-26 Resource group online PHA171 post split
# hostname
PHA171
# clRGinfo
Node State
---------------------------------------------------------------- ---------------
PHA170 OFFLINE
PHA171 ONLINE
Example 10-27 shows that the VG sharevg is varied on and the file system /sharefs is mounted on the PHA171 node, and it is also writable.
Example 10-27 Sharevg online on PHA171 post split
# hostname
PHA171
# lsvg sharevg
VOLUME GROUP: sharevg VG IDENTIFIER: 00fa4b4e00004c0000000158a8e55930
VG STATE: active PP SIZE: 32 megabyte(s)
VG PERMISSION: read/write TOTAL PPs: 29 (928 megabytes)
MAX LVs: 256 FREE PPs: 8 (256 megabytes)
<...>
 
# df
Filesystem 512-blocks Free %Used Iused %Iused Mounted on
<...>
/dev/sharelv 1310720 1309864 1% 4 1% /sharefs
As Example 10-24 and Example 10-27 show, the /sharefs file system is mounted on both nodes in writable mode. Applications on the two nodes can write at the same time, which is risky and can easily result in data corruption.
 
Note: This situation must always be avoided in a production environment.
 
10.7.4 Cluster merge
After the cluster split occurred, the RG came online on the PHA171 node while it was still online on the PHA170 node. When the PowerHA cluster heartbeat communication was restored at 22:24:08, a PowerHA merge event was triggered.
The default merge policy is Majority and the action plan is Reboot. However, in our case, the rule in the cluster merge event is:
The node that has the lower node ID survives, and the other node is restarted by RSCT.
This rule is also introduced in 10.2.2, “Merge policy” on page 320.
Example 10-28 shows how to display a PowerHA node's node ID. PHA170 has the lower ID, so it is expected that the PHA171 node is restarted.
Example 10-28 How to show a node ID for PowerHA nodes
# ./cl_query_hn_id
CAA host PHA170 with node id 1 corresponds to PowerHA node PHA170
CAA host PHA171 with node id 2 corresponds to PowerHA node PHA171
 
# lscluster -c
Cluster Name: PHA_Cluster
Cluster UUID: 28bf3ac0-b516-11e6-8007-faac90b6fe20
Number of nodes in cluster = 2
Cluster ID for node PHA170: 1
Primary IP address for node PHA170: 172.16.51.170
Cluster ID for node PHA171: 2
Primary IP address for node PHA171: 172.16.51.171
Number of disks in cluster = 1
Disk = hdisk1 UUID = 58a286b2-fe51-5e39-98b1-43acf62025ab cluster_major = 0 cluster_minor = 1
Multicast for site LOCAL: IPv4 228.16.51.170 IPv6 ff05::e410:33aa
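Under this rule, the survivor can be predicted by picking the lowest node ID. The following sketch does this against the sample output above; the data is embedded so the snippet runs offline, and the parsing is our own illustration.

```shell
# Predict the surviving node under the default merge policy by picking
# the lowest CAA node id from sample cl_query_hn_id output
# (Example 10-28).
output='CAA host PHA170 with node id 1 corresponds to PowerHA node PHA170
CAA host PHA171 with node id 2 corresponds to PowerHA node PHA171'

survivor=$(printf '%s\n' "$output" |
    awk '{ id = $7 + 0; if (min == "" || id < min) { min = id; node = $NF } }
         END { print node }')
echo "expected survivor: $survivor"    # prints: expected survivor: PHA170
```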
Example 10-29 shows that the PHA171 node was rebooted at 22:25:02.
Example 10-29 Display error report with the errpt -c command
# hostname
PHA171
# errpt -c
A7270294 1127222416 P S cluster0 A merge has been detected.
78142BB8 1127222416 I O ConfigRM ConfigRM received Subcluster Merge event
F0851662 1127222416 I S ConfigRM The sub-domain containing the local node
9DEC29E1 1127222416 P O cthags Group Services daemon exit to merge doma
9DBCFDEE 1127222516 T O errdemon ERROR LOGGING TURNED ON
69350832 1127222516 T S SYSPROC SYSTEM SHUTDOWN BY USER
 
# errpt -aj 69350832
LABEL: REBOOT_ID
IDENTIFIER: 69350832
 
Date/Time: Sun Nov 27 22:25:02 CST 2016
Sequence Number: 701
Machine Id: 00FA23424C00
Node Id: PHA171
Class: S
Type: TEMP
WPAR: Global
Resource Name: SYSPROC
 
Description
SYSTEM SHUTDOWN BY USER
 
Probable Causes
SYSTEM SHUTDOWN
 
Detail Data
USER ID
0
0=SOFT IPL 1=HALT 2=TIME REBOOT
0
TIME TO REBOOT (FOR TIMED REBOOT ONLY)
0
PROCESS ID
13959442
PARENT PROCESS ID
4260250
PROGRAM NAME
hagsd
PARENT PROGRAM NAME
srcmstr
10.7.5 Scenario summary
With the default split and merge policy, when a cluster split happens, the RG is online on both PowerHA nodes. This is a risky situation that can result in data corruption. Careful planning must be done to avoid this scenario.
10.8 Scenario: Split and merge policy with a disk tie breaker
This section describes the split and merge policy scenario with a disk tie breaker.
10.8.1 Scenario description
Figure 10-36 is the reference topology for this scenario.
Figure 10-36 Split and merge topology scenario
In this scenario, one new shared disk, hdisk3, is added to be used as the disk tie breaker.
 
Note: When using a tie-breaker disk for split and merge recovery handling, the disk must also be supported by the devrsrv command. This command is part of the AIX operating system.
At the time of writing, the EMC PowerPath disks are not supported for use as a tie-breaker disk.
Note: Set the reserve_policy attribute of the tie-breaker disk to no_reserve with the chdev command before starting the PowerHA service on both nodes. Otherwise, the tie-breaker policy cannot take effect in a cluster split event.
10.8.2 Split and merge configuration in PowerHA
Complete the following steps:
1. The fast path to set the split and merge policy is smitty cm_cluster_sm_policy_chk. The whole path is running smitty sysmirror and selecting Custom Cluster Configuration → Cluster Nodes and Networks → Initial Cluster Setup (Custom) → Configure Cluster Split and Merge Policy → Split and Merge Management Policy.
Example 10-30 shows the window to select the split handling policy; in this case, TieBreaker is selected.
Example 10-30 TieBreaker split handling policy
Split Handling Policy
Move cursor to desired item and press Enter.
None
TieBreaker
Manual
2. After pressing Enter, select the Disk option, as shown in Example 10-31.
Example 10-31 Select Tiebreaker
Select TieBreaker Type
Move cursor to desired item and press Enter.
Disk
NFS
F1=Help F2=Refresh F3=Cancel
Esc+8=Image Esc+0=Exit Enter=Do
3. Pressing Enter shows the disk tie breaker configuration window, as shown in Example 10-32. The merge handling policy is TieBreaker too, and you cannot change it. Also, keep the default action plan as Reboot.
Example 10-32 Disk tiebreaker configuration
Disk TieBreaker Configuration
 
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
[Entry Fields]
Split Handling Policy TieBreaker
Merge Handling Policy TieBreaker
* Select Tie Breaker [] +
Split and Merge Action Plan Reboot
4. In the Select Tie Breaker field, press F4 to list the disks that can be used for the disk tie breaker, as shown in Example 10-33. We select hdisk3.
Example 10-33 Select tie breaker disk
Select Tie Breaker
Move cursor to desired item and press Enter.
None
hdisk3 (00fa2342a10932bf) on all cluster nodes
F1=Help F2=Refresh F3=Cancel
Esc+8=Image Esc+0=Exit Enter=Do
/=Find n=Find Next
5. Press Enter to display the summary, as shown in Example 10-34.
Example 10-34 Select the disk tie breaker status
Command: OK stdout: yes stderr: no
 
Before command completion, additional instructions may appear below.
 
hdisk3 changed
The PowerHA SystemMirror split and merge policies have been updated.
Current policies are:
Split Handling Policy : Tie Breaker
Merge Handling Policy : Tie Breaker
Tie Breaker : hdisk3
Split and Merge Action Plan : Reboot
The configuration must be synchronized to make this change known across the cluster.
6. Synchronize the cluster. After the synchronization operation is complete, the cluster can be activated.
7. Run the clmgr command to query the current split and merge policy, as shown in Example 10-35.
Example 10-35 Display the newly set split and merge policies
# clmgr view cluster SPLIT-MERGE
SPLIT_POLICY="tiebreaker"
MERGE_POLICY="tiebreaker"
ACTION_PLAN="reboot"
TIEBREAKER="hdisk3"
<...>
After the PowerHA service start completes, the reserve_policy of this disk is changed to PR_exclusive, and a reserve key value is generated for this disk on each node. At this point, the disk is not reserved by either node. Example 10-36 shows the result from the two nodes.
Example 10-36 Reserve_policy on each node
(127) root @ PHA170: /
# lsattr -El hdisk3|egrep "PR_key_value|reserve_policy"
PR_key_value 2763601723737305030 Persistant Reserve Key Value True+
reserve_policy PR_exclusive Reserve Policy True+
 
# devrsrv -c query -l hdisk3
Device Reservation State Information
==================================================
Device Name : hdisk3
Device Open On Current Host? : NO
ODM Reservation Policy : PR EXCLUSIVE
ODM PR Key Value : 2763601723737305030
Device Reservation State : NO RESERVE
Registered PR Keys : No Keys Registered
PR Capabilities Byte[2] : 0x11 CRH PTPL_C
PR Capabilities Byte[3] : 0x80
PR Types Supported : PR_WE_AR PR_EA_RO PR_WE_RO PR_EA PR_WE PR_EA_AR
 
 
(0) root @ PHA171: /
# lsattr -El hdisk3|egrep "PR_key_value|reserve_policy"
PR_key_value 6664187022250383046 Persistant Reserve Key Value True+
reserve_policy PR_exclusive Reserve Policy True+
 
# devrsrv -c query -l hdisk3
Device Reservation State Information
==================================================
Device Name : hdisk3
Device Open On Current Host? : NO
ODM Reservation Policy : PR EXCLUSIVE
ODM PR Key Value : 6664187022250383046
Device Reservation State : NO RESERVE
Registered PR Keys : No Keys Registered
PR Capabilities Byte[2] : 0x11 CRH PTPL_C
PR Capabilities Byte[3] : 0x80
PR Types Supported : PR_WE_AR PR_EA_RO PR_WE_RO PR_EA PR_WE PR_EA_AR
10.8.3 Cluster split
Before simulating a cluster split, check the current cluster status. For more information, see 10.6.3, “Initial PowerHA service status for each scenario” on page 347.
When the tie-breaker split and merge policy is enabled, the rule is that the TBGL node has a higher priority to reserve the tie-breaker device than the other nodes. If the TBGL node reserves the tie-breaker device successfully, then the other nodes are restarted.
For this scenario, Example 10-37 shows that the PHA171 node is the current TBGL. So, it is expected that the PHA171 node reserves the tie-breaker device and the PHA170 node is restarted. Any RG on the PHA170 node is taken over by the PHA171 node.
Example 10-37 Display the tiebreaker group leader
# lssrc -ls IBM.ConfigRM|grep Group
Group IBM.ConfigRM:
GroupLeader: PHA171, 0xdc7bf2c9d20096c6, 2
TieBreaker GroupLeader: PHA171, 0xdc7bf2c9d20096c6, 2
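For scripting, the TBGL can be extracted from this output. The sketch below uses the sample lines above, embedded so it runs without a live cluster; the parsing approach is ours.

```shell
# Extract the tie breaker group leader (TBGL) node name from sample
# `lssrc -ls IBM.ConfigRM` output (Example 10-37).
output='Group IBM.ConfigRM:
GroupLeader: PHA171, 0xdc7bf2c9d20096c6, 2
TieBreaker GroupLeader: PHA171, 0xdc7bf2c9d20096c6, 2'

tbgl=$(printf '%s\n' "$output" |
    awk -F': ' '/^TieBreaker GroupLeader/ { split($2, a, ","); print a[1] }')
echo "TBGL is $tbgl"    # prints: TBGL is PHA171
```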
In this case, we broke all communication between the two nodes at 01:36:12.
Result and log on the PHA170 node
The following events occur:
01:36:12: All communication between the two nodes is broken.
01:36:22: The PHA170 node CAA marks ADAPTER_DOWN for the PHA171 node.
01:36:52: The PHA170 node CAA marks NODE_DOWN for the PHA171 node.
01:36:52: PowerHA triggers the split_merge_prompt split event.
01:36:57: PowerHA triggers the split_merge_prompt quorum event.
01:37:00: The PHA170 node restarts.
Example 10-38 shows output of the errpt command on the PHA170 node. The PHA170 node restarts at 01:37:00.
Example 10-38 PHA170 restart post split
C7E7362C 1128013616 T S cluster0 Node is heartbeating solely over disk or
4D91E3EA 1128013616 P S cluster0 A split has been detected.
2B138850 1128013616 I O ConfigRM ConfigRM received Subcluster Split event
DC73C03A 1128013616 T S fscsi1 SOFTWARE PROGRAM ERROR
<...>
C62E1EB7 1128013616 P H hdisk1 DISK OPERATION ERROR
<...>
B80732E3 1128013716 P S ConfigRM The operating system is being rebooted t
 
# errpt -aj B80732E3|more
---------------------------------------------------------------------------
LABEL: CONFIGRM_REBOOTOS_E
IDENTIFIER: B80732E3
 
Date/Time: Mon Nov 28 01:37:00 CST 2016
Sequence Number: 1620
Machine Id: 00FA4B4E4C00
Node Id: PHA170
Class: S
Type: PERM
WPAR: Global
Resource Name: ConfigRM
 
Description
The operating system is being rebooted to ensure that critical resources are
stopped so that another sub-domain that has operational quorum may recover
these resources without causing corruption or conflict.
 
Probable Causes
Critical resources are active and the active sub-domain does not have
operational quorum.
 
Failure Causes
Critical resources are active and the active sub-domain does not have
operational quorum.
 
Recommended Actions
After node finishes rebooting, resolve problems that caused the operational
quorum to be lost.
 
Detail Data
DETECTING MODULE
RSCT,PeerDomain.C,1.99.22.299,23992
ERROR ID
Result and log on the PHA171 node
The following events occur:
01:36:12: All communication between the two nodes is broken.
01:36:22: The PHA171 node CAA marks ADAPTER_DOWN for the PHA170 node.
01:36:52: The PHA171 node CAA marks NODE_DOWN for the PHA170 node.
01:36:52: PowerHA triggers a split_merge_prompt split event.
01:37:04: PowerHA triggers a split_merge_prompt quorum event, and then PHA171 takes over the RG.
01:37:15: PowerHA completes the RG takeover operation.
As shown by the time stamps in Example 10-38 on page 364, PHA170 restarts at 01:37:00, and PHA171 starts the takeover of the RG at 01:37:04. There is no opportunity for both nodes to mount the /sharefs file system at the same time, so data integrity is maintained.
The PHA171 node holds the tie-breaker disk during a cluster split
Example 10-39 shows that the tiebreaker disk is reserved by the PHA171 node after the cluster split event happens.
Example 10-39 Tiebreaker disk reservation from PHA171
# hostname
PHA171
 
# lsattr -El hdisk3|egrep "PR_key_value|reserve_policy"
PR_key_value 6664187022250383046 Persistant Reserve Key Value True+
reserve_policy PR_exclusive Reserve Policy True+
 
# devrsrv -c query -l hdisk3
Device Reservation State Information
==================================================
Device Name : hdisk3
Device Open On Current Host? : NO
ODM Reservation Policy : PR EXCLUSIVE
ODM PR Key Value : 6664187022250383046
Device Reservation State : PR EXCLUSIVE
PR Generation Value : 152
PR Type : PR_WE_RO (WRITE EXCLUSIVE, REGISTRANTS ONLY)
PR Holder Key Value : 6664187022250383046
Registered PR Keys : 6664187022250383046 6664187022250383046
PR Capabilities Byte[2] : 0x11 CRH PTPL_C
PR Capabilities Byte[3] : 0x81 PTPL_A
PR Types Supported : PR_WE_AR PR_EA_RO PR_WE_RO PR_EA PR_WE PR_EA_AR
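A script can check for this state. The following sketch classifies the reservation state from the sample output above; the parsing is our own illustration, not a PowerHA tool, and the output lines are embedded so the sketch runs offline.

```shell
# Determine from sample `devrsrv -c query` output (Example 10-39)
# whether the tie-breaker disk currently holds a persistent reservation.
output='Device Name                     :  hdisk3
Device Reservation State        :  PR EXCLUSIVE
PR Holder Key Value             :  6664187022250383046'

state=$(printf '%s\n' "$output" |
    awk -F':' '/Device Reservation State/ { gsub(/^ +| +$/, "", $2); print $2 }')

if [ "$state" = "PR EXCLUSIVE" ]; then
    echo "tie-breaker disk is reserved"
else
    echo "tie-breaker disk is free"
fi
```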
10.8.4 How to change the tie breaker group leader manually
To change the TBGL manually, restart the current TBGL node. For example, if the PHA170 node is the current TBGL, restart the PHA170 node to make PHA171 the TBGL. During this restart, the TBGL role moves to the PHA171 node. After PHA170 comes back, the group leader does not change again until PHA171 is shut down or restarted.
10.8.5 Cluster merge
After the PHA170 node restart completes, restore all communication between the two nodes. To re-enable the tiebreaker disk on the PHA170 node after the FC link is restored, run the cfgmgr command. The paths of the tiebreaker disk then return to the active status, as shown in Example 10-40.
Example 10-40 Path status post split
# hostname
PHA170
 
# lspath -l hdisk1
Missing hdisk1 fscsi1
Missing hdisk1 fscsi1
 
-> After running the cfgmgr command
# lspath -l hdisk1
Enabled hdisk1 fscsi1
Enabled hdisk1 fscsi1
Within 1 minute of the repository disk being enabled, the CAA services start automatically. You can monitor the process by viewing the /var/adm/ras/syslog.caa log file.
Using the lscluster -m command, check whether the CAA service started. When ready, start the PowerHA service with the smitty clstart or clmgr start node PHA170 command.
You can also bring the CAA services and PowerHA services online together manually by running the following command:
clmgr start node PHA170 START_CAA=yes
During the start of the PowerHA services, the tie breaker device reservation is released on the PHA171 node automatically. Example 10-41 shows the device reservation state after the PowerHA service starts.
Example 10-41 Disk reservation post merge
# hostname
PHA171
 
# devrsrv -c query -l hdisk3
Device Reservation State Information
==================================================
Device Name : hdisk3
Device Open On Current Host? : NO
ODM Reservation Policy : PR EXCLUSIVE
ODM PR Key Value : 6664187022250383046
Device Reservation State : NO RESERVE
Registered PR Keys : No Keys Registered
PR Capabilities Byte[2] : 0x11 CRH PTPL_C
PR Capabilities Byte[3] : 0x81 PTPL_A
PR Types Supported : PR_WE_AR PR_EA_RO PR_WE_RO PR_EA PR_WE PR_EA_AR
10.8.6 Scenario summary
If you set the disk tie breaker as the split and merge policy for the PowerHA cluster, then when a cluster split occurs, the TBGL has a higher priority to reserve the tie breaker device. The other nodes restart, and the RGs are brought online on the surviving node.
During the cluster merge process, the tiebreaker reservation is automatically released.
10.9 Scenario: Split and merge policy with the NFS tie breaker
This section describes the split and merge scenario with the NFS tie-breaker policy.
10.9.1 Scenario description
Figure 10-37 shows the topology of this scenario.
Figure 10-37 Split and merge topology scenario with the NFS tie breaker
In this scenario, there is one NFS server. Each PowerHA node has one network interface, en1, which is used to communicate with the NFS server. The NFS tie breaker requires NFS protocol version 4.
10.9.2 Setting up the NFS environment
On the NFS server, complete the following steps:
1. Edit /etc/hosts and add the PowerHA nodes definition, as shown in Example 10-42.
Example 10-42 Add nodes to NFS server /etc/hosts
cat /etc/hosts
<...>
172.16.15.242 PHA170_hmc
172.16.15.243 PHA171_hmc
2. Create the directory for export by running the following command:
mkdir -p /nfs_tiebreaker
3. Configure the NFS domain by running the following command:
chnfsdom nfs_local_domain
4. Start the nfsrgyd service by running the following command:
startsrc -s nfsrgyd
5. Change the NFS version 4 root location to / by running the following command:
chnfs -r /
6. Add the /nfs_tiebreaker directory to the export list by running the following command:
/usr/sbin/mknfsexp -d '/nfs_tiebreaker' '-B' -v '4' -S 'sys,krb5p,krb5i,krb5,dh' -t 'rw' -r 'PHA170_hmc,PHA171_hmc'
Alternatively, you can run smitty nfs, as shown in Example 10-43.
Example 10-43 NFS add directory to export
Add a Directory to Exports List
 
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
[TOP] [Entry Fields]
* Pathname of directory to export [/nfs_tiebreaker]
Anonymous UID [-2]
Public filesystem? no +
* Export directory now, system restart or both both +
Pathname of alternate exports file []
Allow access by NFS versions [4] +
External name of directory (NFS V4 access only) []
Referral locations (NFS V4 access only) []
Replica locations []
Ensure primary hostname in replica list yes +
Allow delegation? []
Scatter none +
* Security method 1 [sys,krb5p,krb5i,krb5,dh] +
* Mode to export directory read-write +
Hostname list. If exported read-mostly []
Hosts & netgroups allowed client access []
Hosts allowed root access [PHA170_hmc,PHA171_hmc]
You can verify that the directory is exported by viewing the /etc/exports file, as shown in Example 10-44.
Example 10-44 The /etc/exports file
# cat /etc/exports
/nfs_tiebreaker -vers=4,sec=sys:krb5p:krb5i:krb5:dh,rw,root=PHA170_hmc:PHA171_hmc
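The server-side steps above can be consolidated into one script. The following is a sketch under the assumption that it runs on an AIX NFS server; with DRYRUN=1 (the default here) it only prints the commands, so it can be reviewed safely anywhere before running it for real with DRYRUN=0.

```shell
#!/bin/sh
# Consolidated sketch of server-side steps 2 through 6 of this section.
# DRYRUN=1 prints the commands instead of running them, which lets you
# review the sequence off an AIX NFS server.
DRYRUN=${DRYRUN:-1}

run() {
    if [ "$DRYRUN" -eq 1 ]; then
        echo "$*"          # dry run: print the command only
    else
        "$@"               # live run: execute it
    fi
}

run mkdir -p /nfs_tiebreaker
run chnfsdom nfs_local_domain
run startsrc -s nfsrgyd
run chnfs -r /
run /usr/sbin/mknfsexp -d /nfs_tiebreaker -B -v 4 \
    -S sys,krb5p,krb5i,krb5,dh -t rw -r PHA170_hmc,PHA171_hmc
```

Running the script with the default DRYRUN=1 prints each command in order so that it can be compared against the steps above before execution.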
On the NFS clients and PowerHA nodes, complete the following tasks:
Edit /etc/hosts and add the NFS server definition, as shown in Example 10-45.
Example 10-45 NFS clients /etc/hosts
# hostname
PHA170
# cat /etc/hosts
...
172.16.51.170 PHA170
172.16.51.171 PHA171
172.16.51.172 PHASvc
172.16.15.242 PHA170_hmc
 
172.16.15.222 nfsserver
Now, verify that the new NFS mount point can be mounted on all the nodes, as shown in Example 10-46.
Example 10-46 Mount the NFS directory
(0) root @ PHA170: /
# mount -o vers=4 nfsserver:/nfs_tiebreaker /mnt
 
# df|grep mnt
nfsserver:/nfs_tiebreaker 786432 429256 46% 11704 20% /mnt
 
# echo "test.." > /mnt/1.out
# cat /mnt/1.out
test..
# rm /mnt/1.out
 
# umount /mnt
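The manual write/read/remove check in Example 10-46 can be wrapped in a small helper. This is a sketch only; the directory argument is whatever mount point you are validating (for example, /mnt after the NFS mount), and the probe file name is illustrative.

```shell
#!/bin/sh
# Sketch: verify that a directory (for example, an NFS mount point) is
# usable for reads and writes, mirroring the manual test in Example 10-46.
verify_rw_dir() {
    dir="$1"
    probe="$dir/.tb_probe.$$"
    echo "test.." > "$probe" || return 1           # can we write?
    [ "$(cat "$probe")" = "test.." ] || { rm -f "$probe"; return 1; }
    rm -f "$probe"                                 # clean up the probe file
}

verify_rw_dir /tmp && echo "read/write check passed"
```

Run the helper against the mounted directory on every node before configuring the NFS tie breaker.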
10.9.3 Setting the NFS split and merge policies
When the NFS configuration finishes, configure PowerHA by completing the following steps:
1. The fast path to set the split and merge policy is smitty cm_cluster_sm_policy_chk. The full path is to run smitty sysmirror and select Custom Cluster Configuration → Cluster Nodes and Networks → Initial Cluster Setup (Custom) → Configure Cluster Split and Merge Policy → Split and Merge Management Policy.
2. Select TieBreaker, as shown in Example 10-30 on page 361. After pressing Enter, select the NFS option, as shown in Example 10-47.
Example 10-47 NFS TieBreaker
 
Select TieBreaker Type
Move cursor to desired item and press Enter.
Disk
NFS
F1=Help F2=Refresh F3=Cancel
Esc+8=Image Esc+0=Exit Enter=Do
3. After pressing Enter, the NFS tiebreaker configuration panel opens, as shown in Example 10-48. The merge handling policy is set to NFS as well, and it cannot be changed. Also, keep the default action plan of Reboot.
Example 10-48 NFS TieBreaker configuration menu
NFS TieBreaker Configuration
 
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
[Entry Fields]
Split Handling Policy NFS
Merge Handling Policy NFS
* NFS Export Server [nfsserver]
* Local Mount Directory [/nfs_tiebreaker]
* NFS Export Directory [/nfs_tiebreaker]
Split and Merge Action Plan Reboot
After pressing Enter, the NFS TieBreaker configuration summary is displayed, as shown in Example 10-49.
Example 10-49 NFS TieBreaker configuration summary
Command: OK stdout: yes stderr: no
 
Before command completion, additional instructions may appear below.
 
The PowerHA SystemMirror split and merge policies have been updated.
Current policies are:
Split Handling Policy : NFS
Merge Handling Policy : NFS
NFS Export Server :
nfsserver
Local Mount Directory :
/nfs_tiebreaker
NFS Export Directory :
/nfs_tiebreaker
Split and Merge Action Plan : Reboot
The configuration must be synchronized to make this change known across the cluster.
The configuration is added to the HACMPsplitmerge ODM database, as shown in Example 10-50.
Example 10-50 HACMPsplitmerge ODM
# odmget HACMPsplitmerge
 
HACMPsplitmerge:
id = 0
policy = "split"
value = "NFS"
 
HACMPsplitmerge:
id = 0
policy = "merge"
value = "NFS"
 
HACMPsplitmerge:
id = 0
policy = "action"
value = "Reboot"
 
HACMPsplitmerge:
id = 0
policy = "nfs_quorumserver"
value = "nfsserver"
 
HACMPsplitmerge:
id = 0
policy = "local_quorumdirectory"
value = "/nfs_tiebreaker"
 
HACMPsplitmerge:
id = 0
policy = "remote_quorumdirectory"
value = "/nfs_tiebreaker"
4. Synchronize the cluster. After the synchronization operation completes, the cluster can be activated.
Upon cluster start, the PowerHA nodes mount the NFS export automatically, as shown in Example 10-51.
Example 10-51 NFS mount on both nodes
# clcmd mount|egrep -i "node|nfs"
NODE PHA171
node mounted mounted over vfs date options
nfsserver /nfs_tiebreaker /nfs_tiebreaker nfs4 Dec 01 08:50 vers=4,fg,soft,retry=1,timeo=10
 
NODE PHA170
node mounted mounted over vfs date options
nfsserver /nfs_tiebreaker /nfs_tiebreaker nfs4 Dec 01 08:50 vers=4,fg,soft,retry=1,timeo=10
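The HACMPsplitmerge stanzas shown in Example 10-50 can also be checked programmatically. The following sketch parses odmget-style stanza output with awk; the embedded sample text mirrors Example 10-50, and on a live node you would pipe the real `odmget HACMPsplitmerge` output instead.

```shell
#!/bin/sh
# Sketch: extract a value from "odmget HACMPsplitmerge" style stanzas.
# The sample text below mirrors Example 10-50.
stanzas='HACMPsplitmerge:
        id = 0
        policy = "split"
        value = "NFS"

HACMPsplitmerge:
        id = 0
        policy = "merge"
        value = "NFS"'

get_policy() {
    # print the value that follows the requested policy name
    echo "$stanzas" | awk -v p="$1" '
        $1 == "policy" && $3 == "\"" p "\"" { want = 1; next }
        want && $1 == "value" { gsub(/"/, "", $3); print $3; exit }'
}

get_policy split    # -> NFS
```

This kind of check is useful in a post-configuration validation script that confirms the split and merge policies were synchronized as intended.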
10.9.4 Cluster split
If you enable the tie breaker split and merge policy, then in a cluster split scenario the TBGL node has a higher priority than the other nodes to reserve the tie-breaker device. The node adds its node name to the PowerHA_NFS_Reserve file, gets the reservation, and locks it. In this scenario, the file is in the /nfs_tiebreaker directory.
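The idea behind the reserve file can be illustrated with a first-writer-wins sketch. This is a simplification, not the PowerHA implementation: the file name here is borrowed from the scenario, the path is illustrative, and a real implementation needs an atomic test-and-set over NFSv4 rather than the racy check shown.

```shell
#!/bin/sh
# Simplified sketch of the first-writer-wins idea behind the NFS tie
# breaker: the first node to record its name in the reserve file holds
# the reservation. Illustration only; not the PowerHA implementation.
RESERVE=${RESERVE:-/tmp/PowerHA_NFS_Reserve.demo}

try_reserve() {
    node="$1"
    if [ ! -s "$RESERVE" ]; then        # file empty: reservation is free
        echo "$node" > "$RESERVE"
    fi
    # the holder is whichever name the file now contains
    [ "$(cat "$RESERVE")" = "$node" ]
}

: > "$RESERVE"                          # start with a free reservation
try_reserve PHA171 && echo "PHA171 holds the reservation"
try_reserve PHA170 || echo "PHA170 lost the race and must restart"
```

The node whose name lands in the file survives; the loser follows the configured action plan (here, Reboot).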
In our case, the PHA171 node is the current TBGL, as shown in Example 10-52 on page 373. So, it is expected that the PHA171 node survives and the PHA170 node restarts. The RG on the PHA170 node is taken over by the PHA171 node.
Example 10-52 NFS Tiebreaker groupleader
# lssrc -ls IBM.ConfigRM|grep Group
Group IBM.ConfigRM:
GroupLeader: PHA171, 0xdc7bf2c9d20096c6, 2
TieBreaker GroupLeader: PHA171, 0xdc7bf2c9d20096c6, 2
In this case, we broke all communication between both nodes at 07:23:49.
Result and log on the PHA170 node
The following events occur:
07:23:49: All communication between the two nodes is broken.
07:23:59: The PHA170 node CAA marks ADAPTER_DOWN for the PHA171 node.
07:24:29: The PHA170 node CAA marks NODE_DOWN for the PHA171 node.
07:24:29: PowerHA triggers the split_merge_prompt split event.
07:24:35: PowerHA triggers the split_merge_prompt quorum event.
07:24:38: The PHA170 node is restarted by RSCT.
Example 10-53 shows the output of the errpt command on the PHA170 node. This node restarts at 07:24:38.
Example 10-53 Errpt on PHA170
C7E7362C 1128072416 T S cluster0 Node is heartbeating solely over disk or
4D91E3EA 1128072416 P S cluster0 A split has been detected.
2B138850 1128072416 I O ConfigRM ConfigRM received Subcluster Split event
<...>
A098BF90 1128072416 P S ConfigRM The operational quorum state of the acti
AB59ABFF 1128072416 U U LIBLVM Remote node Concurrent Volume Group fail
421B554F 1128072416 P S ConfigRM The operational quorum state of the acti
AB59ABFF 1128072416 U U LIBLVM Remote node Concurrent Volume Group fail
B80732E3 1128072416 P S ConfigRM The operating system is being rebooted t
 
# errpt -aj B80732E3
LABEL: CONFIGRM_REBOOTOS_E
IDENTIFIER: B80732E3
 
Date/Time: Mon Nov 28 07:24:38 CST 2016
Sequence Number: 1839
Machine Id: 00FA4B4E4C00
Node Id: PHA170
Class: S
Type: PERM
WPAR: Global
Resource Name: ConfigRM
 
Description
The operating system is being rebooted to ensure that critical resources are
stopped so that another sub-domain that has operational quorum may recover
these resources without causing corruption or conflict.
 
Probable Causes
Critical resources are active and the active sub-domain does not have
operational quorum.
 
Failure Causes
Critical resources are active and the active sub-domain does not have
operational quorum.
 
Recommended Actions
After node finishes rebooting, resolve problems that caused the operational
quorum to be lost.
 
Detail Data
DETECTING MODULE
RSCT,PeerDomain.C,1.99.22.299,23992
ERROR ID
REFERENCE CODE
Result and log on the PHA171 node
The following events occur:
07:23:49: All communication between the two nodes is broken.
07:24:02: The PHA171 node CAA marks ADAPTER_DOWN for the PHA170 node.
07:24:32: The PHA171 node CAA marks NODE_DOWN for the PHA170 node.
07:24:32: PowerHA triggers a split_merge_prompt split event.
07:24:42: PowerHA triggers a split_merge_prompt quorum event.
07:24:43: PowerHA starts to bring the RG online on the PHA171 node.
07:25:03: PowerHA completes the RG online operation.
The time stamps in Example 10-53 on page 373 show that PHA170 restarts at 07:24:38 and PHA171 starts to take over the RGs at 07:24:43. There is no window in which both nodes can mount the /sharefs file system at the same time, so data integrity is maintained.
Example 10-54 shows that the PHA171 node wrote its node name into the PowerHA_NFS_Reserve file successfully.
Example 10-54 NFS file that is written with the node name
# hostname
PHA171
 
# pwd
/nfs_tiebreaker
 
# ls -l
total 8
-rw-r--r-- 1 nobody nobody 257 Nov 28 07:24 PowerHA_NFS_Reserve
drwxr-xr-x 2 nobody nobody 256 Nov 28 04:06 PowerHA_NFS_ReserveviewFilesDir
 
# cat PowerHA_NFS_Reserve
PHA171
10.9.5 Cluster merge
After CAA services start successfully, the PowerHA_NFS_Reserve file is cleaned up in preparation for the next cluster split event. Example 10-55 shows that the size of the PowerHA_NFS_Reserve file changes to zero after the CAA service is restored.
Example 10-55 NFS file zeroed out after the CAA is restored
# ls -l
total 0
-rw-r--r-- 1 nobody nobody 0 Nov 28 09:05 PowerHA_NFS_Reserve
drwxr-xr-x 2 nobody nobody 256 Nov 28 09:05 PowerHA_NFS_ReserveviewFilesDir
10.9.6 Scenario summary
When the NFS tiebreaker is set as the split and merge policy and a cluster split occurs, the TBGL has a higher priority to acquire the NFS reservation. The other nodes restart, and the RGs are brought online on the surviving node.
During the cluster merge process, the NFS tiebreaker reservations are released automatically.
10.10 Scenario: Split and merge policy is manual
This section presents a split and merge manual policy scenario.
10.10.1 Scenario description
Figure 10-38 shows the topology of this scenario.
Figure 10-38 Manual split merge cluster topology
10.10.2 Split and merge configuration in PowerHA
The fast path to set the split and merge policy is smitty cm_cluster_sm_policy_chk. The full path is running smitty sysmirror and then selecting Custom Cluster Configuration → Cluster Nodes and Networks → Initial Cluster Setup (Custom) → Configure Cluster Split and Merge Policy → Split and Merge Management Policy.
We select Manual for the split handling policy, as shown in Example 10-56.
Example 10-56 Manual split policy
Split Handling Policy
Move cursor to desired item and press Enter.
None
TieBreaker
Manual
After pressing Enter, the configuration panel opens, as shown in Example 10-57.
Example 10-57 Manual split and merge configuration menu
Split and Merge Management Policy
 
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
 
[Entry Fields]
Split Handling Policy Manual
Merge Handling Policy Manual
Notify Method []
Notify Interval (seconds) []
Maximum Notifications []
Split and Merge Action Plan Reboot
When you select Manual as the split handling policy, the merge handling policy is also set to Manual. This setting is required and cannot be changed.
There are other options that can be changed. Table 10-3 shows the context-sensitive help for these items. This scenario keeps the default values.
Table 10-3 Information table to help explain the split handling policy
Name
Context-sensitive help (F1)
Associated list (F4)
Notify Method
A method to be invoked, in addition to a message to /dev/console, to inform the operator of the need to choose which site continues after a split or merge. The method is specified as a path name followed by optional parameters. When invoked, the last parameter is either split or merge to indicate the event.
None.
Notify Interval (seconds)
The frequency of the notification (time, in seconds, between messages) to inform the operator of the need to choose which site continues after a split or merge.
10..3600
Default is 30s, and then increases in frequency.
Maximum Notifications
The maximum number of times that PowerHA SystemMirror prompts the operator to choose which site continues after a split or merge.
3..1000
Default is infinite.
Split and Merge Action Plan
1. Reboot: Nodes in the losing partition restart.
2. Disable Applications Auto-Start and Reboot: Nodes in the losing partition restart. The RGs cannot be brought online until the merge finishes.
3. Disable Cluster Services Auto-Start and Reboot: Nodes in the losing partition restart. CAA does not start. After the split condition is healed, you must run clenablepostsplit to bring the cluster back to a stable state.
1. Reboot.
2. Disable Applications Auto-Start and Reboot.
3. Disable Cluster Services Auto-Start and Reboot.
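A custom Notify Method can be sketched from the contract that Table 10-3 describes: PowerHA invokes the method with its optional parameters followed by split or merge as the last argument. The script below is a hypothetical example; the function name and log path are illustrative choices, not part of PowerHA.

```shell
#!/bin/sh
# Hypothetical Notify Method sketch, assuming the contract from
# Table 10-3: the event type ("split" or "merge") arrives as the last
# positional parameter. The log path is an illustrative choice.
LOG=${LOG:-/tmp/sm_notify.log}

notify_method() {
    # capture the last positional parameter (the event type)
    for last; do :; done
    case "$last" in
    split|merge)
        echo "cluster $last detected; operator decision required" >> "$LOG"
        ;;
    *)
        echo "unexpected event argument: $last" >> "$LOG"
        ;;
    esac
}

notify_method split    # logs a split notification
```

In practice, the method body could page an operator or raise a ticket instead of writing to a log file; the path to the script is what you enter in the Notify Method field.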
Example 10-58 shows the summary after confirming the manual policy configuration.
Example 10-58 Manual split merge configuration summary
Command: OK stdout: yes stderr: no
 
Before command completion, additional instructions may appear below.
 
The PowerHA SystemMirror split and merge policies have been updated.
Current policies are:
Split Handling Policy : Manual
Merge Handling Policy : Manual
Notify Method :
Notify Interval (seconds) :
Maximum Notifications :
Split and Merge Action Plan : Reboot
The configuration must be synchronized to make this change known across the cluster.
The PowerHA clmgr command provides an option to display the cluster split and merge policy, as shown in Example 10-59.
Example 10-59 The clmgr output of split merge policies enabled
# clmgr view cluster SPLIT-MERGE
SPLIT_POLICY="manual"
MERGE_POLICY="manual"
ACTION_PLAN="reboot"
TIEBREAKER=""
NOTIFY_METHOD=""
NOTIFY_INTERVAL=""
MAXIMUM_NOTIFICATIONS=""
DEFAULT_SURVIVING_SITE=""
APPLY_TO_PPRC_TAKEOVER="n"
Synchronize the cluster. After the synchronization operation completes, the cluster can be activated.
10.10.3 Cluster split
Before simulating a cluster split, check its status, as described in 10.6.3, “Initial PowerHA service status for each scenario” on page 347.
In this case, we broke all communication between both nodes at 21:43:33.
Result and log on the PHA170 node
The following events occur:
21:43:33: All communication between the two nodes is broken.
21:43:43: The PHA170 node CAA marks ADAPTER_DOWN for the PHA171 node.
21:44:13: The PHA170 node CAA marks NODE_DOWN for the PHA171 node.
21:44:13: The PowerHA triggers a split_merge_prompt split event.
Then, every console on the PHA170 node receives the message that is shown in Example 10-60.
Example 10-60 Manual split console confirmation message on the PHA170
Broadcast message from root@PHA170 (tty) at 21:44:14 ...
 
A cluster split has been detected.
You must decide if this side of the partitioned cluster is to continue.
To have it continue, enter
 
/usr/es/sbin/cluster/utilities/cl_sm_continue
 
To have the recovery action - Reboot - taken on all nodes on this partition, enter
 
/usr/es/sbin/cluster/utilities/cl_sm_recover
LOCAL_PARTITION 1 PHA170 OTHER_PARTITION 2 PHA171
Also, the hacmp.out log of the PHA170 node records the split notification prompt, as shown in Example 10-61.
Example 10-61 The hacmp.out log shows a split notification
Fri Dec 2 21:44:13 CST 2016 cl_sm_prompt (19136930): EVENT START: split_merge_prompt split LOCAL_PARTITION 1 PHA170 OTHER_PARTITION 2 PHA171 1
Fri Dec 2 21:44:14 CST 2016 cl_sm_prompt (19136930): split = Manual merge = Manual which = split split = Manual merge = Manual which = split
Fri Dec 2 21:44:14 CST 2016 cl_sm_prompt (19136930): Received a split notification for which a manual response is required.
Fri Dec 2 21:44:14 CST 2016 cl_sm_prompt (19136930): In manual for a split notification with Reboot
Result and log on the PHA171 node
The following events occur:
21:43:33: All communication between the two nodes is broken.
21:43:43: The PHA171 node CAA marks ADAPTER_DOWN for the PHA170 node.
21:44:13: The PHA171 node CAA marks NODE_DOWN for the PHA170 node.
21:44:13: PowerHA triggers the split_merge_prompt split event.
Every console on the PHA171 node also receives a message, as shown in Example 10-62.
Example 10-62 Manual split console confirmation message on PHA171
Broadcast message from root@PHA171 (tty) at 21:44:13 ...
 
A cluster split has been detected.
You must decide if this side of the partitioned cluster is to continue.
To have it continue, enter
 
/usr/es/sbin/cluster/utilities/cl_sm_continue
 
To have the recovery action - Reboot - taken on all nodes on this partition, enter
 
/usr/es/sbin/cluster/utilities/cl_sm_recover
LOCAL_PARTITION 2 PHA171 OTHER_PARTITION 1 PHA170
 
Note: When the cl_sm_continue command is run on one node, this node continues to survive and takes over the RG if needed. Typically, this command is run on only one of the nodes.
When the cl_sm_recover command is run on one node, this node restarts. Typically, you do not want to run this command on both nodes.
In this scenario, we run the cl_sm_recover command on the PHA170 node, as shown in Example 10-63, and then the cl_sm_continue command on the PHA171 node.
Example 10-63 Running cl_sm recover on PHA170
# date
Fri Dec 2 21:44:25 CST 2016
/usr/es/sbin/cluster/utilities/cl_sm_recover
Resource Class Action Response for ResolveOpQuorumTie
Example 10-64 is the output of the errpt -c command. The PHA170 node restarts after running the cl_sm_recover command.
Example 10-64 The errpt output from the PHA170 post manual split
errpt -c
4D91E3EA 1202214416 P S cluster0 A split has been detected.
2B138850 1202214416 I O ConfigRM ConfigRM received Subcluster Split event
A098BF90 1202214416 P S ConfigRM The operational quorum state of the acti
<...>
B80732E3 1202214416 P S ConfigRM The operating system is being rebooted t
<...>
9DBCFDEE 1202214616 T O errdemon ERROR LOGGING TURNED ON
69350832 1202214516 T S SYSPROC SYSTEM SHUTDOWN BY USER
<...>
The ConfigRM service log that is shown in Example 10-65 indicates that this node restarted at 21:44:48.
Example 10-65 ConfigRM service log from PHA170
[32] 12/02/16 _CFD 21:44:48.386539 !!!!!!!!!!!!!!!!! PeerDomainRcp::haltOSExecute (method=1). !!!!!!!!!!!!!!!!!!!!!
[28] 12/02/16 _CFD 21:44:48.386540 ConfigRMUtils::log_error() Entered
[32] 12/02/16 _CFD 21:44:48.386911 logerr: In File=../../../../../src/rsct/rm/ConfigRM/PeerDomain.C (Version=1.99.22.299 Line=23992) :
CONFIGRM_REBOOTOS_ER
The operating system is being rebooted to ensure that critical resources are stopped so that another sub-domain that has operational quorum may recover these resources without causing corruption or conflict.
 
Note: To generate the IBM.ConfigRM service logs, run the following commands:
# cd /var/ct/IW/log/mc/IBM.ConfigRM
# rpttr -o dct trace.* > ConfigRM.out
Then, check the ConfigRM.out file to get the relevant logs.
After the PHA170 node restarts, run the cl_sm_continue command on the PHA171 node, as shown in Example 10-66.
Example 10-66 The cl_sm_continue command on the PHA171 node
# date
Fri Dec 2 21:45:08 CST 2016
# /usr/es/sbin/cluster/utilities/cl_sm_continue
Resource Class Action Response for ResolveOpQuorumTie
Then, the PHA171 node continues and proceeds to acquire the RG, as shown in the cluster.log file in Example 10-67.
Example 10-67 Cluster.log file from the PHA171 acquiring the resource group
Dec 2 21:45:26 PHA171 local0:crit clstrmgrES[10027332]: Fri Dec 2 21:45:26 Removing 1 from ml_idx
Dec 2 21:45:26 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT START: split_merge_prompt quorum YES@SEQ@145@QRMNT@9@DE@11@NSEQ@8@OLD@1@NEW@0
Dec 2 21:45:26 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: split_merge_prompt quorum YES@SEQ@145@QRMNT@9@DE@11@NSEQ@8@OLD@1@NEW@
0 0
Dec 2 21:45:27 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT START: node_down PHA170
Dec 2 21:45:27 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: node_down PHA170 0
Dec 2 21:45:27 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT START: rg_move_release PHA171 1
Dec 2 21:45:27 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT START: rg_move PHA171 1 RELEASE
Dec 2 21:45:27 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: rg_move PHA171 1 RELEASE 0
Dec 2 21:45:27 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: rg_move_release PHA171 1 0
Dec 2 21:45:28 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT START: rg_move_fence PHA171 1
Dec 2 21:45:28 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: rg_move_fence PHA171 1 0
Dec 2 21:45:30 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT START: rg_move_fence PHA171 1
Dec 2 21:45:30 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: rg_move_fence PHA171 1 0
Dec 2 21:45:30 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT START: rg_move_acquire PHA171 1
Dec 2 21:45:30 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT START: rg_move PHA171 1 ACQUIRE
Dec 2 21:45:30 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT START: acquire_takeover_addr
Dec 2 21:45:31 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: acquire_takeover_addr 0
Dec 2 21:45:33 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: rg_move PHA171 1 ACQUIRE 0
Dec 2 21:45:33 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: rg_move_acquire PHA171 1 0
Dec 2 21:45:33 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT START: rg_move_complete PHA171 1
Dec 2 21:45:34 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: rg_move_complete PHA171 1 0
Dec 2 21:45:36 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT START: node_down_complete PHA170
Dec 2 21:45:36 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: node_down_complete PHA170 0
10.10.4 Cluster merge
In this case, the PHA170 node restarts. After the restart completes and the heartbeat channel is restored, you can merge the PowerHA cluster.
The steps are similar to the ones that are described in 10.8.5, “Cluster merge” on page 366.
10.10.5 Scenario summary
If you want to decide manually which partition survives when a cluster split occurs, use the Manual split and merge policy.
10.11 Scenario: Active node halt policy quarantine
This section presents a scenario for an ANHP quarantine.
10.11.1 Scenario description
Figure 10-39 shows the topology of this scenario.
Figure 10-39 Active node halt policy quarantine
There are two HMCs in this scenario. Each HMC has two network interfaces: One is used to connect to the server’s FSP adapter, and the other one is used to communicate with the PowerHA nodes. In this scenario, one node tries to shut down another node through the HMC by using the ssh protocol.
The two HMCs provide high availability for HMC operations. If one HMC fails, PowerHA uses the other HMC to continue operations.
10.11.2 HMC password-less access configuration
Add the HMCs host names and their IP addresses into the /etc/hosts file on the PowerHA nodes:
172.16.15.55 HMC55
172.16.15.239 HMC239
Example 10-68 shows how to set up the HMC password-less access from the PHA170 node to one HMC.
Example 10-68 The ssh password-less setup of HMC55
# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (//.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in //.ssh/id_rsa.
Your public key has been saved in //.ssh/id_rsa.pub.
The key fingerprint is:
64:f0:68:a0:9e:51:11:dc:e6:c5:fc:bf:74:36:72:cb root@PHA170
The key's randomart image is:
+--[ RSA 2048]----+
| .=+.o |
| o..o++ |
| o oo.+. |
| . o ..o . |
| o S . |
| + = |
| . B o |
| . E |
| |
+-----------------+
 
# KEY=`cat ~/.ssh/id_rsa.pub` && ssh hscroot@HMC55 mkauthkeys -a "$KEY"
Warning: Permanently added 'HMC55' (ECDSA) to the list of known hosts.
hscroot@HMC55's password: -> enter the password here
 
-> check if it is ok to access this HMC without password
# ssh hscroot@HMC55 lshmc -V
"version= Version: 8
Release: 8.4.0
Service Pack: 2
HMC Build level 20160816.1
","base_version=V8R8.4.0
"
Example 10-69 shows how to set up HMC password-less access from the PHA170 node to another HMC.
Example 10-69 The ssh password-less setup of HMC239
# KEY=`cat ~/.ssh/id_rsa.pub` && ssh hscroot@HMC239 mkauthkeys -a "$KEY"
Warning: Permanently added 'HMC239' (ECDSA) to the list of known hosts.
hscroot@HMC239's password: -> enter password here
 
(0) root @ PHA170: /.ssh
# ssh hscroot@HMC239 lshmc -V
"version= Version: 8
Release: 8.4.0
Service Pack: 2
HMC Build level 20160816.1
","base_version=V8R8.4.0
 
Note: The operation that is shown in Example 10-69 on page 383 is also repeated for the PHA171 node.
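When repeating the setup on the second node, it is easy to overwrite an existing key pair by rerunning ssh-keygen. The following sketch generates the key pair only if it is missing and then prints the mkauthkeys command for each HMC in this scenario; printing instead of running the ssh commands keeps the sketch safe to try anywhere.

```shell
#!/bin/sh
# Sketch: create the root ssh key pair only if it does not exist yet,
# then print the mkauthkeys command to run for each HMC (HMC55 and
# HMC239 in this scenario). Repeat runs do not overwrite the key.
ensure_keypair() {
    keyfile="$1"
    mkdir -p "$(dirname "$keyfile")"
    if [ ! -f "${keyfile}.pub" ]; then
        ssh-keygen -t rsa -N "" -f "$keyfile" -q
    fi
}

ensure_keypair "$HOME/.ssh/id_rsa"
KEY=$(cat "$HOME/.ssh/id_rsa.pub")
for hmc in HMC55 HMC239; do
    echo ssh hscroot@"$hmc" mkauthkeys -a "\"$KEY\""
done
```

Copy and run the printed commands once per HMC, entering the hscroot password when prompted, exactly as in Example 10-68 and Example 10-69.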
10.11.3 HMC configuration in PowerHA
Complete the following steps:
1. The SMIT fast path is smitty cm_cluster_quarintine_halt. The full path is to run smitty sysmirror and then select Custom Cluster Configuration → Cluster Nodes and Networks → Initial Cluster Setup (Custom) → Configure Cluster Split and Merge Policy → Split and Merge Management Policy → Quarantine Policy → Active Node Halt Policy.
We choose HMC Configuration, as shown in Example 10-70.
Example 10-70 Active node halt policy HMC configuration
Active Node Halt Policy
 
Move cursor to desired item and press Enter.
 
HMC Configuration
Configure Active Node Halt Policy
2. Select Add HMC Definition, as shown in Example 10-71 and press Enter. Then, the detailed definition menu opens, as shown in Example 10-72 on page 385.
Example 10-71 Adding an HMC
HMC Configuration
 
Move cursor to desired item and press Enter.
 
Add HMC Definition
Change/Show HMC Definition
Remove HMC Definition
Change/Show HMC List for a Node
Change/Show HMC List for a Site
Change/Show Default HMC Tunables
Change/Show Default HMC List
Example 10-72 HMC55 definition
Add HMC Definition
 
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
[Entry Fields]
* HMC name [HMC55]
DLPAR operations timeout (in minutes) []
Number of retries []
Delay between retries (in seconds) []
Nodes [PHA171 PHA170]
Sites []
Check connectivity between HMC and nodes Yes
Table 10-4 shows the help and information list for adding the HMC definition.
Table 10-4 Context-sensitive help and associated list for adding an HMC definition
Name
Context-sensitive help (F1)
Associated list (F4)
HMC name
Enter the host name for the HMC. An IP address is also accepted here. IPv4 and IPv6 addresses are supported.
Yes (single-selection).
Obtained by running the following command:
/usr/sbin/rsct/bin/rmcdomainstatus -s ctrmc -a IP
DLPAR operations timeout (in minutes)
Enter a timeout in minutes for DLPAR commands that are run on an HMC (the -w parameter). This -w parameter exists only on the chhwres command when allocating or releasing resources. It is adjusted according to the type of resources (for memory, 1 minute per gigabyte is added to this timeout). Setting no value means that the default value, which is defined in the Change/Show Default HMC Tunables panel, is used. When -1 is displayed in this field, it indicates that the default value is used.
None. This parameter is not used in an ANHP scenario.
Number of retries
Enter a number of times one HMC command is retried before the HMC is considered as non-responding. The next HMC in the list is used after this number of retries fails. Setting no value means that you use the default value, which is defined in the Change/Show Default HMC Tunables panel. When -1 is displayed in this field, it indicates that the default value is used.
None. The default value is 5.
Delay between retries (in seconds)
Enter a delay in seconds between two successive retries. Setting no value means that you use the default value, which is defined in Change/Show Default HMC Tunables panel. When -1 is displayed in this field, it indicates that the default value is used.
None. The default value is 10s.
3. Add the first HMC, HMC55, for the two PowerHA nodes, and keep the default values for the other items. Upon pressing Enter, PowerHA checks whether the current node can access HMC55 without a password, as shown in Example 10-73.
Example 10-73 HMC connectivity verification
COMMAND STATUS
 
Command: OK stdout: yes stderr: no
 
Before command completion, additional instructions may appear below.
 
 
Checking HMC connectivity between "PHA171" node and "HMC55" HMC : success!
Checking HMC connectivity between "PHA170" node and "HMC55" HMC : success!
4. Then, add another HMC, HMC239, as shown in Example 10-74.
Example 10-74 HMC239 definition
Add HMC Definition
 
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
[Entry Fields]
* HMC name [HMC239]
DLPAR operations timeout (in minutes) []
Number of retries []
Delay between retries (in seconds) []
Nodes [PHA171 PHA170]
Sites []
Check connectivity between HMC and nodes Yes
You can use the clmgr command to show the current HMC settings, as shown in Example 10-75.
Example 10-75 The clmgr command displaying the HMC configurations
(0) root @ PHA170: /
# clmgr query hmc -v
NAME="HMC55"
TIMEOUT="-1" -> "-1" means use the default value
RETRY_COUNT="-1" -> "-1" means use the default value
RETRY_DELAY="-1" -> "-1" means use the default value
NODES="PHA171 PHA170"
STATUS="UP"
VERSION="V8R8.4.0.2"
 
NAME="HMC239"
TIMEOUT="-1"
RETRY_COUNT="-1"
RETRY_DELAY="-1"
NODES="PHA171 PHA170"
STATUS="UP"
VERSION="V8R8.6.0.0"
 
(0) root @ PHA170: /
# clmgr query cluster hmc
DEFAULT_HMC_TIMEOUT="10"
DEFAULT_HMC_RETRY_COUNT="5"
DEFAULT_HMC_RETRY_DELAY="10"
DEFAULT_HMCS_LIST="HMC55 HMC239"
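From these defaults you can estimate how long PowerHA keeps retrying a non-responding HMC before it moves to the next HMC in DEFAULT_HMCS_LIST. A minimal sketch, assuming that the retries are simply spaced DEFAULT_HMC_RETRY_DELAY seconds apart (the exact interaction with the DLPAR timeout may differ):

```shell
# Hypothetical worst-case estimate before PowerHA abandons one HMC
# and fails over to the next HMC in DEFAULT_HMCS_LIST.
RETRY_COUNT=5    # DEFAULT_HMC_RETRY_COUNT
RETRY_DELAY=10   # DEFAULT_HMC_RETRY_DELAY, in seconds

WORST_CASE=$((RETRY_COUNT * RETRY_DELAY))
echo "Worst-case wait before failing over to the next HMC: ${WORST_CASE}s"
```

With the defaults shown above, that is 50 seconds per HMC under this assumption.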
10.11.4 Quarantine policy configuration in PowerHA
Complete the following steps:
1. Use the SMIT fast path smitty cm_cluster_quarintine_halt, or run smitty sysmirror and then select Custom Cluster Configuration → Cluster Nodes and Networks → Initial Cluster Setup (Custom) → Configure Cluster Split and Merge Policy → Split and Merge Management Policy → Quarantine Policy → Active Node Halt Policy.
The panel that is shown in Example 10-76 opens. Select Configure Active Node Halt Policy.
Example 10-76 Configuring the active node halt policy
Active Node Halt Policy
 
Move cursor to desired item and press Enter.
 
HMC Configuration
Configure Active Node Halt Policy
2. The panel that is shown in Example 10-77 opens. Enable the Active Node Halt Policy and set the testRG RG as the Critical Resource Group.
Example 10-77 Enabling the active node halt policy and setting the critical resource group
Active Node Halt Policy
 
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
[Entry Fields]
* Active Node Halt Policy Yes +
* Critical Resource Group [testRG] +
In this scenario, there is only one RG, so we set it as the critical RG. For a description about the critical RG, see 10.3.1, “Active node halt quarantine policy” on page 328.
Example 10-78 shows the summary after pressing Enter.
Example 10-78 Cluster status summary
COMMAND STATUS
 
Command: OK stdout: yes stderr: no
 
Before command completion, additional instructions may appear below.
 
The PowerHA SystemMirror split and merge policies have been updated.
Current policies are:
Split Handling Policy : None
Merge Handling Policy : Majority
Split and Merge Action Plan : Reboot
The configuration must be synchronized to make this change known across the cluster.
Active Node Halt Policy : Yes
Critical Resource Group : testRG
 
Note: If the split and merge policy is tiebreaker or manual, then the ANHP policy does not take effect. Make sure to set the Split Handling Policy to None before setting the ANHP policy.
3. Use the clmgr command to check the current configuration, as shown in Example 10-79.
Example 10-79 Checking the current cluster configuration
# clmgr view cluster|egrep -i "quarantine|critical"
QUARANTINE_POLICY="halt"
CRITICAL_RG="testRG"
 
# clmgr q cluster SPLIT-MERGE
SPLIT_POLICY="none"
MERGE_POLICY="majority"
ACTION_PLAN="reboot"
4. When the HMC and ANHP configuration is complete, verify and synchronize the cluster.
During the verification and synchronization process, the LPAR name and system information of the PowerHA nodes are added into the HACMPdynresop ODM database. They are used when ANHP is triggered, as shown in Example 10-80.
Example 10-80 Information that is stored in the HACMPdynresop
# odmget HACMPdynresop
 
HACMPdynresop:
key = "PHA170_LPAR_NAME"
value = "T_PHA170" -> The LPAR name can differ from the hostname; the hostname is PHA170
 
HACMPdynresop:
key = "PHA170_MANAGED_SYSTEM"
value = "8284-22A*844B4EW" -> This value is System Model * Machine Serial Number
 
HACMPdynresop:
key = "PHA171_LPAR_NAME"
value = "T_PHA171"
 
HACMPdynresop:
key = "PHA171_MANAGED_SYSTEM"
value = "8408-E8E*842342W"
 
Note: You can obtain the LPAR name from AIX by running either uname -L or lparstat -i.
The requirements are as follows:
Hardware firmware level 840 or later
AIX 7.1 TL4 or 7.2 or later
HMC V8 R8.4.0 (PTF MH01559) with a mandatory interim fix (PTF MH01560)
Here is an example output:
(0) root @ PHA170: /
# hostname
PHA170
# uname -L
5 T_PHA170
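The uname -L output is the LPAR number followed by the LPAR name, so the name alone can be extracted with awk. A small sketch, simulated here with the sample output above (on an AIX node you would pipe uname -L directly into the same awk):

```shell
# uname -L prints "<LPAR number> <LPAR name>".
# Simulated with the sample output; on AIX use:
#   LPAR_NAME=$(uname -L | awk '{print $2}')
sample="5 T_PHA170"
LPAR_NAME=$(echo "$sample" | awk '{print $2}')
echo "$LPAR_NAME"
```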
10.11.5 Simulating a cluster split
Before simulating a cluster split, check the cluster’s status, as described in 10.6.3, “Initial PowerHA service status for each scenario” on page 347.
This scenario sets the Split Handling Policy to None and sets the Quarantine Policy to ANHP. The Critical Resource Group is testRG and is online on the PHA170 node at this time. When the cluster split occurs, it is expected that a backup node of this RG (PHA171) takes over the RG. During this process, PowerHA tries to shut down the PHA170 node through the HMC.
In this scenario, we broke all communication between the two nodes at 02:44:04.
The main steps of CAA and PowerHA on the PHA171 node
The following events occur:
02:44:04: All communication between the two nodes is broken.
02:44:17: The PHA171 node CAA marks ADAPTER_DOWN for the PHA170 node.
02:44:47: The PHA171 node CAA marks NODE_DOWN for the PHA170 node.
02:44:47: PowerHA triggers the split_merge_prompt split event.
02:44:52: PowerHA triggers the split_merge_prompt quorum event, and then PHA171 takes over the RG.
02:44:55: In the rg_move_acquire event, PowerHA shuts down PHA170 through the HMC.
02:46:35: The PHA171 node completes the RG takeover.
The main steps of CAA and PowerHA on the PHA170 node
The following events occur:
02:44:04: All communication between the two nodes is broken.
02:44:17: The PHA170 node marks REP_DOWN for the repository disk.
02:44:17: The PHA170 node CAA marks ADAPTER_DOWN for the PHA171 node.
02:44:47: The PHA170 node CAA marks NODE_DOWN for the PHA171 node.
02:44:47: PowerHA triggers a split_merge_prompt split event.
02:44:52: PowerHA triggers a split_merge_prompt quorum event.
02:44:55: The PHA170 node halts.
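The detection intervals in the timelines above (ADAPTER_DOWN about 13 seconds after the break, NODE_DOWN about 43 seconds after it) can be checked by converting the log timestamps to seconds. A small sketch using the timestamps above:

```shell
# Convert an HH:MM:SS timestamp to seconds since midnight,
# then print the interval between the break and NODE_DOWN.
to_secs() {
    echo "$1" | awk -F: '{ print $1*3600 + $2*60 + $3 }'
}
BREAK=$(to_secs 02:44:04)       # all communication broken
NODE_DOWN=$(to_secs 02:44:47)   # CAA marks NODE_DOWN
DELTA=$((NODE_DOWN - BREAK))
echo "NODE_DOWN after ${DELTA}s"
```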
Example 10-81 shows the PowerHA cluster.log file of the PHA171 node.
Example 10-81 PHA171 node cluster.log file information
Dec 3 02:44:47 PHA171 EVENT START: split_merge_prompt split
Dec 3 02:44:47 PHA171 EVENT COMPLETED: split_merge_prompt split
Dec 3 02:44:52 PHA171 local0:crit clstrmgrES[7471396]: Sat Dec 3 02:44:52 Removing 1 from ml_idx
Dec 3 02:44:52 PHA171 EVENT START: split_merge_prompt quorum
Dec 3 02:44:52 PHA171 EVENT COMPLETED: split_merge_prompt quorum
Dec 3 02:44:52 PHA171 EVENT START: node_down PHA170
Dec 3 02:44:52 PHA171 EVENT COMPLETED: node_down PHA170 0
Dec 3 02:44:52 PHA171 EVENT START: rg_move_release PHA171 1
Dec 3 02:44:53 PHA171 EVENT START: rg_move PHA171 1 RELEASE
Dec 3 02:44:53 PHA171 EVENT COMPLETED: rg_move PHA171 1 RELEASE 0
Dec 3 02:44:53 PHA171 EVENT COMPLETED: rg_move_release PHA171 1 0
Dec 3 02:44:53 PHA171 EVENT START: rg_move_fence PHA171 1
Dec 3 02:44:53 PHA171 EVENT COMPLETED: rg_move_fence PHA171 1 0
Dec 3 02:44:55 PHA171 EVENT START: rg_move_fence PHA171 1
Dec 3 02:44:55 PHA171 EVENT COMPLETED: rg_move_fence PHA171 1 0
Dec 3 02:44:55 PHA171 EVENT START: rg_move_acquire PHA171 1
-> At 02:44:58, PowerHA triggered the HMC to shut down the PHA170 node
Dec 3 02:46:28 PHA171 EVENT START: rg_move PHA171 1 ACQUIRE
Dec 3 02:46:28 PHA171 EVENT START: acquire_takeover_addr
Dec 3 02:46:29 PHA171 EVENT COMPLETED: acquire_takeover_addr 0
Dec 3 02:46:31 PHA171 EVENT COMPLETED: rg_move PHA171 1 ACQUIRE 0
Dec 3 02:46:31 PHA171 EVENT COMPLETED: rg_move_acquire PHA171 1 0
Dec 3 02:46:31 PHA171 EVENT START: rg_move_complete PHA171 1
Dec 3 02:46:33 PHA171 EVENT COMPLETED: rg_move_complete PHA171 1 0
Dec 3 02:46:35 PHA171 EVENT START: node_down_complete PHA170
Dec 3 02:46:35 PHA171 EVENT COMPLETED: node_down_complete PHA170 0
Example 10-82 shows the PowerHA hacmp.out file on the PHA171 node. The log indicates that the rg_move_acquire event starts at 02:44:55 and that PowerHA issues the command to shut down the PHA170 node at 02:44:58. This operation occurs in the PowerHA rg_move_acquire event.
Example 10-82 The PHA171 node hacmp.out file
Dec 3 2016 02:44:55 GMT -06:00 EVENT START: rg_move_acquire PHA171 1
<...>
:clhmccmd[hmccmdexec:3707] : Start ssh command at Sat Dec 3 02:44:58 CST 2016
:clhmccmd[hmccmdexec:1] ssh <...> hscroot@HMC55 'chsysstate -m SVRP8-S822-08-SN844B4EW -r lpar -o shutdown --immed -n T_PHA170 2>&1
<...>
 
Note: PowerHA on the PHA171 node shuts down the PHA170 node before acquiring the service IP and varying on the shared VG. Only when this operation completes successfully does PowerHA continue with the other operations. If the operation fails, PowerHA enters the error state and does not continue, so the data in the shared VG is safe.
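The shutdown command that clhmccmd runs over ssh can be reconstructed from the log. As a sketch, the command string is built from the managed-system and LPAR names taken from Example 10-82; substitute your own names when adapting it:

```shell
# Build the HMC shutdown command string that clhmccmd runs over ssh.
# Values are taken from Example 10-82; adjust for your environment.
MANAGED_SYSTEM="SVRP8-S822-08-SN844B4EW"
LPAR_NAME="T_PHA170"
CMD="chsysstate -m $MANAGED_SYSTEM -r lpar -o shutdown --immed -n $LPAR_NAME"
echo "$CMD"
```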
10.11.6 Cluster merge occurs
In this case, the PHA170 node halts after the cluster split occurs. When resolving cluster split issues, start PHA170 manually. After checking that the CAA service is up by running the lscluster -m command, you can start the PowerHA service on the PHA170 node.
The steps are similar to what is described in 10.8.5, “Cluster merge” on page 366.
10.11.7 Scenario summary
In addition to the cluster split and merge policies, PowerHA provides the ANHP quarantine policy to maintain high availability and keep data safe in a cluster split scenario. The policy also takes effect in the case of a sick-but-not-dead node. For more information, see 10.1.1, “Causes of a partitioned cluster” on page 317.
10.12 Scenario: Enabling the disk fencing quarantine policy
This section describes the scenario when disk fencing is enabled as the quarantine policy.
10.12.1 Scenario description
Figure 10-40 shows the topology of this scenario.
Figure 10-40 Topology scenario for the quarantine policy
In this scenario, the quarantine policy is disk fencing. There is one RG (testRG) in this cluster, so this RG is also marked as the Critical Resource Group in the disk fencing configuration.
There is one VG (sharevg) in this RG, and there is one hdisk in this VG. To enable the disk fencing policy, you must set the reserve_policy parameter to no_reserve for all the disks. In our case, hdisk2 is used, so run the following command on each PowerHA node:
chdev -l hdisk2 -a reserve_policy=no_reserve
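If the shared VG spans more than one disk, every disk needs the same setting. The loop below only generates (does not run) the chdev commands; the disk names are placeholders for your environment:

```shell
# Generate (without running) the chdev command for each shared disk
# that must have reserve_policy=no_reserve.
# The disk names below are placeholders; substitute your own disks.
CMDS=$(for disk in hdisk2 hdisk3; do
    echo "chdev -l $disk -a reserve_policy=no_reserve"
done)
echo "$CMDS"
```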
10.12.2 Quarantine policy configuration in PowerHA
This section describes the quarantine policy configuration in a PowerHA cluster.
Ensuring that the active node halt policy is disabled
 
Note: If the ANHP policy is also enabled, in case of a cluster split, ANHP takes effect first.
Complete the following steps:
1. Use the SMIT fast path smitty cm_cluster_quarintine_halt, or run smitty sysmirror and then select Custom Cluster Configuration → Cluster Nodes and Networks → Initial Cluster Setup (Custom) → Configure Cluster Split and Merge Policy → Split and Merge Management Policy → Quarantine Policy → Active Node Halt Policy.
2. Example 10-83 shows the window. Select Configure Active Node Halt Policy.
Example 10-83 Configure the active node halt policy
Active Node Halt Policy
 
Move cursor to desired item and press Enter.
 
HMC Configuration
Configure Active Node Halt Policy
3. Example 10-84 shows where you can disable the ANHP.
Example 10-84 Disable the active node halt policy
Active Node Halt Policy
 
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
 
[Entry Fields]
* Active Node Halt Policy No +
* Critical Resource Group [testRG]
Enabling the disk fencing quarantine policy
Use the SMIT fast path smitty cm_cluster_quarantine_disk_dialog, or you can run smitty sysmirror and select Custom Cluster Configuration → Cluster Nodes and Networks → Initial Cluster Setup (Custom) → Configure Cluster Split and Merge Policy → Split and Merge Management Policy → Quarantine Policy → Disk Fencing.
Example 10-85 on page 393 shows that disk fencing is enabled and the Critical Resource Group is testRG.
Example 10-85 Disk fencing enabled and critical resource group selection
Disk Fencing
 
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
[Entry Fields]
* Disk Fencing Yes +
* Critical Resource Group [testRG]
After pressing Enter, Example 10-86 shows the summary of the split and merge policy setting.
Example 10-86 Split and merge policy setting summary
Command: OK stdout: yes stderr: no
 
Before command completion, additional instructions may appear below.
 
The PowerHA SystemMirror split and merge policies have been updated.
Current policies are:
Split Handling Policy : None
Merge Handling Policy : Majority
Split and Merge Action Plan : Reboot
The configuration must be synchronized to make this change known across the cluster.
Disk Fencing : Yes
Critical Resource Group : testRG
 
Note: If you want to enable only the disk fencing policy, you also must set the split handling policy to None.
Checking the current settings
You can use the clmgr or the odmget command to check the current settings, as shown in Example 10-87 and Example 10-88.
Example 10-87 Checking the current cluster settings
# clmgr view cluster|egrep -i "quarantine|critical"
QUARANTINE_POLICY="fencing"
CRITICAL_RG="testRG"
Example 10-88 Checking the split and merge cluster settings
# odmget HACMPsplitmerge
 
HACMPsplitmerge:
id = 0
policy = "split"
value = "None"
 
HACMPsplitmerge:
id = 0
policy = "merge"
value = "Majority"
 
HACMPsplitmerge:
id = 0
policy = "action"
value = "Reboot"
 
HACMPsplitmerge:
id = 0
policy = "anhp"
value = "No" -->> Important: make sure that ANHP is disabled.
 
HACMPsplitmerge:
id = 0
policy = "critical_rg"
value = "testRG"
 
HACMPsplitmerge:
id = 0
policy = "scsi"
value = "Yes"
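The odmget stanza output above can be reduced to simple policy=value pairs with awk. A sketch, fed here with a here-document copy of two of the stanzas (on a cluster node you would pipe odmget HACMPsplitmerge into the same awk):

```shell
# Reduce HACMPsplitmerge stanzas to policy=value pairs.
# Fed with sample data; on AIX: odmget HACMPsplitmerge | awk ...
PAIRS=$(awk -F'"' '/policy =/ { p=$2 } /value =/ { print p "=" $2 }' <<'EOF'
HACMPsplitmerge:
        policy = "split"
        value = "None"

HACMPsplitmerge:
        policy = "anhp"
        value = "No"
EOF
)
echo "$PAIRS"
```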
Performing a PowerHA cluster verification and synchronization
 
Note: Before you perform a cluster verification and synchronization, check whether the reserve_policy for the shared disks is set to no_reserve.
After the verification and synchronization, you can see that the reserve_policy of hdisk2 changed to PR_shared and that a PR_key_value was generated on each node.
Example 10-89 shows the PR_key_value and reserve_policy setting in the PHA170 node.
Example 10-89 The PR_key_value and reserve_policy settings on PHA170 node
# hostname
PHA170
 
# lsattr -El hdisk2|egrep "PR|reserve_policy"
PR_key_value 0x10001472090686 Persistant Reserve Key Value True+
reserve_policy PR_shared Reserve Policy True+
 
# devrsrv -c query -l hdisk2
Device Reservation State Information
==================================================
Device Name : hdisk2
Device Open On Current Host? : NO
ODM Reservation Policy : PR SHARED
ODM PR Key Value : 4503687439910534
Device Reservation State : NO RESERVE
Registered PR Keys : No Keys Registered
PR Capabilities Byte[2] : 0x11 CRH PTPL_C
PR Capabilities Byte[3] : 0x81 PTPL_A
PR Types Supported : PR_WE_AR PR_EA_RO PR_WE_RO PR_EA PR_WE PR_EA_AR
 
-> [HEX]0x10001472090686 = [DEC]4503687439910534
Example 10-90 shows the PR_key_value and the reserve_policy setting on the PHA171 node.
Example 10-90 PR_key_value and reserve_policy settings on the PHA171 node
# hostname
PHA171
 
# lsattr -El hdisk2|egrep "PR|reserve_policy"
PR_key_value 0x20001472090686 Persistant Reserve Key Value True+
reserve_policy PR_shared Reserve Policy True+
 
# devrsrv -c query -l hdisk2
Device Reservation State Information
==================================================
Device Name : hdisk2
Device Open On Current Host? : NO
ODM Reservation Policy : PR SHARED
ODM PR Key Value : 9007287067281030
Device Reservation State : NO RESERVE
Registered PR Keys : No Keys Registered
PR Capabilities Byte[2] : 0x11 CRH PTPL_C
PR Capabilities Byte[3] : 0x81 PTPL_A
PR Types Supported : PR_WE_AR PR_EA_RO PR_WE_RO PR_EA PR_WE PR_EA_AR
 
-> [HEX]0x20001472090686 = [DEC]9007287067281030
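As the annotations above show, the PR_key_value reported by lsattr (hexadecimal) and the ODM PR Key Value reported by devrsrv (decimal) are the same key in different bases. printf confirms the conversion:

```shell
# lsattr shows PR_key_value in hex; devrsrv shows it in decimal.
# Convert the hex keys from Example 10-89 and Example 10-90 to decimal.
K_PHA170=$(printf '%d' 0x10001472090686)
K_PHA171=$(printf '%d' 0x20001472090686)
echo "PHA170 key: $K_PHA170"
echo "PHA171 key: $K_PHA171"
```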
10.12.3 Simulating a cluster split
Before simulating a cluster split, check the cluster’s status, as described in 10.6.3, “Initial PowerHA service status for each scenario” on page 347.
This scenario sets the split handling policy to None and the quarantine policy to disk fencing. The Critical Resource Group is testRG and is online on the PHA170 node at this time. When the cluster split occurs, it is expected that the backup node of this RG (PHA171) takes over the RG. During this process, PowerHA on the PHA171 node fences the PHA170 node out of the shared disk while keeping its own access, which keeps the data safe.
In this case, we broke all communication between the two nodes at 04:14:12.
Main steps of CAA and PowerHA on the PHA171 node
The following events occur:
04:14:12: All communication between the two nodes is broken.
04:14:24: The PHA171 node CAA marks ADAPTER_DOWN for the PHA170 node.
04:14:54: The PHA171 node CAA marks NODE_DOWN for the PHA170 node.
04:14:54: PowerHA triggers a split_merge_prompt split event.
04:15:04: PowerHA triggers a split_merge_prompt quorum event, and then PHA171 takes over the RG.
04:15:07: In the rg_move_acquire event, PowerHA preempts the PHA170 node from Volume Group sharevg.
04:15:14: The PHA171 node completes the RG takeover.
Example 10-91 shows the output of the PowerHA cluster.log file.
Example 10-91 PowerHA cluster.log output
Dec 3 04:14:54 PHA171 EVENT START: split_merge_prompt split
Dec 3 04:15:04 PHA171 EVENT COMPLETED: split_merge_prompt split
Dec 3 04:15:04 PHA171 local0:crit clstrmgrES[19530020]: Sat Dec 3 04:15:04 Removing 1 from ml_idx
Dec 3 04:15:04 PHA171 EVENT START: split_merge_prompt quorum
Dec 3 04:15:04 PHA171 EVENT COMPLETED: split_merge_prompt quorum
Dec 3 04:15:04 PHA171 EVENT START: node_down PHA170
Dec 3 04:15:04 PHA171 EVENT COMPLETED: node_down PHA170 0
Dec 3 04:15:05 PHA171 EVENT START: rg_move_release PHA171 1
Dec 3 04:15:05 PHA171 EVENT START: rg_move PHA171 1 RELEASE
Dec 3 04:15:05 PHA171 EVENT COMPLETED: rg_move PHA171 1 RELEASE 0
Dec 3 04:15:05 PHA171 EVENT COMPLETED: rg_move_release PHA171 1 0
Dec 3 04:15:05 PHA171 EVENT START: rg_move_fence PHA171 1
Dec 3 04:15:05 PHA171 EVENT COMPLETED: rg_move_fence PHA171 1 0
Dec 3 04:15:07 PHA171 EVENT START: rg_move_fence PHA171 1
Dec 3 04:15:07 PHA171 EVENT COMPLETED: rg_move_fence PHA171 1 0
Dec 3 04:15:07 PHA171 EVENT START: rg_move_acquire PHA171 1
-> At 04:15:07, PowerHA preempted the PHA170 node from volume group sharevg, and then continued
Dec 3 04:15:08 PHA171 EVENT START: rg_move PHA171 1 ACQUIRE
Dec 3 04:15:08 PHA171 EVENT START: acquire_takeover_addr
Dec 3 04:15:08 PHA171 EVENT COMPLETED: acquire_takeover_addr 0
Dec 3 04:15:10 PHA171 EVENT COMPLETED: rg_move PHA171 1 ACQUIRE 0
Dec 3 04:15:10 PHA171 EVENT COMPLETED: rg_move_acquire PHA171 1 0
Dec 3 04:15:10 PHA171 EVENT START: rg_move_complete PHA171 1
Dec 3 04:15:11 PHA171 EVENT COMPLETED: rg_move_complete PHA171 1 0
Dec 3 04:15:13 PHA171 EVENT START: node_down_complete PHA170
Dec 3 04:15:14 PHA171 EVENT COMPLETED: node_down_complete PHA170 0
Example 10-92 shows the output of the PowerHA hacmp.out file. It indicates that PowerHA triggers the preempt operation in the cl_scsipr_preempt script.
Example 10-92 PowerHA hacmp.out file output
Dec 3 2016 04:15:07 GMT -06:00 EVENT START: rg_move_acquire PHA171 1
...
:cl_scsipr_preempt[85] PR_Key=0x10001472090686
:cl_scsipr_preempt[106] : Node PHA170 is down, preempt PHA170 from the Volume Groups,
:cl_scsipr_preempt[107] : which are part of any Resource Group.
:cl_scsipr_preempt[109] odmget HACMPgroup
:cl_scsipr_preempt[109] sed -n $'/group =/{ s/.*"\(.*\)"/\1/; h; } /nodes =/{ /[ "]PHA170[ "]/{ g; p; } }'
:cl_scsipr_preempt[109] ResGrps=testRG
:cl_scsipr_preempt[109] typeset ResGrps
:cl_scsipr_preempt[115] clodmget -n -q group='testRG and name like *VOLUME_GROUP' -f value HACMPresource
:cl_scsipr_preempt[115] VolGrps=sharevg
:cl_scsipr_preempt[115] typeset VolGrps
:cl_scsipr_preempt[118] clpr_ReadRes_vg sharevg
Number of disks in VG sharevg: 1
hdisk2
:cl_scsipr_preempt[120] clpr_verifyKey_vg sharevg 0x20001472090686
Number of disks in VG sharevg: 1
hdisk2
:cl_scsipr_preempt[124] : Node PHA170 is down, preempting that node from Volume Group sharevg.
:cl_scsipr_preempt[126] clpr_preempt_abort_vg sharevg 0x10001472090686
Number of disks in VG sharevg: 1
hdisk2
...
Main steps of CAA and PowerHA on the PHA170 node
The following events occur:
04:14:12: All communication between the two nodes is broken.
04:14:21: The PHA170 node CAA marks ADAPTER_DOWN for the PHA171 node.
04:14:51: The PHA170 node CAA marks NODE_DOWN for the PHA171 node.
04:14:51: PowerHA triggers the split_merge_prompt split event.
04:14:56: Removing 2 from ml_idx.
04:14:56: PowerHA triggers a split_merge_prompt quorum event.
04:14:58: EVENT START: node_down PHA171.
04:14:58: EVENT COMPLETED: node_down PHA171.
No other events occur on the PHA170 node.
After some time, at 04:15:16, the /sharefs file system is fenced out: the application on the PHA170 node can no longer write to it, but it can still read from it.
Example 10-93 shows the AIX error log (errpt output) of the PHA170 node.
Example 10-93 AIX error log of the PHA170 node
PHA170:
4D91E3EA 1203041416 P S cluster0 A split has been detected.
2B138850 1203041416 I O ConfigRM ConfigRM received Subcluster Split event
...
A098BF90 1203041416 P S ConfigRM The operational quorum state of the acti
4BDDFBCC 1203041416 I S ConfigRM The operational quorum state of the acti
AB59ABFF 1203041416 U U LIBLVM Remote node Concurrent Volume Group fail
AB59ABFF 1203041416 U U LIBLVM Remote node Concurrent Volume Group fail
...
65DE6DE3 1203041516 P S hdisk2 REQUESTED OPERATION CANNOT BE PERFORMED
E86653C3 1203041516 P H LVDD I/O ERROR DETECTED BY LVM
EA88F829 1203041516 I O SYSJ2 USER DATA I/O ERROR
65DE6DE3 1203041516 P S hdisk2 REQUESTED OPERATION CANNOT BE PERFORMED
65DE6DE3 1203041516 P S hdisk2 REQUESTED OPERATION CANNOT BE PERFORMED
E86653C3 1203041516 P H LVDD I/O ERROR DETECTED BY LVM
52715FA5 1203041516 U H LVDD FAILED TO WRITE VOLUME GROUP STATUS AREA
F7DDA124 1203041516 U H LVDD PHYSICAL VOLUME DECLARED MISSING
CAD234BE 1203041516 U H LVDD QUORUM LOST, VOLUME GROUP CLOSING
E86653C3 1203041516 P H LVDD I/O ERROR DETECTED BY LVM
52715FA5 1203041516 U H LVDD FAILED TO WRITE VOLUME GROUP STATUS AREA
CAD234BE 1203041516 U H LVDD QUORUM LOST, VOLUME GROUP CLOSING
65DE6DE3 1203041516 P S hdisk2 REQUESTED OPERATION CANNOT BE PERFORMED
65DE6DE3 1203041516 P S hdisk2 REQUESTED OPERATION CANNOT BE PERFORMED
E86653C3 1203041516 P H LVDD I/O ERROR DETECTED BY LVM
E86653C3 1203041516 P H LVDD I/O ERROR DETECTED BY LVM
78ABDDEB 1203041516 I O SYSJ2 META-DATA I/O ERROR
78ABDDEB 1203041516 I O SYSJ2 META-DATA I/O ERROR
65DE6DE3 1203041516 P S hdisk2 REQUESTED OPERATION CANNOT BE PERFORMED
E86653C3 1203041516 P H LVDD I/O ERROR DETECTED BY LVM
C1348779 1203041516 I O SYSJ2 LOG I/O ERROR
B6DB68E0 1203041516 I O SYSJ2 FILE SYSTEM RECOVERY REQUIRED
Example 10-94 shows detailed information about event EA88F829.
Example 10-94 Showing event EA88F829
LABEL: J2_USERDATA_EIO
IDENTIFIER: EA88F829
 
Date/Time: Mon Dec 3 04:15:16 CST 2016
Sequence Number: 12629
Machine Id: 00FA4B4E4C00
Node Id: PHA170
Class: O
Type: INFO
WPAR: Global
Resource Name: SYSJ2
 
Description
USER DATA I/O ERROR
 
Probable Causes
ADAPTER HARDWARE OR MICROCODE
DISK DRIVE HARDWARE OR MICROCODE
SOFTWARE DEVICE DRIVER
STORAGE CABLE LOOSE, DEFECTIVE, OR UNTERMINATED
 
Recommended Actions
CHECK CABLES AND THEIR CONNECTIONS
INSTALL LATEST ADAPTER AND DRIVE MICROCODE
INSTALL LATEST STORAGE DEVICE DRIVERS
IF PROBLEM PERSISTS, CONTACT APPROPRIATE SERVICE REPRESENTATIVE
 
Detail Data
JFS2 MAJOR/MINOR DEVICE NUMBER
0064 0001
FILE SYSTEM DEVICE AND MOUNT POINT
/dev/sharelv, /sharefs
Example 10-95 shows the output of the devrsrv command on the PHA170 node. It indicates that hdisk2 is held through the 9007287067281030 PR key, which belongs to the PHA171 node.
Example 10-95 The devrsrv command output of the PHA170 node
# hostname
PHA170
 
# devrsrv -c query -l hdisk2
Device Reservation State Information
==================================================
Device Name : hdisk2
Device Open On Current Host? : YES
ODM Reservation Policy : PR SHARED
ODM PR Key Value : 4503687439910534
Device Reservation State : PR SHARED
PR Generation Value : 34
PR Type : PR_WE_AR (WRITE EXCLUSIVE, ALL REGISTRANTS)
PR Holder Key Value : 0
Registered PR Keys : 9007287067281030 9007287067281030
PR Capabilities Byte[2] : 0x11 CRH PTPL_C
PR Capabilities Byte[3] : 0x81 PTPL_A
PR Types Supported : PR_WE_AR PR_EA_RO PR_WE_RO PR_EA PR_WE PR_EA_AR
Example 10-96 shows the output of the devrsrv command on the PHA171 node.
Example 10-96 The devrsrv command output of the PHA171 node
# hostname
PHA171
# devrsrv -c query -l hdisk2
Device Reservation State Information
==================================================
Device Name : hdisk2
Device Open On Current Host? : YES
ODM Reservation Policy : PR SHARED
ODM PR Key Value : 9007287067281030
Device Reservation State : PR SHARED
PR Generation Value : 34
PR Type : PR_WE_AR (WRITE EXCLUSIVE, ALL REGISTRANTS)
PR Holder Key Value : 0
Registered PR Keys : 9007287067281030 9007287067281030
PR Capabilities Byte[2] : 0x11 CRH PTPL_C
PR Capabilities Byte[3] : 0x81 PTPL_A
PR Types Supported : PR_WE_AR PR_EA_RO PR_WE_RO PR_EA PR_WE PR_EA_AR
 
Note: From the above description, you can see that the PHA171 node takes over the RG, the data in the /sharefs file system is safe, and the service IP is attached to the PHA171 node. However, the service IP is still online on the PHA170 node, so there is a risk of an IP conflict. You must perform some manual operations to avoid this risk, such as rebooting the PHA170 node manually.
10.12.4 Simulating a cluster merge
Restarting or shutting down the PHA170 node is one method to avoid a service IP conflict.
In this scenario, restart the PHA170 node and restore all communication between the two nodes. After checking that the CAA service is up by running the lscluster -m command, start the PowerHA service on the PHA170 node.
During the start of the PowerHA service, in the node_up event, PowerHA on the PHA170 node resets the reservation for the shared disks.
Example 10-97 shows the output of the PowerHA cluster.log file on the PHA170 node.
Example 10-97 PowerHA cluster.log file on the PHA170 node
Dec 3 04:41:05 PHA170 local0:crit clstrmgrES[10486088]: Sat Dec 3 04:41:05 HACMP: clstrmgrES: VRMF fix level in product ODM = 0
Dec 3 04:41:05 PHA170 local0:crit clstrmgrES[10486088]: Sat Dec 3 04:41:05 CLSTR_JOIN_AUTO_START - This is the normal start request
Dec 3 04:41:18 PHA170 user:notice PowerHA SystemMirror for AIX: EVENT START: node_up PHA170
-> PowerHA reset the reservation for the shared disks
Dec 3 04:41:20 PHA170 user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: node_up PHA170 0
Dec 3 04:41:22 PHA170 user:notice PowerHA SystemMirror for AIX: EVENT START: node_up_complete PHA170
Dec 3 04:41:22 PHA170 user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: node_up_complete PHA170 0
Example 10-98 shows the output of the node_up event on the PHA170 node. The log indicates that PowerHA registers its key on the shared disks of sharevg.
Example 10-98 The node_up event output of the PHA170 node
Dec 3 2016 04:41:18 GMT -06:00 EVENT START: node_up PHA170
...
:node_up[node_up_scsipr_init:122] clpr_reg_res_vg sharevg 0x10001472090686
Number of disks in VG sharevg: 1
hdisk2
:node_up[node_up_scsipr_init:123] (( 0 != 0 ))
:node_up[node_up_scsipr_init:139] : Checking if reservation succeeded
:node_up[node_up_scsipr_init:141] clpr_verifyKey_vg sharevg 0x10001472090686
Number of disks in VG sharevg: 1
hdisk2
:node_up[node_up_scsipr_init:142] RC1=0
:node_up[node_up_scsipr_init:143] (( 0 == 1 ))
:node_up[node_up_scsipr_init:149] (( 0 == 0 ))
:node_up[node_up_scsipr_init:153] : Reservation success
Example 10-99 shows that the PR key value of the PHA170 node is registered to hdisk2. Thus, the cluster is ready for the next cluster split event.
Example 10-99 PHA170 PR key value
# hostname
PHA171
 
# devrsrv -c query -l hdisk2
Device Reservation State Information
==================================================
Device Name : hdisk2
Device Open On Current Host? : YES
ODM Reservation Policy : PR SHARED
ODM PR Key Value : 9007287067281030
Device Reservation State : PR SHARED
PR Generation Value : 38
PR Type : PR_WE_AR (WRITE EXCLUSIVE, ALL REGISTRANTS)
PR Holder Key Value : 0
Registered PR Keys : 4503687439910534 9007287067281030
9007287067281030 4503687439910534
PR Capabilities Byte[2] : 0x11 CRH PTPL_C
PR Capabilities Byte[3] : 0x81 PTPL_A
PR Types Supported : PR_WE_AR PR_EA_RO PR_WE_RO PR_EA PR_WE PR_EA_AR
10.12.5 Scenario summary
In addition to the cluster split and merge policies, PowerHA provides a disk fencing quarantine policy to maintain high availability and keep data safe in cluster split scenarios. It also takes effect in the case of a sick-but-not-dead node. For more information, see 10.1.1, “Causes of a partitioned cluster” on page 317.