PowerHA: Live kernel update support
This appendix provides details about the PowerHA live kernel update support.
This appendix contains the following topics:
Live kernel update (LKU) support
Starting with AIX Version 7.2, the AIX operating system provides the AIX Live Update function, which eliminates the downtime that is associated with patching the AIX operating system.
PowerHA V7.2 recognizes and supports Live Update of cluster member nodes:
PowerHA is switched to unmanaged mode during the operation.
Workload and storage activities continue to run without interruption.
Live update can be performed on one node in the cluster at a time.
The hardware requirements are as follows:
All devices in the node should be virtual.
Each disk should have multiple paths.
Four spare disks are needed for LKU (disks for mirrorvg, the new rootvg, temporary paging space, and a temporary dump device).
Example of LKU patching a kernel interim fix in a PowerHA environment
The test environment used has the following configuration:
Two-node cluster environment
AIX 7.2.0.0
 – bos.mp64 7.2.0.0
 – bos.cluster.rte 7.2.0.0
 – bos.liveupdate.rte 7.2.0.0
PowerHA 7.2 SP1
 – cluster.es.server.rte 7.2.1.0
First, check the environment using the following steps:
1. Check that the PowerHA cluster service is UP and in a stable state on both nodes:
# clcmd lssrc -ls clstrmgrES | egrep "Current state"
Current state: ST_STABLE
Current state: ST_STABLE
2. Check that the CAA cluster is up and active:
# lscluster -c | grep Cluster
Cluster Name: CL102_103
Cluster UUID: 8e1409c6-a407-11e5-8002-c6d7ab283702
Number of nodes in cluster = 2
Cluster ID for node kern102.aus.stglabs.ibm.com: 1
Cluster ID for node kern103.aus.stglabs.ibm.com: 2
3. Check that the PowerHA resource groups (RGs) are online and available:
# clcmd clRGinfo -m
------------------------------------------------------------------------
NODE kern103.aus.stglabs.ibm.com
------------------------------------------------------------------------
Group Name Group State Application state Node
------------------------------------------------------------------------
RG1 ONLINE kern102
montest ONLINE MONITORED
---------------------------------------------------------------------------
NODE kern102.aus.stglabs.ibm.com
---------------------------------------------------------------------------
Group Name Group State Application state Node
---------------------------------------------------------------------------
RG1 ONLINE kern102
montest         ONLINE MONITORED
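The stability check in step 1 can be scripted so that it fails loudly when any node is not stable. The following is a minimal sketch, not part of PowerHA: the function name all_nodes_stable is hypothetical, and it assumes only that each node contributes one "Current state:" line, as in the output shown above.

```shell
#!/bin/sh
# all_nodes_stable (hypothetical helper): reads cluster manager status text
# on stdin and succeeds only if every "Current state:" line is ST_STABLE.
# Intended input is the output of: clcmd lssrc -ls clstrmgrES
all_nodes_stable() {
    awk '
        /Current state:/ {
            total++
            if ($NF == "ST_STABLE") stable++
        }
        END {
            printf "%d of %d nodes report ST_STABLE\n", stable, total
            # Fail when no state line was seen or any node is not stable.
            exit ((total > 0 && stable == total) ? 0 : 1)
        }
    '
}
```

On a cluster node it could be used as `clcmd lssrc -ls clstrmgrES | all_nodes_stable || echo "do not start the live update yet"`.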
Then, to perform the Live Kernel Update, complete the following steps:
1. HMC authentication is required to perform a live kernel update.
The hmcauth command is used to authenticate with a Hardware Management Console (HMC). For example, issue the following command:
# hmcauth
Enter HMC URI: dsolab134
Enter HMC user name: hscroot
Enter HMC password:
To list all the known HMC authentication tokens, use the following command:
# hmcauth -l
Address : 9.3.4.134
User name: hscroot
port : 12443
TTL : 23:59:55 left
2. The geninstall command is used to install this kernel interim fix. For more information about the command, see the following website:
The flags used in the geninstall command are explained as follows:
-p Performs a preview of an action by running all preinstallation checks for the specified action.
-d Device or Directory: Specifies the device or directory that contains the images to install.
-k Specifies that the AIX Live Update operation is to be performed. This is a new flag for LKU.
3. Use the -p flag to run a preview first. The output shows whether anything must be corrected before you install this interim fix package. For example, issue the following command:
# geninstall -p -k -d /home/ dummy.150813.epkg.Z
 
Validating live update input data.
Computing the estimated time for the live update operation:
-------------------------------------------------------
LPAR: kern102
Blackout_time(s): 37
Global_time(s): 939
 
Checking mirror vg device size:
------------------------------------------
Required device size: 15104 MB
Given device size: 32767 MB
PASSED: device size is sufficient.
 
Checking new root vg device size:
------------------------------------------
Required device size: 15104 MB
Given device size: 32767 MB
PASSED: device size is sufficient.
 
Checking temporary storage size for original LPAR:
------------------------------------------
Required device size: 1024 MB
Given device size: 32767 MB
PASSED: device size is sufficient.
 
Checking temporary storage size for surrogate LPAR:
------------------------------------------
Required device size: 1024 MB
Given device size: 20479 MB
PASSED: device size is sufficient.
 
Validating the adapters and their paths:
------------------------------------------
PASSED: adapters can be divided into two sets so that each has paths to all disks.
 
Checking lpar minimal memory size:
------------------------------------------
Required memory size: 2048 MB
Current memory size: 8192 MB
PASSED: memory size is sufficient.
 
Checking other requirements:
------------------------------------------
PASSED: sufficient space available in /var.
PASSED: sufficient space available in /.
PASSED: sufficient space available in /home.
PASSED: no existing altinst_rootvg.
PASSED: rootvg is not part of a snapshot.
PASSED: pkcs11 is not installed.
PASSED: DoD/DoDv2 profile is not applied.
PASSED: Advanced Accounting is not on.
PASSED: Virtual Trusted Platform Module is not on.
PASSED: multiple semid lists is not on.
PASSED: The trustchk Trusted Execution Policy is not on.
PASSED: The trustchk Trusted Library Policy is not on.
PASSED: The trustchk TSD_FILES_LOCK policy is not on.
PASSED: the boot disk is set to the current rootvg.
PASSED: the mirrorvg name is available.
PASSED: the rootvg is uniformly mirrored.
PASSED: the rootvg does not have the maximum number of mirror copies.
PASSED: the rootvg does not have stale logical volumes.
PASSED: all of the mounted file systems are of a supported type.
PASSED: this AIX instance is not diskless.
PASSED: no Kerberos configured for NFS mounts.
PASSED: multibos environment not present.
PASSED: Trusted Computing Base not defined.
PASSED: no local tape devices found.
PASSED: live update not executed from console.
PASSED: the execution environment is valid.
PASSED: enough available space for /var to dump Component Trace buffers.
PASSED: enough available space for /var to dump Light weight memory Trace buffers.
PASSED: all devices are virtual devices.
PASSED: No active workload partition found.
PASSED: nfs configuration supported.
PASSED: HMC token is present.
PASSED: HMC token is valid.
PASSED: HMC requests successful.
PASSED: A virtual slot is available.
PASSED: RSCT daemons are active.
PASSED: no Kerberos configuration.
PASSED: lpar is not remote restart capable.
PASSED: no virtual log device configured.
PASSED: lpar is using dedicated memory.
PASSED: the disk configuration is supported.
PASSED: no Generic Routing Encapsulation (GRE) tunnel configured.
PASSED: Firmware level is supported.
PASSED: vNIC resources available.
PASSED: Consolidated system trace buffers size is within the limit of 64 MB.
PASSED: SMT number is valid.
INFO: Any system dumps present in the current dump logical volumes will not be available after live update is complete.
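The size checks in the preview output pair a required value with an available value. The following sketch summarizes those pairs and returns a nonzero status if any available size is smaller than the required one. The function name check_preview_sizes is hypothetical, and the parsing assumes the "Checking ...:", "Required ... size:", and "Given/Current ... size:" line layout shown in the output above.

```shell
#!/bin/sh
# check_preview_sizes (hypothetical helper): reads saved geninstall preview
# output on stdin and prints one summary line per size check.
check_preview_sizes() {
    awk '
        # Remember the title of the current check, without prefix and colon.
        /^Checking .*:$/ {
            label = $0
            sub(/^Checking /, "", label)
            sub(/:$/, "", label)
        }
        # The value is the next-to-last field; the last field is the unit.
        /^Required .* size:/ { required = $(NF-1) }
        /^(Given|Current) .* size:/ {
            given = $(NF-1)
            status = (given + 0 >= required + 0) ? "ok" : "TOO SMALL"
            if (status != "ok") bad++
            printf "%s: required %d MB, available %d MB, %s\n", label, required, given, status
        }
        END { exit ((bad > 0) ? 1 : 0) }
    '
}
```

For example, save the preview output to a file and run `check_preview_sizes < preview.out`. The file name preview.out is only an illustration.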
4. Update the /var/adm/ras/liveupdate/lvupdate.data file:
# cat /var/adm/ras/liveupdate/lvupdate.data
--- start ---
software:
 
single = /home/dummy.150813.epkg.Z
--- EOF ---
5. Edit this file and add the following fields:
general:
kext_check =
 
disks:
nhdisk = <hdisk#>
mhdisk = <hdisk#>
tohdisk = <hdisk#>
tshdisk = <hdisk#>
 
hmc:
lpar_id =
management_console = dsolab134
user = hscroot
 
Note: The /var/adm/ras/liveupdate/lvupdate.template file describes the fields of the disks stanza as follows:
disks:
nhdisk =
mhdisk =
tohdisk =
tshdisk =
# disk:
# nhdisk = <disk1,disk2,...> The disk names to be used to make a copy
# of the original rootvg which will be used to boot the Surrogate
# (surr-boot-rootvg). The capacity needs to match the capacity of
# the "required" file systems (/, /var, /opt, /usr, /etc) from the
# orig-rootvg. (If preview mode, size checking will be performed.)
# mhdisk = <disk1,disk2,...> The disk names to be used for the mirrored
# rootvg (surr-mir-rootvg) on the Surrogate. The capacity needs to
# match the capacity of orig-rootvg. (If preview mode, size checking
# will be performed.)
# tohdisk = <disk1,disk2,...> The name of disks to be used as temporary
# storage for the Original. This is only required if paging space is
# present on a non-rootvg disk, or if a dump device is present (either
# on rootvg or non-rootvg). The capacity needs to match the total capacity
# of paging devices and dump devices defined for the original partition.
# (If preview mode, size checking will be performed.)
# tshdisk = <disk1,disk2,...> The name of disks to be used as temporary
# storage for the Surrogate. This is only required if paging space is
# present on a non-rootvg disk, or if a dump device is present (either
# on rootvg or non-rootvg). It must have the same capacity as tohdisk.
# (If preview mode, size checking will be performed.)
For example, the completed lvupdate.data file might look as follows:
general:
kext_check =
 
disks:
nhdisk = hdisk1
mhdisk = hdisk2
tohdisk = hdisk3
tshdisk = hdisk7
 
hmc:
lpar_id =
management_console = dsolab134
user = hscroot
software:
single = /home/dummy.150813.epkg.Z
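Before running the update, it can help to confirm that the disks stanza is filled in. The following sketch reads a lvupdate.data file and reports disk fields that are empty or that name the same disk twice. The function name check_lvupdate_disks is hypothetical. Treating nhdisk and mhdisk as mandatory and rejecting reused disks are assumptions drawn from the template text and the four-spare-disk requirement, not rules confirmed by the AIX documentation.

```shell
#!/bin/sh
# check_lvupdate_disks FILE (hypothetical helper): validates the disks
# stanza of a lvupdate.data style file.
check_lvupdate_disks() {
    awk '
        /^[ \t]*#/ { next }                        # skip template comments
        /^[a-z_]+:[ \t]*$/ { stanza = $1; next }   # stanza header, e.g. "disks:"
        stanza == "disks:" && /=/ {
            split($0, kv, "=")
            key = kv[1]; val = kv[2]
            gsub(/[ \t]/, "", key); gsub(/[ \t]/, "", val)
            disk[key] = val
        }
        END {
            n = split("nhdisk mhdisk tohdisk tshdisk", field, " ")
            for (i = 1; i <= n; i++) {
                k = field[i]
                if (disk[k] == "") {
                    # Assumption: only nhdisk and mhdisk are always needed.
                    if (k == "nhdisk" || k == "mhdisk") { print "ERROR: " k " is not set"; bad++ }
                    else print "note: " k " is not set"
                    continue
                }
                # A field may hold a comma-separated list of disks.
                m = split(disk[k], names, ",")
                for (j = 1; j <= m; j++) {
                    if (names[j] in seen) {
                        print "ERROR: " k " reuses " names[j] " (already listed for " seen[names[j]] ")"
                        bad++
                    } else seen[names[j]] = k
                }
            }
            exit ((bad > 0) ? 1 : 0)
        }
    ' "$1"
}
```

A typical call would be `check_lvupdate_disks /var/adm/ras/liveupdate/lvupdate.data`, which prints nothing and returns 0 for the example file above.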
6. Install the interim fix.
The flags used in the commands are described as follows:
-d Device or Directory: Specifies the device or directory that contains the images to install.
-k Specifies that the AIX Live Update operation is to be performed. This is a new flag for LKU.
# geninstall -k -d /home/ dummy.150813.epkg.Z
Validating live update input data.
Computing the estimated time for the liveupdate operation:
-------------------------------------------------------
LPAR: kern102
Blackout_time(s): 82
Global_time(s): 415
 
Checking mirror vg device size:
------------------------------------------
Required device size: 7808 MB
Given device size: 32767 MB
PASSED: device size is sufficient.
 
Checking new root vg device size:
------------------------------------------
Required device size: 7808 MB
Given device size: 32767 MB
PASSED: device size is sufficient.
 
Checking temporary storage size for the original LPAR:
------------------------------------------
Required device size: 1024 MB
Given device size: 32767 MB
PASSED: device size is sufficient.
 
Checking temporary storage size for the surrogate LPAR:
------------------------------------------
Required device size: 1024 MB
Given device size: 20479 MB
PASSED: device size is sufficient.
 
Validating the adapters and their paths:
------------------------------------------
PASSED: adapters can be divided into two sets so that each has paths to all disks.
Checking lpar minimal memory size:
------------------------------------------
Required memory size: 2048 MB
Current memory size: 8192 MB
PASSED: memory size is sufficient.
 
Checking other requirements:
------------------------------------------
PASSED: sufficient space available in /var.
PASSED: sufficient space available in /.
PASSED: sufficient space available in /home.
PASSED: no existing altinst_rootvg.
PASSED: rootvg is not part of a snapshot.
PASSED: pkcs11 is not installed.
PASSED: DoD/DoDv2 profile is not applied.
PASSED: Advanced Accounting is not on.
PASSED: Virtual Trusted Platform Module is not on.
PASSED: The trustchk Trusted Execution Policy is not on.
PASSED: The trustchk Trusted Library Policy is not on.
PASSED: The trustchk TSD_FILES_LOCK policy is not on.
PASSED: the boot disk is set to the current rootvg.
PASSED: the mirrorvg name is available.
PASSED: the rootvg is uniformly mirrored.
PASSED: the rootvg does not have the maximum number of mirror copies.
PASSED: the rootvg does not have stale logical volumes.
PASSED: all of the mounted file systems are of a supported type.
PASSED: this AIX instance is not diskless.
PASSED: no Kerberos configured for NFS mounts.
PASSED: multibos environment not present.
PASSED: Trusted Computing Base not defined.
PASSED: no local tape devices found.
PASSED: live update not executed from console.
PASSED: the execution environment is valid.
PASSED: enough available space for /var to dump Component Trace buffers.
PASSED: enough available space for /var to dump Light weight memory Trace buffers.
PASSED: all devices are virtual devices.
PASSED: No active workload partition found.
PASSED: nfs configuration supported.
PASSED: HMC token is present.
PASSED: HMC token is valid.
PASSED: HMC requests successful.
PASSED: A virtual slot is available.
PASSED: RSCT services are active.
PASSED: no Kerberos configuration.
PASSED: lpar is not remote restart capable.
PASSED: no virtual log device configured.
PASSED: lpar is using dedicated memory.
PASSED: the disk configuration is supported.
PASSED: no Generic Routing Encapsulation (GRE) tunnel configured.
PASSED: Firmware level is supported.
PASSED: vNIC resources available.
PASSED: Consolidated system trace buffers size is within the limit of 64 MB.
PASSED: SMT number is valid.
INFO: Any system dumps present in the current dump logical volumes will not be available after live update is complete.
 
Non-interruptable live update operation begins in 10 seconds.
Broadcast message from root@kern102 (pts/3) at 22:20:18 ...
 
Live AIX update in progress.
 
....................................
Initializing live update on original LPAR.
 
Validating original LPAR environment.
 
Beginning live update operation on original LPAR.
 
Requesting resources required for live update.
............
Notifying applications of impending live update.
....
Creating rootvg for boot of surrogate.
....................................................
Starting the surrogate LPAR.
............................................
 
Broadcast message from root@kern102 (tty) at 22:26:02 ...
 
PowerHA SystemMirror on kern102 shutting down. Please exit any cluster applications...
 
Creating mirror of original LPAR's rootvg.
................................................
Moving workload to surrogate LPAR.
............
Blackout Time started.
................................................................................................................................
Blackout Time end.
 
Workload is running on surrogate LPAR.
....................................................
Shutting down the Original LPAR.
....................The live update operation succeeded.
 
 
Broadcast message from root@kern102 (pts/3) at 22:33:05 ...
 
Live AIX update completed.
 
File /etc/inittab has been modified.
 
One or more of the files listed in /etc/check_config.files have changed.
See /var/adm/ras/config.diff for details.
During the Live Kernel Update, PowerHA switches to unmanaged mode:
# lssrc -ls clstrmgrES
Current state: ST_STABLE
sccsid = "@(#)36 1.135.1.125 src/43haes/usr/sbin/cluster/hacmprd/main.C,hacmp.pe,71haes_r721,1532A_hacmp721 7/31/15"
build = "Dec 2 2015 04:17:07 1549A_hacmp721"
i_local_nodeid 1, i_local_siteid -1, my_handle 2
ml_idx[1]=0 ml_idx[2]=1
Forced down node list: kern102
AIX Live Update operation in progress on node list: kern102
...
 
# clRGinfo -m
------------------------------------------------------------
Group Name Group State Application state Node
------------------------------------------------------------
RG1 UNMANAGED kern102
montest OFFLINE
 
RG1 UNMANAGED kern103
montest OFFLINE
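The "AIX Live Update operation in progress on node list:" line shown above can be used to detect a running update from a script. The following is a sketch under stated assumptions: the function name live_update_nodes is hypothetical, and because the source shows only a single node on that line, splitting a longer list on spaces or commas is a guess.

```shell
#!/bin/sh
# live_update_nodes (hypothetical helper): reads "lssrc -ls clstrmgrES"
# output on stdin and prints, one per line, the nodes that are running a
# Live Update. Returns nonzero if no node is listed.
live_update_nodes() {
    awk -F': *' '
        /AIX Live Update operation in progress on node list:/ {
            # $2 is the text after the colon; the separator is an assumption.
            n = split($2, nodes, "[ ,]+")
            for (i = 1; i <= n; i++)
                if (nodes[i] != "") { print nodes[i]; found++ }
        }
        END { exit ((found > 0) ? 0 : 1) }
    '
}
```

For example, `lssrc -ls clstrmgrES | live_update_nodes` would print kern102 for the output shown above.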
AIX Live Update is automatically enabled with PowerHA 7.2.0 on AIX 7.2.0 and later versions. AIX Live Update is not supported on any AIX 7.1.x level with PowerHA 7.2.0 installed. However, if you upgrade AIX to V7.2.0 or later, you must enable the AIX Live Update function in PowerHA to use the Live Update support of AIX:
AIX Live Update activation and deactivation
# smitty sysmirror
Cluster Nodes and Networks
Manage Nodes
Change/Show a Node
A new field, “Enable AIX Live Update operation”, can be set to Yes or No to enable or disable the AIX Live Update operation. Make this change on each node in the cluster, one node at a time.
AIX Live Update and PowerHA tips and logs:
 – lssrc -ls clstrmgrES shows the list of nodes in the cluster that are processing a Live Update operation.
 – Logs generated by the cluster scripts during the AIX Live Update operation:
/var/hacmp/log/lvupdate_orig.log
/var/hacmp/log/lvupdate_surr.log
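To review both logs after an update, a small helper can print the tail of each file. This is a minimal sketch: the function name show_lvupdate_logs and its two optional arguments (log directory and line count) are hypothetical, and only the two file names come from the list above.

```shell
#!/bin/sh
# show_lvupdate_logs [DIR] [LINES] (hypothetical helper): prints the last
# LINES lines (default 20) of each Live Update log in DIR
# (default /var/hacmp/log), and says so when a log cannot be read.
show_lvupdate_logs() {
    dir=${1:-/var/hacmp/log}
    lines=${2:-20}
    for name in lvupdate_orig.log lvupdate_surr.log; do
        if [ -r "$dir/$name" ]; then
            echo "== $dir/$name =="
            tail -n "$lines" "$dir/$name"
        else
            echo "== $dir/$name: not readable =="
        fi
    done
}
```

Calling `show_lvupdate_logs` with no arguments reads the default location on the local node.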