Chapter 8. Clusterware

Oracle Clusterware is a software-based cluster manager that allows a group of physically separate servers to be combined into one logical server. The physical servers are connected together by a dedicated private network and are attached to shared storage. Oracle Clusterware consists of a set of additional processes and daemons that run on each node in the cluster and that utilize the private network and shared storage to coordinate activity between the servers.

This chapter describes Oracle Clusterware, discussing the components and their functionality. It also covers advanced topics such as the HA Framework and server-side callouts. Oracle has renamed the foundation for RAC and the high availability framework to Grid Infrastructure. Throughout the chapter, the terms Clusterware and Grid Infrastructure are used interchangeably.

Introducing Clusterware

The goal of Clusterware is to manage local and cluster resources. Oracle 11.2 has many different types of resources, including the following:

  • Networks

  • VIP addresses

  • Local listeners

  • SCAN listeners

  • ASM instances

  • Databases

  • Database instances

  • Services

  • User-defined resources

Oracle Clusterware is responsible for determining the nodes on which resources should run. It can start, stop, and monitor resources; and it can optionally relocate resources to other nodes. Clusterware can also restart any processes that fail on their current node.

Oracle Clusterware protects against hardware and software failures by providing failover capabilities. In the event that a node or resource fails, Clusterware can be configured to relocate resources to other nodes in the cluster. Some resources that are tied to a specific node (an ASM instance, for example) cannot be relocated.

In addition to protecting against unplanned outages, Clusterware can be used to reduce or eliminate planned downtime for hardware and software maintenance.

For many applications, Oracle Clusterware can increase overall throughput by enabling the application to run on multiple nodes concurrently.

Oracle Clusterware is also responsible for monitoring which nodes are currently members of the cluster. When a node joins or leaves the cluster this event will be detected by Oracle Clusterware and reported to all other nodes in the cluster. Clusterware allows the number of nodes in a cluster to be increased or decreased dynamically, thereby providing application scalability.

In Oracle 10.1 and later, Oracle Clusterware is mandatory for RAC deployments. On Linux platforms, Oracle Clusterware is typically the only cluster manager that is deployed. On legacy UNIX platforms, however, Oracle Clusterware is often combined with third-party clustering solutions, such as IBM HA-CMP, HP ServiceGuard, Sun Cluster, and Veritas Cluster Manager. Oracle Clusterware and Veritas both operate in user-mode; proprietary vendor clusterware such as HA-CMP, ServiceGuard, and Sun Cluster operate in kernel-mode and can, therefore, potentially provide slightly higher levels of availability at the expense of increased support complexity.

You might recall from Chapter 3 that Oracle Grid Infrastructure uses two components on shared storage: the Oracle Cluster Registry (OCR) and the voting disk. In Oracle 10.2 and later, Clusterware can be configured to maintain mirrored copies of these files, protecting against media failure. The OCR stores the cluster configuration, including the current state of all resources. The voting disk maintains node membership information. In Oracle 11.2 and later, a copy of a third component, known as the Oracle Local Registry (OLR), is stored on each node in the cluster. The OLR manages configuration information for the local node.

Examining the Hardware and Software Requirements

All servers in the cluster must contain at least two network interfaces: one for the public network and one for the private network. To ensure redundancy, Oracle recommends that both networks be bonded. In other words, at least two physical network adapters should be attached to separate physical networks and bonded together at the operating system level for each logical network (you can learn more about this subject in Chapter 6).

Each cluster must also be attached to shared storage. For multinode clusters, Storage Area Network (SAN), Network Attached Storage (NAS), and iSCSI are supported. For single-instance clusters, Direct Attached Storage (DAS) and NFS are additionally supported. Oracle recommends that there be at least two host bus adapters (HBAs) or network cards for each storage device, attached to separate physical paths. These adapters should be managed using either the operating system or third-party multipathing software. When using Ethernet to connect to the storage (as with iSCSI and Fibre Channel over Ethernet (FCoE)), it is strongly advised that you separate the interconnect traffic from the storage traffic to avoid collisions on the network layer. Additional storage connections may be required for applications with high I/O requirements.

Most contemporary servers are supplied with at least one physical disk or, alternatively, two mirrored disks. In a RAC environment, the physical disk is typically used to store the operating system, swap space, Grid Infrastructure home, and optionally, the Oracle RAC home. Although centralizing binaries on shared storage would appear to be more efficient, most users of clustered environments prefer to maintain separate copies of binaries on each node because this reduces the possibility of corruptions affecting multiple nodes. This approach also facilitates rolling upgrades that can reduce downtime.

Each node in the cluster should run similar operating system releases. In all cases, the operating system architecture and distributor should be identical. In most cases, the operating system version and patch level should also be identical. When adding new nodes to the cluster, it is occasionally necessary to install newer versions of the operating system that contain appropriate device drivers and other functionality. However, in this case, we would recommend upgrading the operating system on the older nodes, if possible. Doing so enables all nodes in the cluster to run similar versions. Adopting this approach will simplify problem resolution and reduce the possibility of human errors in the future.

Using Shared Storage with Oracle Clusterware

As mentioned in the preceding section, Oracle Clusterware uses two components on shared storage: the Oracle Cluster Registry and the voting disk. In Oracle 10gR2 and later, both components can be mirrored by Oracle, and all copies of the associated files should be located on shared storage and accessible by all nodes. Oracle 11.2 introduces two new files: the Oracle Local Registry (OLR) and the Grid Plug and Play (GPnP) profile.

Storing Cluster Information with the Oracle Cluster Registry

The Oracle Cluster Registry (OCR) is used to store cluster configuration information. The OCR contains information about the resources controlled by Oracle Clusterware, including the following:

  • ASM disk groups, volumes, file systems, and instances

  • RAC databases and instances

  • SCAN listeners and local listeners

  • SCAN VIPs and local VIPs

  • Nodes and node applications

  • User-defined resources

  • Its own backups

Logically, the OCR represents a tree structure; physically, each element of data is stored in a separate 4,096-byte physical block.

The data in the OCR is essential to the operation of the cluster. It is possible to back up the contents of the OCR online, but restoring this data will almost always result in an interruption of service. Since Oracle 10gR2, it has been possible to configure two OCR mirrors that are automatically maintained by Oracle Clusterware. This helps protect against media failure or human error. In Oracle 11gR2 and later, it is possible to configure up to five mirrored copies of the OCR. If there is more than one OCR mirror, then it is possible to replace a failed OCR mirror without an outage.

In 11gR2, the OCR can be stored in an ASM disk group or a cluster file system. The OCR can only be stored on raw devices or block devices if the cluster has been upgraded from an earlier version of Oracle. Also, Oracle has announced that raw devices and block devices are deprecated for Oracle 12, so it is a good idea to migrate the OCR (and the voting disks, for that matter) off block devices, either to a supported cluster file system such as GFS or OCFS2 or, following best practice, into ASM.

The OCR should only be updated by Oracle Clusterware processes, Enterprise Manager, supported utilities such as crsctl, srvctl, and ocrconfig, and configuration tools such as the OUI, dbca, and netca.
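As a quick orientation, the following sketch shows a few read-only OCR commands mentioned above. It is a minimal example, not a definitive procedure: the utilities live in $GRID_HOME/bin on a cluster node, so the sketch guards against running on a host where they are absent.

```shell
# Sketch: read-only OCR inspection (Oracle 11.2 utilities). Guarded so the
# commands are only attempted where Grid Infrastructure is installed.
ocr_status() {
  if command -v ocrcheck >/dev/null 2>&1; then
    ocrcheck                 # integrity check plus size and location of each OCR copy
    ocrconfig -showbackup    # automatic and manual OCR backups taken so far
    ocrdump -stdout | head   # first few keys of the OCR tree structure
  else
    echo "OCR utilities not found: run this on a Grid Infrastructure node"
  fi
}
ocr_status
```

ocrcheck in particular is a useful first step when diagnosing suspected OCR corruption, because it verifies the integrity of every configured OCR location.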

Storing Information in the Oracle Local Registry

The Oracle Local Registry is the OCR's local counterpart and a new feature introduced with Grid Infrastructure. The information stored in the OLR is needed by the Oracle High Availability Services daemon (OHASD) to start; this includes data about GPnP wallets, Clusterware configuration, and version information. Comparing the OCR with the OLR reveals that the OLR has far fewer keys; for example, ocrdump reported 704 different keys for the OCR vs. 526 keys for the OLR on our installation.

If you compare only the keys again, you will notice that the majority of keys in the OLR deal with the OHASD process, whereas the majority of keys in the OCR deal with the CRSD. This confirms what we said earlier: you need the OLR (along with the GPnP profile) to start the High Availability Services stack. In contrast, the OCR is used extensively by CRSD. The OLR is maintained by the same command-line utilities as the OCR, with the appended -local option. Interestingly, the OLR is automatically backed up during an upgrade to Grid Infrastructure, whereas the OCR is not.
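The -local variants described above can be sketched as follows; again this is an illustrative, guarded example rather than a maintenance procedure, and the backup command should normally be run as root.

```shell
# Sketch: the same utilities manage the OLR when given the -local flag.
olr_status() {
  if command -v ocrcheck >/dev/null 2>&1; then
    ocrcheck -local                   # integrity and location of this node's OLR
    ocrconfig -local -manualbackup    # take an on-demand backup of the OLR
  else
    echo "OLR utilities not found: run this on a Grid Infrastructure node"
  fi
}
olr_status
```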

Fencing with the Voting Disk

The voting disk is used to provide fencing and to determine cluster-node membership. During normal operations, the OCSSD daemon on each node in the cluster updates the voting disk once a second with the current status of that node. It then reads back the status structures of all other nodes in the cluster. In the event of an interconnect failure, all nodes in the cluster attempt to place a lock in the voting disk. If a node can lock a majority of the voting disks, then it gains control of the cluster.

In 11gR2, the voting disk can be stored in an ASM disk group or a cluster file system. As with the OCR, the voting disk can only be stored on raw devices or block devices if the cluster has been upgraded from an earlier version of Oracle. In any configuration, there should always be an odd number of voting disks. In the event of an interconnect failure in a two-node cluster, this ensures that one node always secures a majority of voting disks. For clusters containing three or more nodes, a more complex algorithm is used to determine which node ultimately controls the cluster.

If ASM is used to store the voting disks, then Oracle Clusterware automatically performs the mirroring. All copies are stored in the same ASM disk group. The number of mirrored voting disk copies stored in the disk group depends on the redundancy of the disk group (see Table 8-1).

Table 8.1. The Number of Voting Disk Mirrors When Stored in ASM

Redundancy of the ASM Disk Group    Number of Voting Disks Mirrored (and Minimum Number of Failure Groups)
External Redundancy                 1
Normal Redundancy                   3
High Redundancy                     5

Within the ASM disk group, each voting disk must be stored in a separate failure group on physically separate storage. By default, each disk within an ASM disk group is a separate failure group. This translates into the requirement that, for normal redundancy, you need to define at least three failure groups. For high redundancy, at least five failure groups are required to use ASM for storing the voting disks. Also, the disk group attribute compatible.asm must be set to 11.2.

If a cluster file system is used to store the voting disks, then Oracle recommends that at least three copies be maintained on physically separate storage. This approach eliminates the possibility of a single point of failure. In Oracle 11gR2, up to 15 voting disk copies can be stored. However, Oracle recommends configuring no more than five copies.
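Voting disk administration is handled by crsctl. The sketch below shows the read-only query; the replace command, which moves all voting files in one step, is left commented because it is disruptive and must be run as root, and +DATA is an assumed disk group name.

```shell
# Sketch: voting disk administration with crsctl (11.2). Guarded so the
# commands are only attempted on a cluster node.
votedisk_status() {
  if command -v crsctl >/dev/null 2>&1; then
    crsctl query css votedisk        # list the current voting files
    # crsctl replace votedisk +DATA  # as root: move all voting files into ASM
  else
    echo "crsctl not found: run this on a cluster node"
  fi
}
votedisk_status
```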

Recording Information with the Grid Plug and Play Profile

The GPnP profile is an important part of the new 11.2 Grid Infrastructure, and it records essential information about the cluster itself. The file is signed to prevent modification and should generally not be edited by administrators. The profile is an XML document, and the information it records is the main reason why adding nodes requires much less input from the administrator. Here is a sample profile, shortened for readability:

<?xml version="1.0" encoding="UTF-8"?>
<gpnp:GPnP-Profile Version="1.0" xmlns="http://www.grid-pnp.org/2005/11/gpnp-profile"
   xmlns:gpnp="http://www.grid-pnp.org/2005/11/gpnp-profile"
   xmlns:orcl="http://www.oracle.com/gpnp/2005/11/gpnp-profile"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation="http://www.grid-pnp.org/2005/11/gpnp-profile gpnp-profile.xsd"
   ProfileSequence="3" ClusterUId="002c207a7175cf38bffcea7bea5b3a49"
   ClusterName="ocfs2" PALocation="">
  <gpnp:Network-Profile>
    <gpnp:HostNetwork id="gen" HostName="*">
      <gpnp:Network id="net1" IP="172.17.1.0" Adapter="bond0" Use="public"/>
      <gpnp:Network id="net2" IP="192.168.1.0" Adapter="bond1"
        Use="cluster_interconnect"/>
    </gpnp:HostNetwork>
  </gpnp:Network-Profile>
  <orcl:CSS-Profile id="css"
   DiscoveryString="/u03/oradata/grid/vdsk1,/u03/oradata/grid/vdsk2,/u03/oradata/grid/vdsk3"
   LeaseDuration="400"/>
  <orcl:ASM-Profile id="asm"
   DiscoveryString="++no-value-at-profile-creation--never-updated-through-ASM++"
   SPFile=""/>
  <ds:Signature...>...</ds:Signature>
</gpnp:GPnP-Profile>

The preceding profile can be obtained by invoking gpnptool with the get option. This dumps the XML file into standard output. For better readability, we saved this to a local PC and opened it with Firefox, which displays the XML a lot more nicely. The cluster this profile is taken from uses OCFS2 for storing the voting disk and OCR.
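To show what can be pulled out of the profile without a browser, the following sketch extracts the adapter-to-usage mapping from the two Network elements. The heredoc stands in for the output of gpnptool get on a live node; the fragment is the (shortened) sample from the text.

```shell
# Parse adapter/usage pairs out of a GPnP profile fragment. On a real node
# you would feed the output of: gpnptool get 2>/dev/null
cat > /tmp/gpnp-sample.xml <<'EOF'
<gpnp:Network id="net1" IP="172.17.1.0" Adapter="bond0" Use="public"/>
<gpnp:Network id="net2" IP="192.168.1.0" Adapter="bond1" Use="cluster_interconnect"/>
EOF
# Print "adapter usage" for each Network element
sed -n 's/.*Adapter="\([^"]*\)".*Use="\([^"]*\)".*/\1 \2/p' /tmp/gpnp-sample.xml
# -> bond0 public
# -> bond1 cluster_interconnect
```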

We have also removed the signature information from the profile because it didn't provide information needed for this discussion. The example illustrates that the cluster has a unique identifier, and that the name is also recorded. The next item in the profile is the network profile, an important piece of information. Oracle uses the GPnP profile and the information stored in the OCR when adding a node to the cluster. The fact that the network interfaces are stored in the GPnP profile allows the administrator to specify less information on the command line. This is also a convenient place to look up which network interface is used for what purpose; here, the Use attribute on the Network tag gives the interface's purpose away.

Another important piece of information is recorded in the CSS-Profile tag, which records the location of the CSS voting disks. In the preceding example, there are three copies of the voting disks on a cluster file system /u03 in directory /u03/oradata/grid/. Because the voting disks are stored outside of ASM, the ASM information is not populated. Unless your database uses ASM specifically, no ASM instance will be brought online by Grid Infrastructure because it simply is not needed. If ASM is used, the relevant tags <orcl:CSS-Profile/> and <orcl:ASM-Profile/> will look something like the following:

<orcl:CSS-Profile id="css" DiscoveryString="+asm" LeaseDuration="400" />
<orcl:ASM-Profile id="asm" DiscoveryString=""
   SPFile="+DATA/prod/asmparameterfile/registry.253.699915959" />

The only supported case where the file should be modified is when the ASM instance's spfile is corrupted and must be changed.

Using Background Processes

Chapter 2 touched on the background processes used by Grid Infrastructure; the following sections pick up that thread and explain those processes in much more detail, including the all-important relationships between them, which could not be covered in the concepts chapter. In the authors' opinion, the documentation about Grid Infrastructure in general, and the startup sequence in particular, is incomplete. This makes it very hard for anyone trying to adopt, implement, and troubleshoot Grid Infrastructure. It also took a while for Oracle Support to publish additional information in My Oracle Support that addressed the worst shortcomings. The following sections will fill in many of the important details that the official documentation neglects to cover or explains incorrectly.

Grid Infrastructure Software Stacks

Oracle re-architected Grid Infrastructure into two different stacks. The official documentation refers to them as the High Availability Services stack and the Cluster Ready Services stack. Other sources divide the software stack into the lower and upper stack. We will stick to Oracle's terminology in this chapter, to minimize confusion. Figure 8-1 provides an overview of the processes and their position in the stack.

The Grid Infrastructure software stack and process dependencies

Figure 8.1. The Grid Infrastructure software stack and process dependencies

The High Availability Services stack consists of daemons that communicate with their peers on the other nodes. As soon as the High Availability Services stack is up, the cluster node can join the cluster and use the shared components (e.g., the OCR). The startup sequence of the High Availability Services stack is stored partly in the Grid Plug and Play profile, but that sequence also depends on information stored in the OLR.

Next, we will take a closer look at each of these processes in the context of their position in the software stack, beginning with the lower part of the software stack.

Drilling Down on the High Availability Stack

The high availability stack is based on the Oracle High Availability Services (OHAS) daemon. In previous versions of Oracle Clusterware, three processes were started by the operating system init process: CRSD, OCSSD, and EVMD. In Oracle 11gR2 and later, the only process under direct control of the init process is the Oracle High Availability Services (OHAS) daemon. This daemon is responsible for starting all other Oracle Clusterware processes. It is also responsible for managing and maintaining the OLR. In a cluster, the OHAS daemon runs as the root user; in an Oracle Restart environment, it runs as the oracle user.

In Unix systems, the Oracle High Availability Services Stack is managed by the init process. On Linux, the following entry is appended to /etc/inittab on each node in the cluster during Grid Infrastructure installation:

h1:35:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
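The respawn entry above means init keeps the script alive; whether the stack itself is up, and whether it starts at boot, is checked and controlled with crsctl. The following is a hedged sketch: the checks are read-only, while the enable/disable commands change autostart behavior, need root, and are therefore left commented.

```shell
# Sketch: checking the stack that init.ohasd supervises (Oracle 11.2).
has_status() {
  if command -v crsctl >/dev/null 2>&1; then
    crsctl check has       # is the High Availability Services stack up?
    crsctl check crs       # is the full Clusterware stack up on this node?
    # crsctl disable crs   # as root: prevent the stack from starting at boot
    # crsctl enable crs    # as root: re-enable autostart
  else
    echo "crsctl not found: run this on a Grid Infrastructure node"
  fi
}
has_status
```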

In previous releases, entries were added to /etc/inittab for the CRS daemon, CSS daemon, and EVM daemon. These background processes are now indirectly controlled by the OHAS daemon. The advantage of having the OHAS daemon do this is that the administrator can now issue cluster-wide commands; indeed, you can use this to restart the cluster layer with a single command. The OHAS daemon will start even if Grid Infrastructure has been explicitly disabled. In Grid Infrastructure 11.2, OHAS comprises the following daemons and services:

  • The Grid Plug And Play (GPnP) daemon: This daemon maintains the Grid Plug and Play profile and coordinates distribution of updates across all nodes in the cluster. This helps ensure that all nodes have the current profile. The Grid Plug and Play profile essentially contains sufficient information to start Clusterware. In previous versions, this information could be stored directly in the OCR. However, in Oracle 11gR2, the OCR can optionally be stored in ASM, which means sufficient configuration information must be available to start ASM and to access the OCR. (The GPNP daemon is sometimes referred to by the initials, GPNPD).

  • The Grid Interprocess Communication (GIPC) daemon: This daemon is new in Oracle 11gR2. The daemon process is gipcd, and it supports Grid Infrastructure Communication. Although this daemon is created when High Availability Services stack is started, GIPC has no functionality in this release. There are plans to activate GIPC in a future release.

  • The multicast DNS (mDNS) service: This service manages name resolution and service discovery within the cluster. It resolves DNS requests on behalf of the Grid Naming Service (GNS). A separate daemon is created within the High Availability Services stack to support multicast DNS.

  • The Grid Naming Service (GNS): This service performs name resolution within the cluster. It is implemented by the gnsd daemon, and it provides a gateway between the cluster mDNS service and the external DNS servers.
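The daemons above are registered with OHAS as so-called init resources, and their states can be listed in one view. A minimal, guarded sketch:

```shell
# Sketch: list the lower-stack resources (gpnpd, gipcd, mdnsd, and friends)
# that ohasd manages directly. Read-only; safe to skip on non-cluster hosts.
init_resources() {
  if command -v crsctl >/dev/null 2>&1; then
    crsctl stat res -init -t   # tabular view of ohasd-managed resources
  else
    echo "crsctl not found: run this on a Grid Infrastructure node"
  fi
}
init_resources
```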

Drilling Down on the Cluster Ready Services Stack

The Cluster Ready Services stack builds on the services provided by the High Availability Services stack and requires its services to be up and running. It hinges on the Cluster Ready Services daemon. The following daemons and services are part of the Cluster Ready Services stack:

  • Cluster Ready Services (CRS): This service manages high availability cluster operations. CRS is responsible for starting, stopping, monitoring, and failing over resources. The CRS service is implemented by the crsd daemon, which runs as the root user and is restarted automatically if it fails. The crsd daemon maintains the configuration information in the OCR, and it manages cluster resources based on that information. The OCR resource configuration includes definitions of dependencies on other cluster resources, timeouts, retries, assignment, and failover policies.

    If the cluster includes Oracle RAC, then CRS is responsible for monitoring Oracle RAC database instances, listeners, and services. It is also responsible for restarting these components in the event of a failure. In a single-instance Oracle Restart environment, application resources are managed by the ohasd daemon and crsd is not used.

  • Cluster Synchronization Services (CSS) service: This service monitors and manages node membership within the cluster. It is implemented using the ocssd daemon. CSS monitors cluster membership of existing nodes and notifies all cluster members when a node joins or leaves the cluster. By default, CSS interfaces with Oracle Clusterware libraries. These libraries provide node membership and interprocess communication services. Alternatively, CSS can interface with a third-party cluster manager such as Sun Cluster or IBM HA-CMP; in these cases, a different set of cluster libraries must be linked into the cluster executables.

  • The Cluster Synchronization Services Agent (cssdagent): The cssdagent process starts, stops, and monitors the status of the ocssd daemon. If the cssdagent discovers that the ocssd daemon has stopped, then it shuts down the node to guarantee data integrity.

  • The Cluster Synchronization Services Monitor (cssdmonitor) process: The cssdmonitor process monitors the scheduling of CPU on the node. If the cssdmonitor process discovers that the node has hung, then it shuts down the node to guarantee data integrity. This behavior is known as I/O fencing. In Oracle 10.2.0.4 and later on the Linux platform, these services were provided by the Oracle Process Monitor daemon (oprocd). In Oracle 11.2 and later, oprocd is no longer implemented. The hangcheck-timer kernel module was made obsolete in Oracle 10.2.0.4.

  • The Disk Monitor (diskmon) daemon: The diskmon daemon monitors and performs I/O fencing for Exadata storage. Because Exadata storage could be added to any RAC node at any time, the diskmon daemon is always started when ocssd starts. Therefore, the diskmon daemon is always present, regardless of whether the cluster uses Exadata storage.

  • The Oracle Clusterware Kill (oclskd) daemon: The oclskd daemon is used by CSS to stop processes associated with CSS group members for which stop requests have been received from other nodes in the cluster.

  • The Cluster Time Synchronization Service (CTSS): This service was introduced in Oracle 11gR2, and it provides time synchronization between all nodes in the cluster. CTSS can operate in two modes; observer mode and active mode. Most sites use Network Time Protocol to synchronize all servers in their estate to an external time source. If Oracle Clusterware detects that NTP has already been configured, then CTSS will automatically run in observer mode, obtaining time synchronization information from NTP. If NTP is not configured, then CTSS will run in active mode, in which case the first node to start in the cluster will become the master clock reference. All nodes subsequently joining the cluster will become slaves, adjusting their system date and time settings to those of the master. The mode in which the CTSS service is running is logged in the alert log when the service starts. This feature will probably be particularly useful in virtualized environments, where system clocks are notoriously inaccurate.

  • The Event Manager (EVM) service: This service is implemented by the evmd daemon. It distributes information about some cluster events to all members of the cluster. Events of interest include the ability to start and stop nodes, instances, and services.

  • The Event Manager Logger (EVMLOGGER) daemon: The evmlogger daemon is started by the evmd daemon. It subscribes to a list of events read from a configuration file, and it runs user-defined actions when those events occur. This daemon is intended to maintain backward compatibility.

  • The Oracle Notification Service (ONS, eONS): This publish-and-subscribe service distributes Fast Application Notification (FAN) events to interested clients in the environment. Oracle plans to merge ONS and eONS into the evmd daemon with one of the next patchsets for 11.2.

Using Grid Infrastructure Agents

In Oracle 11gR2 and later, there are two new types of agent processes: the Oracle Agent and the Oracle Root Agent. These processes interface between Oracle Clusterware and managed resources. In previous versions of Oracle Clusterware, this functionality was provided by the RACG family of scripts and processes.

To slightly complicate matters, there are two sets of Oracle Agents and Oracle Root Agents, one for the High Availability Services stack and one for the Cluster Ready Services stack.

The Oracle Agent and Oracle Root Agent that belong to the High Availability Services stack are started by the ohasd daemon. The Oracle Agent and Oracle Root Agent pertaining to the Cluster Ready Services stack are started by the crsd daemon. In systems where the Grid Infrastructure installation is not owned by the oracle user (probably the majority of installations), a third Oracle Agent is created as part of the Cluster Ready Services stack. Similarly, the Oracle Agent spawned by OHAS is owned by the Grid Infrastructure software owner.

In addition to these two processes, there are agents responsible for managing and monitoring the CSS daemon, called CSSDMONITOR and CSSDAGENT. CSSDAGENT, the agent process responsible for spawning CSSD, is created by the OHAS daemon. CSSDMONITOR, which monitors CSSD and the overall node health (jointly with CSSDAGENT), is also spawned by OHAS.

You might wonder how CSSD, which is required to start the clustered ASM instance, can be started if the voting disks are stored in ASM. This sounds like a chicken-and-egg problem: without access to the voting disks there is no CSS, so the node cannot join the cluster; but without being part of the cluster, CSSD cannot start the ASM instance. To solve this problem, the ASM disk headers contain new metadata in 11.2: you can use kfed to read the header of an ASM disk containing a voting disk. The kfdhdb.vfstart and kfdhdb.vfend fields tell CSS where to find the voting file. This does not require the ASM instance to be up. Once the voting disks are located, CSS can access them and join the cluster.
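The kfed technique can be sketched as follows. /dev/sdb1 is an assumed device name, so adjust it for your storage; the guard keeps the sketch harmless where kfed is not installed.

```shell
# Sketch: read the voting-file pointers straight from an ASM disk header.
voting_file_extents() {
  disk=${1:-/dev/sdb1}   # assumed device name: pass your ASM disk as argument
  if command -v kfed >/dev/null 2>&1; then
    kfed read "$disk" | grep -E 'kfdhdb\.vfstart|kfdhdb\.vfend'
  else
    echo "kfed not found: run this on a Grid Infrastructure node"
  fi
}
voting_file_extents
```

Nonzero vfstart/vfend values indicate that the disk carries a voting file, which is exactly what CSS reads before any ASM instance is up.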

The high availability stack's Oracle Agent runs as the owner of the Grid Infrastructure stack in a clustered environment, as either the oracle or grid user. It is spawned by OHAS directly as part of the cluster startup sequence, and it is responsible for starting resources that do not require root privileges. The list of processes the Oracle Agent starts includes the following:

  • EVMD and EVMLOGGER

  • The GIPC daemon

  • The GPnP daemon

  • The mDNS daemon

The Oracle Root Agent that is spawned by OHAS in turn starts all daemons that require root privileges to perform their programmed tasks. Such tasks include the following:

  • CRS daemon

  • CTSS daemon

  • Disk Monitoring daemon

  • ACFS drivers

Once CRS is started, it will create another Oracle Agent and Oracle Root Agent. If Grid Infrastructure is owned by the grid account, a second Oracle Agent is created. The grid Oracle Agent(s) will be responsible for:

  • Starting and monitoring the local ASM instance

  • ONS and eONS daemons

  • The SCAN listener, where applicable

  • The Node listener

There can be a maximum of three SCAN listeners in the cluster at any given time. If you have more than three nodes, then you can end up without a SCAN listener on a node. Likewise, in the extreme example where there is only one node in the cluster, you could end up with three SCAN listeners on that node.
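Where the SCAN listeners are configured and currently running can be checked with two read-only srvctl queries, sketched here with the usual guard for non-cluster hosts.

```shell
# Sketch: inspect SCAN listener placement (Oracle 11.2).
scan_status() {
  if command -v srvctl >/dev/null 2>&1; then
    srvctl config scan_listener    # configured SCAN listeners and their ports
    srvctl status scan_listener    # which node each SCAN listener runs on
  else
    echo "srvctl not found: run this on a cluster node"
  fi
}
scan_status
```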

The oracle Oracle Agent will only spawn the database resource if account separation is used. If not—i.e., if you didn't install Grid Infrastructure with a different user than the RDBMS binaries—then the oracle Oracle Agent will also perform the tasks listed previously with the grid Oracle Agent.

The Oracle Root Agent finally will create the following background processes:

  • GNS, if configured

  • GNS VIP if GNS enabled

  • ACFS Registry

  • Network

  • SCAN VIP, if applicable

  • Node VIP

The functionality provided by the Oracle Agent process in Oracle 11gR2 was provided by the racgmain and racgimon background processes in Oracle 11gR1 and earlier.

Initiating the Startup Sequence

The startup sequence in Grid Infrastructure is not completely documented in the Oracle Clusterware Administration Guide 11.2. Therefore, this section will shed light on some of the more intricate aspects of managing that sequence.

The init process, father of all processes in Linux, spawns OHAS. This occurs in two stages: first, the /etc/init.d/init.ohasd script is invoked with the run argument; second, this script calls the /etc/init.d/ohasd script, which starts $GRID_HOME/bin/ohasd.bin. The init scripts log potential problems using the syslog facility present on all Linux systems. The ohasd.bin executable uses $GRID_HOME/log/hostname/ohasd/ohasd.log to report its activities. All processes mentioned so far run with root privileges. If the administrator has disabled Grid Infrastructure, then the high availability stack must be started manually; otherwise, the startup sequence continues.
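The first two links of this chain are easy to verify with read-only checks: the inittab entry and the running ohasd.bin process. Both commands are safe on any Linux host; on systems without /etc/inittab or without Clusterware they simply report the fallback message.

```shell
# Read-only checks of the startup plumbing: the respawn entry and the daemon.
grep ohasd /etc/inittab 2>/dev/null || echo "no ohasd entry in /etc/inittab"
ps -ef | grep '[o]hasd\.bin' || echo "ohasd.bin is not running on this host"
```

The bracketed pattern `[o]hasd` is a common trick that stops grep from matching its own process entry in the ps output.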

The ohasd.bin process then spawns four processes, all of which are located in $GRID_HOME/bin. Note that Grid Infrastructure can be installed to an operating system account other than oracle. This chapter refers to the owner of the software stack as the Grid software owner. This chapter also assumes that the RAC binaries were installed as the oracle user. Finally, this chapter assumes that $LOG_HOME points to $ORACLE_HOME/log/hostname. The four processes are as follows:

  • oraagent.bin, started as the Grid software owner

  • cssdmonitor, started as root

  • cssdagent, started as root

  • orarootagent.bin, started as root

It is important to remember that these processes are created by ohasd.bin, rather than the CRS daemon process, which has not been created yet. Next, the Oracle Root Agent starts the following executables, which are also located in $GRID_HOME/bin:

  • crsd.bin: started as root

  • diskmon.bin: started as the Grid software owner

  • octssd.bin: started as root

The Oracle Agent (started as the Grid software owner) in turn will start these executables:

  • evmd.bin: started as the Grid software owner

  • evmlogger.bin: started as the Grid software owner

  • gipcd.bin: started as the Grid software owner

  • gpnpd.bin: started as the Grid software owner

  • mdnsd.bin: started as the Grid software owner

The cssdagent executable is responsible for starting ocssd.bin, which runs as the Grid software owner. The cssdagent executable doesn't spawn additional processes.

Once the CRS daemon is created by OHAS's Oracle Root Agent, the Cluster Ready Services stack will be started. The subsequent actions depend on CRS, which creates additional Oracle Agents (owned by the Grid software owner and oracle) and another Oracle Root Agent. Again, it is important to note the distinction between these agents and the ones created by OHAS. You can also tell them apart because their log files are located in $LOG_HOME/agent/crsd/ rather than $LOG_HOME/agent/ohasd/. You will see the following processes spawned by crsd.bin:

  • oraagent.bin: started as the Grid software owner

  • oraagent.bin: started as oracle

  • orarootagent.bin: started as root

These agents are then responsible for continuing the startup process. The Grid software owner's Oracle Agent (oraagent.bin) will start the following infrastructure components, all as the Grid software owner:

  • Clustered ASM instance

  • ons

  • enhanced ONS (eONS), a Java process

  • tnslsnr: a SCAN listener

  • tnslsnr: a node listener

The ONS binary is an exception because it is not started from $GRID_HOME/bin, but from $GRID_HOME/opmn/bin. The enhanced ONS service is a Java process. Therefore, it doesn't start from $GRID_HOME/bin, either; however, its JAR files are located there.

The Oracle Agent running as the oracle user (oraagent.bin) will start the database and the services associated with the database resource, as defined in the Oracle Cluster Registry.

The Oracle Root Agent will start the following resources:

  • The network resource

  • The SCAN virtual IP address

  • The Node virtual IP address

  • The GNS virtual IP address if GNS is configured

  • The GNS daemon if the cluster is configured to use GNS

  • The ASM Cluster File System Registry

Managing Oracle Clusterware

Oracle provides a comprehensive set of tools that can be used to manage Oracle Grid Infrastructure, including the following:

  • Enterprise Manager

  • The crsctl utility

  • The srvctl utility

  • Cluster Verification Utility

  • The oifcfg utility

  • The ocrconfig utility

  • The ocrcheck utility

  • The ocrdump utility

The following sections will explain what these tools are and how to use them.

Using the Enterprise Manager

Both Enterprise Manager Database Control and Enterprise Manager Grid Control can be used to manage Oracle Clusterware environments. The functionality of Enterprise Manager Database Control is restricted to managing a single database that may have multiple instances. If Enterprise Manager Database Control is deployed, then the management repository must be stored in the target database. Enterprise Manager Database Control is often configured in test systems; it is less frequently deployed in production environments.

Enterprise Manager Grid Control provides a much more flexible management solution, and many Oracle sites now use this tool to manage their entire Oracle estate. The Enterprise Manager Grid Control management repository can be stored in a separate database, outside the cluster. Enterprise Manager Grid Control supports a wider range of administrative tasks, such as the ability to configure and maintain Data Guard.

If you do not already use Enterprise Manager Grid Control to manage your database environment, then we strongly recommend that you investigate this tool, which is subject to ongoing development. At the time of writing, a new version of the tool with a more user-friendly interface was planned for Oracle 11gR2.

Note

Although Oracle promotes Enterprise Manager as the primary database management tool, the fact is that most users still use command-line tools to manage their clustered environments.

Using the Clusterware Control Utility

The Clusterware Control Utility crsctl is the primary command-line tool for managing Oracle Clusterware. It has existed since Oracle 10gR1; however, its functionality has been significantly enhanced in Oracle 11gR2.

In Oracle 11gR2, crsctl has been extended to include cluster-aware commands that can be used to start and stop Clusterware on some or all nodes in the cluster. It can also be used to monitor and manage the configuration of the voting disks and to configure and manage individual cluster resources. The crsctl utility also supports new functionality, such as the configuration of administrative privileges to ensure role separation.

As you saw in the startup section, the OHAS daemon always starts when the system boots. If the Oracle High Availability Services stack is disabled, then none of the daemons for the High Availability Services stack will start, and that node cannot join the cluster. To manually start and stop Oracle Clusterware on all nodes in the cluster, execute the following commands as the root user:

[root@london1]# crsctl start cluster -all
[root@london1]# crsctl stop cluster -all

Alternatively, you could use the -n switch to start Grid Infrastructure on a specific node. To check the current status of all nodes in the cluster, execute the following command:

[root@london1]# crsctl check cluster -all
**************************************************************
london1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
london2:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************

The documentation suggests that crsctl start cluster and crsctl start crs are identical when executed on the same node. We found that this is not the case. The start cluster option successfully starts any daemon that has not yet started. For example, in a situation where all daemons except the CRS daemon had started, crsctl start crs failed with an error stating: "High Availability Services is already active." When invoked with the start cluster option, crsctl detected that crsd was not started, and it brought it back online.

In Oracle 11gR2 and later, the crs_stat utility has been deprecated. However, this utility is still shipped to provide backwards compatibility. The functionality of crs_stat has been integrated into the crsctl utility. You can use crsctl status resource -t to list the current status of all resources, as in the following example:

[root@london1]# crsctl status resource -t
----------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
----------------------------------------------------------------------------
ora.ACFS1.dg
               ONLINE  ONLINE       london1
               ONLINE  ONLINE       london2
               ONLINE  ONLINE       london3
ora.DATA.dg
               ONLINE  ONLINE       london1
               ONLINE  ONLINE       london2
               ONLINE  ONLINE       london3
ora.LISTENER.lsnr
               ONLINE  ONLINE       london1
               ONLINE  ONLINE       london2
               ONLINE  ONLINE       london3
ora.RECO.dg
               ONLINE  ONLINE       london1
               ONLINE  ONLINE       london2
               ONLINE  ONLINE       london3
ora.acfs1.acfs1_vol1.acfs
               ONLINE  ONLINE       london1
               ONLINE  ONLINE       london2
               ONLINE  ONLINE       london3
...

In Oracle 11gR1 and earlier, similar output can be generated using crs_stat -t. Note that in Oracle 11gR2, this report no longer truncates resource names, and the output for each node appears on a separate line. An interesting fact about the state details field: If there is a severe problem with the database—if the online redo logs can't be archived, for example—then this will be reported in that column.

Note

You should not use crsctl to modify the status of resources prefixed with ora. unless directed to do so by Oracle Support. The correct tool to modify such resources with is srvctl; this tool will be covered in its own section.

The output of the crsctl status resource command does not list the daemons of the High Availability Services stack! You must use the initially undocumented -init option to accomplish this:

[root@london1 ˜]# crsctl status resource -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       london1                  Started
ora.crsd
      1        ONLINE  ONLINE       london1
ora.cssd
      1        ONLINE  ONLINE       london1
ora.cssdmonitor
      1        ONLINE  ONLINE       london1
ora.ctssd
      1        ONLINE  ONLINE       london1                  OBSERVER
ora.diskmon
      1        ONLINE  ONLINE       london1
ora.drivers.acfs
      1        ONLINE  ONLINE       london1
ora.evmd
      1        ONLINE  ONLINE       london1
ora.gipcd
      1        ONLINE  ONLINE       london1
ora.gpnpd
      1        ONLINE  ONLINE       london1
ora.mdnsd
      1        ONLINE  ONLINE       london1

The functionality of the deprecated crs_start, crs_stop, and crs_relocate utilities has been integrated into crsctl. User-defined resources are covered in more detail in the "Protecting Applications with Clusterware" section later in this chapter; for now, you can start a resource using crsctl start resource resourceName and stop it using crsctl stop resource resourceName. If Oracle Support recommends doing so, you can stop resources of the High Availability Services stack by appending the -init parameter to crsctl start/stop resource resourceName. Bear in mind that this command merely instructs the high availability daemons to perform the task; you still need to check the log file to verify that the start request actually succeeded.

Many of the remaining options of the crsctl utility will be discussed in different sections later in this chapter. For example, to make applications highly available, crsctl commands can be used to create user-defined resources. Again, refer to the "Protecting Applications with Clusterware" section for more information on this topic. That section also covers access control lists, including how to set and get resource permissions, and explains how the framework works to protect resources.

crsctl is also required to move voting disks into ASM or to other storage options (see the "Maintaining Voting Disks" section for more information on this topic).

Managing Resources with srvctl

The srvctl utility is a command-line tool that manages Oracle resources configured in the cluster. srvctl has been available since Oracle 10gR1. Like crsctl, srvctl has been significantly enhanced in Oracle 11gR2. In Oracle 11gR1 and earlier, srvctl managed six object types: asm, database, instances, services, node applications, and listeners. Oracle 11gR2 adds an additional ten object types: GNS, VIP Addresses, SCAN VIP addresses, SCAN listeners, Oracle homes, OC4J, servers, server pools, ASM disk groups, and ASM file systems.

The same options are available in Oracle 11gR2 that were available in previous releases: enable, disable, start, stop, relocate, status, add, remove, modify, config, getenv, setenv, and unsetenv. Note that not all options have been implemented for all of the objects. Table 8-2 summarizes the srvctl options available in Oracle 11gR2 for each object type.

Table 8.2. Options for the srvctl Utility

Object types: database, instance, service, nodeapps, vip, asm, diskgroup, listener, scan, scan_listener, srvpool, server, oc4j, home, filesystem, gns

Options: enable, disable, start, stop, relocate, status, add, remove, modify, config, getenv, setenv, unsetenv

Not every option applies to every object type; srvctl's online help (srvctl -h) lists the exact combinations.
You need to keep a couple of points in mind when shutting down an ASM instance using srvctl stop asm. In Oracle 11.1 and earlier, you could stop an ASM instance, which resulted in a dismount of its disk groups and a shutdown of the instance. In Oracle 11.2, the srvctl stop asm command does not work this way, especially if the voting disks and OCR are located in ASM itself. To stop ASM in Oracle RAC 11.2, you need to shut down all clients of ASM, including CSSD. The only way to do this is to stop the High Availability Services stack on the node. The same applies to the srvctl stop home command when the Oracle home is the ASM home (you can find more information about this topic in Chapter 9).
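
For example, to stop ASM together with all of its clients on one node, you stop the entire stack there as root; an illustrative transcript:

```
[root@london1]# crsctl stop crs
[root@london1]# crsctl start crs
```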

Another useful command to know is srvctl's config option. This option retrieves information about a system, reporting all database resources if no additional arguments are provided. If an object such as a database, ASM, SCAN, or SCAN listener is passed, then this option provides detailed information about the specified resource. This functionality is essential for deployments that use GNS, where DHCP is responsible for assigning network addresses. For example, you could use the following to find out which IP addresses are used for the SCAN:

[oracle@london1 ˜]$ srvctl config scan
SCAN name: cluster1, Network: 1/172.17.1.0/255.255.255.0/
SCAN VIP name: scan1, IP: /cluster1.example.com/172.17.1.205
SCAN VIP name: scan2, IP: /cluster1.example.com/172.17.1.206
SCAN VIP name: scan3, IP: /cluster1.example.com/172.17.1.207

The srvctl stop home command simplifies patching the RDBMS. This command records which resources are currently started on the current node, storing that information in a state file that the administrator specifies. Upon patch completion, once Grid Infrastructure has restarted, you can use the srvctl start home command with the same state file to bring all resources that were active before the patch application back online.
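
A sketch of the sequence follows; the Oracle home path and state file name are hypothetical:

```
[oracle@london1]$ srvctl stop home -o /u01/app/oracle/product/11.2.0/dbhome_1 \
    -s /tmp/london1_state.txt -n london1
... apply the patch ...
[oracle@london1]$ srvctl start home -o /u01/app/oracle/product/11.2.0/dbhome_1 \
    -s /tmp/london1_state.txt -n london1
```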

Verifying the Cluster with the CVU

The Cluster Verification Utility (CVU) is a command-line tool that was introduced in Oracle 10gR2. The CVU checks the configuration of a cluster and reports whether each component is successfully configured. The CVU checks operating system versions and patches, kernel parameters, user limits, operating system groups and users, secure shell configuration, networking configuration, and shared storage devices. The CVU comes in three forms: the runcluvfy.sh script in the staging area of the Grid Infrastructure media; a standalone version that you can download from Oracle's Technet website; and the cluvfy utility, which you will find in $GRID_HOME/bin after a successful installation.

The CVU can be invoked at a number of stages during the installation and configuration process. Each step represents a set of components that should be configured at that time. In Oracle 11gR2 and later, the OUI automatically executes the CVU as part of the installation process. In our experience, however, it is still more efficient to run the CVU prior to starting the installer to address any oversights or errors in the initial configuration. The CVU should also be run following the completion of administrative tasks, such as node addition and deletion.

Table 8-3 shows the stages when you should run the CVU in Oracle 11gR2.

Table 8.3. Stages at which to Execute the cluvfy Utility

Stage            Description
-post hwos       After hardware and operating system configuration
-pre cfs         Before CFS setup
-post cfs        After CFS setup
-pre crsinst     Before CRS installation
-post crsinst    After CRS installation
-pre hacfg       Before HA configuration
-post hacfg      After HA configuration
-pre dbinst      Before database installation
-pre acfscfg     Before ACFS configuration
-post acfscfg    After ACFS configuration
-pre dbcfg       Before database configuration
-pre nodeadd     Before node addition
-post nodeadd    After node addition
-post nodedel    After node deletion

You can also invoke the CVU to check individual components. This functionality can be useful for isolating a problem. Table 8-4 lists the CVU components in Oracle 11gR2.

Table 8.4. The CVU Component Checks

Component    Description
nodereach    Reachability between nodes
nodecon      Node connectivity
cfs          Cluster file system integrity
ssa          Shared storage accessibility
space        Space availability
sys          Minimum system requirements
clu          Cluster manager integrity
ocr          OCR integrity
olr          OLR integrity
ha           HA integrity
crs          CRS integrity
nodeapp      Node application existence
admprv       Administrative privileges
peer         Compares properties with peers
software     Software distribution
asm          ASM integrity
acfs         ACFS integrity
gpnp         GPNP integrity
gns          GNS integrity
scan         SCAN configuration
ohasd        OHASD integrity
clocksync    Clock synchronization
vdisk        Voting disk udev settings

The CVU is implemented as a set of Java classes. On Linux, the CVU can be executed using the runcluvfy.sh shell script. This script configures the Java environment prior to starting the CVU off the installation media. After the installation has completed, the cluvfy file can be found in $GRID_HOME/bin (like any other executable).

The default form of the command includes a list of nodes that should be checked. For example, the following snippet verifies the configuration of the hardware and the operating system on nodes london1 and london2:

[oracle@london1 ˜]$ cluvfy stage -pre hwos -n london1,london2

You can append the -verbose switch to generate more detailed output:

[oracle@london1 ˜]$ cluvfy stage -pre hwos -n london1,london2 -verbose

It is good practice to keep the output of the execution as part of the installation documentation. In Oracle 11gR2 and later, the CVU can optionally generate fixup scripts to resolve a limited subset of errors. In Linux, fixup scripts can be generated to adjust kernel parameters and modify user limits. The resulting scripts must be executed by the root user.
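
For example, a fixup script can be requested before the Grid Infrastructure installation; the fixup directory in this illustrative transcript is arbitrary:

```
[oracle@london1]$ cluvfy stage -pre crsinst -n london1,london2 -fixup -fixupdir /tmp/fixup
```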

The CVU is supplied with the Oracle Clusterware software distribution. Alternatively, it can be downloaded from the downloads section of the Oracle website at this location: www.oracle.com/technology/products/database/clustering/cvu/cvu_download_homepage.html.

The CVU must be executed by a regular user (e.g., grid or oracle); it cannot be executed by the root user.

Occasionally it is necessary to gather more than the verbose output from the CVU, especially before the software has been successfully instantiated on the cluster nodes. In this case, a little modification to the script is necessary. By default, the runcluvfy.sh script on the Grid Infrastructure staging location removes all the intermediate trace output in line 129:

128 # Cleanup the home for cluster verification software
129 $RM -rf $CV_HOME

To preserve the output, it is necessary to comment this line out. The script has hard-coded dependencies on files in the staging location. If your staging location is read-only (e.g., it is an NFS mount or DVD), then you can still make the change quite easily. Copy the runcluvfy.sh file to a writable location, such as /u01/software. Next, create a cluster verification home directory and export it as the CV_HOME environment variable. Now edit the copied script and comment out line 129:

# $RM -rf $CV_HOME

Also change the variable EXEC_DIR in line 29 so it points to the location where your Grid Infrastructure software is staged. Make sure to set CV_TRACE to another writable directory for the trace output, and then execute runcluvfy.sh.

...

Note that the CVU is only designed to verify that a configuration meets minimum installation requirements; in other words, it isn't foolproof. For example, it occasionally reports failures that can be safely ignored. In other cases, it misses some configuration errors that may subsequently cause an installation to fail.

Configuring Network Interfaces with oifcfg

The Oracle Interface Configuration Tool (oifcfg) is a command-line tool that can be used to configure network interfaces within Oracle Clusterware. The oifcfg utility can be used to add new public or private interfaces, as well as to modify existing subnet information.

Prior to Oracle 11gR2, oifcfg updated the OCR only. In Oracle 11gR2, oifcfg has been extended to update the OLR and the GPNP profile. Please refer back to the "Recording Information with the Grid Plug And Play Profile" section for more information on where the network information is stored in the profile.

Administering the OCR and OLR with ocrconfig

The Oracle Cluster Registry Configuration tool ocrconfig is a command-line utility that can be used to administer the OCR and OLR. The ocrconfig tool has a number of options, including the ability to add and delete OCR mirrors, perform manual backups of the OCR, restore OCR backups, and export and import OCR configuration data. Many of the options for ocrconfig are also applicable to maintaining the OLR; options that target the OLR use the -local switch.

Checking the State of the OCR and its Mirrors with ocrcheck

The Oracle Cluster Registry Check tool, ocrcheck, checks the state of the OCR and its mirrors. The behavior of ocrcheck is determined by the user that invokes it. When invoked by a regular user, such as grid or oracle, ocrcheck checks the accessibility of all OCR mirror copies. It also reports on the current size and free space in the OCR. When invoked by the root user, ocrcheck also performs a structural check on the contents of the OCR and reports any errors. The ocrcheck command is most useful when trying to determine logical corruption in the OCR.

Dumping Contents of the OCR with ocrdump

The Oracle Cluster Registry Dump tool, ocrdump, can be used to dump the contents of the OCR to a text or XML file. ocrdump can only be executed by the root user. If requested, ocrdump can also dump the OLR to a file. The dump file name will default to OCRDUMP; however, this name can be changed by specifying an alternative file name on the command line. If desired, ocrdump can write to standard output. A very useful option is to extract the contents of a backed up OCR or OLR. This enables you to perform before and after comparisons when applying patchsets, for example. If you are unsure where your backup files are located, consult the output of ocrconfig -showbackup [-local]. For very specific troubleshooting needs, ocrdump offers the option to print only a specific key from the registry.
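
For instance, to print only the CSS subtree of the registry to standard output, a transcript might look like this (the key name shown is illustrative):

```
[root@london1]# ocrdump -stdout -keyname SYSTEM.css
```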

Defining Server-Side Callouts

Grid Infrastructure and Clusterware offer the ability to define callout scripts. These are usually small shell scripts or other executables used for very specific tasks that are stored in $GRID_HOME/racg/usrco. Such scripts are invoked by Grid Infrastructure whenever a FAN event is published by the framework affecting the local node.

FAN events are generated in response to a state change, which can include the following:

  • Starting or stopping the database

  • Starting or stopping an instance

  • Starting or stopping a service

  • A node leaving the cluster

The changes can either be user-initiated or caused by failures. Example uses of these scripts are to relocate services to preferred instances after a failed node restarts, to send a page to the on-call DBA, or even to automatically raise a ticket with a trouble ticketing system. The recommendation in this case is to keep the scripts short and to ensure they execute quickly.

Each script deployed in $GRID_HOME/racg/usrco will be invoked and passed the specified parameters as name-value pairs (see Table 8-5).

Table 8.5. Parameters Passed to a User Callout Script

Parameter     Description
event_typ     The event type. Possible values are NODE, SERVICE, SERVICEMEMBER.
version       The event protocol version. The only valid value is 1.0.
hostname      The host name the event applies to.
status        The status of the node, service, or service member. Valid values are UP, DOWN, NODEDOWN, RESTART_FAILED, NOT_RESTARTING.
timestamp     The date and time the event was published.
cardinality   The number of nodes providing the service.
reason        The reason for the change in state. Values are USER, FAILOVER, FAILURE.

It is up to the script to determine which of these parameters it needs to perform its action. For example, the following perl script sends the on-call DBA an e-mail when the prodserv service of the PROD database goes down:

#!/usr/bin/perl

use strict;
use warnings;
use Sys::Hostname;
use Net::SMTP::OneLiner;

# constants
my $MONITORED_DB      = "PROD";
my $MONITORED_SERVICE = "prodserv";
my $LOGDIR            = "/tmp";
my $LOGFILENAME       = $MONITORED_SERVICE . "_" . hostname() . ".log";

my $eventType;
my $service;
my $database;
my $instance;
my $host;
my $status;
my $reason;
my $timestamp;
my $dummy;            # needed for split()

# parse command line
$eventType           = $ARGV[0];
($dummy, $service)   = split /=/, $ARGV[2];
($dummy, $database)  = split /=/, $ARGV[3];
($dummy, $instance)  = split /=/, $ARGV[4];
($dummy, $host)      = split /=/, $ARGV[5];
($dummy, $status)    = split /=/, $ARGV[6];
($dummy, $reason)    = split /=/, $ARGV[7];
$timestamp           = $ARGV[8];

# Notify the DBA our monitored service of our monitored database is down
if ($database eq $MONITORED_DB && $service eq $MONITORED_SERVICE
    && $status eq "down" && $eventType eq "SERVICE")
{
  # we still want to receive an email even if the logging failed. Using the
  # eval {} block allows the script to continue even if the call to open()
  # failed
  eval {
    open FILE, ">>", "$LOGDIR/$LOGFILENAME";
    print FILE "service:   $service\n";
    print FILE "database:  $database\n";
    print FILE "status:    $status\n";
    print FILE "reason:    $reason\n";
    print FILE "timestamp: $timestamp\n";

    close FILE;
  };
  # uses Net::SMTP::OneLiner to send email. Call format is
  # send_mail($from, $to, $subj, $msg);
  send_mail("oracle@" . hostname(),
    "[email protected]",
    "Service $MONITORED_SERVICE on database $MONITORED_DB host $host is down",
    "The service went down at $timestamp");
}

The code in the preceding example reads the command-line arguments passed to the perl script from the @ARGV array. The script uses the split() function to extract the values from the name-value pairs. If the monitored service of the monitored database is down completely (the SERVICE event type indicates this), then the script logs the incident to the defined log directory. Finally, the script sends an email to the on-call DBA's pager alias.

Protecting Applications with Clusterware

Few users of Oracle Grid Infrastructure or Clusterware are aware of the fact that, in addition to providing a foundation for RAC, Grid Infrastructure is also a full-blown high availability framework. Resources registered with the framework are monitored, and if a resource is detected to have failed, the framework knows how to deal with it. In most cases, it is enough to restart the failed resource on the same node. For Oracle-supplied resources such as services and database instances, restart attempts are predefined by the respective resource type. If the restarts of a resource fail and the resource can relocate, then Grid Infrastructure will try to relocate the resource to another available node. Resources that can be relocated include services and virtual IP addresses (node VIPs). Some resources are tied to a node and cannot be relocated; these include most of the daemons in the high availability framework, as well as ASM instances.

Managing Resource Profiles

The clusterwide shared OCR contains the information about a system's resources, called a resource profile. In pre-11.2 installations, the crs_stat utility was used to dump a resource profile. Grid Infrastructure uses the crsctl status resource resourceName -p command to achieve the same result. Resource profiles have changed significantly with Oracle 11.2 (you can see a few examples that illustrate these differences in this chapter's "Configuring Active/Passive Clustering for Oracle Database" and "Configuring Active/Passive Clustering for Apache Tomcat" sections). Depending on the complexity of your application, it can be made up of a single or multiple resources. You can also define dependencies between the resources, ensuring they all start up in the correct order.

The changes to the resource profiles in Oracle 11.2 are so significant that Oracle dedicated a new appendix (Appendix B) to them in the "Clusterware Administration and Deployment Guide 11g Release 2 (11.2)." While the Oracle-supplied resources come with dedicated resource profiles that do not need to be touched, user-defined resources need to include a resource profile to work correctly.

In addition to a resource profile, user-defined resources need a controller, which is called an action script in the documentation. Grid Infrastructure uses agent programs (agents) to manage resources, and these agents invoke the action script's functions. The most common agent used to protect user-defined resources is the scriptagent. This agent allows developers to use simple shell scripts as action scripts for a resource. When used with scriptagent, an action script must implement the following callbacks and interfaces between the user resource and the framework:

  • Start: Starts the resource

  • Stop: Tries to gracefully stop the resource

  • Check: Checks for the resource's health

  • Clean: Forcefully terminates the resource

  • Abort: Called when any of the preceding callbacks and interfaces hang and do not return; defaults to terminating the process

For specific application needs, you can develop your own agent and provide callback functions for those entry points. If an agent does not provide a callback function for the entry point, the action script of the resource will be invoked to perform the required action.
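
When used with scriptagent, an action script is typically a small shell script that dispatches on its first argument. The following minimal sketch protects a single daemon process tracked by a PID file; the PID file location and the sleep placeholder daemon are illustrative only, not Oracle-supplied:

```shell
#!/bin/sh
# Hypothetical action script sketch for use with scriptagent.
PIDFILE=/tmp/myapp_demo.pid

app_start() {
  # start the protected application; "sleep" stands in for the real daemon
  sleep 600 >/dev/null 2>&1 &
  echo $! > "$PIDFILE"
}

app_check() {
  # return 0 if the process is alive, non-zero otherwise
  [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null
}

app_stop() {
  # graceful stop: send SIGTERM and reap the process
  if [ -f "$PIDFILE" ]; then
    pid=$(cat "$PIDFILE")
    kill "$pid" 2>/dev/null
    wait "$pid" 2>/dev/null || true
  fi
  rm -f "$PIDFILE"
}

app_clean() {
  # forceful termination
  [ -f "$PIDFILE" ] && kill -9 "$(cat "$PIDFILE")" 2>/dev/null
  rm -f "$PIDFILE"
}

# dispatch as Clusterware would invoke the script: action.sh {start|stop|check|clean}
case "$1" in
  start) app_start ;;
  stop)  app_stop ;;
  check) app_check ;;
  clean) app_clean ;;
esac
```

The exit code of the check callback is what tells the framework whether the resource is healthy, so keeping that function fast and reliable matters most.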

Note

The agent framework is beyond the scope of this chapter. Interested readers can learn more about this subject by reading Chapter 5 of the "Oracle Clusterware Administration and Deployment Guide 11g Release 2."

The high availability framework will invoke the action script (or the agent directly) with the start option whenever crsctl start resource is called with the resource name. Similarly, you can stop, clean, or check by invoking the desired option. How well you code these options has a direct impact on the overall stability and responsiveness of the resource.

Resources sharing common attributes can be grouped into resource types. Predefined resource types are local resources or cluster resources; however, developers can also write their own types. In previous versions of Clusterware, Oracle supported another resource type, called an application resource; however, that resource type has been deprecated. It is still implemented for backward compatibility, but it should no longer be used.

The distinction between the resource types is visible in the output of crsctl status resource. Note how there are "cluster resources" and "local resources" sections. As their names suggest, local resources are tied to the node they are defined for, and they do not fail over. Cluster resources, on the other hand, can fail over and do not necessarily execute on all cluster nodes.

Grid Infrastructure continuously monitors resources as defined in their resource profiles. Any given resource is assigned a state, which can be one of the following:

  • ONLINE

  • OFFLINE

  • PARTIAL

  • UNKNOWN
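The state is reported in the STATE= line of the crsctl status resource output shown later in this chapter. As a quick illustration, a small helper (our own sketch, not part of any Oracle tool) can extract the state keyword from that output; a canned sample stands in for a live crsctl call here:

```shell
#!/bin/bash
# Hypothetical helper, not an Oracle utility: extract the state keyword
# (ONLINE, OFFLINE, PARTIAL, UNKNOWN) from "crsctl status resource" style
# key=value output. Canned text stands in for a live crsctl call.
resource_state() {
  awk -F'=' '$1 == "STATE" { split($2, a, " "); print a[1]; exit }'
}

sample="NAME=TEST.db
TYPE=cluster_resource
TARGET=ONLINE
STATE=ONLINE on london1"

printf '%s\n' "$sample" | resource_state
```

On a real cluster you would pipe `crsctl status resource resourceName` into the helper instead of the sample text.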

Configuring Active/Passive Clustering for Oracle Database

The Oracle database can easily be configured to use the Clusterware framework for high availability. Using Grid Infrastructure to protect a database resource is a very cost-effective way of setting up an active/passive cluster. As an added advantage, using only one vendor's software stack to implement the cluster can make troubleshooting easier. Staff already familiar with RAC will easily be able to set up and run this configuration because it uses an identical software stack: all commands and log files are in familiar locations, and troubleshooting does not rely on the input from other teams.

To set up an active/passive cluster with 11g Release 2, you need to initially install Grid Infrastructure on all nodes of the cluster. Grid Infrastructure will provide the user with a cluster logical volume manager: ASM. If for some reason another file system is required, you have the option of choosing from the supported cluster file systems, including ACFS. Using a cluster file system that is mounted concurrently to both cluster nodes offers the advantage of not having to remount the database files and binaries in case of a node failure. Some configurations we saw suffered from extended failover periods caused by required file system checks before the file system could be remounted.

On top of the Grid Infrastructure build, you perform a local installation of the RDBMS. It is important that you do not choose a cluster installation when prompted; otherwise, you risk violating your license agreement with Oracle.

When completed, the software stack consists of the components listed in Figure 8-2.


Figure 8-2. Oracle Software for an Active/Passive Cluster

After the binaries are installed and patched according to your standards, you need to create an ASM disk group or OCFS2/GFS mount point to store the database files. Next, start up the database configuration assistant from the first node to create a database. Please ensure that you store all the data files in ASM or the clustered file system. The same applies for the Fast Recovery Area: it should be on shared storage, as well.

After the database is created by dbca, it is automatically registered in the OCR. The profile of the resource is a good reference for the resource profile to be created in the next step, so it is a good idea to save it in a safe location. You can extract the profile with the crsctl status resource ora.databaseName.db -p command. Next, remove the database resource from the OCR, as shown in the following example (this example assumes the database is named TEST):

[oracle@london1 ˜]$ srvctl remove database -d TEST

Next, we need an action script that allows the framework to start, stop, and check the database resource. A possible action script might look like this:

#!/bin/bash

export ORACLE_SID=TEST
export ORACLE_HOME=/u01/app/oracle/product/11.2.0/db_1
export PATH=/usr/local/bin:$ORACLE_HOME/bin:$PATH
export ORACLE_OWNER=oracle

case $1 in
'start')
  su - $ORACLE_OWNER <<EOF
export ORACLE_SID=$ORACLE_SID
export ORACLE_HOME=$ORACLE_HOME
$ORACLE_HOME/bin/sqlplus /nolog
conn / as sysdba
startup
exit
EOF
   RET=0
    ;;
'stop')
  su - $ORACLE_OWNER <<EOF
export ORACLE_SID=$ORACLE_SID
export ORACLE_HOME=$ORACLE_HOME
$ORACLE_HOME/bin/sqlplus /nolog
conn / as sysdba
shutdown immediate
exit
EOF
   RET=0
    ;;
'clean')
  su - $ORACLE_OWNER <<EOF
export ORACLE_SID=$ORACLE_SID
export ORACLE_HOME=$ORACLE_HOME
$ORACLE_HOME/bin/sqlplus /nolog
conn / as sysdba
shutdown abort
exit
EOF
   RET=0
    ;;
'check')
   # check for the existence of the smon process for $ORACLE_SID
   # this check could be improved, but was kept short on purpose
   found=`ps -ef | grep smon | grep $ORACLE_SID | wc -l`
   if [ $found = 0 ]; then
     RET=1
   else
     RET=0
   fi
   ;;
*)
   RET=0
    ;;
esac
# A 0 indicates success, return 1 for an error.
if [ $RET -eq 0 ]; then
exit 0
else
  exit 1
fi

The preceding action script defines environment variables in the header section, setting the Oracle owner, the Oracle SID, and the Oracle home. It then implements the required start, stop, clean, and check entry points. The check could be more elaborate—for example, it could check for a hung instance—however, this example was kept short and simple for the sake of clarity.
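The counting logic of the check branch can be exercised in isolation. The following sketch is our own: a canned ps -ef style listing is piped in instead of querying a live system, so the logic can be tried anywhere:

```shell
#!/bin/bash
# Isolated sketch of the action script's check logic: count smon processes
# for a given SID. A canned "ps -ef" style listing stands in for a live
# system, so no running instance is required.
check_smon() {
  local sid="$1"
  local found
  found=$(grep smon | grep -c "$sid")
  [ "$found" -eq 0 ] && return 1
  return 0
}

listing="oracle 4711 1 0 10:00 ? 00:00:01 ora_smon_TEST
oracle 4712 1 0 10:00 ? 00:00:00 ora_pmon_TEST"

if printf '%s\n' "$listing" | check_smon TEST; then
  echo "check passed"
else
  echo "check failed"
fi
```

On a real node, replacing the canned listing with the output of ps -ef reproduces the behavior of the action script's check branch.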

The action script needs to be deployed to the other cluster node, and it must be made executable. Whenever the script changes, it needs to be synchronized across the cluster nodes! After defining the action script, you need to create a new cluster resource. Protecting an Oracle database instance with Clusterware is simplified by the availability of the SCAN: users of the database do not need to worry about which node the database is currently started on because the SCAN abstracts this information from them. The communication of the SCAN with the local listener also makes a floating virtual IP (which other cluster stacks require) unnecessary. Next, using some values from the resource profile saved earlier, you configure the profile for the new cluster resource. It is easier to use a configuration file than to supply all the resource parameters on the command line in name-value pairs. To recreate the database cluster resource, you could use the following configuration file, which is saved as TEST.db.config:

PLACEMENT=restricted
HOSTING_MEMBERS=london1 london2
CHECK_INTERVAL=30
CARDINALITY=1
ACTIVE_PLACEMENT=0
AUTO_START=restore
DEGREE=1
DESCRIPTION="Oracle Database Resource"
RESTART_ATTEMPTS=1
ACTION_SCRIPT=/u01/app/crs/hadaemon/hacluster.sh

The preceding configuration file can be read as follows. Placement and hosting members go hand in hand; the restricted policy only allows the resource to execute on the hosting members london1 and london2. The check interval of 30 seconds determines the frequency of checks, and setting ACTIVE_PLACEMENT to 0 prevents Oracle from relocating the resource back to a node that rejoins the cluster after a failure; a recently failed node could fail again, and it is better to let the DBAs perform the switch back to the primary node. The cardinality specifies that there will always be exactly one instance of this resource in the cluster (never more, never less); similarly, the degree of 1 indicates that there cannot be more than one instance of the resource on the same node. The RESTART_ATTEMPTS and ACTION_SCRIPT parameters are self-explanatory in this context. Please note that the directory $GRID_HOME/hadaemon/ did not exist; it was created for this example.
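Because a typo in the configuration file only surfaces when the resource misbehaves later, it can pay off to sanity-check the file first. The following sketch is our own (it is not an Oracle tool, and the key list and paths are examples only); it verifies that the attributes this section relies on are present before the file is handed to crsctl add resource:

```shell
#!/bin/bash
# Hypothetical sanity check, not an Oracle tool: verify that a resource
# profile file contains the attributes this section relies on before
# handing it to "crsctl add resource ... -file". The file path and the
# list of required keys are examples only.
cat > /tmp/TEST.db.config <<'EOF'
PLACEMENT=restricted
HOSTING_MEMBERS=london1 london2
CHECK_INTERVAL=30
CARDINALITY=1
ACTIVE_PLACEMENT=0
AUTO_START=restore
DEGREE=1
RESTART_ATTEMPTS=1
ACTION_SCRIPT=/u01/app/crs/hadaemon/hacluster.sh
EOF

missing=0
for key in PLACEMENT HOSTING_MEMBERS CHECK_INTERVAL ACTION_SCRIPT; do
  grep -q "^${key}=" /tmp/TEST.db.config || { echo "missing: $key"; missing=1; }
done
[ "$missing" -eq 0 ] && echo "profile OK"
```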

Next, use the following command to register the resource in Grid Infrastructure:

$ crsctl add resource TEST.db -type cluster_resource -file TEST.db.config

If you get a CRS-2518 (invalid directory path) error while executing this command, you most likely forgot to deploy the action script to the other node.

Note

You might be tempted to use the ora.database.type resource type here. Unfortunately, using this resource type repeatedly caused core dumps of the agent process monitoring the resource. These were in $GRID_HOME/log/hostname/agent/crsd/oraagent_oracle.

The permissions on the resource at this point are too strict: only root can effectively modify it. Trying to start the resource as the oracle account results in a failure, as in this example:

[oracle@london1 ˜]$ crsctl start resource TEST.db
CRS-0245:  User doesn't have enough privilege to perform the operation
CRS-4000: Command Start failed, or completed with errors.

You can confirm the cause of this failure by checking the permissions:

[root@london1 ˜]# crsctl getperm resource TEST.db
Name: TEST.db
owner:root:rwx,pgrp:root:r-x,other::r--

You would like the oracle user to also be able to start and stop the resource; you can enable this level of permission using the crsctl setperm command, as in the following example:

[root@london1 ˜]# crsctl setperm resource TEST.db -o oracle
[root@london1 ˜]# crsctl getperm resource TEST.db
Name: TEST.db
owner:oracle:rwx,pgrp:root:r-x,other::r--

The preceding snippet allows users logging in as oracle (or using sudo su - oracle) to start and stop the database, effectively transferring ownership of the resource to the oracle account. You need to ensure that the oracle account can execute the action script; otherwise, you will get an error when trying to start the resource. Members of the oinstall group have the same rights. All other users defined on the operating system level with privileges to execute the binaries in $GRID_HOME can only read the resource status.
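The ACL string shown by crsctl getperm packs the owner, primary group, and other entries into one comma-separated list. A small parser (our own sketch, not an Oracle utility) makes the format easy to pick apart:

```shell
#!/bin/bash
# Hypothetical parser (our own, not an Oracle utility) for the ACL string
# format printed by "crsctl getperm resource", e.g.
#   owner:oracle:rwx,pgrp:root:r-x,other::r--
acl_field() {   # usage: acl_field <acl-string> <entry>  -> prints user:perms
  printf '%s\n' "$1" | tr ',' '\n' | \
    awk -F':' -v e="$2" '$1 == e { print $2 ":" $3 }'
}

acl="owner:oracle:rwx,pgrp:root:r-x,other::r--"
echo "owner: $(acl_field "$acl" owner)"
echo "group: $(acl_field "$acl" pgrp)"
```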

Note

All the resource attributes—and especially the placement options and start/stop dependencies—are documented in Appendix B of the "Oracle Clusterware Administration and Deployment Guide 11g Release 2 (11.2)" document.

The final preparation steps are to copy the password file and a pfile pointing to the ASM spfile into the passive node's $ORACLE_HOME/dbs directory. Because this is 11.2, the ADR will take care of all your diagnostic files. You do, however, need to create the directory for your audit files, which normally is $ORACLE_BASE/admin/$ORACLE_SID/adump.

From this point on, you need to use crsctl {start|stop|relocate} to manipulate the database. Relocating the resource from london1 to london2 is easy after you implement the preceding steps, as the next example demonstrates:

[root@london1 ˜]# crsctl status resource TEST.db
NAME=TEST.db
TYPE=cluster_resource
TARGET=ONLINE
STATE=ONLINE on london1
[root@london1 ˜]# crsctl relocate resource TEST.db
CRS-2673: Attempting to stop 'TEST.db' on 'london1'
CRS-2677: Stop of 'TEST.db' on 'london1' succeeded
CRS-2672: Attempting to start 'TEST.db' on 'london2'
CRS-2676: Start of 'TEST.db' on 'london2' succeeded
[root@london1 ˜]# crsctl status resource TEST.db
NAME=TEST.db
TYPE=cluster_resource
TARGET=ONLINE
STATE=ONLINE on london2

Configuring Active/Passive Clustering for Apache Tomcat

A slightly more complex example involves making Apache Tomcat or another web-accessible application highly available. The difference in this setup compared to the database setup described in the previous section lies in the fact that you need to use a floating virtual IP address. Floating in this context means that the virtual IP address moves jointly with the application. Oracle calls its implementation of a floating VIP an application VIP. Application VIPs were introduced in Oracle Clusterware 10.2. Previous versions only had a node VIP.

The idea behind application VIPs is that, in the case of a node failure, both VIP and the application migrate to the other node. The example that follows makes Apache Tomcat highly available, which is accomplished by installing the binaries for version 6.0.26 in /u01/tomcat on two nodes in the cluster. The rest of this section outlines the steps you must take to make Apache Tomcat highly available.

Oracle Grid Infrastructure does not provide an application VIP by default, so you have to create one. A new utility, called appvipcfg, can be used to set up an application VIP, as in the following example:

[root@london1 ˜]# appvipcfg
Production Copyright 2007, 2008, Oracle.All rights reserved

  Usage: appvipcfg create -network=<network_number> -ip=<ip_address> -vipname=<vipname>
                          -user=<user_name>[-group=<group_name>]
                   delete -vipname=<vipname>
[root@london1 ˜]# appvipcfg create -network=1 \
> -ip 172.17.1.108 -vipname httpd-vip -user=root
Production Copyright 2007, 2008, Oracle.All rights reserved
2010-06-18 16:07:12: Creating Resource Type
2010-06-18 16:07:12: Executing cmd: /u01/app/crs/bin/crsctl add type app.appvip.type -basetype cluster_resource -file /u01/app/crs/crs/template/appvip.type
2010-06-18 16:07:13: Create the Resource
2010-06-18 16:07:13: Executing cmd: /u01/app/crs/bin/crsctl add resource httpd-vip -type app.appvip.type -attr USR_ORA_VIP=172.17.1.108,START_DEPENDENCIES=hard(ora.net1.network)
pullup(ora.net1.network),STOP_DEPENDENCIES=hard(ora.net1.network),ACL='owner:root:rwx,pgrp:root:r-x,other::r--,user:root:r-x'

The preceding output shows that the new resource has been created, and it is owned by root exclusively. You could use crsctl setperm to change the ACL, but this is not required for this process. Bear in mind that no account other than root can start the resource at this time. You can verify the result of this operation by querying the resource just created. Note how the httpd-vip does not have an ora. prefix:

[root@london1 ˜]# crsctl status resource httpd-vip
NAME=httpd-vip
TYPE=app.appvip.type
TARGET=OFFLINE
STATE=OFFLINE

Checking the resource profile reveals that it matches the output of the appvipcfg command; the output has been shortened for readability, and it focuses only on the most important keys (the other keys were removed for the sake of clarity):

[root@london1 ˜]# crsctl stat res httpd-vip -p
NAME=httpd-vip
TYPE=app.appvip.type
ACL=owner:root:rwx,pgrp:root:r-x,other::r--,user:root:r-x
ACTIVE_PLACEMENT=0
AGENT_FILENAME=%CRS_HOME%/bin/orarootagent%CRS_EXE_SUFFIX%
AUTO_START=restore
CARDINALITY=1
CHECK_INTERVAL=1
DEGREE=1
DESCRIPTION=Application VIP
RESTART_ATTEMPTS=0
SCRIPT_TIMEOUT=60
SERVER_POOLS=*
START_DEPENDENCIES=hard(ora.net1.network) pullup(ora.net1.network)
STOP_DEPENDENCIES=hard(ora.net1.network)
USR_ORA_VIP=172.17.1.108
VERSION=11.2.0.1.0

The dependencies on the network ensure that, if the network is not started, it will be started as part of the VIP start. The resource is controlled by the CRSD orarootagent because changes to the network configuration require root privileges in Linux. The status of the resource revealed it was stopped; you can use the following command to start it:

[root@london1 ˜]# crsctl start res httpd-vip
CRS-2672: Attempting to start 'httpd-vip' on 'london2'
CRS-2676: Start of 'httpd-vip' on 'london2' succeeded
[root@london1 ˜]#

In this case, Grid Infrastructure decided to start the resource on server london2.

[root@london1 ˜]# crsctl status resource httpd-vip
NAME=httpd-vip
TYPE=app.appvip.type
TARGET=ONLINE
STATE=ONLINE on london2

You can verify this by querying the network setup, which has changed. The following output is again shortened for readability:

[root@london2 source]# ifconfig
...
eth0:3   Link encap:Ethernet  HWaddr 00:16:36:2B:F2:F6
          inet addr:172.17.1.108  Bcast:172.17.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

Next, you need an action script that controls the Tomcat resource. Again, the requirement is to implement start, stop, clean, and check functions in the action script. The Oracle documentation lists C, C++, and shell scripts as candidate languages for an action script. We think that the action script can be any executable, as long as it returns 0 or 1, as required by Grid Infrastructure. A sample action script that checks for the Tomcat webserver could be written in plain bash, as in the following example:

#!/bin/bash

export CATALINA_HOME=/u01/tomcat
export ORA_CRS_HOME=/u01/app/crs
export JAVA_HOME=$ORA_CRS_HOME/jdk
export CHECKURL="http://172.17.1.108:8080/tomcat-power.gif"

case $1 in
'start')
   $CATALINA_HOME/bin/startup.sh
   RET=$?
    ;;
'stop')
   $CATALINA_HOME/bin/shutdown.sh
   RET=$?
    ;;
'clean')
   $CATALINA_HOME/bin/shutdown.sh
   RET=$?
    ;;
'check')
   # download a simple, small image from the tomcat server
   /usr/bin/wget -q --delete-after $CHECKURL
   RET=$?
    ;;
*)
   RET=0
    ;;
esac
# A 0 indicates success, return 1 for an error.
if [ $RET -eq 0 ]; then
exit 0
else
  exit 1
fi

In our installation, we created a $GRID_HOME/hadaemon/ directory on all nodes in the cluster to save the Tomcat action script, tomcat.sh.

The next step is to make the script executable and then to verify that it works as expected by invoking it manually with each of the start, stop, check, and clean arguments. Once you are confident that the script is working, you can add the Tomcat resource.
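Such a manual smoke test can be scripted. The loop below is a sketch of one way to do it; a trivial stub stands in for the real tomcat.sh so the loop can be run anywhere, and on a cluster node you would point script= at the real file instead:

```shell
#!/bin/bash
# Smoke-test sketch for an action script: invoke every entry point and
# check the exit status. A trivial stub stands in for the real tomcat.sh;
# the stub path is an example only.
script=/tmp/stub-action.sh
cat > "$script" <<'EOF'
#!/bin/bash
case $1 in
  start|stop|check|clean) exit 0 ;;
  *) exit 1 ;;
esac
EOF
chmod +x "$script"

failures=0
for action in start stop check clean; do
  if "$script" "$action"; then
    echo "$action: OK"
  else
    echo "$action: FAILED"
    failures=$((failures + 1))
  fi
done
echo "failures: $failures"
```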

The easiest way to configure the new resource is by creating a text file with the required attributes, as in this example:

[root@london1 hadaemon]# cat tomcat.profile
ACTION_SCRIPT=/u01/app/crs/hadaemon/tomcat.sh
PLACEMENT=restricted
HOSTING_MEMBERS=london1 london2
CHECK_INTERVAL=30
RESTART_ATTEMPTS=2
START_DEPENDENCIES=hard(httpd-vip)
STOP_DEPENDENCIES=hard(httpd-vip)

The following command registers the resource tomcat in Grid Infrastructure:

[root@london1 ˜]# crsctl add resource tomcat -type cluster_resource -file tomcat.profile

Again, the profile registered matches what has been defined in the tomcat.profile file, plus the default values:

[root@london1 hadaemon]# crsctl status resource tomcat -p
NAME=tomcat
TYPE=cluster_resource
ACL=owner:root:rwx,pgrp:root:r-x,other::r--
ACTION_SCRIPT=/u01/app/crs/hadaemon/tomcat.sh
ACTIVE_PLACEMENT=0
AGENT_FILENAME=%CRS_HOME%/bin/scriptagent
AUTO_START=restore
CARDINALITY=1
CHECK_INTERVAL=30
DEFAULT_TEMPLATE=
DEGREE=1
DESCRIPTION=
ENABLED=1
FAILOVER_DELAY=0
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
HOSTING_MEMBERS=london1 london2
LOAD=1
LOGGING_LEVEL=1
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
PLACEMENT=restricted
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=2
SCRIPT_TIMEOUT=60
SERVER_POOLS=
START_DEPENDENCIES=hard(httpd-vip)
START_TIMEOUT=0
STATE_CHANGE_TEMPLATE=
STOP_DEPENDENCIES=hard(httpd-vip)
STOP_TIMEOUT=0
UPTIME_THRESHOLD=1h

This example includes a hard dependency on the httpd-vip resource, which is already running. If you now try to start the Tomcat resource, you will get the following error:

[root@london1 hadaemon]# crsctl start resource tomcat
CRS-2672: Attempting to start 'tomcat' on 'london1'
CRS-2674: Start of 'tomcat' on 'london1' failed
CRS-2527: Unable to start 'tomcat' because it has a 'hard' dependency
on 'httpd-vip'
CRS-2525: All instances of the resource 'httpd-vip' are already running;
relocate is not allowed because the force option was not specified
CRS-4000: Command Start failed, or completed with errors.

To get around this problem, you need to begin by shutting down httpd-vip and then try again:

[root@london1 hadaemon]# crsctl stop res httpd-vip
CRS-2673: Attempting to stop 'httpd-vip' on 'london1'
CRS-2677: Stop of 'httpd-vip' on 'london1' succeeded
[root@london1 hadaemon]# crsctl start res tomcat
CRS-2672: Attempting to start 'httpd-vip' on 'london1'
CRS-2676: Start of 'httpd-vip' on 'london1' succeeded
CRS-2672: Attempting to start 'tomcat' on 'london1'
CRS-2676: Start of 'tomcat' on 'london1' succeeded

The Tomcat servlet and JSP container is now highly available. However, please bear in mind that the session state of an application will not fail over to the passive node in the case of a node failure. The preceding example could be further enhanced by using a shared ACFS volume to store the web applications used by Tomcat, as well as the Tomcat binaries themselves.

Using Oracle Restart

Oracle Restart is a new feature in 11g Release 2. This feature could probably be best described as a single-instance version of the RAC command-line interface. To install Oracle Restart, you need to install Grid Infrastructure for a single node. Unlike clustered Grid Infrastructure, there is no prohibition against installing the software in the same Oracle base. Also, the administrator is free to install the software with different accounts for Grid Infrastructure and the database binaries. However, we've found that many administrators choose not to do this; instead, they typically opt to use the oracle account to install both Oracle Restart and the database binaries.

ASM is only available as part of Grid Infrastructure. If you want to use it, then you must install Oracle Restart as well, in which case you might as well make the most of it. From a database administrator's point of view, Oracle Restart is a blessing because it offers a single command-line interface into Oracle single-instance deployments and RAC. The automatic startup of resources registered with Oracle Restart solves the very old problem of having to provide startup scripts. This is a great advantage for companies running Oracle on many different hardware platforms. You will no longer find that a listener has failed to start after a database server reboot. Also, the administrator does not have to worry about the ASM instance or disk group not being mounted; this, too, is accomplished automatically.

After a fresh installation of Grid Infrastructure for a single server, you will find resources that are automatically managed by Oracle Restart. The resources managed by Oracle Restart after a fresh installation include the following:

  • The CSS daemon

  • The diskmon daemon

  • Any ASM disk group(s)

  • The listener

  • The ASM instance

Identical to what you find in a full RAC installation, the aforementioned resources will automatically be protected from failure by the Grid Infrastructure software stack. In contrast to a RAC installation, Oracle Restart does not create resources for ONS and eONS daemons by default. If you would like to benefit from FAN events in an Oracle Restart environment, then you need to manually add and start these two processes, as in the following example:

[root@london1 ˜]# srvctl add ons
[root@london1 ˜]# srvctl start ons
[root@london1 ˜]# srvctl add eons
[root@london1 ˜]# srvctl start eons

Note that you can use server callouts the same way you do with RAC once the ONS/eONS processes are started (please refer to the "Defining Server-Side Callouts" section earlier in this chapter for more information about server callouts).

As with RAC, the Oracle High Availability Services daemon starts through the init process at system boot. And again, as with RAC, the OHAS daemon does not adhere to the standards of Red Hat's rc system in the 11.2.0.1 base release. This means that, even though a kill script is configured to stop OHASD in the relevant run level, the absence of a lock file in /var/lock/subsys/ means that the kill script is never invoked, which results in a database crash. The startup process from this point on differs from clustered Grid Infrastructure. For example, there are no cluster components started by CRSD (in fact there is no CRSD at all); instead, all you need to do is start ASM and the defined database resources, as well as any potentially registered services. In comparison with the clustered start sequence in RAC, you do not see a cssdmonitor process; rather, cssdagent will have to monitor ocssd.bin without the help of its clustered cousin. The OHAS daemon uses the OLR to read configuration information and resource dependencies; however, Oracle Restart does not use a GPnP profile.

All resources managed by Oracle Restart should preferably be controlled by the srvctl command-line tool. This ensures that the dependencies defined in the resource profile can be respected. This especially applies to the listener, database, and services. The main assistants (dbca, netca, and asmca) are Oracle Restart aware, and they will modify resource profiles. Any database created with dbca will automatically be registered in Oracle Restart, and it will have a dependency on the ASM disk group(s) it uses. With Oracle Restart, it is finally no longer necessary to modify the database service_names initialization parameter; indeed, doing so is no longer recommended. Instead, database services should be defined using the srvctl add service command. Neither should you use the deprecated DBMS_SERVICE.CREATE_SERVICE procedure.

You can query Oracle Restart meta-information much as you can with RAC. Many of the commands mentioned in the "Managing Oracle Clusterware" section can also be used for Oracle Restart. For example, the crsctl and srvctl command-line utilities support Oracle Restart with a non-clustered subset of their command-line arguments.

Troubleshooting Oracle Restart is almost identical to troubleshooting RAC; the daemons started in Oracle Restart use the same log file locations as their RAC counterparts. Thus, the primary location for troubleshooting log files is $GRID_HOME/log/hostname.

Troubleshooting

Troubleshooting is an art every database administrator should master. When working with Grid Infrastructure, administrators typically find themselves trying to troubleshoot startup and initialization issues.

Oracle provides a useful tool to collect troubleshooting information, called diagcollection.sh. It is located on all cluster nodes in $GRID_HOME/bin/diagcollection.sh. When called, this tool creates zip files that you can analyze yourself or attach to a service request.

Since Oracle 10.2, a central location is available for almost all log files related to the operation of Clusterware/Grid Infrastructure. This location is $GRID_HOME/log/hostname. The alerthostname.log file is usually the first port of call in case of problems. SCAN listeners are one exception to this rule; their log files are stored in the ADR. The ADR_BASE for the SCAN listeners is either $GRID_HOME/log/ or $ORACLE_BASE/diag/.

The following sections provide more detail about ways to troubleshoot problems.

Resolving Startup Issues

Startup issues in RAC are usually caused by a change in the hardware setup and configuration, and they can sometimes be difficult to diagnose. For example, changes to routers and firewalls on the network layer can have a negative impact on the health of the cluster. Operating system upgrades should also not be underestimated. A simple upgrade of the running kernel can have devastating effects for a system using ASMLib if the system administrator forgets to install the matching kernel modules.

In our experience, the most common startup problems in Clusterware prior to version 11.2 concerned permissions on the raw (block) devices for the OCR and voting disk. Oracle 11.2 addresses this problem by storing OCR and voting disks in ASM. Should the cluster stack not start after a reboot or after stopping the cluster stack on a node, then there are a few locations to look for errors.

Oracle's cluster verification tool recommends the following settings for the OCR when block devices are used:

  • Owner: the account under which Clusterware/Grid Infrastructure was installed

  • Group: oinstall

  • Permissions: 0640

Similarly, the following settings are expected for the voting files:

  • Owner: the account under which Clusterware/Grid Infrastructure was installed

  • Group: oinstall

  • Permissions: 0640

How you should set the permissions and user/group attributes depends on how you defined the devices in the first place. That said, the most commonly used mechanisms to do this are udev rules and the /etc/rc.local script.
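For illustration, a udev rule along the following lines could pin ownership and mode across reboots. This is a sketch only: the device names sdb1 and sdc1, the rule file name, and the owner account are placeholders that must match your own storage layout and installation owner:

```
# Hypothetical /etc/udev/rules.d/99-oracle.rules fragment; the device
# names sdb1/sdc1 and the owner account are examples only.
KERNEL=="sdb1", OWNER="oracle", GROUP="oinstall", MODE="0640"
KERNEL=="sdc1", OWNER="oracle", GROUP="oinstall", MODE="0640"
```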

Clusters configured to use GNS will not start completely until communication with the DHCP and DNS servers is established.

A good first indication of the status of the cluster node is the output from the crsctl check crs command. Typically, you define your troubleshooting strategy based on its output. The following output shows what the command should return for a healthy cluster node:

[oracle@london1 ˜]$ $GRID_HOME/crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

Several possible problems can be reported at this stage, including the following:

  • CRS-4639: Could not contact Oracle High Availability Services

  • CRS-4535: Cannot communicate with Cluster Ready Services

  • CRS-4530: Communications failure contacting Cluster Synchronization Services daemon

  • CRS-4534: Cannot communicate with Event Manager

Systematic troubleshooting of Grid Infrastructure follows a bottom-up approach because the resources have clearly defined dependencies (please refer to the "Startup Sequence" section for more information about starting Grid Infrastructure in general). The generic troubleshooting document Oracle provides to resolve such issues is My Oracle Support note 1050908.1, "How to Troubleshoot Grid Infrastructure Startup Issues"; the following sections are loosely modeled on that support note.

Failing to Start OHAS

The first daemon to start in a Grid Infrastructure environment is OHAS. This process relies on the init process to invoke /etc/init.d/init.ohasd, which starts /etc/rc.d/init.d/ohasd, which in turn executes $GRID_HOME/ohasd.bin. Without a properly working ohasd.bin process, none of the other stack components will start. The entry in /etc/inittab defines that /etc/init.d/init.ohasd is started at runlevels 3 and 5. Runlevel 3 in Linux usually brings the system up in networked, multi-user mode; however, it doesn't start X11. Runlevel 5 is normally used for the same purpose, but it also starts the graphical user interface. If the system is at a runlevel other than 3 or 5, then ohasd.bin cannot be started, and you need to use a call to init to change the runlevel to either 3 or 5. You can check /var/log/messages for output from the scripts under /etc/rc.d/init.d/; ohasd.bin logs information into the default log file destination at $GRID_HOME/log/hostname in the ohasd/ohasd.log subdirectory.

The administrator has the option to disable the start of the High Availability Services stack by calling crsctl disable crs. This call updates a flag in /etc/oracle/scls_scr/hostname/root/ohasdstr. The file contains only one word, either enable or disable, and no carriage return. If set to disable, then /etc/rc.d/init.d/ohasd will not proceed with the startup. Call crsctl start crs to start the cluster stack manually in that case.
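The flag file check described above can be sketched in a few lines of shell. A stand-in file in /tmp is used here instead of the real /etc/oracle/scls_scr/hostname/root/ohasdstr path, so the sketch can be run anywhere:

```shell
#!/bin/bash
# Sketch of inspecting the enable/disable flag described above. A stand-in
# file in /tmp replaces the real
# /etc/oracle/scls_scr/<hostname>/root/ohasdstr.
flagfile=/tmp/ohasdstr
printf 'enable' > "$flagfile"          # a single word, no carriage return

if [ "$(cat "$flagfile")" = "enable" ]; then
  echo "High Availability Services autostart: enabled"
else
  echo "High Availability Services autostart: disabled"
fi
```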

Many Grid Infrastructure background processes rely on sockets created in /var/tmp/.oracle. You can check which socket is used by a process by listing the contents of the /proc/pid/fd directory, where pid is the process id of the program you are looking at. In some cases, permissions on the sockets can become garbled; in our experience, moving the .oracle directory to a safe location and rebooting solved the cluster communication problems.
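The /proc inspection mentioned above can be done like this; we inspect the current shell here, whereas on a cluster node you would substitute the pid of ohasd.bin (or another daemon) and look for entries pointing into /var/tmp/.oracle:

```shell
#!/bin/bash
# Sketch: list the file descriptors held by a process via /proc. The
# current shell ($$) is inspected here; substitute the pid of a
# Clusterware daemon on a real node.
pid=$$
ls -l "/proc/$pid/fd" | awk 'NR > 1 { print $NF }'
echo "inspected pid $pid"
```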

Another reason ohasd.bin might fail to start is that the file system for $GRID_HOME is corrupt or not mounted. As noted earlier, ohasd.bin lives in $GRID_HOME/bin; if $GRID_HOME isn't mounted, then the daemon cannot be started.

We introduced the OLR as an essential file for starting Grid Infrastructure. If the OLR has become corrupt or is otherwise not accessible, then ohasd.bin cannot start. Successful initialization of the OLR is recorded in the ohasd.log, as in the following example (the timestamps have been removed for the sake of clarity):

[ default][3046704848] OHASD Daemon Starting. Command string :reboot
[ default][3046704848] Initializing OLR
[  OCRRAW][3046704848]proprioo: for disk 0
 (/u01/app/crs/cdata/london1.olr),
 id match (1), total id sets, (1) need recover (0), my votes (0),
 total votes (0), commit_lsn (15), lsn (15)
[  OCRRAW][3046704848]proprioo: my id set: (2018565920, 1028247821, 0, 0, 0)
[  OCRRAW][3046704848]proprioo: 1st set: (2018565920, 1028247821, 0, 0, 0)
[  OCRRAW][3046704848]proprioo: 2nd set: (0, 0, 0, 0, 0)
[  CRSOCR][3046704848] OCR context init CACHE Level: 0xaa4cfe0
[ default][3046704848] OHASD running as the Privileged user

Interestingly, the errors pertaining to the local registry have the same numbers as those for the OCR; however, they are prefixed by PROCL. The L can easily be missed, so check carefully! If the OLR cannot be read, then you will see the error messages immediately under the Initializing OLR line. There are two possible causes: the OLR is missing, or the OLR is corrupt. The first case is much easier to diagnose because OHAS will not start at all:

[root@london1 ˜]# crsctl check crs
CRS-4639: Could not contact Oracle High Availability Services

In the preceding example, ohasd.log will contain an error message similar to this one:

[ default][1381425744] OHASD Daemon Starting. Command string :restart
[ default][1381425744] Initializing OLR
[  OCROSD][1381425744]utopen:6m':failed in stat OCR file/disk
 /u01/app/crs/cdata/london1.olr,
 errno=2, os err string=No such file or directory
[  OCROSD][1381425744]utopen:7:failed to open any OCR file/disk, errno=2,
 os err string=No such file or directory
[  OCRRAW][1381425744]proprinit: Could not open raw device
[  OCRAPI][1381425744]a_init:16!: Backend init unsuccessful : [26]
[  CRSOCR][1381425744] OCR context init failure.  Error: PROCL-26: Error
 while accessing the physical storage Operating System
 error [No such file or directory] [2]
[ default][1381425744] OLR initalization failured, rc=26
[ default][1381425744]Created alert : (:OHAS00106:) :  Failed to initialize
 Oracle Local Registry
[ default][1381425744][PANIC] OHASD exiting; Could not init OLR

In this case, you should restore the OLR, which you will learn how to do in the "Maintaining Voting Disk and OCR/OLR" section.

If the OLR is corrupted, then you will see slightly different errors. OHAS tries to read the OLR; it succeeds for some keys but fails for others. Long hex dumps will appear in the ohasd.log, indicating a problem. In this case, you should run ocrcheck -local, which can help you determine the root cause. The following output has been taken from a system where the OLR was corrupt:

[root@london1 ohasd]# ocrcheck -local
Status of Oracle Local Registry is as follows :
    Version                  :          3
    Total space (kbytes)     :     262120
    Used space (kbytes)      :       2232
    Available space (kbytes) :     259888
    ID                       : 1022831156
    Device/File Name         : /u01/app/crs/cdata/london1.olr
                               Device/File integrity check failed

   Local registry integrity check failed

    Logical corruption check bypassed

If the utility confirms that the OLR is corrupted, then you have no option but to restore it. Again, please refer to the "Maintaining Voting Disk and OCR/OLR" section for more information on how to do this.

Failing to Start Agents Created by OHAS

With ohasd.bin confirmed to be started and alive, you can proceed with checking the agents spawned by ohasd.bin, namely CSSDAGENT, CSSDMONITOR, ORAAGENT, and ORAAGENT_ROOT. We, the authors, have not encountered any problems with these agents yet; however, the generic piece of advice is to check file system permissions and the agent log files for clues. Do not confuse these agents with the ones created at a later stage by CRS. The log files are in $GRID_HOME/log/hostname/agent/ohasd/agentname. Note that the Oracle documentation and My Oracle Support do not take into account the fact that Grid Infrastructure can be installed with an account other than the oracle account. On systems where the grid user owns the software stack, you will find the following agent log directories in $GRID_HOME/log/hostname/agent/ohasd:

  • oraagent_grid/

  • oracssdagent_root/

  • oracssdmonitor_root/

  • orarootagent_root/
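When hunting for clues across these directories, a short loop can surface the most recent error lines from each agent log. This is a rough heuristic of our own (the `scan_agent_logs` name and the grep pattern are assumptions), based on the 11.2 log layout described above:

```shell
#!/bin/sh
# Sketch: print the last few error/failure lines from each OHAS agent log
# under $GRID_HOME/log/<hostname>/agent/ohasd/.
scan_agent_logs() {
  logroot="$1"   # e.g. $GRID_HOME/log/$(hostname)/agent/ohasd
  for log in "$logroot"/*/*.log; do
    [ -f "$log" ] || continue
    echo "== $log =="
    grep -i 'error\|fail' "$log" | tail -3
  done
}

# Example: scan_agent_logs "$GRID_HOME/log/$(hostname)/agent/ohasd"
```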

Failing to Start the Cluster Synchronization Services Daemon

The ocssd.bin process is spawned by the cssdagent process. You can check the agent's log file for any potential issues in $GRID_HOME/log/hostname/agent/ohasd/oracssdagent_root. Once the initialization of the ocssd.bin process has started, you need to check its own log file, located in $GRID_HOME/log/hostname/cssd/ocssd.log. The log file is very verbose, so you should try to find the last occurrence of the following line after the initialization code:

[    CSSD][3412950784]clssscmain: Starting CSS daemon, version 11.2.xxx, in
(clustered) mode with uniqueness value xxx

Also note that ocssd.bin will be restarted as many times as defined by the restart attempts in its resource profile. Successful initialization of the CSS daemon depends on the GPnP profile, the discovery and accessibility of the voting disks, and a functional network.
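Because the daemon may have been restarted several times, finding the most recent startup attempt by eye is tedious. A minimal sketch (the `last_css_start` helper is our own name; the log path is the 11.2 default):

```shell
#!/bin/sh
# Sketch: print the line number and text of the most recent
# "Starting CSS daemon" message in an ocssd.log file.
last_css_start() {
  grep -n 'clssscmain: Starting CSS daemon' "$1" | tail -1
}

# Example: last_css_start "$GRID_HOME/log/$(hostname)/cssd/ocssd.log"
```

Everything after the reported line number belongs to the latest startup attempt.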

The GPnP profile is queried using interprocess communication (IPC), as shown in this excerpt of the ocssd.log file (line breaks have been introduced for readability):

2010-06-25 09:43:19.843: [    GPnP][3941465856]clsgpnpm_exchange:
 [at clsgpnpm.c:1175] Calling "ipc://GPNPD_london1", try 4 of 500...
2010-06-25 09:43:19.853: [    GPnP][3941465856]clsgpnp_profileVerifyForCall:
 [at clsgpnp.c:1867] Result: (87) CLSGPNP_SIG_VALPEER. Profile verified.
 prf=0x113478c0
2010-06-25 09:43:19.853: [    GPnP][3941465856]clsgpnp_profileGetSequenceRef:
 [at clsgpnp.c:841] Result: (0) CLSGPNP_OK. seq of p=0x113478c0 is '7'=7
2010-06-25 09:43:19.853: [    GPnP][3941465856]clsgpnp_profileCallUrlInt:
 [at clsgpnp.c:2186] Result: (0) CLSGPNP_OK. Successful get-profile CALL to
 remote "ipc://GPNPD_london1" disco ""
2010-06-25 09:43:19.853: [    GPnP][3941465856]clsgpnp_getProfileEx:
 [at clsgpnp.c:540] Result: (0) CLSGPNP_OK. got profile 0x113478c0

This time it took four attempts to get the profile when CSSD started. In this case, the listening endpoint was probably not in place. If you get error messages indicating the call to clsgpnp_getProfile failed, you should check whether the GPnP daemon is up and running. You might see something similar to the following error message:

2010-06-25 10:25:17.057: [ GPnP][7256921234]clsgpnp_getProfileEx:
 [at clsgpnp.c:546] Result:
 (13) CLSGPNP_NO_DAEMON. Can't get GPnP service profile from local GPnP daemon
2010-06-25 10:25:17.057: [ default][7256921234]Cannot get GPnP profile. Error
 CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
2010-06-25 10:25:17.057: [ CSSD][7256921234]clsgpnp_getProfile failed, rc(13)

CSSD will abort if ocssd.bin cannot find a voting disk, if a disk is inaccessible because of incorrect file permissions, or for a handful of other reasons. If the voting disks are in ASM, then you will see the following messages in the log file, indicating successful detection of the disks:

2010-06-23 16:47:49.651: [    CLSF][1158797632]Opened hdl:0x32d5770 for
  dev:ORCL:OCRVOTE1:
2010-06-23 16:47:49.661: [    CSSD][1158797632]clssnmvDiskVerify: Successful discovery
for disk ORCL:OCRVOTE1, UID 84b990fb-73234f2b-bf82b7ae-c4d2e0a2,
Pending CIN 0:1276871678:0, Committed CIN 0:1276871678:0
2010-06-23 16:47:49.661: [    CLSF][1158797632]Closing handle:0x32d5770
2010-06-23 16:47:49.661: [   SKGFD][1158797632]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x3280cc0
for disk :ORCL:OCRVOTE1:
2010-06-23 16:47:49.661: [    CSSD][1158797632]clssnmvDiskVerify: discovered a
potential voting file

If errors occur, then you will probably see the following string in the ocssd.log file:

2010-06-23 16:47:49.703: [    CSSD][1158797632]clssnmvDiskVerify: Successful discovery
of 0 disks

You will also see that the GPnP profile's discovery string has been used to scan the available locations for voting files. If all possible options for the voting disk location have been exhausted and no disks were found, then CSSD will stop. If the voting disks are outside ASM, check that their file system permissions are correct; doing so resolves the problem in the majority of cases.
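A crude accessibility test is to read the first block of each candidate disk as the Grid software owner; if the read fails, discovery will, too. This is a generic sketch of our own (the `check_readable` helper and the ASMLib path in the example are assumptions):

```shell
#!/bin/sh
# Sketch: verify that each candidate voting disk is readable by the
# current user by reading its first 4 KB block.
check_readable() {
  for disk in "$@"; do
    if dd if="$disk" of=/dev/null bs=4096 count=1 2>/dev/null; then
      echo "readable: $disk"
    else
      echo "NOT readable: $disk"
    fi
  done
}

# Example (ASMLib layout): check_readable /dev/oracleasm/disks/*
```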

The ocssd.bin daemon requires gipcd to be up and running for intercluster communication. The gipcd process in turn requires a working network. If that network is not started, then the gipcd process will bail out at this point. The failure message will look something like this: clssscmain: failed to open gipc endp. The cluvfy command-line utility can be used to verify whether the network is working.

Failing to Start the Cluster Ready Services Daemon

The node cannot join the cluster until the CRS daemon has successfully started. Therefore, CRSD is a milestone in the startup process. As noted earlier in this chapter's "Initiating the Startup Sequence" section, CRSD will create a new set of ORAAGENT and ORAROOTAGENT processes to create the virtual network resources, as well as to start the database and its associated services. The main log file for CRSD is found in the $GRID_HOME/log/hostname/crsd directory.

The successful start of CRSD depends on the discovery of the OCR and a fully functional CSS daemon. It also requires access to the GPnP profile (as discussed in the preceding section).

As with CSSD, a functional network and GIPC daemon are required for crsd.bin to work properly. If ocssd.bin has not been started, then the log file will show this, as in the following example:

2010-04-30 10:34:01.183: [ CRSMAIN][3257705040] Checking the OCR device
2010-04-30 10:34:01.183: [ CRSMAIN][3257705040] Connecting to the CSS Daemon
2010-04-30 10:34:01.184: [ CSSCLNT][1111550272]clssnsquerymode: not connected to CSSD
2010-04-30 10:34:01.204: [ CSSCLNT][3257705040]clssscConnect: gipc request failed with 29         (0x16)
2010-04-30 10:34:01.204: [ CSSCLNT][3257705040]clsssInitNative: connect failed, rc 29
2010-04-30 10:34:01.204: [  CRSRTI][3257705040] CSS is not ready. Received status 3 from CSS. Waiting for good status ..

Problems related to file-system permissions should not happen with the OCR stored inside ASM; the user simply has no influence on these. If the OCR is stored outside ASM, then the permissions should be 0640, with the file owned by root and the group set to oinstall. Whenever something goes wrong with the OCR, the system will return the PROC-26 error, which indicates an unsuccessful backend initialization. The error messages reported here are similar to the OLR error messages discussed previously in the "Failing to Start OHAS" section; however, this time the L (as in PROCL) is missing. If the OCR is lost completely, then you will see the following error messages in the crsd.log file:

2010-06-30 22:13:40.251: [  CRSOCR][968725072] OCR context init failure.  Error:
  PROC-26: Error while accessing the physical storage ASM error
[SLOS: cat=8, o
pn=kgfoOpenFile01, dep=15056, loc=kgfokge
ORA-17503: ksfdopn:DGOpenFile05 Failed to open file +OCRVOTE.255.4294967295
ORA-17503: ksfdopn:2 Failed to open file +OCRVOTE.255.4294967295
ORA-15001: diskgrou
] [8]
2010-06-30 22:13:40.251: [    CRSD][968725072][PANIC] CRSD exiting: Could not init
 OCR, code: 26
2010-06-30 22:13:40.251: [    CRSD][968725072] Done.

In the preceding example, the PROC-26 error is reported.
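For an OCR kept outside ASM, the documented ownership (root, group oinstall) and mode (0640) can be verified in one step. The following sketch uses GNU stat and a helper name of our own choosing:

```shell
#!/bin/sh
# Sketch: check that an OCR file stored outside ASM has the expected
# root:oinstall ownership and 0640 mode. Requires GNU coreutils stat.
check_ocr_perms() {
  f="$1"
  og=$(stat -c '%U:%G' "$f") || return 1
  mode=$(stat -c '%a' "$f")  || return 1
  if [ "$og" = "root:oinstall" ] && [ "$mode" = "640" ]; then
    echo "OK: $f"
  else
    echo "CHECK: $f is $og, mode $mode"
  fi
}

# Example: check_ocr_perms /u01/app/crs/cdata/ocr
```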

Failing to Start the GPnP Daemon

The Grid Plug and Play Daemon plays a central role in the startup process. If name resolution is not working, the daemon won't start. That poses severe problems, especially for systems using GNS, because CSSD and CRSD also will not start.

You can use the nslookup or host command line utilities to troubleshoot DNS setup. If you're using a dedicated name server separate from your corporate DNS server, you can issue the nslookup command as in the following example:

[oracle@london1 ˜]$ nslookup - dnsServerName

This command enables the interactive mode of the nslookup utility. You can then interrogate the name server given on the command line by issuing the host hostname command in interactive mode. You can type exit at any time to leave the utility. If name resolution fails for a given host, then check with your networking team for help in resolving the problem.
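For scripted checks, non-interactive resolution tests are often more convenient than the interactive mode. The sketch below uses getent, which consults the full resolver stack (including /etc/hosts) rather than DNS alone; the `resolves` helper name is our own:

```shell
#!/bin/sh
# Sketch: non-interactive check whether a name resolves on this host.
# getent consults nsswitch (files, DNS, ...), not just the DNS server.
resolves() {
  if getent hosts "$1" >/dev/null 2>&1; then
    echo "resolves: $1"
  else
    echo "does NOT resolve: $1"
  fi
}

# To query one specific DNS server directly, use nslookup instead
# (dnsServerName is a placeholder): nslookup london4 dnsServerName
```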

Agents spawned by CRSD

Agents started by CRSD include ORAAGENT and ORAROOTAGENT. Note that these agents are different from the ones we discussed earlier in the chapter under the "Failing to Start OHAS" section. If you used a different operating system account to install Grid Infrastructure, you may find two ORAAGENT processes: one created by the Grid Infrastructure owner, and the other owned by the owner of the RDBMS binaries.

Failure to start these agents is rare, and you should consult the log files if it happens. The log file locations are all relative to $GRID_HOME/log/hostname/:

  • ./agent/crsd/oraagent_grid/oraagent_grid.log

  • ./agent/crsd/oraagent_oracle/oraagent_oracle.log

  • ./agent/crsd/orarootagent_root/orarootagent_root.log

The authors have seen very few problems at this stage of the startup sequence, and most of those could be solved by checking the daemon log files. If that does not help, then checking the started resource's log file(s) usually reveals the source of the problem.

Note

Two components listed in the output of crsctl status resource are different from the ones mentioned in the preceding section. The ora.gsd resource is OFFLINE because it provides backward compatibility with Oracle 9i. We cannot imagine a case where 9i and 11.2 databases would coexist in the same cluster. The other resource that is always offline at the time of this writing is ora.oc4j. This resource is part of a new feature called Database Workload Management, and it has not yet been implemented.

Resolving Problems with Java Utilities

When troubleshooting problems, it might be necessary to enable additional output. Some command-line tools support tracing and debugging output. To enable additional output, set the SRVM_TRACE environment variable to true, as in the following example:

[oracle@london1 ˜]$ export SRVM_TRACE=true

To disable tracing, use the bash built-in unset: unset SRVM_TRACE. A search in the $GRID_HOME/bin directory shows that several important utilities support additional trace information, including the following:

  • cluvfy

  • netca

  • srvctl

  • srvconfig

Even if you cannot make sense of the trace output yourself, it is always helpful information that can be sent to Oracle Support to speed up problem resolution.
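Rather than exporting the variable for the whole session, you can also scope it to a single invocation; the variable does not persist in the parent shell afterwards. A small demonstration of the mechanism (the srvctl call in the comment is the intended real-world use):

```shell
#!/bin/sh
# Set SRVM_TRACE only for the one command that follows it; the child
# process sees the variable, but the current shell does not keep it.
SRVM_TRACE=true sh -c 'echo "SRVM_TRACE=$SRVM_TRACE"'

# In practice the command would be one of the Java-based utilities,
# for example: SRVM_TRACE=true srvctl config database
```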

Patching Grid Infrastructure

Applying one-off patches to Grid Infrastructure homes is a little different from patching an RDBMS home. The files in $GRID_HOME are protected with special permissions; if you try to apply a patch as the owner of the binaries, you will be surprised to find that opatch fails with error code 73. Therefore, you need to follow this sequence of steps to perform a rolling patch:

  1. Stop any resources started from the RDBMS home(s) on the local node to be patched. These resources might include the database and other services that can be stopped either by the Grid software owner or the RDBMS software owner in homogenous 11.2-only deployments. In heterogeneous deployments, the pre-11.2 database resources have to be stopped as the RDBMS owner.

  2. Unlock the Grid Infrastructure home as root on the local node.

  3. Apply the rolling patch as the Grid software owner. Do not roll the patch over to the other node yet! Sometimes, it might be necessary to specify the -local flag; however, that depends on the patch. The PSU 11.2.0.1.1 for Grid Infrastructure required the local patching mode.

  4. Lock the local Grid Infrastructure software home as root. This will bring the cluster stack back up.

  5. Restart the resources that were running from the local RDBMS home(s).

  6. Connect to the next node and repeat steps 1-5; i.e., stop the RDBMS resources and unlock the Grid Infrastructure home for each node.

  7. When Grid Infrastructure has been unlocked on all nodes—and not before then—answer Y to the question that asks whether the next node is ready for patching.

  8. Lock the Grid Infrastructure home and start all services of the patched node.

  9. Repeat until all nodes are done.
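The per-node sequence above can be condensed into a dry-run helper that only prints the commands in order, which is useful for change documentation. This is a sketch of our own: nothing is executed, the state-file path follows the example session in this section, and the paths passed in are placeholders:

```shell
#!/bin/sh
# Dry-run sketch: print the per-node command sequence for a rolling
# Grid Infrastructure patch. Nothing is executed.
print_patch_steps() {
  node="$1"; grid_home="$2"; oracle_home="$3"
  cat <<EOF
# on $node:
srvctl stop home -o $oracle_home -s /tmp/statusRDBMS -n $node
sudo $grid_home/crs/install/rootcrs.pl -unlock -crshome $grid_home
$grid_home/OPatch/opatch apply        # as the Grid software owner; add -local if the patch requires it
sudo $grid_home/crs/install/rootcrs.pl -patch
srvctl start home -o $oracle_home -s /tmp/statusRDBMS -n $node
EOF
}

# Example: print_patch_steps london1 /u01/app/crs "$ORACLE_HOME"
```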

The preceding sequence also applies to single-instance Oracle Restart deployments, although the commands you need to execute are slightly different, and you do not get a rolling patch, either. An example session demonstrates the application of the following patch: 8898852 "DATAFILE CORRUPTION WHEN FILE CREATED WITH COMPATIBLE.ASM LESS THAN 11 RESIZED."

The patch addresses a serious problem whereby a datafile created with a pre-11.2 ASM instance could become corrupted during a resize operation after its compatibility level was raised to 11.2. The example assumes that the patch has been extracted to /mnt/db11.2/patches, a central patch repository.

The readme document supplied with the patch recommends checking that the version of perl in $ORACLE_HOME/perl/bin is greater than 5.00503. Oracle 11.2 ships with perl 5.10.x, so this should not be a problem. Use this command to verify that your version of perl is recent enough:

[oracle@london1 ˜]$ $ORACLE_HOME/perl/bin/perl -v

On the authors' system, the preceding command returned the following:

This is perl, v5.10.0 built for x86_64-linux-thread-multi

Copyright 1987-2007, Larry Wall
...

It is always a good idea to check for inventory corruption before applying a patch. Oracle RAC maintains two inventories: a local one and a global one. Do not proceed with the application of the patch if your inventory is in any way corrupted! To check your inventory, use the lsinventory option with opatch and ensure that the last line of the command output reads "OPatch succeeded." Next, invoke opatch with the lsinventory -detail option against all installed Oracle homes. You can use the -oh argument to supply an Oracle home, as in the following example:

[˜]$ $ORACLE_HOME/OPatch/opatch lsinventory -detail -oh $ORACLE_HOME
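The "last line reads OPatch succeeded" check lends itself to automation when several homes must be inspected. A minimal sketch (the `opatch_ok` helper and its PASS/FAIL output are our own conventions):

```shell
#!/bin/sh
# Sketch: read opatch output on stdin and fail unless it ends with the
# documented success marker.
opatch_ok() {
  last=$(tail -1)
  if [ "$last" = "OPatch succeeded." ]; then
    echo PASS
  else
    echo "FAIL: $last"
    return 1
  fi
}

# Example: $ORACLE_HOME/OPatch/opatch lsinventory -detail | opatch_ok
```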

If all is well, proceed to patch the installation. In the first step, you record the status of the resources started from the RDBMS home(s) for later use. In this example, there is only one Oracle home; if there were more than one, then you'd need to invoke this command for all RDBMS homes:

[oracle@london1:]$ srvctl stop home -o $ORACLE_HOME -s /tmp/statusRDBMS -n london1

The state file in /tmp/statusRDBMS records all the resources executing out of $ORACLE_HOME. Later, the counterpart command, srvctl start home, uses the same file to restart those resources.

Next, you need to log in as root to prepare for stopping the local cluster stack. Export GRID_HOME to point to your Grid Infrastructure home, change the directory to $GRID_HOME/crs/install, and then invoke rootcrs.pl, as in the following example:

[root@london1 install]# ./rootcrs.pl -unlock -crshome $GRID_HOME

The output of the rootcrs.pl script is very similar to the output of the crsctl stop crs command; however, this script also changes permissions on the files in $GRID_HOME, so opatch can apply the patch. Consider the following output, which has been shortened for clarity:

2010-06-23 16:34:11: Parsing the host name
2010-06-23 16:34:11: Checking for super user privileges
2010-06-23 16:34:11: User has super user privileges
Using configuration parameter file: ./crsconfig_params
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on
'london1'
CRS-2673: Attempting to stop 'ora.crsd' on 'london1'
...
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'london1'
...
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'london1' has completed
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'london1' has
completed
CRS-4133: Oracle High Availability Services has been stopped.
Successfully unlock /u01/app/crs

Successfully unlocking $GRID_HOME marks the beginning of the next stage. Now switch to the Grid software owner and apply the patch:

[oracle@london1 8898852]$ /u01/app/crs/OPatch/opatch apply
Invoking OPatch 11.2.0.1.2
Oracle Interim Patch Installer version 11.2.0.1.2
Copyright (c) 2010, Oracle Corporation.  All rights reserved.
Oracle Home       : /u01/app/crs
Central Inventory : /u01/app/oraInventory
   from           : /etc/oraInst.loc
OPatch version    : 11.2.0.1.2
...
OPatch detected the node list and the local node from the inventory.  OPatch will patch the local system
then propagate the patch to the remote nodes.
This node is part of an Oracle Real Application Cluster.
Remote nodes: 'london2'
Local node: 'london1'
...
Please shutdown Oracle instances running out of this ORACLE_HOME on the local system.
(Oracle Home = '/u01/app/crs')
Is the node ready for patching [y/n]: y
...
The local system has been patched.  You can restart Oracle instances on it.
Patching in rolling mode.
The node 'london2' will be patched next.
Please shutdown Oracle instances running out of this ORACLE_HOME on 'london2'.
(Oracle Home = '/u01/app/crs')
Is the node ready for patching? [y|n]

Note

If you have not updated the base release's opatch to version 11.2, then you should do so before applying the first patch! Almost all patches released on My Oracle Support now require opatch 11.2.

At this point, it is critical that you stop and do not continue. Remember: only the local node has been prepared for patching; on the remote node, $GRID_HOME is still locked. Also, do not close the terminal session. To make this a rolling patch, you need to start the local resources before taking the remote node's resources down. Now log in as root again and lock $GRID_HOME by invoking rootcrs.pl with the -patch option:

[root@london1 ˜]# cd $GRID_HOME/crs/install/
[root@london1 install]# ./rootcrs.pl -patch
2010-06-23 16:47:09: Parsing the host name
2010-06-23 16:47:09: Checking for super user privileges
2010-06-23 16:47:09: User has super user privileges
Using configuration parameter file: ./crsconfig_params
CRS-4123: Oracle High Availability Services has been started.

Wait a short while, check the cluster state (crsctl check crs), and then proceed when the software stack is up again. At this point, you are ready to start the RDBMS resources. The previously created state file plays a major role in this. Now log in as the RDBMS software owner again and start the resources with the following snippet:

[oracle@london1 ˜]$ $ORACLE_HOME/bin/srvctl start home -o $ORACLE_HOME 
> -s /tmp/statusRDBMS -n london1

After all local resources are back up and running, you can shut down the resources on the remote node, london2. The sequence of commands to accomplish this is very similar to the sequence you used to patch the first node:

  1. As the RDBMS owner, stop all resources from all RDBMS homes in use. Do so using this command: srvctl stop home. Be sure to pass the correct node name!

  2. As root, execute rootcrs.pl -unlock -crshome $GRID_HOME.

  3. Now answer Y to the question in london1's terminal session that asks whether the remote node is ready for patching. Again, don't continue with the next node yet.

  4. After a successful patch application, lock $GRID_HOME again by invoking rootcrs.pl with the -patch option as root.

  5. As the RDBMS user, start all the resources recorded in the state file using srvctl start home. Again, please ensure that you pass the correct hostname to srvctl.

  6. Repeat steps 1-5 with any remaining nodes until all nodes are patched.

When all the nodes are patched, you should again verify the state of the inventory for the Grid software home by invoking $GRID_HOME/OPatch/opatch lsinventory -detail. In this example, the following result indicates a successful application of the one-off patch:

[oracle@london1 ˜]$ /u01/app/crs/OPatch/opatch lsinventory
Invoking OPatch 11.2.0.1.2
...
Oracle Home       : /u01/app/crs
...
--------------------------------------------------------------------------------
Installed Top-level Products (1):
Oracle Grid Infrastructure                                           11.2.0.1.0
There are 1 products installed in this Oracle Home.
Interim patches (1) :
Patch  8898852      : applied on Wed Jun 23 16:41:01 BST 2010
Unique Patch ID:  11998663
   Created on 3 Dec 2009, 01:58:47 hrs PST8PDT
   Bugs fixed:
     8898852
Rac system comprising of multiple nodes
  Local node = london1
  Remote node = london2
--------------------------------------------------------------------------------
OPatch succeeded.

Adding and Deleting Nodes

Adding and deleting nodes from a cluster is one of the features that make RAC stand out from its competition. Nodes can be added to and removed from the cluster while all resources on the initial set of nodes are still up and running.

Adding Nodes

This special type of node maintenance is always performed in two steps:

  1. The node is added to/removed from the Grid Infrastructure layer. All commands are to be executed as the Grid software owner (grid/oracle) or as root where specifically prompted.

  2. The instance is added or removed from the cluster database. These steps must be performed by the owner of the RDBMS binaries, which is usually oracle.

When adding nodes into the cluster, it is imperative to ensure that all cluster nodes are properly cabled, the operating system is installed and patched, and that all user IDs across the cluster have the same value. Additionally, you need to ensure user equivalence for the Grid Infrastructure software owner and the RDBMS software owner across all nodes in the cluster. If you are using ASMLib, you should ensure that the driver is loaded and that all disks are discovered on the new nodes.
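Uniform numeric IDs are easy to verify before proceeding. The following sketch checks a user's uid and gid on the local node against expected values; run it on every node, or wrap the call in ssh (the `check_ids` helper is our own):

```shell
#!/bin/sh
# Sketch: verify a user's numeric uid and gid match the values used on
# the existing cluster nodes. Run on each node, or via ssh.
check_ids() {
  user="$1"; want_uid="$2"; want_gid="$3"
  uid=$(id -u "$user") || return 1
  gid=$(id -g "$user") || return 1
  if [ "$uid" = "$want_uid" ] && [ "$gid" = "$want_gid" ]; then
    echo "OK: $user is $uid:$gid"
  else
    echo "MISMATCH: $user is $uid:$gid, expected $want_uid:$want_gid"
  fi
}

# Example: check_ids oracle 500 501
```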

Note that the steps are slightly different depending on whether you decided to use a GNS setup or a more conventional configuration. In any case, you should run the cluster verification tool to ensure the cluster is ready for the node addition or deletion.

Oracle has made cluster maintenance a high priority in Oracle 11g Release 2. GNS enables dynamic provisioning of IP addresses for use as virtual IP addresses and SCAN VIPs. GNS is attractive when corporate DNS cannot easily be changed, or when administrators wish to assume more responsibility for their software stack. Without GNS, node additions require changes to the corporate DNS servers, and such changes can sometimes be difficult because of separate management chains in the organization.

Checking the Prerequisites

Before proceeding with the next set of steps, please ensure that the new node(s) to be added meet the stated requirements by running the cluster verification utility from one of the existing nodes of the cluster. The following example assumes that the cluster currently consists of nodes london1, london2, and london3; and that you want to extend the cluster by another node, london4. The following example also assumes that the Grid Infrastructure software stack is owned by the oracle user:

[oracle@london1]$ $ORACLE_HOME/bin/cluvfy stage -post hwos -n london4 -verbose

Again, it is good practice to keep the output of this command with the installation or maintenance documentation. The preceding command checks whether the operating system requirements on the new node to be added (london4) are fulfilled. We recommend running cluvfy one more time. This time, you're using it to check whether the steps before the node addition have been completed:

[oracle@london1 ˜]$ cluvfy stage -pre nodeadd -n london4  -fixup -fixupdir /tmp

If any problems are reported on the london4 server, then you should run the generated fixup script as root. The exact path to the script will be printed by the cluvfy utility:

Fixup information has been generated for following node(s):
London4
Please run the following script on each node as "root" user to execute the fixups:
'/tmp/CVU_11.2.0.1.0_oracle/runfixup.sh'

Using the fixup scripts is a very convenient way to quickly set the kernel parameters required for Grid Infrastructure to work. When this task has completed successfully and the command output has been stored safely for later reference, you are ready to begin the node addition. The actual node addition is performed using the $GRID_HOME/oui/bin/addNode.sh script.

Executing the addNode.sh Script

In previous versions of Oracle, the addNode.sh launched a graphical user interface, using a slightly modified Oracle Universal Installer. The administrator used this tool to specify the new public, private, and virtual IP addresses for the node(s) to be added, and OUI would finish the rest with the familiar interface. After the OUI completed the remote operations, it prompted the administrator to run a number of scripts as root. Oracle 11.2 has changed how this procedure works. The new addNode.sh script is headless; that is, it doesn't use a graphical interface. Thus, it resembles the pre-11.2 addNode.sh script when invoked with the silent option. The options to be passed to the addNode.sh script depend on whether GNS is in use in the cluster.

The Oracle Grid Infrastructure documentation states that executing the following command on one of the existing nodes in $GRID_HOME/oui/bin was enough to add a node to a cluster with GNS enabled:

[oracle@london1 bin]$ ./addNode.sh -silent "CLUSTER_NEW_NODES={newNodePublicIP}"

In the field, this proved to be untrue for the base release, 11.2.0.1.0. Bug 8865943 was filed against the problem, and a fix is expected. The preceding script consistently returned the following error:

SEVERE:Number of new nodes being added are not equal to number of new virtual nodes.
Silent install cannot continue.

As a workaround to the aforementioned bug, the syntax you would use for a non-GNS setup works equally well for GNS-enabled systems:

[oracle@london1 bin] $ ./addNode.sh -silent "CLUSTER_NEW_NODES={london4}" 
> "CLUSTER_NEW_VIRTUAL_HOSTNAMES={london4-vip}"

The output from this command will look something like the following:

Starting Oracle Universal Installer...

Checking swap space: must be greater than 500 MB.   Actual 3699 MB    Passed
Oracle Universal Installer, Version 11.2.0.1.0 Production
Copyright (C) 1999, 2009, Oracle. All rights reserved.
Performing tests to see whether nodes london2,london3,london4 are available
............................................................... 100% Done.
...
-----------------------------------------------------------------------------
Cluster Node Addition Summary
Global Settings
   Source: /u01/app/crs
   New Nodes
Space Requirements
   New Nodes
      london4
         /: Required 3.28GB : Available 48.00GB
Installed Products
   Product Names
      Oracle Grid Infrastructure 11.2.0.1.0
      Sun JDK 1.5.0.17.0
      ...
      Oracle Database 11g 11.2.0.1.0
-----------------------------------------------------------------------------
Instantiating scripts for add node (Friday, November 6, 2009 9:14:02 PM GMT)
.                                                                 1% Done.
Instantiation of add node scripts complete
Copying to remote nodes (Friday, November 6, 2009 9:14:08 PM GMT)
......................................................................
...............                                 96% Done.
Home copied to new nodes
Saving inventory on nodes (Friday, November 6, 2009 9:34:52 PM GMT)
.                                                               100% Done.
Save inventory complete
WARNING:A new inventory has been created on one or more nodes in this session.
However, it has not yet been registered as the central inventory of this
system. To register the new inventory please run the script at
'/u01/app/oraInventory/orainstRoot.sh' with root privileges on nodes 'london4'.
If you do not register the inventory, you may not be able to update or patch
the products you installed.
The following configuration scripts need to be executed as the "root" user
in each cluster node.

/u01/app/oraInventory/orainstRoot.sh #On nodes london4
/u01/app/crs/root.sh #On nodes london4
To execute the configuration scripts:
    1. Open a terminal window
    2. Log in as "root"
    3. Run the scripts in each cluster node

The Cluster Node Addition of /u01/app/crs was successful.
Please check '/tmp/silentInstall.log' for more details.

Finishing the Node Addition

The output of the addNode.sh script indicates that two more scripts need to be run: orainstRoot.sh and then root.sh. The execution of the orainstRoot.sh script is not very remarkable because it only changes the permissions on the Oracle Inventory. The root.sh script performs the actual node addition, and its output is similar to the root.sh script you ran during the Clusterware installation.

Executing the aforementioned scripts on the new london4 node will show output like the following:

[root@london4 ˜]# /u01/app/oraInventory/orainstRoot.sh
Creating the Oracle inventory pointer file (/etc/oraInst.loc)
Changing permissions of /u01/app/oraInventory.
Adding read,write permissions for group.
Removing read,write,execute permissions for world.
Changing groupname of /u01/app/oraInventory to oinstall.
The execution of the script is complete.

The real action takes place in the root.sh script, which must also be executed on the new node:

[root@london4 ˜]# /u01/app/crs/root.sh
Running Oracle 11g root.sh script...
The following environment variables are set as:
    ORACLE_OWNER= oracle
    ORACLE_HOME=  /u01/app/crs
Enter the full pathname of the local bin directory: [/usr/local/bin]:
The file "dbhome" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]:
The file "oraenv" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]:
The file "coraenv" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]:
Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root.sh script.
Now product-specific root actions will be performed.
2009-11-06 21:41:31: Parsing the host name
2009-11-06 21:41:31: Checking for super user privileges
2009-11-06 21:41:31: User has super user privileges
Using configuration parameter file:
/u01/app/crs/crs/install/crsconfig_params
Creating trace directory
LOCAL ADD MODE
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Adding daemon to inittab
CRS-4123: Oracle High Availability Services has been started.
ohasd is starting
CRS-4402: The CSS daemon was started in exclusive mode but found an
active CSS daemon on node london1, number 1, and is terminating
An active cluster was found during exclusive startup, restarting to
join the cluster
CRS-2672: Attempting to start 'ora.mdnsd' on 'london4'
CRS-2676: Start of 'ora.mdnsd' on 'london4' succeeded
[some output has been removed for clarity]
CRS-2672: Attempting to start 'ora.asm' on 'london4'
CRS-2676: Start of 'ora.asm' on 'london4' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'london4'
CRS-2676: Start of 'ora.crsd' on 'london4' succeeded
CRS-2672: Attempting to start 'ora.evmd' on 'london4'
CRS-2676: Start of 'ora.evmd' on 'london4' succeeded
clscfg: EXISTING configuration version 5 detected.
clscfg: version 5 is 11g Release 2.
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
london4     2009/11/06 21:46:12     /u01/app/crs/cdata/london4/backup_20091106_214612.olr
Preparing packages for installation...
cvuqdisk-1.0.7-1
Configure Oracle Grid Infrastructure for a Cluster ... succeeded
Updating inventory properties for clusterware
Starting Oracle Universal Installer...
Checking swap space: must be greater than 500 MB.   Actual 4095 MB    Passed
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /u01/app/oraInventory
'UpdateNodeList' was successful.

The important information here is recorded in the detection of an existing configuration, the addition of the necessary keys in the OCR, the creation of the ASM instance, and finally, the confirmation that the configuration of Grid Infrastructure for a cluster succeeded. The main part of the output is similar to the output of the crsctl start cluster command. The root.sh script automatically creates a backup of the OLR, but not of the OCR.

You can find more troubleshooting information in the $GRID_HOME/cfgtoollogs/crsconfig directory. With Grid Infrastructure, you no longer need to execute root.sh with the -x switch to enable debugging output; the log files in this directory are much more useful than anything that was available before.

The Oracle Universal Installer is invoked once more at the end of the root.sh script to update the central inventory. The installer adds the new node to the list of nodes in the inventory.xml file, which is shown in bold in the following example:

<HOME NAME="Ora11g_gridinfrahome1" LOC="/u01/app/crs"
 TYPE="O" IDX="1" CRS="true">
   <NODE_LIST>
      <NODE NAME="london1"/>
      <NODE NAME="london2"/>
      <NODE NAME="london3"/>
      <NODE NAME="london4"/>
   </NODE_LIST>
</HOME>

After the root.sh script finishes executing, /etc/oratab is updated with the necessary information for the new ASM instance. It is good practice to run a final cluvfy check from one of the original nodes, with the following arguments:

[oracle@london1 bin]$ ./cluvfy stage -post nodeadd -n london4

And with that, you've successfully completed adding a node to your Grid Infrastructure.
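In addition to cluvfy, a quick sanity check is to confirm the new node's cluster membership and the health of its Clusterware stack. A short sketch, run from $GRID_HOME/bin on any node (node names per this chapter's example):

```shell
# List all cluster member nodes together with their node numbers;
# the new node london4 should appear in the output
olsnodes -n

# Check the state of the Clusterware stack on every node in the cluster
crsctl check cluster -all
```
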

Adding the RDBMS Software

Next, you need to add the RDBMS home. This process is similar to adding a node to the cluster layer because it also uses the headless addNode.sh script, this time from $ORACLE_HOME/oui/bin. Continuing the previous example, execute addNode.sh from one of the existing nodes, as shown in the following snippet:

[oracle@london1 bin]$ cd $ORACLE_HOME/oui/bin
[oracle@london1 bin]$ ./addNode.sh -silent "CLUSTER_NEW_NODES={london4}"

The output from this command will look something like the following:

Starting Oracle Universal Installer...
Checking swap space: must be greater than 500 MB.   Actual 3622 MB    Passed
Oracle Universal Installer, Version 11.2.0.1.0 Production
Copyright (C) 1999, 2009, Oracle. All rights reserved.
Performing tests to see whether nodes london2,london3,london4 are available
............................................................... 100% Done.
...
-----------------------------------------------------------------------------
Cluster Node Addition Summary
Global Settings
   Source: /u01/app/oracle/product/11.2.0/dbhome_1
   New Nodes
Space Requirements
   New Nodes
      london4
         /: Required 3.62GB : Available 44.95GB
Installed Products
   Product Names
      Oracle Database 11g 11.2.0.1.0
      Sun JDK 1.5.0.17.0
      Enterprise Manager Common Core Files 10.2.0.4.2
[some output removed]
      Enterprise Edition Options 11.2.0.1.0
-----------------------------------------------------------------------------
Instantiating scripts for add node (Friday, November 6, 2009 9:54:36 PM GMT)
.                                                                 1% Done.
Instantiation of add node scripts complete
Copying to remote nodes (Friday, November 6, 2009 9:54:46 PM GMT)
..                             96% Done.
Home copied to new nodes
Saving inventory on nodes (Friday, November 6, 2009 10:28:22 PM GMT)
Save inventory complete
WARNING:
The following configuration scripts need to be executed as the "root" user in each cluster node.
/u01/app/oracle/product/11.2.0/dbhome_1/root.sh #On nodes london4
To execute the configuration scripts:
    1. Open a terminal window
    2. Log in as "root"
    3. Run the scripts in each cluster node
The Cluster Node Addition of /u01/app/oracle/product/11.2.0/dbhome_1
was successful.
Please check '/tmp/silentInstall.log' for more details.

You should now execute the remaining root.sh script on the new node, london4. The output of that script does not differ from single-instance Oracle deployments, so it is not listed here.

With the RDBMS home successfully extended to the new cluster nodes, you can either manually add another database instance to the node or use the graphical dbca utility for this.
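If you take the manual route, srvctl registers the new instance with Clusterware. A minimal sketch, assuming a database named PROD whose new instance PROD4 should run on london4 (the database and instance names are illustrative, and the instance's redo thread and undo tablespace must already have been created):

```shell
# Register the new instance for database PROD on node london4 with the OCR
srvctl add instance -d PROD -i PROD4 -n london4

# Start the new instance and confirm the database's instance placement
srvctl start instance -d PROD -i PROD4
srvctl status database -d PROD
```
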

Deleting Nodes

Node deletion is an operation that few sites undertake lightly. In our experience, nodes are usually deleted from a cluster after catastrophic failures of multiple hardware components. We have also seen unrecoverable driver updates on non-Unix platforms force the removal of a node from the cluster. Removed nodes are most often re-imaged by the system administrators and then added back into the cluster. The steps that follow show how to remove software from the RDBMS and Grid homes; however, in many cases, these tasks cannot be performed because the node is no longer usable. From a DBA's point of view, the important outcome is a clean OCR that no longer references the deleted node, and this can be achieved even when the node to be deleted is not online.

Before a node can be removed from the cluster, you need to ensure that no database instance or other custom resource type uses that node. Also, you need to ensure that you have a backup of the OCR before continuing. Create one, if necessary, using this command:

ocrconfig -manualbackup
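You can verify that the backup was taken by listing the registered OCR backups. A short sketch, run as root:

```shell
# List all manual OCR backups on record; the backup just taken
# should appear at the top of the list
ocrconfig -showbackup manual
```
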

The dbca utility can help you remove database instances from a node, a task that is beyond the scope of this chapter. By the time dbca finishes, it has updated the OCR and removed the redo thread and undo tablespace of that instance. It has also updated the tnsnames.ora files, provided there were no errors during the operation. Once the database instances and custom resources are deconfigured, you can proceed to removing the node from the cluster.

The following subsections take you through a node deletion process. In this example, the london2 node will be removed from a cluster that initially consists of two nodes: london1 and london2. We assume that the node to be removed is still actively forming part of the cluster. The node deletion process is essentially the inverse of node addition:

  1. Remove the clustered RDBMS home.

  2. Remove the node from the cluster layer.

Removing the Clustered RDBMS Home

The upcoming sections walk you through how to remove a node. You begin by disabling and removing the targeted node's listener. First, review the listener configuration, as in this example:

[oracle@london1 ˜]$ srvctl config listener -a
Name: LISTENER
Network: 1, Owner: oracle
Home: <CRS home>
  /u01/app/crs/ on node(s) london2,london1
End points: TCP:1521

Now disable and stop the listener on london2 as the owner of the resource:

[oracle@london1 ˜]$ srvctl disable listener -l LISTENER -n london2
[oracle@london1 ˜]$ srvctl stop listener -l LISTENER -n london2

Next, you need to update the inventory on the node to be deleted. You can skip this step if the node no longer exists. The inventory.xml file shows the following content before the updateNodeList command is run (this example has been shortened to show the RDBMS home only):

[oracle@london2 ContentsXML]$ cat inventory.xml
<?xml version="1.0" standalone="yes" ?>
<!-- Copyright (c) 1999, 2009, Oracle. All rights reserved. -->
<!-- Do not modify the contents of this file by hand. -->
<INVENTORY>
<HOME NAME="OraDb11g_home1" LOC="/u01/app/oracle/product/11.2.0/dbhome_1"
TYPE="O" IDX="2">
   <NODE_LIST>
      <NODE NAME="london1"/>
      <NODE NAME="london2"/>
   </NODE_LIST>
</HOME>
</HOME_LIST>
</INVENTORY>

As part of the process for preparing a node for deletion, you then execute the updateNodeList command, as shown in the following example (note the -local flag):

[oracle@london2 bin]$ ./runInstaller -updateNodeList 
ORACLE_HOME=$ORACLE_HOME CLUSTER_NODES={london2} -local
Starting Oracle Universal Installer...

Checking swap space: must be greater than 500 MB.   Actual 1023 MB    Passed
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /u01/app/oraInventory
'UpdateNodeList' was successful.
[oracle@london2 bin]$

Running the preceding snippet changes the inventory so that it now reads as follows:

[oracle@london2 ˜]$ cat /u01/app/oraInventory/ContentsXML/inventory.xml
<?xml version="1.0" standalone="yes" ?>
<!-- Copyright (c) 1999, 2009, Oracle. All rights reserved. -->
<!-- Do not modify the contents of this file by hand. -->
<INVENTORY>
...
<HOME NAME="OraDb11g_home1" LOC="/u01/app/oracle/product/11.2.0/dbhome_1"
   TYPE="O" IDX="2">
   <NODE_LIST>
      <NODE NAME="london2"/>
   </NODE_LIST>
</HOME>
</HOME_LIST>
</INVENTORY>

At this point, london1 has been removed from the local inventory on london2. To physically remove the RDBMS home on the node to be deleted, you need to use the deinstall tool—OUI no longer has a deinstall option in version 11.2. You can either download the deinstall tool from Oracle's OTN website or find it in $ORACLE_HOME/deinstall. As the RDBMS software owner, execute the deinstall command with the -local switch:

[oracle@london2 deinstall]$ ./deinstall -local
Checking for required files and bootstrapping ...
Please wait ...
Location of logs /u01/app/oraInventory/logs/

############ ORACLE DEINSTALL & DECONFIG TOOL START ############
[...]
A log of this session will be written to:
'/u01/app/oraInventory/logs/deinstall_deconfig2010-07-02_10-40-54-PM.out'
Any error messages from this session will be written to:
'/u01/app/oraInventory/logs/deinstall_deconfig2010-07-02_10-40-54-PM.err'

############# ORACLE DEINSTALL & DECONFIG TOOL END #############

We, the authors, have found that the deinstall tool does not always reliably clean out the RDBMS home on the node to be deleted.

Note

If you installed the RDBMS binaries on a shared home, then you need to detach it using $ORACLE_HOME/oui/bin/runInstaller -detachHome ORACLE_HOME=$ORACLE_HOME.

A failure to deinstall the RDBMS binaries on the node to be deleted does not matter much. What is important is not to leave stale inventory entries on the remaining nodes. The following snippet updates the node list on the remaining nodes, passing a comma-separated list of the nodes that remain in the cluster to the CLUSTER_NODES parameter:

[oracle@london1 ˜]$ $ORACLE_HOME/oui/bin/runInstaller -updateNodeList 
> ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES={london1}"
Starting Oracle Universal Installer...

Checking swap space: must be greater than 500 MB.   Actual 1023 MB    Passed
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /u01/app/oraInventory
'UpdateNodeList' was successful.

In other words, you specify all nodes except the one(s) you are deleting in the CLUSTER_NODES parameter. This operation concludes the node removal from the RDBMS home.

Removing the Node from the Cluster

Next, you need to update the OCR and remove the node from Grid Infrastructure. We recommend making another manual OCR backup at this time! You need to connect as root to the node you want to delete and execute the rootcrs.pl script. This script is located in $GRID_HOME/crs/install and requires the -deconfig -force options, as the following output demonstrates:

The Oracle base for ORACLE_HOME=/u01/app/crs/ is /u01/app/oracle
[root@london2 ˜]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
[root@london2 ˜]# cd /u01/app/crs/crs/install/
[root@london2 install]# ./rootcrs.pl -deconfig -force
2010-07-03 10:16:08: Parsing the host name
2010-07-03 10:16:08: Checking for super user privileges
2010-07-03 10:16:08: User has super user privileges
Using configuration parameter file: ./crsconfig_params
VIP exists.:london1
VIP exists.: /london1-vip/172.17.1.201/255.255.255.0/eth0
VIP exists.:london2
VIP exists.: /london2-vip/172.17.1.202/255.255.255.0/eth0
GSD exists.
ONS daemon exists. Local port 6100, remote port 6200
eONS daemon exists. Multicast port 18904, multicast IP address 234.187.14.127,
listening port 2016
ACFS-9200: Supported
CRS-2673: Attempting to stop 'ora.registry.acfs' on 'london2'
CRS-2677: Stop of 'ora.registry.acfs' on 'london2' succeeded
CRS-2791: Starting shutdown of Oracle High Availability Services-managed
resources on 'london2'
CRS-2673: Attempting to stop 'ora.crsd' on 'london2'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources
on 'london2'
CRS-2673: Attempting to stop 'ora.OCRVOTE.dg' on 'london2'
CRS-2677: Stop of 'ora.OCRVOTE.dg' on 'london2' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'london2'
CRS-2677: Stop of 'ora.asm' on 'london2' succeeded
CRS-2792: Shutdown of Cluster Ready Services-managed resources
on 'london2' has completed
CRS-2677: Stop of 'ora.crsd' on 'london2' succeeded
[some output removed for clarity]
CRS-2677: Stop of 'ora.gipcd' on 'london2' succeeded
CRS-2677: Stop of 'ora.diskmon' on 'london2' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources
on 'london2' has completed
CRS-4133: Oracle High Availability Services has been stopped.
Successfully deconfigured Oracle clusterware stack on this node
[root@london2 install]#

At this point, the Grid Infrastructure stack has been deconfigured on the local node.

Warning

Be careful not to specify the -lastnode option unless you completely deconfigure all nodes of the cluster.

We have occasionally seen problems when removing the Clusterware home while custom resources (e.g., an active/passive Tomcat server) were still executing on the node to be deleted. Therefore, we recommend first removing the node in question from the custom resource profile.

Finishing the Node Removal

Now switch to one of the remaining cluster nodes. Next, connect as root and remove the node to be deleted from the configuration, as in the following example:

[root@london1 ˜]# crsctl delete node -n london2
CRS-4661: Node london2 successfully deleted.

As soon as you query the resource status, you will see that the node you deleted is no longer listed.

Next, you can delete the Grid software from the removed node. Back on that node, execute the following steps to update the inventory and remove the Grid home from the host. Begin by executing the following command as the owner of the Grid software stack:

[oracle@london2 ˜]$ /u01/app/crs/oui/bin/runInstaller 
> -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES=london2" CRS=TRUE 
> -local
Starting Oracle Universal Installer...

Checking swap space: must be greater than 500 MB.   Actual 1023 MB    Passed
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /u01/app/oraInventory
'UpdateNodeList' was successful.

You can now use the deinstall tool to remove the software from the node in exactly the same way that you removed the RDBMS binaries. The deinstall command-line utility is located in $ORACLE_HOME/deinstall/. The following command illustrates how to run this utility:

[oracle@london2 deinstall]$ ./deinstall -local

On any one of the remaining nodes, you need to update the node list as well. You do this by passing a comma-separated list of the nodes that still form the cluster (minus the one you deleted) to the CLUSTER_NODES parameter, as in the following example:

[oracle@london1 bin]$  ./runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES={london1,london3}" CRS=TRUE

Don't forget the CRS=TRUE parameter; this serves as an important indication of the Grid software home to the Oracle Universal Installer.
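Mirroring the post-addition check shown earlier, cluvfy also offers a post-deletion stage. A sketch, run from one of the surviving nodes (node names per this example):

```shell
# Confirm that london2 no longer appears in the cluster definition
olsnodes -s -t

# Run the post node-deletion verification against the removed node
cluvfy stage -post nodedel -n london2
```
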

Exploring More Advanced Topics

We have saved some of the more advanced topics for the rest of this chapter. These topics include tasks administrators don't usually carry out every day, such as changing the ports for the node and SCAN listeners. We also look at how to change the SCAN address after the installation has completed.

Grid Infrastructure will not run if there are problems with the OCR, OLR, or voting disks. We explained the importance of these files in the "Initiating the Startup Sequence" section earlier in this chapter. In this part of the chapter, we will explain how to recover from severe problems with the essential Clusterware files. Finally, we will cover how to move the OCR and voting disks into ASM.

Selecting non-Default Listener Ports

Some sites require using a different port than 1521 for the listener. When initially installing Grid Infrastructure, the administrator is not given the choice of which port she wants to use. This means that changing the listener port must happen after the installation has completed successfully.

The netca network configuration assistant provides the easiest way to change the listener port. Start netca as the owner of the Grid Infrastructure home. The following series of screen shots documents how to change the listener port from 1521 to 1526. Figure 8-3 shows the Welcome screen.

The Welcome screen

Figure 8.3. The Welcome screen

Next, select the Listener Configuration option and click Next. This brings up the options screen shown in Figure 8-4.

The Listener options

Figure 8.4. The Listener options

Now select Reconfigure and click Next to advance to the Listener Selection screen (see Figure 8-5).

The Listener Selection screen

Figure 8.5. The Listener Selection screen

Unlike previous versions, the current version of Oracle Grid Infrastructure no longer uses the LISTENER_hostname naming convention for node listeners. Instead, all listeners are simply called LISTENER. Select LISTENER and proceed; most systems probably do not have additional listeners in place at this stage.

The netca utility warns you that it will shut down the listener with name LISTENER (see Figure 8-6). This is a clusterwide shutdown, which means that no new connections can be established to the listener you are changing. Existing connections are not affected. Acknowledging the warning prompts netca to shut down the listener.

Shutdown Warning

Figure 8.6. Shutdown Warning

The Protocol Selection screen lets you select the protocol the listener should support (see Figure 8-7). For most deployments, this should be TCP/IP.

The Protocol Selection screen

Figure 8.7. The Protocol Selection screen

The Select Subnet dropdown box at the top of the screen is useful only if you have multiple public networks in use. We will discuss this option in the "Configuring the Network" section later in the chapter. For now, select the protocols you would like to use and click Next. You will land on the Port Number screen shown in Figure 8-8.

The Port Number screen

Figure 8.8. The Port Number screen

It is finally time to change the port from the default number 1521 to your preferred port number. The example in Figure 8-8 uses port 1526.

Click Next to complete the reconfiguration. The utility does not always reliably start the listener resource in the cluster, so you should ensure that the listener resource is up on all nodes before completing the work.

To verify that the listeners are really listening on the correct port, issue the srvctl command, as shown in the following example:

[oracle@london1 ˜]$ srvctl config listener
Name: LISTENER
Network: 1, Owner: oracle
Home: <CRS home>
End points: TCP:1526
[oracle@london1 ˜]$ srvctl status listener
Listener LISTENER is enabled
Listener LISTENER is running on node(s): london1,london2

The messages here indicate that the listener is enabled, and that it is running on two nodes. You can also see that they are listening on port 1526, which is the correct port for our example.
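You can also confirm the endpoint directly with lsnrctl. A quick sketch, run as the Grid Infrastructure owner on any node (port 1526 per this example):

```shell
# Show the node listener's status and filter for its endpoint definitions;
# the output should reference port 1526 rather than 1521
lsnrctl status LISTENER | grep -i "port"
```
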

Selecting a non-Default SCAN Listener Endpoint

Changing the SCAN listener endpoint is simpler than changing the database listener endpoints. The srvctl command-line utility offers the following option for modifying the SCAN listener endpoints:

[oracle@london1 log]$ srvctl modify scan_listener -h

Modifies the SCAN listeners so that the number of SCAN listeners is the same
as the number of SCAN VIPs or modifies the SCAN listener endpoints.

Usage: srvctl modify scan_listener

{-u|-p [TCP:]<port>[/IPC:<key>][/NMP:<pipe>][/TCPS:<sport>] [/SDP:<port>]}
    -u Update SCAN listeners to match the number of SCAN VIPs
    -p [TCP:]<port>[/IPC:<key>][/NMP:<pipe>][/TCPS:<sport>] [/SDP:<port>]
           SCAN Listener endpoints
    -h Print usage

You should record the current configuration in the change log for the cluster. It can be queried using the srvctl config scan_listener command, as shown in the following example:

[oracle@london1 log]$ srvctl config scan_listener
SCAN Listener LISTENER_SCAN1 exists. Port: TCP:1521
SCAN Listener LISTENER_SCAN2 exists. Port: TCP:1521
SCAN Listener LISTENER_SCAN3 exists. Port: TCP:1521

Now you need to stop the SCAN listener, change the configuration, and then start it up again:

[oracle@london1 log]$ srvctl stop scan_listener
[oracle@london1 log]$ srvctl modify scan_listener -p  1526
[oracle@london1 log]$ srvctl config scan_listener
SCAN Listener LISTENER_SCAN1 exists. Port: TCP:1526
SCAN Listener LISTENER_SCAN2 exists. Port: TCP:1526
SCAN Listener LISTENER_SCAN3 exists. Port: TCP:1526
[oracle@london1 log]$ srvctl start scan_listener
[oracle@london1 log]$ srvctl status scan_listener
SCAN Listener LISTENER_SCAN1 is enabled
SCAN listener LISTENER_SCAN1 is running on node london2
SCAN Listener LISTENER_SCAN2 is enabled
SCAN listener LISTENER_SCAN2 is running on node london1
SCAN Listener LISTENER_SCAN3 is enabled
SCAN listener LISTENER_SCAN3 is running on node london1

Changing the SCAN After Installation

The SCAN can take up to three IP addresses that have to resolve round-robin style in DNS. If the system has been set up with fewer than the maximum number of IP addresses, you can easily add more after the installation. First, change the DNS entry so it reflects all the IP addresses. Second, check the current IP address list, as in the following example:

oracle@london1:˜> host cluster1.example.com
cluster1.example.com has address 172.17.1.205
cluster1.example.com has address 172.17.1.206
cluster1.example.com has address 172.17.1.207

As the Grid software owner (oracle, in this example, but it could be any operating system account), you need to stop the SCAN and SCAN listener:

[oracle@london1 ˜]$ srvctl stop scan_listener
[oracle@london1 ˜]$ srvctl stop scan

Next, you should verify that both the SCAN and the SCAN listener have indeed stopped:

[oracle@london1 ˜]$ srvctl status scan_listener
SCAN Listener LISTENER_SCAN1 is enabled
SCAN listener LISTENER_SCAN1 is not running

[oracle@london1 ˜]$ srvctl status scan
SCAN VIP scan1 is enabled
SCAN VIP scan1 is not running

With the SCAN and SCAN listener down, modify the SCAN as root:

[root@london1 ˜]# srvctl modify scan -n cluster1.example.com
[root@london1 ˜]# srvctl config scan
SCAN name: cluster1, Network: 1/172.17.1.0/255.255.255.0/eth0
SCAN VIP name: scan1, IP: /cluster1.example.com/172.17.1.205
SCAN VIP name: scan2, IP: /cluster1.example.com/172.17.1.206
SCAN VIP name: scan3, IP: /cluster1.example.com/172.17.1.207

Updating the SCAN listener itself is straightforward:

[root@london1 ˜]# srvctl modify scan_listener -u
[root@london1 ˜]# srvctl config scan_listener
SCAN Listener LISTENER_SCAN1 exists. Port: TCP:1521
SCAN Listener LISTENER_SCAN2 exists. Port: TCP:1521
SCAN Listener LISTENER_SCAN3 exists. Port: TCP:1521

Finally, you need to restart the SCAN and SCAN listener:

[root@london1 ˜]# srvctl start scan
[root@london1 ˜]# srvctl start scan_listener

The log files for the SCAN listeners are somewhat hidden in the $GRID_HOME/log directory. To view them, you need to point the ADR_BASE to $GRID_HOME/log, as in the following example:

[oracle@london1 ˜]$ adrci

ADRCI: Release 11.2.0.1.0 - Production on Sat Jul 3 11:26:11 2010

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

ADR base = "/u01/app/oracle"
adrci> set base /u01/app/crs/log
adrci> show homes
ADR Homes:
diag/tnslsnr/london1/listener_scan1
diag/tnslsnr/london1/listener_scan2
diag/tnslsnr/london1/listener_scan3
diag/clients/user_root/host_3052993529_76
diag/asm/+asm/+ASM1
adrci> set home diag/tnslsnr/london1/listener_scan1

With the ADR base set, you can view the log. Bear in mind that the SCAN listener can migrate. Even though there is a log file, that does not mean that this particular SCAN listener is currently running on that node. The srvctl status scan_listener command tells you which node a particular SCAN listener is running on.
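If you prefer a one-liner over the interactive session, adrci also accepts commands via its exec switch. A sketch using the base and home from the example above:

```shell
# Tail the last 20 lines of the SCAN listener's alert log non-interactively
adrci exec="set base /u01/app/crs/log; set home diag/tnslsnr/london1/listener_scan1; show alert -tail 20"
```
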

Maintaining Voting Disks

Prior to Oracle 11.2, voting disks were stored on block devices or, alternatively, on raw devices. Block devices such as /dev/mapper/ocrvote1p1 provided an easier interface to the voting disks because no (deprecated) raw devices had to be created during the operating system's boot process. Nevertheless, and as you saw in the earlier "Troubleshooting" section, the CSSD process requires specific permissions and ownership of the voting disks. You set that ownership either through udev or the rc.local script.

In Oracle 10.2 and 11.1, we recommended that you create multiple voting disks to eliminate a potential single point of failure. Oracle 10.1 did not support this feature; in that version, there was always only one voting disk. Even with multiple voting disks, you should still back these disks up on a regular basis and test your recovery procedures. To back up a voting disk for these releases prior to version 11.2, you could use the dd operating system utility:

[root@london1 ˜]# dd if=voting_disk of=backup_file_name

You did not need to shut down the database or Oracle Clusterware before backing up the voting disk. Although the voting disks were (in theory) identical, backing up each configured voting disk was recommended. Because voting disk locations were configured in the OCR, modifying them during any recovery process was difficult. Therefore, the most straightforward strategy was to back up all of the voting disks and, if recovery was necessary, restore each copy to its original location.

To recover the voting disks, you had to ensure that all databases and Oracle Clusterware were shut down. You could then restore the voting disks using the dd operating system utility:

[root@london1 ˜]# dd if=backup_file_name of=voting_disk

With Oracle 11.2, you no longer need to back up voting disks because the contents of the voting disks are automatically backed up in the OCR. When a new voting disk is added, the contents of the voting disk will be copied to the new voting disk.
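To see where the voting disks currently reside, and to obtain the File Universal IDs used later in this chapter, query CSS. A short sketch:

```shell
# List the configured voting disks, their state, and their FUIDs
crsctl query css votedisk
```
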

Restoring Voting Disks

In the unlikely event that a voting disk becomes corrupted, it must be restored from a backup. This can be further complicated if the OCR becomes corrupted, as well. Should this be the case, you need to restore the OCR before proceeding. How to restore the OCR was described previously in this chapter's "Maintaining the Local and Cluster Registry" section.

The exact recovery process depends on whether you used ASM or a cluster file system to store your voting disks. In the ASM scenario, you follow these steps to restore the voting disks as root:

  1. Use crsctl stop cluster -all -f to stop the Clusterware stack on all nodes.

  2. On an arbitrary node, start the Clusterware stack in exclusive mode using the -excl flag as follows: crsctl start crs -excl. If this does not work, and you get CRS-4640/CRS-4000 errors, then you can use crsctl disable crs to disable the automatic starting of the Clusterware stack. With this option set, reboot the node. When it comes back online, you should be able to successfully execute the crsctl start crs -excl command. This starts the required background processes, including a local ASM instance, whose startup is one of the last things you see before the command completes.

  3. If the diskgroup containing the voting files was lost, then create a new one with exactly the same name (important!) and mount it if it is not mounted. Don't forget to set the ASM compatibility to 11.2 for the diskgroup. Also, if you are using ASMLib, then don't forget to execute /etc/init.d/oracleasm scandisks on the nodes for which you did not create the disk. Doing so makes those disks available to those nodes.

  4. If you lost the OCR as well, then restore it first (see the "Dealing with a Corrupt or Inaccessible OCR" section later in this chapter for more information).

  5. Restore the voting disks using this command: crsctl replace votedisk +disk_group_name. Remember that this diskgroup name must be identical to the one you lost.

  6. Stop the partially started cluster stack using this command: crsctl stop crs.

  7. Restart the cluster stack completely with the following command: crsctl start crs.

  8. Start the cluster on the remaining nodes. If you have disabled the automatic start of the Clusterware stack in Step 2, re-enable it using crsctl enable crs.
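Condensed, the ASM-based recovery steps above can be sketched as follows (run as root; the diskgroup name +OCRVOTE is illustrative, and the diskgroup-creation and OCR-restore steps are not shown):

```shell
# 1. Stop the Clusterware stack cluster-wide
crsctl stop cluster -all -f

# 2. Start one node in exclusive mode (disable/reboot first if this errors out)
crsctl start crs -excl

# 5. Restore the voting disks into the recreated diskgroup
crsctl replace votedisk +OCRVOTE

# 6./7. Bounce the local stack, then start the remaining nodes
crsctl stop crs
crsctl start crs
```
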

The situation is slightly different if your voting disks are located outside of ASM. In that case, you will need to follow these steps to restore the voting disks:

  1. Stop the cluster on all nodes using the following command: crsctl stop cluster -all -f.

  2. Identify the status of your voting disks with the following command: crsctl query css votedisk.

  3. Start the local node's cluster stack in exclusive mode: crsctl start crs -excl. You may have to disable the automatic start of Clusterware and reboot the node, as described in the previous ASM example if Clusterware states it is already started.

  4. The File Universal ID (FUID) in the output from Step 2 is important for this step. Delete the damaged voting disk using this command: crsctl delete css votedisk FUID.

  5. Replace the lost disk by adding another disk straight away using crsctl add css votedisk /path/to/deletedVotingDisk.

  6. Stop the partially started Clusterware stack on the local node using this command: crsctl stop crs.

  7. Restart the cluster using the following command: crsctl start cluster -all. If you disabled the automatic start of Clusterware in Step 3, then re-enable it using crsctl enable crs.
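
Step 4 requires the FUID reported in Step 2. Assuming the query output follows the usual "## STATE FUID (path) [diskgroup]" layout (the sample below is made up for illustration), a short awk filter can extract the FUID of the damaged disk:

```shell
#!/bin/sh
# Extract the File Universal Id (FUID) of a failed voting disk from
# 'crsctl query css votedisk' output. The sample is hypothetical; on a
# real cluster, pipe the live query output through the same filter.
sample='##  STATE    File Universal Id                File Name Disk group
 1. ONLINE   a3751063aec14f8ebfe8fb89fccf45fe (/u03/grid/vote1) []
 2. OFFLINE  b9210fa33ed44f55bf67c312d3a44f21 (/u03/grid/vote2) []'

# FUID of the damaged (OFFLINE) disk, ready for 'crsctl delete css votedisk':
fuid=$(printf '%s\n' "$sample" | awk '$2 == "OFFLINE" { print $3 }')
echo "$fuid"
```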

Moving Voting Disks into ASM

ASM is the preferred location for the voting disks and OCR. Oracle Universal Installer no longer offers administrators the choice of storing voting disks on raw or block devices for new installations.

However, upgraded clusters still use block or raw devices to store the OCR and voting disks. Moving these into ASM requires a handful of steps. First, we recommend using a dedicated ASM diskgroup for the shared files. The separate diskgroup is optional (you could store the OCR in the same diskgroup as your data files), but our reason for suggesting it has to do with LUN cloning. Assume that your storage array can create copies of the ASM disks; it would make sense to use this functionality to rapidly clone a database. If the OCR and voting disks are in the cloned diskgroups, then the success of the clone operation might be in question.

Space requirements for the diskgroup are modest, and many sites use individual partitions of 1GB each to form the diskgroup. The following example assumes that you are using ASMLib to label the disks; using udev instead would require a different setup. Please refer to Chapter 9, "Automatic Storage Management," for more information about udev and ASMLib.

Next, the storage team should present one, three, or five LUNs, depending on whether your new diskgroup will use external, normal, or high redundancy. Most systems should be fine with normal redundancy, given that most storage arrays use some form of internal protection. We do not recommend a diskgroup with external redundancy, because it would not allow you to have multiple voting disks. Nor can you work around this by spreading the voting disks over multiple external-redundancy diskgroups: a setup that stores the first voting file in +DG1, the second in +DG2, and the third in +DG3 is not possible. There can only ever be one diskgroup containing all the voting disks for a cluster. When requesting the storage for the voting diskgroup, you can gain additional protection by asking that the LUNs not come from the same shelf in the array. Last but not least, the individual LUNs presented to the cluster for the new diskgroup should be of equal size.
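
The mapping between redundancy level and the number of LUNs to request can be sketched as a small helper function; the function name is our own, not Oracle's:

```shell
#!/bin/sh
# Minimum number of (equally sized) LUNs to request for the voting
# diskgroup, by ASM redundancy level: external=1, normal=3, high=5.
luns_for_redundancy() {
    case "$1" in
        external) echo 1 ;;
        normal)   echo 3 ;;
        high)     echo 5 ;;
        *)        echo "unknown redundancy: $1" >&2; return 1 ;;
    esac
}

luns_for_redundancy normal   # prints 3
```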

The newly acquired storage then needs to be discovered and partitioned. Each of the new partitions should be marked as an ASM disk, as in the following example:

[root@london1 ~]# /etc/init.d/oracleasm createdisk OCRVOTE1 /dev/mapper/ocr1p1
[root@london1 ~]# /etc/init.d/oracleasm createdisk OCRVOTE2 /dev/mapper/ocr2p1
[root@london1 ~]# /etc/init.d/oracleasm createdisk OCRVOTE3 /dev/mapper/ocr3p1

Note

The actual device name will be different if you are not using device-mapper multipath. Your system administrator should be able to provide you with the device name to use.

In the preceding example, the administrator created three ASM disks: OCRVOTE1, OCRVOTE2, and OCRVOTE3. The device-mapper multipath configuration in the /etc/multipath.conf file maps the underlying block devices to the names ocr1, ocr2, and ocr3 in the /dev/mapper directory; the createdisk commands then label the first partition of each as an ASM disk. Once the ASM disks are created, execute the /etc/init.d/oracleasm scandisks command on all remaining nodes of the cluster.
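
If you have several remaining nodes, the scandisks step can be scripted. This is a hedged, dry-run sketch: the node names london2 and london3 and passwordless root ssh access are assumptions, and DRYRUN=echo prints the commands rather than running them.

```shell
#!/bin/sh
# Run ASMLib disk discovery on the remaining cluster nodes.
# Dry run: DRYRUN=echo prints the ssh commands instead of executing
# them. Node names london2/london3 are hypothetical.
DRYRUN=echo
for node in london2 london3; do
    $DRYRUN ssh root@"$node" /etc/init.d/oracleasm scandisks
done
```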

Next, connect to the local ASM instance as SYSASM and create the new diskgroup, as follows:

SQL> create diskgroup OCRVOTE normal redundancy disk
 2   'ORCL:OCRVOTE1','ORCL:OCRVOTE2','ORCL:OCRVOTE3';

Diskgroup created.

SQL> alter diskgroup OCRVOTE set attribute  'compatible.asm'='11.2';

Diskgroup altered.

Alternatively, you can use asmca to accomplish the same task. You should start the diskgroup resource on all nodes of the cluster; the srvctl start diskgroup -g OCRVOTE command will do this for you. The final step, which does not require the Clusterware daemons to be shut down, is to move the voting disks into ASM. An interesting fact is that, regardless of how many voting disks the cluster used before, Grid Infrastructure will always create the number of voting disks dictated by the ASM diskgroup's redundancy. We recommend making a backup of all voting disks beforehand, as described in the "Maintaining Voting Disks" section. The following example illustrates how to move the voting disks:

[root@london1 ~]# crsctl replace votedisk '+OCRVOTE'

You can check the location of the voting disks with crsctl query css votedisk to ensure the command executed successfully. Any problems during execution are reported on the command line and in the ASM instance's alert.log.
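
To verify the outcome, you can count how many voting files the query reports inside the new diskgroup. The sample output below is hypothetical; on a live cluster you would pipe the real crsctl query css votedisk output through the same filter.

```shell
#!/bin/sh
# Verify all voting files ended up in the OCRVOTE diskgroup by counting
# '[OCRVOTE]' entries in 'crsctl query css votedisk'-style output.
# The sample lines are made up for illustration.
sample=' 1. ONLINE 6a0b5d9a36db4f9abf1c2a9d44e0f3aa (ORCL:OCRVOTE1) [OCRVOTE]
 2. ONLINE 7c1d6e0b47ec5f0bcf2d3b0e55f1a4bb (ORCL:OCRVOTE2) [OCRVOTE]
 3. ONLINE 8d2e7f1c58fd601cdf3e4c1f66a2b5cc (ORCL:OCRVOTE3) [OCRVOTE]'

count=$(printf '%s\n' "$sample" | grep -c '\[OCRVOTE\]')
echo "voting files in +OCRVOTE: $count"
```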

Maintaining Local and Cluster Registry

As noted in the "Storing Cluster Information with the Oracle Cluster Registry" section earlier in this chapter, the voting disks and the OCR are located on shared storage. Your options are to use either ASM or a supported cluster file system; using block or raw devices is not recommended for Oracle 11.2, and the fact that they are deprecated should be a strong indicator of the future direction. During normal operation, you rarely have to deal with the OLR and OCR; they work in the background, enabling the cluster to function correctly. The following sections cover how to deal with OLR and OCR corruption, while the last section explains how to move the OCR into ASM.

Dealing With a Corrupt or Inaccessible OLR

If the OLR is corrupt or otherwise inaccessible, the OHAS daemon cannot start. The output on the server's console will show this message:

CRS-4124: Oracle High Availability Services startup failed.
CRS-4000: Command Start failed, or completed with errors.

Enterprise Linux Enterprise Linux Server release 5.4 (Carthage)
Kernel 2.6.18-164.el5 on an x86_64

When logged in as root, you can check the OLR using the ocrcheck command line utility (note the -local flag):

[root@london1 ohasd]# ocrcheck -local
Status of Oracle Local Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       2232
         Available space (kbytes) :     259888
         ID                       : 1022831156
         Device/File Name         : /u01/app/crs/cdata/london1.olr
                                    Device/File integrity check failed

         Local registry integrity check failed

         Logical corruption check bypassed

The "Device/File integrity check failed" message clearly indicates corruption in the OLR; the "Local registry integrity check failed" message confirms it. In this case, there is no option but to restore the OLR. Luckily, there is at least one backup, taken during the execution of the root.sh script when you installed or upgraded Grid Infrastructure.

By convention, the OLR backups are stored in $GRID_HOME/cdata/hostname/backup_date_time.olr, unless another backup location has been chosen. The OLR stores its own backup information, but that is not helpful in this case because it is corrupt. Don't let the message that no local backups are available trick you!
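
Because the OLR's own backup records cannot be trusted at this point, you can pick the newest backup by file modification time instead. This is a sketch that assumes the conventional backup path; on a healthy system it simply prints the most recent backup file.

```shell
#!/bin/sh
# Locate the most recent OLR backup under the conventional path
# $GRID_HOME/cdata/<hostname>/backup_*.olr, newest first by mtime.
# The GRID_HOME default below is an assumption; adjust for your site.
GRID_HOME=${GRID_HOME:-/u01/app/crs}
backup_dir="$GRID_HOME/cdata/$(hostname)"
latest=$(ls -t "$backup_dir"/backup_*.olr 2>/dev/null | head -1)
echo "most recent OLR backup: ${latest:-none found}"
```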

Find the most recent OLR backup and restore it in a runlevel other than 3 or 5, as in the following example. With the /etc/init.d/init.ohasd process started, it is not possible to perform the restore. You will get the following message if you try to do so anyway:

[root@london1 ohasd]# ocrconfig -local -restore \
> /u01/app/crs/cdata/london1/backup_20091010_211006.olr
PROTL-19: Cannot proceed while the Oracle High Availability Service is running

Instead, you need to change to runlevel 2; you can do this using the init command:

[root@london1 ohasd]# init 2

Now log back in and restore the OLR:

[root@london1 ~]# ocrconfig -local -restore \
> /u01/app/crs/cdata/london1/backup_20091010_211006.olr

Each call to ocrconfig creates a log file in $GRID_HOME/log/hostname/client. This log file can provide additional information about the output of the command; it can also be helpful when troubleshooting problems. At this point, you should return to the previous run level, either 3 or 5. If you successfully restored the OLR, then you will see the following message:

CRS-4123: Oracle High Availability Services has been started.

If the local registry is lost completely—if the file system is corrupt, for example—then the scenario plays out slightly differently. The error reported by ocrcheck looks like the following:

[root@london1 ohasd]# ocrcheck -local
PROTL-602: Failed to retrieve data from the local registry
PROCL-26: Error while accessing the physical storage Operating System error
  [No such file or directory] [2]

The "no such file or directory" message is what gives it away. In this case, you cannot restore the OLR as you did in the previous example. The ocrconfig command fails, but it does not give you a reason for the failure; to find that, you need to check the log file in $GRID_HOME/log/hostname/client/ocrconfig_pid.log:

[root@london1 client]# cat ocrconfig_11254.log
Oracle Database 11g Clusterware Release 11.2.0.1.0 - Production
Copyright 1996, 2009 Oracle. All rights reserved.
2010-06-25 09:21:24.866: [ OCRCONF][3905760784]ocrconfig starts...
2010-06-25 09:21:24.867: [  OCROSD][3905760784]utopen:6m':failed in stat OCR
file/disk /u01/app/crs/cdata/london1.olr, errno=2, os err
string=No such file or directory
2010-06-25 09:21:24.867: [  OCROSD][3905760784]utopen:7:failed to open
any OCR file/disk, errno=2,os err string=No such file or directory
[ default][3905760784]u_set_gbl_comp_error: OCR context was NULL
2010-06-25 09:21:24.867: [  OCRRAW][3905760784]phy_rec:1:could not
open OCR device
2010-06-25 09:21:24.867: [ OCRCONF][3905760784]Failed to restore OCR/OLR from
[/u01/app/crs/cdata/london1/backup_20100625_085111.olr]
2010-06-25 09:21:24.867: [ OCRCONF][3905760784]Exiting [status=failed]...

The "No such file or directory" message means that there is no file at the OLR location, so there is nothing to replace. Now change to runlevel 2 before continuing; this prevents any Oracle processes from interfering with the restore operation. Before restoring the file, you need to touch it first and set the correct permissions, as in this example:

[root@london1 client]# touch /u01/app/crs/cdata/london1.olr
[root@london1 cdata]# chown oracle:oinstall london1.olr
[root@london1 ~]# ocrconfig -local -restore \
> /u01/app/crs/cdata/london1/backup_20100625_085111.olr

If you installed the Grid Infrastructure software stack under a user other than oracle, then change the chown command in the preceding example to use the Grid software owner instead. Next, check the client log file again to see whether your attempt succeeded:

Oracle Database 11g Clusterware Release 11.2.0.1.0 - Production Copyright 1996,
2009 Oracle. All rights reserved.
2010-06-25 09:42:01.887: [ OCRCONF][2532662800]ocrconfig starts...
2010-06-25 09:42:03.671: [ OCRCONF][2532662800]Successfully restored OCR and set
  block 0
2010-06-25 09:42:03.671: [ OCRCONF][2532662800]Exiting [status=success]...

The OHAS daemon will then proceed to start the remaining daemons. After a few minutes, you should check the status:

[oracle@london1 ~]$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

Dealing with a Corrupt or Inaccessible OCR

While a corrupt OLR has significant impact on the availability of an individual node, a corrupt OCR is far more severe in its effect on the cluster. As you learned in the "Troubleshooting" section earlier in the chapter, the OCR must be accessible for CRSD to start. Without a working CRS daemon, a node cannot join the cluster, which effectively breaks the cluster.

Given the importance of the OCR, Oracle has always backed it up automatically on a regular basis. The ocrconfig -showbackup command lists the node where the automatic backup was taken, as well as the time the backup occurred and the backup file name. In Oracle 11.1 and later, it is also possible to perform manual backups. We strongly recommend making a manual backup of the OCR using the ocrconfig -manualbackup command before any cluster modification, such as adding or deleting nodes, or before performing other OCR-relevant maintenance operations.

If neither the OCR nor any of its mirrors is accessible, then you need to restore the OCR to start the cluster. The steps to take depend partly on where your OCR is located: either in ASM (the recommended approach) or outside of ASM on a cluster file system or block/raw devices.

If the OCR is in ASM, the procedure to recover it is very similar to the procedure for restoring the voting disks. In fact, if you placed the OCR in the same ASM diskgroup as the voting disks and lost it, then you must restore the OCR before restoring the voting disks because the voting disk backups are stored in the OCR. Follow these steps to restore an OCR located in ASM:

  1. Stop the Clusterware stack on all nodes using the following command: crsctl stop cluster -all -f.

  2. On an arbitrary node, start the Clusterware stack in exclusive mode using crsctl start crs -excl. If this does not work, and you get CRS-4640/CRS-4000 errors, then you can disable the automatic start of the Clusterware stack using crsctl disable crs. With this option disabled, reboot the node. When it comes back online, you should be able to successfully execute the crsctl start crs -excl command. Doing so will initiate the required background processes to start a local ASM instance, which is one of the last things you see before the start is completed.

  3. If the diskgroup containing the voting files was lost, then create a new one with exactly the same name (important!) and mount it if it is not mounted. Don't forget to set the ASM compatibility to 11.2 for the diskgroup. Also, if you are using ASMLib, then don't forget to execute /etc/init.d/oracleasm scandisks on the nodes where you did not create the disk. This makes those disks available to those nodes.

  4. Restore the OCR using ocrconfig -restore backup_file_name. Note that there is no option to specify a different destination, which is why it is important to recreate the diskgroup with the same name. Check the $GRID_HOME/log/hostname/client/ directory for the latest trace file; any problems should be reported there. If you get a PROT-19 error ("Cannot proceed while the Cluster Ready Service is running"), then you need to stop the CRS daemon using the crsctl stop res ora.crsd -init command before attempting to restore the OCR again.

  5. Stop the partially started cluster stack using this command: crsctl stop crs.

  6. Now run this command to restart the cluster stack completely: crsctl start crs.

  7. Restart the cluster on the remaining nodes. If you disabled the automatic start of the Clusterware stack in Step 2, then re-enable it using crsctl enable crs.
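
The PROT-19 handling in Step 4 boils down to "if the restore fails because CRSD is still running, stop ora.crsd and try once more." The sketch below stubs out ocrconfig and crsctl with shell functions so the retry logic can be demonstrated without a cluster; on a real system the genuine binaries take their place, and the backup path shown is hypothetical.

```shell
#!/bin/sh
# Sketch of the Step 4 retry logic: if ocrconfig -restore fails with
# PROT-19 (CRSD still running), stop ora.crsd and try once more.
# ocrconfig/crsctl are stubs for illustration: the stub fails the
# first time to simulate PROT-19, then succeeds after crsctl runs.
CRSD_UP=1
ocrconfig() { [ "$CRSD_UP" = 1 ] && { echo PROT-19; return 19; } || echo "restored $2"; }
crsctl()    { CRSD_UP=0; echo "stopped ora.crsd"; }

restore_ocr() {
    backup=$1
    if ! ocrconfig -restore "$backup"; then
        crsctl stop res ora.crsd -init     # clear the PROT-19 condition
        ocrconfig -restore "$backup"       # second attempt
    fi
}

restore_ocr /u01/app/crs/cdata/cluster1/backup00.ocr
```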

If the OCR is not stored in ASM, then the steps to restore it are similar. Again, you need to ensure that the OCR can be restored to the same location where it existed before it was lost. You should touch the OCR files and set their permissions correctly. The file names should be part of your installation documentation, or available in archived output from ocrcheck. For example, you could use these commands if your OCR was on /u03/oradata/grid/ocr{1,2,3}:

  • touch /u03/oradata/grid/ocr{1,2,3}

  • chown gridOwner:oinstall /u03/oradata/grid/ocr{1,2,3}

  • chmod 0640 /u03/oradata/grid/ocr{1,2,3}

Begin by following Steps 1 and 2 from the ASM scenario to stop the cluster. Next, follow these steps to complete the process of restoring the OCR:

  1. Recreate the block device or the cluster file system where the OCR was located if it was lost.

  2. Touch the OCR files and set permissions as just described.

  3. Run ocrconfig -restore backup_file to restore the OCR.

  4. Bring the cluster up as described in Steps 5-7 of the ASM scenario.

Moving the OCR into ASM

Moving the OCR into ASM is simpler than the procedure for moving the voting disks into ASM. We recommend that you use at least one dedicated ASM diskgroup to store the shared Clusterware files. The following example moves the OCR to the OCRVOTE diskgroup:

[root@london1 bin]# ./ocrconfig -add +OCRVOTE
[root@london1 ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
 Version                  :          3
 Total space (kbytes)     :     292924
 Used space (kbytes)      :       8096
 Available space (kbytes) :     284828
 ID                       :  209577144
 Device/File Name         : /dev/raw/raw1
 Device/File integrity check succeeded
 Device/File Name         : +OCRVOTE
 Device/File integrity check succeeded

 Device/File not configured

 Device/File not configured

 Device/File not configured

 Cluster registry integrity check succeeded

 Logical corruption check succeeded

Next, you need to remove the raw device (or whatever other storage outside of ASM was used):

[root@london1 ~]# ocrconfig -delete /dev/raw/raw1
[root@racupgrade1 ~]# ocrcheck

Status of Oracle Cluster Registry is as follows :
Version                  :          3
Total space (kbytes)     :     292924
Used space (kbytes)      :       8096
Available space (kbytes) :     284828
ID                       :  209577144
Device/File Name         :    +OCRVOTE
Device/File integrity check succeeded
Device/File not configured

Device/File not configured

Device/File not configured

Device/File not configured

Cluster registry integrity check succeeded

Logical corruption check succeeded

Unfortunately, it is not possible to have more than one copy of the OCR in the same diskgroup. The redundancy level of the diskgroup should provide protection from disk failure. Unlike with the voting disks, you are not limited to having all copies of the OCR in the same diskgroup, so you could choose to spread the OCR copies over other existing diskgroups.
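
Spreading the OCR over several diskgroups amounts to repeated ocrconfig -add calls. A dry-run sketch, with +DATA and +FRA as hypothetical diskgroup names and DRYRUN=echo printing the commands instead of executing them:

```shell
#!/bin/sh
# Add further OCR copies in other existing diskgroups, one
# 'ocrconfig -add' call per diskgroup. Dry run: DRYRUN=echo prints
# the commands; +DATA and +FRA are hypothetical diskgroup names.
DRYRUN=echo
for dg in +DATA +FRA; do
    $DRYRUN ocrconfig -add "$dg"
done
```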

Tip

For the sake of safety, you should also run the cluvfy comp ocr command to check the OCR consistency across all cluster nodes.

Summary

We have covered a lot of ground in this chapter. We began by examining the hardware and software requirements, and then moved on to cover the most important files in the cluster stack. Some of these files were already known to us from previous Oracle versions; however, the Grid Plug and Play profile and the Oracle Local Registry are new in 11.2.

We also discussed the main Grid Infrastructure daemons and the correct startup sequence. The number of log files maintained by Grid Infrastructure has increased quite dramatically, thanks to the additional background processes. We also briefly touched on inter-resource dependencies.

Next, we looked at the main utilities used for maintaining the software stack. These are most likely the tools the administrator will have to work with when it comes to resolving problems. We tried to group these utilities together in a logical fashion in the relevant troubleshooting sections.

We also covered how Clusterware allows developers to define callout scripts that can be used to notify administrators when UP and DOWN events are published by the Fast Application Notification framework. For example, we mentioned that one potential use of these callouts would be to send an e-mail to the on-call DBA's pager or to raise a ticket.

Next, we explained how Grid Infrastructure also allows administrators to set up cost-efficient active/passive clusters, protecting Oracle databases (or third-party applications) that must run on only one node at a time by means of a floating virtual IP.

Another useful topic that we covered is Oracle Restart, an interesting new option that brings a lot of the "feel" of Oracle RAC to a single instance. Resources such as ASM, the listener, and databases are protected by Grid Infrastructure. Such resources are automatically started (but not stopped at the time of this writing, due to a bug) when the server reboots.

Next, we dealt with troubleshooting the startup sequence and Java utilities, then segued into a discussion of how to patch Grid Infrastructure.

The ability to add and remove cluster nodes is one of the great features offered by Grid Infrastructure in both its current and previous iterations. We discussed how that process differs between current and other recent releases.

We also delved into advanced topics such as changing the SCAN post-installation and using non-default listener ports, before concluding the chapter by explaining how to maintain (and restore) the voting disks and the OCR/OLR.
