Chapter 6. Linux Installation and Configuration

This chapter describes the steps required to install and configure the Linux operating system for Oracle Database 11g Release 2 RAC. As mentioned in Chapter 1, the emphasis in this book is on the Oracle Enterprise Linux distribution because the installable CD and DVD images are freely available for download and fully redistributable without cost. This combination makes Oracle Enterprise Linux the most accessible Linux operating system release supported in a RAC configuration.

As noted in Chapter 1, the differences between Oracle Enterprise Linux and Red Hat Enterprise Linux are marginal; therefore, the installation and configuration details are directly applicable to a Red Hat installation as well. Oracle Enterprise Linux provides the option of installing an Oracle-modified kernel, containing additional bug fixes from Oracle, in place of the default Red Hat kernel. However, we recommend reviewing the release notes to determine how applicable these fixes are to your environment before moving away from the default kernel common to both the Red Hat and Oracle Enterprise Linux releases. Note that this focus on Oracle Enterprise Linux is not intended to express a preference in terms of ease of installation, use, or suitability for Oracle over the alternative supported releases of SUSE Linux Enterprise Server or Asianux, each of which may be more applicable in a particular environment. For example, Oracle Enterprise Linux may not be the best choice where one of those Linux releases has already been adopted across the data center for hosting non-Oracle software applications.

The installation process is covered for the standard hardware platforms supported by Oracle Enterprise Linux and Oracle Database 11g Release 2 RAC, namely x86 and x86-64. The installation steps described in this chapter should be performed identically on each node in the cluster.

Selecting the Right Linux Software

In Chapter 4, we discussed software availability on the Certification Matrices page of Oracle's web site in the context of hardware platforms. The certification site also details the supported Linux releases at the granularity of the selected architecture, as shown for example in Table 6-1.

Table 6.1. Supported Oracle Enterprise Linux Releases for x86 and x86-64

Operating System             Products    Certified With        Version    Status
Oracle Enterprise Linux 5    11gR2       Oracle Clusterware    11g        Certified
Oracle Enterprise Linux 4    11gR2       Oracle Clusterware    11g        Certified

In Oracle Database 11g, all versions of the supported Linux releases are based on the 2.6 kernel with the performance and scalability benefits that this brings. Thousands of users can be handled reliably, and larger amounts of memory can be supported. Up to 64GB of memory is supported on 32-bit, x86 systems; however, on Oracle Enterprise Linux, the Hugemem kernel is required for this support.

Note

The Hugemem kernel is not available on Oracle Enterprise Linux 5, and the standard SMP kernel on x86 supports up to 16GB of memory only.

On x86-64 from the Oracle Enterprise Linux 5.3 release and later, the certified maximum memory is 1TB, and the number of supported CPUs is 255.

In the first edition of this book, which focused on Oracle Database 10g, we detailed the installation and configuration of Red Hat Enterprise Linux 4 AS. For that reason, this edition focuses on the more recent supported version of Oracle Enterprise Linux Release 5. You should also pay particular attention to the requirements of your chosen hardware platform and its support for a particular processor or chipset.

Reviewing the Hardware Requirements

Oracle publishes a minimum set of hardware requirements for each server (see Table 6-2). Obviously, the minimum requirements must be met in order to install and configure a working Oracle database, and production environments should adhere to at least the recommended values. We strongly recommend that every node of the cluster have an identical hardware configuration, although this is not strictly mandatory, except for the CPU architecture type. The client display resolution requirements are for the system on which the Oracle Universal Installer (OUI) is displayed; this display is typically set with the DISPLAY environment variable, so the display resolution requirements do not apply to the cluster nodes themselves.
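For example, when the OUI runs on a cluster node but is displayed on an administrative workstation, the display is commonly redirected through the DISPLAY variable. The following is a minimal sketch; the workstation address 172.17.1.50 and the hostname london1 are illustrative, and the workstation is assumed to run an X server that accepts connections from the node:

[user@workstation ~]$ xhost +london1
[oracle@london1 ~]$ export DISPLAY=172.17.1.50:0.0
[oracle@london1 ~]$ xdpyinfo | grep dimensions

The xdpyinfo output confirms that the X display is reachable and reports its resolution.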

Table 6.2. Minimum and Recommended Server Requirements

Hardware                     Minimum                                         Recommended
CPU                          1 certified CPU per node                        2 or more certified CPUs per node
CPU Architecture             Same on all nodes                               --
Interconnect network         100Mb                                           Teamed 1Gb or 10Gb
External network             100Mb                                           1Gb or 10Gb
Backup network               100Mb                                           1Gb or 10Gb
HBA or NIC for SAN           1Gb HBA                                         Dual-pathed storage vendor certified
                                                                             HBA, iSCSI, or NAS
Memory                       1.5GB per node                                  2GB per CPU
Swap space                   Between equivalent to RAM and 1.5 times RAM,    Equivalent to RAM
                             up to 16GB, dependent on configured RAM size
Grid software space          4.5GB                                           10GB
Database software space      4GB                                             10GB
Temporary disk space         1GB                                             --
Client Display Resolution    1024 × 768                                      --

Drilling Down on Networking Requirements

Before installing the Linux software on each node in the cluster, it is essential to fully plan and specify the network configuration and to prepare a DNS server. Depending on your requirements, you may also want a DHCP server. The planning and advance configuration requirements are determined by whether you choose to use the Oracle Grid Naming Service (GNS) or a manual IP configuration. If you have not yet selected an IP configuration scheme, then we strongly recommend that you review the concepts of GNS detailed in Chapter 2. You will need to make a choice at this point because the IP naming configuration required, both on the network and on the operating system of the cluster nodes, differs for each scheme.

Configuring a GNS or a Manual IP

Whether you're using GNS or a manual IP configuration, DNS must be available to support the cluster configuration; we cover how to configure DNS to support each scheme later in this section. If you're using GNS, a DHCP server is also required; we cover its configuration later in this section as well. As introduced in Chapter 2, GNS is an implementation of the Zero Configuration Networking standard (Zeroconf); Apple's Bonjour and the free Avahi software available on Linux are similar implementations. However, installing any additional Zeroconf software is not required: GNS itself provides a Zeroconf service for the nodes in the cluster. It implements a system of multicast DNS to provide both the virtual host names and the virtual IP (VIP) addresses dynamically in response to a changing cluster configuration. As a consequence, when using GNS, the network configuration requirements are to provide a DNS configuration, to delegate a subdomain within this configuration to be managed by GNS, and to provide a DHCP server on the cluster subnet from which GNS can allocate IP addresses. Because the allocation of virtual names and addresses is managed by GNS, it is important not to configure these names and addresses statically. Nor can you configure the Single Client Access Name (SCAN) statically, either in DNS or in the /etc/hosts file on the nodes in the cluster. DHCP is not a requirement for a manual IP configuration; DNS, however, still is, because it lets you resolve multiple IP addresses to the SCAN without using GNS. A manual IP configuration differs from GNS in that both the physical and virtual node names and IP addresses must be configured in DNS and in the /etc/hosts file on the cluster nodes; the SCAN, on the other hand, should be configured in DNS only.

Whichever scheme you choose, we suggest that you follow a standards-driven, logical naming convention that is easy to remember. If you're using a single public network, ensure that all public IP addresses and VIP addresses are unique within that network and located on the same subnet. If multiple public networks are required, these may be configured and identified within the Oracle database after the Oracle software installation with the init.ora parameter listener_networks. Note that Internet Protocol Version 6 (IPv6) is not supported with Oracle 11g Release 2 for either RAC or Clusterware; therefore, the focus in this chapter is entirely on standard IPv4 addressing, and IPv6 should be disabled on all hosts (a sketch of one way to do this follows the network tables below). Table 6-3 shows a sample network checklist for a two-node cluster with a GNS configuration.

Table 6.3. Sample GNS Network Configuration

Network Configuration    Example
DNS domain name          example.com
DNS host name            dns1.example.com
DNS server address       172.17.1.1
DHCP server address      172.17.1.1
DHCP address range       172.17.1.201 to 172.17.1.220
GNS sub domain           grid1.example.com
GNS host name            cluster1-gns.grid1.example.com
GNS VIP address          172.17.1.200
Cluster name             cluster1
SCAN name                cluster1-scan.grid1.example.com
SCAN addresses           Assigned by DHCP
Public Network           172.17.1.0
Public Gateway           172.17.1.254
Public Broadcast         172.17.255.255
Public Subnet Mask       255.255.0.0
Public Node names        london1, london2
Public IP addresses      172.17.1.101, 172.17.1.102
VIP names                Automatically assigned
VIP addresses            Assigned by DHCP
Private Network          192.168.1.0
Private Subnet Mask      255.255.255.0
Private Node names       london1-priv, london2-priv
Private IP addresses     192.168.1.1, 192.168.1.2
IPMI addresses           Assigned by DHCP

In a similar vein, Table 6-4 shows a sample network checklist for a two-node cluster with a manual IP configuration.

Table 6.4. Sample Manual Network Configuration

Network Configuration    Example
DNS domain name          example.com
DNS host name            dns1.example.com
DNS server address       172.17.1.1
DHCP server address      Not required
DHCP address range       Not required
GNS sub domain           Not required
GNS host name            Not required
GNS VIP address          Not required
Cluster name             cluster1
SCAN name                cluster1-scan.example.com
SCAN addresses           172.17.1.205, 172.17.1.206, 172.17.1.207
Public Network           172.17.1.0
Public Gateway           172.17.1.254
Public Broadcast         172.17.255.255
Public Subnet Mask       255.255.0.0
Public Node names        london1, london2
Public IP addresses      172.17.1.101, 172.17.1.102
VIP names                london1-vip, london2-vip
VIP addresses            172.17.1.201, 172.17.1.202
Private Network          192.168.1.0
Private Subnet Mask      255.255.255.0
Private Node names       london1-priv, london2-priv
Private IP addresses     192.168.1.1, 192.168.1.2
IPMI addresses           172.17.1.10, 172.17.1.20
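Finally, as noted earlier, IPv6 should be disabled on all hosts. The following is a minimal sketch of one common approach on Oracle Enterprise Linux 5; treat the exact entries as an assumption to be checked against your release notes and security policy:

# /etc/sysconfig/network on each node
NETWORKING_IPV6=no

# /etc/modprobe.conf: prevent the IPv6 module from being loaded
alias net-pf-10 off
alias ipv6 off

# Disable the IPv6 firewall service if it is not required
[root@london1 ~]# chkconfig ip6tables off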

Configuring DNS and DHCP

As noted when we covered the networking requirements, a DNS server is a mandatory requirement, whether you're using Oracle GNS or a manual network configuration. Configuration of a fully redundant, enterprise-class DNS system is beyond the scope of this book, so we recommend the book Pro DNS and BIND 10 (Apress, 2010) by Ron Aitchison if you are contemplating a full DNS installation. However, given that DNS is mandatory for proceeding with the installation of Oracle 11g Release 2 RAC, we detail here the Linux configuration of an authoritative-only name server. Such a server provides the minimum DNS configuration for successfully installing and operating Oracle 11g Release 2 RAC in a standalone manner; Pro DNS and BIND provides all the information required for incorporating this configuration into a more extensive DNS system.

Before installing the DNS software, you should be aware that the DNS server should be a separate server from any of the nodes of the Oracle RAC cluster. This helps ensure that node names and IP addresses can still be resolved irrespective of the availability of any individual node; advanced configurations will also provide for the high availability of the DNS software itself. The separate server may be a physical server or a virtual server configured under Oracle VM, and the Linux software installation will either be based on an Oracle VM template or follow the approach described later in this chapter. Note that, for the DNS server, there is no additional requirement to configure the Oracle user, prepare for the installation of the Oracle software, or configure the external storage. As explained in Chapter 5, an Oracle VM-based installation can provide the additional benefit of letting you run the operating system for the name server in a high-availability environment.

In the following example, we have configured an Oracle VM-based virtual server as the DNS server dns1.example.com with the IP address 172.17.1.1. On the DNS Server, networking is enabled by default in the file /etc/sysconfig/network, and the hostname is given as the full host and domain name, as in this example:

[root@dns1 sysconfig]# cat network
NETWORKING=yes
NETWORKING_IPV6=no
HOSTNAME=dns1.example.com

The hostname and IP address of the DNS server is also configured in the /etc/hosts file:

[root@dns1 sysconfig]# more /etc/hosts
127.0.0.1       localhost.localdomain   localhost
172.17.1.1      dns1.example.com dns1

By default, the DNS Server software will not have been installed on the Linux operating system for the installation detailed in this chapter or for an Oracle VM template. Our focus here is on the standard DNS software deployed on Linux, Berkeley Internet Name Domain (BIND). BIND is installed as an RPM package from the install media, as in this example:

[root@dns1 ˜]# rpm -ivh bind-9.3.4-10.P1.el5.x86_64.rpm
Preparing...                ########################################### [100%]
   1:bind                   ########################################### [100%]

After installation, the named service is present but not yet running, and querying its status returns the following message:

[root@dns1 ˜]# service named status
rndc: connect failed: 127.0.0.1#953: connection refused
named is stopped

Before starting the service, it is necessary to provide the DNS configuration for your domain. The configuration provided is for an authoritative-only Name Server for the example.com domain, which can be used in a standalone environment. The top level of the configuration is set in the file /etc/named.conf, and this file specifies the names of the four required zone files for the forward and reverse look-ups of the example.com and localhost domains. The following example shows the /etc/named.conf file:

[root@dns1 ˜]# cat /etc/named.conf
options {
        directory "/var/named";
};
zone "example.com" {
        type master;
        file "master.example.com";
};
zone "localhost" {
        type master;
        file "master.localhost";
};
zone "1.17.172.in-addr.arpa" {
        type master;
        file "172.17.1.rev";
};
zone "0.0.127.in-addr.arpa" {
        type master;
        file "localhost.rev";
};

Forward lookups for the example.com domain are configured in the file master.example.com, which is located in the directory defined in /etc/named.conf; in this example, that directory is /var/named. This file details the mapping of names to IP addresses for the DNS name server itself, as well as the fixed IP addresses of the cluster hosts; additional lines can be added to resolve names for other hosts in the domain. In this case, the file also provides the configuration for a subdomain to be delegated to and managed by Oracle GNS. The name server for the subdomain can be named however you wish; however, you must configure the glue record to correspond to the IP address on which the GNS service will be running after the Grid Infrastructure software installation. In this version of the master.example.com file, the GNS server IP address is 172.17.1.200:

[root@dns1 named]# more master.example.com
$TTL    86400
@               IN SOA          dns1.example.com. root.localhost (
                                2010063000         ; serial
                                28800              ; refresh
                                14400              ; retry
                                3600000            ; expiry
                                86400 )            ; minimum
@               IN NS           dns1.example.com.
localhost       IN A            127.0.0.1
dns1            IN A            172.17.1.1
london1         IN A            172.17.1.101
london2         IN A            172.17.1.102
$ORIGIN grid1.example.com.
@               IN NS           cluster1-gns.grid1.example.com.
                IN NS           dns1.example.com.
cluster1-gns    IN A            172.17.1.200 ; glue record

For a manual IP configuration, the GNS subdomain configuration is not provided. Instead, the name-to-address mappings for the public VIP addresses are required, as well as from one to three IP addresses for the SCAN name; the multiple SCAN addresses are returned to lookups in a round-robin manner. Note that in a manual configuration, the public and private IP addresses, as well as the public VIP addresses, must also be included in the /etc/hosts file on the cluster hosts (a sample /etc/hosts is sketched after the zone file below), whereas the SCAN name should be included in the DNS configuration only. The master.example.com file is configured as follows:

$TTL    86400
@               IN SOA          dns1.example.com.    root.localhost (
                                2010063000         ; serial
                                28800              ; refresh
                                14400              ; retry
                                3600000            ; expiry
                                86400 )            ; minimum
@               IN NS           dns1.example.com.
localhost       IN A            127.0.0.1
dns1            IN A            172.17.1.1
london1         IN A            172.17.1.101
london2         IN A            172.17.1.102
london1-vip     IN A            172.17.1.201
london2-vip     IN A            172.17.1.202
cluster1-scan   IN A            172.17.1.205
                IN A            172.17.1.206
                IN A            172.17.1.207
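As noted above, in a manual IP configuration the public and private node names and the VIP names must also be present in the /etc/hosts file on every cluster node, while the SCAN must not be. A minimal sketch of such a file for the addresses in Table 6-4 follows:

[root@london1 ~]# cat /etc/hosts
127.0.0.1       localhost.localdomain   localhost
# Public
172.17.1.101    london1.example.com      london1
172.17.1.102    london2.example.com      london2
# Private
192.168.1.1     london1-priv.example.com london1-priv
192.168.1.2     london2-priv.example.com london2-priv
# Virtual
172.17.1.201    london1-vip.example.com  london1-vip
172.17.1.202    london2-vip.example.com  london2-vip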

If you are planning to perform a typical Grid Infrastructure software installation, only a manual IP configuration is available. In this case, the name of the cluster is the SCAN name, minus the domain name extension. For this reason, the SCAN name must consist of alphanumeric characters and hyphens, must not be longer than 15 characters, and must be resolvable within the DNS domain. In this example, the SCAN name for a typical installation would instead be cluster1, fully resolved as cluster1.example.com. This restriction does not apply to an advanced installation, where the SCAN name may exceed 15 characters, such as the cluster1-scan name used here. For a manual IP configuration, the name is fully resolved as cluster1-scan.example.com; for GNS, it is resolved as cluster1-scan.grid1.example.com.

Reverse lookups are configured in the file 172.17.1.rev to provide the corresponding mappings from IP addresses to domain names. If you're using a manual configuration, the manually assigned public VIP addresses should also be included, as shown in the following example:

[root@dns1 named]# cat 172.17.1.rev
$TTL    86400
@                IN SOA      dns1.example.com. root.localhost.  (
                             2010063000      ; serial
                             28800           ; refresh
                             14400           ; retry
                             3600000         ; expiry
                             86400 )         ; minimum
@                IN NS       dns1.example.com.
1                IN PTR      dns1.example.com.
101              IN PTR      london1.example.com.
102              IN PTR      london2.example.com.
201              IN PTR      london1-vip.example.com.
202              IN PTR      london2-vip.example.com.

A similar configuration file is required for the local domain in the master.localhost file:

[root@dns1 named]# cat master.localhost
$TTL    86400
@               IN SOA  @       root (
                                2010063000         ; serial
                                28800              ; refresh
                                14400              ; retry
                                3600000            ; expiry
                                86400 )            ; minimum
                IN NS           @
                IN A            127.0.0.1

You must also have a zone file, such as the following example of localhost.rev, that details the reverse lookups for the local domain:

[root@dns1 named]# cat localhost.rev
$TTL    86400
@               IN SOA           localhost. root.localhost.  (
                                 2010063000      ; serial
                                 28800           ; refresh
                                 14400           ; retry
                                 3600000         ; expiry
                                 86400 )         ; minimum
                IN NS            localhost.
1               IN PTR           localhost.

To direct lookups for the correct domain and Name Server, the file /etc/resolv.conf should be configured both on the Name Server and on all hosts that require name and IP addresses to be resolved by DNS. In cases where GNS will be configured, the search path should also include the GNS subdomain, as shown here:

[root@dns1 named]# cat /etc/resolv.conf
search example.com grid1.example.com
nameserver 172.17.1.1
options attempts:2
options timeout:1

If the iptables service is enabled, external hosts will not be able to connect to the DNS server. Therefore, you must either update the rules to permit access to the default port of 53 or stop the iptables service, as shown in this example (which approach you take will depend on your security requirements):

[root@dns1 ˜]# chkconfig iptables off
[root@dns1 ˜]# service iptables stop
Flushing firewall rules:                                   [  OK  ]
Setting chains to policy ACCEPT: filter                    [  OK  ]
Unloading iptables modules:                                [  OK  ]
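Alternatively, if you prefer to leave iptables running, rules along the following lines permit DNS traffic on port 53 over both UDP and TCP; this is a sketch to adapt to your own firewall policy rather than a complete rule set:

[root@dns1 ~]# iptables -I INPUT -p udp --dport 53 -j ACCEPT
[root@dns1 ~]# iptables -I INPUT -p tcp --dport 53 -j ACCEPT
[root@dns1 ~]# service iptables save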

Finally, the named service can be started; this will allow it to begin accepting name and IP address resolution requests for the configured domain:

[root@dns1 named]# service named start
Starting named:                                           [  OK  ]

The configuration should be verified for forward and reverse name lookups on the DNS server host, as well as for external hosts on the network within the subnet of the configured domain. The following example shows a forward name lookup:

[root@dns1 named]# nslookup london1
Server:         127.0.0.1
Address:        127.0.0.1#53

Name:   london1.example.com
Address: 172.17.1.101

And the next example shows a reverse name lookup:

[root@dns1 named]# nslookup 172.17.1.101
Server:         127.0.0.1
Address:        127.0.0.1#53

101.1.17.172.in-addr.arpa       name = london1.example.com.

If you're planning to use a manual IP configuration, the SCAN name will resolve to the multiple allocated IP addresses at this point. For example, using the ping command to check connectivity will use each configured SCAN IP address in turn. However, if you're using GNS, queries for the SCAN name within the subdomain are forwarded to the GNS service. For this reason, queries for names within the subdomain will only be successful after the Grid Infrastructure software has been installed. Thus we recommend verifying your DNS configuration again after installing the Grid Infrastructure software to ensure that your subdomain delegation has been successfully configured and that GNS is responding to queries, as in this example:

[root@london1 ˜]# nslookup cluster1-scan
Server:         172.17.1.1
Address:        172.17.1.1#53

Non-authoritative answer:
Name:   cluster1-scan.grid1.example.com
Address: 172.17.1.207
Name:   cluster1-scan.grid1.example.com
Address: 172.17.1.208
Name:   cluster1-scan.grid1.example.com
Address: 172.17.1.209

The preceding example reiterates the point that all names within the subdomain grid1.example.com are resolved by GNS. It also illustrates that all names within the domain example.com are resolved by DNS. If GNS is not configured, then there is no subdomain in the configuration.
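For a manual IP configuration, you can also confirm at this stage that the SCAN resolves to all of its configured addresses in round-robin fashion; for example, a quick check with dig against the manual configuration shown earlier might look like this:

[root@london1 ~]# dig +short cluster1-scan.example.com
172.17.1.205
172.17.1.206
172.17.1.207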

You may have noted that, in the manual IP configuration, the public VIP names and SCAN names are allocated fixed IP addresses in the DNS configuration, whereas these IP addresses are not defined in advance when you're using a GNS configuration. This is why a DHCP server is not required for a manual IP configuration, while a GNS configuration must be able to obtain the VIP and SCAN IP addresses dynamically. Host names within the subdomain are also configured by GNS, and these host names correspond automatically to the allocated DHCP addresses. The DHCP server may be on a separate server or on the same host as the DNS server; in any event, the DHCP server should not be configured on a cluster node. If the DHCP service is not already available, it may be installed as an RPM package from the install media. The DHCP configuration is detailed in the configuration file /etc/dhcpd.conf, along with the desired IP address range, as in this example:

[root@dns1 ˜]# cat /etc/dhcpd.conf
ddns-update-style interim;
ignore client-updates;
        subnet 172.17.0.0 netmask 255.255.0.0 {
        range                           172.17.1.201 172.17.1.220;
        option routers                  172.17.1.254;
        option subnet-mask              255.255.0.0;
        option domain-name              "example.com";
        option domain-name-servers      172.17.1.1;
        }

The following snippet starts the DHCP service, so it can make the configured IP address range available for lease:

[root@dns1 ˜]# service dhcpd start
Starting dhcpd:                                            [  OK  ]

After the Grid Infrastructure software is installed, you can verify the utilization of the DHCP service and the allocation of the IP addresses in the file /var/lib/dhcpd/dhcpd.leases:

[root@dns1 /]# cat ./var/lib/dhcpd/dhcpd.leases
...
lease 172.17.1.210 {
  starts 6 2005/06/11 04:33:14;
  ends 6 2005/06/11 16:33:14;
  tstp 6 2005/06/11 16:33:14;
  binding state free;
  hardware ethernet 00:00:00:00:00:00;
  uid "00london1-vip";
}
lease 172.17.1.208 {
  starts 6 2005/06/11 04:33:29;
  ends 6 2005/06/11 16:33:29;
  binding state active;
  next binding state free;
  hardware ethernet 00:00:00:00:00:00;
  uid "00cluster1-scan2-vip";
}
...

Within your leases file, you can also view the automatically assigned GNS host names. With your DNS and DHCP (assuming you're using GNS) services available, you're ready to install and configure the Linux operating system on the cluster nodes themselves.

Downloading the Linux Software

As discussed in the introduction to Chapter 1, Oracle Enterprise Linux can be downloaded and redistributed without cost from http://edelivery.oracle.com/linux. The registration page requires that you enter your name, company name, e-mail address, and country. You must also accept both the agreement terms and export restrictions. After the registration page, the media pack search page enables the selection of a product pack. These options include Enterprise Linux, Oracle VM, and Oracle VM Templates. You can learn more about the installation and configuration of Oracle VM and Oracle VM Templates in Chapter 5. Under Enterprise Linux, the platform menu specifies IA64 for Itanium systems and x86 32-bit and x86 64-bit for x86 and x86-64, respectively.

The timing of releases may differ between architectures; at the time of writing, Oracle 11g Release 2 is available for the x86 and x86-64 architectures only. Select Go for the chosen architecture, and then select an operating system release under that architecture. Note that the Oracle database is certified against the major operating system release; however, you have the option of selecting a number of Update releases, which introduce more recent kernel versions, as well as incremental updates and upgrades beyond the previous point releases.

The precise details of the changes introduced in each release are detailed in the Oracle Enterprise Linux release notes. These are available at Oracle's Free and Open Source Software site, on the Oracle Unbreakable Linux Support page (http://oss.oracle.com/oracle-on-linux.html); you can find them beneath the Key Resources heading, to the right of the bulleted item that reads View source code. For example, all of the Oracle Enterprise Linux Release 5 release notes are available at this location: http://oss.oracle.com/el5/docs/. Reviewing these release notes will enable you to select the optimal point release for your requirements. It will also help you understand the differences between the Oracle Enterprise Linux release and the upstream Red Hat Enterprise Linux release on which it is based. Selecting the appropriate heading for a particular release presents a number of CD ISO images and a single DVD image for download; you must download either all of the CD images or the single DVD image, but there is no need to download both. It is also possible to construct the DVD image from the CD images using a script called mkdvdiso.sh, which is available from http://mirror.centos.org/centos/build/mkdvdiso.sh. Similarly, a full installation does not require that you download the source code images for either the CDs or the DVD. If you wish to install from alternative media or run the Oracle Validated RPM as discussed later in this chapter, then we recommend downloading or constructing the DVD image for the most flexibility during installation and configuration. If you wish to install from CDs, then you need to follow a couple of steps. First, unzip the downloaded CD images using an unzip utility (you can find a handful of suitable utilities at http://updates.oracle.com/unzips/unzips.html). Second, use a DVD or CD writer to burn the correct image or images to DVD-ROM or CD-ROM; to do this, you will need a software utility that is capable of writing ISO files, such as cdrecord.
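For example, assuming the DVD image was downloaded as one of the zip files shown in the FTP listing later in this chapter (the file name here is illustrative), it can be extracted and its checksum compared with the value published on the download page before it is burned or served over the network:

[root@ftp1 OEL5U4-x86-64]# unzip V17800-01.zip
Archive:  V17800-01.zip
  inflating: Enterprise-R5-U4-Server-x86_64-dvd.iso
[root@ftp1 OEL5U4-x86-64]# md5sum Enterprise-R5-U4-Server-x86_64-dvd.iso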

Preparing for a Network Install

To achieve a more rapid installation of a larger number of nodes, you may wish to install the software over the network from a host serving the installation media. Additionally, we recommend using a network installation to configure an environment that makes running the Oracle Validated RPM as straightforward as possible. You'll learn how to do this later in this chapter.

To prepare for a network installation, it is still necessary to create either a bootable CD-ROM or a bootable USB flash drive to initiate the installation. A bootable USB flash drive can be created by copying diskboot.img from the /images directory on the first CD-ROM of the install set to the USB device, taking care that the destination drive specified really is the USB flash drive. For example, the following listing shows a system where the first CD is mounted on /media/cdrom and the flash disk is mounted on /media/usbdisk as device /dev/sdb1:

[root@londonmgr1 media]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/hda2              72G   44G   25G  65% /
/dev/hda1             190M   46M  135M  26% /boot
none                  252M     0  252M   0% /dev/shm
/dev/hdc              622M  622M     0 100% /media/cdrom
/dev/sdb1             964M  159M  805M  17% /media/usbdisk

Make a note of the device name of the USB disk, and then unmount its file system:

[root@londonmgr1 ˜]# umount /media/usbdisk/

From the /images directory of the CD-ROM, use the dd command to copy the diskboot.img file to the USB device. The following example shows an invocation of dd:

[root@londonmgr1 images]# ls
boot.iso      minstg2.img  README      TRANS.TBL
diskboot.img  pxeboot      stage2.img  xen
[root@londonmgr1 images]# dd if=diskboot.img of=/dev/sdb
24576+0 records in
24576+0 records out
12582912 bytes (13 MB) copied, 3.02538 seconds, 4.2 MB/s

Note that you must use the full device name. In the preceding example, /dev/sdb must be used, not a partition name such as /dev/sdb1; otherwise, the master boot record (MBR) will not be correctly written, the USB disk will not be bootable, and the system BIOS will report a message such as Missing Operating System when a boot from the USB device is attempted. If the USB device has been correctly written, then by default it will run the Anaconda installer in the same mode as when the argument linux askmethod is passed to the boot: installer prompt.

In Text mode, the installer will ask you to choose a language and keyboard type. Next, it will ask you to choose an installation method from the following options: Local CDROM, Hard Drive, NFS Image, FTP, or HTTP. If you're using a USB disk to boot, pressing Next at the first graphical installer screen will raise a warning like this one:

/dev/sdf currently has a loop partition layout.

This warning relates to the USB disk itself; therefore, the option Ignore drive should be selected to prevent this drive from being formatted. Alternatively, you can specify the nousbstorage option at the boot prompt to prevent the warning from being raised at all, as in this example:

linux askmethod nousbstorage

You can use a similar approach to implement a bootable installer CD. Do so by creating an ISO image of the full isolinux/ directory from the first CD-ROM using the mkisofs utility. Next, write this image to a CD with a suitable utility, as explained previously.
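For example, a bootable installer image can be built from a copy of the isolinux/ directory with a command along these lines; this is a sketch based on the standard isolinux boot options, so check the flags against your mkisofs version:

[root@londonmgr1 ~]# cp -r /media/cdrom/isolinux/ .
[root@londonmgr1 ~]# chmod u+w isolinux/*
[root@londonmgr1 ~]# mkisofs -o installboot.iso -b isolinux.bin -c boot.cat \
      -no-emul-boot -boot-load-size 4 -boot-info-table -R -J -v -T isolinux/

The resulting installboot.iso can then be written to CD with cdrecord as described earlier.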

If a server has been previously installed with Linux, it is also possible to locate the install image on the hard disk of that server and configure the GRUB bootloader to boot the installation image. This enables a managed upgrade of a remote server without using an external USB device or CD-ROM.
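For example, you might copy the installer kernel and initial RAM disk from the images/pxeboot directory of the installation media into /boot and add a GRUB entry similar to the following sketch; the device and file names are illustrative:

[root@london1 ~]# cp /media/cdrom/images/pxeboot/vmlinuz /boot/vmlinuz-install
[root@london1 ~]# cp /media/cdrom/images/pxeboot/initrd.img /boot/initrd-install.img

# Additional entry in /boot/grub/grub.conf
title Install Oracle Enterprise Linux 5
        root (hd0,0)
        kernel /vmlinuz-install askmethod
        initrd /initrd-install.img

Selecting this entry at the next reboot starts the Anaconda installer and prompts for the installation method.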

After booting the Anaconda installer, it is necessary to host the installation media with one of the methods detailed previously. We will use FTP as an efficient method for transferring files. This approach also makes it straightforward to configure an FTP daemon (such as VSFTP) to present either multiple CD disk images or a single DVD image from a remote server, as in the following example:

[root@ftp1 OEL5U4-x86-64]# ls
Enterprise-R5-U4-Server-x86_64-disc1.iso  mkdvdiso.sh
Enterprise-R5-U4-Server-x86_64-disc2.iso  V17795-01.zip
Enterprise-R5-U4-Server-x86_64-disc3.iso  V17796-01.zip
Enterprise-R5-U4-Server-x86_64-disc4.iso  V17797-01.zip
Enterprise-R5-U4-Server-x86_64-disc5.iso  V17798-01.zip
Enterprise-R5-U4-Server-x86_64-disc6.iso  V17799-01.zip
Enterprise-R5-U4-Server-x86_64-dvd.iso    V17800-01.zip
[root@ftp1 OEL5U4-x86-64]# mkdir -p /var/ftp/pub/el5_4_x86_64
[root@ftp1 OEL5U4-x86-64]# mount -o loop Enterprise-R5-U4-Server-x86_64-dvd.iso \
    /var/ftp/pub/el5_4_x86_64

You will be able to view the file listing under the /var/ftp/pub/el5_4_x86_64 directory. Next, it is necessary to configure and start the FTP server. If you're using the VSFTP daemon, the most direct way to do this is to edit the file /etc/vsftpd/vsftpd.conf and enable anonymous FTP with the standard /var/ftp directory as the root directory for anonymous login:

# Allow anonymous FTP? (Beware - allowed by default if you comment this out).
anonymous_enable=YES
anon_root=/var/ftp
#

There are a number of optional parameters in the vsftpd.conf file that you may also consider using. For example, by default the xferlog_enable parameter records the details of all FTP Server activity in the file /var/log/xferlog. If the local_enable option is set to YES, then you will also be able to locally test the availability of the installation media. Once you have set and reviewed the available options, you can then start the vsftpd service, as shown in this example:

[root@ftp1 ˜]# service vsftpd start
Starting vsftpd for vsftpd:                                [  OK  ]

You can also verify the configuration of the settings with a local FTP login:

[root@ftp1 ˜]# ftp localhost
Connected to ftp1.
220 (vsFTPd 2.0.1)
530 Please login with USER and PASS.
530 Please login with USER and PASS.
KERBEROS_V4 rejected as an authentication type
Name (localhost:root): anonymous
331 Please specify the password.
Password:
230 Login successful.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> ls pub/el5_4_x86_64
227 Entering Passive Mode (127,0,0,1,34,41)
150 Here comes the directory listing.
drwxr-xr-x    3 0        0            2048 Sep 16 14:24 Cluster
drwxr-xr-x    3 0        0            4096 Sep 16 14:24 ClusterStorage
-rw-r--r--    1 0        0            7037 Sep 08 22:54 EULA
-rw-r--r--    1 0        0           18390 Sep 08 22:54 GPL
-rw-r--r--    1 0        0            3957 Sep 08 22:54 README-en
-rw-r--r--    1 0        0            8394 Sep 08 22:54 README-en.html
-rw-r--r--    1 0        0           14639 Sep 08 22:54 RELEASE-NOTES-en
-rw-r--r--    1 0        0           36477 Sep 08 22:54 RELEASE-NOTES-en.html
-rw-r--r--    1 0        0            1397 Sep 08 22:54 RPM-GPG-KEY
-rw-r--r--    1 0        0            1397 Sep 08 22:54 RPM-GPG-KEY-oracle
drwxr-xr-x    4 0        0          489472 Sep 16 14:24 Server
-r--r--r--    1 0        0            4215 Sep 16 14:24 TRANS.TBL
drwxr-xr-x    3 0        0           10240 Sep 16 14:24 VT
-rw-r--r--    1 0        0            5165 Sep 08 22:54 blafdoc.css
-rw-r--r--    1 0        0            7037 Sep 08 22:54 eula.en_US
-rw-r--r--    1 0        0            3334 Sep 08 22:54 eula.py
drwxr-xr-x    4 0        0            2048 Sep 16 14:24 images
drwxr-xr-x    2 0        0            2048 Sep 16 14:24 isolinux
-rw-r--r--    1 0        0             105 Sep 08 22:54 supportinfo
226 Directory send OK.
ftp>

You can now boot from the install USB disk or CD and specify the FTP server as the location of the installation media. When using this installation method in Text mode, you are asked to select a preferred network interface to configure and to provide the network details necessary to configure the chosen interface on the server being installed. If you do not have DHCP available, select Enable IPv4 support and Manual Configuration; then enter your IP address, subnet mask, gateway, and DNS server addresses. The installer then requests the name or IP address of the FTP server and the Enterprise Linux directory, which is /pub/el5_4_x86_64 in this case. The installer then continues in graphical mode, unless the linux text or linux text askmethod arguments were given at the boot prompt. The installation can then proceed exactly as for a CD-ROM-based installation, which we cover later in this section. This approach also permits simultaneous installations of multiple nodes in the cluster without requiring a change of installation media.

The configuration of the FTP server can also now serve as the location for your local YUM repository for running the Oracle Validated RPM after installation; you will learn more about this later in this chapter.

Another installation alternative relies on a combination of the Preboot Execution Environment (PXE) (which must be supported by the BIOS of the target server) and DHCP and TFTP servers. This combination enables the loading of the boot media across the network, but without requiring a bootable USB flash drive or CD-ROM. You can combine this approach with the Linux Kickstart utility to provide the Linux configuration information. It then becomes possible to complete an unattended installation of Oracle Enterprise Linux across the network. Kickstart can also be configured to provide default installation options for the installation methods described previously.
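The following is a minimal Kickstart sketch for an unattended, FTP-based installation using the FTP server configured earlier; the server name, password, time zone, partition layout, and package selection are placeholders to adapt to your environment, and the partitioning deliberately restricts itself to the internal system drive so that shared storage is never touched:

# ks.cfg - minimal example, not a complete production profile
install
url --url ftp://ftp1.example.com/pub/el5_4_x86_64
lang en_US.UTF-8
keyboard us
network --device eth0 --bootproto dhcp
rootpw changeme
firewall --disabled
selinux --disabled
timezone Europe/London
bootloader --location=mbr
# Partition only the internal system drive (sda assumed); do not touch shared disks
clearpart --drives=sda --all --initlabel
part /boot --fstype ext3 --size=100 --ondisk=sda
part pv.01 --size=1 --grow --ondisk=sda
volgroup VolGroup00 pv.01
logvol swap --fstype swap --name=LogVol01 --vgname=VolGroup00 --recommended
logvol / --fstype ext3 --name=LogVol00 --vgname=VolGroup00 --size=1 --grow
reboot
%packages
@base

Booting the installer with linux ks=ftp://ftp1.example.com/pub/ks.cfg (or serving ks.cfg over NFS or HTTP) then completes the installation without further prompts.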

Your preferred installation method will depend on the number and frequency of installations you are required to perform, as well as the location of the servers to be installed. A single installation of two to four nodes in a local environment from CD-ROM or DVD requires a minimal amount of additional configuration. In contrast, an installation of four or more nodes, an upgrade of servers in a remote location, or a moderate frequency of installations will warrant some additional configuration for an installation method such as FTP, which also brings the benefit of a local YUM repository. Finally, if you need to install hundreds of nodes, or to reinstall frequently in a facility such as a training environment, then it will be well worth your time to evaluate PXE boot and Kickstart.

Installing Oracle Enterprise Linux 5

This section describes installation of Oracle Enterprise Linux version 5, and it is applicable to systems based on x86 and x86-64. Oracle Enterprise Linux version 5 is functionally equivalent to the Red Hat Enterprise Linux Advanced Platform, the full-featured release of Red Hat Enterprise Linux.

As previously discussed, the operating system software can be supplied by a number of methods, including CD-ROM, NFS, FTP, and HTTP. We will explain how to install the OS from CD-ROM; however, the recommendations covered in this section also apply directly to the other installation methods.

Starting the Installation

To start a CD-ROM–based installation on an x86 architecture server, boot the server with the first CD-ROM of the installation set. It may be necessary to configure the BIOS to select the option to boot from CD-ROM. The BIOS can usually be accessed by pressing a key combination when booting the server; the specific key combination depends on the BIOS vendor. Instead of a BIOS layer, more recent systems use the Unified Extensible Firmware Interface (UEFI) as the interface between the operating system and the platform firmware. The EFI firmware resides permanently in NVRAM, enabling access to the boot interface without any additional external software, and EFI systems also provide the option to select the boot device from an interactive menu.

After a successful boot from CD-ROM, select the option presented by the first prompt:

To install or upgrade in graphical mode, press the <ENTER> key.

Press the Enter key to run the Anaconda installer.

Installation Media Check

Before you begin an Oracle Enterprise Linux CD-ROM–based installation session, you are prompted to verify the installation media. We recommend that you test the media before using it for the first time. This can help you avoid unnecessary failures later in the installation process. You can test each individual CD-ROM from the installation set by selecting OK. If the media has already been verified, you can choose Skip to start the Anaconda installer.

Anaconda Installation

The Welcome page shows an Oracle Enterprise Linux splash screen and gives you the opportunity to review the release notes. When you are ready, click Next to continue.

Next, choose the language to use for the installation process and click Next. The language specified here is used during the installation process; a different language can be selected for the operating system that is installed onto the system.

Now choose an appropriate keyboard and click Next. This selection can be modified once the system is operational through the System Settings menu.

Install or Upgrade

The installer searches for the existence of any previous installations of Oracle Enterprise Linux. If there has been an installation of Oracle Enterprise Linux on the system, such as version 4, the installer detects the installation and displays the option to upgrade. We recommend always doing a fresh install to achieve the most consistent installation base across all the nodes in the cluster. Choose Install Enterprise Linux and click Next. If no previous installation is detected or the Linux operating system is not an Oracle Enterprise Linux, then this page will not be displayed, and the installation will proceed to the Disk Partitioning Setup page.

If the partition table on any of the devices is unreadable, the installer cannot detect whether an installation has previously taken place on that device. Instead of the Upgrade page, you will see a warning dialog that offers the option to initialize the unreadable drives. You should do this only for drives on which you do not wish to preserve data. For example, ASM disks previously configured by other nodes in the cluster should not be initialized; this is particularly important in cases where a node is being added to an existing cluster.

Disk Partitioning

Next, we will explain how to install an Oracle Enterprise Linux 5 system suitable for Oracle Database 11g Release 2 RAC. To ensure consistency across the cluster, we recommend accepting the default partitioning scheme.

Creating a Default Partitioning Scheme

To create a default partitioning scheme, begin by selecting the option from the drop-down menu to remove Linux partitions on selected drives. Next, ensure that only the internal system drives are selected for the default layout; you do not want to select external drives that may be shared with the other nodes in the cluster. Also, select the checkbox to review and modify the partitioning layout. You should now see something like what is shown in Figure 6-1. Click Next when you have everything right.


Figure 6.1. The Linux partitioning selection

A warning dialog will be displayed, indicating that proceeding will remove all partitions and data on the selected drives. Click Yes to proceed. Next, you will see the disk setup page, which includes options for partitioning (see Figure 6-2). The top pane displays the drives available and their corresponding sizes; the bottom pane shows the default partitioning scheme.


Figure 6.2. The disk setup

The default partitioning scheme illustrates how Oracle Enterprise Linux creates physical partitions and uses the Logical Volume Manager (LVM) to create a single Logical Volume Group named VolGroup00. This group is concatenated from the selected drives and their partitions. Under these partitions, the two logical volumes LogVol00 and LogVol01 are created for the / (root) partition and swap space, respectively. The first available drive also includes a /boot partition; this partition cannot reside on a Logical Volume because it cannot be read by the GRUB boot loader. The following snippet shows the result of a default partitioning scheme after installation:

Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      886G  2.4G  838G   1% /
/dev/sda1              99M   13M   82M  13% /boot
tmpfs                 7.9G     0  7.9G   0% /dev/shm

Warning

There is no redundancy protection in the default configuration. Thus the failure of a single drive will result in the loss of the / (root) and /boot partitions.

We recommend that you protect your server disk drives by using a form of onboard RAID storage (see Chapter 4 for more information on RAID). If onboard RAID is not available, you should consider the options for manually configuring both RAID and the Logical Volume Manager during the disk partitioning stage, as explained later in this section.

Creating a Partition Manually

You might wish to customize the default partitioning scheme or to implement RAID and LVM manually. If so, then you have a number of options for proceeding. You also face some restrictions related to partitioning disk devices suitable for a node of Oracle Database 11g Release 2 RAC. As with the default configuration on systems based on Oracle Enterprise Linux 5 on x86 and x86-64, a minimum partitioning scheme without LVM must include at least the root partition (/) and swap partition (swap). If LVM is used, then the /boot partition must not be created under LVM, for the reason given in the previous section.

The choice of partitioning formats is dependent on whether the interface between the hardware and operating system is BIOS- or EFI-based; you will learn about your partitioning options in the upcoming sections.

Creating an MBR Partition

The most commonly used partitioning format on BIOS-based systems is the MS-DOS–based master boot record (MBR) format. Typically found on x86 and x86-64 systems, a BIOS-based MBR format allows up to four primary partitions to be created on each disk. One of these primary partitions can be extended and subdivided into logical partitions. The logical partitions are chained together. Theoretically, this means the maximum number of logical partitions is unlimited. However, for SCSI disks, Linux limits the total number of partitions to 15. A 2TB limit is also enforced within the MBR formatting scheme. And because the fdisk command only supports MBR formatting, fdisk also has a limit of 2TB. The MBR format stores one copy of the partition table. It also stores some system data in hidden or unpartitioned sectors, which is an important factor when trying to ensure that disk partitions are correctly aligned with external storage arrays. You'll learn more about this topic later in this chapter.

Note

In Oracle Enterprise Linux 5, the Anaconda installer on x86 and x86-64 systems only supports MBR formatting. Therefore, it is not possible to create partitions larger than 2TB during installation. To create partitions of greater than 2TB, you need to use the GUID partition table (GPT) format, which you'll learn more about later in this section. Anaconda only supports the GPT format on the Itanium architecture.

Because Oracle Enterprise Linux 5 is based upon the 2.6 Linux kernel, SCSI commands support 64-bit block addresses. This means they can support individual devices beyond 2TB in size. However, the Host Bus Adapter (HBA), device driver, and the storage itself must all also support 64-bit block addressing before you can use it. If any of these components are limited to 32-bit block addresses, then a single device will also be limited to 2TB in size, regardless of the formatting scheme. Potential device and partitioning limitations aside, both the default ext3 file system and the OCFS2 file system support a maximum file system size of 16TB and a maximum individual file size of 2TB. Also, when using LVM, the maximum logical volume size is 16TB on x86 and 8EB (exabytes) on x86-64. Therefore, if you wish to create an ext3 file system larger than 2TB on a system with MBR formatting, it is possible to use LVM directly on a single disk device that has not been partitioned. It is also possible to concatenate a number of 2TB partitions with LVM, and then create the ext3 file system on the resulting logical volume.
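For example, an ext3 file system larger than 2TB could be built on an MBR-formatted system by concatenating several partitions of up to 2TB each with LVM; the device and volume names below are illustrative only:

[root@london1 ~]# pvcreate /dev/sdc1 /dev/sdc2
[root@london1 ~]# vgcreate datavg /dev/sdc1 /dev/sdc2
[root@london1 ~]# lvcreate -l 100%FREE -n datalv datavg
[root@london1 ~]# mkfs.ext3 /dev/datavg/datalv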

Creating an EFI Partition

EFI-based systems were introduced with Itanium-based systems, but they are now present on some of the more recent x86 and x86-64 systems. EFI-based systems also support MBR formatting; however, you may consider using the GPT format instead for the advantages that this scheme brings. First, the GPT format is not limited to the primary and logical partitioning of the MBR; instead, it can support up to 128 partitions per disk.

Subject to using 64-bit SCSI addressing and a 2.6 kernel–based system, the GPT format can be up to 18EB in size—considerably larger than the 2TB limit. In addition, the GPT format uses primary and backup partition tables and checksum fields for redundancy and improved integrity. This can help you protect against partition table corruption. GPT partition tables also do not store any data in hidden sectors, which makes alignment offsets for external storage arrays unnecessary. Because the fdisk command does not support the GPT format, the parted command must be used instead.
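For example, a GPT label and a single large partition could be created on a dedicated device with parted, as in this sketch; the device name and size are illustrative:

[root@london1 ~]# parted /dev/sdd mklabel gpt
[root@london1 ~]# parted /dev/sdd mkpart primary 0 3000GB
[root@london1 ~]# parted /dev/sdd print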

The ability to implement GPT partitions after installation is not restricted to EFI-based systems. However, when using an unmodified version of GRUB, the default bootloader on x86 and x86-64, it is not possible to boot from a GPT partition. Therefore, GRUB must always be installed on an MBR-formatted partition, which means it is not possible to boot from a device with partitions larger than 2TB.

Customized Partitioning

It's possible you will want to customize the default partitioning scheme. Many system administrators prefer to partition the system with a number of distinct partitions for / (the root device), swap, /boot, /home, /var, /usr, ORACLE_BASE, and other site-dependent configurations. Two of the main historical reasons for adopting this schema were to reduce both the risk of corruption and the time required to complete file system checks in the event of a system restart when using a nonjournaled file system. We have found that, for Linux-based Oracle Database 11g Release 2 RAC nodes running journaled file systems such as ext3, the minimal configuration provided by the default installation is the most flexible approach, and it is therefore what we recommend. When installing the Oracle software, the database software is usually located in a directory tree below the ORACLE_BASE directory. Under the Optimal Flexible Architecture (OFA), the first mount point is the /u01 directory. This means that, in the default configuration, the /u01 directory can be created in the root partition; thus, we recommend that you have at least 10GB of disk space available in the root partition for the Oracle software installation.

The space required for the boot device is dependent on the number of kernels compiled and configured for booting. Oracle Database 11g Release 2 RAC nodes only require the standard precompiled kernel, so space utilization rarely exceeds 30MB. The actual space used is dependent on the hardware architecture. However, if you're customizing the partition layout, then the 100MB allocated by the default configuration is sufficient.

Configuring Swap Space

Oracle's recommendations for swap space on Linux vary, ranging from equal to the amount of RAM up to four times the RAM. Consider the following documented example for Oracle 11g Release 2, which is typical: for systems with between 1GB and 2GB of RAM, Oracle recommends swap space of 1.5 times the size of RAM; for systems with between 2GB and 16GB of RAM, Oracle recommends that the swap size equal the size of RAM; and for systems with more than 16GB of RAM, Oracle recommends 16GB of swap space. The maximum size of a single swap partition in Oracle Enterprise Linux 5 is determined by the limits on the partition or logical volume size, as discussed previously.

Swap space can be regarded as the disk-based component of virtual memory; in Linux, unlike some other UNIX operating systems, virtual memory is the sum of RAM plus swap space. Recommendations to make the swap size one or two times the size of RAM are often based on the requirements of traditional UNIX operating systems, where an executable program was loaded into both swap space and RAM at the same time. Oracle's guidelines on swap allocation tend to reflect these traditional requirements. Linux, on the other hand, maintains swap space for demand paging. When an executable is loaded on Linux, it is likely that a portion of the executable will not be required, such as the error-handling code; for efficiency, Linux only loads virtual memory pages when they are used. If RAM is fully utilized and one process requires loading a page into RAM, Linux will write out a page from another process to swap space, according to a Least Recently Used (LRU) algorithm.

The performance of disk-based storage is orders of magnitude slower than memory; therefore, you should plan never to be in the position of having pages written to swap space because of demand for RAM from competing processes. Oracle should always run in RAM, without exception. As a basic rule, you do not need a significant amount of swap space for Oracle on Linux, assuming enough memory is installed. Every rule has an exception, however. Linux tends to be reasonably aggressive in managing swap space, maintaining as much free RAM as possible by writing out memory pages that have not been recently used, even if there is no current demand for the memory at the time. This approach is designed to avoid a greater performance impact later, when the space is actually required, and it also frees up RAM for other potential uses, such as the disk buffer cache. This swapping behavior can be modified with the kernel parameter vm.swappiness: a higher value means the system will be more active in unmapping mapped pages, while a lower value reduces swapping activity at the expense of RAM that could otherwise be freed. The default value is 60, which you can see in /proc/sys/vm/swappiness. Note that memory allocated as huge pages is never swapped under any circumstances. The vm.panic_on_oom kernel parameter also affects virtual memory. By default, this parameter is set to 0: if all virtual memory is exhausted (including the swap space), the operating system recovers memory by terminating processes with high levels of memory consumption but low CPU activity. If this kernel parameter is set to 1, a kernel panic will occur instead, terminating all system activity; this is the preferred approach in a RAC environment. However, you should endeavor to configure virtual memory so that the condition that provokes a kernel panic is never reached.
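For example, both parameters can be inspected and then set persistently through /etc/sysctl.conf; the values shown are purely illustrative and should be chosen for your own workload:

[root@london1 ~]# sysctl vm.swappiness vm.panic_on_oom
vm.swappiness = 60
vm.panic_on_oom = 0
[root@london1 ~]# cat >> /etc/sysctl.conf <<EOF
vm.swappiness = 40
vm.panic_on_oom = 1
EOF
[root@london1 ~]# sysctl -p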

With this information, you can intelligently size the swap space for Oracle on Linux. You can also evaluate the correct choice between what may appear to be conflicting recommendations. First, any requirement for swap space that is two to four times the size of RAM is an overspecification based on traditional UNIX requirements. Second, although one given recommendation is for swap of 1.5 times the size of RAM with installed RAM of 1GB to 2GB, we recommend that you install at least 2GB of RAM per installed CPU socket. Therefore, if you're meeting these guidelines, you should not plan for more swap space than you have RAM.

For the basic operating environment, swap space should not be utilized during normal usage. With enough memory, even a single 2GB swap partition will often suffice. The exception occurs in the case where many Oracle user processes or other processes are created and intersperse active periods with idle time. In this scenario, it is necessary to allocate swap space to account for cases where these processes are temporarily paged out of memory. This type of application profile may actively use more than 2GB of swap space without having any impact on overall performance; also, it can potentially increase performance by freeing RAM that is not being used. However, it is also likely under this application profile that an Oracle shared server configuration would be more efficient.

The swap space needed will depend on application requirements. Given sufficient disk space, we recommend adopting the simplest single starting point for swap space allocation for Oracle on Linux: plan for an amount of swap space equal to the amount of RAM installed in the system. If you know that you will not create a large number of processes on the system, then you may reduce the allocation accordingly. If you do have a large number of processes but sufficient RAM, then you can evaluate using vm.swappiness to reduce the usage of swap space. If additional swap space is required, you can use the mkswap command at a later point in time; one benefit of this command is that it can also be used while the system is operational. Conversely, if you have a large RAM configuration, then sizing the swap space at many tens or hundreds of gigabytes to match your installed RAM will not be detrimental to performance, although it may use disk space unnecessarily. If this is the case, and you wish to conserve disk space, then you may size a swap partition smaller than the installed amount of RAM. Oracle advises a 16GB swap partition for configurations of more than 16GB of RAM. However, this does not mean that you cannot have a swap partition of more than 16GB if you wish. For example, you might use more than 16GB of swap space if you have enough disk capacity that conserving the disk space between 16GB and your available RAM size is not a concern.
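
If you do need to add swap space while the system is operational, one option is a swap file. The following is a minimal sketch in which the file name, location, and 2GB size are arbitrary examples; an /etc/fstab entry would also be needed to make the addition persistent across reboots:

[root@london1 ~]# dd if=/dev/zero of=/extraswap bs=1M count=2048
[root@london1 ~]# chmod 600 /extraswap
[root@london1 ~]# mkswap /extraswap
[root@london1 ~]# swapon /extraswap
[root@london1 ~]# swapon -s

The swapon -s command lists the active swap devices and files, so you can confirm that the new space is in use.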

Configuring RAID

If you do not have a system with onboard RAID, you may wish to consider protecting your Linux operating system disk drives with an add-in RAID adapter. We recommend a hardware RAID solution wherever possible because it is a far superior solution to its software equivalent. If you cannot configure a hardware RAID solution and wish to create a partitioning scheme with some level of redundancy, then there are two ways in which this can be achieved. First, you can use software RAID, and then use the configured RAID device as the underlying device for an LVM configuration. Second, you can use LVM to mirror devices. As previously noted, the /boot partition cannot reside on an LVM volume. When mirroring drives with a RAID 1 configuration (see Chapter 4 for more information on RAID), LVM reads from one side of the mirror only. In a software-based RAID solution, a RAID 1 configuration reads from both mirrored devices, which results in higher levels of performance.

It is also possible to configure the /boot partition on a software RAID device, provided RAID 1 is used and the device includes at least one of the first two drives in the system. If the /boot directory is not a separate partition from the / (root) directory, then these conditions apply to the / (root) directory instead. For these reasons, we recommend software RAID over LVM if you're manually configuring redundancy. However (and this cannot be overemphasized), you should opt for hardware RAID wherever possible. Hardware-based RAID can protect against a drive failure transparently, but software-based RAID may require rebooting the server if a drive fails. The advantage of the scenario just described is this: if the boot drive is in a software RAID configuration, then downtime is restricted to the time required for the reboot to take place, as opposed to the time required to restore from a backup.

The following example shows a software RAID 1 equivalent of the default partitioning scheme on the first two drives in the system. To begin configuring software RAID on the disk partitioning page, select Create custom layout from the dropdown menu and press Next. On the disk setup page, check whether there are any existing partitions or logical volumes. If so, use the Delete button to remove the disk configuration, so that the chosen drives can display free space (see Figure 6-3).

Figure 6.3. Setting up software-based RAID 1

Next, select the RAID button. This will present you with a single option: Create a software RAID partition. Press OK to display the Add Partition window. As noted previously, it is not possible for the /boot partition to reside on a logical volume. For that reason, if the / (root) and swap are to utilize LVM, then you need to create four software RAID partitions independently. Create the first two partitions on /dev/sda and /dev/sdb and make them 100MB each. Next, create two more partitions, one on each drive, sized to fill the maximum allowable space remaining on each drive. Figure 6-4 shows the created RAID partitions.

Figure 6.4. Creating the RAID partitions

Press the RAID button again to make the following option available: Create a RAID device. This screen also shows that you now have four software RAID partitions you can use. Select the Create a RAID device option to display the Make RAID Device window. Next, choose the following options: /boot for the Mount Point, ext3 for the File System Type, md0 for the RAID Device, and RAID 1 for the RAID Level. For the RAID members, select the two 100MB software RAID partitions created previously, and then press OK. /boot is created on /dev/md0. Repeat the process to display the Make RAID Device window. This time, however, select a File System Type of physical volume (LVM), the RAID Device md1, and RAID Level 1. Also be sure to select the two remaining RAID members. Click OK to create the physical volume. Now press the LVM button to display the Make LVM Volume Group window, and then press the Add button to create the logical volumes for the / (root) and swap (see Figure 6-5).

Figure 6.5. Creating the LVM Volumes

Press OK to complete the disk partitioning. At this point, you have created the default configuration with RAID 1 across the first two drives in the system (see Figure 6-6).

Figure 6.6. The software RAID configuration

After installation, the partitioning layout will look something like this:

[root@london2 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      433G  1.3G  409G   1% /
/dev/md0               99M   13M   82M  14% /boot
tmpfs                 7.9G     0  7.9G   0% /dev/shm

After completing your chosen partitioning scheme, click Next to save the partitioning map and advance to the Boot Loader Configuration page on x86 and x86-64 systems. The disks will not be physically partitioned at this point. This won't occur until after all of the installation information has been collected, but before the installation of packages has commenced.

Configuring the Boot Loader and Network

The Boot Loader Configuration page is displayed on x86 and x86-64 systems. Accept all of the default GRUB bootloader options, and then click Next to continue to the Network Configuration page.

An Oracle Database 11g Release 2 RAC node needs at least two network devices, one for the external interface and one for the private interconnect network. These network devices should already have been cabled and configured for their respective networks by a network administrator. Assuming Ethernet is used, these will by default be displayed as devices with the following names: eth0, eth1, eth2, and so on. If you plan to use NIC bonding (also known as NIC teaming) for interconnect resilience (see Chapter 4 for more information on this topic), the Network Configuration page will display more than the required number of devices. Teaming cannot be configured until after installation is complete, so you should only configure the primary devices, leaving the Active on Boot checkbox unchecked for the planned secondary devices. The primary device configuration created at this stage will be useful when migrating to the teamed configuration.

Whether using single network devices or a teamed configuration, you need to ensure that all nodes have the same interface name for the external device. For example, if eth0 is configured as the public interface on the first node, then eth0 should also be selected as the public interface on all of the other nodes. This is a requirement for the correct operation of the VIP addresses configured during the Oracle Grid Infrastructure software installation.

On the Network Configuration page, highlight the primary external devices—for example, eth0—and click Edit. This brings up the Edit Interface dialog box for eth0. Whether you are using a GNS configuration or a manual IP configuration, we recommend that you use fixed IP addresses for the public and private interfaces on all nodes. Therefore, you should uncheck the Configure Using DHCP checkbox. If using GNS, DHCP will be used during the Oracle install to configure the VIP and SCAN addresses. However, DHCP is not required directly on the host network configuration. If you are running a network installation, then the configuration for your primary interface will already be populated from the network information given previously.

Next, you should ensure that the Activate on Boot checkbox is selected, and then complete the IP Address and Netmask according to your planned network configuration. IPv6 is not supported in a RAC configuration, so it should be disabled on all interfaces; click OK after entering the network configuration. Next, complete the information for the private interconnect device using an IP address from Table 6-5. These IP addresses have been reserved for use in private networks. For most clusters with fewer than 254 nodes, a class-C network address with a nonsegmented subnet mask (255.255.255.0) should be sufficient.

Table 6.5. IP Addresses for Private Networks

Class     Networks                               Subnet Mask
A         10.0.0.0 through 10.255.255.255        255.0.0.0
B         172.16.0.0 through 172.31.0.0          255.255.0.0
C         192.168.0.0 through 192.168.255.0      255.255.255.0

If you provided the DNS Server IP Address for a network-based installation, then the DNS Server address will already be completed, and the hostname will already be configured with the full domain extension, such as london1.example.com. Alternatively, you can manually set the hostname and any gateway or DNS server addresses according to your network configuration.
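
The settings entered on this page are ultimately written to the interface configuration files under /etc/sysconfig/network-scripts, which is worth knowing when checking the configuration later. The following sketch shows what such a file might look like for a private interconnect interface; the device name and addresses are purely illustrative and must match your own network plan:

# /etc/sysconfig/network-scripts/ifcfg-eth1 (illustrative example)
DEVICE=eth1
BOOTPROTO=static
IPADDR=192.168.1.1
NETMASK=255.255.255.0
ONBOOT=yes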

Your naming conventions should allow for additional nodes, as well as for any standby databases that may be added in the future. Our example uses a geographically based naming convention, where the first node in the cluster is london1, and the second node is london2. A third node would be london3, and so on. Figure 6-7 shows how to configure host london1 using an external network configuration on eth0 and an interconnect configuration on eth1.

Figure 6.7. The network configuration

When you have finished updating the network settings, click Next to continue to the Time Zone Selection page.

Selecting a Time Zone

The time zone required will be determined by the geographical location of your cluster. All nodes within a cluster should have the same time zone configuration. You can select the nearest location on the interactive map or from the dropdown list beneath the map.

Oracle Enterprise Linux allows you to specify whether the hardware clock should be set to UTC/GMT time or local time. We recommend setting the system clock to UTC by checking the System Clock Uses UTC option. This enables automatic adjustment for regional daylight savings time. Setting the hardware clock to local time is useful only for systems that are configured to dual boot with other operating systems that require a local time setting. However, it is unlikely that you will want to configure dual booting for a production Oracle Database 11g Release 2 RAC node.
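
The selection made here is recorded in /etc/sysconfig/clock, so you can confirm it after installation; a typical result for a UTC hardware clock might look like the following, where the zone shown is just an example:

# /etc/sysconfig/clock (illustrative)
ZONE="Europe/London"
UTC=true
ARC=false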

After selecting the correct system time zone, click Next to continue to the Root Password Configuration page.

Configuring the Root Password

The password for the root user account must be at least six characters in length. Enter and confirm a suitable password, and then click Next to continue to the Package Installation Defaults page.

Reviewing the Package Installation Defaults

The Package Installation Defaults page gives you a pair of options. First, you can accept a default set of packages for the Linux installation. Second, you can customize the set of packages available. In addition to the default installation, Oracle provides an RPM termed the Oracle Validated RPM. When installed after the operating system, this RPM pulls in any packages required for running an Oracle environment and automatically configures system settings. For example, it creates the Oracle user and related groups, as well as configuring the correct kernel parameters. The Oracle Validated RPM is included in the installation media for Oracle Enterprise Linux 5.2 and Oracle Enterprise Linux 4.7 upwards on x86 and x86-64. In addition, Oracle customers subscribed to the Unbreakable Linux Network (ULN) can retrieve the Oracle Validated RPM from there. Customers not subscribed to the ULN can obtain the RPM from the same URL used to obtain the release notes. For example, the Oracle Validated RPM for Oracle Enterprise Linux 5 can be downloaded from this location: http://oss.oracle.com/el5/oracle-validated/.

Selecting a Package Group

The availability of the Oracle Validated RPM gives you two options for installing Oracle Enterprise Linux: you can do either a minimal install or a default install. Running the Oracle Validated RPM after installing Oracle Enterprise Linux completes the configuration by installing the remaining Oracle-required packages. You should select a minimal installation only if you require the lowest number of packages to install and run Oracle. A minimal installation also has the advantage of reducing the number of unneeded services that will run on the system, which frees resources for Oracle's use. For example, even after you run the Oracle Validated RPM, the minimal installation option does not install the X Windows server packages, which means a graphical environment cannot be run directly on the server. However, the installation does include the X Windows client libraries, which enable graphical tools to be run on the local system but displayed on a remote system with a full X Windows installation. Before choosing a minimal installation, you should also be confident that you already have, or are able to configure, a suitable YUM repository to resolve package dependencies when running the Oracle Validated RPM. You will learn more about configuring this repository later in this chapter. Assuming you're ready to proceed with a minimal installation (see Figure 6-8), select the Customize now checkbox at the Package Group Selection page, and then press Next.

The Package Group Selection screen includes the option to modify the package selection grouped under a number of headings, such as Desktop Environments, Applications, and Development. Under each of these headings, the right-hand pane lets you deselect all of the packages, except for the Base packages detailed under Base System. The Base system also includes the openssh package, which is required for internode communication in RAC.

Figure 6.8. A minimal package group selection

It's possible you might wish to choose a more complete initial installation at the Package Group Selection page. If so, then leave the default selection of packages unmodified, select the Customize later checkbox, and press Next. If you do not wish to run the Oracle Validated RPM, or the RPM is not available for your architecture, then it is necessary to modify the default selection of packages at installation time. For this option, select the Software Development and the Customize now checkboxes, and then press Next. With the Development option selected in the left-hand pane, select Legacy Software Development in the right-hand pane. Next, under the heading Base System in the left-hand pane, select Legacy Software Support in the right-hand pane, and then press the Optional packages button. Now select compat-db under Packages in the Legacy Software Support window and press Close. Similarly, select sysstat under Packages in the System Tools window, found under the System Tools heading.

When you have selected all of the required packages for your installation, click Next and the installer will check the dependencies in the packages you have selected before continuing to the Begin Installation page.

Installing Packages

Click Next to proceed with the installation from the Begin Installation page. This will display the Required Install Media dialog box, which shows the CDs that will be needed if you're installing from CD-ROM media, depending on the packages selected. Click Continue to initiate the installation process.

The Installing Packages page shows the packages currently being installed, and it gives an estimate of the time remaining to complete the installation. If you are not performing a network installation, change the CD-ROM when prompted by the installer.

After you install the packages, you need to remove any remaining CD-ROMs or bootable USB drive, and then click Reboot.

Setting the Final Configuration

If you perform a minimal installation, on the first boot the system will present a text dialog to run a number of administrative tasks; pressing Exit takes the system to a nongraphical login prompt. If you perform a default graphical installation, the system will call the /etc/rc.d/init.d/firstboot script the first time the system boots after installation. This enables you to set the final configuration before the installation process completes. When this script runs, it creates the file /etc/sysconfig/firstboot, which includes the following entry: RUN_FIRSTBOOT=NO. To rerun firstboot on any subsequent reboot, it is necessary to remove this file and ensure the firstboot script is called during the boot process. You can do this by executing the command chkconfig --level 5 firstboot on. When firstboot runs, the Welcome screen is displayed. The left-hand pane of this page shows the stages required to complete the firstboot process. Click Forward to advance to the first of these stages, the License Agreement page.
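
For example, to force firstboot to run again on the next boot into runlevel 5, you would remove the marker file and re-enable the service, as described above:

[root@london1 ~]# rm /etc/sysconfig/firstboot
[root@london1 ~]# chkconfig --level 5 firstboot on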

Accepting the License Agreement

Browse the license agreement to be aware of the conditions, and then select Yes, I Agree to the License Agreement. You must accept the license agreement to complete the firstboot process. Click Forward to continue to the Firewall page.

Configuring the Firewall

Oracle Enterprise Linux enables you to configure varying degrees of network security. However, for an Oracle Database 11g Release 2 RAC node, enabling a firewall configuration can inhibit the correct functioning of Oracle services. For the Oracle Database 11g Release 2 RAC, we suggest you implement a dedicated firewall infrastructure on the network, but that you do not configure a firewall on any of the nodes themselves. Therefore, choose the Disabled option from the dropdown menu on the Firewall page and click Forward to continue. This will present an additional warning dialog box; click Yes to advance to the SELinux page.
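
If a firewall is later found to be enabled on a node, it can also be switched off after installation; assuming the standard iptables service, the commands would look something like this:

[root@london1 ~]# service iptables stop
[root@london1 ~]# chkconfig iptables off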

Configuring SELinux

The Security Enhanced Linux (SELinux) page provides the option to configure security settings, so they comply with a number of security policies issued by regulatory bodies. Just as you did when configuring the firewall, select the Disabled option from the dropdown menu and click Forward to continue. Another warning dialog box is displayed; click Yes to advance to the Kdump page.
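
The choice made here is stored in /etc/selinux/config, so it can be checked or changed after installation (a change requires a reboot to take effect); a disabled configuration looks like this:

# /etc/selinux/config (key entries)
SELINUX=disabled
SELINUXTYPE=targeted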

Enabling kdump

kdump is a crash-dump utility based on the system calls provided by kexec. This utility provides system state information at the time of a system crash. By default, this information is stored in /var/crash.

Note

In an Oracle 11g Release 2 RAC environment, we do not recommend enabling kdump by default during installation. There are two reasons for this. First, the options available during installation are limited to a local crash-dump configuration. Second, the inherent stability of the overwhelming majority of Oracle 11g Release 2 RAC on Linux environments means that you may simply not need to consider crash-dump analysis as part of your operations.

However, if you do find that you need such a utility at a later point, then we recommend configuring a net dump server to send crash dumps to a remote server across the network. You can then configure kdump on the local system using the graphical command system-config-kdump; run this command as the root user. You can learn more about kdump configuration in the /etc/kdump.conf file, and authentication with the net dump server is enabled by the command service kdump propagate. After a reboot, kdump can then be managed as a service.

In addition to enabling kdump for debugging purposes, you may also consider setting the kernel parameter kernel.panic_on_oops to a value of 1. By default, this value is set to 0. If this parameter is left at its default value, a kernel error may cause an oops and terminate the process involved in the error. If this happens, the kernel may be left in an uncertain state, which means that a kernel panic may occur at a later point, when the affected resources are next used, rather than at the time they were corrupted. Setting kernel.panic_on_oops to 1 means that the system will not attempt to continue past an error, and a kernel panic will occur at the time of the oops. This behavior can deliver better root cause analysis. For example, it might help pinpoint an error to a particular device driver.
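
If you decide to adopt this setting, it can be appended to /etc/sysctl.conf alongside the other kernel parameters and applied with sysctl; this is simply an illustration of the syntax:

# append to /etc/sysctl.conf
kernel.panic_on_oops = 1

[root@london1 ~]# sysctl -p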

If you choose to accept our recommendation and not enable kdump at this time, leave the Enable kdump checkbox unselected and press Forward to move to the Date and Time page.

Setting the Date and Time

The Date and Time page displays the information received from the internal system clock. Before installing Oracle Database 11g Release 2 RAC, it is essential that all of the nodes within the cluster be configured with exactly the same time settings. You can do this by selecting the Network Time Protocol tab. If you are setting the Network Time Protocol here, select the Enable Network Time Protocol checkbox. If you do not have specific NTP servers available, there are public NTP servers given by default at these addresses: 0.rhel.pool.ntp.org, 1.rhel.pool.ntp.org, and 2.rhel.pool.ntp.org. You have a couple of options here. First, you can use these public servers. Second, you can delete the public servers from the Server listbox and instead add the names of the NTP servers you have permission to use. Click Forward and the system will contact the specified NTP servers before proceeding to the Create User page. If you do not configure NTP either at this stage or manually at a later point, then the Oracle software installation will report that the NTP configuration is missing. Even so, it will proceed and configure the Oracle Cluster Time Synchronization Service daemon (octssd) in active mode. This ensures that the time is configured identically on all nodes in the cluster. However, NTP has the advantage that this unified time will also be set to the correct clock time.
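
If you prefer to configure NTP manually after installation (NTP configuration is discussed further later in this chapter), the server entries are placed in /etc/ntp.conf and the ntpd service is enabled; the following sketch simply reuses the public pool servers named above as placeholders:

# example server entries in /etc/ntp.conf
server 0.rhel.pool.ntp.org
server 1.rhel.pool.ntp.org
server 2.rhel.pool.ntp.org

[root@london1 ~]# service ntpd restart
[root@london1 ~]# chkconfig ntpd on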

Creating a User

The Create User page lets you create a user account in addition to the standard root user on the system. However, the Oracle user and groups require more specific settings than are available here, so they should instead be created by the Oracle Validated RPM or manually after installation. Therefore, click Forward, and then click Continue to bypass the warning dialog box. This skips creating any additional users at this point. A subsequent page will be displayed if a sound card is detected on the system. However, it is unlikely that a sound card will be present on a server dedicated to Oracle. If one is detected, just click the Play button on the displayed page to test the configuration. Next, click Forward to continue to the Additional CDs page.

Installing Additional CDs

The Additional CDs page lets you install applications in addition to the standard Red Hat Enterprise Linux software. No additional software is required to be installed to complete an Oracle configuration at this stage. Click Finish to complete the installation, and then click OK at the Reboot dialog window if SELinux has been disabled.

Configuring Oracle Enterprise Linux 5

This section covers the steps required to configure the Linux operating system in preparation for an Oracle 11g Release 2 RAC installation. Our aim goes beyond simply listing the configuration changes required. Rather, we also want to explain the reasons for these changes and their effects. Doing so will enable you to optimize each environment intelligently.

Unless stated otherwise, each configuration step should be performed by the root user on every node in the cluster. We recommend that you fully install all nodes in the cluster and confirm that the network is operational before attempting to configure the shared storage.

This chapter focuses on changes that are either essential or of great benefit for installing and operating Oracle RAC. Next, we will discuss the following configuration and verification topics:

  • Configuring the Oracle Validated RPM and YUM

  • Running the Oracle Validated RPM

  • Verifying the Oracle Validated RPM Actions

  • Post Oracle Validated RPM configuration of the Oracle user, kernel, and kernel module parameters

  • Completing the Linux configuration with hostnames and name resolution

  • Setting the Network Time Protocol (NTP)

  • Implementing a secure shell

  • Setting up shared storage, including udev and device-mapper

  • Handling network channel bonding

  • Configuring IPMI

All of the configuration changes described in this chapter can be made to the standard Oracle Enterprise Linux installation. However, your hardware configuration can make some steps optional. For example, you may not need to configure I/O multipathing or IPMI. This chapter will assist you in implementing the required settings for a successful Oracle 11g Release 2 RAC installation. It will also help you understand why these settings are required to achieve an optimal configuration.

Configuring a Server with the Oracle Validated RPM

The fastest way to configure an Oracle Enterprise Linux server for an Oracle database installation is to run the Oracle Validated RPM. Although that RPM is located on the installation media, by default it is not installed with the operating system. Therefore, we recommend installing and running the Oracle Validated RPM because it can help you ensure a consistent and complete installation across all nodes of the cluster. However, simply copying and running the Oracle Validated RPM after a default installation will fail because the RPM has several additional dependencies, as shown in the following example:

[root@london1 ~]# rpm -ivh oracle-validated-1.0.0-18.el5.x86_64.rpm
warning: oracle-validated-1.0.0-18.el5.x86_64.rpm: Header V3 DSA signature: NOKEY, key ID 1e5e0159
error: Failed dependencies:
        compat-gcc-34 is needed by oracle-validated-1.0.0-18.el5.x86_64
        compat-gcc-34-c++ is needed by oracle-validated-1.0.0-18.el5.x86_64
        libXp.so.6 is needed by oracle-validated-1.0.0-18.el5.x86_64
        libaio-devel is needed by oracle-validated-1.0.0-18.el5.x86_64
        libdb-4.2.so()(64bit) is needed by oracle-validated-1.0.0-18.el5.x86_64
        libodbc.so.1()(64bit) is needed by oracle-validated-1.0.0-18.el5.x86_64
        sysstat is needed by oracle-validated-1.0.0-18.el5.x86_64
        unixODBC-devel is needed by oracle-validated-1.0.0-18.el5.x86_64

There's good news and bad news. The bad news first: Resolving these dependencies manually can be a time-consuming process. The good news: Oracle Enterprise Linux comes with a tool called the Yellow dog Updater, Modified (YUM) that can help you resolve these dependencies automatically and automate the installation of the Oracle Validated RPM.

Configuring YUM

A YUM server provides a repository for RPM packages and their associated metadata, which makes installing the packages and their dependencies straightforward. Oracle provides a public YUM server at http://public-yum.oracle.com, but this server provides only the packages already available on the installation media. Subscribers to the Unbreakable Linux Network can access additional security updates and patches on top of the content available on the public YUM server. If you do not have access to the Unbreakable Linux Network or you do not wish to use the public YUM server, it is a simple enough process to configure your own from the installation media.

Previously in this chapter, we explained how to configure VSFTP to make the installation media available across the network. Fortunately, this installation media also contains the YUM repository metadata. For example, the metadata for the Server RPMs is located under the Server/repodata directory in the file repomd.xml. This means you do not have to create additional metadata by building a custom repository with the createrepo command. You can test the readiness of the installation media for use as a YUM repository with the wget command. This should enable you to retrieve a selected file without any additional steps, as shown in this example:

[root@london1 tmp]# wget ftp://ftp1.example.com/pub/el5_4_x86_64/Server/oracle-validated*
...
2009-09-17 13:54:06 (6.10 MB/s) - `oracle-validated-1.0.0-18.el5.x86_64.rpm.1' saved [15224]

If you prefer, your local YUM repository can also be based on a mounted DVD-ROM. However, this repository will be local to an individual server. With the repository available, the next step is to configure YUM on the client. To do this, edit the file /etc/yum.conf and add the following section to the end of the file:

[Server]
name=Server
baseurl=ftp://ftp1.example.com/pub/el5_4_x86_64/Server/
gpgcheck=0
enabled=1

Setting the gpgcheck option to a value of 1 ensures that all packages are signed and that YUM will verify the signatures. As the preceding example shows, it is not strictly necessary to set this value if the YUM server is private and based on the Oracle installation media. However, you should set the value to 1 when using public YUM servers. The baseurl property specifies the location of the server RPMs; in the preceding example, it points to the FTP server. If a local DVD is used, then baseurl takes the form baseurl=file:///media/disk/Server/, reflecting the location of the mounted DVD. It is also possible to copy the DVD installation media to the local server, mount it with the loop option, and specify this as the file location. However, this approach is likely to be considerably more time consuming than simply configuring the FTP server. To check the configuration, run the command yum list. If correctly configured, this command will list the RPMs shown as installed, as well as the ones listed under the Server category.
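
For example, a repository section based on a locally mounted DVD might look like the following sketch, where the /media/disk mount point is only an example:

[Server]
name=Server
baseurl=file:///media/disk/Server/
gpgcheck=0
enabled=1

After editing /etc/yum.conf, running yum clean all followed by yum list confirms that the repository metadata can be read.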

Running the Oracle Validated RPM

The system is now ready for you to install the Oracle Validated RPM with the yum install oracle-validated command. Once all the dependencies have been resolved, it is necessary to confirm the installation at the Is this ok prompt, as shown in the following truncated output:

[root@london1 etc]# yum install oracle-validated
Loaded plugins: security
Setting up Install Process
Resolving Dependencies
...
Is this ok [y/N]: y
...
Installed:
  oracle-validated.x86_64 0:1.0.0-18.el5

Dependency Installed:
  compat-db.x86_64 0:4.2.52-5.1
compat-gcc-34.x86_64 0:3.4.6-4
  compat-gcc-34-c++.x86_64 0:3.4.6-4
  elfutils-libelf-devel.x86_64 0:0.137-3.el5
  elfutils-libelf-devel-static.x86_64 0:0.137-3.el5
  gcc.x86_64 0:4.1.2-46.el5
  gcc-c++.x86_64 0:4.1.2-46.el5
  gdb.x86_64 0:6.8-37.el5
  glibc-devel.i386 0:2.5-42
  glibc-devel.x86_64 0:2.5-42
  glibc-headers.x86_64 0:2.5-42
  kernel-headers.x86_64 0:2.6.18-164.el5
  libXp.i386 0:1.0.0-8.1.el5
  libaio-devel.x86_64 0:0.3.106-3.2
  libgomp.x86_64 0:4.4.0-6.el5
  libstdc++-devel.x86_64 0:4.1.2-46.el5
  sysstat.x86_64 0:7.0.2-3.el5
  unixODBC.x86_64 0:2.2.11-7.1
  unixODBC-devel.x86_64 0:2.2.11-7.1

Complete!

Once the Oracle Validated RPM installation completes, all the RPM packages and system configuration steps required for an Oracle Database 11g Release 2 RAC installation have also been completed. For example, the required user and groups have been created, and the necessary kernel parameters have been set. You can find the installed packages listed in /var/log/yum.log. In some combinations of the Oracle Validated RPM and Oracle Database 11g Release 2 on 64-bit systems, the OUI will report that some of the required packages are missing. For example, packages commonly listed as missing include libaio-devel, unixODBC, and unixODBC-devel. The error typically looks something like this:

This is a prerequisite condition to test whether the package
"libaio-devel-0.3.106" is available on the system. (more details)
  Check Failed on Nodes: [london2, london1]

This warning occurs because, on x86-64 architecture systems, the OUI checks for the presence of both the x86-64 and i386 packages, whereas some versions of the Oracle Validated RPM install only the 64-bit versions. The package versions can be verified with the following snippet:

[root@london1 ~]# rpm -q --queryformat "%{NAME}-%{VERSION}-%{RELEASE} (%{ARCH})\n" \
> libaio-devel \
> unixODBC \
> unixODBC-devel
libaio-devel-0.3.106-3.2 (x86_64)
unixODBC-2.2.11-7.1 (x86_64)
unixODBC-devel-2.2.11-7.1 (x86_64)

You can prevent the 32-bit RPMs from being reported as missing by the OUI by installing them directly, either with the rpm command or with yum. If you use yum, ensure that the requested architecture is fully specified in the yum install command, as in this example:

[root@london1 ˜]# yum install unixODBC-2.2.11-7.1.i386 unixODBC-devel-2.2.11-7.1.i386 libaio-devel-0.3.106-3.2.i386
Loaded plugins: security
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package libaio-devel.i386 0:0.3.106-3.2 set to be updated
---> Package unixODBC.i386 0:2.2.11-7.1 set to be updated
---> Package unixODBC-devel.i386 0:2.2.11-7.1 set to be updated
--> Finished Dependency Resolution
...
Installed:
  libaio-devel.i386 0:0.3.106-3.2           unixODBC.i386 0:2.2.11-7.1
  unixODBC-devel.i386 0:2.2.11-7.1

Complete!

You can now verify the subsequent presence of the correct architectures for the packages:

[root@london1 ~]# rpm -q --queryformat "%{NAME}-%{VERSION}-%{RELEASE} (%{ARCH})\n" \
> libaio-devel \
> unixODBC \
> unixODBC-devel
libaio-devel-0.3.106-3.2 (x86_64)
libaio-devel-0.3.106-3.2 (i386)
unixODBC-2.2.11-7.1 (x86_64)
unixODBC-2.2.11-7.1 (i386)
unixODBC-devel-2.2.11-7.1 (x86_64)
unixODBC-devel-2.2.11-7.1 (i386)

On NUMA-based systems, you may also wish to install the numactl-devel RPM package to enable full NUMA support (see Chapter 4 for more details). The preceding YUM configuration provides only the Server RPM packages required by the Oracle Validated RPM. However, if you wish to continue using YUM as a method for RPM installation, then it is also necessary to add additional sections in /etc/yum.conf under the Cluster, ClusterStorage, and VT headings, pointing to their corresponding RPM directories. Adding these sections ensures that all of the RPM packages present on the installation media are available to YUM. Alternatively, you may use the Oracle public YUM server.

Using the up2date Command

If you are a customer of the ULN, then once you have registered your system, you may also use the up2date command to install and run the Oracle Validated RPM and maintain the operating system. However, we recommend that you avoid using this command because it has been superseded by YUM. The up2date command provides functionality similar to YUM, albeit with different syntax. Specifically, this command can automatically resolve dependencies of the installed RPM packages.

Note

Not only has up2date been superseded by YUM, but it is no longer used in Red Hat Enterprise Linux 5, the upstream release for Oracle Enterprise Linux 5.

Although the up2date command is available in Oracle Enterprise Linux 5, it no longer implements the original up2date configuration. For the sake of backward compatibility, it has essentially been re-implemented as a wrapper around YUM commands by using YUM repositories. Consistency in configuration across nodes is essential for a clustered environment, so we recommend that you standardize on using only YUM commands for installing packages and resolving dependencies. To implement this approach, we also recommend that you use a local YUM repository (as described previously in this chapter) and synchronize this repository with the systems provided by the ULN. The ULN's YUM Repository Setup explains how to do this. You can then use your own YUM repository as the direct source for updating the packages on the nodes in your cluster. The benefit of this approach: You won't need to rely on an external repository where an interruption in access could render the operating system configurations on the nodes in your cluster inconsistent.

Verifying the Oracle Validated RPM Actions

In addition to installing the required RPM packages, the Oracle Validated RPM installs and runs a script to automate the configuration of the Linux operating system. This script prepares the system for the installation of the Oracle software. If you are unable to run the Oracle Validated RPM (or you choose not to), then reviewing these tasks also serves as a checklist for completing the required actions manually.

The script named oracle-validated-verify is located in the /etc/sysconfig/oracle-validated directory with a symbolic link to the script located in /usr/bin. The /etc/sysconfig/oracle-validated directory also includes the file called oracle-validated.params. This file contains the input parameters for the oracle-validated-verify script. After you run this script, you can find a file called orakernel.log in the /etc/sysconfig/oracle-validated/results directory. This log contains the details of the script's actions.

The oracle-validated script's initial actions are to gather details on the system architecture, operating system kernel and distribution release, and the processor type. It uses the uname -m and uname -r commands and the /etc/issue and /proc/cpuinfo files to determine its subsequent actions. These actions are based on the values found in oracle-validated.params. The script also validates that the user running the script is the root user.

Creating the Oracle User and Groups

In some environments, one group of users will install and maintain Oracle software, while a separate group of users will utilize the Oracle software to maintain the database. Maintaining separate groups requires the creation of user groups to draw a distinction between two sets of permissions. The first set lets a group install Oracle software, which requires access to the Oracle Universal Installer (OUI) and oraInventory. The second set grants permissions for a group to accomplish general database administration tasks. These two groups are typically named oinstall and dba.

In most RAC environments, the DBA installs, maintains, and uses the Oracle software. In such cases, the oinstall group may appear superfluous. However, the Oracle inventory group is included in some preinstallation system checks; therefore, using this group for Oracle software installation is a good practice. The oracle-validated-verify script creates the dba and oinstall groups using the commands groupadd dba and groupadd oinstall, respectively. For a new installation, these groups will have the group IDs of 500 and 501, respectively. If you create these groups manually, you may specify the group ID on the command line:

[root@london1 root] # groupadd -g 500 dba

Any group ID may be used, but the same group ID should be used for the dba and oinstall groups on all nodes in the cluster. For this reason, you should ensure that no other groups are created on any of the nodes in the cluster until the oracle-validated-verify script has been run.
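
If you are creating the groups manually on each node, a minimal sketch that fixes both group IDs explicitly (the IDs shown simply mirror the defaults noted above) and then verifies them would be:

[root@london1 ~]# groupadd -g 500 dba
[root@london1 ~]# groupadd -g 501 oinstall
[root@london1 ~]# getent group dba oinstall
dba:x:500:
oinstall:x:501: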

After creating the groups, the script uses the following command to create the oracle user as a member of the dba and oinstall groups:

useradd -g oinstall -G dba -d /home/oracle -p $encpasswd oracle

Also, note that the preceding line sets the password from the $encpasswd variable, which holds the encrypted form of the default password, oracle.

As when creating a group, there is no cluster awareness of the user ID when creating a user. If this is the first user created after installation, then the user ID will be 500. Again, you should ensure that this is the same on all the nodes in the cluster. You can run the following command to create the oracle user manually:

[root@london1 root] # useradd -u 500 -g oinstall -G dba -d /home/oracle oracle

The preceding line creates the user with a specific user ID and makes the user a member of the dba and oinstall groups. The oracle user will be appended to the /etc/passwd file with the group ID of the oinstall group, added to the dba and oinstall groups in the /etc/group file, and given the default home directory location of /home/oracle. The oinstall group is the primary group, while the dba group is a supplementary group. The home directory can be in any location, but it's good practice to keep it distinct from the ORACLE_HOME directory, the default location for the Oracle software. By default, the user is created with the bash shell, which can be modified by using the useradd command with the -s option.

You can verify that the oracle user and dba group have been correctly configured using the id command, as in this example:

[oracle@london1 ~]$ id oracle
uid=500(oracle) gid=501(oinstall) groups=501(oinstall),500(dba)

After the oracle-validated-verify script creates the account, you should use the passwd command to change the default, unsecured password for the oracle user to something more secure:

[root@london1 root]# passwd oracle
Changing password for user oracle
New password:
Retype new password:
passwd: all authentication tokens updated successfully.

To complete the user configuration, you must also manually configure the environment variables for the oracle user; you'll learn how to do this later in this chapter.

Configuring Kernel Parameters

After configuring the Oracle user and groups, the Oracle Validated RPM configures the Linux kernel parameters to meet Oracle's recommendations. The updated kernel parameters are set in the file /etc/sysctl.conf; you can find a copy of the original at /etc/sysctl.conf.orabackup. Some of the kernel parameters are set according to the value of the total memory in the system, which is retrieved from /proc/meminfo. Existing parameters are modified in place, while additional parameters are added to the end of /etc/sysctl.conf. These additions are based on the values contained in oracle-validated.params. You can also manually edit the /etc/sysctl.conf file, save its contents, and then apply those settings using the following command:

[root@london1 root]# sysctl -p

The kernel parameters and their correct values for 11g Release 2 are shown in Table 6-6.

Table 6.6. Recommended Kernel Parameter Values

Kernel Parameter                Recommended Value
kernel.sem (semmsl)             250
kernel.sem (semmns)             32000
kernel.sem (semopm)             100
kernel.sem (semmni)             142
kernel.shmall                   1073741824
kernel.shmmax                   4294967295 on x86; 4398046511104 on x86-64
kernel.shmmni                   4096
kernel.msgmni                   2878
kernel.msgmax                   8192
kernel.msgmnb                   65536
kernel.sysrq                    1
fs.file-max                     327679
fs.aio-max-nr                   3145728
net.core.rmem_default           262144
net.core.rmem_max               4194304
net.core.wmem_default           262144
net.core.wmem_max               262144
net.ipv4.ip_local_port_range    1024 to 65000

The recommended Linux kernel parameter settings are very much general guidelines, but understanding these parameters and how applicable they are to your particular system is essential. You may still need to change some of the updated parameters for your configuration. These parameters can be loosely grouped into five classifications: shared memory, semaphores, network settings, message queues and open files. We will examine each of these classifications next.

Working with Shared Memory

When an Oracle instance starts, it allocates shared memory segments for the SGA. This allocation applies to RAC instances in exactly the same way as a single instance. The SGA has fixed and variable areas. The fixed area size cannot be changed, but the variable area, as the name implies, will vary according to the Oracle database parameters set in the SPFILE or init.ora file.

Within Oracle Database 11g Release 2 RAC, the memory configuration of the SGA and PGA, and consequently the appropriate kernel parameters, depends on whether Automatic Memory Management (AMM), Automatic Shared Memory Management (ASMM), or manual memory management is used. If AMM is used, then Oracle manages both the SGA and PGA from one memory allocation, according to the Oracle parameters MEMORY_TARGET and MEMORY_MAX_TARGET. If ASMM is used, however, the two most significant Oracle parameters to be aware of for sizing the SGA are SGA_TARGET and SGA_MAX_SIZE. You must also know that session-level memory set with the PGA_AGGREGATE_TARGET parameter is not allocated from the SGA.

Setting the SGA_TARGET parameter enables Oracle to automatically set the parameters for the variable components of the SGA, such as db_cache_size and shared_pool_size; doing this does not necessarily require manually sizing these components individually. However, manually sized components such as the redo log buffer, streams pool, and nondefault database caches (e.g., the keep, recycle, and nondefault block size caches) are given priority for the allocated shared memory. Subsequently, the balance of the memory is allocated to the automatically sized components.

If you do choose to set these parameters manually, the kernel parameters required are the same as when ASMM is used. You can view the allocated shared memory segments using the ipcs -m command. The following example shows a single shared memory segment created for an SGA_TARGET and SGA_MAX_SIZE set to 14GB. The additional shared memory segment in this case is a single 4KB page for the ASM instance, where the shmid identifies the memory-mapped files in /dev/shm used by AMM:

[oracle@london1 ~]$ ipcs -m

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x6422c258 32768      oracle    660        4096       0
0x10455eac 98305      oracle    600        15034482688 31

You can also view the defined shared memory limits with the ipcs -lm command:

[root@london1 ~]# ipcs -lm

------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 47996078
max total shared memory (kbytes) = 32388608
min seg size (bytes) = 1

As introduced in Chapter 4, the choice of Oracle memory management determines the operating system call used to allocate memory: the shmget() system call is used for ASMM, while the mmap() system call is used for AMM. For ASMM, the relevant parameters are kernel.shmall, kernel.shmmax, and kernel.shmmni, and failure to set them correctly could result in the Oracle instance failing to start. The section on open files should be reviewed for the parameters relevant to using AMM.

Setting the kernel.shmmax Parameter

The shmmax parameter sets the maximum size (in bytes) for a shared memory segment allowed on the system. By default on Oracle Enterprise Linux 5, shmmax is set to 68719476736, which is 64GB. This parameter limits the maximum permissible size of a single Oracle SGA shared memory segment. Typically, this should be set to exceed the value of SGA_MAX_SIZE, which itself must be greater than or equal to SGA_TARGET. If SGA_MAX_SIZE is greater than shmmax, then Oracle will fail to create the SGA in a single shared memory segment. Instead, it will use multiple, smaller shared memory segments. The most significant implications of this behavior occur on a NUMA-based architecture (see Chapter 4 for more information).

On NUMA architecture systems, enabling NUMA at the system, operating, and Oracle database levels results in the creation of multiple shared memory segments for the SGA, with one per memory node for an optimal NUMA configuration. (again, see Chapter 4 for information on how to accomplish this). If multiple shared memory segments are created, however, shmmax limitations may keep them from being evenly distributed.

Note

shmmax determines the size of any individual shared memory segment, as opposed to the entire region of shared memory. This holds true whether you set the shmmax parameter to a value smaller than the SGA, or it is set as a result of NUMA optimization.

When creating a single shared memory segment, setting shmmax as follows will limit an Oracle SGA size that uses that single shared memory segment to 2GB:

kernel.shmmax = 2147483648

However, setting this value in excess of your desired SGA_MAX_SIZE enables Oracle to manage its own memory effectively. For example, assume you're on a system with 16GB of RAM available, an SGA_MAX_SIZE of 10GB, and an SGA_TARGET of 8GB. In this case, setting kernel.shmmax as follows enables a maximum shared memory segment of just over 10GB, which is above that required by the Oracle parameters:

kernel.shmmax =  10737418240

The Oracle Validated RPM sets the value for this parameter to 4398046511104 (4TB) on x86-64 or 4294967295 (4 GB) on x86. This helps ensure that the parameter's limit is above the architectural memory limits discussed in Chapter 4.

Setting the kernel.shmmni Parameter

The shmmni parameter specifies the maximum number of shared memory segments permissible on the entire system. For an Oracle system with one shared memory segment per SGA, this parameter reflects the maximum number of Oracle instances, including ASM instances that you wish to start on a single server. Therefore, the default value of 4096 is retained by the Oracle Validated RPM. In an operational environment, however, the number of shared memory segments is highly unlikely to ever approach this value.

Setting the kernel.shmall Parameter

Use the shmall parameter to define the maximum number of shared memory pages that can be allocated at any one time on the system; its default value is 4294967296 on x86-64 based architectures, which is equivalent to 16TB. This value is given in system pages, not in bytes, and it should be set to at least the value of shmmax divided by the system page size. This helps ensure that sufficient memory pages can be allocated to a single required SGA. The following Oracle Validated RPM value, on an x86-64 system with a default 4KB page size, gives a maximum shared memory allocation of 1073741824 x 4096 = 4398046511104 bytes, precisely equal to the 4TB limit set for shmmax:

kernel.shmall = 1073741824

Therefore, both the system default setting for shmall and the Oracle Validated RPM value significantly exceed the physical memory that is supported on an x86-64 architecture system.
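
As a quick check of this arithmetic, you can confirm the page size and the resulting byte limit on your own system; the commands below are purely illustrative:

[root@london1 ~]# getconf PAGE_SIZE
4096
[root@london1 ~]# echo $((1073741824 * 4096))
4398046511104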

Using Semaphores

Semaphores are used by Oracle for resource management. They serve as a post/wait mechanism for events such as free buffer waits, used by enqueues and writers. Semaphores are used extensively by Oracle, and they are allocated to Oracle at instance start-up in sets by the Linux kernel. Processes use semaphores from the moment they attach to an Oracle instance, waiting on semop() system calls. Semaphores are set with the sysctl command or in the /etc/sysctl.conf file as a single kernel.sem parameter. This behavior differs from that of other UNIX operating systems, where the different values are often allocated individually. You can view the semaphore limits on your system with the command ipcs -ls:

[root@london1 ~]# ipcs -ls

------ Semaphore Limits --------
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 100
semaphore max value = 32767

You use the kernel.sem parameter to control the number of semaphores on your system. The Oracle Validated RPM sets the value of kernel.sem to the following values:

kernel.sem = 250 32000 100 142

This setting allocates values for the semmsl, semmns, semopm, and semmni parameters, respectively.

The semmsl parameter defines the total number of semaphores in a semaphore set. This parameter should be set to the Oracle Validated RPM value of 250. Setting semmsl should always be considered in terms of semmns, because both parameters impact the same system resources.

The semmns parameter sets the total number of semaphores permitted in the Linux system, and 32000 is the recommended default high-range value. The previously discussed semmsl value sets the maximum number of semaphores per set, while semmni sets the maximum number of semaphore sets. This means that the overall number of semaphores that can be allocated is the lower of semmns and the product of semmsl and semmni. A default value of 32000 for semmns ensures that this value is used by the system for its total semaphore limit.

The semopm parameter defines the maximum number of semaphore operations that can be performed by the semop() system call. In Linux, semop() can set multiple semaphores within a single system call. The recommended setting for this value is 100.

The semmni parameter sets the maximum number of total semaphore sets defined, and it should have a minimum value of 128. Multiplying this value by a semmsl value of 250 precisely equals the semmns setting of 32000. Although the Oracle Validated RPM sets this value to 142, it is overridden by semmns.

Unless an extremely large number of Oracle user connections are required on each node, the Oracle Validated RPM semaphore settings should be sufficient.

Setting Network Parameters

Setting network parameters correctly is especially important in a RAC environment because of its reliance on a local high-performance network interconnect between the Oracle instances. The next section describes the kernel parameters modified by the Oracle Validated RPM.

Setting the net.ipv4.ip_local_port_range Parameter

The net.ipv4.ip_local_port_range parameter sets the range of local ports for outgoing connections. The Oracle Validated RPM changes the lower port value to 1024 and the upper port to 65000. However, the OUI for some releases of Oracle 11g Release 2 reports this as an error, and instead specifies a port range of 9000 to 65500. You may either ignore this warning and maintain the Oracle Validated RPM values or modify them to the OUI values to prevent installation warnings.

Setting the net.core.* Set of Parameters

A RAC environment includes a set of four parameters that have names that begin with net.core:

net.core.rmem_default, net.core.wmem_default,
net.core.rmem_max, net.core.wmem_max

In Linux versions based on the 2.6 kernel, you no longer strictly need to set these four kernel parameters for TCP, because the TCP buffers are tuned automatically. However, because they affect all protocols, they are still necessary for UDP, which is the default protocol for Oracle RAC interconnect communication with Gigabit Ethernet.

Values for these parameters are the default setting (in bytes) of the socket receive and send buffers, as well as the maximum sizes of these buffers. The Oracle Validated RPM modifies these values to 262144, except for net.core.rmem_max, which is set to 4194304. For some releases of Oracle 11g Release 2, the OUI reports the net.core.wmem_max setting as an error and instead requires a value of 1048576; you may either retain the Oracle Validated RPM value or set it to the OUI-required value. For the TCP protocol, the static settings and the auto-tune parameters interact differently for the default and maximum buffer sizes. For TCP auto-tuning, the values of tcp_rmem and tcp_wmem specify the minimum, default, and maximum values of the socket receive and send buffers. An equivalent setting to the Oracle Validated RPM configuration would look like this:

net.ipv4.tcp_rmem = 4096 262144 4194304
net.ipv4.tcp_wmem = 4096 262144 4194304

In terms of the default parameters, the auto-tune values take precedence, so you must set them to achieve Oracle's recommended value of 262144. For the maximum values, however, the static settings take precedence, overriding the auto-tune parameters.
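
Expressed as /etc/sysctl.conf entries, the Oracle Validated RPM values described above look like the following sketch; the commented alternative reflects the OUI-required value for net.core.wmem_max:

net.core.rmem_default = 262144
net.core.rmem_max = 4194304
net.core.wmem_default = 262144
net.core.wmem_max = 262144
# net.core.wmem_max = 1048576 is required by the OUI in some 11g Release 2 releases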

Message Queues

Processes, including Oracle processes, can communicate asynchronously using interprocess communication (IPC) messages. These messages are placed into a message queue, where they can be read by another process. An example of this in an Oracle database environment is the communication between the foreground and background processes that is visible through the Oracle wait events rdbms ipc message and rdbms ipc reply. A background process waits on rdbms ipc message when it is idle, waiting for a foreground process to send it a message. You can view the message queue limits on your system with the command ipcs -lq, as shown:

[root@london1 ˜]# ipcs -lq

------ Messages: Limits --------
max queues system wide = 16
max size of message (bytes) = 65536
default max size of queue (bytes) = 65536

The ipcs -lq command provides information on three message queue parameters, which are described in the following list (the corresponding kernel parameter settings are shown in the example after the list):

  • msgmni: Sets the number of message queue identifiers, which is the number of individual message queues permitted on the system. The default value is 16, and the Oracle Validated RPM increases this to 2878.

  • msgmax: Sets the maximum message size. The default size is 8192 bytes, and the Oracle Validated RPM preserves this value. All messages queued between processes are held in memory, and msgmax cannot be greater than the size of an individual queue.

  • msgmnb: Sets the maximum combined size (in bytes) of all of the messages on an individual message queue at any one time. The default queue size is 16384 bytes, and the Oracle Validated RPM increases this to 65536 bytes (64KB).
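
The corresponding kernel parameters can be set together in /etc/sysctl.conf; the following sketch reflects the Oracle Validated RPM values described in the preceding list:

kernel.msgmni = 2878
kernel.msgmax = 8192
kernel.msgmnb = 65536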

Setting the Number of Open Files

The parameter that affects the number of open files is fs.file-max, and the Oracle Validated RPM sets this parameter to 327679. The fs.file-max parameter sets the maximum limit of open files for all processes on the system. The Oracle Validated RPM value is likely to be lower than the default value, which is determined dynamically based on the system memory. For example, the default is 1545510 on an x86-64 system with Oracle Enterprise Linux 5 and 16GB of RAM; with 18GB of RAM, it is 1773467. On some releases of Oracle 11g Release 2, this value is also lower than the minimum value of 6815744 checked by the OUI, although you may ignore this warning if the Oracle Validated RPM setting meets your requirements. (You may also choose to change the parameter to the OUI-required value to prevent the warnings.)
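
You can check the current setting with sysctl, as in the following sketch; the value shown is simply the Oracle Validated RPM setting and is illustrative:

[root@london1 ˜]# sysctl fs.file-max
fs.file-max = 327679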

The number of open and available file handles on the system can be seen in /proc/sys/fs/file-nr. In an ASM environment, using ASMLIB will reduce the total number of file descriptors required across the system. Conversely, using AMM increases the demand for file descriptors across the system.

In contrast to ASMM, using AMM means that both SGA and PGA memory are managed in a unified manner by setting the memory-related parameters, MEMORY_TARGET and MEMORY_MAX_TARGET. To manage the SGA and PGA memory together, the Oracle 11g Database on Linux uses the tmpfs file system mounted at /dev/shm for a POSIX implementation of shared memory, using the mmap() and shm_open() system calls instead of shmget().

By default, tmpfs is allocated from virtual memory; thus it can include both physical RAM and swap space, and it is sized to half of physical memory without swap space. However, unlike huge pages, which you will learn more about later in this chapter, tmpfs memory allocation is dynamic. Therefore, if tmpfs is not used, it does not take any allocation of memory, and it does not require a change in settings if AMM is not used. When AMM is used, Oracle allocates a number of memory-mapped files in tmpfs. These files are allocated with a file, or granule, size of 4MB when MEMORY_MAX_TARGET is 1GB or less, and 16MB when it is greater than 1GB. The allocated files can be identified by the shmid of the corresponding single-page shared memory segment. The following example shows single-page shared memory segments for both an ASM instance and a database instance:

[oracle@london1 ˜]$ ipcs -m

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x6422c258 557056     oracle    660        4096       0
0x10455eac 819201     oracle    660        4096       0

The allocated memory itself can be viewed in the mounted directory of /dev/shm. The zero-sized segments are mapped by Oracle processes to constitute a memory allocation up to MEMORY_MAX_TARGET, but these are not yet allocated to any of the dynamic components identified within V$MEMORY_DYNAMIC_COMPONENTS; thus they do not consume any physical memory. By default, the tmpfs file system allocates half of physical memory. Therefore, if your MEMORY_TARGET parameter exceeds this value, then the instance will fail to start with the error:

SQL> ORA-00845: MEMORY_TARGET not supported on this system

You may want to increase the size of the memory allocation in a manner that is persistent across reboots. To do this, begin by unmounting the /dev/shm file system, and then modifying the entry in /etc/fstab relating to tmpfs:

[root@london1 ˜]# more /etc/fstab
/dev/VolGroup00/LogVol00 /                      ext3    defaults          1 1
LABEL=/boot             /boot                   ext3    defaults          1 2
tmpfs                   /dev/shm                tmpfs   defaults,size=15g 0 0
devpts                  /dev/pts                devpts  gid=5,mode=620    0 0
sysfs                   /sys                    sysfs   defaults          0 0
proc                    /proc                   proc    defaults          0 0
/dev/VolGroup00/LogVol01 swap                   swap    defaults          0 0
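
Assuming no processes are currently using /dev/shm, the unmount before the edit and the remount afterward are simply the following commands:

[root@london1 ˜]# umount /dev/shm
[root@london1 ˜]# mount /dev/shm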

Remounting the file system lets you view the newly allocated space. Note that, until the memory is used, allocating a larger amount of memory to tmpfs does not actually result in the memory being utilized:

[root@london1 ˜]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      433G  147G  264G  36% /
/dev/sda1              99M   13M   82M  14% /boot
tmpfs                  15G     0   15G   0% /dev/shm

You should keep the following in mind when working with the kernel parameter settings of the Oracle Validated RPM: when using AMM, the limits on memory allocation are partly determined by the number of file descriptors, but they are not dependent on the shared memory parameters previously discussed in this section. Oracle recommends that the number of file descriptors be set to a value of at least 512*PROCESSES, which means that the Oracle Validated RPM setting of fs.file-max is sufficient for an Oracle PROCESSES value of up to 640. Failure to have a sufficient number of file descriptors will result in the following error: ORA-27123: unable to attach to shared memory segment. In addition to the system-wide value, there is also a per-user limit for the number of open files, which by default is set to 1024. The Oracle Validated RPM modifies this per-user value in the file /etc/security/limits.conf; you'll learn more about this in this chapter's section on PAM limits configuration.

When using AMM, there are some distinctions that you should be aware of in comparison to ASMM. First, if you're operating in a virtualized environment such as the Oracle VM (see Chapter 5), then MEMORY_MAX_TARGET cannot be allocated to a value greater than the memory currently allocated to the system. This limits the levels of memory that can be allocated and de-allocated to the operating system dynamically. On the other hand, SGA_MAX_SIZE can be set to larger than the available memory, which offers more flexibility in this case. Additionally, AMM cannot be configured in conjunction with huge pages. You should consider all of these factors when weighing the relative merits of using huge pages or AMM.

Configuring Asynchronous I/O

The parameter fs.aio-max-nr sets the maximum number of concurrent asynchronous I/O requests permitted on the system. By default, this parameter is set to 65536, and the Oracle Validated RPM increases it to 3145728. The actual number of requests in use can be seen in /proc/sys/fs/aio-nr; if this value reaches fs.aio-max-nr, then asynchronous I/O operations will be impacted. However, if you are using ASMLIB, the value of /proc/sys/fs/aio-nr does not rise above 0, so setting fs.aio-max-nr to any value has no effect in this case. Changing the value of fs.aio-max-nr does not allocate additional memory to kernel data structures, so it can safely be left at the Oracle Validated RPM value even when ASMLIB is used.
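
A quick check of both values might look like the following sketch; the outputs are illustrative, and an aio-nr of 0 is what you would expect when ASMLIB performs the asynchronous I/O:

[root@london1 ˜]# cat /proc/sys/fs/aio-nr
0
[root@london1 ˜]# sysctl fs.aio-max-nr
fs.aio-max-nr = 3145728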

Using Magic SysRq Keys

When the Alt+SysRq key combination is pressed together with a command key on the system console, it enables direct communication with the kernel. This functionality is primarily intended for debugging purposes. The feature is termed the Magic SysRq key, and it is enabled in the Linux kernel of Oracle Enterprise Linux 5. By default, however, the parameter kernel.sysrq is set to 0 to disable this functionality. The Oracle Validated RPM sets this parameter to 1 to enable all of the command key combinations, although a bitmask of values between 2 and 256 can be set to limit the commands available. An example key combination is Alt+SysRq+b, which immediately reboots the system. The value of kernel.sysrq only governs the system request functionality when the key combination is entered on the console keyboard. The same functionality is available through /proc/sysrq-trigger, regardless of the value of this setting. For example, the following command from the root user will also reboot the system, even if kernel.sysrq is set to 0:

echo b > /proc/sysrq-trigger

Clusterware uses the preceding command to reset a node in the cluster.

Other notable Magic SysRq values include t, which provides a detailed debugging report on the activity of all processes in the system; and m, which provides a report on memory usage. Both of these reports can be viewed in /var/log/messages or with the dmesg command. However, because the Magic SysRq commands communicate directly with the kernel, these debugging reports should be used infrequently and not as part of commands or scripts that you run regularly for system monitoring purposes.
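
For example, the following sketch generates a memory usage report through /proc/sysrq-trigger and then displays the tail of the kernel message buffer where the report appears:

[root@london1 ˜]# echo m > /proc/sysrq-trigger
[root@london1 ˜]# dmesg | tail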

Setting the parameter kernel.sysrq is largely a matter of security policy. However, using the key combinations requires access to the console keyboard, which typically means that the user already has access to the physical system, including its power supply.

Setting PAM Limits

Linux Pluggable Authentication Modules (PAM) are responsible for a number of authentication and privilege tasks on the system. Modules exist for activities such as password management; in particular, the Oracle Validated RPM sets values for the pam_limits module, which determines the resources that a single user session can allocate, as opposed to limits that apply across the system as a whole. These limits are set at the login time of a session and are not cumulative values for all sessions combined; for this reason, the important session is the one that is used to start the Oracle database. The limits are set for the oracle user in the file /etc/security/limits.conf, as in the following example, which shows the settings for an x86-64 system:

oracle   soft   nofile    131072
oracle   hard   nofile    131072
oracle   soft   nproc     131072
oracle   hard   nproc     131072
oracle   soft   core      unlimited
oracle   hard   core      unlimited
oracle   soft   memlock   50000000
oracle   hard   memlock   50000000

The values can be observed at the oracle user level; use the command ulimit -a to see all parameters:

[oracle@london1 ˜]$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 155648
max locked memory       (kbytes, -l) 15360000
max memory size         (kbytes, -m) unlimited
open files                      (-n) 131072
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 131072
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Alternatively, you can use the ulimit command with the relevant argument shown in the preceding output to see the value of an individual parameter. For all of the relevant parameters, the Oracle Validated RPM sets both the soft and hard limits to the same value. Typically, the hard value is set higher than the soft value, and the user can increase or decrease the session-level soft value up to the hard value. However, when these values are the same, the hard limit is the single enforceable value that the user cannot exceed.
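
For example, checking the open files and processes limits individually for the oracle user returns the values set in /etc/security/limits.conf:

[oracle@london1 ˜]$ ulimit -n
131072
[oracle@london1 ˜]$ ulimit -u
131072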

The following four parameters affect PAM limits:

  • nofile: Sets the maximum number of open files at the session level. Note that this parameter is also limited by fs.file-max at the system level. However, when using features such as AMM, the user limit of 131072 will be the enforced value in the context of the maximum number of open files for the oracle user.

  • nproc: Sets the maximum number of concurrent processes that a user can have running at any one time. nproc does not represent a cumulative value of all the processes created within an individual session.

  • core: Sets a limit (in KB) on the size of the core dump file generated as a result of the abnormal termination of a running program. The Oracle Validated RPM sets this value to unlimited.

  • memlock: Sets the maximum amount of memory that the user can lock in memory (i.e., memory allocated from physical RAM that cannot be swapped out). This parameter is particularly important in determining the memory limit (in kilobytes) that Oracle can take from a huge pages allocation. The Oracle Validated RPM sets this value (in KBs) to 50000000 (48 GB) on x86-64 and to 3500000 (3.3 GB) on x86.

Setting Kernel Boot Parameters

For AMD x86-64 architecture processors, the Oracle Validated RPM modifies the kernel boot parameters in the file /boot/grub/grub.conf to set the parameter numa=off. This setting disables NUMA features (see Chapter 4 for more information); note, however, that the Oracle Validated RPM does not disable NUMA on recent generations of Intel x86-64 architecture processors, which are also NUMA-based. Therefore, we recommend reviewing the NUMA section in Chapter 4 to determine the potential impact that this setting may have on different systems, and testing your configuration for its impact upon performance.
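
The resulting kernel line in /boot/grub/grub.conf resembles the following sketch; the kernel version and the other boot options are illustrative and will differ on your system:

title Oracle Enterprise Linux Server (2.6.18-164.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-164.el5 ro root=/dev/VolGroup00/LogVol00 numa=off
        initrd /initrd-2.6.18-164.el5.img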

Setting Kernel Module Parameters

In addition to modifying kernel boot parameters, the Oracle Validated RPM also sets individual kernel module parameters. In particular, the script examines the file /etc/modprobe.conf for the presence of the e1000 driver, which is the device driver for Intel PRO/1000 Network Adapters. The script adds the option FlowControl=1 where this driver is present. Flow control is a method of using standard Ethernet pause frames to manage communication and prevent buffer overruns in cases where the sender is sending data faster than the receiver can handle it. For the e1000 driver, the FlowControl option determines whether these pause frames are generated and responded to. The possible values are 0, which disables the feature; 1, which means receive only; 2, which means transmit only; and 3, which enables both transmitting and receiving. The Oracle Validated RPM sets this parameter to receive only. The effect of the setting can be viewed with the ethtool application and the -a argument. Note that this setting only applies to the e1000 driver for PCI and PCI-X devices. Later PCI-Express devices use the e1000e or igb drivers; therefore, this setting does not apply to those devices.
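
The resulting entry in /etc/modprobe.conf and the command to check its effect are shown in the following sketch; the interface name eth1 is an assumption and should be replaced with the interface served by the e1000 driver:

options e1000 FlowControl=1

[root@london1 ˜]# ethtool -a eth1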

Post Oracle Validated RPM Configuration

The next sections review areas where the Oracle Validated RPM has completed the base level of configuration applicable to both RAC and single-instance environments, but where additional changes are warranted specifically for an Oracle Database 11g Release 2 RAC environment.

Setting the Huge Pages Kernel Parameter

An additional kernel parameter that can bring performance benefits to an Oracle Database 11g Release 2 RAC on Linux configuration is vm.nr_hugepages, which implements a feature known variously as large pages, huge pages, and hugetlb.

Setting vm.nr_hugepages with sysctl configures the number of huge pages available for the system; the default value is 0. The size of an individual huge page is dependent on the Linux operating system installed and the hardware architecture. The huge page size on your system can be viewed at the end of /proc/meminfo.

The following example for an x86-64 system running Oracle Enterprise Linux 5 shows a default huge page size of 2MB:

[root@london1 root]# cat /proc/meminfo
...
HugePages_Total:  0
HugePages_Free:   0
Hugepagesize:     2048 kB

Some systems also enable the setting of a 1GB huge page size. Therefore, this page size value should always be known before setting the vm.nr_hugepages value.

Huge pages are allocated as much larger contiguous memory pages than the standard system pages. Such pages improve performance at the Translation Lookaside Buffer (TLB) level. As discussed in greater detail in Chapter 4, a larger page size enables a large range of memory addresses to be stored closer to the CPU level, increasing the speed of Oracle SGA access. This approach increases access speed by improving the likelihood that the virtual memory address of the pages that Oracle requires is already cached.

The vm.nr_hugepages parameter can be set at runtime. However, the pages are required to be contiguous. Therefore, we advise setting the required number in /etc/sysctl.conf, which is applied at boot time. Once allocated, huge pages are not pageable. This guarantees that an Oracle SGA using them will always remain resident in main memory.

The vm.nr_hugepages parameter must be set so that the number of huge pages multiplied by the huge page size is greater than the required Oracle SGA size. In addition, the Oracle user must have permission to use huge pages; this permission is granted by the memlock parameter discussed previously in this chapter.
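
For example, to back an SGA of approximately 14GB with 2MB huge pages (the configuration used in the /proc/meminfo output shown later in this section), a sizing sketch for /etc/sysctl.conf would be:

# 14GB / 2MB = 7168 huge pages minimum; 7500 allows a margin
vm.nr_hugepages = 7500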

For testing purposes, the huge page allocation can be mounted as a file system of type hugetlbfs, and memory can be utilized from this allocation using mmap(). However, at the time of writing, this mounted file system does not support AMM, nor does it support shmget() when using Oracle. Thus, there is no requirement for the hugetlbfs file system to be mounted during normal operations.

When the Oracle 11g instance starts, the shmget() call that allocates the shared memory for the SGA includes the SHM_HUGETLB flag to use huge pages, if available. If you're using ASMM and the SGA required (or SGA_MAX_SIZE) is greater than the huge pages available, then Oracle will not use any huge pages at all; instead, it will use standard memory pages. The huge pages will still remain allocated to their pool, which reduces the overall amount of memory available for general use on the system because huge pages will only be used for shared memory. The shmmax parameter is still applicable when using huge pages, and it must still be sized greater than the desired Oracle SGA size.

The values in the file /proc/meminfo can be viewed again to monitor the successful use of huge pages. After an Oracle instance is started, viewing /proc/meminfo will show the following:

HugePages_Total:  7500
HugePages_Free:    299
HugePages_Rsvd:      0
Hugepagesize:     2048 kB

This output shows that 7500 huge pages were created, and the Oracle instance took approximately 14GB of this huge page allocation for its SGA. If the instance has been started and the total remains the same as the free value, then Oracle has failed to use huge pages, and the SGA has been allocated from standard pages. You should verify that huge pages are used whether the instance is started with sqlplus or with the srvctl command. It is also possible to reduce the vm.nr_hugepages value after the instance has been started and re-run the sysctl -p command to release any remaining unallocated and unreserved huge pages.

When setting the shmall parameter, remember that it is expressed in standard system memory pages, and each huge page taken as shared memory is counted as the equivalent number of standard system pages when allocated from the available shmall total. The shmall value therefore needs to account for all of the huge pages used, expressed as a number of standard pages, and not just the Oracle SGA size. Failing to account for all of the huge pages (for example, by setting shmall to the number of huge pages rather than the equivalent number of standard pages) would result in the following Oracle error when starting the instance:

ORA-27102: out of memory
Linux-x86-64 Error: 28: No space left on device
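
To illustrate the arithmetic, assume a standard page size of 4KB and a huge page size of 2MB, so that each huge page consumes 512 standard pages of the shmall allowance; the 7500 huge pages used in the earlier example then require at least the following:

7500 huge pages * (2048KB / 4KB) = 7500 * 512 = 3840000 standard pages
kernel.shmall >= 3840000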

When correctly managed, huge pages can both provide performance benefits and reduce the memory required for page tables. For example, the following snippet from /proc/meminfo shows the memory consumed by page tables for a huge pages configuration on x86-64:

[root@london1 ˜]# cat /proc/meminfo
...
PageTables:      24336 kB
...
HugePages_Total:  7500
HugePages_Free:    331
HugePages_Rsvd:      0
Hugepagesize:     2048 kB

This allocation is not fixed because page tables are built dynamically for all processes as required. However, if the parameter PRE_PAGE_SGA is set to true, then these page tables are built at the time of process creation. PRE_PAGE_SGA can impact both instance and session startup times, as well as memory consumption. Therefore, we only recommend setting this parameter in an environment where its impact has been assessed. For example, it may be applicable in a system where performance is being analyzed and the number of sessions created is fixed for the duration. However, this parameter is less applicable in cases where a number of sessions connect to and disconnect from the database in an ad-hoc manner. The following example shows a system without huge pages, where the page tables have already grown to more than 1GB in size:

[oracle@london1 ˜]$ cat /proc/meminfo
PageTables:    1090864 kB
...
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     2048 kB

However, there are caveats that should be kept in mind when using huge pages. As noted previously, huge pages are only used with shared memory; therefore, they should not be allocated to more than is required by the SGA. Additionally, at the time of writing, huge pages are not compatible with Oracle VM or AMM, as discussed earlier. Therefore using huge pages requires more manual intervention by the DBA for memory management. For these reasons, we recommend that you test the use of huge pages to assess whether the additional management delivers performance and memory efficiency benefits for your environment.

I/O Fencing and the Hangcheck-Timer Kernel Module

In a RAC environment, Oracle needs to rapidly detect and isolate any occurrence of node failure within the cluster. It is particularly important to ensure that, once a node has entered an uncertain state, it is evicted from the cluster to prevent possible disk corruption caused by access that is not controlled by the database or cluster software. With Oracle RAC at 10g and 11.1, eviction at the Oracle software level was managed by Oracle Clusterware's Process Monitor Daemon (OPROCD). On the Linux operating system, eviction was ensured by the hangcheck-timer kernel module.

When Clusterware was operational, both the oprocd and the hangcheck-timer were run at the same time. Although the hangcheck-timer remains a requirement for RAC at Oracle 11g Release 1, it is no longer a requirement for RAC at Oracle 11g Release 2. At this release, there is also no oprocd daemon, and the functionality has been implemented within Clusterware's cssdagent and cssdmonitor processes, without requiring the additional hangcheck-timer kernel module. The timeout value for resetting a node remains equal to Oracle Clusterware's misscount value. This configurable value is given in the file crsconfig_params and set when the root.sh script is run as part of the Grid Infrastructure software installation. By default, the value of CLSCFG_MISSCOUNT is left unset in Oracle 11g Release 2, and the parameter takes the default value of 30 seconds as confirmed with the crsctl command after installing the Grid Infrastructure software:

[oracle@london1 bin]$ ./crsctl get css misscount
30

If you do wish to configure an additional timer with respect to this misscount value in Oracle 11g Release 2, then we recommend evaluating and testing the optional functionality provided by IPMI, rather than using the hangcheck-timer kernel module (you'll learn more about this later in this chapter). An IPMI-based timer can provide functionality above and beyond Oracle Clusterware because it is implemented in hardware, independently of the operating system itself; this makes it possible to reset a system undergoing a permanent hang state. If activating an IPMI-based timer, we recommend a timeout value of at least double the configured misscount value. This will both complement Clusterware's functionality and allow Clusterware to remain the primary agent in evicting failed nodes from the cluster.

Configuring the oracle user

As noted previously in this chapter, the oracle user is created by the Oracle Validated RPM. This user can be the owner of both the Oracle database and Grid Infrastructure software, and if you do not need to explicitly create additional users for your environment, then we recommend maintaining this default ownership. However, if you wish to have a different owner for the Grid Infrastructure software, then you may also create the grid user using the methods detailed previously for the oracle user, ensuring that oinstall is specified as the initial group. In the remainder of this chapter, we focus on configuring the environment of the oracle user as the owner of all installed Oracle software, including the database software and the Grid Infrastructure software for Clusterware and ASM.

Creating the Oracle Software Directories

Before installing the Oracle software, you need to create the directories in which to install Oracle 11g Release 2. These directories correspond to the environment variables you will set for the oracle user. The directories you create will depend on whether you wish to implement a configuration adhering to the OFA; unless you have a policy in place that prevents you from being OFA-compliant, we recommend that you follow the OFA guidelines for directory configuration. For an OFA-compliant configuration, you will create a parent directory of /u01, either as a directory under the root file system or as a separate disk partition mounted at /u01. All other directories containing Oracle software will be created under this directory.

When the Oracle software is installed, the OUI will search for an OFA directory configuration that the oracle user has permission to write to, such as /u01/app. The OUI will use this directory for the Oracle software configuration, creating the oraInventory directory in this location with permissions set to the oinstall group. This ensures that if multiple users are configured, all users will be able to read and modify the contents of oraInventory. If the ORACLE_BASE environment variable is set, then this takes preference over the OFA-compliant directory structure. Of course, if ORACLE_BASE is set to the OFA-compliant directory of /u01/app/oracle, then the outcome is the same. If ORACLE_BASE is not set and an OFA directory structure is not available, then the oracle user's home directory is used for software installation; however, the default permissions in this configuration offer little beyond giving the oracle user ownership of the Oracle software, so we do not recommend it.

If you're basing your installation on OFA, then your directory structure will be based on an ORACLE_BASE for the oracle user, with a separate ORACLE_HOME directory, also termed the ORA_CRS_HOME, for the Grid Infrastructure software. Because the subdirectory structure of the Grid Infrastructure software must be owned by the root user once it's operational, that directory must not be installed directly under ORACLE_BASE. The /u01/app directory should also be owned by root, but with a group of oinstall to permit access to the oraInventory directory. Installations owned by the oracle user, by contrast, should reside under the ORACLE_BASE directory; for the database ORACLE_HOME, the directory structure should be owned by the oracle user.

When configuring an OFA-compliant structure for both the Grid Infrastructure software and Oracle database software, we recommend that you create the ORACLE_BASE directory and allow the OUI to create the ORACLE_HOME and ORA_CRS_HOME to default specifications.

Issue the following commands as root to implement this minimum requirement:

[root@london1 root]# mkdir -p /u01/app/oracle
[root@london1 root]# chown -R root:oinstall /u01
[root@london1 root]# chmod -R 775 /u01

Using mkdir, the directories specified by ORA_CRS_HOME and ORACLE_HOME can also be pre-created to use a preferred directory structure instead of the default values from the OUI, which are /u01/app/11.2.0/grid/ for the Grid Infrastructure software and /u01/app/oracle/product/11.2.0/dbhome_1/ for the database software. Initially, the permissions on these directories should be set to 775. After a successful default Grid Infrastructure software installation, however, some of the directory ownership and permissions will have been changed to an owner of root, a group of oinstall, and permissions of 755:

[root@london1 /]# ls -ld /u01
drwxr-xr-x 3 root oinstall 4096 Sep 17 14:03 /u01
[root@london1 /]# ls -ld /u01/app
drwxr-xr-x 5 root oinstall 4096 Sep 17 15:25 /u01/app
[root@london1 /]# ls -ld /u01/app/11.2.0/grid/
drwxr-xr-x 64 root oinstall 4096 Sep 17 15:32 /u01/app/11.2.0/grid/

Under the grid directory itself, some of the directories will be owned by root, such as bin, crs, css, and gns. However, the majority of the directory structure will retain ownership under the oracle user (or grid user, if you have chosen a separate owner for the Grid Infrastructure software). The default ORACLE_HOME and oraInventory directory will retain the following permissions after the Database software has also been installed:

[root@london1 /]# ls -ld /u01/app/oracle/product/11.2.0/dbhome_1/
drwxr-xr-x 72 oracle oinstall 4096 Sep 17 17:07
[root@london1 /]# ls -ld /u01/app/oraInventory/
drwxrwx--- 5 oracle oinstall 4096 Sep 17 17:10
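
If you prefer to pre-create the ORA_CRS_HOME and ORACLE_HOME directories rather than accept the OUI defaults, the following sketch run as root shows one way to do so using the default paths; it assumes the oracle user owns both installations, so adjust the paths and ownership to your own standards:

[root@london1 root]# mkdir -p /u01/app/11.2.0/grid
[root@london1 root]# mkdir -p /u01/app/oracle/product/11.2.0/dbhome_1
[root@london1 root]# chown -R oracle:oinstall /u01/app/11.2.0 /u01/app/oracle
[root@london1 root]# chmod -R 775 /u01/app/11.2.0 /u01/app/oracle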

Setting Environment Variables

With the oracle user created by the Oracle Validated RPM and its password set, it is possible to log in with the newly created oracle user account. The oracle user account will be configured with a default ˜/.bash_profile in the oracle user's home directory, which will resemble the following:

[oracle@london1 oracle]$ cat ˜/.bash_profile
# .bash_profile

# Get the aliases and functions
if [ -f ˜/.bashrc ]; then
        . ˜/.bashrc
fi
# User specific environment and startup programs

PATH=$PATH:$HOME/bin

export PATH
unset USERNAME

If you are using the OFA directory structure, you do not need to update the default ˜/.bash_profile file with any environment variables before installing the Oracle Grid Infrastructure or Oracle database software; the installation will create the default directory structure detailed in the previous section. In fact, Oracle explicitly recommends not setting the environment variables ORA_CRS_HOME, ORACLE_HOME, ORA_NLS10, or TNS_ADMIN, and recommends setting the ORACLE_BASE environment variable only if you wish to change the installation location from the OFA default.

Oracle has previously asserted that the ORACLE_BASE environment variable will become mandatory in future versions; however, at version 11g, it is not required if you are satisfied with the default location, such as /u01/app/oracle. Oracle recommends not setting the additional environment variables because they are used extensively in the installation scripts; leaving them unset ensures that there are no conflicting locations defined. If you do set the ORACLE_HOME environment variable, then your setting will override the default installation location for both the Grid Infrastructure software and the Oracle database software. In that case, you should be aware of the correct relationship between the ownership and permissions of the different directory structures and ensure that your settings comply with the appropriate guidelines.

After installation, you should set the required environment variables for the operation and administration of your system. Within the ˜/.bash_profile file, you should also set the value of umask to 022.

The umask command applies a default mask to the permissions on newly created files and directories. The value 022 ensures that files are created with permissions of 644 (and directories with 755); this prevents other users from writing to the files created by the Oracle software owner.

Note

The ˜/.bashrc file is primarily for setting user-defined aliases and functions with environment variables being added directly to the ˜/.bash_profile file. Oracle does not require any additional aliases and functions, so there is no need to modify this file.

For Oracle 11g Release 2 RAC, the following environment variables should be set for the oracle user: PATH, ORACLE_SID, ORACLE_HOME, and ORACLE_BASE. Environment variables must be configured identically on all nodes except for ORACLE_SID, which is instance specific. The correct setting of the ORACLE_SID variable is dependent upon the chosen approach for workload management; you can learn more about this in Chapter 11. The following sections detail the purpose of the most relevant environment variables applicable to an oracle user installing and administering 11g Release 2 RAC on Linux.

Setting the ORACLE_BASE Variable

The ORACLE_BASE environment variable defines the root of the Oracle database software directory tree. If the ORACLE_BASE environment variable is not set explicitly, it will default first to an OFA-compliant configuration; if such a configuration doesn't exist, it will default to the value of the oracle user's home directory. However, you should set your ORACLE_BASE environment variable to the setting defined by the OUI after installation is complete.

All Oracle database software installed on the system is located below the directory specified by ORACLE_BASE, which would be similar to /u01/app/oracle if you're using the recommended OFA-compliant directory structure. The oracle user must have been granted read, write, and execute privileges on this directory. If you're installing the Oracle 11g Release 2 database software on an ASM Cluster File System (ACFS), on an OCFS2 file system on SAN, or on NAS storage presented as an NFS file system, then ORACLE_BASE and the directories below it can be located on this shared storage, with a single software installation shared between all of the nodes in the cluster. It is important to distinguish between the location of the Oracle 11g Release 2 database software and the Oracle Grid Infrastructure software; the latter must be installed on storage local to the node and not on a shared file system.

As noted previously, the Oracle Grid Infrastructure software is not installed below ORACLE_BASE, but in a directory parallel to it, such as /u01/app/11.2.0/grid. This directory is known variously as the ORA_CRS_HOME, CRS_HOME, or GRID_HOME, and its ownership belongs to the root user. The /u01/app directory also belongs to the root user, but it is given the oinstall group to allow the oracle user access to the oraInventory location. How you set ORACLE_BASE also determines the default location of the Oracle 11g database DIAGNOSTIC_DEST parameter, which replaces the previous background, user, and core dump destinations.

Setting the ORACLE_HOME Variable

The ORACLE_HOME environment variable specifies where the Oracle database software is installed, and it must not be the same location as ORA_CRS_HOME. A typical default location for the Oracle database software is /u01/app/oracle/product/11.2.0/dbhome_1. If the ORACLE_HOME environment variable is not set before installation, the actual default location is determined by the OUI in a path that depends on whether ORACLE_BASE is also set. The ORACLE_HOME environment variable can also be set temporarily to the ORA_CRS_HOME value before installing the Grid Infrastructure software, to ensure that the Grid Infrastructure software is installed in a location separate from ORACLE_BASE. After installation, ORACLE_HOME is used in many configuration environments to identify the location of the Oracle database software and configuration files; if you accepted the default OUI values, you should ensure that it is set correctly to those values once the install is complete.

Configuring the ORACLE_SID Variable

The ORACLE_SID environment variable defines the Oracle system identifier (SID) of the database instance on an individual node. In a standard, single-instance Oracle environment, the ORACLE_SID is the same as the global database name. In a RAC environment, each node has an instance accessing the same database. Therefore, each instance should have a different value for ORACLE_SID. We recommend that these values should take the form of the global database name, which begins with an alphabetic character, can be up to eight characters in length, and has a suffix for which the default is the instance number. For example, a database with a global database name of PROD might have these corresponding ORACLE_SID values on different nodes: PROD1, PROD2, PROD3, and PROD4.

If you're configuring an administrator-managed approach to workload management, then the ORACLE_SID can be set statically for each node in the cluster and its corresponding instance. If you're configuring a policy-managed database, however, then the instance is not fixed to a particular node; instead, instances are allocated dynamically to nodes according to the workload policy. For this reason, a statically set ORACLE_SID environment variable may not correspond to the instance running on a node at a particular point in time, so you should check the name of the currently running instance before setting it accordingly. Alternatively, under a policy-managed database, the instances may be pinned to their respective nodes to ensure that Clusterware maintains the same node number order. Thus, you might pin the same ORACLE_SID allocation with this command:

crsctl pin css -n
...

Setting the ORA_CRS_HOME Variable

The ORA_CRS_HOME environment variable specifies the directory located parallel to the ORACLE_BASE directory. This is the directory where the Oracle Grid Infrastructure software is installed, as in this example:

/u01/app/11.2.0/grid

The Grid Infrastructure software must not be installed on a shared file system, even if the ORACLE_HOME variable has been configured in this way. We do recommend adhering to the OFA standards in this case. However, if you do not wish to adhere to them, then the Grid Infrastructure software may be installed in any location that is not the existing ORACLE_HOME for the Oracle database software, as long as ownership can be set to the root user. The ORA_CRS_HOME environment variable is not used directly by the OUI; instead, the ORACLE_HOME environment variable must temporarily be set to this value when installing the Grid Infrastructure software to a particular user-specified directory. Once operational, however, this environment variable provides a distinct value for identifying the directory where the Grid Infrastructure software is installed. The ORA_CRS_HOME should also be included in the PATH environment variable of the root user.

In the 11g Release 2 documentation, the ORA_CRS_HOME variable is also interchangeably referred to as CRS_HOME or GRID_HOME. ORA_CRS_HOME is an environment variable set and used by some Oracle configuration scripts, so some people advise that you should not set it. Conversely, you may see conflicting advice that provides guidance on how to set this environment variable. We recommend that you set ORA_CRS_HOME, but ensure that it is set to the correct location, such as /u01/app/11.2.0/grid. If you choose not to set ORA_CRS_HOME, then you should set an environment variable (such as GRID_HOME) to use as your environment reference to the Grid Infrastructure software.

Setting NLS_LANG and ORA_NLS10

The NLS_LANG environment variable specifies the client-side language, territory, and character set. This variable enables Oracle to perform automatic conversion between the client and database character sets. If NLS_LANG is not set, then the default value is derived from the operating system locale settings (and the value of the LANG environment variable in particular):

[oracle@london1 ˜]$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

In this default setting, the client NLS_LANG configuration would be AMERICAN_AMERICA.AL32UTF8. Therefore, it is only necessary to set NLS_LANG if you wish the Oracle client-side localization settings to differ from the operating system environment. The environment variable ORA_NLS10 is set by default to the path $ORACLE_HOME/nls/data. This path determines the location of Oracle message-related data specific to a particular locale. In a default environment setting, ORA_NLS10 is not required unless the message files are located in a nondefault directory.

Configuring the TNS_ADMIN Variable

The environment variable TNS_ADMIN specifies the path of a directory containing the Oracle network configuration files, such as tnsnames.ora. If TNS_ADMIN is not set, then the default location for the network configuration files is used; that is, $ORACLE_HOME/network/admin will be used. Oracle recommends not setting TNS_ADMIN before software installation.

Setting the PATH Variable

The PATH environment variable defines the executable search path. For the oracle user, it should include both the $ORACLE_HOME/bin and $ORA_CRS_HOME/bin directories; for the root user, it should include the $ORA_CRS_HOME/bin directory. For example, this snippet extends the search path for the oracle user:

PATH=$PATH:$ORACLE_HOME/bin:$ORA_CRS_HOME/bin;

Setting the LD_LIBRARY_PATH Variable

The LD_LIBRARY_PATH environment variable specifies a list of directories that the runtime shared library loader searches for shared libraries. For security reasons, shared libraries found within this list take priority over default and compiled loader paths; therefore, only trusted libraries should be included in this list. Programs with the set-uid privilege always ignore the settings of LD_LIBRARY_PATH. You can also run the command ldconfig to set system-wide custom runtime library paths in the file /etc/ld.so.conf. For Oracle, the LD_LIBRARY_PATH environment variable is not required for the standard server software. However, it may be needed for other Oracle or third-party products that use shared libraries, such as the Oracle Orion tool (see Chapter 4 for more information about the Oracle Orion tool). If this variable is needed, it should be set to include the $ORACLE_HOME/lib and $ORA_CRS_HOME/lib directories.

Setting the JRE_HOME and CLASSPATH Variables

The JAVA_HOME and JRE_HOME environment variables are required for nondefault Java-based utilities such as jdbc or sqlj. These variables are typically set to the default locations of $ORACLE_HOME/jdk and $JAVA_HOME/jre/bin, respectively. The CLASSPATH environment variable specifies a list of directories and class libraries to be searched by the Java loader. In a default environment, these Java-related environment variables are not required; they are ignored by utilities such as dbca because the location of the Oracle JDK is specified in the dbca script itself.

Configuring the ORACLE_PATH and SQLPATH Variables

The environment variables ORACLE_PATH and SQLPATH are set to the name of a directory that SQL*Plus searches for SQL scripts. If these environment variables are not set, then no default value is enabled. The most useful script found in this location is login.sql, which enables the customization of the SQL*Plus profile for the oracle user.
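
For example, a minimal illustrative login.sql might contain the following; the contents are an assumption and entirely a matter of personal preference:

set pagesize 50
set linesize 132
define _editor=vi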

Setting the DISPLAY Variable

The DISPLAY environment variable specifies an X Window display that graphics should be displayed to. This environment variable should be set to a server name followed by a colon, an X server number followed by a period, and a screen number. In most cases, when displaying directly onto the default X display of a system, the server number and screen number will both be zero. An example setting might be london1:0.0. The most common exception occurs when running multiple instances of software, such as Virtual Network Computing (VNC), for displaying graphics across the Internet.

Setting the TEMP and TMPDIR Paths

By default, the directory used for the storage of temporary files on Linux is usually the /tmp directory. In this directory, Oracle will create files, such as installation log files; and utilities will create files to track values, such as process identifiers. The /tmp directory should not be confused with the intended location of the Oracle TEMP tablespace, which must reside in a permanent storage area and not in /tmp. If you want Oracle to use a directory other than /tmp for temporary storage, both TEMP and TMPDIR should be set to the path of this directory.
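
For example, to redirect temporary files to a hypothetical /u02/tmp directory, you could add the following to the oracle user's ˜/.bash_profile:

export TEMP=/u02/tmp
export TMPDIR=/u02/tmp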

Putting Environment Variables to Work

Now let's look at an example ˜/.bash_profile file that shows how to configure several environment variables and settings for a post-installation Oracle 11g Release 2 RAC node. Note that the export command is used on each environment variable line both to set the value of the variable and to ensure that this value is passed to all child processes created within the environment:

[oracle@london1 oracle]$ cat ˜/.bash_profile
# .bash_profile

if [ -t 0 ]; then
stty intr ^C
fi

# Get the aliases and functions
if [ -f ˜/.bashrc ]; then
        . ˜/.bashrc
fi


# User specific environment and startup programs
umask 022
export ORACLE_BASE=/u01/app/oracle
export ORACLE_HOME=$ORACLE_BASE/product/11.2.0/dbhome_1
export ORA_CRS_HOME=/u01/app/11.2.0/grid
export ORACLE_SID=PROD1
export PATH=$ORACLE_HOME/bin:$ORA_CRS_HOME/bin:$PATH

These environment variables would be set automatically on login or directly with the following command:

[oracle@london1 oracle]$ source .bash_profile

Completing the Linux Configuration for RAC

In this section, we look at the actions not completed by the Oracle Validated RPM, but which are required to complete the Linux configuration in preparation for the installation of the Oracle Clusterware and Oracle database software for RAC functionality. This section pays particular attention to configuring the shared storage between the nodes in the cluster.

Configuring Hostnames and Name Resolution

After installation, the hostname of the system is set in the file /etc/sysconfig/network. The file will also contain content for enabling the network and for defining the gateway.
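
A typical /etc/sysconfig/network file resembles the following sketch; the gateway address is an assumption for illustration:

NETWORKING=yes
HOSTNAME=london1.example.com
GATEWAY=172.17.1.254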

If you need to change the system hostname for any reason, then the recommended method is to do this through the graphical user interface. Begin by selecting Network from the Administration entry of the System menu; the hostname can then be modified under the DNS tab of this utility.

Warning

On any Oracle 11g RAC system, neither the hostname nor the domain name should be modified after the Grid Infrastructure software has been installed.

If you provided the IP address of your working DNS Server during the operating system installation, then some of the hostname and name resolution information will have already been configured for you. If this is the case, you should verify the contents of the file /etc/resolv.conf. Alternatively, you could configure the file manually for a DNS Server added after the operating system has been installed on the cluster nodes. In both cases, the file should reference the search path for the domain in use. The following example for a GNS configuration shows both the domain and subdomain:

[root@dns1 named]# cat /etc/resolv.conf
search example.com grid1.example.com
nameserver 172.17.1.1
options attempts: 2
options timeout: 1

Note

The subdomain is not required for a manual IP configuration.

You should also use the file /etc/nsswitch.conf to verify the order of name services that are queried. In particular, you want to ensure that references to NIS are not found before the DNS:

[root@dns1 ˜]# vi /etc/nsswitch.conf
#hosts:     db files nisplus nis dns
hosts:      files dns

You should use the nslookup command to verify DNS name resolution, as explained earlier in this chapter's "Drilling Down on Networking Requirements" section.
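
For example, a quick check of a node's public name against the DNS server configured above might look like the following sketch; the output is illustrative:

[root@london1 ˜]# nslookup london1.example.com
Server:         172.17.1.1
Address:        172.17.1.1#53

Name:   london1.example.com
Address: 172.17.1.101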

For both GNS and manual IP configurations, we recommend that the /etc/hosts file be kept updated with the public and private interconnect addresses of all the nodes in the cluster. The fully qualified names, including the domain name, should also be included as aliases in the hosts file for the fixed public IP addresses.

Keeping the file updated ensures that there is no break in operations during any temporary failure of the DNS service. Again, the preferred method is to modify the settings using the graphical user interface: navigate to the same utility used to set the hostname and select the Hosts tab, where you have the option to add, edit, or delete the address, hostname, and aliases of hosts relevant to the cluster. You should also modify the default entries, removing the configured hostname from the 127.0.0.1 loopback address to prevent potential name resolution errors in the Oracle software; for example, these kinds of errors can occur when using the srvctl utility. Only the localhost entry should be associated with this loopback address. It is particularly important that you do not remove the loopback address completely; removing it will result in errors in Oracle network components, such as the listener. You should also remove or comment out all of the special IPv6 addresses. The following /etc/hosts file shows the details for a two-node cluster with a manual configuration:

# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1               localhost
172.17.1.101            london1.example.com london1
172.17.1.102            london2.example.com london2
172.17.1.201            london1-vip.example.com london1-vip
172.17.1.202            london2-vip.example.com london2-vip
192.168.1.1             london1-priv
192.168.1.2             london2-priv

For a GNS configuration, the VIP addresses should not be included in the /etc/hosts file. As noted previously in this chapter, SCAN addresses should not be configured in /etc/hosts for either GNS or a manual IP configuration.

Using NTP

All nodes in an Oracle cluster must have exactly the same system time settings. If the system clocks are not synchronized, you may experience unpredictable behavior in the cluster. For example, you might fail to successfully register or evict a node as required. Also, manually adjusting the clock on a node by a time factor of minutes could cause unplanned node evictions. Therefore, we strongly advise that you synchronize all systems to the same time and make no major adjustments during operation. ("Major adjustments" do not include local time adjustments, such as regional changes for daylight saving time.)

Within Linux, the most common method to configure time synchronization is to use the Network Time Protocol (NTP). This protocol allows your server to synchronize its system clock with a central server. Your preferred time servers will be available from your Internet service provider. Alternatively, if these are not available, you can choose from a number of open access public time servers.

In Oracle 11g Release 2 RAC, the Oracle Cluster Time Synchronization Service Daemon is always operational. However, if NTP is active, CTSS only monitors the status of time synchronization. If NTP is not available, then CTSS will synchronize the time between the nodes of the cluster itself without an external time source. For this reason, operating with NTP is preferable. After installing the Grid Infrastructure software, you can monitor the operation of CTSS in the octssd.log in the $ORA_CRS_HOME/log/london2/ctssd directory, as in this example:

[cssd(26798)]CRS-1601:CSSD Reconfiguration complete. Active nodes are london1 london2 .
2009-09-17 15:34:32.217
[ctssd(26849)]CRS-2403:The Cluster Time Synchronization Service on host london2 is in observer mode.
2009-09-17 15:34:32.224

A good practice within a network environment is to configure a single, local dedicated time server to synchronize with an external source, and to set all internal servers to synchronize with this local time server. If this method is employed, you can use the configuration detailed momentarily to make the local time server the preferred server, with additional external time servers configured in case the local one fails. If you do not have an external time source available, then you should ensure that the NTP service is disabled; running NTP without an external time source leaves the CTSS service in observer mode with potentially unsynchronized times, as reported in the octssd.log.

You can configure NTP on Oracle Enterprise Linux systems from the graphical user interface by right-clicking the Date and Time display on your main panel and selecting Adjust Date & Time; then select the second tab on this page and add your NTP servers, following the same process detailed previously for the installation. You can also configure NTP manually using the chkconfig and service commands and by modifying the /etc/ntp.conf and /etc/ntp/step-tickers files. To do this, begin by ensuring that the NTP service has been installed on the system by using the chkconfig command:

[root@london1 root]# chkconfig  --list ntpd
ntpd   0:off  1:off  2:off  3:off  4:off  5:off  6:off

Next, manually edit the /etc/ntp.conf file and add lines specifying your own time servers, as in the following examples:

server ntp0.uk.uu.net
server ntp1.uk.uu.net
server ntp2.uk.uu.net

If you have a preferred time server, add the keyword prefer to ensure synchronization with this system, if available:

server ntp0.uk.uu.net prefer

To use open-access time servers, enter the following information:

server 0.rhel.pool.ntp.org
server 1.rhel.pool.ntp.org
server 2.rhel.pool.ntp.org

You will also have a default restrict line at the head of your configuration file. However, another good practice is to include server-specific security information for each server. This prevents that particular NTP server from modifying or querying the time on the system, as in the following example:

restrict ntp0.uk.uu.net mask 255.255.255.255 nomodify notrap noquery

Next, modify /etc/ntp/step-tickers and add the same time servers listed in your /etc/ntp.conf file. Note that this file is simply a list of server names, one per line, without the server keyword:

ntp0.uk.uu.net
ntp1.uk.uu.net
ntp2.uk.uu.net

Now use the chkconfig command to make sure that the NTP daemon will always start at boot time at run levels 3 and 5:

[root@london1 root]# chkconfig --level 35 ntpd on

You're now ready to verify the configuration, which you accomplish using chkconfig --list:

[root@london1 root]# chkconfig  --list ntpd
ntpd  0:off  1:off  2:off  3:on  4:off  5:on  6:off

The start-up options for the NTP daemon are detailed in the /etc/sysconfig/ntpd file. In normal circumstances, the default options should be sufficient. You can now start the service to synchronize the time using the following command:

[root@london1 root]# service ntpd start
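
For reference, the start-up options are held in the OPTIONS variable of /etc/sysconfig/ntpd. The Oracle 11g Release 2 documentation additionally recommends running ntpd with the -x flag, so that the clock is slewed rather than stepped, and the Cluster Verification Utility warns if this flag is absent. A minimal sketch of the relevant line is shown here; the other default options vary between releases, so retain the values already present in your file and restart the ntpd service after any change:

OPTIONS="-x -u ntp:ntp -p /var/run/ntpd.pid"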

Next, the date command can be used to query the system time to ensure that the time has been set correctly. The NTP daemon will not synchronize your system clock with the time server if they differ significantly; in this case, the ntpd service start-up script uses the systems listed in the /etc/ntp/step-tickers file (via ntpdate) to step the time to the correct value. Alternatively, the time can be set manually using the ntpdate command:

[root@london1 root]# ntpdate -u -b -s ntp0.uk.uu.net

Configuring Secure Shell

During the installation of Oracle 11g Release 2 RAC software, a secure shell (ssh) configuration is required on all nodes. This ensures that the node on which the installer is initiated can run commands and copy files to the remote nodes. The secure shell must be configured so that no prompts or warnings are received when connecting between hosts. During the installation of the Grid Infrastructure software, the OUI provides the option to both test and automatically configure a secure shell between the cluster nodes. Therefore, we recommend that you do not configure a secure shell in advance, but instead permit the OUI to complete this part of the configuration.

You should review the following steps to troubleshoot and verify the actions taken by the installer only if the connectivity test following automatic configuration fails. Additionally, these steps for the manual configuration of ssh are required for the Oracle Cluster Health Monitor user called crfuser (you will learn more about this in Chapter 12).

To configure a secure shell on the cluster nodes, first run the following commands as the oracle user to create a hidden directory called ~/.ssh if the directory does not already exist (we use the standard tilde character [~] here to represent the location of the oracle user's home directory):

[oracle@london1 oracle]$ mkdir ~/.ssh
[oracle@london1 oracle]$ chmod 755 ~/.ssh

Now create private and public keys using the ssh-keygen command. Next, accept the default file locations and enter an optional passphrase, if desired:

[oracle@london1 ˜]$ /usr/bin/ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/oracle/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/oracle/.ssh/id_rsa.
Your public key has been saved in /home/oracle/.ssh/id_rsa.pub.
The key fingerprint is:
a7:4d:08:1e:8c:fa:96:b9:80:c2:4d:e8:cb:1b:5b:e4 oracle@london1

Now create the DSA version:

[oracle@london1 ˜]$ /usr/bin/ssh-keygen -t dsa
Generating public/private dsa key pair.
Enter file in which to save the key (/home/oracle/.ssh/id_dsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/oracle/.ssh/id_dsa.
Your public key has been saved in /home/oracle/.ssh/id_dsa.pub.
The key fingerprint is:
dd:14:7e:97:ca:8f:54:21:d8:52:a9:69:27:4d:6c:2c oracle@london1

These commands will create four files in ~/.ssh called id_rsa, id_rsa.pub, id_dsa, and id_dsa.pub. These files contain the RSA and DSA private and public keys.

In the .ssh directory, copy the contents of the id_rsa.pub and id_dsa.pub files to a temporary file. This file will be copied to all other nodes, so you use the hostname to differentiate the copies:

[oracle@london1 oracle]$ cat id_rsa.pub id_dsa.pub > london1.pub

Repeat this procedure for each host in the cluster, and then copy the public key file to all other hosts in the cluster:

[oracle@london1 oracle]$ scp london1.pub london2:/home/oracle/.ssh

Next, concatenate all the public key files into ~/.ssh/authorized_keys on each host in the cluster:

cat london1.pub london2.pub > authorized_keys

Finally, set the permissions of the authorized keys file on all nodes:

[oracle@london1 oracle]$ chmod 644 authorized_keys

If no passphrase was specified, ssh and scp will now be able to connect across all nodes. If a passphrase was used, then these two additional commands should be run in every new bash shell session to prevent a prompt being received for the passphrase for every connection:

[oracle@london1 oracle]$ ssh-agent $SHELL
[oracle@london1 oracle]$ ssh-add

Enter the passphrase, and the identity will be added to the private key files. You can test the ssh and scp commands by connecting to all node combinations, remembering to check the connection back to the node you are working upon. Connections should be tested across both public and private networks:

[oracle@london1 oracle]$ ssh london1
[oracle@london1 oracle]$ ssh london2
[oracle@london1 oracle]$ ssh london1-priv
[oracle@london1 oracle]$ ssh london2-priv

All combinations should be tested. On the first attempted connection, the following warning will be received; answer yes to add the node to the list of known hosts. Doing so prevents this prompt from stalling the Oracle installation:

[oracle@london1 .ssh]$ ssh london1
The authenticity of host 'london1 (172.17.1.101)' can't be established.
RSA key fingerprint is 6c:8d:a2:13:b1:48:03:03:74:80:38:ea:27:03:c5:07.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'london1,172.17.1.101' (RSA) to the list of known hosts.
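
With more than two nodes in the cluster, you can reduce the manual effort of visiting every combination by looping over the host names. The following sketch, run as the oracle user on each node in turn, assumes the host names used in this chapter and simply runs the date command over each connection; answer yes to any remaining authenticity prompts until every combination completes without prompting:

[oracle@london1 ~]$ for host in london1 london2 london1-priv london2-priv; do ssh $host date; done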

If the ssh or scp command is run by the oracle user on a system where an X Window-based desktop is running under another user, then the session will receive the following warning unless X authority information is available for the display:

Warning: No xauth data; using fake authentication data for X11 forwarding.

This warning is received because, by default, ssh is configured to do X11 forwarding, and there is no corresponding entry for the display in the ~/.Xauthority file in the home directory of the oracle user on the system where the command was run. To prevent this warning from occurring during the Oracle software installation, edit the file /etc/ssh/ssh_config and change the line ForwardX11 yes to ForwardX11 no. Next, restart the sshd service, as shown here:

[root@london1 root]# service sshd restart

X11 forwarding will now be disabled, and the warning should not be received for ssh or scp connections. Alternatively, you can use the following entries to disable X11 forwarding at a user level. Do this by creating a file called config in the .ssh directory of only the oracle user:

[oracle@london1 .ssh]$ cat config
Host *
ForwardX11 no

If X11 forwarding is disabled at the system or user level, you can still use the ssh or scp command with the -X option to request forwarding of X Window display information manually, as in the following example:

[oracle@london1 oracle]$ ssh -X london2

If X11 forwarding is re-enabled after installation, then running ssh with the -x option specifies that X display information is not forwarded for that particular connection:

[oracle@london1 oracle]$ ssh -x london2

Configuring Shared Storage

Before installing Oracle 11g Release 2 RAC, it is necessary, at minimum, for shared storage to be available for the Oracle Cluster Registry (OCR) and the Clusterware voting disk. Additional shared storage will also be required for the database files before creating the database; however, an Oracle software-only installation may be performed if you intend to create a database at a later point in time.

This storage must present the same disk images to all of the nodes in the cluster for shared access; it must also be configured with this shared access in mind. For example, a file system type, such as ext3, can be used with a single mount point only. Therefore, it is not suitable for formatting the shared disk storage used in a clustered environment.

Configuring storage successfully is vital for providing a solid foundation for RAC. You can learn about the available storage options in general in Chapter 4; and you can learn more about OCFS version 2 in the context of virtualization in Chapter 5. We recommend OCFS version 2 as the key foundation for virtualized solutions; however, OCFS version 2 also remains a valid shared storage option for the OCR, the Clusterware voting disk, and database files.

In 11g Release 2, the OCR and voting disk can be stored in an ASM diskgroup. ASM, not OCFS2, is Oracle's recommended location for these at this release. Additionally, the Oracle 11g Release 2 database software may be installed in an ACFS, OCFS2, or NFS file system. However, the Grid Infrastructure software may not be installed on a shared cluster file system. Unless you're using a shared ORACLE_HOME, we recommend installing the database software on the same local file system in a location parallel to the Grid Infrastructure software, as explained previously in this chapter. You should review Chapters 4 and 5 in advance of configuring your Linux operating system because these chapters provide in-depth information on selecting the correct storage solution for a particular environment. Whichever option you select, in the context of configuring Linux, you should clearly understand the distinction between the storage used for installing the Oracle Grid Infrastructure software and the database server software, the storage used for the OCR and Clusterware voting disk, and the storage used for holding the Oracle database files.

The OCR and Clusterware voting disk may reside on a shared configuration of ASM (but not ACFS), OCFS2, or NFS. In contrast to how things worked in previous releases, block or raw devices are not supported by the OUI in 11g Release 2. While block or raw devices are supported by the software itself, they cannot be used for installation. We recommend that you protect against disk failure by using RAID storage; with this in place, the minimum number of copies required of each is one. If RAID storage is not available, then the OCR and Clusterware voting disk must have an odd number of multiple copies; this makes three copies of each the minimum required if there is no external redundancy. Each copy of the OCR and Clusterware voting disk requires 280MB of disk space. We recommend ensuring that sufficient disk is allocated initially. You can accomplish this by reserving 500MB for each copy; this value means you won't run up against space constraints from such things as disk formatting or ASM metadata. For example, if you wish to create an ASM diskgroup with external redundancy solely for the use of the OCR and Clusterware voting disks, then this diskgroup should be a minimum of 1GB in size to accommodate both.

Oracle database files may be installed on ASM (but not ACFS), OCFS2, or NFS. Block or raw devices are not supported by the OUI for Oracle database files; however, database files can be created on block or raw devices after installation. Regardless, the recovery area cannot be created on block devices or ACFS, and none of the Oracle database files can be created on locally configured storage for RAC.

If you're using a NAS-based solution, you should refer to the vendor-specific information for your certified storage to learn the relevant details such as mount options. This will help you correctly present the NFS file systems to the nodes in the cluster. For SAN-based installations (we include iSCSI NAS in this description), additional steps are required to discover and partition the LUNs presented by the SAN storage.
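
Before moving on to the SAN-specific steps, the following /etc/fstab entry illustrates, for the NAS case, the style of mount options commonly documented for Oracle data files on Linux NFS clients. The server name, export path, and mount point shown here are hypothetical, and the values certified by your particular NAS vendor always take precedence:

nas1:/export/oradata  /u02/oradata  nfs  rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,vers=3,timeo=600,actimeo=0  0 0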

Discovering and Configuring SAN Disk

The host-level component of the SAN infrastructure is the HBA, and the storage vendor compatibility matrix should be observed to ensure full system support for the card and server chosen. The HBA itself carries a significant level of the processing intelligence for the protocols it implements. In other words, the HBA performs much of the processing of these protocols itself, without consuming resources from the host CPU. The HBA determines the bandwidth supported by the host for communication with the storage. Therefore, it must be considered in terms of the bandwidth supported by the switched fabric infrastructure and storage itself. This helps ensure compatibility at the performance levels attainable. Most HBAs have the ability to auto-negotiate the speed at which they operate with the storage.

With the HBA physically installed on the PCI bus of the server, the command lspci can be used to confirm that it has been correctly seated. The following truncated output from lspci illustrates that this host has a single Emulex FC HBA:

0b:00.0 Fibre Channel: Emulex Corporation Zephyr LightPulse Fibre Channel Host Adapter (rev 02)

If the adapter is physically established before the operating system installation takes place, then the most appropriate driver will usually be installed for the HBA as a kernel module at this time. If the HBA is added after installation, then the most appropriate driver will be installed dynamically. The driver is identified and configured within the file /etc/modprobe.conf:

[root@london1 ˜]# cat /etc/modprobe.conf
alias eth0 e1000e
alias eth1 e1000e
alias scsi_hostadapter ata_piix
alias scsi_hostadapter1 lpfc

The loaded driver can be confirmed with the lsmod command, and the driver version and the particular options supported at this release can be viewed with the command modinfo:

[root@london1 ˜]# modinfo lpfc
filename:       /lib/modules/2.6.18-128.el5/kernel/drivers/scsi/lpfc/lpfc.ko
version:        0:8.2.0.33.3p
author:         Emulex Corporation - [email protected]
description:    Emulex LightPulse Fibre Channel SCSI driver 8.2.0.33.3p
license:        GPL

Additional information regarding the driver may also be recorded in /var/log/messages at driver load time. For example, the following snippet confirms the link status:

Jul 27 12:07:12 london1 kernel: scsi2 :  on PCI bus 0b device 00 irq 90
Jul 27 12:07:12 london1 kernel: lpfc 0000:0b:00.0: 0:1303 Link Up Event x1 received Data: x1
x1 x10 x2 x0 x0 0

As discussed in Chapter 4, the LUNs can be added as SCSI devices dynamically. You do this by removing and reinserting the FC driver module or by rebooting the system. If the operation is successful, the disks will appear in /proc/scsi/scsi, as shown by the following output:

[root@london1 ˜]# cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA      Model: Hitachi HDT72505 Rev: V56O
  Type:   Direct-Access                    ANSI SCSI revision: 05
Host: scsi1 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA      Model: Hitachi HDT72505 Rev: V56O
  Type:   Direct-Access                    ANSI SCSI revision: 05
Host: scsi2 Channel: 00 Id: 00 Lun: 00
  Vendor: DGC      Model: RAID 0           Rev: 0324
  Type:   Direct-Access                    ANSI SCSI revision: 04
Host: scsi2 Channel: 00 Id: 00 Lun: 01
  Vendor: DGC      Model: RAID 0           Rev: 0324
  Type:   Direct-Access                    ANSI SCSI revision: 04
Host: scsi2 Channel: 00 Id: 00 Lun: 02
  Vendor: DGC      Model: RAID 0           Rev: 0324
  Type:   Direct-Access                    ANSI SCSI revision: 04

At this stage, the SAN disks are successfully configured and presented to the hosts. Note that in this example the disk model is shown as RAID 0 for a performance and testing configuration; you should ensure that your disks use a RAID level providing more data protection than the level shown here. Next, you need to partition them using the utility fdisk or parted.

Partitioning Disks

The first step in preparing the disks for both the Clusterware and database files, irrespective of how they are to be used, is to create partitions on the disks. The example commands used to illustrate the creation of partitions on these disks are fdisk and parted.

The command fdisk -l displays all of the disks available to partition. The output for this command should be the same for all of the nodes in the cluster. However, the fdisk command should be run on one node only when actually partitioning the disks. All other nodes will read the same partition information, but we recommend adopting one of the following two approaches: either run the partprobe command on the other nodes in the cluster to update the partition table changes, or reboot these nodes so that the partition information is fully updated and displayed correctly.

The following example shows the corresponding disk for one of the SCSI devices detailed previously. In this example, /dev/sdd is the external disk to be partitioned:

[root@london1 ˜]# fdisk -l /dev/sdd

Disk /dev/sdd: 1073 MB, 1073741824 bytes
34 heads, 61 sectors/track, 1011 cylinders
Units = cylinders of 2074 * 512 = 1061888 bytes

   Device Boot      Start         End      Blocks   Id  System

To partition the drives, use fdisk with the argument of the disk device you want to partition:

[root@london1 root]# fdisk /dev/sdd

In the preceding example, the shared device /dev/sdd has been selected as the disk where a copy of the OCR and Clusterware voting disk will reside under an ASM diskgroup. We will create the single required partition of 1GB in size, passing it to the ASM configuration.

At the fdisk prompt, enter option n to add a new partition; p, to make it a primary partition; and 1, the number of the next available primary partition. Next, accept the default value for the first cylinder and enter the size of the partition in the form of the last cylinder to use; in this case, the default of the last available cylinder is accepted so that the single partition occupies the whole 1GB disk. The reported number of cylinders may vary even for exactly the same disks connected to different architecture systems. Therefore, you can also specify the partition size, such as +500M for a 500MB partition. For these selections, the fdisk dialog will resemble the following:

[root@london1 Desktop]# fdisk /dev/sdd

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-1011, default 1):
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-1011, default 1011):
Using default value 1011

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

The newly created partition can now be displayed as follows:

[root@london1 Desktop]# fdisk -l /dev/sdd

Disk /dev/sdd: 1073 MB, 1073741824 bytes
34 heads, 61 sectors/track, 1011 cylinders
Units = cylinders of 2074 * 512 = 1061888 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1               1        1011     1048376+  83  Linux

Now either run partprobe or reboot the remaining nodes in the cluster, and then use fdisk -l to verify that all of the nodes can view the partition tables written to the disk.

We noted during Linux installation that MBR partitioning stores data in hidden sectors at the start of the disk, whereas GPT partitioning does not do this. Therefore, when using MBR partitioning either with fdisk or parted, it is important to take this data into account when creating disk partitions, especially on the disks where the Oracle database files will reside. The modifications required are dependent upon the storage type and configuration, and the aim here is to ensure that, for a RAID striped configuration, disk I/O is aligned with the storage RAID stripe size. Depending on the I/O characteristics, failure to do this may lead to an increase in stripe crossing, for example where a single logical read or write operation from the system results in multiple stripe operations on the storage, thus proving detrimental to performance. In MBR formatting, the hidden sectors occupy the first 63 sectors of the disk. In a case where the storage stripe size is 64KB, it is necessary to realign the partition boundary to sector 128, because 128 sectors of 512 bytes align with the 64KB stripe size. This should be done on each disk where the database data files will reside, starting with the first partition on the disk; all subsequent partitions should then align with the new boundary. The following example illustrates the additional fdisk commands (run in expert mode) required to realign the partition boundary for the partitioning dialog shown previously:

Command (m for help): x

Expert command (m for help): b
Partition number (1-4): 1
New beginning of data (63-1044224, default 63): 128
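
After returning to the main fdisk menu (r) and writing the table (w), one way to confirm the realignment is to list the partition table in sector units rather than cylinders; the first partition should now report a starting sector of 128:

[root@london1 ~]# fdisk -lu /dev/sdd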

fdisk can still be used to partition disks with the MBR scheme; however, parted must be used to take advantage of GPT partitioning. Like fdisk, parted is invoked with the argument of the disk to partition. On starting, it prints the version of the software and licensing information to the screen. Next, it presents the (parted) prompt:

[root@london1 ˜]# parted /dev/sdd
GNU Parted 1.8.1
Using /dev/sdd
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted)

If the disk label type is not already specified as gpt (the default), then use the mklabel command to specify the new disk label type:

(parted) mklabel
Warning: The existing disk label on /dev/sdd will be destroyed and all data on
this disk will be lost. Do you want to continue?
Yes/No? Yes
New disk label type?  [msdos]? gpt

The print command is used to display the partition table. We have just created the partition table, so this example shows that the disk label type is gpt and no partitions have yet been created:

(parted) print

Model: DGC RAID 0 (scsi)
Disk /dev/sdd: 1074MB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start  End  Size  File system  Name  Flags

To create a partition, use the mkpart command. At the prompts, enter a value for the partition name and accept the default file system type. Although mkpart can be used for creating a local file system at the same time as the partition, it does not support any clustered file systems that can be shared between nodes, or ASM. For the partition boundaries, enter the start and end points in megabytes; for the first partition, the start will be 0 and the end will be the desired partition size in megabytes. The following example creates a 1074MB partition:

(parted) mkpart
Partition name?  []? crs1
File system type?  [ext2]?
Start? 0
End? 1074
(parted)

Use mkpart to create any subsequent partitions, entering the values for the partition name and file system type. Next, enter the ending point of the previous partition as the starting point of the new one; for the last partition's ending point, enter the value (in MB) of the disk size detailed in the disk geometry section. Printing the partition table now displays the created partition. You do not need to call an additional write command, and the partition table will remain on exiting the parted application:

(parted) print

Model: DGC RAID 0 (scsi)
Disk /dev/sdd: 1074MB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End     Size   File system  Name   Flags
 1      17.4kB  1074MB 1074MB               crs1

GPT partitions are not visible with the fdisk command; only one partition will be displayed, regardless of the number of partitions created:

[root@london1 ˜]# fdisk -l /dev/sdd

WARNING: GPT (GUID Partition Table) detected on '/dev/sdd'! The util fdisk doesn't support GPT. Use GNU Parted.

Disk /dev/sdd: 1073 MB, 1073741824 bytes
255 heads, 63 sectors/track, 130 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1               1         131     1048575+  ee  EFI GPT
...

Therefore, if you have created multiple partitions, you should view /proc/partitions to see whether the partitions have been successfully created and are available for use.
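
For example, the following command lists the kernel's view of the device partitioned in this example; the device names and number of partitions shown will reflect your own configuration:

[root@london1 ~]# grep sdd /proc/partitions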

If you're using an OCFS2 cluster file system method to store the OCR and Clusterware voting disk, then the Grid Infrastructure software installation will create the OCR and Clusterware voting disk as files during the installation process. Thus the only requirement is to format the disk partitions you created previously with OCFS2, as discussed in Chapter 5. You may then specify suitable file names to be created at the time of the Grid Infrastructure installation process (see Chapter 7 for more information on this). If you wish to use ASM, there are additional configuration steps required to prepare the partitions either with ASMLIB or manually.

I/O Multipathing with Device-Mapper

I/O multipathing is a concept similar to network channel bonding, which is covered later in this chapter. Like channel bonding, I/O multipathing support requires at least two storage HBAs per server that connect to the target storage to display the LUNs through two independent paths. To completely eliminate any single point of failure, each path should also be accessed through a separate storage switch. In the event of an HBA, cable, or switch failure, a path to the storage is maintained. During regular operations, such a configuration provides load balancing in addition to redundancy.

Typically, storage vendors have provided multipathing software solutions dedicated to their own range of storage products. In contrast, device-mapper, which is included with Oracle Enterprise Linux 5, provides a generic solution for I/O multipathing across a range of vendors' products. Notwithstanding the generic nature of multipathing with device-mapper, you should check with your storage vendor to determine its support for this form of multipathing. You should also inquire about the specific software and configuration settings for a complete multipathing solution at both the server and storage levels.

As its name implies, device-mapper provides a method to redirect I/O between block devices. It also provides the foundation for a number of storage configurations that you have already encountered in this chapter, such as LVM and RAID.

The device-mapper-multipath package is included in a default installation, or it can be installed from the installation media with the rpm or yum commands. The module is loaded with the command modprobe dm-multipath. Before proceeding with multipath configuration, it is necessary to install and configure the storage and HBA so that the two paths to the disk devices are visible to the Linux operating system. The following example shows the same storage presented in the previous example, but with the additional path to the same devices now visible:

[root@london1 ˜]# cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA      Model: Hitachi HDT72505 Rev: V56O
  Type:   Direct-Access                    ANSI SCSI revision: 05
Host: scsi1 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA      Model: Hitachi HDT72505 Rev: V56O
  Type:   Direct-Access                    ANSI SCSI revision: 05
Host: scsi2 Channel: 00 Id: 00 Lun: 00
  Vendor: DGC      Model: RAID 0           Rev: 0324
  Type:   Direct-Access                    ANSI SCSI revision: 04
Host: scsi2 Channel: 00 Id: 00 Lun: 01
  Vendor: DGC      Model: RAID 0           Rev: 0324
  Type:   Direct-Access                    ANSI SCSI revision: 04
Host: scsi2 Channel: 00 Id: 00 Lun: 02
  Vendor: DGC      Model: RAID 0           Rev: 0324
  Type:   Direct-Access                    ANSI SCSI revision: 04
Host: scsi3 Channel: 00 Id: 00 Lun: 00
  Vendor: DGC      Model: RAID 0           Rev: 0324
  Type:   Direct-Access                    ANSI SCSI revision: 04
Host: scsi3 Channel: 00 Id: 00 Lun: 01
  Vendor: DGC      Model: RAID 0           Rev: 0324
  Type:   Direct-Access                    ANSI SCSI revision: 04
Host: scsi3 Channel: 00 Id: 00 Lun: 02
  Vendor: DGC      Model: RAID 0           Rev: 0324
  Type:   Direct-Access                    ANSI SCSI revision: 04

The additional devices will also be visible with the fdisk command; depending on the storage, however, they may not be accessible.
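
If the multipath tools are not already present, they can be installed from the installation media and set to start at boot. The following is a minimal sketch assuming the default Oracle Enterprise Linux 5 package and service names; start (or restart) the multipathd service only after /etc/multipath.conf has been edited as described next:

[root@london1 ~]# yum install device-mapper-multipath
[root@london1 ~]# modprobe dm-multipath
[root@london1 ~]# chkconfig multipathd on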

The multipathing setup is configured in the file /etc/multipath.conf. The following example is specific to the EMC CLARiiON storage; other storage vendors can supply examples specific to their products. Begin by removing or commenting out the following section:

#blacklist {
#        devnode "*"
#}

Next, add the following to the end of the file:

defaults {
        path_grouping_policy    failover
        user_friendly_names     yes
}
multipaths {
        multipath {
                wwid    360060160a5b11c00420e82c83240dc11
                alias   crs
                mode    640
                uid     500
                gid     500
        }
}
devices {
        device {
                vendor                  "DGC"
                product                 "*"
                product_blacklist       "(LUNZ|LUN_Z)"
                path_grouping_policy    group_by_prio
                getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
                prio_callout            "/sbin/mpath_prio_emc /dev/%n"
                path_checker            emc_clariion
                path_selector           "round-robin 0"
                features                "1 queue_if_no_path"
                hardware_handler        "1 emc"
                no_path_retry           300
                failback                immediate
        }
}

The device-mapper creates a number of devices to support multipathing. A device name such as /dev/dm-* indicates the kernel device name, and it should not be accessed directly:

[root@london1 ˜]# ls -l /dev/dm-*
brw-rw---- 1 root root 253, 2 Aug  5 15:32 /dev/dm-2
brw-rw---- 1 root root 253, 3 Aug  5 15:32 /dev/dm-3
brw-rw---- 1 root root 253, 4 Aug  5 15:32 /dev/dm-4
brw-rw---- 1 root root 253, 5 Aug  5 15:32 /dev/dm-5
brw-rw---- 1 root root 253, 6 Aug  5 15:32 /dev/dm-6
brw-rw---- 1 root root 253, 7 Aug  5 15:32 /dev/dm-7

Device names such as /dev/mpath/mpath2 are created as symbolic links, according to udev rules defined in /etc/udev/rules.d/40-multipath.rules. It is possible to modify these rules to set the correct permissions (as detailed in the previous section); however, these modified rules will only affect the symbolic links. Instead, the device to use for access is listed under the directory /dev/mapper/, as in this example: /dev/mapper/mpath2. For this reason, the syntax for the multipaths section that defines persistence is based on the WWID (the LUN unique ID), together with the alias, mode, and ownership. The upcoming example illustrates the approach for explicitly naming and changing the ownership of the OCR and Clusterware voting disk, while leaving the other devices with their default naming; you can apply these changes to as many devices as you wish. This is the preferred method for configuring the correct permissions of multipathed devices, as opposed to using udev rules directly, an approach you'll learn about later in this chapter. You should ensure that the ownership and permissions are set identically in /etc/multipath.conf on all of the nodes in the cluster.

To verify your configuration, run the multipath command with the -d option, as in this example: multipath -v3 -d. The verbose output gives considerable detail; in particular, the paths list section is useful for identifying the devices, their paths, and their current status. If the output is satisfactory, you can run the same command without the -d option to activate the configuration. Checking the /dev/mapper directory shows the created multipath devices available for use:

[root@london1 ~]# ls -l /dev/mapper
total 0
crw-r----- 1 root   dba       10, 63 Aug  5 15:31 control
brw-r----- 1 oracle dba      253,  3 Aug  5 15:32 crs
brw-r----- 1 oracle dba      253,  6 Aug  5 15:32 crsp1
brw-rw---- 1 root   disk     253,  2 Aug  5 15:32 mpath2
brw-rw---- 1 root   disk     253,  5 Aug  5 15:32 mpath2p1
brw-rw---- 1 root   disk     253,  4 Aug  5 15:32 mpath3
brw-rw---- 1 root   disk     253,  7 Aug  5 15:32 mpath3p1
brw-rw---- 1 root   disk     253,  0 Aug  5 15:32 VolGroup00-LogVol00
brw-rw---- 1 root   disk     253,  1 Aug  5 15:31 VolGroup00-LogVol01

Note the updated name and permission for the OCR and Clusterware voting disk in the preceding example. The multipath command also shows the active paths to the devices:

[root@london1 ~]# multipath -l
mpath2 (360060160a5b11c006840e1703240dc11) dm-2 DGC,RAID 0
[size=102G][features=1 queue_if_no_path][hwhandler=1 emc][rw]
\_ round-robin 0 [prio=0][active]
 \_ 3:0:0:0 sdc 8:32  [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 2:0:0:0 sdf 8:80  [active][undef]
crs (360060160a5b11c00420e82c83240dc11) dm-3 DGC,RAID 0
[size=1.0G][features=1 queue_if_no_path][hwhandler=1 emc][rw]
\_ round-robin 0 [prio=0][active]
 \_ 3:0:0:1 sdd 8:48  [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 2:0:0:1 sdg 8:96  [active][undef]
mpath3 (360060160a5b11c0046fd4df53240dc11) dm-4 DGC,RAID 0
[size=299G][features=1 queue_if_no_path][hwhandler=1 emc][rw]
\_ round-robin 0 [prio=0][active]
 \_ 3:0:0:2 sde 8:64  [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 2:0:0:2 sdh 8:112 [active][undef]

You can test the multipath configuration by disabling access to the active path:

[root@london1 ˜]# echo offline > /sys/block/sdd/device/state

The failing path is identified in the system log by its major and minor device number combination. In this case, the log shows the path being failed over in response to the path being disabled:

device-mapper: multipath: Failing path 8:48.
device-mapper: multipath emc: emc_pg_init: sending switch-over command

The multipath command shows the currently active path:

[root@london1 ˜]# multipath -l
...
crs (360060160a5b11c00420e82c83240dc11) dm-3 DGC,RAID 0
[size=1.0G][features=1 queue_if_no_path][hwhandler=1 emc][rw]
\_ round-robin 0 [prio=0][enabled]
 \_ 3:0:0:1 sdd 8:48  [failed][faulty]
\_ round-robin 0 [prio=0][active]
 \_ 2:0:0:1 sdg 8:96  [active][undef]
...

The preceding example includes more actions than simply changing paths to the devices. In EMC CLARiiON terminology, the LUN is automatically trespassed between storage processors on the storage itself in response to the switch over command to enable access through the alternate path. This additional functionality reiterates the importance of ensuring that multipathing with device-mapper is a supported configuration throughout the entire system, including the HBA and storage.

To return the device back to an active state, you should echo running instead of offline to the device state.
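
For example, the path disabled previously can be restored as follows:

[root@london1 ~]# echo running > /sys/block/sdd/device/state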

Partitioning of multipath devices can be performed with the fdisk command, exactly as detailed previously, but run against the device under /dev/mapper (for example, /dev/mapper/mpath2). However, an additional step is then required to register the created partitions with the command kpartx.
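
The following is a minimal sketch of this step, assuming a partition has just been created on the multipath device named mpath2 from the earlier listing; kpartx reads the partition table and creates the corresponding partition mappings, such as mpath2p1, under /dev/mapper:

[root@london1 ~]# kpartx -a /dev/mapper/mpath2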

Preparing the Partitions for ASM with ASMLIB

After presenting the shared storage to the operating system and partitioning the disks, no additional software is required to use them for ASM. If you're using ASM either for the OCR and Clusterware voting disk partition or for database storage, then you may also wish to configure ASMLIB on the underlying devices beforehand. One of the main benefits of ASMLIB is that it provides persistence in the naming and permissions of the disks to be used for ASM. And while udev can also provide this functionality without requiring additional package installations beyond the default operating installation, ASMLIB was always intended to provide additional benefits for Oracle environments. As noted previously in this chapter, the use of ASMLIB reduces the requirement for file descriptors across the system. ASMLIB also implements asynchronous I/O, bypassing the standard Linux implementation of asynchronous I/O. However, the main benefits of ASMLIB are in the area of manageability and flexibility.

Once configured, ASMLIB provides a standard method for identifying the presence of ASM disks on the system. It is also possible to add additional nodes to the cluster or transfer the disks between different architecture systems. Subsequently, scanning the disks will detect the ASM configuration and maintain the configured permission for ASMLIB installed locally on each node. This stands in contrast to udev, which requires manual identification and configuration of the ASM disks for each individual system. Therefore, the main benefits of ASMLIB are ease of use and the ability to identify ASM disks. If you are comfortable with the benefits of having udev implement this solution for the OCR and Clusterware voting disk partitions, then ASMLIB is not an essential requirement. However, ASMLIB may provide additional identification benefits for the ASM diskgroups that contain database files. This is especially true in cases where disks are often transferred between systems, thereby reducing the possibility of errors resulting in data loss.

ASMLIB requires the installation of three RPM packages: the ASM library, tools, and driver. The RPM packages for the tools and driver are included with the installation media; however, the ASM library is not. Customers of the Unbreakable Linux Network can retrieve the library from there. The library (and the other packages) can also be downloaded from the following location:

www.oracle.com/technology/software/tech/linux/asmlib/rhel5.html

You can install all three ASMLIB RPM packages, as follows:

[root@london1 ˜]# rpm -ivh oracleasm*
Preparing...                ########################################### [100%]
   1:oracleasm-support      ########################################### [ 33%]
   2:oracleasm-2.6.18-164.el########################################### [ 67%]
   3:oracleasmlib           ########################################### [100%]

Your first action is to configure the driver. This action sets the ownership of the device as a functional equivalent of udev configuration:

[root@london1 asmlib]# /etc/init.d/oracleasm configure
Configuring the Oracle ASM library driver.

This will configure the on-boot properties of the Oracle ASM library
driver.  The following questions will determine whether the driver is
loaded on boot and what permissions it will have.  The current values
will be shown in brackets ('[]').  Hitting <ENTER> without typing an
answer will keep that current value.  Ctrl-C will abort.
Default user to own the driver interface []: oracle
Default group to own the driver interface []: dba
Start Oracle ASM library driver on boot (y/n) [n]: y
Scan for Oracle ASM disks on boot (y/n) [y]: y
Writing Oracle ASM library driver configuration: done
Initializing the Oracle ASMLib driver:                     [  OK  ]
Scanning the system for Oracle ASMLib disks:               [  OK  ]

Next, you use the createdisk command to mark the disks as ASM disks:

[root@london1 ˜]# /etc/init.d/oracleasm createdisk VOL1 /dev/sdc1
Marking disk "VOL1" as an ASM disk:                        [  OK  ]

The configured disks can be deleted with the deletedisk command; in the process of deleting the disk metadata, this command also removes access to any data configured on the disk. The listdisks command shows the configured volumes, while the querydisk command shows the mapping between the ASM disk and the system disk device:

[root@london1 asmlib]# /etc/init.d/oracleasm querydisk /dev/sdc1
Device "/dev/sdc1" is marked an ASM disk with the label "VOL1"

After configuring the first node in the cluster, you can use the scandisks command to detect the configured disks on the other nodes:

[root@london1 asmlib]# /etc/init.d/oracleasm scandisks
Scanning the system for Oracle ASMLib disks:               [  OK  ]

If you have configured multipath devices as explained previously in this chapter, then an additional configuration step is required. A multipath configuration will present three views of the same disk device. These views represent both channels and the multipath device itself. By default, on scanning ASMLIB will select the first channel scanned, which may not be the multipath device. To ensure that it is the multipath device that ASMLIB uses, you can add a section that excludes scanning the non-multipath devices to the /etc/sysconfig/oracleasm configuration file. For example, you can exclude scanning the disk devices that represent the channels for the OCR and Clusterware voting disk partitions given in the previous example, as shown here:

ORACLEASM_SCANEXCLUDE="sdd sdg"

Explicitly excluding the channels from direct use means that ASMLIB will be configured with the multipath device, thereby ensuring that ASM will benefit from the underlying redundancy of the I/O configuration.
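
In addition to excluding the individual paths, some configurations also set the scan order so that device-mapper devices are matched first. The following sketch shows the relevant lines in /etc/sysconfig/oracleasm, keeping the exclusion shown previously; verify the parameter names against the comments in your own copy of the file and leave the remaining parameters at the values written by the oracleasm configure command:

ORACLEASM_SCANORDER="dm"
ORACLEASM_SCANEXCLUDE="sdd sdg"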

Preparing the Partitions for ASM with udev

Due to Linux's Unix-based heritage, devices in Linux are traditionally files created by the mknod command. This command directs input and output to the appropriate device driver identified by the major device number. One of the challenges with the static configuration of devices is that the number of devices pre-created to support a wide range of hardware connectivity is significant. This makes it difficult to identify which devices are connected to the system and active. With the 2.6 kernel, an in-memory virtual file system implementation called sysfs was introduced to present a rationalized view of the connected hardware devices to processes running in user space. The sysfs virtual file system is mounted on /sys; a driver loaded as a module registers with sysfs when the module is loaded, and the udevd event management daemon dynamically creates the corresponding devices under /dev. This implementation reduces the created devices down from many thousands of pre-created static devices to only the devices connected to the system. The device configuration enables device creation and naming, according to defined rules in the main configuration file, /etc/udev/udev.conf. This file includes an entry for the directory that defines the rules for device creation; by default, it is located under the directory /etc/udev/rules.d.

Familiarity with udev is important because previously, attributes such as device ownership and permissions could be modified and would be persistent across reboots due to the static nature of the configuration. Under udev's dynamic configuration, the device is created every time the corresponding driver module is loaded. Therefore, rules must be defined to ensure this configuration sets the desired attributes each time. Configuring devices with udev means that operating system standards can be used for correctly preparing a partition for ASM, but without requiring ASMLIB.

Configuring Udev Permissions

Due to the dynamic nature of device configuration with udev, it is necessary to configure udev so that it sets the ownership and permissions of the disk devices for the OCR and Clusterware voting disk to the correct values for installing and operating the Clusterware. As shown in the following example, the devices are owned by the oracle user, which is also the Grid Infrastructure software owner, with 660 permissions. Similarly, any devices to be used by ASM for database files should be owned by the database software owner; in this case, that is the same oracle user. The owner, group, and permissions can be changed by writing a udev rule in the directory /etc/udev/rules.d, for example by modifying the file 50-udev.rules. However, introducing a syntax error into this file may prevent the system from booting; therefore, it is best practice to create a new rules file that is read after 50-udev.rules, such as 55-udev.rules. The rule shown in this example changes the ownership for the OCR and Clusterware voting partition previously created; the KERNEL value to match is the device name without the /dev directory prefix:

# There are a number of modifiers that are allowed to be used in some of the
# fields.  See the udev man page for a full description of them.
#
# default is OWNER="root" GROUP="root", MODE="0600"
#
KERNEL=="sdd[1-9]", OWNER="oracle" GROUP="dba", MODE="0660"

Subsequently, you can reload the udev rules and restart udev:

[root@london1 rules.d]# udevcontrol reload_rules
[root@london1 rules.d]# start_udev
Starting udev:                                             [  OK  ]

The permissions are now preserved across system reboots:

[root@london1 rules.d]# ls -l /dev/sdd1
brw-r----- 1 oracle dba 8, 49 Jul 29 17:20 /dev/sdd1

Finally, you should ensure that the settings are applied to all of the nodes in the cluster.
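
For example, the rules file can simply be copied to the remaining nodes and udev reloaded on each. This sketch assumes that the root user can copy files to london2 (you will be prompted for a password if ssh equivalence has not been configured for root):

[root@london1 rules.d]# scp /etc/udev/rules.d/55-udev.rules london2:/etc/udev/rules.d/
[root@london1 rules.d]# ssh london2 '/sbin/udevcontrol reload_rules; /sbin/start_udev'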

Enabling Udev Persistence

Here's something else to consider if you're not using ASMLIB. In addition to using udev to set permissions, you should also consider using it to configure device persistence of the OCR and Clusterware voting disk partition, as well as the database storage partitions used for ASM. Similar to the setting of permissions, the dynamic nature of device discovery with udev means that there is no guarantee that devices will always be discovered in the same order. Therefore, it is possible that device names may change across reboots, especially after adding new devices. For this reason, we recommend that you also configure udev to ensure that the same devices are named consistently whenever udev is run.

The first step is to identify the physical devices that correspond to the logical disk names configured by udev. The physical device names are derived on the storage itself, and these typically correspond to the Network Address Authority (NAA) worldwide naming format to ensure that the identifier is unique. Figure 6-9 illustrates the identifier for a LUN under the heading of Unique ID.


Figure 6.9. The LUN Unique ID

The identifier can also be determined with the commands udevinfo or /sbin/scsi_id when run against the devices in the /sys directory. With the /sbin/scsi_id command, the arguments -p 0x80 and -s return the SCSI vendor, model, and serial number to cross-reference against the storage. However, it is the -p 0x83 page (the default) in combination with the -s argument that returns the unique identifier. In this case, that identifier is prefixed by the number 3, which identifies the device as an NAA type 3 device; this identifier also corresponds to the unique ID from the storage:

[root@london1 ˜]# /sbin/scsi_id -g -s /block/sdd
360060160a5b11c00420e82c83240dc11

The identifier being determined by the storage is based on the entire LUN, as opposed to the individual disk partitions configured by the systems. Therefore, the identifier is the same for multiple partitions on the same device. Additionally, the identifier will also be the same for the same device shared between nodes in the cluster:

[root@london2 ˜]# /sbin/scsi_id -g -s /block/sdd
360060160a5b11c00420e82c83240dc11

The file /etc/scsi_id.config enables you to set options to modify the default behaviour of the scsi_id command. In particular, the option -g is required for the scsi_id command to produce output. Therefore, we recommend that you add the line options=-g to this file, as follows:

# some libata drives require vpd page 0x80
vendor="ATA",options=-p 0x80
options=-g

The following example for setting udev rules assumes that this addition has been made. If you do not wish to make the preceding addition, then you should be aware that the -g option must be given to the scsi_id command when specified in the udev rules. If not, the command will generate no output, causing the rule to fail.
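
If you prefer not to modify /etc/scsi_id.config, an alternative is to pass the flag directly in the rule's PROGRAM key, as in the following sketch; this assumes the same identifier used throughout this example, and %p is substituted by udev with the device path of the matched device:

KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -s %p",
RESULT=="360060160a5b11c00420e82c83240dc11", NAME="crs%n", OWNER="oracle", GROUP="dba",
MODE="0660"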

You can also retrieve the device information using the command udevinfo in conjunction with the identifier shown under the heading, ID_SERIAL:

[root@london1 ˜]# udevinfo -q all -p /block/sdd
P: /block/sdd
N: sdd
S: disk/by-id/scsi-360060160a5b11c00420e82c83240dc11
S: disk/by-path/pci-0000:0b:00.0-fc-0x5006016b41e0613a:0x0001000000000000
E: ID_VENDOR=DGC
E: ID_MODEL=RAID_0
E: ID_REVISION=0324
E: ID_SERIAL=360060160a5b11c00420e82c83240dc11
E: ID_TYPE=disk
E: ID_BUS=scsi
E: ID_PATH=pci-0000:0b:00.0-fc-0x5006016b41e0613a:0x0001000000000000

Once you have the unique identifier, you can edit the file created previously for setting device permissions, 55-udev.rules. The following entry corresponds to the device name identified previously and uses /sbin/scsi_id with the -g option that was set in the /etc/scsi_id.config file. Also, the %n substitution variable is specified so that the partitions for this device are correctly named in numerical order. At this point, the owner, group, and permissions are preserved, as previously described:

KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id",
RESULT=="360060160a5b11c00420e82c83240dc11", NAME="crs%n", OWNER="oracle", GROUP="dba",
MODE="0660"

You can verify the validity of the rules with the command, udevtest:

[root@london1 rules.d]# udevtest /block/sdd
main: looking at device '/block/sdd' from subsystem 'block'
run_program: '/bin/bash -c '/sbin/lsmod | /bin/grep ^dm_multipath''
run_program: '/bin/bash' (stdout) 'dm_multipath           55257  0 '
run_program: '/bin/bash' returned with status 0
run_program: '/lib/udev/usb_id -x'
run_program: '/lib/udev/usb_id' returned with status 1
run_program: '/lib/udev/scsi_id -g -x -s /block/sdd -d /dev/.tmp-8-48'
run_program: '/lib/udev/scsi_id' (stdout) 'ID_VENDOR=DGC'
run_program: '/lib/udev/scsi_id' (stdout) 'ID_MODEL=RAID_0'
run_program: '/lib/udev/scsi_id' (stdout) 'ID_REVISION=0324'
run_program: '/lib/udev/scsi_id' (stdout) 'ID_SERIAL=360060160a5b11c00420e82c83240dc11'
run_program: '/lib/udev/scsi_id' (stdout) 'ID_TYPE=disk'
run_program: '/lib/udev/scsi_id' (stdout) 'ID_BUS=scsi'
...

Finally, reload the udev rules and restart udev. The devices that can be specified for the OCR and Clusterware voting disk have now been created, and they are available for use. This ensures that a given name will be preserved for a particular device across reboots:

[oracle@london1 ˜]$ ls -l /dev/crs*
brw-r----- 1 oracle dba 8, 48 Jul 31 15:44 /dev/crs
brw-r----- 1 oracle dba 8, 49 Jul 31 15:44 /dev/crs1

You can also use the command udevinfo to query the created device based on its name, as in this example:

udevinfo -q all -n /dev/crs

Network Channel Bonding

As explained in Chapter 4, we recommend using a teamed network interface configuration implemented on the private interconnect network because it can protect against the interconnect switch itself being a single point of failure for the entire cluster. To implement teaming, you will need two available network interfaces per node in addition to the external network interface (again, see Chapter 4 for more information). You will also need two network switches connected to each other with an interswitch link, and these switches must support this form of topology. If this is the case, after installing the Linux operating system according to the guidelines detailed previously in this chapter, then you will have an active external interface, an active private interconnect interface, and an additional, currently inactive private network interface.

Your goal when configuring bonding is to implement high availability and prevent a single point of failure, whether in the NICs installed in the system, the interconnect switches, or the network cables. Any one of these components should be able to fail without impacting the availability of the interconnect. The aim of this configuration is not to increase bandwidth or interconnect performance, but to ensure there is no single point of failure among the aforementioned components. In Chapter 4, we discuss the role of the interconnect in the context of the performance of the entire cluster; you should always review the demands of interconnect traffic in this context, rather than in isolation.

The most widely deployed and supported solution for bonding is the channel bonding module installed by default with the Oracle Enterprise Linux operating system. This module includes a number of bonding modes; you can find a detailed summary of these modes and their roles in Table 6-7.

Table 6.7. Bonding Modes

Mode

Name

Details

0

balance-rr

Provides round-robin load balancing on packet transmission, regardless of load.

1

active-backup

A single NIC is active, with the standby becoming operational on failure of the active NIC.

2

balance-xor

Provides load balancing on XOR value of the source and destination MAC address; this means that the same NIC is used between a particular source and its destination.

3

broadcast

Transmits network traffic on all interfaces. This mode isn't recommended for a high availability interconnect.

4

802.3ad

Provides 802.3ad dynamic link aggregation on supported network hardware.

5

balance-tlb

Provides transmit load balancing based on load.

6

balance-alb

Provides transmit and receive load balancing. ARP replies to other nodes are replaced with specific slave MAC addresses, so replies from these nodes are received on different NICs.

In conjunction with the teamed hardware configuration described in Chapter 4, we recommend that the interconnect be configured in mode 1. In this active-backup mode, by default one switch and all of the NICs attached to that switch act as the primary, while the secondary switch and the NICs configured as the backup become active only in the event of the failure of the corresponding primary component. A direct interswitch link is present for a topology with multiple switches. This link provides a form of external routing between the switches, ensuring that network traffic can be transmitted and received between all of the teamed NICs, regardless of which NICs or switches are active at any particular point in time. We also recommend mode 1 active-backup as the most applicable high availability solution because it applies to the widest possible range of networking hardware and software drivers deployed with multiple switches. Once you have configured the bonding devices on the private interconnect for all of the nodes in the cluster, we recommend that you also test the setup thoroughly to gauge the system's response to failures at a NIC, cable, or switch level.
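
Once the configuration described in the remainder of this section is in place, one straightforward operating system level test, in addition to physically removing cables and powering off switches, is to take the currently active slave interface down and confirm that the backup takes over. The following sketch assumes that eth1 is the active slave at the time; the second grep should report the other slave interface, and the private interconnect address should remain reachable from the other nodes throughout:

[root@london1 ~]# grep "Currently Active Slave" /proc/net/bonding/bond0
[root@london1 ~]# ifdown eth1
[root@london1 ~]# grep "Currently Active Slave" /proc/net/bonding/bond0
[root@london1 ~]# ifup eth1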

At your discretion, you may use another mode, such as one of the load balancing options. However, when doing so, we recommend that you conduct additional tests of your chosen solution to ensure its compatibility with your hardware and software configuration. These tests should also demonstrate that the prime goal of maintaining high availability is achieved. If you're considering this option, you should bear in mind that a wide range of driver software and related documentation assumes that a teamed configuration will reside within a single switch; such an approach increases bandwidth and provides failover at the NIC level, as opposed to the switch level. For this reason, we advise thorough testing for compatibility before you deploy your solution.

We will assume that the interfaces' device names are eth0, eth1, and eth2, respectively. You may also have additional interfaces, especially if you're using a backup network. You may also consider teaming on the external interface. However, for the sake of simplicity, we will consider only the three devices listed here. To optimally implement bonding, all of the interfaces that are intended to communicate on the private interconnect should be configured in this fully redundant way with bonding, as well. Care should also be taken to ensure that all of the servers are connected to the active and backup switches with the same corresponding interfaces. In other words, if eth1 on london1 is connected to the active switch, then eth1 on all other nodes should also be connected to this same switch.

Because bonding is implemented through a loadable kernel module (similar to the hangcheck-timer module), you need to set configuration options in the file /etc/modprobe.conf. You also need to explicitly load and set the bonding module options when you initiate the private network interface. Creating an alias for the interface name (bond0, in this case) enables parameters to be assigned to the bonding module:

alias bond0 bonding
options bond0 miimon=100 mode=1

Two options are of particular interest. First, you must specify either miimon or the combination of arp_interval and arp_ip_target. We recommend setting miimon because this is the most applicable solution for ensuring high availability, and this parameter determines how often Media Independent Interface (MII) link monitoring occurs (in milliseconds). In the previous example, the value is set to 100. In addition, all modern NICs should support MII link monitoring, which makes it a feasible default option.

As explained previously, the second parameter of mode must be set to the value of 1 or its text equivalent (active-backup), unless you have fully validated the operation of an alternative mode in your environment.

To configure the network devices so they can use the bonding module, you need to modify their configuration files located in the directory /etc/sysconfig/network-scripts. Assuming that you're using the eth1 and eth2 devices for the interconnect, you can copy the file that corresponds to the currently configured private interconnect interface, such as ifcfg-eth1, to a new file named ifcfg-bond0. Next, edit the device name in this file to reflect the alias name detailed in /etc/modprobe.conf. At this point, ifcfg-bond0 will contain entries similar to the following:

DEVICE=bond0
BOOTPROTO=none
ONBOOT=yes
TYPE=Ethernet
USERCTL=no
IPADDR=192.168.1.1
NETMASK=255.255.255.0
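
As an aside, on later updates of Enterprise Linux 5, the bonding module parameters can alternatively be supplied directly in this file through a BONDING_OPTS entry, rather than through the options line in /etc/modprobe.conf. The following is only a sketch of that variant; if you use it, keep the alias line in /etc/modprobe.conf, remove the options line, and consult your release documentation to confirm support at your update level:

DEVICE=bond0
BONDING_OPTS="mode=1 miimon=100"
BOOTPROTO=none
ONBOOT=yes
TYPE=Ethernet
USERCTL=no
IPADDR=192.168.1.1
NETMASK=255.255.255.0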

Next, modify the files ifcfg-eth1 and ifcfg-eth2 to configure the devices as slaves for the master device configured in ifcfg-bond0. Your entries should follow the example illustrated in the next snippet. The two files will be identical, except for the lines that specify the device name and the hardware address:

DEVICE=eth1
HWADDR=00:30:48:D7:D5:43
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
USERCTL=no
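
For reference, the corresponding ifcfg-eth2 file differs only in those two lines. The hardware address shown here matches the permanent address reported for eth2 later in the /proc/net/bonding output; substitute the value reported on your own system:

DEVICE=eth2
HWADDR=00:30:48:D7:D5:56
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
USERCTL=no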

For testing purposes, you can restart the network service with the following command:

service network restart

However, we recommend rebooting the system as part of the testing process; this helps ensure that the bonding module is loaded during the boot process.
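
After the reboot, a quick way to confirm that the bonding module has in fact been loaded is to search the loaded module list; if the module is present, a line beginning with bonding is returned (the sizes and use counts reported will vary by kernel):

[root@london1 ˜]# lsmod | grep bonding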

With the bonded network interface activated, the following output from the ifconfig command shows that the private interconnect network is active with its master and slave devices:

[root@london1 ˜]#   ifconfig -a
bond0     Link encap:Ethernet  HWaddr 00:30:48:D7:D5:43
          inet addr:192.168.1.1  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::230:48ff:fed7:d543/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:1797 errors:0 dropped:0 overruns:0 frame:0
          TX packets:79 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:109087 (106.5 KiB)  TX bytes:15878 (15.5 KiB)

eth0      Link encap:Ethernet  HWaddr 00:30:48:D7:D5:42
          inet addr:172.17.1.101  Bcast:172.17.255.255  Mask:255.255.0.0
          inet6 addr: fe80::230:48ff:fed7:d542/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2917 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1370 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:237686 (232.1 KiB)  TX bytes:205579 (200.7 KiB)

eth1      Link encap:Ethernet  HWaddr 00:30:48:D7:D5:43
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:1545 errors:0 dropped:0 overruns:0 frame:0
          TX packets:60 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:93313 (91.1 KiB)  TX bytes:12420 (12.1 KiB)

eth2      Link encap:Ethernet  HWaddr 00:30:48:D7:D5:43
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:252 errors:0 dropped:0 overruns:0 frame:0
          TX packets:19 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:15774 (15.4 KiB)  TX bytes:3458 (3.3 KiB)

The bonding mode, current status, number of link failures, and the current active link can be viewed at the /proc/net/bonding location, as in this example:

[root@london1 bonding]# cat bond0
Ethernet Channel Bonding Driver: v3.2.4 (January 28, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth1
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth1
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:30:48:d7:d5:43

Slave Interface: eth2
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:30:48:d7:d5:56
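
As a simple failover test, you can take the currently active slave down and confirm from the same /proc/net/bonding/bond0 file that the other slave takes over, before restoring the link. The following is a sketch of that test using the interface names assumed in this chapter; it is only a software approximation of a link failure, so on a production cluster you should also repeat the test by physically removing cables and powering off switches:

[root@london1 ˜]# ip link set eth1 down
[root@london1 ˜]# grep "Currently Active Slave" /proc/net/bonding/bond0
Currently Active Slave: eth2
[root@london1 ˜]# ip link set eth1 up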

Details related to the configuration of the network are also reported in the system log, which can be viewed either in /var/log/messages or with the dmesg command:

igb: eth1: igb_watchdog_task: NIC Link is Down
bonding: bond0: link status definitely down for interface eth1, disabling it
bonding: bond0: making interface eth2 the new active one.

Once bonding has been configured and is active on all of the nodes in the cluster, the interface—bond0, in this case—represents the private interconnect interface to use for the Oracle software during installation. The slave device names should not be used directly in any of the configuration steps required.

I/O Fencing with IPMI

The Intelligent Platform Management Interface (IPMI) introduced in Chapter 4 is a standard for remote server management, and it is available on a wide range of systems that support Oracle Enterprise Linux. The key requirement for IPMI support is the presence of a Baseboard Management Controller (BMC), which is typically located on the motherboard of the server. The BMC includes a service processor distinct from the system processor, and this service processor has its own firmware, power, and network connection. The BMC receives power whenever the system is connected to a power source, even if the server itself is turned off, so it is effectively always powered on. Consequently, the BMC is active regardless of whether the Linux operating system is installed or running. The BMC provides significant functionality. For example, it can monitor the system hardware sensors, such as the state of the processors, memory, fans, power supplies, and temperature. The BMC also enables you to administer the server, letting you power a server on or reset a server remotely. Again, this functionality remains available regardless of the operating system's status, as does access to the system event logs. The BMC also provides serial over LAN (SOL) functionality, enabling access to the server console across the network.

From the preceding description, it should be clear that IPMI provides improved I/O fencing capabilities. In particular, it enables one node in the cluster to remotely reset another node with which contact has been lost, regardless of the state of the operating system on the node to be reset. Thus, IPMI can be used to reset a node even if its operating system is in a hung state. If you have IPMI available on your cluster nodes, then you should configure it so that it can be used for I/O fencing by the Oracle Clusterware.

The BMC provides an additional feature relevant to clustered environments: a watchdog timer with functionality similar to the hangcheck-timer. However, as noted previously, the system must recover to a suitable operational state before the hangcheck-timer can take effect, whereas the BMC watchdog timer runs independently of the operating system. This means that, even if the system is in a permanently hung state, the timer can still expire and perform a hardware-level reset. The BMC watchdog timer operates by counting down from a user-determined value to zero; on reaching zero, the system is reset. On the operating system, a daemon runs periodically to reset the timer, and the daemon's reset interval is necessarily smaller than the BMC watchdog timer limit. Consequently, during normal operations, the BMC watchdog timer never reaches zero, and the system remains operational. However, if the operating system hangs, the timer is not reset, and the system is reset when the BMC timer reaches zero, regardless of the status of the operating system. You should consider enabling the BMC watchdog timer only if you are fully aware of the timings required for it to operate as an additional reset mechanism alongside the Clusterware timeouts.
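
If you decide to investigate the BMC watchdog timer, the ipmitool utility installed later in this section can report its current settings, which is a useful first step before enabling anything. This is only a sketch; the fields and default values returned depend on your BMC firmware, and a countdown that has been left running can be stopped with the mc watchdog off argument:

[root@london1 ˜]# ipmitool mc watchdog get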

Before attempting to configure the BMC, you must ensure that it is connected to the network. Typically, the BMC connection shares the first Ethernet port for IPMI Channel 1, usually eth0. Therefore, the BMC can be reached as long as this port is connected. Regardless, you should keep in mind that the BMC has its own separate MAC address and IP address, so it is only the physical infrastructure that is shared. Also, in a teamed network configuration, the BMC interface will operate independently of the bonded driver. This means that configuring the bonded driver as described previously will not provide resilience at the BMC level.

If your system supports IPMI, then there will usually be a number of configurable options at the BIOS level. These options typically fall under a heading such as Server Management, and they might include the potential to automatically start the BMC-related functionality. However, if this functionality is not enabled automatically, you can begin configuring IPMI by starting the IPMI service. If the system has a BMC, then the service should start successfully, as shown in this example:

[root@london2 modules]# service ipmi start
Starting ipmi drivers:                                     [  OK  ]

You can see the details displayed in /var/log/messages:

Jul 27 17:06:58 london2 kernel: ipmi message handler version 39.1
Jul 27 17:06:58 london2 kernel: IPMI System Interface driver.
Jul 27 17:06:58 london2 kernel: ipmi_si: Trying SMBIOS-specified kcs state machine at i/o address 0xca2, slave address 0x20, irq 0
Jul 27 17:06:58 london2 kernel: ipmi: Found new BMC (man_id: 0x000157,  prod_id: 0x0028,
dev_id: 0x20)
Jul 27 17:06:58 london2 kernel:  IPMI kcs interface initialized

If, however, a BMC is not present, the service will fail to start:

Apr 20 07:34:23 london5 kernel: ipmi_si: Interface detection failed
Apr 20 07:34:23 london5 kernel: ipmi_si: Unable to find any System Interface(s)

If the service fails to start, then you should check your hardware configuration to verify whether a BMC is present and enabled. If the service is running, then you should also use the chkconfig ipmi on command to enable the service so that it starts automatically at the appropriate run levels:

[root@london2 ˜]# chkconfig --list ipmi
ipmi            0:off   1:off   2:on    3:on    4:on    5:on    6:off

The service status should show the IPMI modules are running before you proceed with configuring IPMI:

[root@london2 modules]# service ipmi status
ipmi_msghandler module loaded.
ipmi_si module loaded.
ipmi_devintf module loaded.
/dev/ipmi0 exists.

To configure IPMI, it is necessary to install the RPM package OpenIPMI-tools. You can do this directly or by using YUM:

[root@london1 ˜]# yum install OpenIPMI-tools.x86_64
Loaded plugins: security
Server                                                   | 1.3 kB     00:00
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package OpenIPMI-tools.x86_64 0:2.0.16-5.el5 set to be updated
...
Installed:
  OpenIPMI-tools.x86_64 0:2.0.16-5.el5

Complete!

Without further configuration, it is possible to communicate with the BMC on the local system with the ipmitool command. This command lets you view the system status and logs, or even power cycle the system:

[root@london2 ˜]# ipmitool power cycle
Chassis Power Control: Cycle
[root@london2 ˜]#
Broadcast message from root (Tue Jul 28 14:32:07 2009):

The system is going down for system halt NOW!
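
Before resorting to a power cycle, less disruptive local checks are also available through the same command. The following sketch queries the chassis status and the system event log on the local BMC; the output is omitted here because it varies by platform:

[root@london2 ˜]# ipmitool chassis status
[root@london2 ˜]# ipmitool sel list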

However, IPMI functionality is fully realized only when it is possible to access the BMC remotely. Enabling this functionality requires that you configure the IPMI user and the LAN channel. By default, a NULL administrator level user exists for all channels; the following example shows the details for the NULL user on Channel 1:

[root@london1 ˜]# ipmitool user list 1
ID  Name             Enabled Callin  Link Auth  IPMI Msg   Channel Priv Limit
1                    true    true    false      true       ADMINISTRATOR

You can use the ipmitool command's user set argument to add another user. The user is created and visible for all channels, although the authority levels for that user can be different on different channels. In the following example, the user oraipmi is created with the default privilege levels shown for Channel 1:

[root@london1 ˜]# ipmitool user set name 2 oraipmi
[root@london1 ˜]# ipmitool user list 1
ID  Name             Enabled Callin  Link Auth  IPMI Msg   Channel Priv Limit
1                    true    true    false      true       ADMINISTRATOR
2   oraipmi          false   true    false      false      NO ACCESS

You can use the user priv arguments to change the privilege levels. In the following example, user 2 (the oraipmi user) is set to privilege level 4 (Administrator) on Channel 1:

[root@london1 ˜]# ipmitool user priv 2 4 1
[root@london1 ˜]# ipmitool user list 1
ID  Name             Enabled Callin  Link Auth  IPMI Msg   Channel Priv Limit
1                    true    true    false      true       ADMINISTRATOR
2   oraipmi          true    true    false      true       ADMINISTRATOR

Finally, you can complete the user configuration by setting the password for the newly configured user, and then set the channel to use password authentication for the Administrator level:

[root@london1 ˜]# ipmitool user set password 2 oracle
[root@london1 ˜]# ipmitool lan set 1 auth ADMIN PASSWORD

With the oraipmi user configured and the authority level set, it is necessary to configure the LAN channel to the preferred IP configuration. The following example illustrates how to configure a static IP address on Channel 1:

[root@london1 ˜]# ipmitool lan set 1 ipsrc static
[root@london1 ˜]# ipmitool lan set 1 ipaddr 172.17.1.10
Setting LAN IP Address to 172.17.1.10
[root@london1 ˜]# ipmitool lan set 1 netmask 255.255.0.0
Setting LAN Subnet Mask to 255.255.0.0
[root@london1 ˜]# ipmitool lan set 1 defgw ipaddr 172.17.1.254
Setting LAN Default Gateway IP to 172.17.1.254
[root@london1 ˜]# ipmitool lan set 1 access on
[root@london1 ˜]# ipmitool lan set 1 snmp gridcommunity
Setting LAN SNMP Community String to gridcommunity

Finally, we can print the channel status to show that the BMC is configured and available for testing remotely:

[root@london1 ˜]# ipmitool lan print 1
Set in Progress         : Set Complete
Auth Type Support       : NONE MD5 PASSWORD
Auth Type Enable        : Callback :
                        : User     :
                        : Operator :
                        : Admin    : PASSWORD
                        : OEM      :
IP Address Source       : Static Address
IP Address              : 172.17.1.10
Subnet Mask             : 255.255.0.0
MAC Address             : 00:04:23:dc:29:52
SNMP Community String   : gridcommunity
...

Note

You cannot test remote IPMI connectivity from the server itself; therefore, you must perform such testing from another node in the cluster or on an additional management server on the network.

In the following example, the ipmitool command is used to verify the power status of the chassis from a remote server, with the password specified on the command line. If the password is not specified, you are prompted for it. Alternatively, the password can be set in the environment variable IPMI_PASSWORD and passed with the -E option, as shown in the second command, which lists the sensor data. In these examples, some arguments, such as -I for the interface, resolve to their default settings:

[root@london2 ˜]# ipmitool -H 172.17.1.10 -U oraipmi 
 -P oracle chassis power status
Chassis Power is on
[root@london2 ˜]# export IPMI_PASSWORD="oracle"
[root@london2 ˜]# ipmitool -H 172.17.1.10 -U oraipmi -E sdr list
BB +1.2V Vtt     | 1.20 Volts        | ok
BB +1.5V AUX     | 1.48 Volts        | ok
BB +1.5V         | 1.48 Volts        | ok
BB +1.8V         | 1.80 Volts        | ok
BB +3.3V         | 3.32 Volts        | ok
...

The following dialog illustrates that you can now use IPMI to let one node in the cluster, or an external management server, control the chassis of another node. For example, the first node might power down the other node; the communication established with the BMC then enables the first node to power the server back up from its powered-down state:

[root@london2 ˜]# ipmitool -H 172.17.1.10 -U oraipmi -E chassis power off
Chassis Power Control: Down/Off
[root@london2 ˜]# ping 172.17.1.10
PING 172.17.1.10 (172.17.1.10) 56(84) bytes of data.
64 bytes from 172.17.1.10: icmp_seq=3 ttl=30 time=3.16 ms
64 bytes from 172.17.1.10: icmp_seq=4 ttl=30 time=3.05 ms
64 bytes from 172.17.1.10: icmp_seq=5 ttl=30 time=2.94 ms
--- 172.17.1.10 ping statistics ---
5 packets transmitted, 3 received, 40% packet loss, time 4001ms
rtt min/avg/max/mdev = 2.943/3.053/3.161/0.089 ms
[root@london2 ˜]# ipmitool -H 172.17.1.10 -U oraipmi -E chassis power on
Chassis Power Control: Up/On

At this stage, the basic IPMI configuration is complete. However, we still recommend exploring the capabilities of IPMI further as your requirements dictate. For example, you might configure SOL to display the system console remotely. You should also complete the IPMI configuration on all of the nodes in the cluster and make a note of the BMC IP addresses, as well as the IPMI username and password, for the Oracle Grid Infrastructure software installation. Another nice feature: the IPMI configuration persists in the BMC even if you reinstall the host operating system.
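
As an illustration of the SOL capability mentioned previously, the following sketch activates a remote console session. It assumes that your BMC supports IPMI v2.0, because the sol commands require the lanplus interface, and it reuses the example address and user configured earlier in this section; a session opened this way can be closed from another terminal with the corresponding sol deactivate argument:

[root@london2 ˜]# ipmitool -I lanplus -H 172.17.1.10 -U oraipmi -E sol activate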

Summary

In this chapter, we explored the actions required to install, configure, and understand the Linux operating system before installing the Oracle grid infrastructure software and the 11g Release 2 RAC database server software. Along the way, we focused on explaining the available configuration options, as well as the reasoning behind the configuration decisions. At this point, you should have the necessary foundation to implement successful Oracle software installations that are optimized for your environment.
