Chapter 13. Linux Performance Monitoring

In the previous chapter, we looked at Oracle performance monitoring tools. However, performance problems often occur outside the Oracle environment, at the processor, memory, network, or storage level. It is therefore important to understand the information provided not only by the Oracle performance monitoring tools, but also by the standard operating system monitoring tools available on Linux. You can use the information provided by these tools to support the findings from the Oracle tools and fully diagnose RAC performance.

There are a number of third-party performance monitoring tools that operate in the Linux environment. However, our focus here is on the operating system monitoring tools available by default with Oracle Enterprise Linux that complement the Oracle tools. In this category, we cover the default tools available in the base Oracle Enterprise Linux installation: the command-line CPU and memory diagnostics uptime, last, ps, free, ipcs, pmap, lsof, top, vmstat, and strace, and the network tools netstat, ss, and tcpdump. Additionally, if you have installed and configured Oracle Enterprise Linux as detailed in Chapter 6, you will have run the Oracle Validated RPM. One dependency of the latter is the sysstat RPM package, which includes the following Linux performance monitoring tools: iostat, mpstat, and sar. Consequently, a default Oracle-validated Enterprise Linux environment includes a number of command-line tools that, if mastered, can rapidly and comprehensively give you insight into system-level performance.

We also provide an overview of additional, optional Oracle-provided Linux monitoring tools, as well as information on another open source tool you may wish to investigate. The Oracle tools are called Oracle Cluster Health Monitor and OSWatcher; the additional open source tool, provided by IBM, is called nmon. These tools require a separate installation, but they are easy to install and provide both an alternative and a complement to the default tools for monitoring Linux environments.

It is important to note that, as is the case with all software, the performance monitoring tools covered in this chapter themselves require system resources to run, and you should be aware of the level of resources required by each tool. Take this into account when deciding upon your Linux performance monitoring toolset; we do not recommend running all of the tools detailed in this chapter at the same time. Instead, you should select the ones that work best in your environment.

The uptime and last Commands

uptime is a standard Linux command that reports the amount of time that a system has been running. The following snippet shows you how to use this command:

[root@london1 ˜]# uptime
 15:36:11 up 3 days,  3:50,  4 users,  load average: 0.13, 0.14, 0.10

uptime provides information on node availability, and it is a useful command of first resort when diagnosing and troubleshooting node evictions across a RAC cluster. uptime also reports the system load averages over the previous 1, 5, and 15 minutes.

In a similar vein, the last command provides a log of previous logins and system boots; adding the -x argument also includes system shutdowns and changes in run level. The following shows the default output:

[root@london1 ˜]# last
root     pts/2        172.17.1.81      Fri Feb  5 09:32   still logged in
root     pts/1        london2.example. Thu Feb  4 16:09   still logged in
root     pts/1        london2.example. Thu Feb  4 16:04 - 16:05  (00:00)
root     pts/0        172.17.1.81      Thu Feb  4 16:00   still logged in
reboot   system boot  2.6.18-164.el5   Thu Feb  4 15:52          (17:53)
root     pts/2        172.17.1.81      Thu Feb  4 15:36 - down   (00:13)
root     pts/1        172.17.1.81      Thu Feb  4 13:18 - down   (02:31)
root     pts/3        172.17.1.81      Mon Feb  1 14:37 - down  (3+01:11)
root     pts/1        172.17.1.81      Mon Feb  1 13:51 - 14:47  (00:55)
root     pts/2        172.17.1.81      Mon Feb  1 13:31 - 14:48  (01:17)
root     pts/1        172.17.1.81      Mon Feb  1 13:30 - 13:32  (00:02)
root     pts/0        172.17.1.81      Mon Feb  1 11:48 - down  (3+04:01)
reboot   system boot  2.6.18-164.el5   Mon Feb  1 11:46         (3+04:02)
root     pts/0        172.17.1.81      Fri Jan 29 15:48 - down   (00:09)
reboot   system boot  2.6.18-164.el5   Fri Jan 29 11:03          (04:54)
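
To focus on these events alone, the -x output can also be filtered by the reboot and shutdown pseudo users. The following commands are illustrative sketches; output is omitted:

last -x            # include shutdown entries and run level changes
last -x reboot     # restrict the listing to system boots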

The ps Command

The ps command is one of the most basic, yet essential tools for analyzing performance on a Linux system. At its simplest, ps shows a list of processes; if called without arguments, it displays the list of processes running under the current session, as shown here:

[root@london1 ˜]# ps
  PID TTY          TIME CMD
 6969 pts/2    00:00:00 bash
 7172 pts/2    00:00:00 ps

Fortunately, ps can do a lot more than this. For example, it accepts a wealth of arguments to present process listings in almost every conceivable form. The arguments to ps can take three forms: standard System V Unix-type options that must be preceded by a dash; BSD-type options that are not preceded by a dash; and GNU long options that are preceded by two dashes. In effect, you may use different combinations of arguments to display similar forms of output. The combination you will use most regularly is the System V -ef form, which produces a full listing of all processes. The following shows the first ten lines of output:

[root@london1 ˜]# ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 Feb04 ?        00:00:02 init [3]
root         2     1  0 Feb04 ?        00:00:00 [migration/0]
root         3     1  0 Feb04 ?        00:00:00 [ksoftirqd/0]
root         4     1  0 Feb04 ?        00:00:00 [watchdog/0]
root         5     1  0 Feb04 ?        00:00:00 [migration/1]
root         6     1  0 Feb04 ?        00:00:00 [ksoftirqd/1]
root         7     1  0 Feb04 ?        00:00:00 [watchdog/1]
root         8     1  0 Feb04 ?        00:00:00 [migration/2]
root         9     1  0 Feb04 ?        00:00:00 [ksoftirqd/2]
root        10     1  0 Feb04 ?        00:00:00 [watchdog/2]
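
For comparison, BSD-style options and output-format options can produce similar or more targeted listings. The following forms are illustrative sketches; output is omitted:

ps aux                                                   # BSD-style full listing of all processes
ps -eo pid,ppid,user,pcpu,pmem,cmd --sort=-pcpu | head   # custom columns, sorted by CPU usage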

To learn more details about each process, you can add the -l argument and use ps -elf to produce a longer, more complete listing. You can pipe the output through grep to restrict the number of lines returned, as in this example:

[root@london1 ˜]# ps -elf | grep smon | grep -v grep
0 S oracle   13172    1  0  78   0 - 119071 -  Feb01 ? 00:00:00 asm_smon_+ASM1
0 S oracle   23826    1  0  75   0 - 1727611 - Feb01 ? 00:00:14 ora_smon_PROD1

However, you also have an alternative to using ps with grep. pgrep can provide you with the same functionality in a single command. For example, the following extract uses the -flu arguments to display the processes owned by the user oracle:

[root@london1 ˜]# pgrep -flu oracle
6458 ora_pz97_PROD1
7210 ora_w000_PROD1
7516 ora_j000_PROD1
7518 ora_j001_PROD1
12903 /u01/app/11.2.0/grid/bin/oraagent.bin
12918 /u01/app/11.2.0/grid/bin/mdnsd.bin
12929 /u01/app/11.2.0/grid/bin/gipcd.bin
12940 /u01/app/11.2.0/grid/bin/gpnpd.bin
12970 /u01/app/11.2.0/grid/bin/diskmon.bin -d -f
12992 /u01/app/11.2.0/grid/bin/ocssd.bin
13138 asm_pmon_+ASM1

Another useful command for identifying processes is pidof. If you know the name of a process, you can quickly find its corresponding process identifier with this snippet:

[root@london1 ˜]# pidof ora_smon_PROD1
23826

free, ipcs, pmap, and lsof

The free command, the /proc and /sys/devices/system/node file systems, and the ipcs, pmap, and lsof commands are useful in diagnosing RAC performance problems. The following sections walk through how to use each of these items.

The free Command

The free command displays the status of your system's virtual memory at the current point in time. There are three rows of output: the Mem: row shows the utilization of the physical RAM installed in the machine; the -/+ buffers/cache: row shows the amount of memory assigned to system buffers and caches; and the Swap: row shows the amount of swap space used.

The next example shows a system with 16GB of RAM after an Oracle RAC instance has started. At first, it may appear that nearly 13GB has been consumed, with just over 3GB free. However, the operating system assigns memory to buffers and cache when it is not required for any other purpose, so the figure that better represents free memory is the more than 4GB shown on the -/+ buffers/cache: row. If a third-party system-monitoring tool reports that memory utilization is high on a Linux system, you should always confirm this with free to ensure that the memory is not simply assigned to buffers and cache:

[root@london1 ˜]# free
             total       used       free     shared    buffers     cached
Mem:      16423996   12813584    3610412          0     158820    1045908
-/+ buffers/cache:   11608856    4815140
Swap:     18481144          0   18481144

The preceding example also shows that the system is not using any of the configured swap space at this point in time. As we discussed in Chapter 6, unless you are creating a large number of processes on the system, swap space utilization should be minimal. If monitoring with free shows an increasing amount of swap space being consumed, this will have a significant negative impact on performance.

By default, the values reported by free are expressed in kilobytes; however, you can display them in bytes, megabytes, or gigabytes with the -b, -m, or -g flag, respectively. The -s flag can be used with an interval value to repeat the command continually at that interval. Alternatively, you can use the watch command to refresh the display in place; by default, running watch free will refresh every two seconds.
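
For example, the following illustrative commands (output omitted) display the values in megabytes, either repeating at a five-second interval or refreshing in place with watch:

free -m -s 5       # report in megabytes every five seconds until interrupted
watch free -m      # refresh the default display in place every two seconds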

The /proc File System

When working with Oracle, you should also be familiar with the output of /proc/meminfo, which is the location from which the information for free is derived. Within /proc/meminfo, you can also see the amount of memory and swap that is free and used, and the amount of memory assigned to buffers and cache on an individual basis. In addition, /proc/meminfo includes the configuration of huge pages, the setting of which we discuss in Chapter 6.

The following example of /proc/meminfo shows the same system with a total of 16GB of RAM and 5,000 huge pages at 2MB each, a 10GB allocation in total. Of these, 3,442 huge pages are reported as reserved after the Oracle instance has started; these are pages not yet used by the instance but reserved for future use by the SGA, and they are therefore not available as standard small pages:

[oracle@london1 ˜]$ cat /proc/meminfo
MemTotal:     16423996 kB
MemFree:       4079840 kB
Buffers:         29668 kB
Cached:         771924 kB
SwapCached:          0 kB
Active:        1509184 kB
Inactive:       456424 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:     16423996 kB
LowFree:       4079840 kB
SwapTotal:    18481144 kB
SwapFree:     18481144 kB
Dirty:             924 kB
Writeback:           0 kB
AnonPages:     1166644 kB
Mapped:         192576 kB
Slab:            39444 kB
PageTables:      38872 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:  21573140 kB
Committed_AS:  6138472 kB
VmallocTotal: 34359738367 kB
VmallocUsed:    284172 kB
VmallocChunk: 34359453879 kB
HugePages_Total:  5000
HugePages_Free:   4345
HugePages_Rsvd:   3442
Hugepagesize:     2048 kB

In this output, it's important to notice that 4,345 huge pages remain free. By default, the pages are used on demand, which means the number of free pages will drop as they are used during normal Oracle SGA-related database activity, such as caching data in the buffer cache. Alternatively, setting the Oracle parameter PRE_PAGE_SGA to true ensures that the SGA is touched at instance startup, so all required pages are allocated at that point. If unused pages remain, they can be freed by setting the vm.nr_hugepages parameter down to the utilized level (see Chapter 6 for more information on this).
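
The following is an illustrative sketch of checking and adjusting the huge page reservation; the value 5000 is simply the example allocation used above, and you should size the reservation to your own SGA requirements as discussed in Chapter 6:

grep Huge /proc/meminfo                             # current huge page totals, free, and reserved counts
sysctl -w vm.nr_hugepages=5000                      # set the reservation for the running kernel
echo "vm.nr_hugepages = 5000" >> /etc/sysctl.conf   # persist the setting across reboots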

The /sys/devices/system/node File System

When working on a system with a NUMA memory configuration, you should also be familiar with the meminfo data reported on a per-memory-node basis. For example, the following shows Node 0 of a four-node NUMA configuration. In this case, a quarter of the total 70,000 huge pages are allocated on this node, indicating an even distribution of pages across the nodes:

[root@london5 node]# cat */meminfo
Node 0 MemTotal:     66036380 kB
Node 0 MemFree:      28338244 kB
Node 0 MemUsed:      37698136 kB
Node 0 Active:         454512 kB
Node 0 Inactive:       716248 kB
Node 0 HighTotal:           0 kB
Node 0 HighFree:            0 kB
Node 0 LowTotal:     66036380 kB
Node 0 LowFree:      28338244 kB
Node 0 Dirty:              76 kB
Node 0 Writeback:           0 kB
Node 0 FilePages:     1010516 kB
Node 0 Mapped:          80612 kB
Node 0 AnonPages:      169328 kB
Node 0 PageTables:      15504 kB
Node 0 NFS_Unstable:        0 kB
Node 0 Bounce:              0 kB
Node 0 Slab:            67836 kB
Node 0 HugePages_Total: 17500
Node 0 HugePages_Free:  15247

Additional NUMA-related commands, such as numactl and numastat, can influence and tune this allocation (you can learn more about these commands in Chapter 4). It is also important to understand not only how the memory is allocated, but also how it is used by the Oracle instance.
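
For example, the following commands (output omitted) report the NUMA topology and per-node allocation statistics:

numactl --hardware     # NUMA nodes, the CPUs assigned to each, and free memory per node
numastat               # per-node allocation hit and miss counters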

The ipcs Command

Regardless of whether you are using NUMA-based memory allocation, a significant proportion of your system memory will be allocated as shared memory for the SGA. This is true whether you are using manual shared memory management, automatic shared memory management, or automatic memory management. The ipcs command with the -m argument can be used to display the configured shared memory segments on the system. The following example shows a single shared memory segment has been allocated for the SGA:

[oracle@london1 ˜]$ ipcs -m

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0xed304ac0 32768      oracle    660        4096       0
0x90c3be20 1277953    oracle    660        8592031744 42

The corresponding ipcrm command with the -M argument can be used by a user with the appropriate permissions to manually delete a shared memory segment by key. However, you should use pmap and lsof beforehand to identify the processes using the shared memory segment.
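
As an illustrative sketch only, the segment with the key shown in the earlier ipcs output could be removed as follows; do this only when you are certain no instance still requires the segment:

ipcrm -M 0x90c3be20    # remove the shared memory segment identified by its key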

The pmap Command

On an individual process basis, the pmap command details the memory mapped by that particular process, including its total memory utilization; the -x argument shows this information in an extended format. pmap can be given a process number directly, or it can be combined with the pgrep command described previously to look up the process number of a process identified by name. For example, the following output illustrates the pmap information for a foreground process:

[root@london1 ˜]# pmap -x 21749
21749:   oraclePROD1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
Address           Kbytes     RSS    Anon  Locked Mode   Mapping
0000000000400000  155144       -       -       - r-x--  oracle
0000000009d81000   12404       -       -       - rwx--  oracle
000000000a99e000     280       -       -       - rwx--    [ anon ]
0000000015522000     556       -       -       - rwx--    [ anon ]
0000000060000000 8390656       -       -       - rwxs-  5 (deleted)
00000031a3600000     112       -       -       - r-x--  ld-2.5.so
00000031a381b000       4       -       -       - r-x--  ld-2.5.so
00000031a381c000       4       -       -       - rwx--  ld-2.5.so
00000031a3a00000    1332       -       -       - r-x--  libc-2.5.so
00000031a3b4d000    2048       -       -       - -----  libc-2.5.so

However, it is important to note that, as explained previously, some implementations of pmap under Oracle Enterprise Linux do not show the full extent of the listing. For example, compare the preceding output to the following example from SUSE Linux, which shows more complete information:

reading1:˜ # pmap 7679
7679: oracle
START       SIZE     RSS     PSS   DIRTY    SWAP PERM MAPPING
08048000 136616K   9988K    683K      0K      0K r-xp /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle
105b2000   1004K    208K     63K     56K      0K rwxp /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle
106ad000    472K    320K    320K    320K      0K rwxp [heap]
20000000 309248K  76480K  71725K  42072K      0K rwxs /SYSVb0e4b134
b6dfe000    128K    128K     88K     88K      0K rwxp /dev/zero
b6e1e000    384K    384K      0K      0K      0K rwxp /dev/zero
...
b6e7e000    132K    132K    132K    132K      0K rwxp [anon]
bfbb9000     84K     36K     36K     36K      0K rwxp [stack]
ffffe000      4K      0K      0K      0K      0K r-xp [vdso]
Total:   465608K  93988K  77876K  47508K      0K

8704K writable-private, 147440K readonly-private, 309464K shared,
and 92528K referenced

In addition to reporting on shared memory, pmap can also be used to identify the private memory utilized by individual foreground processes, helping you troubleshoot where memory has been allocated across the system on a process-by-process basis. Additional process-based memory utilization information is also available under the /proc directory entry for an individual process number. For example, the status file includes the following summary:

[root@london1 21749]# cat status
Name:   oracle
State:  S (sleeping)
SleepAVG:       85%
Tgid:   21749
Pid:    21749
PPid:   21748
TracerPid:      0
Uid:    500     500     500     500
Gid:    501     501     501     501
FDSize: 64
Groups: 500 501
VmPeak:  8608684 kB
VmSize:   218028 kB
VmLck:         0 kB
VmHWM:     28616 kB
VmRSS:     28616 kB
VmData:     2920 kB
VmStk:       112 kB
VmExe:    155144 kB
VmLib:     12148 kB
VmPTE:       380 kB
StaBrk: 15522000 kB
Brk:    155ad000 kB
StaStk: 7fffac094010 kB

The lsof Command

lsof is an extensive command that lists the open files on the system. It can be used for diagnosing connectivity to a number of resources; for example, it provides information on the usage of standard files, shared memory segments, and network ports. The following example lists the processes owned by the oracle user that are attached to the shared memory segment; the id 1277953 is identified from the output of ipcs:

[root@london1 ˜]# lsof -u oracle | grep 1277953
oracle    20508    oracle  DEL       REG               0,13                        1277953 /5
oracle    20510    oracle  DEL       REG               0,13                        1277953 /5
oracle    20514    oracle  DEL       REG               0,13                        1277953 /5
oracle    20516    oracle  DEL       REG               0,13                        1277953 /5
oracle    20518    oracle  DEL       REG               0,13                        1277953 /5
oracle    20520    oracle  DEL       REG               0,13                        1277953 /5
oracle    20522    oracle  DEL       REG               0,13                        1277953 /5
oracle    20524    oracle  DEL       REG               0,13                        1277953 /5
...

If you are also interested in the cached objects in the kernel, you can view them in the output of /proc/slabinfo; however, you will most likely be interested only in specific entries, such as kiobuf related to asynchronous I/O activity. In addition, a utility called slabtop can display kernel slab information in real time. The form of the output of slabtop is similar to that of the more general-purpose top.
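
The following illustrative commands (output omitted) show the two approaches:

head -5 /proc/slabinfo     # header plus the first few kernel cache entries
slabtop -o                 # one-shot, non-interactive snapshot of slab cache usage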

top

ps and free are static commands that return information about system processes and memory utilization in individual snapshots. However, they are not designed to track usage over a longer period of time. The first tool we will look at with this monitoring capability is top.

If top is called without arguments, it will display output similar to the following result. It will also refresh the screen by default every two seconds, but without requiring that you use watch to enable this functionality:

[root@london1 ˜]# top
top - 11:15:54 up 19:24,  4 users,  load average: 1.27, 0.51, 0.23
Tasks: 285 total,   1 running, 284 sleeping,   0 stopped,   0 zombie
Cpu(s): 20.9%us,  2.8%sy,  0.0%ni, 49.2%id, 26.1%wa,  0.2%hi,  0.9%si,  0.0%st
Mem:  16423996k total, 12921072k used,  3502924k free,   162536k buffers
Swap: 18481144k total,        0k used, 18481144k free,  1073132k cached

  PID USER    PR NI  VIRT  RES  SHR S %CPU %MEM TIME+   COMMAND
24725 oracle  16  0 8413m  27m  22m S 22.6  0.2 0:15.33 oraclePROD1 (LOCAL=NO)
24735 oracle  15  0 8413m  27m  22m S 21.9  0.2 0:16.28 oraclePROD1 (LOCAL=NO)
24723 oracle  15  0 8413m  27m  22m S 19.3  0.2 0:17.44 oraclePROD1 (LOCAL=NO)
24727 oracle  15  0 8411m  25m  20m S 18.9  0.2 0:16.22 oraclePROD1 (LOCAL=NO)
24729 oracle  15  0 8413m  27m  22m S 18.6  0.2 0:17.17 oraclePROD1 (LOCAL=NO)
24733 oracle  15  0 8413m  27m  22m S 17.9  0.2 0:15.29 oraclePROD1 (LOCAL=NO)
24737 oracle  15  0 8413m  27m  22m S 17.6  0.2 0:14.97 oraclePROD1 (LOCAL=NO)
24731 oracle  15  0 8413m  27m  22m S 15.9  0.2 0:15.63 oraclePROD1 (LOCAL=NO)
24743 oracle  15  0 8411m  25m  20m S 13.0  0.2 0:12.59 oraclePROD1 (LOCAL=NO)
20546 oracle  16  0 8424m  38m  19m D  7.3  0.2 0:09.99 ora_dbw0_PROD1
20548 oracle  15  0 8427m  41m  37m S  6.3  0.3 0:14.29 ora_lgwr_PROD1
20532 oracle  −2  0 8415m  31m  18m S  5.0  0.2 0:18.90 ora_lms0_PROD1
20536 oracle  −2  0 8415m  31m  18m S  4.3  0.2 0:19.49 ora_lms1_PROD1

The top display is divided into two main sections. Within the top-level section, the most important information in monitoring an Oracle RAC node is the load average, CPU states, and memory and swap space. The load average shows the average number of processes in the queue waiting to be allocated CPU time over the previous 1, 5, and 15 minutes. During normal operations, the load averages should be maintained at low values. If these values consistently exceed the processor core count of the server, this is an indication that the system load is exceeding capacity. When this is the case, there is the potential that the GCS background processes (LMSn) could become starved of CPU time, resulting in a detrimental effect on the overall performance of the cluster.

The CPU states show the level of utilization for all of the CPUs installed on the system. The oracle user workload is shown as user time; however, there will be additional system time and iowait time related to Oracle activity. A high level of iowait time may indicate that you should investigate disk performance, because the CPUs are spending much of their time simply waiting for I/O requests to complete. An overall indicator of CPU capacity is the idle value, which shows the spare capacity on the system. A consistently low idle time in conjunction with a high load average provides additional evidence that the workload exceeds the ability of the system to process it.

The memory-related section displays information that bears a close resemblance to the output of free.

The bottom-level section includes statistics related to the processes running on the system. You can use this section to pinpoint which processes are using most of the CPU and memory on the system. In terms of memory, as well as the total percentage utilization on the system, the VIRT field shows how much memory an individual process has allocated, and the RSS field (the Resident Set Size) shows how much memory the process is using at the current time. For Oracle processes, these values should ordinarily be at similar levels.

From the example top output, we can see that the system is processing Oracle activity but is not under excessive workload at the present time.

top is an interactive tool that accepts single-letter commands to tailor the display. For example, you may type u followed by oracle to view processes solely for the oracle user, or type A to sort tasks by age. Typing c displays the full command line for each process, which is useful for identifying the Oracle processes utilizing the highest levels of CPU. Remember not to neglect monitoring system processes as well; for example, observing the kswapd process regularly in top output would indicate a potential performance impact from utilizing swap space.

An important aspect of top is that, in addition to displaying information, you may also interact with the processes themselves, such as altering their relative priorities or killing them altogether. Therefore, the Help screen accessed by ? is useful for familiarizing yourself with the capabilities of the tool. You can terminate top by pressing the q key or Ctrl+C.
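
top can also be run non-interactively in batch mode, which is useful for capturing output to a file for later review. The following is an illustrative sketch; the file name is arbitrary, and the -u option requires a reasonably recent procps version:

top -b -d 3 -n 2 -u oracle > top_oracle.txt    # two batch-mode samples, three seconds apart, for the oracle user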

vmstat

As its name suggests, the vmstat utility focuses on providing output about the usage of virtual memory. When called without arguments, vmstat will output information related to virtual memory utilization since the system was last booted. Therefore, you are most likely to call vmstat with two numerical arguments for the delay between sampling periods and the number of sampling periods in total. If you specify just one numerical argument, this will apply to the delay, and the sampling will continue until the command is canceled with Ctrl+C. For example, the following will produce ten lines of output at three-second intervals:

[root@london1 ˜]# vmstat 3 10
procs ----------memory-------- ---swap-- -----io---- --system-- -----cpu------
 r b swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2 7  0 3498932 162972 1079928    0    0    18    23   66   59  0  0 99  0 0
 2 7  0 3495972 162972 1080028    0    0 13619  1277 8384 17455  8  3 66 23 0
 0 9  0 3499452 162972 1080164    0    0 15457  1821 9120 18800  9  3 64 24 0
 2 9  0 3498976 162972 1080240    0    0 16411  2451 9497 19562 11  3 61 25 0
 2 7  0 3498712 162972 1080368    0    0 15881  8385 10625 21277 12  3 60 25 0
 4 7  0 3498240 162972 1080480    0    0 14400  8495 10734 21287 12  4 58 26 0
 1 7  0 3497916 162972 1080620    0    0 12734 16371 10947 21363 13  4 57 26 0
 3 8  0 3503008 162972 1080668    0    0  9667 14266 9050 18520 11  3 57 28 0
 3 8  0 3503232 162972 1080788    0    0 11739  2818 11426 22608 15  4 54 27 0
 3 6  0 3502612 162972 1080848    0    0 10886  9531 11556 22593 16  4 52 28 0

Within the output, the first two fields under procs show processes waiting for CPU runtime and processes in uninterruptible sleep. A traditional implementation of vmstat on many UNIX systems and earlier Linux versions also showed a w field under the procs section to indicate processes that were swapped out; however, entire processes are not swapped out under Linux, so the w field is no longer included. As with top, the next four fields under the memory section should be familiar from the output of free; they show the amount of swap space in use and the free, buffer, and cached memory. The two fields under the swap section show the amount of memory being swapped in and out from disk per second; on an Oracle system, we would expect these values, and the amount of swap in use, to be low or zero. The fields under io show the blocks sent to and received from block devices, and the fields under system show the level of interrupts and context switches. In a RAC environment, the levels of interrupts and context switches can be useful in evaluating the impact on the CPU of servicing network-related activity, such as interconnect traffic or the usage of network attached storage (NAS).

Finally, the cpu section is similar to top in that it displays the user, system, I/O wait, and idle CPU time. The cpu section differs from top by including this information for all CPUs on the system.

In addition to the default output, vmstat also enables the display to be configured with a number of command-line options. For example, -d displays disk statistics, and -p shows the statistics for a particular disk partition specified at the command line. A summary of memory-related values can be given by the -s option.
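
The following illustrative commands (output omitted) show these options; the partition name sde1 is taken from the iostat example later in this chapter and should be replaced with your own device:

vmstat -d          # cumulative per-disk read and write statistics
vmstat -p sde1     # statistics for an individual partition
vmstat -s          # one-off summary of memory counters and event totals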

strace

strace is a tool that can be used for diagnostics when performance monitoring reveals either errors or performance issues with a particular command or process. For example, if you suspect a particular process is not responding, you can use strace to determine the actions that the process is undertaking.

If the strace command is not available on your system, it can be installed as part of the strace RPM package from your install media:

[root@london1 ˜]# yum install strace
Loaded plugins: security
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package strace.x86_64 0:4.5.18-5.el5 set to be updated
...
Installed:
  strace.x86_64 0:4.5.18-5.el5

Complete!

As its name implies, strace records and reports the system calls and signals of a process until the process exits. The information captured is either printed to the standard error channel or (more usefully) written to a text file, the name of which is given as an argument to the -o flag. One of the most powerful additional strace options is the -e flag, which enables the tracing of particular system calls or groups of system calls, such as those that are network related. You can use strace in one of two ways: you can use it to precede a program run at the command line, or you can use -p to specify a running process to attach to. For example, the following snippet shows a trace of the LMS process that is saved to a text file:

[root@london1 ˜]# pidof ora_lms0_PROD1
20532
[root@london1 ˜]# strace -p 20532 -o lms_strace.txt
Process 20532 attached - interrupt to quit
Process 20532 detached

If you examine the text file, you can observe that, on host london1, the LMS process is using the sendmsg and recvmsg system calls and is communicating with the private interconnect address of london2 at 192.168.1.2:

sendmsg(12, {msg_name(16)={sa_family=AF_INET, sin_port=htons(42297), sin_addr=inet_addr("192.168.1.2")}, msg_iov(3)=
[{"4321273pMRON3206X353f"..., 76}, {"1?275E377177X2210@275E37717710k,K", 28}, {"v3032333426v25517:-221"
..., 88}], msg_controllen=0, msg_flags=0}, 0) = 192
times({tms_utime=12192, tms_stime=6618, tms_cutime=0, tms_cstime=0})
= 436799560
getrusage(RUSAGE_SELF, {ru_utime={121, 924464}, ru_stime={66, 187937},
 ...}) = 0
times({tms_utime=12192, tms_stime=6618, tms_cutime=0, tms_cstime=0})
= 436799560
times({tms_utime=12192, tms_stime=6618, tms_cutime=0, tms_cstime=0})
= 436799560
poll([{fd=16, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=12, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=20, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=19, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 4, 30) = 1 ([{fd=20, revents=POLLIN|POLLRDNORM}])
recvmsg(20, {msg_name(16)={sa_family=AF_INET, sin_port=htons(19552), sin_addr=inet_addr("192.168.1.2")},

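To restrict the capture to the network-related calls seen above, the -e option can filter by system call class. The following is an illustrative sketch using the LMS process ID identified earlier; the output file name is arbitrary:

strace -e trace=network -p 20532 -o lms_network.txt
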
netstat, ss, and tcpdump

You can use the netstat tool to display information related to the networking configuration and performance of your system, from routing tables to interface statistics and open ports. By default, netstat displays a list of all open sockets on the system. However, a wide variety of command-line options can be given to vary the details shown.

Looking at Interface Statistics

One form of output that you can produce with netstat relies on the -i argument to display interface statistics. This output shows the statistics for a typical RAC configuration:

[root@london1 ˜]# netstat -i
Kernel Interface table
Iface   MTU Met    RX-OK RX-ERR RX-DRP RX-OVR  TX-OK TX-ERR TX-DRP  TX-OVR Flg
eth0   1500   0   137371      0      0      0   582037      0      0    0 BMRU
eth0:1 1500   0      - no statistics available -                          BMRU
eth0:2 1500   0      - no statistics available -                          BMRU
eth1   1500   0  5858628      0      0      0  5290923      0      0    0 BMRU
lo    16436   0   991251      0      0      0   991251      0      0    0 LRU

In addition to the interface details, this command also provides information on the number of packets transmitted and received. You can complement netstat with the ifconfig command, which also reports errors and dropped packets. The following two examples for eth0 and eth0:1 confirm that, as a VIP address, eth0:1 shares the same hardware configuration as eth0; this is why, in the netstat example, statistics are not duplicated for the interfaces used for the VIP configuration:

[root@london1 ˜]# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:04:23:DC:29:50
          inet addr:172.17.1.101  Bcast:172.17.255.255  Mask:255.255.0.0
          inet6 addr: fe80::204:23ff:fedc:2950/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:137503 errors:0 dropped:0 overruns:0 frame:0
          TX packets:627169 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:34943895 (33.3 MiB)  TX bytes:220187041 (209.9 MiB)
          Memory:b8820000-b8840000
[root@london1 ˜]# ifconfig eth0:1
eth0:1    Link encap:Ethernet  HWaddr 00:04:23:DC:29:50
          inet addr:172.17.1.209  Bcast:172.17.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Memory:b8820000-b8840000

The preceding information can assist you in diagnosing issues where you suspect that hardware errors are resulting in poor network performance. You can also observe continually updated values using the -c argument. Most importantly, you should see the values in the RX-OK and TX-OK fields increasing on all interfaces as network traffic is communicated, with zero to low numbers in all of the other fields. In particular, increasing values in the RX-ERR and TX-ERR fields are an indication of a possible fault that requires further investigation.

Summary Statistics

For additional diagnostic information, you can run netstat with the -s argument to produce a summary report on statistics for all protocols configured on the system. For Cache Fusion traffic on Linux, you should pay particular attention to the UDP protocol-related information on the packets sent and received, as well as whether packet-receive errors are evident.
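
For example, the following illustrative command (output omitted) restricts the summary to the UDP statistics most relevant to Cache Fusion traffic:

netstat -s -u      # protocol summary for UDP only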

Listening Socket Statistics

The default output of netstat does not include listening sockets; these can be shown with the -l option. However, you will most likely prefer to display all established and listening socket-related information at the same time, which you can accomplish with the -a argument. The output of netstat -a can be somewhat lengthy; in particular, all information under the section Active Unix domain sockets relates to interprocess communication on the local host and is not network related. To restrict the output to network activity, you can provide the additional --inet argument, as in this example:

[root@london1 ˜]# netstat --inet -a | more
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address               Foreign Address      State

tcp        0      0 localhost.locald:bootserver *:*                  LISTEN
tcp        0      0 localhost.localdomain:2208  *:*                  LISTEN
tcp        0      0 *:cypress-stat              *:*                  LISTEN
tcp        0      0 192.168.1.1:59585           *:*                  LISTEN
tcp        0      0 192.168.1.1:62018           *:*                  LISTEN
tcp        0      0 london1.example.com:49795   *:*                  LISTEN
tcp        0      0 192.168.1.1:30056           *:*                  LISTEN
tcp        0      0 *:59468                     *:*                  LISTEN
tcp        0      0 192.168.1.1:62189           *:*                  LISTEN
tcp        0      0 *:sunrpc                    *:*                  LISTEN
tcp        0      0 172.17.1.209:ncube-lm       *:*                  LISTEN
tcp        0      0 london1.example.co:ncube-lm *:*                  LISTEN
tcp        0      0 172.17.1.208:ncube-lm       *:*                  LISTEN

The --inet argument provides a significantly more readable display and a snapshot of all network-related activity on the system. Within the fields, Proto refers to the protocol, so we can observe the RAC-related communication established under the UDP protocol. As their names suggest, the Recv-Q and Send-Q fields relate to the receiving and sending queues, and they should almost always be zero. If these values are increasing, particularly for the UDP protocol, then you have evidence that your interconnect cannot sustain your desired workload. The Local Address field shows your hostname and port number, while the Foreign Address field shows the address and port of the host to which you are connected; this will be *:* until a connection is established. The State field will usually show LISTEN or ESTABLISHED for the TCP protocol; UDP is a stateless protocol, so those connections have no state entries. If you also provide the -n argument, no name lookups will be done, and IP addresses are displayed for all connections.
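
For example, the following illustrative command combines these options to give a numeric snapshot of the same report:

netstat --inet -an | more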

Looking up Well-Known Ports

If a port is defined as a well-known port in the /etc/services file, the port number will be replaced by the name. Referring to /etc/services, you can see that the port number shown as ncube-lm is in fact the standard Oracle listener port number of 1521:

[root@london1 root]# cat /etc/services | grep ncube-lm
ncube-lm        1521/tcp    # nCube License Manager
ncube-lm        1521/udp    # nCube License Manager

If you change this file to something more meaningful for diagnosing Oracle network services, the change will be reflected the next time you run netstat, without having to restart any services. However, it is important to be aware that, strictly speaking, ncube-lm is the correct well-known name for port 1521, as defined at the following location: http://www.iana.org/assignments/port-numbers.
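
As an illustrative sketch only, a local edit of /etc/services might replace the entries as follows; the name oracle-listener is hypothetical and has no official standing:

oracle-listener    1521/tcp    # Oracle TNS listener (site-local alias)
oracle-listener    1521/udp    # Oracle TNS listener (site-local alias)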

Reporting on Socket Statistics Using ss

As an alternative to netstat, you can use the ss utility to report socket statistics. For example, the ss -l command displays listening sockets in a manner similar to that observed with netstat previously. Using ss without further arguments lets you rapidly determine the established connections on your system:

[root@london1 ˜]# ss
State       Recv-Q Send-Q        Local Address:Port          Peer Address:Port
ESTAB       0      0             127.0.0.1:61876            127.0.0.1:6100
ESTAB       0      0             127.0.0.1:61861            127.0.0.1:6100
ESTAB       0      0             127.0.0.1:61864            127.0.0.1:6100
ESTAB       0      0             172.17.1.208:1521          172.17.1.102:39393
ESTAB       0      0             172.17.1.209:1521          172.17.1.209:11402
ESTAB       0      0             172.17.1.209:25333         172.17.1.209:1521
ESTAB       0      0             127.0.0.1:2016             127.0.0.1:10911
ESTAB       0      0             172.17.1.208:1521          172.17.1.208:16542
ESTAB       0      0             127.0.0.1:6100             127.0.0.1:61876
ESTAB       0      0             172.17.1.101:62822         172.17.1.203:1521
ESTAB       0      0             172.17.1.208:16542         172.17.1.208:1521
ESTAB       0      0             127.0.0.1:6100             127.0.0.1:61861
ESTAB       0      0             127.0.0.1:6100             127.0.0.1:61864
ESTAB       0      0             127.0.0.1:10911            127.0.0.1:2016
ESTAB       0      0             172.17.1.209:11402         172.17.1.209:1521
ESTAB       0      0             172.17.1.209:1521          172.17.1.209:25333
ESTAB       0      0             172.17.1.101:17046         172.17.1.102:11585
ESTAB       0      0             192.168.1.1:35959          192.168.1.2:21965
ESTAB       0      0             192.168.1.1:61659          192.168.1.2:27582
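
ss also accepts options to summarize or restrict the output. The following commands are illustrative sketches; output is omitted:

ss -s        # summary counts of sockets by protocol and state
ss -t -a     # all TCP sockets, both listening and established
ss -u -a     # all UDP sockets, including RAC interconnect traffic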

Capturing and Displaying Network Packets

You should consider using the tcpdump command for detailed analysis of network traffic; it plays a role for network communication similar to the one strace plays for an application's communication with the operating system kernel. tcpdump enables you to capture and display the network packets passing over the entire system or a particular interface. The tcpdump command's -D option displays the interfaces available to you, as in this example:

[root@london1 ˜]# tcpdump -D
1.eth0
2.eth1
3.any (Pseudo-device that captures on all interfaces)
4.lo

The following example shows the default summary information you see when running tcpdump against the private interconnect interface. Specifically, it shows the packets being transferred across this network:

[root@london1 ˜]# tcpdump -i 2 | more
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 96 bytes
11:43:36.045648 IP 192.168.1.2.7016 > 192.168.1.1.sds-admin:UDP, length 192
11:43:36.046217 IP 192.168.1.2.19552 > 192.168.1.1.26976:UDP, length 224
11:43:36.046237 IP 192.168.1.1.sds-admin > 192.168.1.2.19552:UDP, length 192
11:43:36.046279 IP 192.168.1.1.sds-admin > 192.168.1.2.asc-slmd:UDP,length 256
11:43:36.046368 IP 192.168.1.2.7016 > 192.168.1.1.sds-admin:UDP, length 192
11:43:36.047215 IP 192.168.1.2.19552 > 192.168.1.1.26976:UDP, length 224
11:43:36.047231 IP 192.168.1.1.sds-admin > 192.168.1.2.19552:UDP, length 192
11:43:36.047260 IP 192.168.1.1.sds-admin > 192.168.1.2.asc-slmd:UDP,length 256
11:43:36.047413 IP 192.168.1.2.7016 > 192.168.1.1.sds-admin:UDP, length 192
11:43:36.047762 IP 192.168.1.2.11403 > 192.168.1.1.20890:UDP, length 520
11:43:36.047784 IP 192.168.1.1.18929 > 192.168.1.2.11403:UDP, length 192
11:43:36.047863 IP 192.168.1.1.18929 > 192.168.1.2.22580:UDP, length 8328
11:43:36.047865 IP 192.168.1.1 > 192.168.1.2: udp
11:43:36.047867 IP 192.168.1.1 > 192.168.1.2: udp
11:43:36.047868 IP 192.168.1.1 > 192.168.1.2: udp
11:43:36.047869 IP 192.168.1.1 > 192.168.1.2: udp
11:43:36.047870 IP 192.168.1.1 > 192.168.1.2: udp
11:43:36.048689 IP 192.168.1.2.11403 > 192.168.1.1.20890:UDP, length 448
11:43:36.048704 IP 192.168.1.1.18929 > 192.168.1.2.11403:UDP, length 192
11:43:36.048754 IP 192.168.1.1.18929 > 192.168.1.2.22580:UDP, length 8328

Similar to the strace -o option, the -w option can be used to write the data to an output file. Subsequently, the -r option can be used to read from that file, while -A can be used to print the contents of each packet.
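
The following is an illustrative sketch of this workflow; the capture file name is arbitrary:

tcpdump -i eth1 -w interconnect.pcap       # capture interconnect traffic to a file (Ctrl+C to stop)
tcpdump -r interconnect.pcap -A | more     # read the capture back and print packet contents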

iostat

iostat is the first of a number of utilities we will discuss that are installed with the sysstat RPM package; the others are mpstat and sar. iostat displays information related to CPU utilization, but it focuses on providing detailed I/O statistics. Like vmstat, iostat can be run without any command-line arguments to report statistics for average CPU utilization and disk devices since the most recent boot time. The format of the CPU utilization contains the same fields we have seen with top and vmstat. The disk statistics show the device name, the number of I/O operations per second, the number of 512-byte blocks read and written per second, and the total number of 512-byte blocks read and written.

iostat can also be supplied with one or two numerical arguments to represent the interval between sampling periods and the number of sampling periods in total. You may also request statistics for a specific device using the -p argument, such as -p sde for device sde. If you only wish to view disk utilization information, you can use the -d option; alternatively, you can use the -c option to view information for the CPU only. The -k option displays disk information in kilobytes, as opposed to blocks. The following example shows the results of running iostat against an individual disk device:

[root@london1 ˜]# iostat -p sde 3 10
Linux 2.6.18-164.el5 (london1.example.com)      02/05/2010

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.56    0.00    0.24    0.42    0.00   98.77

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sde              46.02       376.57       676.68   26774930   48113635
sde1            389.43         0.54        63.60      38074    4522265

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          15.52    0.00    2.88   17.72    0.00   63.88

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sde            4122.00     20955.67     72453.00      62867     217359
sde1          21653.33         0.00      1863.67          0       5591

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          25.88    0.00    4.75   20.79    0.00   48.58

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sde            3096.33      9361.67     58477.67      28085     175433
sde1           9752.00         0.00      2734.67          0       8204

When using iostat to observe disk statistics in a RAC environment, you should be keenly aware of the infrastructure that lies between the operating system and the actual disk devices. For example, the levels of abstraction can range from multipathing device drivers and host bus adapters to cache on the storage and a disk RAID configuration. The disk devices you are most interested in are shared between all of the nodes in the cluster, so any useful information that you can derive on any individual node is likely to be limited. iostat may therefore prove useful in providing a highly generalized overview of disk activity on the system; however, there is no substitute for the specialized storage analysis tools provided by the vendor of your chosen storage subsystem.
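
If your sysstat version supports it, the -x option also reports extended per-device statistics such as average wait and service times; the following is an illustrative sketch (output omitted):

iostat -d -x -k 3 10    # extended device statistics in kilobytes, ten samples at three-second intervals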

mpstat

By default, the mpstat command shows a CPU utilization report similar to that produced by iostat for all statistics since boot time. It also includes an additional field that shows the number of interrupts per second. mpstat also accepts the same number and type of numeric arguments as vmstat and iostat, which it uses to produce output at sampled intervals, as in this example:

[root@london1 ˜]# mpstat 3 10
Linux 2.6.18-164.el5 (london1.example.com)      02/05/2010

11:38:12 AM  CPU  %user  %nice   %sys %iowait   %irq  %soft %steal  %idle    intr/s
11:38:15 AM  all  18.47   0.00   2.54   15.76   0.21   0.76   0.00  62.25   6977.00
11:38:18 AM  all  28.65   0.00   3.44   22.62   0.25   1.19   0.00  43.85   9674.75
11:38:21 AM  all  22.21   0.00   2.71   20.09   0.17   0.93   0.00  53.88   7495.99
11:38:24 AM  all  30.91   0.00   3.78   20.68   0.30   1.44   0.00  42.89  10392.59
11:38:27 AM  all  31.62   0.00   3.78   20.37   0.34   1.23   0.00  42.66  10206.08
11:38:30 AM  all  12.40   0.00   1.69   19.35   0.21   0.76   0.00  65.58   6710.37
11:38:33 AM  all  17.24   0.00   2.33   15.97   0.21   0.76   0.00  63.49   8070.23
11:38:36 AM  all  27.96   0.00   3.65   18.33   0.30   1.15   0.00  48.62   8732.89
11:38:39 AM  all  17.35   0.00   2.42   14.76   0.17   0.72   0.00  64.59   7200.34
11:38:42 AM  all  30.41   0.00   3.73   14.89   0.25   1.19   0.00  49.53   9328.86
Average:     all  23.72   0.00   3.01   18.28   0.24   1.01   0.00  53.74   8475.07

By default, mpstat reports CPU statistics averaged across all processors; however, the most significant difference compared to iostat is that mpstat accepts the -P argument, either with a CPU number starting at 0 or as -P ALL, to display output for each processor individually.

When analyzing CPU performance with mpstat or other monitoring tools, keep in mind that on a system equipped with multicore CPUs, each CPU core is presented to the monitoring tool as a distinct CPU, even though the cores share some system resources (see Chapter 4 for more details on multicore CPUs). Similarly, Intel hyperthreaded CPUs present two logical CPUs for each physical core, enabling the Linux scheduler to run two processes simultaneously on the same core. /proc/cpuinfo should be your first reference for the CPU architecture (Chapter 4 explains how to precisely map the CPU architecture to its representation by the operating system). The following example shows an extract for the first processor from /proc/cpuinfo:

[root@london8 ˜]# cat /proc/cpuinfo | more
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 46
model name      : Intel(R) Xeon(R) CPU           X7560  @ 2.27GHz
stepping        : 5
cpu MHz         : 2261.066
cache size      : 24576 KB
physical id     : 0
siblings        : 16
core id         : 0
cpu cores       : 8
apicid          : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx rdtscp
lm constant_tsc ida nonstop_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr
popcnt lahf_lm
bogomips        : 4522.13
clflush size    : 64
cache_alignment : 64
address sizes   : 44 bits physical, 48 bits virtual
power management: [8]

sar and kSar

The system activity reporter (sar) is a powerful tool that can encompass virtually all of the performance information generated by the other performance tools discussed in this chapter. In fact, some of the statistics from sar may look familiar to users of Oracle EM because sar underpins most of the host-based performance views. This is why the sysstat package must be installed on managed targets.

Configuring sar

As its name suggests, the system activity reporter is the front-end reporting tool; it is accompanied by the system activity data collector (sadc). Reports can be generated by sar in an interactive manner or written to a file for longer-term data collection. When you install the sysstat package, it sets sadc to run periodically by scheduling the sa1 script from the cron file /etc/cron.d/sysstat, as in this example:

[root@london1 root]# cat /etc/cron.d/sysstat
# run system activity accounting tool every 10 minutes
*/10 * * * * root /usr/lib/sa/sa1 1 1
# generate a daily summary of process accounting at 23:53
53 23 * * * root /usr/lib/sa/sa2 -A

By default, this script runs every ten minutes, capturing all system statistics for a one-second period. The script appends the data to the current data file in the /var/log/sa directory, where the file is named sa with a suffix corresponding to the current day of the month, as in this example:

[root@london1 sa]# ls
sa01  sa02  sa03  sa04  sa05  sar01  sar02  sar03  sar04

At the same location as sa1, you can find the file sa2, which by default runs once per day. sa2 runs sar to generate a full report on all of the data captured during the previous day by sadc.

A sar report presents system performance data divided into 17 separate sections. Each section contains data related to a specific aspect of system performance; this information is ordered by time throughout a 24-hour period, based on the ten-minute collection interval.

Invoking sar Directly

The standard statistics collection is useful for long-term performance monitoring and capacity planning trending activities; however, the one-second collection period at ten-minute intervals may not be sufficient for pinpointing specific performance issues. For this reason, you can also invoke sar directly to produce performance information on one or more of the specific performance-related areas to the screen.

This interactive reporting requires two numerical arguments: one for the interval between sampling periods, and one for the number of sampling periods in total. Here sar differs from the statistics commands we have already seen, such as vmstat: if you specify just one numerical argument, sar reports statistics once for the time interval specified and then exits. You may also provide arguments to specify the type of performance information to view; if you do not provide any, you are shown performance information for all CPUs. The following extract shows the first output of the CPU performance information for a three-second sampling period, collected ten times:

[root@london1 sa]# sar 3 10
Linux 2.6.18-164.el5 (london1.example.com)      02/05/2010

12:01:36 PM       CPU     %user     %nice  %system  %iowait   %steal    %idle
12:01:39 PM       all     12.30      0.00     3.31    19.51     0.00    64.89
12:01:42 PM       all     14.53      0.00     3.82    22.78     0.00    58.86
12:01:45 PM       all     14.98      0.00     3.70    23.53     0.00    57.79
12:01:48 PM       all     14.39      0.00     3.91    24.08     0.00    57.62
12:01:51 PM       all     13.65      0.00     3.57    23.35     0.00    59.42
12:01:54 PM       all     14.40      0.00     3.57    22.81     0.00    59.22
12:01:57 PM       all     12.95      0.00     3.86    24.19     0.00    59.00
12:02:00 PM       all     15.86      0.00     3.14    20.40     0.00    60.60
12:02:03 PM       all      8.93      0.00     0.21     0.72     0.00    90.14
12:02:06 PM       all     15.52      0.00     4.00    20.79     0.00    59.69
Average:          all     13.75      0.00     3.31    20.21     0.00    62.73

You can view additional or alternative performance information by providing other arguments, such as sar -n DEV for network interface statistics or sar -b for I/O statistics. The full range of options is detailed in the sar man page. To produce performance information on all sections interactively, you can call sar -A; however, be aware that the output is extensive. In conjunction with sar -A, you may also find the -o option useful for directing the output to a file. The default file location is the same as that of the regularly sampled sar data; therefore, we recommend that you specify a file name for detailed sar performance analysis work. For example, the following command collects all sar statistics at three-second intervals for a five-minute period into the london1.sa file:

[root@london1 root]# sar -A -o london1.sa 3 100

The file generated is in the sar binary format, which means that sar itself must be used to read the results file at a later point; this can be accomplished using the -f option:

[root@london1 root]# sar -f london1.sa

As you would expect, using the -f option excludes also using the -o option; otherwise, sar accepts the same command-line arguments as when it is called interactively, so the preceding command displays only the default CPU information. To display all of the information collected in the file, you must also specify the -A option:

[root@london1 root]# sar -A -f london1.sa
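
Other report sections can be selected in the same way, whether interactively or from a captured file. The following commands are illustrative sketches; output is omitted:

sar -n DEV 3 10          # per-interface network statistics, ten samples at three-second intervals
sar -b -f london1.sa     # I/O transfer statistics read back from the captured file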

Graphing the Results

The text-based sar output provides you with all of the recorded performance information you require. However, simply browsing sar -A output may prove difficult when attempting to diagnose any system performance issues that have occurred.

Fortunately, there are a number of tools available for graphing the output from sar. For example, the Interactive System Activity Grapher (isag) utility is available for graphing the data recorded in sar files. isag is no longer included automatically with the sysstat RPM package, primarily due to its additional dependency on the gnuplot package. However, you can easily download and install the latest versions of isag and gnuplot to view your sar statistics, and isag is still included with current versions of the sysstat source code.

Alternatively, the kSar tool can be used to graph captured sar information; you can download this tool at http://ksar.atomique.net/. To use kSar, unzip the downloaded zip file, change the directory to the extracted kSar directory, and run the tool with the following command:

[root@london1 kSar-5.0.6]# java -jar kSar.jar

In the graphical interface, click the Data menu option and select Run local command to display the dialog window shown in Figure 13-1.

Figure 13.1. Executing a local command from kSar

Next, specify the command to extract the sar data from your captured file, as in this example:

sar -A -f /var/log/sa/london1.sa

If the extraction is successful, after a short period of time kSar will report that the data import is finished. It will also display summary information, as well as potential system bottlenecks. The example shown in Figure 13-2 reports that CPU utilization is more than 25%, which makes it worth investigating further.

Figure 13.2. A summary view provided by kSar

At this point, you can select from the options in the Menu tab on the left of the screen to display information for particular areas. Subjects you can drill down on include I/O, interface traffic by interface, and CPU utilization. Figure 13-3 shows an example summary of CPU utilization across the capture period.

Figure 13-3. CPU utilization summarized across a reporting period
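
As an alternative to running a local command from within kSar, you can save the sar output as plain text on the server and load that file into kSar instead, which is useful when the capture was taken on a host where you do not want to run Java. The following is a minimal sketch, assuming the binary file from the earlier example; setting LC_ALL=C helps avoid locale-related parsing issues, and the exact wording of the load-from-file option in kSar's Data menu varies between versions:

[root@london1 ˜]# LC_ALL=C sar -A -f /var/log/sa/london1.sa > /tmp/london1_sar.txt

The resulting text file can then be copied to the workstation running kSar and imported through the Data menu.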

Oracle Cluster Health Monitor

The Oracle Cluster Health Monitor is an Oracle-provided tool for monitoring resource utilization on a cluster-wide basis. The Oracle Cluster Health Monitor runs in one of two modes. In the first, it observes the system in real time. In the second, it collects data in a Berkeley DB repository on a node-by-node basis, enabling the review of data collected over time; this data can be used to pinpoint the causes of performance issues.

Installing the Oracle Cluster Health Monitor

The Oracle Cluster Health Monitor for Linux can be downloaded from the Oracle Technology Network web site at the following location: www.oracle.com/technology/products/database/clustering/ipd_download_homepage.html. The resulting download is called crfpack-linux.zip. To install the tool, begin by creating a dedicated user for the tool on all nodes in the cluster. The following example illustrates how to create crfuser on the london1 node:

[root@london1 ˜]# useradd -g oinstall crfuser
[root@london1 ˜]# passwd crfuser
Changing password for user crfuser.
New UNIX password:
Retype new UNIX password:
passwd: all authentication tokens updated successfully.

Once the user has been created on all nodes, secure shell (ssh) must be configured so that the crfuser can connect between hosts without password prompts or warnings. This can be performed with the same manual configuration steps required for configuring ssh for the oracle user (see Chapter 6 for information on the steps required to do this; a brief sketch also follows the listing below). Once you have tested ssh connectivity between the nodes, copy the file crfpack-linux.zip into the /home/crfuser/ directory, ensure it is owned by the crfuser, and unzip it:

[crfuser@london1 ˜]$ unzip crfpack-linux.zip
[crfuser@london1 ˜]$ ls
admin  bin  crfpack-linux.zip  install  jdk  jlib  lib  log  mesg
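
The passwordless ssh configuration mentioned above can be as simple as the following sketch, run as the crfuser. RSA keys and the example host names are assumptions here, and the steps must be repeated so that every node can reach every other node; Chapter 6 details the full procedure:

[crfuser@london1 ˜]$ ssh-keygen -t rsa             # accept the defaults and an empty passphrase
[crfuser@london1 ˜]$ ssh-copy-id crfuser@london2   # append the public key to london2's authorized_keys
[crfuser@london1 ˜]$ ssh london2 date              # should return the date without prompting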

Before installing the software, it is necessary to have a non-root file system available on which you can create the Berkeley DB database. If you opt for a default file system configuration (as we recommend), then you will not have a non-root file system available. Therefore, you can either mount a file system from an external source such as iSCSI; or, if you are using ASM, you can create an ACFS file system (see Chapter 9 for more information on how to do this). In the example shown in Figure 13-4, a 10GB ACFS file system is created for a two-node cluster, allocating 5GB of storage per node.

Figure 13-4. A 10GB ACFS file system for the Oracle Cluster Health Monitor
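
If you prefer the command line to the ASMCA interface shown in Figure 13-4, the volume and file system can be created along the following lines. This is a sketch only: the DATA disk group name and the grid user (with the ASM environment set) are assumptions, and the volume device name under /dev/asm will differ on your system; Chapter 9 covers the full procedure.

[grid@london1 ˜]$ asmcmd volcreate -G DATA -s 10G CRFDB
[grid@london1 ˜]$ asmcmd volinfo -G DATA CRFDB        # note the Volume Device reported, such as /dev/asm/crfdb-61
[root@london1 ˜]# /sbin/mkfs -t acfs /dev/asm/crfdb-61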

You must then mount the ACFS file system on all nodes in the cluster:

[root@london1 ˜]# /sbin/mount.acfs -o all
[root@london1 ˜]# df -h
Filesystem         Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                   886G   19G  822G   3% /
/dev/sda1           99M   13M   82M  14% /boot
tmpfs              7.9G  192M  7.7G   3% /dev/shm
/dev/asm/crfdb-61   10G   85M   10G   1% /u01/app/oracle/acfsmounts/data_crfdb

Next, you need to create a separate directory for the Berkeley DB database for each node in the cluster. Note that, although the file system is shared between the nodes, the Berkeley DB database is not cluster-aware, so it cannot be shared between nodes. The following example creates the directory for the first node only:

[crfuser@london1 install]$ mkdir /u01/app/oracle/acfsmounts/data_crfdb/oracrfdb1
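
Because the ACFS file system is mounted on both nodes, the matching directory for the second node (used later in this installation) can be created from the first node as well; for example:

[crfuser@london1 install]$ mkdir /u01/app/oracle/acfsmounts/data_crfdb/oracrfdb2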

On the first node, run the installation script from the install directory as the crfuser. Do this in conjunction with the -i option, specifying the nodes to be installed, the location of the Berkeley DB database, and the name of the master node, as shown here:

[crfuser@london1 install]$ ./crfinst.pl -i london1,london2 -b /u01/app/oracle/acfsmounts/data_crfdb/oracrfdb1 -m london1

Performing checks on nodes: "london1 london2" ...
Assigning london2 as replica

Generating cluster wide configuration file...

Creating a bundle for remote nodes...

Installing on nodes "london2 london1" ...

Configuration complete on nodes "london2 london1" ...

Once the initial installation has completed on the first node, finish the installation by rerunning the install script as the root user with the -f option on all nodes, including the first node in the cluster, specifying the Berkeley DB directory for each node. If the Berkeley DB directory is local, it can have the same name on each node; in this example, which uses ACFS, the directory name is distinct per node. For example, the installation is completed as follows on node 1:

[root@london1 install]# ./crfinst.pl -f -b /u01/app/oracle/acfsmounts/data_crfdb/oracrfdb1
Removing contents of BDB Directory
/u01/app/oracle/acfsmounts/data_crfdb/oracrfdb1

Installation completed successfully at /usr/lib/oracrf...

Similarly, the installation is completed as follows on node 2:

[root@london2 install]# ./crfinst.pl -f -b  /u01/app/oracle/acfsmounts/data_crfdb/oracrfdb2/
Removing contents of BDB Directory
/u01/app/oracle/acfsmounts/data_crfdb/oracrfdb2/

Installation completed successfully at /usr/lib/oracrf...

A log of installation activity is maintained in the crfinst.log file in the crfuser home directory.

Starting and Stopping the Oracle Cluster Health Monitor

After installation is complete, the Oracle Cluster Health Monitor can be started on all nodes with the /etc/init.d/init.crfd script, as in this example:

[root@london2 init.d]# ./init.crfd enable

You can verify a successful startup by issuing the command again with the status argument:

[root@london1 init.d]# ./init.crfd status

OSysmond running with PID=3571.
OLoggerd running with PID=3623.
oproxyd running with PID=3626.

To stop the Oracle Cluster Health Monitor, use init.crfd with the disable argument. disable is preferable to the stop argument because, with stop, the daemons are automatically restarted.
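
For example, the following one-line command mirrors the enable call shown earlier:

[root@london1 init.d]# ./init.crfd disable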

Understanding the Architecture

The Oracle Cluster Health Monitor starts three daemon processes: osysmond, ologgerd, and oproxyd. osysmond collects the monitoring data from the local system, while ologgerd receives the data from all nodes and populates the Berkeley DB database. Thus, ologgerd is only active on the master node with another node acting as a standby. You may also observe that the Berkeley DB database directories are now populated with data:

[root@london1 data_crfdb]# ls *
lost+found:

oracrfdb1:
crfalert.bdb  crfcpu.bdb     crfts.bdb  __db.003  __db.006
crfclust.bdb  crfhosts.bdb   __db.001   __db.004  log.0000000001
crfconn.bdb   crfloclts.bdb  __db.002   __db.005  london1.ldb

oracrfdb2:
crfalert.bdb  crfloclts.bdb  __db.002  __db.006
crfclust.bdb  crfrep.bdb     __db.003  log.0000000001
crfcpu.bdb    crfts.bdb      __db.004  london2.ldb
crfhosts.bdb  __db.001       __db.005  repdhosts.bdb
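
You can also confirm which of these daemons are active on a given node with a simple process listing; a minimal sketch follows:

[root@london1 ˜]# ps -ef | grep -E 'osysmond|ologgerd|oproxyd' | grep -v grep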

Installing the Client-Side GUI

In addition to the server-side installation, it is also possible to use the same installation software to install a client-side graphical interface. Oracle recommends installing this graphical interface on a separate node from the cluster. On the server, the daemon process oproxyd listens for network connections, such as connections from this interface. The client installation should be performed as the root user and not the crfuser, as in this example:

[root@london5 install]# ./crfinst.pl -g

Installation completed successfully at /usr/lib/oracrf...

The installation locates the files in the /usr/lib/oracrf directory, and the graphical client can be run from within the bin directory by specifying the cluster node to connect to:

[root@london5 bin]# ./crfgui -m london1
Cluster Health Analyzer V1.10
        Look for Loggerd via node london1
 ...Connected to Loggerd on london1
Note: Node london1 is now up
Cluster 'MyCluster', 2 nodes. Ext time=2010-02-08 12:30:57
Making Window: IPD Cluster Monitor V1.10 on nehep1,
Logger V1.04.20091223, Cluster "MyCluster"  (View 0), Refresh rate: 1 sec

Viewing Current and Captured Activity

By default, the client-side graphical tool runs in real-time mode, displaying the current activity across the cluster (see Figure 13-5).

Figure 13-5. The current activity, as shown in the client-side GUI

There is also the option to replay captured data from a previous period of time. For example, the following command replays the data from the previous five minutes:

[root@london5 bin]# ./crfgui -d "00:05:00" -m london1

In addition to the graphical tool, there is a command-line tool called oclumon that can be used either in real time or to browse historical data. oclumon can be called either from the server or from the client environment. By default, oclumon reports data in real time for the local node, as in this example:

[root@london1 ˜]# oclumon dumpnodeview

----------------------------------------
Node: london1 Clock: '02-09-10 12.08.01 UTC' SerialNo:77669
----------------------------------------
SYSTEM:
#cpus: 8 cpu: 8.66 cpuq: 3 physmemfree: 1506256 mcache: 3318060 swapfree: 17141300 ior: 12474 iow: 882 ios: 994 netr: 785.5 netw: 862.11 procs: 291 rtprocs: 26 #fds: 3752 #sysfdlimit: 6553600 #disks: 5 #nics: 3  nicErrors: 0

TOP CONSUMERS:
topcpu: 'oraclePROD1(8943) 7.75' topprivmem: 'ocssd.bin(4615) 224436' topshm: 'ora_mman_PROD1(26453) 462752' topfd: 'ocssd.bin(4615) 95' topthread: 'crsd.bin(4822) 54'

With the -allnodes option, oclumon will report data from all of the nodes in the cluster. Additionally, you can use the -v option for verbose output. oclumon can also query historical data. For example, the following command reports verbose output for all nodes for the previous five minutes:

[root@london1 ˜]# oclumon dumpnodeview -v -allnodes -last "00:05:00"

If oclumon is run with no arguments, it presents a query prompt at which you can enter commands interactively. The strength of the Oracle Cluster Health Monitor lies in two things. First, it lets you review historical data to diagnose issues, such as node evictions, that occurred at an earlier point in time. Second, it provides a central location for recording performance monitoring data for all nodes in the cluster simultaneously.

OSWatcher

OSWatcher is not really a performance monitoring tool in its own right. Rather, it is a framework for capturing, storing, and analyzing data generated by a number of the standard command-line performance monitoring tools that we have previously covered in this chapter. OSWatcher also includes a utility called OSWg that graphs the captured data. As such, it offers similar functionality to the combination of sar and kSar. OSWatcher has been developed by Oracle, and it can be downloaded from the My Oracle Support web site as a .tar archive.

Installing OSWatcher

Follow these steps to install OSWatcher. Begin by extracting the archive into a directory owned by a user that has permission to run the standard command-line performance monitoring tools, such as the oracle user:

[oracle@london1 ˜]$ tar xvf osw3b.tar
./
./osw/
./osw/Exampleprivate.net
./osw/OSWatcher.sh
./osw/OSWatcherFM.sh
./osw/OSWgREADME.txt
./osw/README.txt
...

Starting OSWatcher

Next, start OSWatcher with the startOSW.sh script. This script accepts two optional arguments: the first specifies the snapshot interval in seconds, which determines how regularly the command-line tools are run to gather data; the second specifies the number of hours of archived data to retain. If no arguments are given, the default values (a 30-second snapshot interval and 48 hours of archived data, as reported in the output below) are used. The following example collects information using the default values:

[oracle@london1 osw]$ ./startOSW.sh
[oracle@london1 osw]$

Info...You did not enter a value for snapshotInterval.
Info...Using default value = 30
Info...You did not enter a value for archiveInterval.
Info...Using default value = 48

Testing for discovery of OS Utilities...

VMSTAT found on your system.
IOSTAT found on your system.
MPSTAT found on your system.
NETSTAT found on your system.
TOP found on your system.

Discovery completed.

Starting OSWatcher v3.0   on Tue Feb 9 15:25:49 GMT 2010
With SnapshotInterval = 30
With ArchiveInterval = 48

OSWatcher - Written by Carl Davis, Center of Expertise, Oracle Corporation

Starting Data Collection...

osw heartbeat:Tue Feb 9 15:25:49 GMT 2010
osw heartbeat:Tue Feb 9 15:26:19 GMT 2010
osw heartbeat:Tue Feb 9 15:26:49 GMT 2010
osw heartbeat:Tue Feb 9 15:27:19 GMT 2010
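
To use non-default values, pass them explicitly. For example, the following sketch (the values are chosen purely for illustration) samples every 60 seconds and retains 24 hours of archived data:

[oracle@london1 osw]$ ./startOSW.sh 60 24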

Stopping OSWatcher

To stop the data collection, run the stopOSW.sh script as the same user. At this point, the captured data can be browsed manually within the archive directory, as in this example:

[oracle@london1 archive]$ ls *
oswiostat:
london1.example.com_iostat_10.02.09.1500.dat
oswmeminfo:
london1.example.com_meminfo_10.02.09.1500.dat
...

Viewing Results Graphically

Alternatively, the extracted files include a Java utility for viewing the data in graphical form. First, a Java executable must be available in the user's PATH environment variable; then run the utility as follows, specifying the archive directory that contains the collected data:

[oracle@london1 osw]$ export PATH=$ORACLE_HOME/jdk/bin:$PATH
[oracle@london1 osw]$ java -jar oswg.jar -i archive

Starting OSWg V3.0.0
OSWatcher Graph Written by Oracle Center of Expertise
Copyright (c)  2008 by Oracle Corporation

Parsing Data. Please Wait...

Parsing file london1.example.com_iostat_10.02.09.1500.dat ...

Parsing file london1.example.com_vmstat_10.02.09.1500.dat ...

Parsing Completed.

When the parsing of data is complete, OSWg presents a number of options in the terminal window that you can use to choose the graph to display:

Enter 1 to Display CPU Process Queue Graphs
Enter 2 to Display CPU Utilization Graphs
Enter 3 to Display CPU Other Graphs
Enter 4 to Display Memory Graphs
Enter 5 to Display Disk IO Graphs

Enter 6 to Generate All CPU Gif Files
Enter 7 to Generate All Memory Gif Files
Enter 8 to Generate All Disk Gif Files

Enter L to Specify Alternate Location of Gif Directory
Enter T to Specify Different Time Scale
Enter D to Return to Default Time Scale
Enter R to Remove Currently Displayed Graphs
Enter P to Generate A Profile
Enter Q to Quit Program

Please Select an Option:

Selecting an option displays a graph in an individual window, as shown in Figure 13-6.

Figure 13-6. Results from OSWatcher

If you choose to run OSWatcher on a regular basis, it is important to be aware that, by default, the tool will not restart after the system reboots. Therefore, Oracle also provides an RPM package, downloadable from My Oracle Support, that installs a service to start OSWatcher when the system boots.
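
If you choose not to use that RPM, one generic alternative is an @reboot cron entry for the oracle user. The following is a sketch only, not the Oracle-provided service; the installation path and intervals shown are assumptions:

# added with crontab -e as the oracle user
@reboot cd /home/oracle/osw && ./startOSW.sh 30 48 > /dev/null 2>&1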

nmon

In contrast to the Oracle Cluster Health Monitor and OSWatcher, nmon is a performance monitoring tool developed by IBM, initially for AIX-based systems but since extended to Linux environments. nmon was released as open source, and it is available for download from the following location: http://nmon.sourceforge.net. Available downloads include precompiled binaries, such as the standard x86 32-bit binary for Red Hat systems. Once downloaded and given executable permissions, the binary will run against the standard Oracle-validated RPM installation in both x86 and x86-64 environments.

By default, nmon runs in interactive mode; pressing the h key displays the available options (you can see nmon's menu in Figure 13-7). Figure 13-8 shows how nmon can provide detailed output for the processor, memory, network, and storage, with multiple sections displayed simultaneously.

Figure 13-7. nmon's menu

For example, pressing n displays information related to network traffic, thus enabling the RAC DBA to observe the utilization levels of the cluster interconnect.

Figure 13-8. nmon's detailed statistics output

Additionally, nmon can capture information over a period of time to a file, which you can later graph with spreadsheet tools (a brief capture sketch follows). Thus, nmon is a useful open source tool that is easy to install, while also providing comprehensive coverage that complements the standard Linux- and Oracle-provided performance monitoring tools.
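
For example, the following sketch of the capture mode takes a sample every 30 seconds, 120 times (one hour), and writes the results to a spreadsheet-importable .nmon file named after the host and capture time in the current directory. The binary name depends on which precompiled download you chose; ./nmon is used here for illustration:

[oracle@london1 ˜]$ ./nmon -f -s 30 -c 120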

Summary

In this chapter, we described some of the tools and techniques available to you for monitoring the performance of your RAC cluster at the Linux level. We also detailed the most common Linux command-line and graphical tools, which you can use to confirm your earlier findings gleaned from using the Oracle-specific performance monitoring tools described in Chapter 12.
