In the previous chapter, we looked at Oracle performance monitoring tools. However, performance problems often occur outside the Oracle environment, at the processor, memory, network, or storage level. It is therefore important to understand the information provided not only by the Oracle performance monitoring tools, but also by the standard operating system monitoring tools available on Linux. You can use the information provided by these tools to corroborate the findings from the Oracle tools and fully diagnose RAC performance.
There are a number of third-party performance monitoring tools that operate in the Linux environment. However, our focus here is on the operating system monitoring tools available by default with Oracle Enterprise Linux that complement the environment available with the Oracle tools. In this category, we cover the default tools available in the base Oracle Enterprise Linux installation—namely, the command-line CPU and memory diagnostics uptime, last, ps, free, ipcs, pmap, lsof, top, vmstat, and strace; and the network tools netstat, ss, and tcpdump. Additionally, if you have installed and configured Oracle Enterprise Linux as detailed in Chapter 6, you will have run the Oracle Validated RPM. One dependency of the latter is the sysstat RPM package, which includes the following Linux performance monitoring tools: iostat, mpstat, and sar. Consequently, a default Oracle-validated Enterprise Linux environment includes a number of command-line tools that, if mastered, can rapidly and comprehensively give you insight into system-level performance.
We also provide an overview of additional optional Oracle-provided Linux monitoring tools, as well as information on another open source tool you may wish to investigate. The Oracle tools are called Oracle Cluster Health Monitor and OSWatcher. The additional open source tool, which is provided by IBM, is called nmon. These tools are easy to install and provide both an alternative and a complementary environment for monitoring Linux systems.
It is important to note that, as is the case with all software, the performance monitoring tools covered in this section require system resources to run, and you should be aware of the level of resources required by each tool. Consider this information when deciding upon your Linux performance monitoring toolset; we do not recommend running all of the tools detailed in this section at the same time. Instead, select the ones that will work best in your environment.
uptime is a standard Linux command that reports the amount of time that a system has been running. The following snippet shows you how to use this command:
[root@london1 ~]# uptime
 15:36:11 up 3 days, 3:50, 4 users, load average: 0.13, 0.14, 0.10
uptime provides information on node availability, and it is useful as a command of first resort in diagnosing and troubleshooting node evictions across a RAC cluster. uptime also reports the system load over intervals of 1, 5, and 15 minutes.
In a similar vein, the last command and its -x argument provide a detailed log of system shutdowns and changes in run level, as in this example:
[root@london1 ˜]# last root pts/2 172.17.1.81 Fri Feb 5 09:32 still logged in root pts/1 london2.example. Thu Feb 4 16:09 still logged in root pts/1 london2.example. Thu Feb 4 16:04 - 16:05 (00:00) root pts/0 172.17.1.81 Thu Feb 4 16:00 still logged in reboot system boot 2.6.18-164.el5 Thu Feb 4 15:52 (17:53) root pts/2 172.17.1.81 Thu Feb 4 15:36 - down (00:13) root pts/1 172.17.1.81 Thu Feb 4 13:18 - down (02:31) root pts/3 172.17.1.81 Mon Feb 1 14:37 - down (3+01:11) root pts/1 172.17.1.81 Mon Feb 1 13:51 - 14:47 (00:55) root pts/2 172.17.1.81 Mon Feb 1 13:31 - 14:48 (01:17) root pts/1 172.17.1.81 Mon Feb 1 13:30 - 13:32 (00:02) root pts/0 172.17.1.81 Mon Feb 1 11:48 - down (3+04:01) reboot system boot 2.6.18-164.el5 Mon Feb 1 11:46 (3+04:02) root pts/0 172.17.1.81 Fri Jan 29 15:48 - down (00:09) reboot system boot 2.6.18-164.el5 Fri Jan 29 11:03 (04:54)
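If you are interested only in the shutdown and run-level records that -x adds, you can also name those pseudo-users explicitly. The following is an illustrative invocation; the output will vary with your system history:

[root@london1 ~]# last -x shutdown runlevel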
The ps command is one of the most basic, yet essential, tools for analyzing performance on a Linux system. At its simplest, ps shows a list of processes; if called without arguments, it displays the list of processes running under the current session, as shown here:
[root@london1 ~]# ps
  PID TTY          TIME CMD
 6969 pts/2    00:00:00 bash
 7172 pts/2    00:00:00 ps
Fortunately, ps can do a lot more than this. For example, it accepts a wealth of arguments to present process listings in almost every conceivable form. The arguments to ps can take three forms: standard System V Unix-type options that must be preceded by a dash; BSD-type options that are not preceded by a dash; and GNU long options that are preceded by two dashes. In effect, you may use different combinations of arguments to display similar forms of output. The combination of arguments that you will use most regularly is that of a full listing of all processes, which relies on the System V -ef arguments. The following shows the first ten lines of output:
[root@london1 ˜]# ps -ef UID PID PPID C STIME TTY TIME CMD root 1 0 0 Feb04 ? 00:00:02 init [3] root 2 1 0 Feb04 ? 00:00:00 [migration/0] root 3 1 0 Feb04 ? 00:00:00 [ksoftirqd/0] root 4 1 0 Feb04 ? 00:00:00 [watchdog/0] root 5 1 0 Feb04 ? 00:00:00 [migration/1] root 6 1 0 Feb04 ? 00:00:00 [ksoftirqd/1]
root 7 1 0 Feb04 ? 00:00:00 [watchdog/1] root 8 1 0 Feb04 ? 00:00:00 [migration/2] root 9 1 0 Feb04 ? 00:00:00 [ksoftirqd/2] root 10 1 0 Feb04 ? 00:00:00 [watchdog/2]
To learn more details about each process, you can use the ps -elf command, whose -l argument produces a longer, more complete listing. You can pipe the output through grep to restrict the number of lines returned, as in this example:
[root@london1 ~]# ps -elf | grep smon | grep -v grep
0 S oracle   13172     1  0  78   0 -  119071 -      Feb01 ?        00:00:00 asm_smon_+ASM1
0 S oracle   23826     1  0  75   0 - 1727611 -      Feb01 ?        00:00:14 ora_smon_PROD1
However, you also have an alternative to using ps with grep. pgrep can provide you with the same functionality in a single command. For example, the following extract uses the -flu arguments to display the processes owned by the user oracle:
[root@london1 ˜]# pgrep -flu oracle 6458 ora_pz97_PROD1 7210 ora_w000_PROD1 7516 ora_j000_PROD1 7518 ora_j001_PROD1 12903 /u01/app/11.2.0/grid/bin/oraagent.bin 12918 /u01/app/11.2.0/grid/bin/mdnsd.bin 12929 /u01/app/11.2.0/grid/bin/gipcd.bin 12940 /u01/app/11.2.0/grid/bin/gpnpd.bin 12970 /u01/app/11.2.0/grid/bin/diskmon.bin -d -f 12992 /u01/app/11.2.0/grid/bin/ocssd.bin 13138 asm_pmon_+ASM1
Another useful command is pidof, which can be used to identify processes. It can even be used without arguments. If you know the name of a process, you can quickly find its corresponding process identifier with this snippet:
[root@london1 ~]# pidof ora_smon_PROD1
23826
The free command, the /proc file system (in particular the /proc/meminfo file), and the ipcs, pmap, and lsof commands are useful in diagnosing RAC performance problems. The following sections walk through how to use each of these items.
The free command displays the status of your system's virtual memory at the current point in time. There are three rows of output: the Mem: row shows the utilization of the physical RAM installed in the machine; the -/+ buffers/cache: row shows the amount of memory assigned to system buffers and caches; and the Swap: row shows the amount of swap space used.
The next example shows a system with 16GB of RAM after an Oracle RAC instance has started. At first, it may appear that nearly 13GB has been consumed, with just over 3GB free. However, with free, we can see that the operating system actually assigns memory to buffers and cache if it is not being used for any other purpose; therefore, the actual figure representing free memory is more than 4GB. If you are using any third-party system-monitoring tool that reports memory utilization is high on a Linux system, you should always confirm this with free to ensure that the memory is not simply assigned to buffers and cache instead:
[root@london1 ~]# free
             total       used       free     shared    buffers     cached
Mem:      16423996   12813584    3610412          0     158820    1045908
-/+ buffers/cache:   11608856    4815140
Swap:     18481144          0   18481144
The preceding example also shows that the system is not using any of the configured swap space at this point in time. As we discussed in Chapter 6, unless you are creating a large number of processes on the system, swap space utilization should be minimal. If you monitor the memory utilization with free, and an increasing amount of swap space is being consumed, this will have a significantly negative performance impact.
By default, the values for free are expressed in kilobytes; however, you can specify the display in bytes, megabytes, or gigabytes with the -b, -m, or -g flag, respectively. The -s flag can be used with an interval value to continually repeat the command according to the interval period. Alternatively, you can use the watch command to refresh the display in place; by default, running watch free will refresh in place every two seconds.
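For example, the following representative invocations report the values in megabytes, with the first refreshing every five seconds and the second relying on watch to update the display in place:

[root@london1 ~]# free -m -s 5
[root@london1 ~]# watch -n 2 free -m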
When working with Oracle, you should also be familiar with the output of /proc/meminfo, which is the location from which the information for free is derived. Within /proc/meminfo, you can also see the amount of memory and swap that is free and used, and the amount of memory assigned to buffers and cache on an individual basis. In addition, /proc/meminfo includes the configuration of huge pages, the setting of which we discuss in Chapter 6.
The following example of /proc/meminfo shows the same system with a total of 16GB of RAM and 5,000 huge pages at 2MB each, which is a 10GB allocation in total. Of these, 3,442 huge pages remain reserved after the Oracle instance has started; these are pages not yet used by the instance but reserved for future use by the SGA, and they are therefore not available as standard small pages:
[oracle@london1 ˜]$ cat /proc/meminfo MemTotal: 16423996 kB MemFree: 4079840 kB Buffers: 29668 kB Cached: 771924 kB SwapCached: 0 kB Active: 1509184 kB Inactive: 456424 kB HighTotal: 0 kB HighFree: 0 kB
LowTotal: 16423996 kB LowFree: 4079840 kB SwapTotal: 18481144 kB SwapFree: 18481144 kB Dirty: 924 kB Writeback: 0 kB AnonPages: 1166644 kB Mapped: 192576 kB Slab: 39444 kB PageTables: 38872 kB NFS_Unstable: 0 kB Bounce: 0 kB CommitLimit: 21573140 kB Committed_AS: 6138472 kB VmallocTotal: 34359738367 kB VmallocUsed: 284172 kB VmallocChunk: 34359453879 kB HugePages_Total: 5000 HugePages_Free: 4345 HugePages_Rsvd: 3442 Hugepagesize: 2048 kB
In this output, it's important to notice that 4,345 huge pages remain free. By default, the pages will be used on demand, which means the number of free pages will drop as they are used during normal Oracle SGA-related database activity, such as caching data in the buffer cache. Alternatively, setting the Oracle parameter PRE_PAGE_SGA to true will ensure that each process pages in the SGA at startup, so all required pages are allocated on instance startup. If unused pages remain available, these can be freed by setting the vm.nr_hugepages parameter down to the utilized level (see Chapter 6 for more information on this).
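For example, if you determine that the instance ultimately requires around 4,100 huge pages (a hypothetical figure for this system), the allocation could be lowered with sysctl; as Chapter 6 describes, the value should also be set in /etc/sysctl.conf so that it persists across reboots:

[root@london1 ~]# sysctl -w vm.nr_hugepages=4100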
When working on a system with a NUMA memory configuration, you should also be familiar with the meminfo data reported on a per-memory-node basis. For example, the following shows Node 0 of a 4-node configuration. In this case, a quarter of the total 70,000 huge pages are allocated on this node, which indicates an even distribution of pages across the nodes:
root@london5 node]# cat */meminfo Node 0 MemTotal: 66036380 kB Node 0 MemFree: 28338244 kB Node 0 MemUsed: 37698136 kB Node 0 Active: 454512 kB Node 0 Inactive: 716248 kB Node 0 HighTotal: 0 kB Node 0 HighFree: 0 kB Node 0 LowTotal: 66036380 kB Node 0 LowFree: 28338244 kB Node 0 Dirty: 76 kB Node 0 Writeback: 0 kB Node 0 FilePages: 1010516 kB
Node 0 Mapped: 80612 kB Node 0 AnonPages: 169328 kB Node 0 PageTables: 15504 kB Node 0 NFS_Unstable: 0 kB Node 0 Bounce: 0 kB Node 0 Slab: 67836 kB Node 0 HugePages_Total: 17500 Node 0 HugePages_Free: 15247
Additional NUMA-related commands, such as numactl and numastat, can influence and tune this allocation (you can learn more about these commands in Chapter 4). It is also important to understand not only how the memory is allocated, but also how it is used by the Oracle instance.
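For example, the following commands (illustrative; output depends on the platform) report per-node allocation statistics and the NUMA hardware layout:

[root@london5 ~]# numastat
[root@london5 ~]# numactl --hardware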
Regardless of whether you are using NUMA-based memory allocation, a significant proportion of your system memory will be allocated as shared memory for the SGA. This is true whether you are using manual shared memory management, automatic shared memory management, or automatic memory management. The ipcs command with the -m argument can be used to display the configured shared memory segments on the system. The following example shows that a single shared memory segment has been allocated for the SGA:
[oracle@london1 ~]$ ipcs -m

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes       nattch     status
0xed304ac0 32768      oracle     660        4096        0
0x90c3be20 1277953    oracle     660        8592031744  42
The corresponding ipcrm command with the -M argument can be used by a user with the appropriate permissions to manually delete shared memory segments. However, you should use pmap and lsof beforehand to identify the processes using the shared memory segment.
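For example, a segment orphaned by an aborted instance could be removed by the key reported by ipcs; this is a hypothetical example and must never be run against a segment that is still attached to a running instance:

[root@london1 ~]# ipcrm -M 0x90c3be20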
On an individual process basis, the pmap command details the memory mapped by that particular process, including the total memory utilization of the process. The -x argument shows this information in an extended format. pmap can be used directly with a process number or in conjunction with the pgrep command described previously, which returns the process number of a process identified by name. For example, the following output illustrates the pmap information for a foreground process:
[root@london1 ˜]# pmap -x 21749 21749: oraclePROD1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq))) Address Kbytes RSS Anon Locked Mode Mapping 0000000000400000 155144 - - - r-x-- oracle 0000000009d81000 12404 - - - rwx-- oracle 000000000a99e000 280 - - - rwx-- [ anon ]
0000000015522000 556 - - - rwx-- [ anon ] 0000000060000000 8390656 - - - rwxs- 5 (deleted) 00000031a3600000 112 - - - r-x-- ld-2.5.so 00000031a381b000 4 - - - r-x-- ld-2.5.so 00000031a381c000 4 - - - rwx-- ld-2.5.so 00000031a3a00000 1332 - - - r-x-- libc-2.5.so 00000031a3b4d000 2048 - - - ----- libc-2.5.so
However, it is important to note that, as explained previously, some implementations of pmap under Oracle Enterprise Linux do not show the full extent of the listing. For example, compare the preceding output to the following example from SUSE Linux, which shows more complete information:
reading1:˜ # pmap 7679 7679: oracle START SIZE RSS PSS DIRTY SWAP PERM MAPPING 08048000 136616K 9988K 683K 0K 0K r-xp /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle 105b2000 1004K 208K 63K 56K 0K rwxp /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle 106ad000 472K 320K 320K 320K 0K rwxp [heap] 20000000 309248K 76480K 71725K 42072K 0K rwxs /SYSVb0e4b134 b6dfe000 128K 128K 88K 88K 0K rwxp /dev/zero b6e1e000 384K 384K 0K 0K 0K rwxp /dev/zero ... b6e7e000 132K 132K 132K 132K 0K rwxp [anon] bfbb9000 84K 36K 36K 36K 0K rwxp [stack] ffffe000 4K 0K 0K 0K 0K r-xp [vdso] Total: 465608K 93988K 77876K 47508K 0K 8704K writable-private, 147440K readonly-private, 309464K shared, and 92528K referenced
In addition to reporting on shared memory, pmap can also be used to identify the private memory utilized by individual foreground processes, which helps in troubleshooting where memory has been allocated across the system on a process-by-process basis. Additional process-based memory utilization information is also available underneath the /proc directory and an individual process number. For example, the status file includes the following summary:
[root@london1 21749]# cat status Name: oracle State: S (sleeping) SleepAVG: 85% Tgid: 21749 Pid: 21749 PPid: 21748 TracerPid: 0 Uid: 500 500 500 500 Gid: 501 501 501 501 FDSize: 64 Groups: 500 501 VmPeak: 8608684 kB
VmSize: 218028 kB VmLck: 0 kB VmHWM: 28616 kB VmRSS: 28616 kB VmData: 2920 kB VmStk: 112 kB VmExe: 155144 kB VmLib: 12148 kB VmPTE: 380 kB StaBrk: 15522000 kB Brk: 155ad000 kB StaStk: 7fffac094010 kB
lsof is an extensive command that lists the open files on the system. It can be used for diagnosing connectivity to a number of resources; for example, it provides information on the usage of standard files, shared memory segments, and network ports. The following example lists the processes under the oracle user that are attached to the shared memory segment; the id 1277953 is identified from the output of ipcs:
[root@london1 ˜]# lsof -u oracle | grep 1277953 oracle 20508 oracle DEL REG 0,13 1277953 /5 oracle 20510 oracle DEL REG 0,13 1277953 /5 oracle 20514 oracle DEL REG 0,13 1277953 /5 oracle 20516 oracle DEL REG 0,13 1277953 /5 oracle 20518 oracle DEL REG 0,13 1277953 /5 oracle 20520 oracle DEL REG 0,13 1277953 /5 oracle 20522 oracle DEL REG 0,13 1277953 /5 oracle 20524 oracle DEL REG 0,13 1277953 /5 ...
If you are also interested in the cached objects in the kernel, you can view them in the output of /proc/slabinfo; however, you will most likely be interested only in specific entries, such as the kiobuf entries related to asynchronous I/O activity. In addition, a utility called slabtop can display kernel slab information in real time. The form of the output of slabtop is similar to that of the more general-purpose top.
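For example, the following representative invocations sort the slab caches by cache size and take a single snapshot suitable for saving to a file:

[root@london1 ~]# slabtop -s c
[root@london1 ~]# slabtop -o > slabinfo.txt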
ps and free are static commands that return information about system processes and memory utilization within individual snapshots. However, they are not designed to track usage over a longer period of time. The first tool we will look at with this monitoring capability is top.
If top is called without arguments, it will display output similar to the following result. It will also refresh the screen by default every two seconds, without requiring that you use watch to enable this functionality:
[root@london1 ˜]# top top - 11:15:54 up 19:24, 4 users, load average: 1.27, 0.51, 0.23
Tasks: 285 total, 1 running, 284 sleeping, 0 stopped, 0 zombie Cpu(s): 20.9%us, 2.8%sy, 0.0%ni, 49.2%id, 26.1%wa, 0.2%hi, 0.9%si, 0.0%st Mem: 16423996k total, 12921072k used, 3502924k free, 162536k buffers Swap: 18481144k total, 0k used, 18481144k free, 1073132k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 24725 oracle 16 0 8413m 27m 22m S 22.6 0.2 0:15.33 oraclePROD1 (LOCAL=NO) 24735 oracle 15 0 8413m 27m 22m S 21.9 0.2 0:16.28 oraclePROD1 (LOCAL=NO) 24723 oracle 15 0 8413m 27m 22m S 19.3 0.2 0:17.44 oraclePROD1 (LOCAL=NO) 24727 oracle 15 0 8411m 25m 20m S 18.9 0.2 0:16.22 oraclePROD1 (LOCAL=NO) 24729 oracle 15 0 8413m 27m 22m S 18.6 0.2 0:17.17 oraclePROD1 (LOCAL=NO) 24733 oracle 15 0 8413m 27m 22m S 17.9 0.2 0:15.29 oraclePROD1 (LOCAL=NO) 24737 oracle 15 0 8413m 27m 22m S 17.6 0.2 0:14.97 oraclePROD1 (LOCAL=NO) 24731 oracle 15 0 8413m 27m 22m S 15.9 0.2 0:15.63 oraclePROD1 (LOCAL=NO) 24743 oracle 15 0 8411m 25m 20m S 13.0 0.2 0:12.59 oraclePROD1 (LOCAL=NO) 20546 oracle 16 0 8424m 38m 19m D 7.3 0.2 0:09.99 ora_dbw0_PROD1 20548 oracle 15 0 8427m 41m 37m S 6.3 0.3 0:14.29 ora_lgwr_PROD1 20532 oracle −2 0 8415m 31m 18m S 5.0 0.2 0:18.90 ora_lms0_PROD1 20536 oracle −2 0 8415m 31m 18m S 4.3 0.2 0:19.49 ora_lms1_PROD1
The top display is divided into two main sections. Within the top-level section, the most important information in monitoring an Oracle RAC node is the load average, CPU states, and memory and swap space. The load average shows the average number of processes in the queue waiting to be allocated CPU time over the previous 1, 5, and 15 minutes. During normal operations, the load averages should be maintained at low values. If these values consistently exceed the processor core count of the server, this is an indication that the system load is exceeding capacity. When this is the case, there is the potential that the GCS background processes (LMSn) could become starved of CPU time, resulting in a detrimental effect on the overall performance of the cluster.
The CPU states show the level of utilization for all of the CPUs installed on the system. The oracle user workload will be shown as user time; however, there will be additional levels of system time and iowait time related to Oracle activity. A high level of iowait time may indicate that you should investigate the disk performance, because the CPUs are spending the majority of their time simply waiting for I/O requests to be processed. An overall indicator of CPU capacity is the idle value, which shows the spare capacity on the system. A consistently low idle time in conjunction with a high load average provides additional evidence that the workload exceeds the ability of the system to process it.
The memory-related section displays information that bears a close resemblance to the output of free.
The bottom-level section includes statistics related to the processes running on the system. You can use this section to pinpoint which processes are using most of the CPU and memory on the system. In terms of memory, as well as the total percentage utilization on the system, the VIRT field shows how much memory an individual process has allocated, and the RES field (the resident set size) shows how much memory the process is using at the current time. For Oracle processes, these values should ordinarily be at similar levels.
From the example top output, we can see that the system is processing Oracle activity but is not under excessive workload at the present time.
top is an interactive tool that accepts single-letter commands to tailor the display. For example, you may use the u option to view processes solely for the oracle user by typing u and then oracle, or you may sort tasks by age by typing A. Typing c also lets you display the full command line of each process, which is useful in identifying the Oracle processes utilizing the highest levels of CPU. You should remember not to neglect monitoring system process tasks. For example, observing the kswapd process in top output on a regular basis would indicate a potential performance impact from utilizing swap space.
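top can also be run non-interactively; for example, the following representative command takes a single batch-mode snapshot of the processes owned by the oracle user:

[root@london1 ~]# top -b -n 1 -u oracle | head -20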
An important aspect of top is that, in addition to displaying information, you may also interact with the processes themselves, such as altering their relative priorities or killing them altogether. Therefore, the Help screen accessed by ? is useful for familiarizing yourself with the capabilities of the tool. You can terminate top by pressing the q key or Ctrl+C.
As its name suggests, the vmstat utility focuses on providing output about the usage of virtual memory. When called without arguments, vmstat will output information related to virtual memory utilization since the system was last booted. Therefore, you are most likely to call vmstat with two numerical arguments for the delay between sampling periods and the number of sampling periods in total. If you specify just one numerical argument, this will apply to the delay, and the sampling will continue until the command is canceled with Ctrl+C. For example, the following will produce ten lines of output at three-second intervals:
[root@london1 ˜]# vmstat 3 10 procs ----------memory-------- ---swap-- -----io---- --system-- -----cpu------ r b swpd free buff cache si so bi bo in cs us sy id wa st 2 7 0 3498932 162972 1079928 0 0 18 23 66 59 0 0 99 0 0 2 7 0 3495972 162972 1080028 0 0 13619 1277 8384 17455 8 3 66 23 0 0 9 0 3499452 162972 1080164 0 0 15457 1821 9120 18800 9 3 64 24 0 2 9 0 3498976 162972 1080240 0 0 16411 2451 9497 19562 11 3 61 25 0 2 7 0 3498712 162972 1080368 0 0 15881 8385 10625 21277 12 3 60 25 0 4 7 0 3498240 162972 1080480 0 0 14400 8495 10734 21287 12 4 58 26 0 1 7 0 3497916 162972 1080620 0 0 12734 16371 10947 21363 13 4 57 26 0 3 8 0 3503008 162972 1080668 0 0 9667 14266 9050 18520 11 3 57 28 0 3 8 0 3503232 162972 1080788 0 0 11739 2818 11426 22608 15 4 54 27 0 3 6 0 3502612 162972 1080848 0 0 10886 9531 11556 22593 16 4 52 28 0
Within the output, the first two fields under procs show processes waiting for CPU runtime and processes that are in an uninterruptible sleep state. A traditional implementation of vmstat on many UNIX systems and earlier Linux versions also showed a w field under the procs section to indicate processes that were swapped out; however, entire processes are not swapped out under Linux, so the w field is no longer included. As with top, the next four fields under the memory section should be familiar from the output of free, showing the amount of swap space in use, as well as the free, buffer, and cached memory. The two fields under the swap section show the amount of memory being swapped in and out of disk per second. On an Oracle system, we would expect these values and the amount of swap in use to show a low or zero value. The fields under io show the blocks sent and received from block devices, and the fields under system show the level of interrupts and context switches. In a RAC environment, the levels of interrupts and context switches can be useful in evaluating the impact on the CPU of servicing network-related activity, such as interconnect traffic or the usage of network attached storage (NAS).
Finally, the cpu section is similar to top in that it displays the user, system, I/O wait, and idle CPU time. The cpu section differs from top by including this information for all CPUs on the system.
In addition to the default output, vmstat also enables the display to be configured with a number of command-line options. For example, -d displays disk statistics, and -p shows the statistics for a particular disk partition specified at the command line. A summary of memory-related values can be given by the -s option.
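For example, the following representative commands display the summary of memory counters and three samples of per-disk statistics at five-second intervals:

[root@london1 ~]# vmstat -s | head
[root@london1 ~]# vmstat -d 5 3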
strace is a tool that can be used for diagnostics when performance monitoring reveals either errors or performance issues with a particular command or process. For example, if you suspect a particular process is not responding, you can use strace to determine the actions that the process is undertaking.
If the strace command is not available on your system, it can be installed as part of the strace RPM package from your install media:
[root@london1 ˜]# yum install strace Loaded plugins: security Setting up Install Process Resolving Dependencies --> Running transaction check ---> Package strace.x86_64 0:4.5.18-5.el5 set to be updated ... Installed: strace.x86_64 0:4.5.18-5.el5 Complete!
As its name implies, strace records and reports the system calls and signals of a process until the process exits. The information captured is either printed to the standard error channel or (more usefully) written to a text file, the name of which is given as an argument to the -o flag. One of the most powerful additional strace options is available with the -e flag, which enables the tracing of particular system calls or groups of system calls, such as those that are network related. You can use strace in one of two ways. First, you can use it to precede a program run at the command line. Second, you can use -p to specify a process to attach to in order to perform the trace. For example, the following snippet shows a trace of the LMS process that is saved to a text file:
[root@london1 ~]# pidof ora_lms0_PROD1
20532
[root@london1 ~]# strace -p 20532 -o lms_strace.txt
Process 20532 attached - interrupt to quit
Process 20532 detached
If you examine the text file, you can observe that, on host london1, the LMS process is using the sendmsg and recvmsg system calls. Also, the process is communicating with the private interconnect address of london2 at 192.168.1.2:
sendmsg(12, {msg_name(16)={sa_family=AF_INET, sin_port=htons(42297), sin_addr=inet_addr("192.168.1.2")}, msg_iov(3)= [{"4321273p MRON 3 206X353f "..., 76}, {"1?275E377177 X 2 210@275E377177 10k,K", 28}, {" v 3032333426 v25517 :-221 " ..., 88}], msg_controllen=0, msg_flags=0}, 0) = 192 times({tms_utime=12192, tms_stime=6618, tms_cutime=0, tms_cstime=0})
= 436799560 getrusage(RUSAGE_SELF, {ru_utime={121, 924464}, ru_stime={66, 187937}, ...}) = 0 times({tms_utime=12192, tms_stime=6618, tms_cutime=0, tms_cstime=0}) = 436799560 times({tms_utime=12192, tms_stime=6618, tms_cutime=0, tms_cstime=0}) = 436799560 poll([{fd=16, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=12, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=20, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=19, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 4, 30) = 1 ([{fd=20, revents=POLLIN|POLLRDNORM}]) recvmsg(20, {msg_name(16)={sa_family=AF_INET, sin_port=htons(19552), sin_addr=inet_addr("192.168.1.2")},
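To focus on network activity alone, the -e flag can restrict the trace to network-related system calls, and the -c flag produces a summary count of calls instead of printing each one. The following is an illustrative invocation against the same LMS process:

[root@london1 ~]# strace -c -e trace=network -p 20532 -o lms_network.txt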
You can use the netstat tool to display information related to the networking configuration and performance of your system, from routing tables to interface statistics and open ports. By default, netstat displays a list of all open sockets on the system. However, a wide variety of command-line options can be given to vary the details shown.
One form of output that you can produce with netstat relies on the -i argument to display interface statistics. This output shows the statistics for a typical RAC configuration:
[root@london1 ˜]# netstat -i Kernel Interface table Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg eth0 1500 0 137371 0 0 0 582037 0 0 0 BMRU eth0:1 1500 0 - no statistics available - BMRU eth0:2 1500 0 - no statistics available - BMRU eth1 1500 0 5858628 0 0 0 5290923 0 0 0 BMRU lo 16436 0 991251 0 0 0 991251 0 0 0 LRU
In addition to the interface details, this command also provides information on the number of packets transmitted and received. You can also combine netstat with the ifconfig command to show errors and dropped packets. The following two examples for eth0 and eth0:1 confirm that, as a VIP address, eth0:1 shares the same hardware configuration as eth0. Therefore, in the netstat example, statistics are not duplicated for the interfaces used for the VIP configuration:
[root@london1 ˜]# ifconfig eth0 eth0 Link encap:Ethernet HWaddr 00:04:23:DC:29:50 inet addr:172.17.1.101 Bcast:172.17.255.255 Mask:255.255.0.0 inet6 addr: fe80::204:23ff:fedc:2950/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:137503 errors:0 dropped:0 overruns:0 frame:0 TX packets:627169 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:34943895 (33.3 MiB) TX bytes:220187041 (209.9 MiB) Memory:b8820000-b8840000 [root@london1 ˜]# ifconfig eth0:1
eth0:1 Link encap:Ethernet HWaddr 00:04:23:DC:29:50 inet addr:172.17.1.209 Bcast:172.17.255.255 Mask:255.255.0.0 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 Memory:b8820000-b8840000
The preceding information can assist you in diagnosing issues that you may suspect are resulting in poor network performance due to hardware errors. You can also observe continually updated values using the -c argument. Most importantly, you should see values in the RX-OK and TX-OK fields increasing on all interfaces as network traffic is communicated, with zero to low numbers in all of the other fields. In particular, increasing values in the RX-ERR and TX-ERR fields are an indication of a possible fault that requires further investigation.
For additional diagnostic information, you can run netstat with the -s argument to produce a summary report on statistics for all protocols configured on the system. For Cache Fusion traffic on Linux, you should pay particular attention to the UDP protocol-related information on the packets sent and received, as well as whether packet-receive errors are evident.
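For example, combining -s with -u restricts the summary to the UDP statistics that are of most interest for interconnect traffic; the counters in the output will vary:

[root@london1 ~]# netstat -su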
The default output of netstat does not include listening sockets; these can be shown with the -l option. However, you will more likely prefer to display all established and listening socket-related information at the same time. You can accomplish this with the -a argument. The output of netstat -a can be somewhat lengthy; in particular, all information under the section Active Unix domain sockets relates to interprocess communication on the local host, and it is not network related. To restrict the output to network activity, you may also provide the additional --inet argument, as in this example:
[root@london1 ˜]# netstat --inet -a | more Active Internet connections (servers and established) Proto Recv-Q Send-Q Local Address Foreign Address State tcp 0 0 localhost.locald:bootserver *:* LISTEN tcp 0 0 localhost.localdomain:2208 *:* LISTEN tcp 0 0 *:cypress-stat *:* LISTEN tcp 0 0 192.168.1.1:59585 *:* LISTEN tcp 0 0 192.168.1.1:62018 *:* LISTEN tcp 0 0 london1.example.com:49795 *:* LISTEN tcp 0 0 192.168.1.1:30056 *:* LISTEN tcp 0 0 *:59468 *:* LISTEN tcp 0 0 192.168.1.1:62189 *:* LISTEN tcp 0 0 *:sunrpc *:* LISTEN tcp 0 0 172.17.1.209:ncube-lm *:* LISTEN tcp 0 0 london1.example.co:ncube-lm *:* LISTEN tcp 0 0 172.17.1.208:ncube-lm *:* LISTEN
The --inet argument provides a significantly more readable display and a snapshot of all network-related activity on the system. Within the fields, Proto refers to the protocol, which means we can observe the RAC-related communication established under the UDP protocol. As their names suggest, the Recv-Q and Send-Q fields relate to the receiving and sending queues, so they should almost always be zero. If these values are increasing—and increasing for the UDP protocol in particular—then you have evidence that your interconnect cannot sustain your desired workload. The Local Address field shows your hostname and port number. Similarly, the Foreign Address field shows the address of the host to which you are connecting; it will show *:* until a connection is established. The State field will usually show LISTEN or ESTABLISHED for the TCP protocol; however, UDP is a stateless protocol, so these connections have no state entries. If you also provide the -n argument, no name lookups will be done, and IP addresses for all connections will be displayed.
If a port is defined as a well-known port in the /etc/services file, the port number will be replaced by its name. Referring to /etc/services, you can see that the port number shown as ncube-lm is in fact the standard Oracle listener port number of 1521:
[root@london1 root]# cat /etc/services | grep ncube-lm
ncube-lm        1521/tcp        # nCube License Manager
ncube-lm        1521/udp        # nCube License Manager
If you change this file so it is more meaningful for diagnosing Oracle network services, the output will be reflected the next time you run netstat, without having to restart any of the services. However, it is important to be aware that, strictly speaking, ncube-lm is the correct well-known name for port 1521, as defined at the following location: http://www.iana.org/assignments/port-numbers.
As an alternative to netstat, you can use the ss utility to report socket statistics. For example, the ss -l command displays listening sockets in a manner similar to that observed with netstat previously. Using ss without further arguments lets you rapidly determine the established connections on your system:
[root@london1 ˜]# ss State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 0 127.0.0.1:61876 127.0.0.1:6100 ESTAB 0 0 127.0.0.1:61861 127.0.0.1:6100 ESTAB 0 0 127.0.0.1:61864 127.0.0.1:6100 ESTAB 0 0 172.17.1.208:1521 172.17.1.102:39393 ESTAB 0 0 172.17.1.209:1521 172.17.1.209:11402 ESTAB 0 0 172.17.1.209:25333 172.17.1.209:1521 ESTAB 0 0 127.0.0.1:2016 127.0.0.1:10911 ESTAB 0 0 172.17.1.208:1521 172.17.1.208:16542 ESTAB 0 0 127.0.0.1:6100 127.0.0.1:61876 ESTAB 0 0 172.17.1.101:62822 172.17.1.203:1521 ESTAB 0 0 172.17.1.208:16542 172.17.1.208:1521 ESTAB 0 0 127.0.0.1:6100 127.0.0.1:61861 ESTAB 0 0 127.0.0.1:6100 127.0.0.1:61864 ESTAB 0 0 127.0.0.1:10911 127.0.0.1:2016 ESTAB 0 0 172.17.1.209:11402 172.17.1.209:1521 ESTAB 0 0 172.17.1.209:1521 172.17.1.209:25333
ESTAB 0 0 172.17.1.101:17046 172.17.1.102:11585 ESTAB 0 0 192.168.1.1:35959 192.168.1.2:21965 ESTAB 0 0 192.168.1.1:61659 192.168.1.2:27582
You should consider using the tcpdump command for detailed analysis of network traffic. This command's functionality is similar to that provided by strace for an application communicating with the operating system kernel. tcpdump enables you to capture and display the network packets running on the entire system or a particular interface. The tcpdump command's -D option will display the interfaces available to you, as in this example:
[root@london1 ~]# tcpdump -D
1.eth0
2.eth1
3.any (Pseudo-device that captures on all interfaces)
4.lo
The following example shows the default summary information you see when running tcpdump against the private interconnect interface. Specifically, it shows the packets being transferred across this network:
[root@london1 ˜]# tcpdump -i 2 | more tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on eth1, link-type EN10MB (Ethernet), capture size 96 bytes 11:43:36.045648 IP 192.168.1.2.7016 > 192.168.1.1.sds-admin:UDP, length 192 11:43:36.046217 IP 192.168.1.2.19552 > 192.168.1.1.26976:UDP, length 224 11:43:36.046237 IP 192.168.1.1.sds-admin > 192.168.1.2.19552:UDP, length 192 11:43:36.046279 IP 192.168.1.1.sds-admin > 192.168.1.2.asc-slmd:UDP,length 256 11:43:36.046368 IP 192.168.1.2.7016 > 192.168.1.1.sds-admin:UDP, length 192 11:43:36.047215 IP 192.168.1.2.19552 > 192.168.1.1.26976:UDP, length 224 11:43:36.047231 IP 192.168.1.1.sds-admin > 192.168.1.2.19552:UDP, length 192 11:43:36.047260 IP 192.168.1.1.sds-admin > 192.168.1.2.asc-slmd:UDP,length 256 11:43:36.047413 IP 192.168.1.2.7016 > 192.168.1.1.sds-admin:UDP, length 192 11:43:36.047762 IP 192.168.1.2.11403 > 192.168.1.1.20890:UDP, length 520 11:43:36.047784 IP 192.168.1.1.18929 > 192.168.1.2.11403:UDP, length 192 11:43:36.047863 IP 192.168.1.1.18929 > 192.168.1.2.22580:UDP, length 8328 11:43:36.047865 IP 192.168.1.1 > 192.168.1.2: udp 11:43:36.047867 IP 192.168.1.1 > 192.168.1.2: udp 11:43:36.047868 IP 192.168.1.1 > 192.168.1.2: udp 11:43:36.047869 IP 192.168.1.1 > 192.168.1.2: udp 11:43:36.047870 IP 192.168.1.1 > 192.168.1.2: udp 11:43:36.048689 IP 192.168.1.2.11403 > 192.168.1.1.20890:UDP, length 448 11:43:36.048704 IP 192.168.1.1.18929 > 192.168.1.2.11403:UDP, length 192 11:43:36.048754 IP 192.168.1.1.18929 > 192.168.1.2.22580:UDP, length 8328
Similar to the strace -o option, the -w option can be used to write the data to an output file. Subsequently, the -r option can be used to read from that file, while -A can be used to print the contents of each packet.
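For example, the following representative commands capture interconnect traffic on eth1 to a file and later read it back, printing the contents of each packet; the file name is illustrative:

[root@london1 ~]# tcpdump -i eth1 -w interconnect.pcap
[root@london1 ~]# tcpdump -r interconnect.pcap -A | more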
iostat is the first of a number of utilities we will discuss that are installed with the sysstat RPM package; the other utilities we will discuss are mpstat and sar. The iostat utility also displays information related to CPU utilization, but it focuses on providing detailed I/O statistics. Like vmstat, iostat can be run without any command-line arguments to report statistics for average CPU utilization and disk devices since the most recent boot time. The CPU utilization section contains the same fields we have seen with top and vmstat. The disk statistics show the device name, the number of I/O operations per second, the number of 512-byte blocks read and written per second, and the total number of 512-byte blocks read and written. iostat can also be supplied with one or two numerical arguments to represent the interval between sampling periods and the number of sampling periods in total. You may also specify statistics for a specific device using the -p argument, such as -p sde for device sde. If you wish to view only disk utilization information, you can use the -d option; alternatively, you can use the -c option to view information for the CPU only. The -k option displays disk information in kilobytes, as opposed to blocks. The following example shows the results from running iostat against an individual disk device:
[root@london1 ˜]# iostat -p sde 3 10 Linux 2.6.18-164.el5 (london1.example.com) 02/05/2010 avg-cpu: %user %nice %system %iowait %steal %idle 0.56 0.00 0.24 0.42 0.00 98.77 Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn sde 46.02 376.57 676.68 26774930 48113635 sde1 389.43 0.54 63.60 38074 4522265 avg-cpu: %user %nice %system %iowait %steal %idle 15.52 0.00 2.88 17.72 0.00 63.88 Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn sde 4122.00 20955.67 72453.00 62867 217359 sde1 21653.33 0.00 1863.67 0 5591 avg-cpu: %user %nice %system %iowait %steal %idle 25.88 0.00 4.75 20.79 0.00 48.58 Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn sde 3096.33 9361.67 58477.67 28085 175433 sde1 9752.00 0.00 2734.67 0 8204
When using iostat to observe disk statistics in a RAC environment, you should be keenly aware of the infrastructure that lies between the operating system and the actual disk devices. For example, the levels of abstraction can range from multipathing device drivers and host bus adapters to cache on the storage and a disk RAID configuration. The disk devices you're most interested in are shared between all of the nodes in the cluster, and any useful information that you can derive on any individual node is likely to be limited. Therefore, iostat may prove useful in providing a highly generalized overview of disk activity on the system; however, there is no substitute for using the specialized storage analysis tools provided by the vendor of your chosen storage subsystem.
By default, the mpstat command shows a CPU utilization report similar to that produced by iostat for all statistics since boot time. It also includes an additional field that shows the number of interrupts per second. mpstat also accepts the same number and type of numeric arguments as vmstat and iostat, which it uses to produce output at sampled intervals, as in this example:
[root@london1 ˜]# mpstat 3 10 Linux 2.6.18-164.el5 (london1.example.com) 02/05/2010 11:38:12 AM CPU %user %nice %sys %iowait %irq %soft %steal %idle intr/s 11:38:15 AM all 18.47 0.00 2.54 15.76 0.21 0.76 0.00 62.25 6977.00 11:38:18 AM all 28.65 0.00 3.44 22.62 0.25 1.19 0.00 43.85 9674.75 11:38:21 AM all 22.21 0.00 2.71 20.09 0.17 0.93 0.00 53.88 7495.99 11:38:24 AM all 30.91 0.00 3.78 20.68 0.30 1.44 0.00 42.89 10392.59 11:38:27 AM all 31.62 0.00 3.78 20.37 0.34 1.23 0.00 42.66 10206.08 11:38:30 AM all 12.40 0.00 1.69 19.35 0.21 0.76 0.00 65.58 6710.37 11:38:33 AM all 17.24 0.00 2.33 15.97 0.21 0.76 0.00 63.49 8070.23 11:38:36 AM all 27.96 0.00 3.65 18.33 0.30 1.15 0.00 48.62 8732.89 11:38:39 AM all 17.35 0.00 2.42 14.76 0.17 0.72 0.00 64.59 7200.34 11:38:42 AM all 30.41 0.00 3.73 14.89 0.25 1.19 0.00 49.53 9328.86 Average: all 23.72 0.00 3.01 18.28 0.24 1.01 0.00 53.74 8475.07
By default, mpstat reports CPU statistics averaged across all processors; however, the most significant difference compared to iostat is that mpstat accepts the -P argument in conjunction with either a CPU number starting at 0 or with -P ALL, which displays output for all processors on an individual basis. When analyzing CPU performance with mpstat or other monitoring tools, you need to keep in mind that if you have a system equipped with multicore CPUs, each CPU core will be presented to the monitoring tool as a distinct CPU, even though the cores share some system resources (see Chapter 4 for more details on multicore CPUs). Similarly, Intel's hyper-threaded CPUs present each physical core as two logical CPUs, which enables processes to be scheduled by the Linux operating system simultaneously on the same core. /proc/cpuinfo should be your first reference for the CPU architecture (Chapter 4 explains how to precisely map the CPU architecture to its representation by the operating system). The following example shows an extract for the first processor from /proc/cpuinfo:
[root@london8 ˜]# cat /proc/cpuinfo | more processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 46 model name : Intel(R) Xeon(R) CPU X7560 @ 2.27GHz stepping : 5 cpu MHz : 2261.066 cache size : 24576 KB physical id : 0 siblings : 16 core id : 0 cpu cores : 8 apicid : 0 fpu : yes
fpu_exception : yes cpuid level : 11 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx rdtscp lm constant_tsc ida nonstop_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr popcnt lahf_lm bogomips : 4522.13 clflush size : 64 cache_alignment : 64 address sizes : 44 bits physical, 48 bits virtual power management: [8]
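Relating this topology back to mpstat, the following representative command reports each logical CPU individually over three five-second samples:

[root@london8 ~]# mpstat -P ALL 5 3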
The system activity reporter (sar) is a powerful tool that can encompass virtually all of the performance information generated by the other performance tools discussed in this chapter. In fact, some of the statistics from sar may look familiar to users of Oracle EM because sar underpins most of the host-based performance views. This is why the sysstat package must be installed on managed targets.
As its name suggests, the system activity reporter is the front-end reporting tool. This tool is accompanied by the system activity data collector (sadc). Reports can be generated by sar in an interactive manner or written to a file for longer-term data collection. When you install the sysstat package, it sets sadc to run periodically by configuring the sa1 script from the cron-scheduled script /etc/cron.d/sysstat, as in this example:
[root@london1 root]# cat /etc/cron.d/sysstat
# run system activity accounting tool every 10 minutes
*/10 * * * * root /usr/lib/sa/sa1 1 1
# generate a daily summary of process accounting at 23:53
53 23 * * * root /usr/lib/sa/sa2 -A
By default, this script is run every ten minutes, capturing all system statistics for a one-second period. Next, the script appends the data to the current data file in the /var/log/sa directory, where the file is named sa with a suffix that corresponds to the current date, as in this example:
[root@london1 sa]# ls
sa01  sa02  sa03  sa04  sa05  sar01  sar02  sar03  sar04
At the same location as sa1, you can find the file sa2, which by default runs once per day. sa2 runs sar to generate a full report on all of the data captured during the previous day by sadc.
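You can also query an individual day's data file directly; for example, the following representative command reports the CPU utilization recorded between 9:00 and 10:00 a.m. from the file for the fifth day of the month:

[root@london1 sa]# sar -u -f /var/log/sa/sa05 -s 09:00:00 -e 10:00:00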
A sar report presents system performance data divided into 17 separate sections. Each section contains data related to a specific aspect of system performance; this information is ordered by time throughout a 24-hour period, based on the ten-minute collection interval.
The standard statistics collection is useful for long-term performance monitoring and capacity-planning trending activities; however, the one-second collection period at ten-minute intervals may not be sufficient for pinpointing specific performance issues. For this reason, you can also invoke sar directly to produce performance information for one or more of the specific performance-related areas on the screen.
This interactive usage requires two numerical arguments: one for the interval between sampling periods and one for the number of sampling periods in total. Here sar differs from the statistics commands that we have already seen, such as vmstat: if you specify just one numerical argument, sar will report statistics once for the time interval specified by the argument, and then exit. You may also provide arguments to specify the type of performance information to view. If you do not provide any arguments, by default you will be shown performance information for all CPUs. The following extract shows the output of the CPU performance information for a three-second sampling period, which will be collected ten times:
[root@london1 sa]# sar 3 10 Linux 2.6.18-164.el5 (london1.example.com) 02/05/2010 12:01:36 PM CPU %user %nice %system %iowait %steal %idle 12:01:39 PM all 12.30 0.00 3.31 19.51 0.00 64.89 12:01:42 PM all 14.53 0.00 3.82 22.78 0.00 58.86 12:01:45 PM all 14.98 0.00 3.70 23.53 0.00 57.79 12:01:48 PM all 14.39 0.00 3.91 24.08 0.00 57.62 12:01:51 PM all 13.65 0.00 3.57 23.35 0.00 59.42 12:01:54 PM all 14.40 0.00 3.57 22.81 0.00 59.22 12:01:57 PM all 12.95 0.00 3.86 24.19 0.00 59.00 12:02:00 PM all 15.86 0.00 3.14 20.40 0.00 60.60 12:02:03 PM all 8.93 0.00 0.21 0.72 0.00 90.14 12:02:06 PM all 15.52 0.00 4.00 20.79 0.00 59.69 Average: all 13.75 0.00 3.31 20.21 0.00 62.73
You can view additional or alternative performance information by providing other arguments, such as sar -n for network statistics or sar -b for I/O statistics. The full range of options is detailed in the sar man page. To produce performance information on all sections interactively, you can call sar -A; however, be aware that the output is extensive. In conjunction with sar -A, you may also find the -o option useful in directing the output to a file. The default file location is the same as the regularly sampled sar data. Therefore, we recommend that you specify a file name for detailed sar performance analysis work. For example, the following command collects all sar statistics at three-second intervals for a five-minute period into the london1.sa file:
[root@london1 root]# sar -A -o london1.sa 3 100
The file generated is in the sar binary format. This means sar itself is needed to read the results file at a later point, which can be accomplished using the -f option:
[root@london1 root]# sar -f london1.sa
As you would expect, the -f option cannot be combined with the -o option; however, it accepts the same command-line arguments as when sar is called in an interactive manner. Thus, to display all of the information collected in the file, you specify the -A option:
[root@london1 root]# sar -A -f london1.sa
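Individual report sections can also be selected when reading from a file; for example, the following representative command displays only the network device statistics from the capture:

[root@london1 root]# sar -n DEV -f london1.sa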
The text-based sar output provides you with all of the recorded performance information you require. However, simply browsing sar -A output may prove difficult when attempting to diagnose any system performance issues that have occurred.
Fortunately, there are a number of tools available for graphing the output from sar. For example, the Interactive System Activity Grapher (isag) utility is available for graphing the data recorded in sar files. isag is no longer included automatically with the sysstat RPM package, primarily due to its additional dependence on the gnuplot package. However, you can easily download and install the latest versions of isag and gnuplot to view your sar statistics, and isag is still included with current versions of the sysstat source code.
Alternatively, the kSar tool can be used to graph captured sar information; you can download this tool at http://ksar.atomique.net/. To use kSar, unzip the downloaded zip file, change to the extracted kSar directory, and run the tool with the following command:
[root@london1 kSar-5.0.6]# java -jar kSar.jar
In the graphical interface, click the Data menu option and select Run local command to display the dialog window shown in Figure 13-1.
Next, specify the command to extract the sar data from your captured file, as in this example:
sar -A -f /var/log/sa/london1.sa
If the extraction is successful, after a short period of time kSar will report that the data import is finished. It will also display summary information, as well as potential system bottlenecks. The example shown in Figure 13-2 reports that CPU utilization is more than 25%, which makes it worth investigating further.
At this point, you can select from the options in the Menu tab on the left of the screen to display information for particular areas. Subjects you can drill down on include I/O, interface traffic by interface, and CPU utilization. Figure 13-3 shows an example summary of CPU utilization across the capture period.
The Oracle Cluster Health Monitor is an Oracle-provided tool for monitoring resource utilization on a cluster-wide basis. The Oracle Cluster Health Monitor runs in one of two modes: in the first, it observes the system in real time; in the second, it collects data into a Berkeley DB repository on a node-by-node basis, enabling the review of data collected over time. This data can be used to pinpoint the causes of performance issues.
The Oracle Cluster Health Monitor for Linux can be downloaded from the Oracle Technology Network web site at the following location: www.oracle.com/technology/products/database/clustering/ipd_download_homepage.html. The resulting download is called crfpack-linux.zip. To install the tool, begin by creating a dedicated user for the tool on all nodes in the cluster. The following example illustrates how to create crfuser on the london1 node:
[root@london1 ~]# useradd -g oinstall crfuser
[root@london1 ~]# passwd crfuser
Changing password for user crfuser.
New UNIX password:
Retype new UNIX password:
passwd: all authentication tokens updated successfully.
Once the user has been created on all nodes, secure shell (ssh) must be configured so that no password prompts or warnings are received when connecting between hosts. This can be performed with the same manual configuration steps required for configuring ssh for the oracle user (see Chapter 6 for information on the steps required to do this). Once you have tested ssh connectivity between nodes, copy and unzip the file crfpack-linux.zip into the /home/crfuser/ directory, designating ownership by the crfuser:
[crfuser@london1 ~]$ unzip crfpack-linux.zip
[crfuser@london1 ~]$ ls
admin  bin  crfpack-linux.zip  install  jdk  jlib  lib  log  mesg
Before installing the software, it is necessary to have a non-root file system available on which you can create the Berkeley DB database. If you opt for a default file system configuration (as we recommend), then you will not have a non-root file system available. Therefore, you can either mount a directory from an external source such as iSCSI; or, if you are using ASM, you can create an ACFS file system (see Chapter 9 for more information on how to do this). In the example shown in Figure 13-4, a 10GB ACFS file system is created for a two-node cluster, allocating 5GB of storage per node for that two-node cluster.
You must then mount the ACFS file system on all nodes in the cluster:
[root@london1 ~]# /sbin/mount.acfs -o all
[root@london1 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      886G   19G  822G   3% /
/dev/sda1              99M   13M   82M  14% /boot
tmpfs                 7.9G  192M  7.7G   3% /dev/shm
/dev/asm/crfdb-61      10G   85M   10G   1% /u01/app/oracle/acfsmounts/data_crfdb
Next, you need to make a separate directory for the Berkeley DB Database for all nodes in the cluster. Note that, although the file system is shared between the nodes, the Berkeley DB Database is not cluster-aware, so it cannot be shared between nodes. The following example creates the directory for the first node only:
[crfuser@london1 install]$ mkdir /u01/app/oracle/acfsmounts/data_crfdb/oracrfdb1
On the first node, run the installation script from the install directory as the crfuser. Do this in conjunction with the -i option, specifying the nodes to be installed, the location of the Berkeley DB database, and the name of the master node, as shown here:
[crfuser@london1 install]$ ./crfinst.pl -i london1,london2 -b /u01/app/oracle/acfsmounts/data_crfdb/oracrfdb1 -m london1
Performing checks on nodes: "london1 london2" ...
Assigning london2 as replica
Generating cluster wide configuration file...
Creating a bundle for remote nodes...
Installing on nodes "london2 london1" ...
Configuration complete on nodes "london2 london1" ...
Once the initial installation has completed on the first node, you can finish the installation by rerunning the install script as the root user with the -f option on all nodes, including the first node in the cluster, where you specify the Berkeley DB directory for each node. If the Berkeley DB directory is local, then it can have the same name on each node. In this example, which uses ACFS, the directory name is distinct per node. For example, the installation is completed as follows on node 1:
[root@london1 install]# ./crfinst.pl -f -b /u01/app/oracle/acfsmounts/data_crfdb/oracrfdb1
Removing contents of BDB Directory /u01/app/oracle/acfsmounts/data_crfdb/oracrfdb1
Installation completed successfully at /usr/lib/oracrf...
Similarly, the installation is completed as follows on node 2:
[root@london2 install]# ./crfinst.pl -f -b /u01/app/oracle/acfsmounts/data_crfdb/oracrfdb2/
Removing contents of BDB Directory /u01/app/oracle/acfsmounts/data_crfdb/oracrfdb2/
Installation completed successfully at /usr/lib/oracrf...
A log of installation activity is maintained in the crfinst.log file in the crfuser home directory.
After installation is complete, the Oracle Cluster Health Monitor can be started on all nodes with the /etc/init.d/init.crfd script, as in this example:
[root@london2 init.d]# ./init.crfd enable
You can verify a successful startup by issuing the command again with the status argument:
[root@london1 init.d]# ./init.crfd status
OSysmond running with PID=3571.
OLoggerd running with PID=3623.
oproxyd running with PID=3626.
To stop the Oracle Cluster Health Monitor from running, use init.crfd with the disable argument. disable is preferable to the stop argument in this case because the daemons will be restarted when stop is used.
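For example, to stop the daemons on a node, run the script with the disable argument just described:
[root@london1 init.d]# ./init.crfd disable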
The Oracle Cluster Health Monitor starts three daemon processes: osysmond, ologgerd, and oproxyd. osysmond collects the monitoring data from the local system, while ologgerd receives the data from all nodes and populates the Berkeley DB database. Thus, ologgerd is only active on the master node, with another node acting as a standby. You may also observe that the Berkeley DB database directories are now populated with data:
[root@london1 data_crfdb]# ls *
lost+found:

oracrfdb1:
crfalert.bdb  crfcpu.bdb     crfts.bdb  __db.003  __db.006
crfclust.bdb  crfhosts.bdb   __db.001   __db.004  log.0000000001
crfconn.bdb   crfloclts.bdb  __db.002   __db.005  london1.ldb

oracrfdb2:
crfalert.bdb  crfloclts.bdb  __db.002  __db.006
crfclust.bdb  crfrep.bdb     __db.003  log.0000000001
crfcpu.bdb    crfts.bdb      __db.004  london2.ldb
crfhosts.bdb  __db.001       __db.005  repdhosts.bdb
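If you want to confirm which node is currently hosting the master ologgerd, the daemons can be inspected with the standard ps command covered earlier in this chapter; this is simply a suggested check, not a facility of the Oracle Cluster Health Monitor itself:
[root@london1 ˜]# ps -ef | grep -E 'osysmond|ologgerd|oproxyd' | grep -v grep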
In addition to the server-side installation, it is also possible to use the same installation software to install a client-side graphical interface. Oracle recommends installing this graphical interface on a node separate from the cluster. On the server, the daemon process oproxyd listens for network connections, such as connections from this interface. The client installation should be performed as the root user and not the crfuser, as in this example:
[root@london5 install]# ./crfinst.pl -g
Installation completed sucessfully at /usr/lib/oracrf...
The installation locates the files in the /usr/lib/oracrf directory, and the graphical client can be run from within the bin directory by specifying the cluster node to connect to:
[root@london5 bin]# ./crfgui -m london1
Cluster Health Analyzer V1.10
Look for Loggerd via node london1
...Connected to Loggerd on london1
Note: Node london1 is now up
Cluster 'MyCluster', 2 nodes. Ext time=2010-02-08 12:30:57
Making Window: IPD Cluster Monitor V1.10 on nehep1, Logger V1.04.20091223, Cluster "MyCluster" (View 0), Refresh rate: 1 sec
By default, the client-side graphical tool runs in real-time mode, displaying the current activity across the cluster (see Figure 13-5).
There is also the option to replay captured data from a previous period of time. For example, the following command replays the data from the previous five minutes:
[root@london5 bin]# ./crfgui -d "00:05:00" -m london1
In addition to the graphical tool, there is a command-line tool called oclumon that can be used either in real time or to browse historical data. oclumon can be called either from the server or from the client environment. By default, oclumon reports data in real time for the local node, as in this example:
[root@london1 ˜]# oclumon dumpnodeview
----------------------------------------
Node: london1 Clock: '02-09-10 12.08.01 UTC' SerialNo:77669
----------------------------------------
SYSTEM:
#cpus: 8 cpu: 8.66 cpuq: 3 physmemfree: 1506256 mcache: 3318060 swapfree: 17141300
ior: 12474 iow: 882 ios: 994 netr: 785.5 netw: 862.11 procs: 291 rtprocs: 26
#fds: 3752 #sysfdlimit: 6553600 #disks: 5 #nics: 3 nicErrors: 0

TOP CONSUMERS:
topcpu: 'oraclePROD1(8943) 7.75' topprivmem: 'ocssd.bin(4615) 224436'
topshm: 'ora_mman_PROD1(26453) 462752' topfd: 'ocssd.bin(4615) 95'
topthread: 'crsd.bin(4822) 54'
With the -allnodes option, oclumon will report data from all of the nodes in the cluster. Additionally, you can use the -v option for verbose output. oclumon can also query historical data. For example, the following command will report verbose output for all nodes for the previous five minutes:
[root@london1 ˜]# oclumon dumpnodeview -v -allnodes -last "00:05:00"
If oclumon is entered with no arguments, it returns a query prompt from which commands can be entered interactively; a sketch of such a session follows below. From this, you can conclude that the strength of the Oracle Cluster Health Monitor lies in two things. First, it can review historical data to diagnose issues such as node evictions that may have occurred at a previous point in time. Second, it can provide a central location for recording performance monitoring data for all nodes in the cluster simultaneously.
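As an illustration of the interactive mode mentioned above, a session might look like the following; the exact prompt and the commands it accepts can vary by version, so treat this only as a sketch:
[root@london1 ˜]# oclumon
query> dumpnodeview -n london2 -last "00:01:00"
...
query> quit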
OSWatcher is not really a performance monitoring tool in its own right. Rather, it is a framework for capturing, storing, and analyzing data generated by a number of the standard command-line performance monitoring tools that we have previously covered in this chapter. OSWatcher also includes a utility called OSWg that graphs the captured data. As such, it offers similar functionality to the combination of sar and kSar. OSWatcher has been developed by Oracle, and it can be downloaded from the My Oracle Support web site as a .tar archive.
Follow these steps to install OSWatcher. Begin by extracting the archive into a directory owned by a user with permissions to run the standard command-line performance monitoring tools, such as the oracle user:
[oracle@london1 ˜]$ tar xvf osw3b.tar
./
./osw/
./osw/Exampleprivate.net
./osw/OSWatcher.sh
./osw/OSWatcherFM.sh
./osw/OSWgREADME.txt
./osw/README.txt
...
Next, start OSWatcher with the startOSW.sh script. This script can take two arguments: the first specifies the snapshot interval, which determines how regularly the command-line tools are run to gather data; the second specifies the number of hours of data to collect. If no arguments are given, the default values are used. The following example collects information using the default values:
[oracle@london1 osw]$ ./startOSW.sh
[oracle@london1 osw]$
Info...You did not enter a value for snapshotInterval.
Info...Using default value = 30
Info...You did not enter a value for archiveInterval.
Info...Using default value = 48
Testing for discovery of OS Utilities...
VMSTAT found on your system.
IOSTAT found on your system.
MPSTAT found on your system.
NETSTAT found on your system.
TOP found on your system.
Discovery completed.
Starting OSWatcher v3.0 on Tue Feb 9 15:25:49 GMT 2010
With SnapshotInterval = 30
With ArchiveInterval = 48
OSWatcher - Written by Carl Davis, Center of Expertise, Oracle Corporation
Starting Data Collection...
osw heartbeat:Tue Feb 9 15:25:49 GMT 2010
osw heartbeat:Tue Feb 9 15:26:19 GMT 2010
osw heartbeat:Tue Feb 9 15:26:49 GMT 2010
osw heartbeat:Tue Feb 9 15:27:19 GMT 2010
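To override the defaults, supply the two arguments directly. For example, the following sketch takes a snapshot every 60 seconds and collects 24 hours of data:
[oracle@london1 osw]$ ./startOSW.sh 60 24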
To stop the data collection, run the stopOSW.sh script as the same user. At this point, the captured data can be browsed manually within the archive directory, as in this example:
[oracle@london1 archive]$ ls *
oswiostat:
london1.example.com_iostat_10.02.09.1500.dat

oswmeminfo:
london1.example.com_meminfo_10.02.09.1500.dat
...
Alternatively, the extracted files include a Java utility for viewing the data in graphical form. The Java executable must first be available in the user's PATH environment variable; the utility is then run by specifying the archive directory that contains the collected data:
[oracle@london1 osw]$ export PATH=$ORACLE_HOME/jdk/bin:$PATH
[oracle@london1 osw]$ java -jar oswg.jar -i archive
Starting OSWg V3.0.0
OSWatcher Graph Written by Oracle Center of Expertise
Copyright (c) 2008 by Oracle Corporation
Parsing Data. Please Wait...
Parsing file london1.example.com_iostat_10.02.09.1500.dat ...
Parsing file london1.example.com_vmstat_10.02.09.1500.dat ...
Parsing Completed.
When the parsing of data is complete, OSWg presents a number of options in the terminal window that you can use to choose the graph to display:
Enter 1 to Display CPU Process Queue Graphs
Enter 2 to Display CPU Utilization Graphs
Enter 3 to Display CPU Other Graphs
Enter 4 to Display Memory Graphs
Enter 5 to Display Disk IO Graphs
Enter 6 to Generate All CPU Gif Files
Enter 7 to Generate All Memory Gif Files
Enter 8 to Generate All Disk Gif Files
Enter L to Specify Alternate Location of Gif Directory
Enter T to Specify Different Time Scale
Enter D to Return to Default Time Scale
Enter R to Remove Currently Displayed Graphs
Enter P to Generate A Profile
Enter Q to Quit Program

Please Select an Option:
Selecting an option displays a graph in an individual window, as shown in Figure 13-6.
If you choose to run OSWatcher on a regular basis, it is important to be aware that, by default, the tool will not restart after the system reboots. Therefore, Oracle also provides an RPM package, downloadable from My Oracle Support, that installs a service to start OSWatcher when the system boots.
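If you prefer not to install the RPM, one simple alternative is an @reboot entry in the oracle user's crontab; this is only a sketch, and it assumes OSWatcher has been extracted to /home/oracle/osw and should run with the default intervals:
@reboot cd /home/oracle/osw && ./startOSW.sh 30 48 > /dev/null 2>&1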
In contrast to the Oracle Cluster Health Monitor and OSWatcher, nmon is a performance monitoring tool developed by IBM, initially for AIX-based systems but since extended to Linux environments. nmon was released as open source, and it is available for download from http://nmon.sourceforge.net. Available downloads include precompiled binaries; the standard x86 32-bit binary for Red Hat systems, once downloaded and given executable permissions, will run against the standard Oracle Validated RPM installation in both x86 and x86-64 environments.
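For example, assuming the downloaded binary is named nmon_x86_rhel5 (the file name varies by release and platform, so adjust accordingly), it only needs to be made executable before it is run:
[oracle@london1 ˜]$ chmod +x nmon_x86_rhel5
[oracle@london1 ˜]$ ./nmon_x86_rhel5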
By default, nmon runs in interactive mode; pressing the h key displays the information available (you can see nmon's menu in Figure 13-7). Figure 13-8 shows how nmon can provide detailed output for the processor, memory, network, and storage, displaying multiple sections simultaneously.
For example, pressing n displays information related to network traffic, thus enabling the RAC DBA to observe the utilization levels of the cluster interconnect.
Additionally, nmon enables the capture of information over periods of time. Later, you can use this information to graph the data with spreadsheet tools. Thus, nmon is a useful open source tool that is easy to install, while also providing comprehensive coverage that complements the standard Linux- and Oracle-provided performance monitoring tools.
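To illustrate the capture mode just mentioned, nmon is typically invoked with the -f, -s, and -c flags. The following sketch, which assumes the binary has been renamed nmon and placed on the PATH, records a snapshot every 30 seconds, 120 times, to a .nmon file in the current directory that can then be loaded into a spreadsheet:
[oracle@london1 ˜]$ nmon -f -s 30 -c 120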
In this chapter, we described some of the tools and techniques available to you for monitoring the performance of your RAC cluster at the Linux level. We also detailed the most common Linux command-line and graphical tools, which you can use to confirm your earlier findings gleaned from using the Oracle-specific performance monitoring tools described in Chapter 12.