Chapter 12. Oracle Performance Monitoring

In this chapter, we will look at using the tools available for monitoring the performance of an Oracle 11g RAC cluster at the Oracle RAC level. In the next chapter, we'll look at performance monitoring under Linux. At an Oracle Database level, we concentrate on the tools and features provided under the framework of the 11g Manageability Infrastructure. These tools and features range from Oracle Enterprise Manager Database Control to the 11g Automatic Diagnostic Repository (ADR).

Within an Oracle 11g database, the Manageability Infrastructure is a collection of database features that enable you to realize self-tuning and self-management functionality directly within the Oracle database itself. The primary aspect of the Manageability Infrastructure is the automated gathering and processing of base statistics related to database operations. A key enabling feature of the Manageability Infrastructure is the background process called the Manageability Monitor (MMON). This process performs most of the manageability-related tasks on the database, a central part of which is to capture and store database statistics. The MMON process captures base statistics every 60 seconds. It also accesses the SGA directly, so it works efficiently and with minimal overhead. These statistics are compiled and initially stored in memory to be accessed by dynamic performance views. Subsequently, they are flushed to disk in the form of snapshots. Both the in-memory and on-disk collections of statistics are referred to as the Automatic Workload Repository (AWR). A number of performance views with the prefix DBA_HIST_ can be used to access the data; DBA_HIST_SNAPSHOT, for example, shows information regarding all of the snapshots that have been taken.

The MMON process is also assisted by the Manageability Monitor Light (MMNL) process, which is used primarily for performing tasks related to the Active Session History (ASH). The ASH contains session statistics sampled at one-second intervals to record what the sessions are waiting for. The information is stored in memory, where it can be accessed in the view V$ACTIVE_SESSION_HISTORY. The in-memory store is implemented as a rolling buffer, so older information is overwritten as newer information is added. To preserve a history, the MMNL process also writes this information to disk every 30 minutes. By default, seven days of ASH information is retained on disk, and it can be seen in the view DBA_HIST_ACTIVE_SESS_HISTORY.
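
As a quick illustration, the following query is a minimal sketch (using the documented V$ACTIVE_SESSION_HISTORY columns SAMPLE_TIME, SESSION_STATE, and EVENT) that counts the in-memory samples from the last five minutes by session state and wait event:

SQL> SELECT session_state, event, COUNT(*) AS samples
> FROM v$active_session_history
> WHERE sample_time > SYSTIMESTAMP - INTERVAL '5' MINUTE
> GROUP BY session_state, event
> ORDER BY samples DESC;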

The MMON process also computes and stores metric values based on the statistics themselves, and these can be seen directly in the views V$SYSMETRIC and V$SERVICEMETRIC. For example, V$SYSMETRIC contains information such as the calculated buffer cache hit ratio and user transactions per second.
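
For example, the following sketch queries two of these metrics from SQL*Plus; the metric names are as we have observed them in V$SYSMETRIC, and GROUP_ID 2 is assumed to be the standard 60-second metric group (check V$METRICNAME on your system to confirm both):

SQL> SELECT metric_name, value, metric_unit
> FROM v$sysmetric
> WHERE metric_name IN ('Buffer Cache Hit Ratio', 'User Transaction Per Sec')
> AND group_id = 2;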

Given the features enabled by the Manageability Infrastructure, our focus for performance monitoring in an Oracle 11g RAC Database environment is on the tools that take advantage of this infrastructure. Therefore, we will cover performance monitoring using the standard Oracle graphical interface environment called Oracle Enterprise Manager Database Control. We will also review captured performance data with AWR reports and ASH, and then look at using the Automatic Database Diagnostic Monitor (ADDM) and analyzing the performance of individual statements using AWR SQL reports. Next, we will cover performance monitoring using SQL*Plus. Finally, we will look at some of the performance monitoring features and fault diagnosis available with the Automatic Diagnostic Repository (ADR).

Enterprise Manager Database Control

Oracle Enterprise Manager 11g Database Control is supplied and installed with the Oracle Database software, so it is available for all Oracle 11g RAC environments. Database Control is installed and configured wholly within the cluster to manage that cluster only, which distinguishes it from the separate Oracle Enterprise Manager 11g Grid Control that is installed and configured on a standalone management server. Grid Control, by contrast, can be used to manage multiple Oracle and non-Oracle software environments.

If you are creating a new Oracle 11g database, and you wish to use Database Control, then we recommend that you install and configure it using DBCA. Provided that no Grid Control Management Agent is previously installed, DBCA will provide you with the option to automatically install, configure, and start all of the components you require, including a Management Repository schema owned by the user SYSMAN within the SYSAUX tablespace of the target database, a local Management Service (including an HTTP server), and Management Agents on all nodes.

EM Database Control uses the EM Agents installed on the nodes and associated Management Services to communicate with the database, instances, and other processes. You can use the emctl command to check whether these are currently running, assuming you have first set the ORACLE_UNQNAME environment variable to your database unique name (for example, export ORACLE_UNQNAME=PROD). The following example shows that EM is currently not running:

[oracle@london1 ˜]$ emctl status dbconsole
Oracle Enterprise Manager 11g Database Control Release 11.2.0.1.0
Copyright (c) 1996, 2009 Oracle Corporation.  All rights reserved.
https://london1.example.com:1158/em/console/aboutApplication
Oracle Enterprise Manager 11g is not running.
------------------------------------------------------------------
Logs are generated in directory
/u01/app/oracle/product/11.2.0/dbhome_1/london1_PROD/sysman/log

If the Management Agent is not running, then you can start it, as follows:

[oracle@london1 ˜]$ emctl start dbconsole
Oracle Enterprise Manager 11g Database Control Release 11.2.0.1.0
Copyright (c) 1996, 2009 Oracle Corporation.  All rights reserved.
https://london1.example.com:1158/em/console/aboutApplication
Starting Oracle Enterprise Manager 11g Database Control .......... started.
------------------------------------------------------------------
Logs are generated in directory
/u01/app/oracle/product/11.2.0/dbhome_1/london1_PROD/sysman/log

The Management Agent can be stopped again using the stop argument.
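
For example, the following command, run on the node in question, stops the Database Control components that were started above:

[oracle@london1 ˜]$ emctl stop dbconsole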

By default, DBCA configures Database Control on each node in the cluster. However, after installation, the Database Control Console is only started on the node where you ran DBCA. On the remaining nodes, only the Management Agent component is started, and these nodes upload their data to the Database Control Console node.

If you are upgrading an existing Oracle database, have created a database without using DBCA, or wish to reconfigure your Database Control setup, you can do so using the emca utility.

A wide range of configuration tasks can be performed with emca. The most straightforward method to install Database Control is to call the command emca, specifying -config dbcontrol db, while also ensuring that you use the -cluster argument.
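
For example, a minimal invocation looks like the following; emca then typically prompts interactively for details such as the database unique name, the listener port, and the SYS, DBSNMP, and SYSMAN passwords (add -repos create if the SYSMAN repository has not yet been created):

[oracle@london1 ˜]$ emca -config dbcontrol db -cluster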

As with configuring Database Control through DBCA, the Database Control Console will run only on the node where you ran emca. Subsequently, calling emctl start dbconsole and emctl stop dbconsole will start and stop the Management Service on this node only; on the remaining nodes in the cluster, the same commands will start and stop only the Management Agent.

If the Database Control Console could run on only one node, it would clearly not be a high-availability configuration. Therefore, you can reconfigure the Console to start on a different node (or on more than one if you desire) by using the command emca -reconfig dbcontrol -cluster. By default, running this command without additional arguments starts the Console on the node that you ran the command from, and it directs the Management Agents on all other nodes in the cluster to upload their data to this Console. You can customize the process by providing the arguments EM_NODE and EM_SID_LIST to start the Console on the list of nodes specified and instruct the Management Agents listed to upload data to these nodes. You can view the configuration at any time using emca -displayConfig dbcontrol -cluster, as in this example:

[oracle@london1 ˜]$ emca -displayConfig dbcontrol -cluster

STARTED EMCA at Feb 15, 2010 5:56:27 PM
EM Configuration Assistant, Version 11.2.0.0.2 Production
Copyright (c) 2003, 2005, Oracle.  All rights reserved.

Enter the following information:
Database unique name: PROD
Service name: PROD1
Do you wish to continue? [yes(Y)/no(N)]: Y
Feb 15, 2010 5:56:42 PM oracle.sysman.emcp.EMConfig perform
INFO: This operation is being logged at /u01/app/oracle/cfgtoollogs/emca/PROD/emca_2010_02_15_17_56_26.log.
Feb 15, 2010 5:56:44 PM oracle.sysman.emcp.EMDBPostConfig
showClusterDBCAgentMessage
INFO:
****************  Current Configuration  ****************
 INSTANCE            NODE           DBCONTROL_UPLOAD_HOST
----------        ----------        ---------------------

PROD              london1           london1.example.com
PROD              london2           london1.example.com


Enterprise Manager configuration completed successfully
FINISHED EMCA at Feb 15, 2010 5:56:44 PM
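
As a hedged illustration of the customized reconfiguration described above, a command such as the following would start the Console on london2 and direct the listed instances to upload their data to it (the node and instance names are taken from this chapter's example cluster; verify the argument syntax for your release with emca -help before using it):

[oracle@london1 ˜]$ emca -reconfig dbcontrol -cluster -EM_NODE london2 -EM_SID_LIST PROD1,PROD2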

Finally, if you have added a new node to the cluster, you may use the command emca -addInst db to add it to the database configuration. Similarly, you can use the command emca -deleteInst db to remove an instance, as long as you remember to run the command from a node other than the one hosting the instance being removed.

The Cluster Tab

Upon logging in to Database Control at the URL for the specified node, such as https://london1.example.com:1158/em, you are presented at the highest level with two tabs to view the database configuration or the cluster configuration, as shown in Figure 12-1. Your browser requires the Adobe Flash plugin to view the EM performance graphs.

Cluster information from Database Control

Figure 12.1. Cluster information from Database Control

The Cluster tab provides details on the hosts in the cluster, and clicking the Performance link drills down to a view of CPU, Memory, and I/O Utilization on all of the nodes in the cluster. It also provides a summary of the status of those nodes. You can use this Performance page to observe whether the cluster workload is evenly distributed across the cluster, as well as whether particular nodes are more heavily utilized than others. Similarly, the Interconnects link provides information on the Private Interconnect Transfer Rate in MB/sec for the cluster as a whole, and it shows the Total Error rate for the individual nodes. You should use this information in conjunction with the Linux operating system network performance utilities to ensure that interconnect performance is not having a disproportionate impact on the performance of the cluster.

The Database Tab

The Database tab (see Figure 12-2) shows the performance of the cluster from a database perspective, as opposed to that of the host and operating system.

The Database tab from Database Control

Figure 12.2. The Database tab from Database Control

The Performance Tab

The first two graphs on the Performance tab show the Cluster Host Load Average and the Global Cache Block Access Latency. The third graph shows the number of Average Active Sessions throughout the reporting period. The final graph on the Performance tab shows performance from the perspective of Database Throughput, I/O, Parallel Execution, Services, and Instances.

At the bottom of the Performance tab, there are six Additional Monitoring Links. The page you should use to monitor your inter-instance communication is the Cluster Cache Coherency page, as shown in Figure 12-3.

Cluster Cache Coherency page

Figure 12.3. Cluster Cache Coherency page

The Cluster Cache Coherency page includes information on Global Cache Block Access Latency (this information also appears on the Performance tab), Global Cache Block Transfer Rate, and Global Cache Block Transfers and Physical Reads. Selecting the appropriate radio button makes the information available in both tabular numeric and graphical form. The Global Cache Block Access Latency is a useful initial statistic for observing the performance of your cluster. Across RAC releases, Oracle has typically recommended investigating potential performance issues only when this latency exceeds ten milliseconds; however, ideally you should expect values of no more than two milliseconds. If the latency exceeds two milliseconds, we recommend investigating the performance in the context of the workload, as well as looking at the latency of all of the hardware components of your cluster (see Chapter 4 for information on processing your workload more efficiently to reduce inter-instance traffic).
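
You can also derive a comparable average receive latency outside of EM with a query such as the following. This is a sketch that assumes the V$SYSSTAT statistics 'gc cr block receive time' (recorded in centiseconds, hence the multiplication by 10) and 'gc cr blocks received'; substituting the equivalent 'gc current block' statistics gives the current block figure:

SQL> SELECT t.inst_id,
> ROUND(t.value * 10 / NULLIF(r.value, 0), 2) AS avg_cr_receive_ms
> FROM gv$sysstat t, gv$sysstat r
> WHERE t.name = 'gc cr block receive time'
> AND r.name = 'gc cr blocks received'
> AND t.inst_id = r.inst_id;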

In addition to the overall interconnect latency, the metrics you will refer to most when reviewing your RAC's performance are observable within the Global Cache Block Transfer Rate view. In this view, you can observe the operation of Cache Fusion across the cluster. The constituent metrics are GC CR Blocks Received and GC Current Blocks Received.

The first metric, GC CR Blocks Received, reflects the implementation of read consistency across the entire cluster. In other words, when a SELECT statement is running on one node in the cluster and another session on any node has changed the relevant rows since the query started, the session requests a version of the data block as it existed when the query began. If that consistent read request cannot be satisfied from the local buffer cache, but it can be satisfied by shipping a data block from another instance in the cluster, then it is recorded in this metric.

The second metric, GC Current Blocks Received, shows the inter-instance transfer of data blocks that are in a current state. A data block is in a current state when it has been created or modified by an INSERT, UPDATE, or DELETE statement, and the most recent version of the block reflecting that change resides in the buffer cache of an instance. If this current block is requested by and successfully transferred directly to another instance in the cluster, it is recorded in this metric.

The Global Cache Block Transfers and Physical Reads graph compares the number of block transfers across the interconnect with the number of physical reads. Block transfers across the interconnect are usually more efficient than physical reads, so they are favored by RAC, where possible.

The Global Cache Block Transfers and Physical Reads graph is the Active Session History (ASH) from a Cluster Cache Coherency perspective. Figure 12-4 shows an example of this graph. The data can be viewed in real time, or you can drag the shaded box to observe historical data. In the graph directly below on the same page, the workload being processed is shown in relation to the events generated. Clicking a particular Cache Coherency event drills down to a chart and summary data focused on that particular event.

The Active Session History graph.

Figure 12.4. The Active Session History graph.

In addition to viewing the Active Session History dynamically with EM, you can also generate reports for the time periods in which you are interested. The Active Session History (ASH) is a component of the AWR repository. In addition to the reports that can be viewed with EM, the following types of reports can be generated directly: AWR reports, ASH reports, and Automatic Database Diagnostic Monitor (ADDM) reports. The following sections explain how to generate these reports, which is useful even if the EM graphical interface is not available in a particular environment.

AWR Reports

In this section, we describe the AWR and show how to generate reports from its contents. AWR snapshots of the current values of various database statistics are generated by the MMON background process using non-SQL kernel calls. AWR reporting tools allow you to generate a report in text or HTML format using either EM or SQL*Plus to compare the delta between two snapshots. This can help you determine the database workload during the intervening period.

The AWR is created automatically if you have created the RAC database using DBCA, as detailed in Chapter 10. Alternatively, you can create the AWR manually using the script, $ORACLE_HOME/rdbms/admin/catawr.sql.

A set of tables is created for the AWR under the SYS schema in the SYSAUX tablespace, and the AWR is implemented using three distinct sets of tables in the data dictionary. There are five control tables: WRM$_WR_CONTROL, WRM$_DATABASE_INSTANCE, WRM$_SNAPSHOT, WRM$_SNAP_ERROR, and WRM$_BASELINE. There is also a DBA_HIST_% view for each of these tables.

There is also a set of tables to hold the snapshots. These tables have the WRH$ prefix, such as WRH$_SEG_STAT and WRH$_SQL_PLAN. There are approximately 60 of these tables, and each table has a DBA_HIST_% view with a similar name, such as DBA_HIST_SEG_STAT and DBA_HIST_SQL_PLAN.

Baselines are held in a separate set of tables, each of which has the same name as the snapshot table, but with a _BL suffix. For example, the snapshot table is called WRH$_LATCH, while the baseline table is called WRH$_LATCH_BL. There are no DBA_HIST_% views for the baseline tables.

At regular intervals, the MMON background process triggers a snapshot of the current state of the database. The actual snapshot is performed by an MMON slave process named M000. By default, snapshots are performed every 60 minutes and retained in the workload repository for seven days, after which they are automatically purged.

AWR snapshots are performed if the STATISTICS_LEVEL parameter is set to the default value of TYPICAL or to ALL. To disable AWR snapshots, you must set the STATISTICS_LEVEL parameter to BASIC; however, we do not recommend doing so because it disables all other forms of statistics gathering.

You can check the current snapshot interval and retention time with this snippet:

SQL> SELECT snap_interval, retention FROM dba_hist_wr_control;

You can modify these values using the MODIFY_SNAPSHOT_SETTINGS procedure in DBMS_WORKLOAD_REPOSITORY. For example, you can use this line to change the snapshot interval to 30 minutes and set the retention period to two weeks:

SQL> EXECUTE dbms_workload_repository.modify_snapshot_settings -
> (interval => 30, retention => 20160);

Both the interval and retention parameters should be specified in minutes. If you specify an interval of 0, Oracle will set the snapshot interval to 1 year; if you specify a retention period of 0, Oracle will set the retention period to 100 years.

You can take a snapshot manually at any time for the particular instance you are connected to using the CREATE_SNAPSHOT procedure at the SQL*Plus prompt, as in this example:

SQL> EXECUTE dbms_workload_repository.create_snapshot;

This procedure can accept a single parameter specifying the flush level for the snapshot, which is either TYPICAL or ALL. You can also create snapshots from within EM; you can access this functionality by navigating from the Administration page to the Workload Repository page, and then to the Snapshots page.
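
For example, to take a more detailed manual snapshot, you can pass the flush level explicitly; this sketch assumes the parameter name flush_level, as documented for DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT:

SQL> EXECUTE dbms_workload_repository.create_snapshot(flush_level => 'ALL');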

When you have a stable workload, you can take a baseline, which records the difference between two snapshots. First, identify a suitable pair of snapshot IDs using the DBA_HIST_SNAPSHOT view:

SQL> SELECT snap_id, instance_number, startup_time,
> begin_interval_time,end_interval_time
> FROM dba_hist_snapshot;

Both snapshots must belong to the same instance. They must also have the same instance start-up time. For example, assuming that the start and end snapshots for the period of interest are 234 and 236, you can generate a baseline:

SQL> EXECUTE dbms_workload_repository.create_baseline -
> ( -
>   start_snap_id => 234, -
>   end_snap_id => 236, -
>   baseline_name => 'Morning Peak' -
> );

You can check the current baselines in the DBA_HIST_BASELINE view. Unlike automatic snapshots, which will be purged at the end of the retention period, baselines will not be removed automatically. Baselines can be removed manually using the DROP_BASELINE procedure:

SQL> EXECUTE dbms_workload_repository.drop_baseline -
> ( -
>   baseline_name => 'Morning Peak' -
> );

You can extract reports from the AWR tables with the script, $ORACLE_HOME/rdbms/admin/awrrpt.sql. As shown in the following code example, this script asks whether you want the report output to be created in HTML or text format. It then asks you to specify the number of days of snapshots that you wish to view. Based on your response, it displays a list of the snapshots taken during this period. The script prompts you for the start and end snapshots for the report, as well as for a file name to write the report to:

SQL> @?/rdbms/admin/awrrpt

Current Instance
˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜

   DB Id    DB Name      Inst Num Instance
----------- ------------ -------- ------------
  121667269 PROD                1 PROD1

Specify the Report Type
˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜
Would you like an HTML report, or a plain text report?
Enter 'html' for an HTML report, or 'text' for plain text
Defaults to 'html'
Enter value for report_type:

Type Specified:  html

Instances in this Workload Repository schema
˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜

   DB Id     Inst Num DB Name      Instance     Host
------------ -------- ------------ ------------ ------------
  121667269         2 PROD         PROD2        london2.exam
ple.com
* 121667269         1 PROD         PROD1        london1.exam
                                                ple.com

Using  121667269 for database Id
Using          1 for instance number

Specify the number of days of snapshots to choose from
˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜
Entering the number of days (n) will result in the most recent
(n) days of snapshots being listed.  Pressing <return> without
specifying a number lists all completed snapshots.

Enter value for num_days:

Listing all Completed Snapshots

                                                        Snap
Instance     DB Name        Snap Id    Snap Started    Level
------------ ------------ --------- ------------------ -----
PROD1        PROD               110 08 Feb 2010 12:46      1
                                111 08 Feb 2010 14:00      1
...
                                278 15 Feb 2010 10:53      1
                                279 15 Feb 2010 10:58      1

Specify the Begin and End Snapshot Ids
˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜
Enter value for begin_snap:278
Begin Snapshot Id specified: 278

Enter value for end_snap: 279
End   Snapshot Id specified: 279

Specify the Report Name
˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜
The default report file name is awrrpt_1_278_279.html.  To use this name,
press <return> to continue, otherwise enter an alternative.

Enter value for report_name:

Using the report name awrrpt_1_278_279.html
...
Report written to awrrpt_1_278_279.html

If no file name is specified, a default name based on the start and end snapshots will be used. Keep in mind that you cannot create valid reports for a period during which an instance shutdown has been performed.

The awrrpt.sql script will only generate an AWR report for the instance to which you are connected. In a RAC environment, if you wish to generate an AWR report for another node in the cluster, you should use the awrrpti.sql script instead. This script prompts you for the same information as the standard script. However, it also gives you the option of specifying both the Database ID and the instance. Thus it lets you generate a report for any instance in the cluster without needing to run the command from the individual nodes:

SQL> @?/rdbms/admin/awrrpti
...
Instances in this Workload Repository schema
˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜

   DB Id     Inst Num DB Name      Instance     Host
------------ -------- ------------ ------------ ------------
  121667269         2 PROD         PROD2        london2.exam
                                                ple.com
* 121667269         1 PROD         PROD1        london1.exam
                                                ple.com

Enter value for dbid:121667269
Using  for database Id
Enter value for inst_num: 2
...

Interpreting the RAC Statistics of an AWR Report

You have now generated an AWR report to review the statistics captured across a period of time for an instance. When reviewing performance, you should generate reports for all of the instances within the cluster, because the statistics contained within a report are for one instance only. The instance and its host environment are recorded directly under the section heading, "WORKLOAD REPOSITORY report for":

WORKLOAD REPOSITORY report for

DB Name         DB Id    Instance     Inst Num Startup Time    Release     RAC
------------ ----------- ------------ -------- --------------- ----------- ---
PROD           121667269 PROD1               1 12-Feb-10 13:36 11.2.0.1.0  YES

Host Name        Platform                         CPUs Cores Sockets Memory(GB)
---------------- -------------------------------- ---- ----- ------- ----------
london1.example. Linux x86 64-bit                    8     8       2      15.66

Please take some time to examine the "Load Profile" section of this report. In particular, you should make sure that it covers the expected period and that meaningful activity has taken place on the database during this period.

Note

In the following sections, we concentrate on the differences between an AWR report for a single-instance database and a RAC database. Much of the content of an AWR report is identical for both.

Within an AWR report, there are two main sections containing RAC statistics. The first section appears immediately after the "Top 5 Timed Foreground Events" and includes the Global Cache Load Profile, Global Cache Efficiency Percentages, Global Cache and Enqueue Services (both Workload Characteristics and Messaging Statistics), and Cluster Interconnect sections. The second section appears at the end of the report, and it includes Global Messaging Statistics, Global CR Served Stats, Global CURRENT Served Stats, Global Cache Transfer Statistics and Transfer Times, Interconnect Latency and Throughput, and Dynamic Remastering Stats.

Top 5 Timed Foreground Events

The "Top 5 Timed Foreground Events" section will normally show the amount of CPU consumed in addition to the top four wait events. For a given workload, this list of events should be relatively stable; you should investigate any significant variation. You will generally see CPU at the top of the list. However, CPU usage is not necessarily an indication of a healthy system, as the application may be CPU-bound. The most common wait events are db file sequential read, db file scattered read, db file parallel read, and log file sync.

For RAC databases, watch for wait events related to Global Cache Service (GCS), which are identified by the gc prefix or by the Cluster wait class. In a RAC environment, you should expect to see GCS events, but they should not be consuming the majority of the time on the system. The following example shows the Top 5 Timed Foreground Events from a RAC system that is experiencing a degree of inter-instance activity:

Top 5 Timed Foreground Events
˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜
                                                         Avg
                                                        wait   % DB
Event                               Waits     Time(s)   (ms)   time Wait Class
------------------------------ ---------- ----------- ------ ------ ----------
DB CPU                                            588          28.4
enq: TX - row lock contention      41,498         293      7   14.2 Applicatio
log file sync                     126,489         231      2   11.1 Commit
gc current block 2-way            409,784         230      1   11.1 Cluster
gc cr block 2-way                 407,726         220      1   10.6 Cluster

If your global cache events are prominent—and, in particular, if they appear above CPU time in the report—then this is an indication that you should drill down in the RAC statistics further to identify whether the amount of global cache traffic can be reduced.

Immediately after the "Top 5 Timed Foreground Events" section, the first RAC-specific section summarizes the number of instances for the start and end snapshots, as in this example:

Begin   End
                                ----- -----
           Number of Instances:     2     2

Differing values indicate that an instance has been started or stopped during the snapshot period. Note that equal values at the beginning and end of the period do not guarantee that no instances joined or left the cluster during the snapshot period. Starting and stopping instances causes a higher than normal level of inter-instance messaging; therefore, a report covering such a period should be treated with caution.

Global Cache Load Profile

This section of the AWR report presents a summary of the traffic across the interconnect in terms of blocks exchanged by GCS and messages exchanged by both GCS and Global Enqueue Service (GES). Consider this example:

Global Cache Load Profile
˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜                  Per Second       Per Transaction
                                      ---------------       ---------------
  Global Cache blocks received:              3,510.45                  5.81
    Global Cache blocks served:              3,526.89                  5.84
     GCS/GES messages received:              7,447.04                 12.32
         GCS/GES messages sent:              6,774.80                 11.21
            DBWR Fusion writes:                578.61                  0.96
 Estd Interconnect traffic (KB)             59,076.44

Pay particular attention to the estimated interconnect traffic per second value at the end of this section. You should reference this value against the bandwidth of your network interconnect to ensure that your configuration has sufficient capacity for your requirements. In this example, utilization is 59MB/sec on a gigabit Ethernet interconnect, which has a practical maximum of roughly 100MB/sec. Thus, this example shows a reasonably high level of utilization.

Global Cache Efficiency Percentages

This section of the AWR report shows the percentage of blocks accessed from local cache, remote cache, and disk. In an optimum system, the percentage of local cache accesses should approach 100%, while the percentage of remote cache accesses and disk accesses should both approach 0%. The following example shows a level of efficiency in terms of the percentages that you should expect to see:

Global Cache Efficiency Percentages (Target local+remote 100%)
˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜
Buffer access -  local cache %:   94.38
Buffer access - remote cache %:    5.52
Buffer access -         disk %:    0.10

Bear in mind that it may take longer to read a block from disk than from a remote cache, although advances in disk technology, such as solid state disks (SSDs), are likely to narrow the interconnect's advantage (see Chapter 4 for more information on SSDs). Therefore, you should concentrate on reducing both the amount of disk I/O and the amount of global cache communication.

Global Cache and Enqueue Services - Workload Characteristics

This section of the AWR report describes the average times required to perform various GCS and GES tasks:

Global Cache and Enqueue Services - Workload Characteristics
˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜
                     Avg global enqueue get time (ms):      0.3
Avg global cache cr block receive time (ms):      1.2
     Avg global cache current block receive time (ms):      1.0

            Avg global cache cr block build time (ms):      0.0
             Avg global cache cr block send time (ms):      0.0
      Global cache log flushes for cr blocks served %:      4.5
            Avg global cache cr block flush time (ms):      1.6

         Avg global cache current block pin time (ms):      0.1
        Avg global cache current block send time (ms):      0.0
 Global cache log flushes for current blocks served %:      3.2
       Avg global cache current block flush time (ms):      2.0

Of these statistics, the most significant are the enqueue get time, which should ideally be below 1 ms; and the global cache cr and current block receive times.

Global Cache and Enqueue Services - Messaging Statistics

The "Global Cache and Enqueue Services" section of the AWR report describes average time to exchange different categories of inter-instance messages, as in this example:

Global Cache and Enqueue Services - Messaging Statistics
˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜
                     Avg message sent queue time (ms):      0.1
             Avg message sent queue time on ksxp (ms):      0.6
                 Avg message received queue time (ms):      0.0
                    Avg GCS message process time (ms):      0.0
                    Avg GES message process time (ms):      0.0

                            % of direct sent messages:    65.85
                          % of indirect sent messages:    33.18
                        % of flow controlled messages:     0.96

Cluster Interconnect

The "Cluster Interconnect" section of the report should be used to confirm that the interconnect network is running across the configured private interconnect, as opposed to a public network:

Cluster Interconnect
˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜
                                    Begin                             End
             --------------------------------------------------   -----------
Interface    IP Address      Pub Source                           IP  Pub Src
----------   --------------- --- ------------------------------   --- --- ---
eth1         192.168.1.1     N   Oracle Cluster Repository

Foreground Wait Class

The "Foreground Wait Class" enables you to observe wait events in a grouped manner, as opposed to individual wait events. Thus it provides a simpler diagnosis on where performance issues may lie. In the following example, nearly 40% of all wait events are related to cluster activity; these events are likely to be impacting performance:

Foreground Wait Class                      DB/Inst: PROD/PROD1  Snaps: 278-279

                                                                 Avg
                                     %Time       Total Wait     wait
Wait Class                     Waits -outs         Time (s)     (ms)  %DB time
-------------------- --------------- ----- ---------------- -------- ---------
Cluster                    1,068,216     0              814        1      39.3
DB CPU                                                  588               28.4
Application                   41,503     0              293        7      14.2
Commit                       126,489     0              231        2      11.1
User I/O                      17,300     0               73        4       3.5
Configuration                    198     0               26      131       1.3
Other                      1,050,015     0               20        0       1.0
Concurrency                    8,493     2                4        0       0.2
Network                      141,279     0                0        0       0.0
System I/O                        87     0                0        0       0.0

Wait Event Histogram

The "Wait Event Histogram" section enables you to observe all RAC-related wait events. It shows the time consumed by the wait events, which are grouped into classes by the time consumed. These wait events range from less than 1 millisecond to longer than a second. In the following example, you can see that block busy and congested related events consume the most time on an individual event basis:

Total
Event                      Waits  <1ms  <2ms  <4ms  <8ms <16ms <32ms  <=1s >1s
-------------------------- ----- ----- ----- ----- ----- ----- ----- ----- ---
...
gc cr block 2-way          407.3  91.3   5.9   2.4    .4    .0    .0    .0
gc cr block busy           27.9K  25.4  50.1  16.8   3.3   2.3   1.8    .3
gc cr block congested        644  66.5   9.0  15.5   8.9    .2
gc cr block lost               2                                     100.0
gc cr failure                377  91.5   5.6   2.7    .3
gc cr grant 2-way           7309  91.8   5.6   2.2    .3    .0
gc cr grant congested         13  53.8   7.7  15.4  23.1
gc cr multi block request  77.3K  97.4   1.8    .8    .1    .0    .0    .0
gc current block 2-way     410.2  91.1   6.1   2.4    .3    .0    .0    .0  .0
gc current block busy      18.6K  19.3  31.0  16.7  14.4  13.4   3.0   2.1  .0
gc current block congested   669  68.5   7.9  14.5   9.0                .1
gc current block lost          1                                     100.0
...

"SQL Statement" Sections

The AWR report contains a series of sections listing SQL statements in the library cache that exceed predefined thresholds. Each statement is identified by its SQL Id, along with the beginning of the statement text. Of particular interest to RAC users are the SQL statements ordered in the "By Cluster Wait Time" section, which details the amount of time a statement was involved in waits for cluster resources. The cluster wait time is reported as a percentage of the total elapsed time for the statement, along with the elapsed time, CPU time, and number of executions.

For example, the statement reported in the following extract has experienced 384.3 seconds of cluster wait time over 61,113 executions, and the cluster wait time represents 47% of the total elapsed time, which is significant enough to warrant further investigation. To investigate, you should note the SQL Id, which is used as input to the awrsqlrpt.sql script, as explained later in this chapter:

SQL ordered by Cluster Wait Time          DB/Inst: PROD/PROD1  Snaps: 278-279
     Cluster                        Elapsed
 Wait Time (s) Executions %Total    Time(s)   %Clu   %CPU    %IO    SQL Id
-------------- ---------- ------ ---------- ------ ------ ------ -------------
       384.3       61,113   47.8      804.5   47.2   37.8    1.7 16dhat4ta7xs9
Module: wish8.5@loadgen1 (TNS V1-V3)
begin neword(:no_w_id,:no_max_w_id,:no_d_id,:no_c_id,:no_o_ol_cnt,
:no_c_discount,:no_c_last,:no_c_credit,:no_d_tax,:no_w_tax,:no_d_next_o_id,
TO_DATE(:timestamp,'YYYYMMDDHH24MISS')); END;

RAC-Related Segment Statistics

The RAC-related segment statistics are best reviewed in combination with the SQL-related data. These statistics include global cache buffer busy, CR blocks received, and current blocks received. In the following example, you can observe that a large proportion of the Global Cache Buffer Busy activity is generated by SQL running against the WAREHOUSE table, while most CR and Current blocks are received for the STOCK table:

Segments by Global Cache Buffer Busy       DB/Inst: PROD/PROD1  Snaps: 278-279

                                                                  GC
         Tablespace                      Subobject  Obj.        Buffer    % of
Owner       Name    Object Name            Name     Type          Busy Capture
-------- ---------- -------------------- ---------- ----- ------------ -------
TPCC     TPCCTAB    WAREHOUSE                       TABLE        6,199   55.66
TPCC     TPCCTAB    INORD                           INDEX        3,440   30.89
TPCC     TPCCTAB    ORDERS                          TABLE          630    5.66
TPCC     TPCCTAB    IORDL                           INDEX          441    3.96
TPCC     TPCCTAB    DISTRICT                        TABLE          251    2.25

Segments by CR Blocks Received            DB/Inst: PROD/PROD1  Snaps: 278-279

                                                                 CR
         Tablespace                      Subobject  Obj.       Blocks
Owner       Name    Object Name            Name     Type      Received  %Total
-------- ---------- -------------------- ---------- ----- ------------ -------
TPCC     TPCCTAB    STOCK                           TABLE      321,799   54.74
TPCC     TPCCTAB    CUSTOMER                        TABLE       97,031   16.51
TPCC     TPCCTAB    INORD                           INDEX       61,230   10.42
TPCC     TPCCTAB    IORDL                           INDEX       38,203    6.50
TPCC     TPCCTAB    ORDERS                          TABLE       17,544    2.98


Segments by Current Blocks Received        DB/Inst: PROD/PROD1  Snaps: 278-279

                                                              Current
         Tablespace                      Subobject  Obj.       Blocks
Owner       Name    Object Name            Name     Type      Received  %Total
-------- ---------- -------------------- ---------- ----- ------------ -------
TPCC     TPCCTAB    STOCK                           TABLE      220,543   45.90
TPCC     TPCCTAB    IORDL                           INDEX       57,394   11.94
TPCC     TPCCTAB    INORD                           INDEX       42,573    8.86
TPCC     TPCCTAB    DISTRICT                        TABLE       38,989    8.11
TPCC     TPCCTAB    CUSTOMER                        TABLE       37,746    7.86

Dictionary Cache Stats (RAC)

For RAC instances, a subsection of the "Dictionary Cache Stats" section reports the number of GES requests, conflicts, and releases for each object type, as in this example:

Dictionary Cache Stats (RAC)               DB/Inst: PROD/PROD1  Snaps: 278-279

                                   GES          GES          GES
Cache                         Requests    Conflicts     Releases
------------------------- ------------ ------------ ------------
dc_awr_control                       2            2            0
dc_objects                          10            0            6
dc_segments                         10           12            0
dc_sequences                        26            3            0

In this report, you should look for an excessive number of GES conflicts, especially for sequences.
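
If sequences do show a high number of conflicts, a common mitigation is to recreate them with a larger cache and without ordering, so that each instance can allocate values from its own cached range. The following is only an illustrative sketch with a hypothetical sequence name:

SQL> CREATE SEQUENCE order_seq CACHE 1000 NOORDER;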

Library Cache Activity (RAC)

For RAC instances, a subsection of the "Library Cache Activity" section reports the number of GES lock requests, pin requests, pin releases, invalidation requests, and invalidations. Here's an example of that subsection:

Library Cache Activity (RAC)               DB/Inst: PROD/PROD1  Snaps: 278-279

                    GES Lock      GES Pin      GES Pin   GES Inval GES Invali-
Namespace           Requests     Requests     Releases    Requests     dations
--------------- ------------ ------------ ------------ ----------- -----------
BODY                       0       69,471       69,471           0           0
CLUSTER                    3            3            3           0           0
DBLINK                    20            0            0           0           0
EDITION                   18           18           18           0           0
INDEX                     84           84           84           0           0
QUEUE                      0          223          223           0           0
RULESET                    0            4            4           0           0
SCHEMA                    16            0            0           0           0
SUBSCRIPTION               0            2            2           0           0
TABLE/PROCEDURE        1,089      211,136      211,136           0           0
TRIGGER                    0           89           89           0           0

Global Messaging Statistics

In the event that any serious Global Messaging problems appear in the "Top 5 Timed Foreground Events" or in the "Wait Events" sections, the "Global Messaging Statistics" section can be used to investigate further, as in this example:

Global Messaging Statistics               DB/Inst: PROD/PROD1  Snaps: 278-279

Statistic                                    Total   per Second    per Trans
--------------------------------- ---------------- ------------ ------------
acks for commit broadcast(actual)           99,813        328.0          0.5
acks for commit broadcast(logical          191,511        629.2          1.0
broadcast msgs on commit(actual)           154,497        507.6          0.8
broadcast msgs on commit(logical)          159,559        524.3          0.9
broadcast msgs on commit(wasted)            36,032        118.4          0.2
broadcast on commit wait time(ms)            6,075         20.0          0.0
broadcast on commit waits                    9,799         32.2          0.1

Global CR Served Statistics

The "Global CR Served Statistics" section contains the statistics for the number of blocks served by GCS. These statistics give an overview of GCS activity during the reporting period, and they are derived from the V$CR_BLOCK_SERVER dynamic performance view, as in this example:

Global CR Served Stats                    DB/Inst: PROD/PROD1  Snaps: 278-279

Statistic                                   Total
------------------------------ ------------------
CR Block Requests                         566,707
CURRENT Block Requests                     29,833
Data Block Requests                       566,710
Undo Block Requests                         3,317
TX Block Requests                          26,341

Global Current Served Statistics

The "Global Current Served Statistics" section of the AWR report contains histograms of the GCS operations required to support current read requests, including block pinning, flushing redo to disk, and write operations. The report is derived from the V$CURRENT_BLOCK_SERVER dynamic performance view. You should check this view if you believe that Cache Fusion is causing a bottleneck on systems with high levels of DML activity, as in this example:

Global CURRENT Served Stats                DB/Inst: PROD/PROD1  Snaps: 278-279
-> Pins    = CURRENT Block Pin Operations
-> Flushes = Redo Flush before CURRENT Block Served Operations
-> Writes  = CURRENT Block Fusion Write Operations

Statistic         Total   % <1ms  % <10ms % <100ms    % <1s   % <10s
---------- ------------ -------- -------- -------- -------- --------
Pins            477,147    99.65     0.32     0.02     0.01     0.00
Flushes          15,128    51.65    44.53     3.81     0.00     0.00
Writes          176,102     0.23    15.83    26.38    57.18     0.38

Global Cache Transfer Statistics

The "Global Cache Transfer Statistics" section of the report gives an overview of Global Cache transfer activity for each class of database buffer, as in the following example. This data is derived from the V$INSTANCE_CACHE_TRANSFER dynamic performance view; it is especially useful in diagnosing whether cache transfer activity is affected by contention or a high level of system load. You can use this data to analyze both circumstances. For Clusters with more than two nodes, these statistics can also be used to determine the impact of messaging between multiple nodes in the cluster:

Global Cache Transfer Stats                DB/Inst: PROD/PROD1  Snaps: 278-279
                               CR                         Current
                 ----------------------------- -----------------------------
Inst Block         Blocks      %      %      %   Blocks      %      %      %
  No Class       Received  Immed   Busy Congst Received  Immed   Busy Congst
---- ----------- -------- ------ ------ ------ -------- ------ ------ ------
   2 data block   556,975   95.6    3.2    1.2  478,247   95.8    3.7     .5
   2 undo header   27,929   66.7   33.2     .1    1,717   52.6   47.4     .0
   2 undo block     2,894   68.6   31.3     .0        0    N/A    N/A    N/A
   2 Others           181   97.8    1.7     .6      653   98.9     .6     .5
...

Global Cache DB/Inst: PROD/PROD1  Snaps: 278-279
-> Blocks Lost, 2-hop and 3-hop Average times in (ms)
-> ordered by CR + Current Blocks Received desc

                            CR Avg Time (ms)           Current Avg Time (ms)
                    ---------------------------- ----------------------------
 Src Block     Lost
Inst Class     Time      Immed     2hop     3hop      Immed     2hop     3hop
---- -------- ----- ---------- -------- -------- ---------- -------- --------
   2 data blo              1.1      1.1      N/A        0.8      0.8      N/A
   2 undo hea              0.6      0.6      N/A        0.7      0.7      N/A
   2 undo blo              0.6      0.6      N/A        N/A      N/A      N/A
   2 others                0.6      0.6      N/A        0.6      0.6      N/A

Interconnect Statistics

The "Interconnect Statistics" section enables you to observe the latency and throughput of your interconnect against the potential of the hardware selected for your cluster (see Chapter 4 for more information on this). You can use these statistics to determine whether the physical interconnect infrastructure is a bottleneck in your environment.

The following example shows the Interconnect Statistics section of the report. It provides ping latency statistics, throughput by client statistics, and device statistics:

Interconnect Ping Latency Stats            DB/Inst: PROD/PROD1  Snaps: 278-279
  Target 500B Pin Avg Latency      Stddev  8K Ping Avg Latency      Stddev
 Instance    Count    500B msg    500B msg    Count      8K msg      8K msg
--------- -------- ----------- ----------- -------- ----------- -----------
        1       24         .14         .03       24         .12         .03
        2       24         .49         .56       24         .59         .54

Interconnect Throughput by Client          DB/Inst: PROD/PROD1  Snaps: 278-279
-> Throughput of interconnect usage by major consumers
-> All throughput numbers are megabytes per second

                        Send     Receive
Used By           Mbytes/sec  Mbytes/sec
---------------- ----------- -----------
Global Cache           24.93       24.78
Parallel Query           .00         .00
DB Locks                1.88        2.04
DB Streams               .00         .00
Other                    .00         .00

Dynamic Remastering Statistics

The "Dynamic Remastering Statistics" section gives you information on the internal Oracle workload in the mastering or ownership of the status of the Database objects. You also get information on Oracle workload reconfiguration, which you can use to help reduce the amount of messaging required by your workload, as in this example:

Dynamic Remastering Stats                  DB/Inst: PROD/PROD1  Snaps: 278-279
-> times are in seconds
-> Affinity objects - objects mastered due to affinity at begin/end snap

                                                 per    Begin      End
Name                             Total   Remaster Op     Snap     Snap
------------------------- ------------ ------------- -------- --------
remaster ops                         1          1.00
remastered objects                   2          2.00
replayed locks received         92,841     92,841.00
replayed locks sent            131,129    131,129.00
resources cleaned                    0          0.00
remaster time (s)                  3.3          3.30
quiesce time (s)                   1.6          1.55
freeze time (s)                    0.0          0.01
cleanup time (s)                   0.4          0.36
replay time (s)                    0.6          0.58
fixwrite time (s)                  0.2          0.19
sync time (s)                      0.6          0.60
affinity objects                                 N/A        7        5

Active Session History

The Active Session History (ASH) is a component of the AWR repository. ASH samples all sessions once per second and records information about the sessions that are active at the time of the sample, including what they are waiting for. This information is used by ADDM to classify any problems that have been identified.

For example, ADDM may be aware that a significant amount of time is being consumed waiting for I/O through waits for db file sequential read. ASH can identify the specific files and blocks that are causing the waits. This data is used by ADDM to produce more accurate recommendations.

ASH acquires information directly by sampling the session state objects. The default sampling interval is 1,000 milliseconds (1 second). ASH only records information about active sessions. It does not include information about recursive sessions or sessions waiting for idle events.

The information collected by ASH is flushed to disk periodically. By default, only one out of every ten active session samples is flushed to disk. Information flushed from the ASH buffer to disk is written to the workload repository history table, WRH$_ACTIVE_SESSION_HISTORY, which is owned by SYS and stored in the SYSAUX tablespace.
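
If you prefer to query the on-disk ASH data directly rather than generating a report, a simple sketch such as the following counts samples by wait event for a snapshot range, reusing the snapshot IDs from this chapter's examples:

SQL> SELECT event, COUNT(*) AS samples
> FROM dba_hist_active_sess_history
> WHERE snap_id BETWEEN 278 AND 279
> AND instance_number = 1
> GROUP BY event
> ORDER BY samples DESC;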

Previously, you saw how you can use the EM interface to dynamically view ASH-based information. Additionally, a script is provided to report on the contents of the ASH repository. You can find this script in $ORACLE_HOME/rdbms/admin/ashrpt.sql. The script should be run in SQL*Plus, as in this example:

SQL> @?/rdbms/admin/ashrpt

Current Instance
˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜

   DB Id    DB Name      Inst Num Instance
----------- ------------ -------- ------------
  121667269 PROD                1 PROD1


Specify the Report Type
˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜
Enter 'html' for an HTML report, or 'text' for plain text
Defaults to 'html'
Enter value for report_type:

Type Specified:  html


Instances in this Workload Repository schema
˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜

   DB Id     Inst Num DB Name      Instance     Host
------------ -------- ------------ ------------ ------------
121667269         2 PROD         PROD2        london2.exam
                                                ple.com

* 121667269         1 PROD         PROD1        london1.exam
                                                ple.com


Defaults to current database

Using database id: 121667269

Enter instance numbers. Enter 'ALL' for all instances in a
RAC cluster or explicitly specify list of instances (e.g., 1,2,3).
Defaults to current instance.

Using instance number(s): 1
...
Summary of All User Input
-------------------------
Format         : HTML
DB Id          : 121667269
Inst num       : 1
Begin time     : 15-Feb-10 11:03:01
End time       : 15-Feb-10 11:18:02
Slot width     : Default
Report targets : 0
Report name    : ashrpt_1_0215_1118.html

Much as when generating an AWR report, ashrpt.sql by default generates a report for the local instance only. To generate a report for an alternate instance, you can use the ashrpti.sql script instead. Either script generates output in HTML or text format, and you can specify the duration of the period covered by the report in minutes; the default is the most recent 15 minutes. The following example shows the heading section of the generated ASH report:

ASH Report For PROD/PROD1

DB Name         DB Id    Instance     Inst Num Release     RAC Host
------------ ----------- ------------ -------- ----------- --- ------------
PROD           121667269 PROD1               1 11.2.0.1.0  YES london1.exam

CPUs         SGA Size       Buffer Cache        Shared Pool    ASH Buffer Size
---- ---------------- ------------------ ------------------ ------------------
   8    8,155M (100%)     6,656M (81.6%)     1,344M (16.5%)       15.5M (0.2%)

The first section of the ASH report describes the environment in which the report was created, including the database, instance, release, node, and the number of CPUs. It also describes the sizes of the SGA, buffer cache, shared pool, and ASH buffer. The remainder of the report details the information specific to sessions active during the period of time for which the report was generated, and you should look for cluster-related activity within this session information.

Automatic Database Diagnostic Monitor

The Automatic Database Diagnostic Monitor (ADDM) uses data captured in the AWR to diagnose database performance, identify any problems, and suggest potential solutions. ADDM is built directly into the kernel, which minimizes any performance overhead.

ADDM analyzes database performance holistically. In other words, it considers all activity across the database before making recommendations about specific areas of the workload.

Executing an ADDM Report

ADDM runs automatically after each AWR snapshot, and the results are saved in the database. If ADDM detects any issues, then alerts are generated that can be inspected in the EM tools.

You can also run an ADDM report manually using the following script:

$ORACLE_HOME/rdbms/admin/addmrpt.sql

This script is similar to those used to generate AWR reports. It prompts for a start and end snapshot, as well as for the name of an output file. If no output file name is specified, then the report will be written to a file called:

addmrpt_<instance>_<start_snapshot>_<stop_snapshot>.txt

For example, the file name with its bracketed information filled in might look like this:

addmrpt_1_278_279.txt

Although the procedure for running ADDM reports is similar to that used with AWR reports, the content of an ADDM report is significantly different.

Controlling ADDM

The DBMS_ADVISOR and DBMS_ADDM packages allow you to control ADDM. The simplest way to access the DBMS_ADVISOR package is from EM, which shows a complete performance overview, including any recommendations, on a single page. The addmrpt.sql script also calls the DBMS_ADVISOR package, allowing you to run ADDM manually, and you can access the DBMS_ADVISOR API directly using PL/SQL calls. In addition to DBMS_ADVISOR, the DBMS_ADDM package enables the generation of reports for all instances of a RAC environment.

ADDM stores information in a set of tables owned by SYS in the SYSAUX tablespace. The base tables have the prefix WRI$, and they are accessible through a number of views with the prefix DBA_ADVISOR_%.
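
For example, once a task name is known (ADDM tasks are listed in DBA_ADVISOR_TASKS), a query along the following lines can list the findings of that task directly from the advisor views. This is a minimal sketch; the task name shown is the one that appears in the report later in this section:

SELECT finding_id, type, impact, message
FROM   dba_advisor_findings
WHERE  task_name = 'TASK_875'
ORDER  BY impact DESC;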

ADDM needs at least two snapshots in the AWR before it can perform any analysis. The ADDM report header describes the analysis period, database and instance names, hostname, and database version:

ADDM Report for Task 'TASK_875'
          -------------------------------
Analysis Period
---------------
AWR snapshot range from 278 to 279.
Time period starts at 15-FEB-10 10.39.53 AM
Time period ends at 15-FEB-10 10.58.19 AM

Analysis Target
---------------
Database 'PROD' with DB ID 121667269.
Database version 11.2.0.1.0.
ADDM performed an analysis of instance PROD1, numbered 1 and hosted at
london1.example.com.

Activity During the Analysis Period
-----------------------------------
Total database time was 2068 seconds.
The average number of active sessions was 1.87.

Summary of Findings
-------------------
Description                               Active Sessions      Recommendations
                                          Percent of Activity
----------------------------------------  -------------------  --------------

1  Top SQL Statements                        1.1 | 59.02       8
2  Global Cache Messaging                    .72 | 38.46       1
3  Top Segments by "User I/O" and "Cluster"  .71 | 37.74       5
4  Buffer Busy - Hot Objects                 .29 | 15.73       0
5  Row Lock Waits                            .26 | 14.17       2
6  Commits and Rollbacks                     .22 | 11.72       2

The header specifies the measured time period, on which the ADDM analysis is based. The report header also states the average database load during the snapshot period and provides a summary of the findings.

The DBMS_ADDM package provides similar functionality, except that it covers multiple instances in a RAC environment. For example, calling DBMS_ADDM.ANALYZE_DB with a task name and the start and end snapshot IDs enables the analysis of all of the instances in the cluster. DBMS_ADDM.ANALYZE_INST performs the analysis for one instance in the cluster, while DBMS_ADDM.ANALYZE_PARTIAL performs the analysis for a selected number of instances. For example, the following excerpt shows the generation of a partial report for instances 1, 2, and 3 of a 4-node cluster:

variable name varchar2(30)
begin
   :name := 'Partial ADDM RAC';
   dbms_addm.analyze_partial(:name, '1,2,3', 1012, 1387);
end;
/
set long 1000000 pagesize 0
select dbms_addm.get_report(:name) from dual;
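
An analysis of all of the instances in the cluster can be generated in much the same way with DBMS_ADDM.ANALYZE_DB. The following minimal sketch reuses the same snapshot range; the task name is illustrative:

variable name varchar2(30)
begin
   :name := 'ADDM for all instances';
   dbms_addm.analyze_db(:name, 1012, 1387);
end;
/
select dbms_addm.get_report(:name) from dual;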

The Report Format

An ADDM report is divided up into findings, which are listed in descending order of their perceived impact on database time. The following example shows a finding from an ADDM report. The finding highlights RAC-based wait events under the "Cluster wait" category. In this case, the recommendation was to investigate an UPDATE statement responsible for a significant proportion of the cluster wait time:

Finding 2: Global Cache Messaging
Impact is .72 active sessions, 38.46% of total activity.
---------------------------------------------------------
Inter-instance messaging was consuming significant database time on this
instance.

   Recommendation 1: Application Analysis
   Estimated benefit is .72 active sessions, 38.46% of total activity.
   -------------------------------------------------------------------
   Action
      Look at the "Top SQL Statements" finding for SQL statements consuming
      significant time on Cluster waits. For example, the UPDATE statement
      with SQL_ID "82tfppq8s0dc2" is responsible for 20% of Cluster wait
      during the analysis period.

   Symptoms That Led to the Finding:
   ---------------------------------
      Wait class "Cluster" was consuming significant database time.
      Impact is .74 active sessions, 39.33% of total activity.

AWR SQL Report

If your performance monitoring investigations with AWR and ADDM reports reveal particular SQL statements to investigate, you can view these in detail with this script:

$ORACLE_HOME/rdbms/admin/awrsqrpt.sql

You run the script just as you would to generate an AWR report, except that in this case you also specify a particular SQL ID. For example, this ID might be identified in the AWR report's "SQL Statements" section or in an ADDM report:

SQL> @?/rdbms/admin/awrsqrpt

...

Specify the Begin and End Snapshot Ids
˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜
Enter value for begin_snap: 278
Begin Snapshot Id specified: 278
Enter value for end_snap: 279
End   Snapshot Id specified: 279

Specify the SQL Id
˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜˜
Enter value for sql_id: g5u7xuchhfu62

...

The SQL report provides a summary of the AWR capture period, the SQL ID, and the elapsed time. In the following example, the SQL ID identified in the AWR report belongs to a simple UPDATE statement:

WORKLOAD REPOSITORY SQL Report

Snapshot Period Summary

DB Name         DB Id    Instance     Inst Num Startup Time    Release     RAC
------------ ----------- ------------ -------- --------------- ----------- ---
PROD           121667269 PROD1               1 12-Feb-10 13:36 11.2.0.1.0  YES

              Snap Id      Snap Time      Sessions Curs/Sess
            --------- ------------------- -------- ---------
Begin Snap:       278 15-Feb-10 10:53:14        55       2.9
  End Snap:       279 15-Feb-10 10:58:18        55       2.9
   Elapsed:                5.07 (mins)
   DB Time:               34.47 (mins)

SQL Summary                                DB/Inst: PROD/PROD1  Snaps: 278-279

                Elapsed
   SQL Id      Time (ms)
------------- ----------
g5u7xuchhfu62    293,546
Module: wish8.5@loadgen1 (TNS V1-V3)
UPDATE WAREHOUSE SET W_YTD = W_YTD + :B2 WHERE W_ID = :B1
...

The remainder of the report details the SQL Plan statistics, the Execution Plan, and the full SQL Statement, which is useful when longer statements are truncated within the AWR report.

In our examples, we have used the information from the AWR to pinpoint a workload with significant cluster wait class activity. This activity is due to a number of sessions simultaneously running the same UPDATE statement on multiple nodes.

Performance Monitoring Using SQL*Plus

Since early versions of Oracle 7, the two main sources of information for performance tuning an Oracle database have been system statistics and wait events. Both have been extended with additional statistics and events in subsequent releases, and in the releases up to and including 11g the kernel has become much more thoroughly instrumented. This has resulted in a large increase in the number of latches and wait events that are individually reported. Execution time information is also available in the form of time-based metrics.

GV$ Views

Before reviewing the SQL*Plus queries, we will briefly review the relationship between the X$ tables, GV$ views, and V$ views. All are compiled into the Oracle executable and exist for both single-instance and RAC databases.

X$ tables present instance or session memory structures as tables. They exist in both RAC and single-instance databases, although the RAC-specific X$ tables are not populated in single-instance databases. The X$ tables are listed in V$FIXED_TABLE.

GV$ views are also built into the Oracle executable. Therefore, they exist for all databases (including single-instance ones), and they reference the X$ tables.

The standard V$ views are also built into the Oracle executable. In general, V$ views do not directly reference X$ tables; instead, they generally use them indirectly through GV$ views, using queries of the following form:

SELECT <column_list>
FROM gv$<view_name>
WHERE inst_id = USERENV('INSTANCE');

Both the GV$ and V$ views are defined in V$FIXED_VIEW_DEFINITION.

In a RAC environment, a number of additional synonyms for GV$ and V$ views are created by the script, $ORACLE_HOME/rdbms/admin/catclust.sql.

GV$ views include the instance number. When they are queried, the query is executed separately on all active instances using parallel execution, and the results are merged on the instance initiating the query. In a few cases, GV$ views are not meaningful because the underlying views are derived from the control files. Because the control files are shared between all instances, these GV$ views return duplicate sets of rows, one set from each instance. For such views, we recommend using the local V$ dynamic performance views instead.

System Statistics

System statistics are maintained in each instance. They are reported in a number of dynamic performance views, including V$SYSSTAT, V$SESSTAT, and V$MYSTAT. In Oracle 11g, a subset of these statistics is also reported at the service level in V$SERVICE_STATS.
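
Because the same statistics are externalized cluster-wide through GV$SYSSTAT, a quick way to compare global cache activity across instances is a query along the following lines; the statistic names are those reported in V$SYSSTAT:

SELECT inst_id, name, value
FROM   gv$sysstat
WHERE  name IN ('gc cr blocks received', 'gc current blocks received')
ORDER  BY inst_id, name;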

Segment Statistics

Segment statistics were introduced in Oracle 9.2. They provide a powerful mechanism you can use to identify which objects are subject to I/O or contention. Prior to the introduction of segment statistics, this information could only be obtained using an event 10046 level 8 trace or by polling the V$SESSION_WAIT dynamic performance view. The level of granularity is the object, so you may still need to use the 10046 trace if you need to identify a specific block or set of blocks.
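
If you do need that block-level detail, the 10046 level 8 trace can still be enabled for your own session. The following is a minimal sketch:

-- Enable wait event tracing (level 8) for the current session
ALTER SESSION SET EVENTS '10046 trace name context forever, level 8';

-- ... run the workload of interest ...

-- Disable the trace again
ALTER SESSION SET EVENTS '10046 trace name context off';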

Segment statistics are maintained in all databases when the value of the STATISTICS_LEVEL parameter is TYPICAL or ALL. Three statistics report global cache activity at the object level, and you can use these to quickly pinpoint the segments in your database that are experiencing the highest levels of inter-instance activity: gc cr blocks received, gc current blocks received, and gc buffer busy. You should be familiar with the first two statistics from the charts displayed in EM. The additional statistic, gc buffer busy, shows the segments that are experiencing levels of contention between instances.

The following example shows the global cache activity for the table segments of a schema named TPCC:

SQL> li
  1  SELECT
  2    table_name                 AS "Table Name",
  3    gc_buffer_busy             AS "Buffer Busy",
  4    gc_cr_blocks_received      AS "CR Blocks Received",
  5    gc_current_blocks_received AS "Current Blocks Received"
  6  FROM
  7  (
  8    SELECT table_name FROM dba_tables
  9    WHERE owner = 'TPCC'
 10  ) t,
 11  (
 12    SELECT object_name,value AS gc_buffer_busy
 13    FROM v$segment_statistics
 14    WHERE owner = 'TPCC'
 15    AND object_type = 'TABLE'
 16    AND statistic_name = 'gc buffer busy'
 17  ) ss1,
 18  (
 19    SELECT object_name,value AS gc_cr_blocks_received
 20    FROM v$segment_statistics
 21    WHERE owner = 'TPCC'
 22    AND object_type = 'TABLE'
 23    AND statistic_name = 'gc cr blocks received'
 24  ) ss2,
 25  (
 26    SELECT object_name,value AS gc_current_blocks_received
 27    FROM v$segment_statistics
 28    WHERE owner = 'TPCC'
 29    AND object_type = 'TABLE'
 30    AND statistic_name = 'gc current blocks received'
 31  ) ss3
 32  WHERE t.table_name = ss1.object_name
 33  AND t.table_name = ss2.object_name
 34* AND t.table_name = ss3.object_name
SQL> /

Table Name Buffer Busy CR Blocks Received Current Blocks Received
---------- ----------- ------------------ -----------------------
CUSTOMER            16             546039                  180391
DISTRICT           537              10130                  200367
HISTORY            238              11840                    7000
ITEM                 0                  0                       0
WAREHOUSE        13172              88643                  113499
STOCK              159            1649497                 1215391
ORDERS            1775             102103                   94959

7 rows selected.

Pay particular attention to the buffer busy statistics, which can help you identify contention between instances. In this example, the STOCK table is responsible for the highest degree of inter-instance messaging. However, the WAREHOUSE table is responsible for the highest degree of contention, which confirms the findings previously noted in the sections that covered AWR, ASH, and ADDM.

If contention is high, it usually means that each block contains a large number of rows and that sessions are updating multiple rows in the same block, rather than a single row. Thus, you may benefit from distributing the rows across a greater number of blocks by increasing the value of PCTFREE for the object in question, as shown in the sketch below.
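
As a sketch, assuming that the WAREHOUSE table in the TPCC schema is the contended segment and that it has a primary key index named WAREHOUSE_PK (the index name is illustrative), the rows could be spread across more blocks as follows:

-- Raise PCTFREE and rebuild the table so existing rows are redistributed
ALTER TABLE tpcc.warehouse MOVE PCTFREE 90;

-- Indexes are marked UNUSABLE by the move and must be rebuilt
ALTER INDEX tpcc.warehouse_pk REBUILD;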

Global Cache Services: Consistent and Current Reads

You can investigate GCS activity in terms of consistent reads and current reads. For example, you can use this formula to monitor the performance of consistent reads:

(gc cr block receive time X 10) / (gc cr blocks received)

The preceding formula returns the average time for consistent read block requests for the instance in milliseconds, and it is the GC CR metric of the Global Cache Block Transfer Rate that you observe within the EM RAC performance views. This is the most important value for determining interconnect performance because it usually reflects the largest constituent component of interconnect traffic.

Requests for global resources for data blocks originate in the buffer cache of the requesting instance. Before a request enters the GCS request queue, Oracle allocates data structures in the SGA to track the state of the request. It also collects statistics on these resource structures. The following query and output shows the consistent reads receive time, the blocks received, and the average latency:

SQL> li
  1  SELECT
  2    gc_cr_block_receive_time AS "Receive Time",
  3    gc_cr_blocks_received AS "Blocks Received",
  4    (gc_cr_block_receive_time * 10) /
  5      gc_cr_blocks_received AS "Average Latency (MS)"
  6  FROM
  7  (
  8      SELECT value AS gc_cr_block_receive_time FROM v$sysstat
  9      WHERE name = 'gc cr block receive time'
 10  ),
 11  (
 12    SELECT value AS gc_cr_blocks_received FROM v$sysstat
 13    WHERE name = 'gc cr blocks received'
 14* )
SQL> /

Receive Time Blocks Received Average Latency (MS)
------------ --------------- --------------------
      365718         3378788           1.08239404

The latency of a consistent block request is the time elapsed between the original request and the receipt of the consistent block image at the local instance. Using a Gigabit Ethernet interconnect, this value should normally be less than 2ms and should not exceed 10ms. However, this value can be affected by the system configuration and volume. In this example, you see that the average latency is 1ms; therefore, it lies within expectations. If latencies exceed 10ms, you should first use the Linux operating system utilities, such as netstat (see Chapter 13 for more information on this utility), to determine whether there are any network configuration issues, such as network packet send and receive errors. You should also use operating system utilities such as top and sar to measure the load on the nodes themselves.

If you have double-checked that the interconnect is configured correctly, and you are still experiencing high average latencies, you might consider reducing the value of DB_FILE_MULTIBLOCK_READ_COUNT. This parameter specifies the number of blocks a process will request in a single operation. The process will have to wait for all blocks to be returned; therefore, higher values may cause longer waits. Before you adjust this parameter, carefully assess its potential impact on the entire workload.
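
For example, you might check the current setting and then lower it across all instances. The value of 16 shown here is purely illustrative, and the SID='*' clause assumes that an spfile is in use:

SQL> show parameter db_file_multiblock_read_count

SQL> ALTER SYSTEM SET db_file_multiblock_read_count = 16 SCOPE=BOTH SID='*';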

High average latencies may also be caused by a high number of incoming requests or multiple nodes dispatching requests to the LMS process. You can calculate the average LMS service time as follows:

average LMS service time = average latency
        - average time to build consistent read block
        - average time to wait for log flush
        - average time to send completed block

Average latency is calculated as shown previously. Average time to build a consistent read block is calculated as follows:

gc cr block build time / gc cr blocks served

Similarly, average time spent waiting for a redo log flush is calculated as follows:

gc cr block flush time / gc cr blocks served

Average time to send a completed block is calculated as follows:

gc cr block send time / gc cr blocks served

The following query can be used to calculate the average LMS service time for consistent block reads:

SQL> li
  1  SELECT
  2  average_latency AS "Average Latency",
  3  average_build_time AS "Average Build Time",
  4  average_flush_time AS "Average Flush Time",
  5  average_send_time AS "Average Send Time",
  6  average_latency - average_build_time - average_flush_time - average_send_time
  7       AS "Average LMS Service Time"
  8   FROM
  9   (
 10     SELECT
 11 (gc_cr_block_receive_time * 10) / gc_cr_blocks_received AS average_latency,
 12 (gc_cr_block_build_time * 10) / gc_cr_blocks_served AS average_build_time,
 13 (gc_cr_block_flush_time * 10) / gc_cr_blocks_served AS average_flush_time,
 14 (gc_cr_block_send_time * 10) / gc_cr_blocks_served AS average_send_time
 15     FROM
 16     (
 17       SELECT value AS gc_cr_block_receive_time FROM v$sysstat
 18       WHERE name = 'gc cr block receive time'
 19     ),
 20     (
 21       SELECT value AS gc_cr_blocks_received FROM v$sysstat
 22       WHERE name = 'gc cr blocks received'
 23     ),
 24     (
 25       SELECT value AS gc_cr_block_build_time FROM v$sysstat
 26       WHERE name = 'gc cr block build time'
 27     ),
 28     (
 29       SELECT value AS gc_cr_block_flush_time FROM v$sysstat
 30       WHERE name = 'gc cr block flush time'
 31     ),
 32     (
 33       SELECT value AS gc_cr_block_send_time FROM v$sysstat
 34       WHERE name = 'gc cr block send time'
 35     ),
 36     (
 37       SELECT value AS gc_cr_blocks_served FROM v$sysstat
 38       WHERE name = 'gc cr blocks served'
 39     )
 40*  )
SQL> /

Average Latency Average Build Time Average Flush Time Average Send Time
--------------- ------------------ ------------------ -----------------
Average LMS Service Time
------------------------
     1.11967715         .012249408         .113152249        .030138924
              .964136572

The difference between the average latency time and the sum of the average build, flush, and send times represents the time spent in the LMS service and the time spent transmitting the messages across the interconnect. In the example just shown, the LMS Service Time is within expected boundaries.

Global Cache Services: Current Block Activity

You can also use SQL*Plus to determine the current block activity, which is the GC Current metric of the Global Cache Block Transfer Rate from the EM RAC performance views. You can calculate the overall average latency involved in processing requests for current blocks using the following formula:

(gc current block receive time X 10) / (gc current blocks received)

The latency is returned using the following query:

SQL> li
  1  SELECT
  2     gc_current_block_receive_time AS "Receive Time",
  3     gc_current_blocks_received AS "Blocks Received",
  4     (gc_current_block_receive_time * 10) / gc_current_blocks_received
  5     AS "Average (MS)"
  6   FROM
  7   (
  8     SELECT value AS gc_current_block_receive_time
  9     FROM v$sysstat
 10     WHERE name = 'gc current block receive time'
 11   ),
 12   (
 13     SELECT value AS gc_current_blocks_received
 14     FROM v$sysstat
 15     WHERE name = 'gc current blocks received'
 16*  )
SQL> /

Receive Time Blocks Received Average (MS)
------------ --------------- ------------
      462110         4514548   1.02360192

You can calculate the amount of overall latency that can be attributed to the LMS process using the following snippet:

average LMS service time =
        average latency
        - average time to pin current blocks
        - average time to wait for log flush
        - average time to send completed block

The average latency is calculated as follows:

gc current block receive time / gc current blocks received

Similarly, the average time to pin current blocks is calculated as follows:

gc current block pin time / gc current blocks served

The average time spent waiting for a redo log flush is calculated as follows:

gc current block flush time / gc current blocks served

And the average time to send a completed block is calculated as follows:

gc current block send time / gc current blocks served

You can use the following query to calculate the average LMS service time for current block reads:

SQL> li
  1  SELECT
  2    average_latency    AS "Average Latency",
  3    average_pin_time   AS "Average Pin Time",
  4    average_flush_time AS "Average Flush Time",
  5    average_send_time  AS "Average Send Time",
  6    average_latency - average_pin_time - average_flush_time - average_send_time
  7    AS "Average LMS Service Time"
  8  FROM
  9  (
 10    SELECT
 11      (gc_current_block_receive_time * 10) / gc_current_blocks_received
 12        AS average_latency,
 13      (gc_current_block_pin_time * 10) / gc_current_blocks_served
 14        AS average_pin_time,
 15      (gc_current_block_flush_time * 10) / gc_current_blocks_served
 16        AS average_flush_time,
 17      (gc_current_block_send_time * 10) / gc_current_blocks_served
 18        AS average_send_time
 19    FROM
 20    (
 21      SELECT value AS gc_current_block_receive_time FROM v$sysstat
 22      WHERE name = 'gc current block receive time'
 23    ),
 24    (
 25      SELECT value AS gc_current_blocks_received FROM v$sysstat
 26      WHERE name = 'gc current blocks received'
 27    ),
 28    (
 29      SELECT value AS gc_current_block_pin_time FROM v$sysstat
 30      WHERE name = 'gc current block pin time'
 31    ),
 32    (
 33      SELECT value AS gc_current_block_flush_time FROM v$sysstat
 34      WHERE name = 'gc current block flush time'
 35    ),
 36    (
 37      SELECT value AS gc_current_block_send_time FROM v$sysstat
 38      WHERE name = 'gc current block send time'
 39    ),
 40    (
 41      SELECT value AS gc_current_blocks_served FROM v$sysstat
 42      WHERE name = 'gc current blocks served'
 43    )
 44* )
SQL> /

Average Latency Average Pin Time Average Flush Time Average Send Time
--------------- ---------------- ------------------ -----------------
Average LMS Service Time
------------------------
     1.06117092       .148803422         .104628541        .028298863
              .779440094

High latency values may indicate server or interconnect performance issues; however, they may also result from contention for data between instances, so you should review your current block statistics with that possibility in mind.

Global Enqueue Service

GES manages all of the non-Cache Fusion intra-instance and inter-instance resource operations. High GES request rates can adversely affect performance. To calculate the average global enqueue get time in milliseconds, use the following statement:

SQL> li
  1  SELECT
  2    global_enqueue_get_time AS "Get Time",
  3    global_enqueue_gets_sync AS "Synchronous Gets",
  4    global_enqueue_gets_async AS "Asynchronous Gets",
  5    (global_enqueue_get_time * 10) /
  6    (global_enqueue_gets_sync + global_enqueue_gets_async)
  7    AS "Average (MS)"
  8  FROM
  9  (
 10    SELECT value AS global_enqueue_get_time
 11    FROM v$sysstat
 12    WHERE name = 'global enqueue get time'
 13  ),
 14  (
 15    SELECT value AS global_enqueue_gets_sync
 16    FROM v$sysstat
 17    WHERE name = 'global enqueue gets sync'
 18  ),
 19  (
 20    SELECT value AS global_enqueue_gets_async
 21    FROM v$sysstat
 22    WHERE name = 'global enqueue gets async'
 23* )
SQL> /

  Get Time Synchronous Gets Asynchronous Gets Average (MS)
---------- ---------------- ----------------- ------------
    358597         27946887            115393   .127786124

Synchronous gets are usually locking events, whereas asynchronous gets are usually caused by nonblocking inter-instance process activity.

Library Cache

You can obtain further information about global enqueue activity caused by statement parsing and execution from the V$LIBRARYCACHE dynamic performance view. This view reports GES activity for locks, pins, and invalidations, as in this example:

SQL> li
  1  SELECT
  2    namespace                   AS "Namespace",
  3    dlm_lock_requests           AS "Lock Requests",
  4    dlm_pin_requests            AS "Pin Requests",
  5    dlm_pin_releases            AS "Pin Releases",
  6    dlm_invalidation_requests   AS "Invalidation Requests",
  7    dlm_invalidations           AS "Invalidations"
  8* FROM v$librarycache
SQL> /

Namespace       Lock Requests Pin Requests Pin Releases Invalidation Requests
--------------- -------------  ------------ ------------ --------------------
Invalidations
---------------
SQL AREA                    0             0            0                    0
0

TABLE/PROCEDURE        640720       5551128      5551128                    0
0

BODY                        2       1965888      1965888                    0
0

TRIGGER                     0         70915        70915                    0
0

INDEX                   14849         14849        14849                    0
0

CLUSTER                  2479          2503         2503                    0
0

If you see excessively high values in these columns, solutions include pinning packages in the shared pool using the KEEP procedure in the DBMS_SHARED_POOL package. Also, you might investigate methods of keeping cursors open either within the application or by using the SESSION_CACHED_CURSORS parameter.
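
For example, a frequently executed package could be pinned and the session cursor cache enlarged. The package name below is purely illustrative, and setting SESSION_CACHED_CURSORS at the instance level takes effect only after the instances are restarted from the spfile:

SQL> EXECUTE dbms_shared_pool.keep('TPCC.ORDER_ENTRY', 'P');

SQL> ALTER SYSTEM SET session_cached_cursors = 100 SCOPE=SPFILE SID='*';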

Dictionary Cache

You can obtain additional information about global enqueue activity caused by statements pinning dictionary cache objects from the V$ROWCACHE dynamic performance view. This view reports GES lock requests, conflicts, and releases, as in this example:

SQL> li
  1  SELECT
  2    parameter      AS "Cache Name",
  3    dlm_requests   AS "Requests",
  4    dlm_conflicts  AS "Conflicts",
  5    dlm_releases   AS "Releases"
  6* FROM v$rowcache
SQL> /

Cache Name                         Requests  Conflicts   Releases
-------------------------------- ---------- ---------- ----------
dc_rollback_segments                     99          0          0
dc_free_extents                           0          0          0
dc_used_extents                           0          0          0
dc_segments                            7335        469       4158
dc_tablespaces                           20          0          0
dc_tablespace_quotas                      0          0          0
dc_files                                  6          0          0
dc_users                                352          0        152
dc_objects                             8712        106       1177
dc_global_oids                          171          0         27
dc_constraints                         2135          0          0
dc_sequences                            733         43          5
dc_histogram_defs                     56671        103       1828
kqlsubheap_object                         0          0          0
dc_table_scns                           121          0          0
...

You should look for high values in the Conflicts column. In the preceding example, there have been 43 conflicts for sequences. Uncached sequences are one of the most common causes of performance issues in RAC.
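
For example, if the conflicts were traced to a particular application sequence, increasing its cache size would reduce the dictionary cache activity. The sequence name below is illustrative:

-- Increase the cache so fewer dictionary cache updates are required;
-- in RAC, a generously cached NOORDER sequence minimizes inter-instance coordination
ALTER SEQUENCE tpcc.orders_seq CACHE 1000 NOORDER;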

Lock Conversions

In Oracle 11g, most of the statistics related to lock conversions can be obtained from V$GES_CONVERT_LOCAL and V$GES_CONVERT_REMOTE. These views show the number of lock conversions on the local system and on remote systems, respectively:

SQL> li
  1  SELECT convert_type,average_convert_time, convert_count
  2* FROM v$ges_convert_local
SQL> /
CONVERT_TYPE     AVERAGE_CONVERT_TIME CONVERT_COUNT
---------------- -------------------- -------------
NULL -> SS                          0             0
NULL -> SX                          2             3
NULL -> S                           1         24064
NULL -> SSX                         5           627
NULL -> X                           3        166358
SS   -> SX                          0             0
SS   -> S                           0             0
SS   -> SSX                         0             0
SS   -> X                           0             0
SX   -> S                           0             0
SX   -> SSX                         0             0
SX   -> X                           0             0
S    -> SX                          0             0
S    -> SSX                         0             0
S    -> X                           1          1693
SSX  -> X                           1            12

Lock conversions are essential to the efficient operation of a RAC database and are not necessarily harmful. However, it is important to check that lock conversions, like lock requests, are not being blocked by instances holding incompatible locks.

To check which instances are currently blocking other instances, use the following query:

SQL> li
  1  SELECT
  2    dl.inst_id,
  3    s.sid,
  4    p.spid,
  5    dl.resource_name1,
  6    decode (substr (dl.grant_level,1,8),
  7      'KJUSERNL','Null',
  8      'KJUSERCR','Row-S (SS)',
  9      'KJUSERCW','Row-X (SX)',
 10      'KJUSERPR','Share',
 11      'KJUSERPW','S/Row-X (SSX)',
 12      'KJUSEREX','Exclusive',
 13    grant_level) as grant_level,
 14    decode(substr(dl.request_level,1,8),
 15      'KJUSERNL','Null',
 16      'KJUSERCR','Row-S (SS)',
 17      'KJUSERCW','Row-X (SX)',
 18      'KJUSERPR','Share',
 19      'KJUSERPW','S/Row-X (SSX)',
 20      'KJUSEREX','Exclusive',
 21    request_level) as request_level,
 22    decode(substr(dl.state,1,8),
 23      'KJUSERGR','Granted','KJUSEROP','Opening',
 24      'KJUSERCA','Cancelling',
 25      'KJUSERCV','Converting'
 26    ) as state,
 27    sw.event,
 28    sw.seconds_in_wait sec
 29  FROM
 30    gv$ges_enqueue dl,
 31    gv$process p,
 32    gv$session s,
 33    gv$session_wait sw
 34  WHERE blocker = 1
 35  AND (dl.inst_id = p.inst_id AND dl.pid = p.spid)
 36  AND (p.inst_id = s.inst_id AND p.addr = s.paddr)
 37  AND (s.inst_id = sw.inst_id AND s.sid = sw.sid)
 38* ORDER BY sw.seconds_in_wait DESC
SQL> /

   INST_ID        SID SPID                     RESOURCE_NAME1
---------- ---------- ------------------------ ------------------------------
GRANT_LEVEL   REQUEST_LEVEL STATE
------------- ------------- ----------
EVENT                                                                   SEC
---------------------------------------------------------------- ----------
         2        127 5520                 [0x19][0x2],[RS]
Exclusive     Exclusive
rdbms ipc message                                                         2

         1        131 28215                    [0x1c000a][0x18a12],[TX]
Exclusive     Exclusive
gc cr request                                                             0

         2        126 5496                     [0x110012][0x5659b],[TX]
Exclusive     Exclusive
ges remote message

To discover which sessions are currently being blocked, first find this line in the preceding query:

>   WHERE blocker = 1

Once you locate the line, change it to this instead:

>   WHERE blocked = 1

Automatic Diagnostic Repository

In addition to proactively monitoring your Oracle Database performance, you should also be aware of any faults or errors in the software environment that may assist you in diagnosing the underlying reason for performance issues. In an Oracle 11g RAC environment, the Oracle Database Fault Diagnosability Infrastructure provides a central point for such fault diagnosis. Within this infrastructure, we will focus on the Automatic Diagnostic Repository (ADR), Trace Files, and the Oracle Database Health Monitor.

The ADR provides a framework for diagnosing faults in the Oracle Database software environment, even when the Database itself is unavailable. For this reason, the ADR is maintained as a directory structure external to the Database. The root of this directory structure is known as the ADR base, and it is defined by the DIAGNOSTIC_DEST init.ora parameter. If this parameter isn't set, then DIAGNOSTIC_DEST defaults to the ORACLE_BASE directory, as shown here:

SQL> show parameter diag

NAME                                 TYPE     VALUE
---------------------------- ----------- --------------------------
diagnostic_dest                      string   /u01/app/oracle

All of the nodes in the cluster may share a single ADR base on a shared cluster file system; however, this cluster file system should be separate from the Oracle Database environment. For this reason, OCFS2 would be preferable to ACFS, because the ADR must remain available even if the ASM instance is not. Alternatively, a separate ADR base can be maintained on each node in the cluster. Underneath the ADR base, each instance on a node has its own ADR home. Typically, in a RAC environment based on non-shared storage, this means that there is an ADR home for the ASM instance and an ADR home for the database instance. All ADR homes are located under the diag directory, beneath rdbms for database instances and asm for ASM instances, and the actual ADR home is named after the database and instance. At the operating system level, you can point diagnostic tools at the ADR home of a particular instance by setting the ADR_HOME environment variable. For example, this snippet sets the ADR home to a RAC database instance:

[oracle@london1 ˜]$ export ADR_HOME=/u01/app/oracle/diag/rdbms/prod/PROD1

ADR homes share a common directory structure, and you can view the directory structure for a particular instance in the view V$DIAG_INFO or for all instances in GV$DIAG_INFO:

SQL> select name, value from v$diag_info;

NAME
----------------------------------------------------------------
VALUE
----------------------------------------------------------------
Diag Enabled
TRUE
ADR Base
/u01/app/oracle
ADR Home
/u01/app/oracle/diag/rdbms/prod/PROD1
Diag Trace
/u01/app/oracle/diag/rdbms/prod/PROD1/trace
Diag Alert
/u01/app/oracle/diag/rdbms/prod/PROD1/alert
Diag Incident
/u01/app/oracle/diag/rdbms/prod/PROD1/incident
Diag Cdump
/u01/app/oracle/diag/rdbms/prod/PROD1/cdump
Health Monitor
/u01/app/oracle/diag/rdbms/prod/PROD1/hm
Default Trace File
/u01/app/oracle/diag/rdbms/prod/PROD1/trace/PROD1_ora_2404.trc
Active Problem Count
1
Active Incident Count
1

The ADR is managed and the collected information viewed with the adrci command-line utility:

[oracle@london1 PROD1]$ adrci

ADRCI: Release 11.2.0.1.0 - Production on Wed Feb 17 09:40:32 2010

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

ADR base = "/u01/app/oracle"
adrci>

If you have not set the ADR_HOME environment variable, then the adrci commands will report on all ADR homes underneath the ADR base directory. However, some commands do not support multiple ADR homes, as in this example:

adrci> show control;
DIA-48448: This command does not support multiple ADR homes

For commands that don't support multiple ADR homes, it is necessary to either set the ADR_HOME environment variable externally or set the home path within adrci using the set homepath command. The show control command details configuration information for the ADR, including the incident retention policies. The variables SHORTP_POLICY and LONGP_POLICY show the retention periods for incident files and incident metadata, which default to 720 hours (30 days) and 8,760 hours (365 days), respectively:

adrci> show control

ADR Home = /u01/app/oracle/diag/rdbms/prod/PROD1:
*************************************************************************
ADRID                SHORTP_POLICY        LONGP_POLICY       LAST_MOD_TIME                         LAST_AUTOPRG_TIME                        LAST_MANUPRG_TIME                      ADRDIR_VERSION       ADRSCHM_VERSION      ADRSCHMV_SUMMARY   ADRALERT_VERSION     CREATE_TIME
-------------------- -------------------- -------------------- -------------------------------------- ---------------------------------------- -------------------------------------- -------------------- -------------------- ------------------ -------------------- --------------------------------------
2978996791           720                  8760                 2010-02-01 14:40:45.803438 +00:00        2010-02-15 02:08:41.049771 +00:00                           1                    2                    76          1                  2010-02-01 14:40:45.803438 +00:00
1 rows fetched

These retention policies can be user-modified for a particular environment.
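
For example, after setting the home path, the retention periods could be reduced with the set control command. The values shown here are illustrative:

adrci> set homepath diag/rdbms/prod/PROD1
adrci> set control (SHORTP_POLICY = 360)
adrci> set control (LONGP_POLICY = 4380)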

One interesting feature of adrci is that it provides a central tool for viewing the database alert log. If the ADR home is not set, you have the option of selecting which alert log you wish to view, as in this example:

adrci> show alert

Choose the alert log from the following homes to view:

1: diag/asm/+asm/+ASM1
2: diag/tnslsnr/london1/listener_scan2
3: diag/tnslsnr/london1/listener
4: diag/tnslsnr/london1/listener_scan1
5: diag/tnslsnr/london1/listener_scan3
6: diag/rdbms/prod/PROD1
Q: to quit

Alternatively, if the ADR home is set, you will view the alert log for that instance. The show alert command also supports standard Linux-style options to tail the alert log for messages in real time. The following example shows the alert log revealing an issue with "Checkpoint not complete" messages that will impact performance (you can learn more about the reasons for this in Chapter 4):

adrci> show alert -tail -f
2010-02-16 22:00:39.917000 +00:00
Checkpoint not complete
  Current log# 2 seq# 1668 mem# 0: +DATA/prod/onlinelog/group_2.262.709828935
2010-02-16 22:00:42.928000 +00:00
Thread 1 advanced to log sequence 1669 (LGWR switch)
  Current log# 1 seq# 1669 mem# 0: +DATA/prod/onlinelog/group_1.261.709828935
2010-02-16 22:01:09.935000 +00:00
Thread 1 cannot allocate new log, sequence 1670
Checkpoint not complete
  Current log# 1 seq# 1669 mem# 0: +DATA/prod/onlinelog/group_1.261.709828935
2010-02-16 22:01:12.945000 +00:00
Thread 1 advanced to log sequence 1670 (LGWR switch)
  Current log# 2 seq# 1670 mem# 0: +DATA/prod/onlinelog/group_2.262.709828935

The alert log read by adrci is in XML format. For this reason, there is also a text version of the alert log in the trace directory that can be read with the standard Linux commands, as in this example:

/u01/app/oracle/diag/rdbms/prod/PROD1/trace/alert_PROD1.log

You may have also observed that the example information from V$DIAG_INFO revealed an active incident. You can view this incident and the time that it was raised with the show incident command, as in this example:

adrci> show incident

ADR Home = /u01/app/oracle/diag/rdbms/prod/PROD1:
***********************************************************************
INCIDENT_ID          PROBLEM_KEY  CREATE_TIME
-------------------- --------------------------------------------------------- ----------------------------------------
27729                ORA 29770  2010-02-09 00:45:35.477000 +00:00
1 rows fetched

You can also use adrci to create a package for this incident and create a zip file that includes all of the related information for it; this can be done either for your own benefit or to upload data to Oracle support:

adrci> ips create package incident 27729
Created package 1 based on incident id 27729, correlation level typical
adrci> ips generate package 1 in /tmp
cp: omitting directory `/u01/app/11.2.0/grid/log/london1/racg/racgmain'
cp: omitting directory `/u01/app/11.2.0/grid/log/london1/racg/racgeut'
cp: omitting directory `/u01/app/11.2.0/grid/log/london1/racg/racgevtf'
Generated package 1 in file /tmp/ORA29770_20100217094458_COM_1.zip, mode complete

In this next example, the incident trace file reveals that the incident is indeed cluster related. The file also provides a focus for investigating why performance was impacted at the point in time that the incident was raised:

ORA-29770: global enqueue process LMS0 (OSID 7419) is hung for more than 70 seconds

========= Dump for incident 27729 (ORA 29770) ========
----- Beginning of Customized Incident Dump(s) -----
==================================================
=== LMS0 (ospid: 7419) Heartbeat Report
==================================================
LMS0 (ospid: 7419) has no heartbeats for 100 sec. (threshold 70 sec)
  : Not in wait; last wait ended 64 secs ago.
  : last wait_id 1398345 at 'latch: gc element'.
==============================
Dumping PROCESS LMS0 (ospid: 7419) States
...

In addition to reactive incident response and monitoring, the ADR also provides a framework for proactively running verification checks against the database with a tool called Health Monitor. Health checks run a diagnosis to detect and report on faults such as datafile corruptions, and the list of checks can be viewed in V$HM_CHECK:

1* select name from v$hm_check
SQL> /
NAME
----------------------------------------------------------------
HM Test Check
DB Structure Integrity Check
CF Block Integrity Check
Data Block Integrity Check
Redo Integrity Check
Logical Block Check
Transaction Integrity Check
Undo Segment Integrity Check
No Mount CF Check
Mount CF Check
CF Member Check
All Datafiles Check
Single Datafile Check
Tablespace Check Check
Log Group Check
Log Group Member Check
Archived Log Check
Redo Revalidation Check
IO Revalidation Check
Block IO Revalidation Check
Txn Revalidation Check
Failure Simulation Check
Dictionary Integrity Check
ASM Mount Check
ASM Allocation Check
ASM Disk Visibility Check
ASM File Busy Check

Health checks can be run automatically in response to incidents. However, some health checks can also be run manually with the DBMS_HM package, as in this example:

SQL> exec dbms_hm.run_check('DB Structure Integrity Check');

PL/SQL procedure successfully completed.

Details of the health check can be viewed with adrci:

adrci> show hm_run;
...
**********************************************************
HM RUN RECORD 3
**********************************************************
   RUN_ID                        61
   RUN_NAME                      HM_RUN_61
   CHECK_NAME                    DB Structure Integrity Check
   NAME_ID                       2
   MODE                          0
   START_TIME                    2010-02-17 10:54:29.982586 +00:00
   RESUME_TIME                   <NULL>
   END_TIME                      2010-02-17 10:54:29.985681 +00:00
   MODIFIED_TIME                 2010-02-17 10:54:29.985681 +00:00
   TIMEOUT                       0
   FLAGS                         0
   STATUS                        5
   SRC_INCIDENT_ID               0
   NUM_INCIDENTS                 0
   ERR_NUMBER                    0
   REPORT_FILE                   <NULL>
...

You can use adrci to create a report and view the findings:

adrci> create report hm_run HM_RUN_61;
adrci> show report hm_run HM_RUN_61;
<?xml version="1.0" encoding="US-ASCII"?>
<HM-REPORT REPORT_ID="HM_RUN_61">
    <TITLE>HM Report: HM_RUN_61</TITLE>
    <RUN_INFO>
        <CHECK_NAME>DB Structure Integrity Check</CHECK_NAME>
        <RUN_ID>61</RUN_ID>
        <RUN_NAME>HM_RUN_61</RUN_NAME>
        <RUN_MODE>MANUAL</RUN_MODE>
        <RUN_STATUS>COMPLETED</RUN_STATUS>
        <RUN_ERROR_NUM>0</RUN_ERROR_NUM>
        <SOURCE_INCIDENT_ID>0</SOURCE_INCIDENT_ID>
        <NUM_INCIDENTS_CREATED>0</NUM_INCIDENTS_CREATED>
        <RUN_START_TIME>2010-02-17 10:54:29.982586 +00:00</RUN_START_TIME>
        <RUN_END_TIME>2010-02-17 10:54:29.985681 +00:00</RUN_END_TIME>
    </RUN_INFO>
    <RUN_PARAMETERS/>
    <RUN-FINDINGS/>
</HM-REPORT>

You can also view the generated report under the ADR hm directory, which in this example is /u01/app/oracle/diag/rdbms/prod/PROD1/hm, where the file HMREPORT_HM_RUN_61.hm was generated. This example didn't produce any findings; but if you do encounter findings in a run, their impact can also be seen in the V$HM_FINDING view:

SQL>  select description, damage_description from v$hm_finding where run_id = 62;
...
Datafile 4: '+DATA/prod/datafile/users.259.709828865' needs media recovery
Some objects in tablespace USERS might be unavailable
...

An additional feature provided by the ADR is the ability to view and search the database trace files gathered in the trace directory under the ADR home:

/u01/app/oracle/diag/rdbms/prod/PROD1/trace

The show tracefile command lists all of the trace files in this directory. The adrci utility also supports searching for strings within trace file names and showing the trace files related to a particular incident:

adrci> show tracefile -I 27729

diag/rdbms/prod/PROD1/incident/incdir_27729/PROD1_lmhb_7430_i27729.trc

Once a trace file has been found, its contents can also be viewed under adrci, as in this example:

adrci> show trace PROD1_lmhb_7430_i27729.trc
...

In addition to viewing system-generated trace files, you can also generate trace files of your own applications. The DBMS_MONITOR package provides a single API interface from which you can enable and disable trace for a specific session, module, action, or client identifier. In addition, DBMS_MONITOR provides other functionality to enable more granular collection of statistics for specified modules, actions, and client identifiers.

To enable tracing in a specific session, you first need to identify the SID and, optionally, the serial number of the session. This information can be obtained from V$SESSION:

SQL> SELECT sid, serial# FROM v$session WHERE username = 'TPCC';

Note that you may need to join V$SESSION to other dynamic performance views, such as V$SQL, to identify the session of interest.
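
For example, a sketch of such a join, listing the TPCC sessions together with the statement each is currently executing, might look like this:

SQL> SELECT s.sid, s.serial#, q.sql_text
  2  FROM   v$session s, v$sql q
  3  WHERE  s.sql_id = q.sql_id
  4  AND    s.sql_child_number = q.child_number
  5  AND    s.username = 'TPCC';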

To enable tracing for a specific session, use the SESSION_TRACE_ENABLE procedure. For example, you can use the following line to enable tracing for a session with a SID of 164:

SQL> EXECUTE dbms_monitor.session_trace_enable (session_id=>164);

This command will immediately enable tracing for the specified session; the result will be appended to the current trace file.

You can also include bind variable information in the same trace file:

SQL> EXECUTE dbms_monitor.session_trace_enable (session_id=>164,binds=>true);

By default, wait information is included in the trace file. You can specify that wait events should be traced using this snippet:

SQL> EXECUTE dbms_monitor.session_trace_enable (session_id=>164,waits=>true);

You can also include both bind and wait information in the same trace file. Tracing can be disabled for the same session with this snippet:

SQL> EXECUTE dbms_monitor.session_trace_disable (session_id=>164);

You can also enable tracing for all sessions connected to the database or instance with the DATABASE_TRACE_ENABLE procedure. Similarly, you can enable tracing for a particular service, module, or action with SERV_MOD_ACT_TRACE_ENABLE, as in the sketch that follows.
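
The following minimal sketch shows both procedures and their corresponding disable calls; the service and module names are illustrative:

SQL> EXECUTE dbms_monitor.database_trace_enable (waits=>TRUE, binds=>FALSE, instance_name=>'PROD1');
SQL> EXECUTE dbms_monitor.database_trace_disable (instance_name=>'PROD1');

SQL> EXECUTE dbms_monitor.serv_mod_act_trace_enable (service_name=>'SERV1', module_name=>'NEW_ORDER');
SQL> EXECUTE dbms_monitor.serv_mod_act_trace_disable (service_name=>'SERV1', module_name=>'NEW_ORDER');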

Note that enabling tracing for all sessions will generate a large amount of trace output very quickly in the ADR trace directory. Enabling tracing globally will also slow down the database, so it is advisable not to enable it at the instance level on a production system.

Once you have generated your trace file, you can either view the contents directly or use the standard Oracle utility tkprof to provide a summary of the statements processed by the session when tracing was enabled and the events on which the statements waited. The following excerpt generated by tkprof illustrates how tracing permits you to view the activities of a session with finer granularity:

...
UPDATE DISTRICT SET D_YTD = D_YTD + :B3 WHERE D_W_ID = :B2 AND D_ID = :B1

call     count       cpu    elapsed       disk      query    current      rows
------- ------  -------- ---------- ---------- ---------- ----------  --------
Parse        0      0.00       0.00          0          0          0         0
Execute    391      0.17       0.77          0        868        464       391
Fetch        0      0.00       0.00          0          0          0         0
------- ------  -------- ---------- ---------- ---------- ----------  --------
total      391      0.17       0.77          0        868        464       391

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 85     (recursive depth: 1)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  ges message buffer allocation                 291        0.00          0.00
  gc current block 2-way                        196        0.00          0.07
  gc cr block busy                                7        0.00          0.00
  enq: TX - row lock contention                  71        0.02          0.53
  gc cr block 2-way                              25        0.00          0.00
  gc current block congested                      1        0.00          0.00
  gc current block busy                           8        0.00          0.00
  gc buffer busy acquire                          3        0.00          0.00
  KJC: Wait for msg sends to complete             2        0.00          0.00
  buffer busy waits                               1        0.00          0.00
******************************************************************************
...

Summary

In this chapter, we described some of the tools and techniques available to you for monitoring the performance of your RAC cluster at the Oracle level. In particular, we looked at the tools integrated with the Oracle 11g Manageability Infrastructure.

In the next chapter, we'll look at what to do at the Linux level. For example, you'll learn about the most important of the built-in Linux commands and tools to help you diagnose server and interconnect performance problems.
