Exadata Wait Events
The Oracle database is a very well-instrumented piece of code, and it has been so for quite a while. It keeps track of the amount of time spent in discrete operations via the use of wait events, unless the session in question is on the CPU. While the database software is quite complex, wait event analysis allows performance analysts to determine where the database is spending its time. Many difficult performance problems can be resolved by analyzing data from the wait interface and, more recently, the Active Session History (ASH). The introduction of Exadata has resulted in the creation of several new wait events to support the unique operations that are performed on the platform. This chapter will focus on describing these new events and how they relate to the activities actually being performed, while contrasting them with the wait events used by the database on non-Exadata platforms. It will also describe a few wait events that are not specific to Exadata but play an important role on Exadata platforms.
In some rare cases, the wait interface is not granular enough to allow you to work out what the database engine is spending time on. If you use your favorite search engine to search for “wait interface not enough,” you will find blog posts about that exact subject, as well as how the session statistics Oracle maintains can provide further insight into what a session does. Chapter 11 will help you understand the session statistics better. In most cases, though, an analysis of the wait events will be sufficient to troubleshoot the problem. A very useful piece of advice to this chapter’s author was not to get lost in the details too soon!
A wait event is actually a section of code that is timed and assigned a name. The pieces of code covered by these events are quite often discrete operating system calls, such as individual I/O requests, but some wait events cover considerably larger portions of code. The events may even include sections of code that contain other wait events. The naming of wait events has been somewhat inconsistent over the years, and many events have slightly misleading names. Even though some of the event names are acknowledged to be potentially misleading, Oracle has been understandably reluctant to change them. The Exadata platform has provided an excuse to rename some of the I/O-related events, and, as you will see shortly, the developers took the opportunity to do just that.
Wait events are externalized in many places in the database. The most common ways to view wait events are queries against the V$ views, tracing, and SQL Monitor. Common V$ views used to view wait event information are V$SESSION and V$SESSION_WAIT. These views show the current wait event for a session, but not the history of events. The introduction of Active Session History in 10g changed this for the better by sampling active sessions and recording relevant information for the performance analyst, but it requires an extra license. Always make sure that what you are doing is compliant with your licenses.
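As a quick sketch of the first approach, the following query against V$SESSION lists the current wait event for every session that is not idle. The WHERE clause is just one reasonable filter; adapt it to your needs:

```sql
SQL> select sid, serial#, event, wait_class, state
  2  from   v$session
  3  where  wait_class <> 'Idle';
```

Remember that this is only a point-in-time snapshot; by the time you re-run the query, the sessions may already be waiting on something else.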
If you want to get all waits for a session, you have the option to enable SQL tracing. The well-established SQL traces allow you to record SQL statements issued against the database as well as all the associated wait events. The raw trace file, which is found in the diagnostic destination for the database, is then post-processed and transformed into something more human-readable.
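One convenient, hedged way to enable such a trace for another session is the DBMS_MONITOR package; the session ID and serial number below are placeholder values you would look up in V$SESSION first:

```sql
SQL> -- sid/serial# are hypothetical values taken from V$SESSION
SQL> exec dbms_monitor.session_trace_enable( -
>      session_id => 123, serial_num => 4567, waits => true, binds => false)

SQL> -- ... let the workload run, then switch tracing off again:
SQL> exec dbms_monitor.session_trace_disable( -
>      session_id => 123, serial_num => 4567)
```

Setting waits => true is what causes the wait events to be written to the raw trace file, which you can then post-process with tkprof.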
An alternative to SQL tracing is the use of SQL Monitor. Thankfully, on the Exadata platform you will never be in a position where this great tool is technically unavailable (licensing, again, is a separate question): SQL Monitor was introduced in Oracle 11.1, which was the minimum supported version on Exadata. SQL Monitor allows you to peek into the execution of a single SQL statement in real time. During the statement’s execution, you can see exactly in which row source of the execution plan the RDBMS engine is spending its time, and you can see wait events, if applicable, as well. All of that is available even while the statement is still executing! To be able to make use of this great tool, you have to have the Diagnostics and Tuning Packs licensed for the database. The underlying technology enabling SQL Monitor is again Active Session History (ASH), which, as you just read, was introduced with Oracle 10g. ASH samples database activity every second and gathers information about active sessions, which it keeps in memory for about one hour at the one-second interval. After that, the information is aggregated and every tenth sample is persisted to disk in the SYSAUX tablespace. The persistent part is referred to as the Automatic Workload Repository (AWR). AWR resembles the STATSPACK tool only in that it keeps a record of database activity over time. The retention period for the information is user configurable. The amount of space it takes to store AWR information can be large, but you should not forget that more data makes comparisons with past events a lot easier. As with everything, you need to find the balance between disk space usage and the advantage of comparing a perceived performance problem with past events. In my opinion, the default retention period of eight days is far too short and should be increased.
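Checking and increasing the AWR retention is straightforward. The sketch below raises retention to 35 days (the parameter is expressed in minutes); adjust the value to your own space budget:

```sql
SQL> select snap_interval, retention from dba_hist_wr_control;

SQL> exec dbms_workload_repository.modify_snapshot_settings( -
>      retention => 35 * 24 * 60)
```

The first query shows the current snapshot interval and retention, so you can verify the change took effect.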
Events Specific to Exadata
There are actually no events that exist exclusively on the Exadata platform. Wait events are built into the database code. Since the compute nodes on Exadata run standard Oracle Database software, all the wait events that are used when invoking Exadata-specific features are available in databases that are running on non-Exadata platforms as well. But because the Exadata features are only available on the Exadata platform, no time is ever allocated to those events on other platforms. By way of proof, consider the following example, which compares the events from V$EVENT_NAME (which exposes the valid wait events) first on an Exadata Database Machine and then on a standard 12c Release 1 database on a non-Exadata platform:
SQL> select count(1) from v$event_name;
COUNT(1)
-----------
1650
SQL> select count(1) from v$event_name@lab12c;
COUNT(1)
-----------
1650
SQL> select name from v$event_name
2 minus
3 select name from v$event_name@lab12c;
no rows selected
So there are no differences in the events. This does make it a little difficult to come up with a list of “Exadata-only” events. The event names, however, are a good starting point.
The Oracle Exadata Storage Server Software User’s Guide 12c Release 1 provides a table of wait events. All of them start with the word “cell.” The manual lists ten such events. One of them (cell interconnect retransmit during physical read) still does not actually exist.
There is also another batch of event names with the word “cell” in their names that are not included in the documentation. Combining both the documented and undocumented events reveals the complete list of “cell” events. This list will be the starting point of this chapter. You can query V$EVENT_NAME for that list, and you will get the following results. Notice that most of the events are in one of the I/O classes. The number of wait events has not changed in 12.1.0.1.x; there are still the same 17 events as in 11.2.0.2, the release used for this chapter in the first edition of the book.
SYS:db12c1> select name,wait_class
2 from v$event_name
3 where name like 'cell%'
4 order by name;
NAME WAIT_CLASS
-------------------------------------------------- --------------------
cell list of blocks physical read User I/O
cell manager cancel work request Other
cell manager closing cell System I/O
cell manager discovering disks System I/O
cell manager opening cell System I/O
cell multiblock physical read User I/O
cell single block physical read User I/O
cell smart file creation User I/O
cell smart flash unkeep Other
cell smart incremental backup System I/O
cell smart index scan User I/O
cell smart restore from backup System I/O
cell smart table scan User I/O
cell statistics gather User I/O
cell worker idle Idle
cell worker online completion Other
cell worker retry Other
17 rows selected.
Oracle 12.1.0.2 was the first release to add new cell-related wait events. These events are covered later in this chapter, with one exception: the external table scan event belongs to a different Oracle product and will not be covered here.
The following sections will cover all of the relevant Exadata wait events, along with a few additional events that have special applicability to Exadata.
Plan Steps That Trigger Events
First, though, it might be interesting to see what operations (plan steps) cause the “cell” wait events to occur. Here is a query against DBA_HIST_ACTIVE_SESS_HISTORY on an active production system running on an Exadata system that shows cell events and the operations that caused them. The list is not exhaustive, of course; not every event is visible on every system!
SQL> select event, operation, count(*) from (
2 select sql_id, event, sql_plan_operation||' '||sql_plan_options operation
3 from DBA_HIST_ACTIVE_SESS_HISTORY
4 where event like 'cell %')
5 group by operation, event
6 order by 1,2,3
7 /
EVENT OPERATION COUNT(*)
---------------------------------- ---------------------------------------- ----------
cell list of blocks physical read 62
DDL STATEMENT 2
INDEX FAST FULL SCAN 1
INDEX RANGE SCAN 3060
INDEX STORAGE FAST FULL SCAN 7
INDEX STORAGE SAMPLE FAST FULL SCAN 10
INDEX UNIQUE SCAN 1580
INSERT STATEMENT 6
TABLE ACCESS BY GLOBAL INDEX ROWID 151
TABLE ACCESS BY INDEX ROWID 5458
TABLE ACCESS BY LOCAL INDEX ROWID 131
TABLE ACCESS STORAGE FULL 183
TABLE ACCESS STORAGE SAMPLE 2
TABLE ACCESS STORAGE SAMPLE BY ROWID RAN 1
cell multiblock physical read 3220
DDL STATEMENT 157
INDEX FAST FULL SCAN 94
INDEX RANGE SCAN 2
INDEX STORAGE FAST FULL SCAN 6334
INDEX STORAGE SAMPLE FAST FULL SCAN 429
UNIQUE SCAN 2
VIEW ACCESS STORAGE FULL 634
MAT_VIEW ACCESS STORAGE SAMPLE 56
TABLE ACCESS BY GLOBAL INDEX ROWID 5
TABLE ACCESS BY INDEX ROWID 484
TABLE ACCESS BY LOCAL INDEX ROWID 3
TABLE ACCESS STORAGE FULL 41559
TABLE ACCESS STORAGE SAMPLE 1763
TABLE ACCESS STORAGE SAMPLE BY ROWID RAN 78
UPDATE 4
cell single block physical read 181186
BUFFER SORT 1
CREATE TABLE STATEMENT 67
DDL STATEMENT 985
DELETE 11204
DELETE STATEMENT 6
FIXED TABLE FIXED INDEX 352
FOR UPDATE 27
HASH GROUP BY 3
HASH JOIN 14
HASH JOIN RIGHT OUTER 1
INDEX BUILD NON UNIQUE 80
INDEX BUILD UNIQUE 6
INDEX FAST FULL SCAN 9
INDEX FULL SCAN 1101
INDEX RANGE SCAN 17597
INDEX RANGE SCAN (MIN/MAX) 1
INDEX RANGE SCAN DESCENDING 6
INDEX SKIP SCAN 691
INDEX STORAGE FAST FULL SCAN 313
INDEX STORAGE SAMPLE FAST FULL SCAN 72
INDEX UNIQUE SCAN 30901
INSERT STATEMENT 5174
LOAD AS SELECT 120
LOAD TABLE CONVENTIONAL 5827
MAT_VIEW ACCESS STORAGE FULL 3
MAT_VIEW ACCESS STORAGE SAMPLE 1
MERGE 12
PX COORDINATOR 1
SELECT STATEMENT 978
SORT CREATE INDEX 1
SORT GROUP BY 1
SORT JOIN 5
SORT ORDER BY 2
TABLE ACCESS BY GLOBAL INDEX ROWID 5812
TABLE ACCESS BY INDEX ROWID 65799
TABLE ACCESS BY LOCAL INDEX ROWID 4591
TABLE ACCESS BY USER ROWID 464
TABLE ACCESS CLUSTER 57
TABLE ACCESS STORAGE FULL 7168
TABLE ACCESS STORAGE SAMPLE 205
TABLE ACCESS STORAGE SAMPLE BY ROWID RAN 24
UNION-ALL 7
UPDATE 89353
UPDATE STATEMENT 367
WINDOW CHILD PUSHED RANK 2
WINDOW SORT 1
WINDOW SORT PUSHED RANK 1
cell smart file creation 35
DELETE 3
INDEX BUILD NON UNIQUE 5
LOAD AS SELECT 3
LOAD TABLE CONVENTIONAL 1
UPDATE 1
cell smart incremental backup 714
cell smart index scan 14
INDEX STORAGE FAST FULL SCAN 42
INDEX STORAGE SAMPLE FAST FULL SCAN 32
cell smart table scan 163
MAT_VIEW ACCESS STORAGE FULL 1
TABLE ACCESS STORAGE FULL 12504
Again, this output does not show all the possible combinations, but it should give you an idea of the relative frequency of events and which operations generally motivate them.
Exadata Wait Events in the User I/O Class
The User I/O class is far and away the most important for Exadata. The most interesting events in this category are, of course, the two Smart Scan events (cell smart table scan and cell smart index scan). These are the events that record time for the primary query offloading optimizations provided by Exadata, which primarily include predicate filtering, column projection, and storage index usage. The User I/O class also contains three events described as physical I/O events. These three events actually measure time for physical I/O using the more familiar multi-block and single-block read mechanisms you are used to seeing on non-Exadata platforms, although their names have been changed to something a little more meaningful. Finally, there are two events that do not really seem to belong in the User I/O category at all. One has to do with initialization of blocks when file space is allocated. The other has to do with gathering statistical information from the storage cells. Cell software 12.1.2.1 and database 12.1.0.2 also introduced three new minor user I/O-related cell events that did not exist in 12.1.0.1 or 11.2.0.3. Each of these wait events is covered in turn in the next several sections, starting with the Smart Scan events.
cell smart table scan
The cell smart table scan event is what Oracle uses to account for time spent waiting for full table scans that are offloaded. It is the most important event on the Exadata platform for reporting workloads. Its presence or absence can be used to verify whether a statement benefited from offloading or not. As discussed in Chapter 2, offloading only occurs when Oracle is able to do direct path reads. Consequently, this event replaces the direct path read event in most cases on Exadata. As with non-Exadata direct path reads, data is returned directly to the PGA of the requesting process on the database server (either the user’s shadow process or a parallel slave process). Blocks are not returned to the buffer cache.
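A simple way to check for the event’s presence is to look at ASH for the statement in question (Diagnostics Pack license required). The sql_id below is the one from the trace excerpt later in this section; substitute your own:

```sql
SQL> select event, count(*)
  2  from   v$active_session_history
  3  where  sql_id = '5bc0v4my7dvr5'   -- your sql_id here
  4  and    event like 'cell smart%'
  5  group  by event;
```

If the statement was offloaded, you should see samples on cell smart table scan; if it fell back to block shipping, you will see the conventional read events instead.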
Event Meaning
Although the mechanism for performing reads across the InfiniBand network is very different from that for normal reads on non-Exadata platforms, the code path driving Smart Scans is actually very similar to a direct path read on a non-Exadata platform. The main difference lies in the fact that each request to a storage cell contains a reference to the metadata of the statement, which in the case of Exadata includes, among other things, the predicates and the list of columns to be returned. Since the storage cells have access to this information, they can apply the filters and do the column projection before returning the data to the requesting process. These optimizations are applied to each set of blocks as they are requested. The processes on the database servers that request the data have access to the ASM extent map and can, therefore, request the required allocation units (AUs) from each storage cell. The storage cells read the requested AUs (unless storage indexes allow portions to be skipped) and apply the predicate filters, among other tasks. If any rows satisfy the filters, the cells return the projected columns to the requesting process. The process then requests the next AU, and the entire routine is repeated until all the data has been scanned. So this event will occur repeatedly in a large scan. It is important to note that, unlike some other I/O-related events, a single Smart Scan event cannot be used to derive I/O performance. Instead, you can use OEM 12c or the cellcli command to access metric information on the cells. We cover this in Chapter 12 in more detail.
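As a sketch of the CellCLI option, you can list current metrics directly on a storage server. The metric name filter below is just an example of cell disk I/O timing metrics; your cell software version may expose a slightly different metric catalog:

```
CellCLI> list metriccurrent where objectType = 'CELLDISK' and name like 'CD_IO_TM_.*'
```

Run from a database server, the same command can be fanned out to all cells with dcli against your cell group file.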
Note Column Projection is one of the major optimizations provided by Smart Scans. The feature is slightly misunderstood. It does not pass only columns in the select list back to the database servers; it also passes back some columns from the WHERE clause. Older versions of cellsrv passed all columns specified in a WHERE clause back to the database tier. Later versions have corrected this behavior to include only columns that are involved in joining predicates. You can see the columns projected in the output from DBMS_XPLAN.DISPLAY_CURSOR, for example.
As with all wait events, the cell smart table scan event will be recorded in a SQL trace. But with Exadata, you might want to investigate further, since a trace file is not limited to containing information for a single event you set. It may not immediately be obvious that you can combine the output of multiple trace events in a single trace file. For Smart Scans, you could enable tracing for LIBCELL’s client library (mainly useful for inter-system communication), for the Exadata Smart Scan layer (KXD), and finally SQL Trace, to get a combined picture of everything that is happening. Consider the following snippet to enable lots of tracing. You would not normally need to trace that much information, but it is extremely useful for research! Needless to say, you do not enable these events in production. This is for a development system only, outside of production and disaster recovery environments. The amount of information in the trace can be massive, filling up the /u01 mount point and effectively causing an outage.
SQL> select value from v$diag_info where name like 'Default%';
VALUE
----------------------------------------------------------------------
/u01/app/oracle/diag/rdbms/db12c/db12c1/trace/db12c1_ora_11916.trc
SQL> alter session set events 'trace[LIBCELL.Client_Library.*] disk=highest';
Session altered.
SQL> alter session set events 'sql_trace level 8';
Session altered.
SQL> alter session set events 'trace[KXD.*] disk=highest';
Session altered.
SQL> select count(*) from t1;
COUNT(*)
----------
33554432
Oracle has been so kind as to document many of its traceable components, so you can actually see the code prefixes and map them to their respective code layer. The command to get the magic documentation is oradebug. Use oradebug doc event name and oradebug doc component to get more information about what can be traced, as in this example for LIBCELL.
SQL> oradebug doc component libcell
Components in library LIBCELL:
--------------------------
Client_Library Client Library
Disk_Layer Disk Layer
Network_Layer Network Layer
IPC_Layer IPC Layer
The asterisk in the above code example includes every sublayer without having to explicitly specify it. Be warned that the resulting trace will most likely be huge and can cause space problems in your database mount point. But let’s not digress—back to the SQL Trace. When tracing SQL alone, the trace file shows the following lines:
=====================
PARSING IN CURSOR #140...784 len=23 dep=0 uid=198 oct=3 lid=198 tim=3471433479182
hv=4235652837 ad='808033f0' sqlid='5bc0v4my7dvr5'
select count(*) from t1
END OF STMT
PARSE #140...784:c=0,e=138,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3724264953,tim=3471433479181
EXEC #140...784:c=0,e=51,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3724264953,tim=3471433479280
WAIT #140...784: nam='SQL*Net message to client' ela= 4 driver id=1650815232 #bytes=1 p3=0
obj#=-1 tim=3471433479330
WAIT #140...784: nam='reliable message' ela= 968 channel context=10126085744 channel
handle=10164814784 broadcast message=10178993320 obj#=-1 tim=3471433480522
WAIT #140...784: nam='enq: KO - fast object checkpoint' ela= 194 name|mode=1263468550 2=65606
0=1 obj#=-1 tim=3471433480789
WAIT #140...784: nam='enq: KO - fast object checkpoint' ela= 125 name|mode=1263468545 2=65606
0=2 obj#=-1 tim=3471433480986
WAIT #140...784: nam='Disk file operations I/O' ela= 6 FileOperation=2 fileno=7 filetype=2
obj#=-1 tim=3471433481040
WAIT #140...784: nam='cell smart table scan' ela= 145 cellhash#=3249924569 p2=0 p3=0 obj#=61471
tim=3471433501731
WAIT #140...784: nam='cell smart table scan' ela= 149 cellhash#=674246789 p2=0 p3=0 obj#=61471
tim=3471433511233
WAIT #140...784: nam='cell smart table scan' ela= 143 cellhash#=822451848 p2=0 p3=0 obj#=61471
tim=3471433516295
WAIT #140...784: nam='cell smart table scan' ela= 244 cellhash#=3249924569 p2=0 p3=0 obj#=61471
tim=3471433561706
WAIT #140...784: nam='cell smart table scan' ela= 399 cellhash#=674246789 p2=0 p3=0
...
Unfortunately, Oracle changed the format of cursor identifiers somewhere in the Oracle 11g life cycle. The full cursor numbers would have caused the trace lines to wrap in the output above, which is why they have been truncated.
This portion of the trace file also shows the enq: KO - fast object checkpoint event, which is used to ensure that any dirty blocks for the scanned object are flushed to disk prior to beginning the scan. By the way, the direct path read event is not completely eliminated on Exadata platforms. In fact, it is possible to use a hint to disable offloading and see how the same statement behaves without offloading:
=====================
PARSING IN CURSOR #140...520 len=75 dep=0 uid=198 oct=3 lid=198 tim=3471777663904 hv=2068810426 ad='9114cd20' sqlid='44xptp5xnz2pu'
select /*+ opt_param('cell_offload_processing','false') */
count(*) from t1
END OF STMT
PARSE #140...520:c=2000,e=1381,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=3724264953,
tim=3471777663903
EXEC #140...520:c=0,e=41,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3724264953,tim=3471777663997
WAIT #140...520: nam='SQL*Net message to client' ela= 4 driver id=1650815232 #bytes=1 p3=0
obj#=61471 tim=3471777664057
WAIT #140...520: nam='enq: KO - fast object checkpoint' ela= 228 name|mode=1263468550 2=65677
0=2 obj#=61471 tim=3471777664460
WAIT #140...520: nam='reliable message' ela= 1036 channel context=10126085744 channel
handle=10164814784 broadcast message=10179010104 obj#=61471 tim=3471777665585
WAIT #140...520: nam='enq: KO - fast object checkpoint' ela= 134 name|mode=1263468550 2=65677
0=1 obj#=61471 tim=3471777665771
WAIT #140...520: nam='enq: KO - fast object checkpoint' ela= 126 name|mode=1263468545 2=65677
0=2 obj#=61471 tim=3471777665950
WAIT #140...520: nam='direct path read' ela= 807 file number=7 first dba=43778051 block cnt=13
obj#=61471 tim=3471777667104
WAIT #140...520: nam='direct path read' ela= 714 file number=7 first dba=43778081 block cnt=15
obj#=61471 tim=3471777668112
WAIT #140...520: nam='direct path read' ela= 83 file number=7 first dba=43778097 block cnt=15
obj#=61471 tim=3471777668322
WAIT #140...520: nam='direct path read' ela= 617 file number=7 first dba=43778113 block cnt=15
obj#=61471 tim=3471777669050
WAIT #140...520: nam='direct path read' ela= 389 file number=7 first dba=43778129 block cnt=15
obj#=61471 tim=3471777669549
Note that we still have the enq: KO – fast object checkpoint events for flushing dirty blocks. So, it is clear that the cell smart table scan event replaces the direct path read event, not the checkpoint-related events.
Parameters
The parameters for this event are not particularly informative. Only the object ID of the table being scanned and the cell hash number are provided.
You should use the obj# reported in the trace to look up the table name in DBA_OBJECTS. Remember that Smart Scans operate on the segment level; you will need to query for the data_object_id rather than the object_id. You will notice that the direct path read event (which cell smart table scan replaces) provides additional information, including the file number, the offset into the file (first dba), and the number of contiguous blocks read (block cnt). On the other hand, the direct path read event gives no indication of how the read requests are routed to the individual cells.
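For example, the obj#=61471 from the earlier trace excerpt can be resolved like this (note the use of data_object_id for the segment-level lookup):

```sql
SQL> select owner, object_name, object_type
  2  from   dba_objects
  3  where  data_object_id = 61471;
```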
The cell hash number reported with many of the wait events can be found in the V$CELL view. This view has only two columns, CELL_PATH and CELL_HASHVAL. The CELL_PATH column actually contains the IP address of the storage cell. If you trace using the KXD and LIBCELL components, you will see a lot more information before and after each wait event. The additional trace information is well suited to giving you extremely in-depth insights into Smart Scan processing. A full explanation of it does not fit the scope of this chapter; please refer back to Chapter 2.
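Resolving the cellhash# values from the trace is then a simple lookup; the hash value below is taken from the earlier trace lines:

```sql
SQL> select cell_path
  2  from   v$cell
  3  where  cell_hashval = 3249924569;
```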
Unlike some of the non-Exadata I/O events, the Smart Scan event is not really suited to time I/O completion. Remember from Chapter 2 that Smart Scans are fired off asynchronously against various threads in the cell—each is a separate logical unit of work. The Oracle software may read the results of these I/O events out of their order of execution. This, as well as the complicated nature of Smart Scan processing, makes it impossible to derive I/O latencies using this event.
cell smart index scan
Time is clocked to the cell smart index scan event when index fast full scans are performed that are offloaded. This event is analogous to cell smart table scan, except that the object being scanned is an index. This access path is not to be confused with the other index access paths available, such as index unique, range, or full scans; those use single-block I/O calls and by definition cannot be offloaded. The cell smart index scan wait replaces the direct path read event and returns data directly to the PGA of the requesting process as opposed to the buffer cache.
Event Meaning
This event does not show up very often on the systems we have observed, probably for several reasons.
In addition, only fast full scans of indexes are eligible for Smart Scans (range scans and full scans are not eligible). As a result of these issues, cell smart index scans are fairly rare compared to cell smart table scans. It is, of course, possible to encourage the feature with hints (such as parallel_index) or by decorating specific indexes with a parallel degree setting of greater than 1. Here is an excerpt from a 10046 trace file showing the event:
=====================
PARSING IN CURSOR #140694139370464 len=100 dep=1 uid=198 oct=3 lid=198 tim=3551928848463
hv=3104270335 ad='85f450b8' sqlid='fg97a72whftzz'
select /*+ monitor gather_plan_statistics parallel_index(t1) */
/* ffs_test_001 */
count(id) from t1
END OF STMT
PARSE #140694139370464:c=0,e=114,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=3371189879,
tim=3551928848462
WAIT #140694139370464: nam='cell smart index scan' ela= 173 cellhash#=674246789 p2=0 p3=0
obj#=78100 tim=3551928872228
WAIT #140694139370464: nam='cell smart index scan' ela= 127 cellhash#=674246789 p2=0 p3=0
obj#=78100 tim=3551928872442
WAIT #140694139370464: nam='cell smart index scan' ela= 8888 cellhash#=674246789 p2=0 p3=0
obj#=78100 tim=3551928881675
WAIT #140694139370464: nam='cell smart index scan' ela= 142 cellhash#=822451848 p2=0 p3=0
obj#=78100 tim=3551928900699
WAIT #140694139370464: nam='cell smart index scan' ela= 400 cellhash#=822451848 p2=0 p3=0
obj#=78100 tim=3551928901202
Note that this trace file was produced by one of the parallel slave processes and not the requesting process. Some of the PX messages have been removed for clarity. The trace produced for the same statement when offloading is disabled should look more familiar. Here is an excerpt, again from a query slave:
=====================
PARSING IN CURSOR #13...92 len=158 dep=1 uid=198 oct=3 lid=198 tim=3553633447559 hv=3263861856 ad='9180bab0' sqlid='35yb4ug18p530'
select /*+ opt_param('cell_offload_processing','false') monitor
gather_plan_statistics parallel_index(t1) */
/* ffs_test_002 */
count(id) from t1
END OF STMT
PARSE #13...92:c=0,e=152,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=3371189879,tim=3553633447557
WAIT #13...92: nam='PX Deq: Execution Msg' ela= 9671 sleeptime/senderid=268566527 passes=1
p3=9216340064 obj#=-1 tim=3553633458202
WAIT #13...92: nam='direct path read' ela= 5827 file number=7 first dba=39308443 block cnt=128 obj#=78100 tim=3553633464791
WAIT #13...92: nam='PX Deq: Execution Msg' ela= 62 sleeptime/senderid=268566527 passes=1
p3=9216340064 obj#=78100 tim=3553633481394
WAIT #13...92: nam='direct path read' ela= 2748 file number=7 first dba=39370251 block cnt=128
obj#=78100 tim=3553633484390
WAIT #13...92: nam='PX Deq: Execution Msg' ela= 64 sleeptime/senderid=268566527 passes=1
p3=9216340064 obj#=78100 tim=3553633497671
WAIT #13...92: nam='direct path read' ela= 4665 file number=7 first dba=39357143 block cnt=128
obj#=78100 tim=3553633502565
WAIT #13...92: nam='PX Deq: Execution Msg' ela= 70 sleeptime/senderid=268566527 passes=1
p3=9216340064 obj#=78100 tim=3553633518439
WAIT #13...92: nam='direct path read' ela= 3001 file number=7 first dba=39337739 block cnt=128
obj#=78100 tim=3553633521674
Compare this to the previous example where the cells offloaded the scan. In that example the cell smart index scan event replaces the direct path read. The enq: KO – fast object checkpoint events for flushing dirty blocks prior to starting the direct path reads are still present. However, they are not shown in this excerpt because they occur in the query coordinator process, not in the parallel slave processes.
Parameters
Just as with the cell smart table scan event, the parameters for cell smart index scan do not contain a lot of detail. The cell hash number and the object ID of the segment being scanned are the only information provided.
cell single block physical read
This event is equivalent to the db file sequential read event used on non-Exadata platforms. Single block reads are used most often for index access paths (both the index block reads and the table block reads via rowids from the index lookups). They can also be used for a wide variety of other operations where it makes sense to read a single block.
Event Meaning
Here is the output of a query that shows the operations that resulted in the cell single block physical read wait event on an active production system:
SQL> select event, operation, count(*) from (
2 select sql_id, event, sql_plan_operation||' '||sql_plan_options operation
3 from DBA_HIST_ACTIVE_SESS_HISTORY
4 where event like 'cell single%')
5 group by operation, event
6 order by 1,2,3
7 /
EVENT OPERATION COUNT(*)
--------------------------------- ---------------------------------------- ----------
cell single block physical read 13321
CREATE TABLE STATEMENT 35
DDL STATEMENT 118
DELETE 269
FIXED TABLE FIXED INDEX 3
FOR UPDATE 2
HASH JOIN 4
HASH JOIN RIGHT OUTER 8
INDEX FULL SCAN 9283
INDEX FULL SCAN (MIN/MAX) 1
INDEX RANGE SCAN 2763
INDEX STORAGE FAST FULL SCAN 6
INDEX STORAGE SAMPLE FAST FULL SCAN 13
INDEX UNIQUE SCAN 1676
INSERT STATEMENT 1181
LOAD AS SELECT 6
LOAD TABLE CONVENTIONAL 92
MERGE 106
SELECT STATEMENT 41
SORT ORDER BY 6
TABLE ACCESS BY GLOBAL INDEX ROWID 10638
TABLE ACCESS BY INDEX ROWID 8714
TABLE ACCESS BY LOCAL INDEX ROWID 10446
TABLE ACCESS CLUSTER 12
TABLE ACCESS STORAGE FULL 776
TABLE ACCESS STORAGE SAMPLE 40
UPDATE 8116
As you can see, row access via an index is the most common operation that generates this event. You should also be aware that Exadata provides a large amount of Flash Cache on each storage cell. For that reason, physical reads (both multi-block and single-block) are considerably faster than on most disk-based storage systems, even without offloading. Here is an excerpt from an AWR report showing a histogram of single-block reads for the instance:
                                                    % of Waits
                                  -----------------------------------------------
                            Total
Event                       Waits  <1ms  <2ms  <4ms  <8ms <16ms <32ms  <=1s   >1s
--------------------------- ----- ----- ----- ----- ----- ----- ----- ----- -----
cell single block physical  2940K  94.4   3.2    .3    .6    .9    .5    .2    .0
Notice that about 95 percent of the cell single block physical read events take less than 1ms. This is fairly representative of several production systems that we have observed.
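You can get a live version of this histogram for your own instance from V$EVENT_HISTOGRAM, where each bucket’s upper bound is given in milliseconds:

```sql
SQL> select wait_time_milli, wait_count
  2  from   v$event_histogram
  3  where  event = 'cell single block physical read'
  4  order  by wait_time_milli;
```

Note that these counters are cumulative since instance startup, whereas the AWR excerpt above covers only the snapshot interval.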
Parameters
The cell single block physical read event provides more information than most cell events. The parameters allow you to tell exactly which object was read, along with the disk and cell where the block was stored.
cell multiblock physical read
This is another renamed event. It is equivalent to the less clearly named db file scattered read event. On non-Exadata platforms, Oracle Database 11gR2 and 12c still use the db file scattered read event whenever they issue a contiguous multi-block buffered read to the operating system. The “scattered” in the old event name reflects how the data read from disk is stored in the buffer cache: scattered around. It is not a reflection of how the data is read from disk, which is actually contiguous.
Event Meaning
This event is generally associated with full table scans and fast full index scans, although it can occur with many other operations. The new name on the Exadata platform is much more descriptive than the older one. For reporting workloads, this wait event is not nearly as prevalent on Exadata platforms as on non-Exadata platforms, because Exadata handles many full scan operations with Smart Scans that have their own wait events (cell smart table scan and cell smart index scan). On Exadata, the cell multiblock physical read event is used for serial full scan operations on tables that are below the threshold for serial direct path reads. That is to say, you will see this event used most often on full scans of relatively small tables. On the other hand, if your queries are not offloaded (because of a very large buffer cache, for instance), you will see plenty of these events as well, since the threshold at which a Smart Scan or direct path read is initiated is much higher in that case.
The event is also used for fast full index scans that are not executed with direct path reads. Here is the output of a query that shows the operations that resulted in the cell multiblock physical read wait event on an active production system:
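The query that produced this output is not reproduced at this point; a reconstruction, modeled on the cell list of blocks query shown later in this chapter, would look like this:

```sql
-- Group ASH samples for the multiblock read event by plan operation.
SELECT event, operation, count(*)
FROM  (SELECT sql_id, event,
              sql_plan_operation || ' ' || sql_plan_options operation
       FROM   dba_hist_active_sess_history
       WHERE  event LIKE 'cell multiblock%')
GROUP BY operation, event
ORDER BY 1, 2, 3;
```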
EVENT OPERATION COUNT(*)
-------------------------------- ---------------------------------------- ----------
cell multiblock physical read 764
DDL STATEMENT 28
INDEX FAST FULL SCAN 2
INDEX STORAGE FAST FULL SCAN 657
INDEX STORAGE SAMPLE FAST FULL SCAN 133
TABLE ACCESS BY INDEX ROWID 74
TABLE ACCESS BY LOCAL INDEX ROWID 1428
TABLE ACCESS STORAGE FULL 5046
TABLE ACCESS STORAGE SAMPLE 916
Parameters
The cell multiblock physical read event also provides more information than most cell events. The parameters in the following list allow you to tell which object was read and identify the disk and cell where the blocks were stored. The total number of bytes read should be a multiple of the block size:
cell list of blocks physical read
This event is a replacement for the db file parallel read event on non-Exadata platforms. It appears that the developers took the opportunity to rename some of the events that are related to disk operations, and this is one of those events. The new name is actually much more descriptive than the previous name, since the wait event has nothing whatsoever to do with parallel query or parallel DML.
Event Meaning
This event is used for multi-block reads of noncontiguous blocks. Such reads are more efficient with asynchronous I/O, which is enabled on Exadata by default. This event can be provoked by several operations; the most common are index range scans, index unique scans, and table access by index rowid. The most common reason for the event is index prefetching. Here is the output of a query that shows the operations that resulted in this wait event on an Exadata system:
SQL> select event, operation, count(*) from (
2 select sql_id, event, sql_plan_operation||' '||sql_plan_options operation
3 from DBA_HIST_ACTIVE_SESS_HISTORY
4 where event like 'cell list%')
5 group by operation, event
6 order by 1,2,3
7 /
EVENT OPERATION COUNT(*)
---------------------------------------- ----------------------------------- -------------
cell list of blocks physical read 2
INDEX RANGE SCAN 156
INDEX STORAGE FAST FULL SCAN 1
INDEX UNIQUE SCAN 66
TABLE ACCESS BY GLOBAL INDEX ROWID 90
TABLE ACCESS BY INDEX ROWID 1273
TABLE ACCESS BY LOCAL INDEX ROWID 2593
TABLE ACCESS STORAGE FULL 20
TABLE ACCESS STORAGE SAMPLE 1
As you can see, the vast majority of these events were motivated by index access paths. By the way, on non-Exadata platforms, noncontiguous multi-block reads still clock time to the old db file parallel read event. It is also possible for this older wait event name to show up on an Exadata platform for some operations.
Parameters
The cell list of blocks physical read event provides more information than most cell events. The following parameters allow you to tell exactly which object was read along with identifying the disk and cell where the block was stored:
cell smart file creation
Exadata has an optimization technique that allows the storage cells to do the initialization of blocks when a data file is created or extended. This occurs when a tablespace is created or a data file is manually added to a tablespace. However, it can also occur when a data file is automatically extended during DML operations.
Event Meaning
You have previously read in this chapter that this event seems out of place in the User I/O class. However, when it occurs because of DML operations, it makes sense to have it in this category. At any rate, offloading the block formatting removes CPU usage and I/O from the database servers and moves the work to the storage tier. When this occurs, time is collected in the cell smart file creation event. This event replaces the Data file init write event that is still used on non-Exadata platforms. Here is the output of a query from a busy production system showing operations that generated this event:
SYS@EXDB1> select event, operation, count(*) from (
2 select sql_id, event, sql_plan_operation||' '||sql_plan_options operation
3 from DBA_HIST_ACTIVE_SESS_HISTORY
4 where event like 'cell smart file%')
5 group by operation, event
6 order by 1,2,3
7 /
EVENT OPERATION COUNT(*)
-------------------------- ------------------------- --------
cell smart file creation 35
DELETE 3
INDEX BUILD NON UNIQUE 5
LOAD AS SELECT 3
LOAD TABLE CONVENTIONAL 1
UPDATE 1
You will notice that on this particular system, the cell smart file creation event was occasionally generated by a DELETE statement. The fact that a DELETE could cause this event might be a little surprising. But remember that this event is actually timing a section of code that does block formatting, not file creation.
Parameters
The only parameter of interest for this event is P1, which shows which cell was being accessed when this event was generated:
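P1 carries a hash value rather than a cell name. If you need to translate it, the V$CELL view exposes both the cell path and its hash; the following sketch assumes access to that view:

```sql
-- Map the cellhash# seen in wait events back to a storage cell.
SELECT cell_path, cell_hashval
FROM   v$cell;
```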
cell statistics gather
The cell statistics gather event records time spent reading from various V$ and X$ tables. Although the event is grouped in the User I/O category, it does not refer to I/O in the sense of reading from and writing to disk.
Event Meaning
Time is clocked to this event when a session is reading from the V$CELL family of views and a few other X$ tables in the same category. The event is miscategorized, in our opinion, and does not really belong in the User I/O category. Here is a typical example:
PARSING IN CURSOR #140003020408112 len=79 dep=0 uid=0 oct=3 lid=0 tim=4163271218132 hv=2136254865 ad='a2c23650' sqlid='fcjwbgdzp9acj'
select count(cell_name),cell_name from V$CELL_THREAD_HISTORY group by cell_name
END OF STMT
PARSE #14...12:c=17997,e=23189,p=5,cr=99,cu=0,mis=1,r=0,dep=0,og=1,plh=3108513074,tim=4
WAIT #14...12: nam='Disk file operations I/O' ela= 34 FileOperation=8 fileno=0
filetype=8 obj#=262 tim=4163271218207
EXEC #14...12:c=0,e=54,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3108513074,tim=4...
WAIT #14...12: nam='SQL*Net message to client' ela= 2 driver id=1650815232 #bytes=1 p3=0
obj#=262 tim=4163271218304
WAIT #14...12: nam='cell statistics gather' ela= 162 cellhash#=0 p2=0 p3=0 obj#=262 tim=4163
WAIT #14...12: nam='cell statistics gather' ela= 248 cellhash#=0 p2=0 p3=0 obj#=262 tim=4163
WAIT #14...12: nam='cell statistics gather' ela= 236 cellhash#=0 p2=0 p3=0 obj#=262 tim=4163
WAIT #14...12: nam='cell statistics gather' ela= 250 cellhash#=0 p2=0 p3=0 obj#=262 tim=4163
Note You can read more about the V$CELL% family of views in Chapter 11.
Parameters
The parameters for this event provide no additional information. In fact, values are not even set for the parameters of this event. Here are the parameter definitions:
Minor Events in the User I/O Class
As you read in the introduction, Oracle 12.1.0.2 is the first release in a long time to introduce new Exadata-related wait events.
Although great care was taken when preparing this chapter, we could not find a way to generate waits for most of them. The two exceptions are cell single block read request and cell multi-block read request, which we observed during execution of DBMS_STATS.GATHER_SCHEMA_STATS. The corresponding SQL_IDs were a sure sign that the events had been triggered as part of the stats-gathering job; the TOP_LEVEL_SQL_ID can be used to confirm this.
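If you want to check which of these events exist in your release, a query along the following lines will list them together with their wait class (the LIKE pattern is an assumption; adjust it as needed):

```sql
-- Look for the new request-style cell events introduced in 12.1.0.2.
SELECT name, wait_class
FROM   v$event_name
WHERE  name LIKE 'cell%read request%'
ORDER  BY name;
```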
Exadata Wait Events in the System I/O Class
The Exadata wait events that are assigned to the System I/O class are of less importance and do not generally show up as major consumers of time. The backup events are the most interesting as they record time for sections of code that have been optimized on the Exadata platform. The others are simply housekeeping events. The non-backup events are listed in Table 10-1, while the backup events are detailed in the following sections.
Table 10-1. Miscellaneous System I/O Class Events
cell manager closing cell
This is a shutdown-related event. The cell hash number is contained in the P1 column of the v$session_wait view for this event. The P2 and P3 columns are unused.

cell manager discovering disks
This is a startup-related event. The cell hash number is contained in the P1 column of the v$session_wait view for this event. The P2 and P3 columns are unused.

cell manager opening cell
This is a startup-related event. The cell hash number is contained in the P1 column of the v$session_wait view for this event. The P2 and P3 columns are unused.
cell smart incremental backup
This event is used to measure time spent waiting on RMAN when doing an incremental level 1 backup. Exadata optimizes incremental backups by offloading much of the processing to the storage tier, and this wait event accounts for time spent waiting on that offloaded processing in the storage cells.
The processing for incremental backups in its unoptimized form is quite resource intensive. The backup process needs to capture every change to the data files since the previous incremental backup. A lot of I/O is required to perform this task, as the data files have to be scanned from start to finish. To reduce the time it takes to do a level 1 backup, Oracle introduced block change tracking long before Exadata. Block change tracking is implemented using a small binary file that is updated when redo is generated. The introduction of the block-change-tracking file sped up incremental backups considerably.
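For reference, block change tracking is switched on with a single statement; the destination file name below is purely illustrative:

```sql
-- Enable block change tracking; the file location is an assumption.
ALTER DATABASE ENABLE BLOCK CHANGE TRACKING
  USING FILE '+DATA/bct.f';
```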
Event Meaning
Interestingly, an incremental level 0 backup does not result in this wait event being written to a trace file, even though the word “incremental” is in the RMAN command. That is because the level 0 backup, despite its name, does not take an incremental backup at all. It generates a full backup that is flagged as a baseline for future incremental backups. Here is an excerpt from a 10046 trace file generated on a process that was creating an incremental level 1 backup, this time on 11.2.0.4 with block change tracking enabled:
*** 2014-07-06 15:23:11.171
*** SESSION ID:(200.471) 2014-07-06 15:23:11.171
*** CLIENT ID:() 2014-07-06 15:23:11.171
*** SERVICE NAME:(SYS$USERS) 2014-07-06 15:23:11.171
*** MODULE NAME:(backup incr datafile) 2014-07-06 15:23:11.171
*** ACTION NAME:(0000048 STARTED16) 2014-07-06 15:23:11.171
WAIT #0: nam='enq: TC - contention' ela= 30442809 name|mode=1413677062 checkpoint ID=65586 0=0
obj#=-1 tim=1404678191171373
WAIT #0: nam='enq: CF - contention' ela= 223 name|mode=1128660997 0=0 operation=0 obj#=-1
tim=1404678191172055
...
WAIT #0: nam='change tracking file synchronous read' ela= 523 block#=9344 blocks=64 p3=0
obj#=-1 tim=1404678192231084
WAIT #0: nam='change tracking file synchronous read' ela= 523 block#=9472 blocks=64 p3=0
obj#=-1 tim=1404678192231665
WAIT #0: nam='change tracking file synchronous read' ela= 501 block#=9920 blocks=64 p3=0
obj#=-1 tim=1404678192232261
WAIT #0: nam='cell smart incremental backup' ela= 150 cellhash#=379339958 p2=0 p3=0
obj#=-1 tim=1404678192234400
WAIT #0: nam='cell smart incremental backup' ela= 170 cellhash#=2133459483 p2=0 p3=0
obj#=-1 tim=1404678192235378
WAIT #0: nam='cell smart incremental backup' ela= 160 cellhash#=3176594409 p2=0 p3=0
obj#=-1 tim=1404678192237663
WAIT #0: nam='KSV master wait' ela= 70 p1=0 p2=0 p3=0 obj#=-1 tim=1404678192243295
WAIT #0: nam='KSV master wait' ela= 1438 p1=0 p2=0 p3=0 obj#=-1 tim=1404678192244762
WAIT #0: nam='ASM file metadata operation' ela= 57 msgop=33 locn=0 p3=0
obj#=-1 tim=1404678192244786
WAIT #0: nam='KSV master wait' ela= 63 p1=0 p2=0 p3=0 obj#=-1 tim=1404678192251632
WAIT #0: nam='KSV master wait' ela= 1340 p1=0 p2=0 p3=0 obj#=-1 tim=1404678192253010
WAIT #0: nam='ASM file metadata operation' ela= 62 msgop=33 locn=0 p3=0
obj#=-1 tim=1404678192253032
WAIT #0: nam='cell smart incremental backup' ela= 198 cellhash#=379339958 p2=0 p3=0
obj#=-1 tim=1404678192255205
WAIT #0: nam='cell smart incremental backup' ela= 214 cellhash#=2133459483 p2=0 p3=0
obj#=-1 tim=1404678192255495
WAIT #0: nam='cell smart incremental backup' ela= 212 cellhash#=3176594409 p2=0 p3=0
obj#=-1 tim=1404678192255802
WAIT #0: nam='cell smart incremental backup' ela= 10 cellhash#=379339958 p2=0 p3=0
obj#=-1 tim=1404678192256723
WAIT #0: nam='cell smart incremental backup' ela= 100 cellhash#=2133459483 p2=0 p3=0
obj#=-1 tim=1404678192256846
You can see the effect of the block-change-tracking file in the change tracking file synchronous read events. The actual backup work, scanning the data files for changes since the last backup, is performed on the cells, as shown by the cell smart incremental backup event. You can gauge the effectiveness of the offloading to the cells in the v$backup_datafile view by putting the blocks_skipped_in_cell column in relation to blocks_read.
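A query along the following lines can be used to quantify how many blocks the cells filtered out during level 1 backups (a sketch; the ratio calculation is one possible way to express the savings):

```sql
-- Relate blocks skipped by the cells to blocks actually read.
SELECT file#,
       blocks_read,
       blocks_skipped_in_cell,
       round(100 * blocks_skipped_in_cell /
             nullif(blocks_read + blocks_skipped_in_cell, 0), 1) pct_skipped
FROM   v$backup_datafile
WHERE  incremental_level = 1;
```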
Parameters
The only parameter used for this event is P1, which shows which cell was responsible for generating the event:
Note The obj# field is a part of many wait events, even some that are not specifically related to an individual object. Be aware that, in some cases, the value may be set by one event and then not cleared appropriately when the wait ends, resulting in meaningless values left in place for the next wait event. In the previous example, the obj# was cleared (set to a value of -1).
cell smart restore from backup
This event is used to measure time spent waiting on RMAN when doing a restore. Exadata optimizes RMAN restores by offloading processing to the storage cells.
Event Meaning
The event actually records time related to file initialization during a restore. Here’s an excerpt from a 10046 trace file taken while a restore was in progress:
WAIT #0: nam='KSV master wait' ela= 68 p1=0 p2=0 p3=0 obj#=-1 tim=1404740066131406
WAIT #0: nam='RMAN backup & recovery I/O' ela= 3320 count=1 intr=256 timeout=4294967295
obj#=-1 tim=1404740066135786
WAIT #0: nam='RMAN backup & recovery I/O' ela= 3160 count=1 intr=256 timeout=4294967295
obj#=-1 tim=1404740066139862
WAIT #0: nam='RMAN backup & recovery I/O' ela= 3201 count=1 intr=256 timeout=4294967295
obj#=-1 tim=1404740066143790
WAIT #0: nam='RMAN backup & recovery I/O' ela= 2 count=1 intr=256 timeout=2147483647
obj#=-1 tim=1404740066143822
WAIT #0: nam='cell smart restore from backup' ela= 853 cellhash#=2133459483 p2=0 p3=0
obj#=-1 tim=1404740066146969
WAIT #0: nam='cell smart restore from backup' ela= 137 cellhash#=379339958 p2=0 p3=0
obj#=-1 tim=1404740066148003
WAIT #0: nam='cell smart restore from backup' ela= 167 cellhash#=3176594409 p2=0 p3=0
obj#=-1 tim=1404740066148948
WAIT #0: nam='cell smart restore from backup' ela= 269 cellhash#=2133459483 p2=0 p3=0
obj#=-1 tim=1404740066149335
WAIT #0: nam='cell smart restore from backup' ela= 261 cellhash#=379339958 p2=0 p3=0
obj#=-1 tim=1404740066149767
WAIT #0: nam='cell smart restore from backup' ela= 375 cellhash#=3176594409 p2=0 p3=0
obj#=-1 tim=1404740066150294
WAIT #0: nam='cell smart restore from backup' ela= 124 cellhash#=3176594409 p2=0 p3=0
obj#=-1 tim=1404740066152236
Parameters
The only parameter used for this event is P1, which shows which cell was responsible for generating the event.
Exadata Wait Events in the Other and Idle Classes
These are relatively minor events that occur primarily during startup and shutdown of storage cells and under fault conditions. You will probably not see them on normally functioning systems. There is one exception to this: the cell smart flash unkeep event. Table 10-2 lists the “cell” wait events in the Other class along with their parameters. A separate section covers cell smart flash unkeep.
Table 10-2. Miscellaneous Other and Idle Class Events
cell manager cancel work request
This event is not very informative, as all three of the parameters (P1, P2, P3) from the v$session_wait view are unused.

cell worker online completion
This appears to be a startup event. The cell hash number is contained in the P1 column of the v$session_wait view for this event. The P2 and P3 columns are unused.

cell worker retry
The cell hash number is contained in the P1 column of the v$session_wait view for this event. The P2 and P3 columns are unused.

cell worker idle
The P1, P2, and P3 columns from the v$session_wait view are unused in this idle event.
cell smart flash unkeep
This event records the time spent waiting when Oracle must flush blocks out of Exadata Smart Flash Cache. This can occur when a table with a storage clause designating that it be pinned in Exadata Smart Flash Cache is truncated or dropped. This is an important distinction: beginning with Exadata 11.2.3.3.0, the cell software automatically caches data from single-block I/O as well as Smart Scans, which allows the next scan to read from disk and flash simultaneously, resulting in much better performance. Prior to 11.2.3.3.0, it was necessary to manually pin tables and partitions in Flash Cache. It is easier to understand this with an example. Consider the following use case: Table T1 in database DB12C has a data object ID of 79208. After the table has been created and subjected to light query activity in the form of Smart Scans, you can observe a pattern similar to this one:
[oracle@enkdb03 ~]$ dcli -g cell_group -l cellmonitor
> "cellcli -e list flashcachecontent where objectNumber = 79208 detail" | grep cache
enkcel04: cachedKeepSize: 0
enkcel04: cachedSize: 4374528
enkcel04: cachedWriteSize: 4358144
enkcel05: cachedKeepSize: 0
enkcel05: cachedSize: 5521408
enkcel05: cachedWriteSize: 5521408
enkcel06: cachedKeepSize: 0
enkcel06: cachedSize: 5177344
enkcel06: cachedWriteSize: 5177344
As you can see, the cache is already populated, thanks to the Smart Scans, without the KEEP keyword having been specified in the CELL_FLASH_CACHE clause. You can also see that the KEEP size is 0, for the same reason.
Event Meaning
Truncating or dropping the table does not have an immediate effect on the Smart Flash Cache at this stage. So, when would you see the event?
Back to the CELL_FLASH_CACHE clause: As soon as the storage clause is changed to KEEP, the cell will cache data from the table, populating the values for cachedKeepSize. In the previous example, it took a few more queries before the keep cache was populated:
[oracle@enkdb03 ~]$ dcli -g cell_group -l cellmonitor
> "cellcli -e list flashcachecontent where objectNumber = 79208 detail" | grep cache
enkcel04: cachedKeepSize: 1239744512
enkcel04: cachedSize: 1243955200
enkcel04: cachedWriteSize: 5365760
enkcel05: cachedKeepSize: 1406525440
enkcel05: cachedSize: 1410203648
enkcel05: cachedWriteSize: 5980160
enkcel06: cachedKeepSize: 1355415552
enkcel06: cachedSize: 1358880768
enkcel06: cachedWriteSize: 6381568
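The storage clause change that triggered this keep caching might look like the following (a sketch using the table from the running example):

```sql
-- Pin table T1 in Exadata Smart Flash Cache.
ALTER TABLE t1 STORAGE (CELL_FLASH_CACHE KEEP);
```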
If you truncate the table now, it will show you the cell smart flash unkeep event. Here is a tkprof’d example for the truncate statement:
truncate table t1
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 6 0 0
Execute 1 0.24 0.57 12 480 5667 0
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 0.24 0.58 12 486 5667 0
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 198
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
enq: IV - contention 10 0.00 0.00
row cache lock 4 0.00 0.00
Disk file operations I/O 1 0.00 0.00
enq: RO - fast object reuse 6 0.00 0.00
reliable message 2 0.00 0.00
log file sync 2 0.00 0.00
cell smart flash unkeep 18 0.01 0.01
cell single block physical read 12 0.00 0.00
local write wait 3 0.00 0.00
gc current grant 2-way 16 0.00 0.00
DFS lock handle 1452 0.00 0.30
gc current grant busy 7 0.00 0.00
SQL*Net message to client 1 0.00 0.00
SQL*Net message from client 1 4.00 4.00
********************************************************************************
When reviewing the raw trace file, you will note that the cell smart flash unkeep event is preceded by a little more than a handful of enq: RO – fast object reuse events, which are used to mark time associated with cleaning up the buffer cache after a drop or truncate. The cell smart flash unkeep is basically an extension of that event to clean up the Exadata Smart Flash Cache on the storage server as well.
Parameters
The only parameter used for this event is P1, which shows which cell was responsible for generating the event:
In addition to the new cell events, there are also a few non-Exadata-specific wait events that you should be aware of. These are events that you may already be familiar with from managing Oracle on other platforms. They happen to also be important in an Exadata environment, so they represent cases in which your existing knowledge and skills carry over and stand you in good stead as you move into managing Exadata.
direct path read
Direct path reads are used by Oracle to read data directly into PGA memory (instead of into the buffer cache). They are an integral part of Exadata offloading because SQL processing can only be offloaded to the storage cells when the direct path read mechanism is used. The direct path read wait event is replaced by the cell smart table scan and cell smart index scan wait events when a query is offloaded. However, the direct path read mechanism is still used by the code covered by those new wait events. That is, either the plan must include a parallel (table) scan, or Oracle must decide to use the serial direct path read mechanism, as described in Chapter 2.
Event Meaning
This event records time that Oracle spends waiting on a direct path read to complete. You should know that the direct path read wait event can be very misleading. As with the Smart Scan events, both the number of events recorded and the timings associated with them can appear to be inaccurate. This is due to the fact that direct path reads are done in an asynchronous and overlapping fashion. In essence, this event is not an I/O latency wait, unlike the classic synchronous I/O events, because of the asynchronous, overlapping way in which Oracle fires off I/O requests. It also bears mentioning that Oracle, from 11gR2 onward, contains an enhancement that causes serial direct path reads to occur more frequently than in previous releases. See MOS Note 793845.1, which briefly mentions this change. Many DBAs must have wondered why the I/O profile changed after their database was migrated from 10.2 to 11.2 or 12c.
Although not as relevant as it is on other platforms, the direct path read wait event does still show up on the Exadata platform for various operations but generally not for full table scans unless the table (or partition) is relatively small. The other case where you might see a direct path read is when the scan of the segment in question is not offloadable, such as for an Index Organized Table (IOT). An example for a 10046 trace and direct path reads is shown in the description of cell smart table scan.
Parameters
The parameters for this event show you exactly which segment (obj) is scanned and which set of blocks were scanned during this event:
As mentioned in the cell smart table scan section, the parameters contain more specific information about which file and object are being accessed. The offset into the file is also provided in the P2 parameter, along with the number of contiguous blocks read in the P3 parameter.
enq: KO - fast object checkpoint
The enq: KO event has a very strange name. Don’t be put off by that. The event is essentially an object checkpoint event. The V$LOCK_TYPE view describes the KO lock as follows:
SQL> select type, name, description from v$lock_type
2 where type = 'KO';
TYPE NAME DESCRIPTION
----- ------------------------------ ---------------------------------------------
KO Multiple Object Checkpoint Coordinates checkpointing of multiple objects
Event Meaning
This event is used when a session is waiting for all dirty blocks to be flushed from the buffer cache for a segment prior to starting a direct path read or cell smart table scan or cell smart index scan. This event is important because the time required to do the checkpoint may outweigh the benefit of the direct path reads. This is unlikely on Exadata storage, though, where the additional Smart Scan benefits are only enabled by the direct path read mechanism. Nevertheless, the execution engine tries to factor the checkpoint cost in before the decision is made to offload the query to the cells. Chapter 2 contains a lot more information about the mechanism.
Parameters
The parameters for this event are not overly helpful, but the event does show which object is scanned. Here are the parameter definitions:
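The name|mode value that accompanies the event in raw trace files packs the two-character lock name and the lock mode into a single number. The following decode (a common technique, shown here against DUAL with a value taken from the trace excerpts in this chapter) should return KO and mode 6:

```sql
-- Unpack the lock name and mode from the name|mode trace value.
SELECT chr(bitand(p1, -16777216)/16777216) ||
       chr(bitand(p1, 16711680)/65536) AS lock_name,
       bitand(p1, 65535)               AS lock_mode
FROM  (SELECT 1263468550 AS p1 FROM dual);
```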
reliable message
The reliable message event is used to record time spent communicating with background processes, like the checkpoint process (CKPT). We have included it here because of its close association with the enq: KO - fast object checkpoint event.
Event Meaning
In Oracle 11.2 and later, this event is the precursor to the enq: KO - fast object checkpoint event (among others). The communication is done using an inter-process communication channel rather than the more usual post mechanism. This communication method allows the sender to request an acknowledgment before it continues; hence the name reliable message. It is generally a very short duration event, as it only records time for communicating between processes. Both the user's foreground process and the CKPT process will wait on this event as they communicate with each other. Here is an excerpt of a 10046 trace file showing a complete reliable message event, taken from an 11.2.0.4 database:
=====================
PARSING IN CURSOR #140457550697520 len=27 dep=0 uid=44 oct=3 lid=44 tim=1404832965816981
hv=521453784 ad='25d376810' sqlid='cdjur50gj9h6s'
select count(*) from bigtab
END OF STMT
PARSE #140457550697520:c=1000,e=1072,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=2140185107,
tim=1404832965816980
EXEC #1...0:c=0,e=35,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=2140185107,tim=1404832965817073
WAIT #1...0: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 p3=0
obj#=18532 tim=1404832965817131
WAIT #1...0: nam='reliable message' ela= 1230 channel context=10035861184 channel
handle=10055026416 broadcast message=10199413976 obj#=18532 tim=1404832965818567
WAIT #1...0: nam='enq: KO - fast object checkpoint' ela= 189 name|mode=1263468550 2=65571 0=1
obj#=18532 tim=1404832965818827
WAIT #1...0: nam='enq: KO - fast object checkpoint' ela= 107 name|mode=1263468545 2=65571 0=2
obj#=18532 tim=1404832965819004
WAIT #1...0: nam='cell smart table scan' ela= 199 cellhash#=2133459483 p2=0 p3=0 obj#=18532
tim=1404832965820790
WAIT #1...0: nam='cell smart table scan' ela= 182 cellhash#=3176594409 p2=0 p3=0 obj#=18532
tim=1404832965821746
WAIT #1...0: nam='cell smart table scan' ela= 163 cellhash#=379339958 p2=0 p3=0 obj#=18532
tim=1404832965822672
Parameters
Here are the parameters for the reliable message event:
Before wrapping up the chapter, we will discuss a few Resource Manager events that you should be aware of. While these are not specific to Exadata, Resource Manager provides key functionality for combining mixed workloads on Exadata. There are actually eight separate events as of release 12.1.0.2. The following query against V$EVENT_NAME shows these events and their parameters:
SQL> select name,parameter1,parameter2,parameter3,wait_class
2 from v$event_name where name like 'resmgr%' order by name;
NAME PARAMETER1 PARAMETER2 PARAMETER3 WAIT_CLASS
------------------------------ --------- -------------------- ---------- ------------
resmgr:become active location Scheduler
resmgr:cpu quantum location consumer group id Scheduler
resmgr:internal state change location Concurrency
resmgr:internal state cleanup location Other
resmgr:large I/O queued location Scheduler
resmgr:pq queued location Scheduler
resmgr:sessions to exit location Concurrency
resmgr:small I/O queued location Scheduler
8 rows selected.
Only three of these events are of interest here.
resmgr:become active
You will see this event when sessions are waiting to begin execution. Consider, for example, a consumer group defined with a maximum number of concurrently active sessions. Any further session will wait on resmgr:become active before it can start its work.
Event Meaning
The event indicates sessions are being held back by Resource Manager from beginning execution. If you create a plan directive limiting the number of concurrent sessions in a consumer group, you will see output similar to this:
SQL> select sid,serial#,username,seq#,event,resource_consumer_group rsrc_cons_grp
  2  from v$session where username = 'LMTD';
SID SERIAL# USERNAME SEQ# EVENT RSRC_CONS_GRP
---------- ---------- ---------- ---------- ------------------------------ -----------------
916 1413 LMTD 31124 cell smart table scan LMTD_GROUP
1045 2667 LMTD 29480 cell smart table scan LMTD_GROUP
1112 20825 LMTD 42 resmgr:become active LMTD_GROUP
1178 91 LMTD 43 resmgr:become active LMTD_GROUP
1239 22465 LMTD 46 resmgr:become active LMTD_GROUP
1311 19703 LMTD 47 resmgr:become active LMTD_GROUP
1374 32129 LMTD 44 resmgr:become active LMTD_GROUP
1432 29189 LMTD 40 resmgr:become active LMTD_GROUP
8 rows selected.
To be more precise, the number of concurrent sessions for consumer group LMTD_GROUP is set to 2. Here is the corresponding configuration example:
BEGIN
dbms_resource_manager.clear_pending_area();
dbms_resource_manager.create_pending_area();
dbms_resource_manager.UPDATE_PLAN_DIRECTIVE(
plan => 'ENKITEC_DBRM',
group_or_subplan => 'LMTD_GROUP',
new_active_sess_pool_p1 => 2);
dbms_resource_manager.validate_pending_area();
dbms_resource_manager.submit_pending_area();
end;
/
With this setting, you will never have more than the defined number of sessions performing work at the same time.
Parameters
Here are the parameters for this event. Note that the obj# parameter exists but is not used:
The location parameter is a numeric value that most likely refers to a location (function) in the Oracle code. There are at least five distinct locations that we have observed. Unfortunately, Oracle does not publicly document where in the Oracle kernel these checks are performed.
resmgr:cpu quantum
This event is used to record forced idle time imposed by Database Resource Manager (DBRM) due to competition with higher-priority work. Said another way, it is the time a process spent waiting for DBRM to allocate it a time slice. Interestingly, as of the time of this writing, no database event reflects throttling at the I/O Resource Manager level.
Event Meaning
DBRM behaves in an analogous manner to CPU scheduling algorithms in that it divides time into units (quantum) and either allows a process to run or not, depending on other workload on the system. Unlike CPU scheduling algorithms, though, DBRM throttling is interjected at key locations in the Oracle code to eliminate the possibility of a process being kicked off of the CPU when it is holding a shared resource such as a latch. This prevents some nasty behavior that may occur on heavily loaded systems such as priority inversion problems. In effect, the processes voluntarily go to sleep when they are not holding these shared resources. There are multiple locations in the code where these checks are implemented. Here’s an excerpt of a 10046 trace file showing the resmgr:cpu quantum event:
PARSING IN CURSOR #140473587700424 len=30 dep=1 uid=208 oct=3 lid=208 tim=4695862695328
hv=2336211758 ad='22465dc70' sqlid='gq01wgy5mzhtf'
SELECT COUNT(*) FROM MARTIN.T1
END OF STMT
EXEC #1...4:c=0,e=97,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=3724264953,tim=4695862695326
WAIT #1...4: nam='enq: KO - fast object checkpoint' ela= 216 name|mode=1263468550 2=65683 0=2
obj#=79208 tim=4695862695834
WAIT #1...4: nam='reliable message' ela= 1126 channel context=10126085744 channel
handle=10164815328 broadcast message=10179052064 obj#=79208 tim=4695862697054
WAIT #1...4: nam='enq: KO - fast object checkpoint' ela= 160 name|mode=1263468550
2=65683 0=1 obj#=79208 tim=4695862709511
WAIT #1...4: nam='enq: KO - fast object checkpoint' ela= 177 name|mode=1263468545
2=65683 0=2 obj#=79208 tim=4695862709811
WAIT #1...4: nam='cell smart table scan' ela= 1830 cellhash#=3249924569 p2=0 p3=0
obj#=79208 tim=4695862713420
WAIT #1...4: nam='cell smart table scan' ela= 326 cellhash#=674246789 p2=0 p3=0 obj#=79208
tim=4695862714610
WAIT #1...4: nam='cell smart table scan' ela= 362 cellhash#=822451848 p2=0 p3=0 obj#=79208
tim=4695862715784
WAIT #1...4: nam='resmgr:cpu quantum' ela= 278337 location=2 consumer group id=80408 =0
obj#=79208 tim=4695863561098
WAIT #1...4: nam='cell smart table scan' ela= 2713 cellhash#=822451848 p2=0 p3=0 obj#=79208
tim=4695863564184
Parameters
Here are the parameters for this event. Note that the obj# parameter exists but is not used:
The location parameter is a numeric value that most likely refers to a location (function) in the Oracle code as described above in the resmgr:become active event.
The consumer group ID in the P2 parameter is self-explanatory: it maps to the CONSUMER_GROUP_ID column in the DBA_RSRC_CONSUMER_GROUPS view and tells you which consumer group a process was assigned to when its CPU usage was curtailed.
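To translate the consumer group ID into a name while sessions are waiting, you can join V$SESSION to DBA_RSRC_CONSUMER_GROUPS. A sketch (results will of course depend on your active resource plan):

```sql
-- Map the P2 value of resmgr:cpu quantum waits to the consumer group name
SELECT s.sid, s.p2 AS consumer_group_id, g.consumer_group
FROM   v$session s
JOIN   dba_rsrc_consumer_groups g
       ON g.consumer_group_id = s.p2
WHERE  s.event = 'resmgr:cpu quantum';
```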
resmgr:pq queued
This event is used to record time spent waiting in the parallel statement queue. Parallel statement queueing is a feature introduced with Oracle Database 11g Release 2. Statement queueing is one part of the functionality enabled by setting parallel_degree_policy to AUTO; in Oracle 12c, you can also set the value to ADAPTIVE for the same purpose. Instead of having its degree of parallelism downgraded when not enough parallel servers are available, as with the old parallel automatic tuning option, a statement waits in the queue until the required degree of parallelism can be met. Refer back to Chapter 6 for more information about parallel execution on Exadata.
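As a sketch, statement queueing can be enabled session-wide via the policy parameter, or requested for an individual statement with the STATEMENT_QUEUING hint (the table name here is illustrative):

```sql
-- Enable automatic DOP, which implies statement queueing, for this session
ALTER SESSION SET parallel_degree_policy = AUTO;

-- Or request queueing for a single statement regardless of the policy
SELECT /*+ PARALLEL(32) STATEMENT_QUEUING */ COUNT(*) FROM martin.t3;
```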
Event Meaning
The parallel statement queuing feature comes with its own wait event. Statements that are queued due to insufficient parallel server processes or other directives clock time to this event. Here is an excerpt of a 10046 trace file showing the resmgr:pq queued event:
PARSING IN CURSOR #140583609334256 len=68 dep=0 uid=205 oct=3 lid=205 tim=4755825333777
hv=2381215807 ad='220a20bc0' sqlid='944dxty6ywy1z'
select /*+ PARALLEL(32) STATEMENT_QUEUING */ count(*) from martin.t3
END OF STMT
PARSE #140583609334256:c=0,e=188,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3978228158,
tim=4755825333776
WAIT #140583609334256: nam='resmgr:pq queued' ela= 25193140 location=1 =0 =0 obj#=-1
tim=4755850527127
WAIT #140583609334256: nam='reliable message' ela= 889 channel context=10126085744 channel
handle=10164818472 broadcast message=10179043672 obj#=-1 tim=4755850529508
WAIT #140583609334256: nam='enq: KO - fast object checkpoint' ela= 197 name|mode=1263468550
2=65690 0=1 obj#=-1 tim=4755850529777
WAIT #140583609334256: nam='enq: KO - fast object checkpoint' ela= 124 name|mode=1263468545
2=65690 0=2 obj#=-1 tim=4755850529959
WAIT #140583609334256: nam='PX Deq: Join ACK' ela= 39 sleeptime/senderid=268500992 passes=1
p3=9216157728 obj#=-1 tim=4755850531333
WAIT #140583609334256: nam='PX Deq: Join ACK' ela= 42 sleeptime/senderid=268500993 passes=1
p3=9216246432 obj#=-1 tim=4755850531420
WAIT #140583609334256: nam='PX Deq: Join ACK' ela= 39 sleeptime/senderid=268500994 passes=1
p3=9216266144 obj#=-1 tim=4755850531507
While testing this use case, we found that the event is written to the trace file only once, at the very beginning. Furthermore, no trace information about the queued statement is emitted until the session actually starts work.
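Because the wait event alone offers little visibility while a statement sits in the queue, another place to look is V$SQL_MONITOR, where queued parallel statements are reported with a status of QUEUED. A minimal sketch:

```sql
-- Statements waiting in the parallel statement queue are flagged as QUEUED
SELECT sid, sql_id, sql_exec_id, status
FROM   v$sql_monitor
WHERE  status = 'QUEUED';
```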
Parameters
Here are the parameters for the resmgr:pq queued event:
The location parameter is a numeric value that most likely refers to a location (function) in the Oracle code as described above in the resmgr:become active event.
Summary
The wait interface has been expanded to cover several Exadata-specific features. In this chapter, you read about the new wait events you should know. By far the most interesting of the new events are cell smart table scan and cell smart index scan. These events cover the time spent waiting on an offloadable read I/O request to a storage cell, and a great deal of processing that occurs at the storage layer is lumped together under them. It is important to understand that these events replace the direct path read event, and that the Smart Scan mechanism of returning data directly to the process PGA is analogous to the way direct path read handles it.