Exadata Wait Events
The Oracle database is a very well-instrumented piece of code, and it has been so for quite a while. It keeps track of the amount of time spent in discrete operations via the use of wait events, unless the session in question is on the CPU. While the database software is quite complex, wait event analysis allows performance analysts to determine where the database is spending its time. Many difficult performance problems can be resolved by analyzing data from the wait interface and, more recently, the Active Session History (ASH). The introduction of Exadata has resulted in the creation of several new wait events to support the unique operations that are performed on the platform. This chapter will focus on describing these new events and how they relate to the activities actually being performed, while contrasting them with the wait events used by the database on non-Exadata platforms. It will also describe a few wait events that are not specific to Exadata but play an important role on Exadata platforms.
In some rare cases, the wait interface is not granular enough to allow you to work out what the database engine is spending time on. If you use your favorite search engine to search for “wait interface not enough,” you will find blog posts about that exact subject, as well as how the session statistics Oracle maintains can provide further insight into what a session does. Chapter 11 will help you understand the session statistics better. In most cases, though, an analysis of the wait events will be sufficient to troubleshoot the problem. A very useful piece of advice to this chapter’s author was not to get lost in the details too soon!
A wait event is actually a section of code that is timed and assigned a name. The pieces of code covered by these events are quite often discrete operating system calls, such as individual I/O requests, but some wait events cover considerably larger portions of code. The events may even include sections of code that contain other wait events. The naming of wait events has been somewhat inconsistent over the years, and many events have slightly misleading names. Even though some of the event names are acknowledged to be potentially misleading, Oracle has been understandably reluctant to change them. The Exadata platform has provided an excuse to rename some of the I/O-related events, and, as you will see shortly, the developers took the opportunity to do just that.
Wait events are externalized in many places in the database. The most common ways to view wait events are queries against the V$ views, tracing, and SQL Monitor. Common V$ views used to view wait event information are V$SESSION and V$SESSION_WAIT. These views show the current wait event for a session, but not the history of events. The introduction of Active Session History in 10g changed this for the better by sampling active sessions and recording relevant information for the performance analyst, but it requires an extra license. Always make sure that what you are doing is compliant with your licenses.
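As a quick sketch of the first approach, the following query against V$SESSION lists the current wait event for every session that is not idle. The WHERE clause is just one reasonable filter; adapt it to your needs:

```sql
SQL> select sid, serial#, event, wait_class, state
  2  from   v$session
  3  where  wait_class <> 'Idle';
```

Remember that this is only a point-in-time snapshot; by the time you re-run the query, the sessions may already be waiting on something else.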
If you want to get all waits for a session, you have the option to enable SQL tracing. The well-established SQL traces allow you to record SQL statements issued against the database as well as all the associated wait events. The raw trace file, which is found in the diagnostic destination for the database, is then post-processed and transformed into something more human-readable.
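One convenient, hedged way to enable such a trace for another session is the DBMS_MONITOR package; the session ID and serial number below are placeholder values you would look up in V$SESSION first:

```sql
SQL> -- sid/serial# are hypothetical values taken from V$SESSION
SQL> exec dbms_monitor.session_trace_enable( -
>      session_id => 123, serial_num => 4567, waits => true, binds => false)

SQL> -- ... let the workload run, then switch tracing off again:
SQL> exec dbms_monitor.session_trace_disable( -
>      session_id => 123, serial_num => 4567)
```

Setting waits => true is what causes the wait events to be written to the raw trace file, which you can then post-process with tkprof.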
An alternative to SQL tracing is the use of SQL Monitor. Thankfully, on the Exadata platform you will never be in a position where this great tool is technically unavailable (licensing, again, is a separate question): SQL Monitor was introduced in Oracle 11.1, which was the minimum supported version on Exadata. SQL Monitor allows you to peek into the execution of a single SQL statement in real time. During the statement’s execution, you can see exactly in which row source of the execution plan the RDBMS engine is spending its time, and you can see wait events, if applicable, as well. All of that is available even while the statement is still executing! To be able to make use of this great tool, you have to have the Diagnostics and Tuning Packs licensed for the database. The underlying technology enabling SQL Monitor is again Active Session History (ASH), which, as you just read, was introduced with Oracle 10g. ASH samples database activity every second and gathers information about active sessions, which it keeps in memory for about one hour at the one-second interval. After that, the information is aggregated and every tenth sample is persisted to disk in the SYSAUX tablespace. The persistent part is referred to as the Automatic Workload Repository (AWR). AWR resembles the STATSPACK tool only in that it keeps a record of database activity over time. The retention period for the information is user configurable. The amount of space it takes to store AWR information can be large, but you should not forget that more data makes comparisons with past events a lot easier. As with everything, you need to find the balance between disk space usage and the advantage of comparing a perceived performance problem with past events. In my opinion, the default retention period of eight days is far too short and should be increased.
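Checking and increasing the AWR retention is straightforward. The sketch below raises retention to 35 days (the parameter is expressed in minutes); adjust the value to your own space budget:

```sql
SQL> select snap_interval, retention from dba_hist_wr_control;

SQL> exec dbms_workload_repository.modify_snapshot_settings( -
>      retention => 35 * 24 * 60)
```

The first query shows the current snapshot interval and retention, so you can verify the change took effect.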
Events Specific to Exadata
There are actually no events that exist exclusively on the Exadata platform. Wait events are built into the database code. Since the compute nodes on Exadata run standard Oracle Database software, all the wait events that are used when invoking Exadata-specific features are available in databases that are running on non-Exadata platforms as well. But because the Exadata features are only available on the Exadata platform, no time is ever allocated to those events on other platforms. By way of proof, consider the following example, which compares the events from V$EVENT_NAME (which exposes the valid wait events) first on an Exadata Database Machine and then on a standard 12c Release 1 database on a non-Exadata platform:
SQL> select count(1) from v$event_name;
COUNT(1)
-----------
1650
SQL> select count(1) from v$event_name@lab12c;
COUNT(1)
-----------
1650
SQL> select name from v$event_name
2 minus
3 select name from v$event_name@lab12c;
no rows selected
So there are no differences in the events. This does make it a little difficult to come up with a list of “Exadata-only” events. The event names, however, are a good starting point.
The Oracle Exadata Storage Server Software User’s Guide 12c Release 1 provides a table of wait events. All of them start with the word “cell.” The manual lists ten such events. One of them (cell interconnect retransmit during physical read) still does not actually exist.
There is also another batch of event names with the word “cell” in their names that are not included in the documentation. Combining both the documented and undocumented events reveals the complete list of “cell” events. This list will be the starting point of this chapter. You can query V$EVENT_NAME for that list, and you will get the following results. Notice that most of the events are in one of the I/O classes. The number of wait events has not changed in 12.1.0.1.x; there are still the same 17 events as in 11.2.0.2, the release used for this chapter in the first edition of the book.
SYS:db12c1> select name,wait_class
2 from v$event_name
3 where name like 'cell%'
4 order by name;
NAME WAIT_CLASS
-------------------------------------------------- --------------------
cell list of blocks physical read User I/O
cell manager cancel work request Other
cell manager closing cell System I/O
cell manager discovering disks System I/O
cell manager opening cell System I/O
cell multiblock physical read User I/O
cell single block physical read User I/O
cell smart file creation User I/O
cell smart flash unkeep Other
cell smart incremental backup System I/O
cell smart index scan User I/O
cell smart restore from backup System I/O
cell smart table scan User I/O
cell statistics gather User I/O
cell worker idle Idle
cell worker online completion Other
cell worker retry Other
17 rows selected.
Oracle 12.1.0.2 was the first release to add new cell-related wait events. These events are covered later in this chapter, with one exception: the external table scan event belongs to a different Oracle product and will not be covered here.
The following sections will cover all of the relevant Exadata wait events, along with a few additional events that have special applicability to Exadata.
Plan Steps That Trigger Events
First, though, it might be interesting to see what operations (plan steps) cause the “cell” wait events to occur. Here is a query against DBA_HIST_ACTIVE_SESS_HISTORY on an active production system running on an Exadata system that shows cell events and the operations that caused them. The list is not exhaustive, of course; not every event is visible on every system!
SQL> select event, operation, count(*) from (
2 select sql_id, event, sql_plan_operation||' '||sql_plan_options operation
3 from DBA_HIST_ACTIVE_SESS_HISTORY
4 where event like 'cell %')
5 group by operation, event
6 order by 1,2,3
7 /
EVENT OPERATION COUNT(*)
---------------------------------- ---------------------------------------- ----------
cell list of blocks physical read 62
DDL STATEMENT 2
INDEX FAST FULL SCAN 1
INDEX RANGE SCAN 3060
INDEX STORAGE FAST FULL SCAN 7
INDEX STORAGE SAMPLE FAST FULL SCAN 10
INDEX UNIQUE SCAN 1580
INSERT STATEMENT 6
TABLE ACCESS BY GLOBAL INDEX ROWID 151
TABLE ACCESS BY INDEX ROWID 5458
TABLE ACCESS BY LOCAL INDEX ROWID 131
TABLE ACCESS STORAGE FULL 183
TABLE ACCESS STORAGE SAMPLE 2
TABLE ACCESS STORAGE SAMPLE BY ROWID RAN 1
cell multiblock physical read 3220
DDL STATEMENT 157
INDEX FAST FULL SCAN 94
INDEX RANGE SCAN 2
INDEX STORAGE FAST FULL SCAN 6334
INDEX STORAGE SAMPLE FAST FULL SCAN 429
UNIQUE SCAN 2
VIEW ACCESS STORAGE FULL 634
MAT_VIEW ACCESS STORAGE SAMPLE 56
TABLE ACCESS BY GLOBAL INDEX ROWID 5
TABLE ACCESS BY INDEX ROWID 484
TABLE ACCESS BY LOCAL INDEX ROWID 3
TABLE ACCESS STORAGE FULL 41559
TABLE ACCESS STORAGE SAMPLE 1763
TABLE ACCESS STORAGE SAMPLE BY ROWID RAN 78
UPDATE 4
cell single block physical read 181186
BUFFER SORT 1
CREATE TABLE STATEMENT 67
DDL STATEMENT 985
DELETE 11204
DELETE STATEMENT 6
FIXED TABLE FIXED INDEX 352
FOR UPDATE 27
HASH GROUP BY 3
HASH JOIN 14
HASH JOIN RIGHT OUTER 1
INDEX BUILD NON UNIQUE 80
INDEX BUILD UNIQUE 6
INDEX FAST FULL SCAN 9
INDEX FULL SCAN 1101
INDEX RANGE SCAN 17597
INDEX RANGE SCAN (MIN/MAX) 1
INDEX RANGE SCAN DESCENDING 6
INDEX SKIP SCAN 691
INDEX STORAGE FAST FULL SCAN 313
INDEX STORAGE SAMPLE FAST FULL SCAN 72
INDEX UNIQUE SCAN 30901
INSERT STATEMENT 5174
LOAD AS SELECT 120
LOAD TABLE CONVENTIONAL 5827
MAT_VIEW ACCESS STORAGE FULL 3
MAT_VIEW ACCESS STORAGE SAMPLE 1
MERGE 12
PX COORDINATOR 1
SELECT STATEMENT 978
SORT CREATE INDEX 1
SORT GROUP BY 1
SORT JOIN 5
SORT ORDER BY 2
TABLE ACCESS BY GLOBAL INDEX ROWID 5812
TABLE ACCESS BY INDEX ROWID 65799
TABLE ACCESS BY LOCAL INDEX ROWID 4591
TABLE ACCESS BY USER ROWID 464
TABLE ACCESS CLUSTER 57
TABLE ACCESS STORAGE FULL 7168
TABLE ACCESS STORAGE SAMPLE 205
TABLE ACCESS STORAGE SAMPLE BY ROWID RAN 24
UNION-ALL 7
UPDATE 89353
UPDATE STATEMENT 367
WINDOW CHILD PUSHED RANK 2
WINDOW SORT 1
WINDOW SORT PUSHED RANK 1
cell smart file creation 35
DELETE 3
INDEX BUILD NON UNIQUE 5
LOAD AS SELECT 3
LOAD TABLE CONVENTIONAL 1
UPDATE 1
cell smart incremental backup 714
cell smart index scan 14
INDEX STORAGE FAST FULL SCAN 42
INDEX STORAGE SAMPLE FAST FULL SCAN 32
cell smart table scan 163
MAT_VIEW ACCESS STORAGE FULL 1
TABLE ACCESS STORAGE FULL 12504
Again, this output does not show all the possible combinations, but it should give you an idea of the relative frequency of events and which operations generally motivate them.
Exadata Wait Events in the User I/O Class
The User I/O class is far and away the most important for Exadata. The most interesting events in this category are, of course, the two Smart Scan events (cell smart table scan and cell smart index scan). These are the events that record time for the primary query offloading optimizations provided by Exadata, which primarily include predicate filtering, column projection, and storage index usage. The User I/O class also contains three events described as physical I/O events. These three events actually measure time for physical I/O using the more familiar multi-block and single-block read mechanisms you are used to seeing on non-Exadata platforms, although their names have been changed to something a little more meaningful. Finally, there are two events that do not really seem to belong in the User I/O category at all. One has to do with initialization of blocks when file space is allocated. The other has to do with gathering statistical information from the storage cells. Cell software 12.1.2.1 and database 12.1.0.2 also introduced three new minor user I/O-related cell events that did not exist in 12.1.0.1 or 11.2.0.3. Each of these wait events is covered in turn in the next several sections, starting with the Smart Scan events.
cell smart table scan
The cell smart table scan event is what Oracle uses to account for time spent waiting for full table scans that are offloaded. It is the most important event on the Exadata platform for reporting workloads. Its presence or absence can be used to verify whether a statement benefited from offloading or not. As discussed in Chapter 2, offloading only occurs when Oracle is able to do direct path reads. Consequently, this event replaces the direct path read event in most cases on Exadata. As with non-Exadata direct path reads, data is returned directly to the PGA of the requesting process on the database server (either the user’s shadow process or a parallel slave process). Blocks are not returned to the buffer cache.
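A simple way to check for the event’s presence is to look at ASH for the statement in question (Diagnostics Pack license required). The sql_id below is the one from the trace excerpt later in this section; substitute your own:

```sql
SQL> select event, count(*)
  2  from   v$active_session_history
  3  where  sql_id = '5bc0v4my7dvr5'   -- your sql_id here
  4  and    event like 'cell smart%'
  5  group  by event;
```

If the statement was offloaded, you should see samples on cell smart table scan; if it fell back to block shipping, you will see the conventional read events instead.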
Event Meaning
Although the mechanism for performing reads across the InfiniBand network is very different from that for normal reads on non-Exadata platforms, the code path driving Smart Scans is actually very similar to a direct path read on a non-Exadata platform. The main difference lies in the fact that each request to a storage cell contains a reference to the metadata of the statement, which in the case of Exadata includes, among other things, the predicates and the list of columns to be returned. Since the storage cells have access to this information, they can apply the filters and do the column projection before returning the data to the requesting process. These optimizations are applied to each set of blocks as they are requested. The processes on the database servers that request the data have access to the ASM extent map and can, therefore, request the required allocation units (AUs) from each storage cell. The storage cells read the requested AUs (unless storage indexes allow portions to be skipped) and apply the predicate filters, among other tasks. If any rows satisfy the filters, the cells return the projected columns to the requesting process. The process then requests the next AU, and the entire routine is repeated until all the data has been scanned. So this event will occur repeatedly in a large scan. It is important to note that, unlike some other I/O-related events, a single Smart Scan event cannot be used to derive I/O performance. Instead, you can use OEM 12c or the cellcli command to access metric information on the cells. We cover this in Chapter 12 in more detail.
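As a sketch of the CellCLI option, you can list current metrics directly on a storage server. The metric name filter below is just an example of cell disk I/O timing metrics; your cell software version may expose a slightly different metric catalog:

```
CellCLI> list metriccurrent where objectType = 'CELLDISK' and name like 'CD_IO_TM_.*'
```

Run from a database server, the same command can be fanned out to all cells with dcli against your cell group file.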
Note Column Projection is one of the major optimizations provided by Smart Scans. The feature is slightly misunderstood. It does not pass only columns in the select list back to the database servers; it also passes back some columns from the WHERE clause. Older versions of cellsrv passed all columns specified in a WHERE clause back to the database tier. Later versions have corrected this behavior to include only columns that are involved in joining predicates. You can see the columns projected in the output from DBMS_XPLAN.DISPLAY_CURSOR, for example.
As with all wait events, the cell smart table scan event will be recorded in a SQL trace. But with Exadata, you might want to investigate further, since a trace file is not limited to containing information for a single event you set. It may not immediately be obvious that you can combine the output of multiple trace events in a single trace file. For Smart Scans, you could enable tracing for LIBCELL’s client library (mainly useful for inter-system communication), for the Exadata Smart Scan layer (KXD), and finally SQL Trace, to get a combined picture of everything that is happening. Consider the following snippet to enable lots of tracing. You would not normally need to trace that much information, but it is extremely useful for research! Needless to say, you do not enable these events in production. This is for a development system only, outside of production and disaster recovery environments. The amount of information in the trace can be massive, filling up the /u01 mount point and effectively causing an outage.
SQL> select value from v$diag_info where name like 'Default%';
VALUE
----------------------------------------------------------------------
/u01/app/oracle/diag/rdbms/db12c/db12c1/trace/db12c1_ora_11916.trc
SQL> alter session set events 'trace[LIBCELL.Client_Library.*] disk=highest';
Session altered.
SQL> alter session set events 'sql_trace level 8';
Session altered.
SQL> alter session set events 'trace[KXD.*] disk=highest';
Session altered.
SQL> select count(*) from t1;
COUNT(*)
----------
33554432
Oracle has been so kind as to document many of its traceable components, so you can actually see the code prefixes and map them to their respective code layer. The command to get the magic documentation is oradebug. Use oradebug doc event name and oradebug doc component to get more information about what can be traced, as in this example for LIBCELL.
SQL> oradebug doc component libcell
Components in library LIBCELL:
--------------------------
Client_Library Client Library
Disk_Layer Disk Layer
Network_Layer Network Layer
IPC_Layer IPC Layer
The asterisk in the above code example includes every sublayer without having to explicitly specify it. Be warned that the resulting trace will most likely be huge and can cause space problems in your database mount point. But let’s not digress—back to the SQL Trace. When tracing SQL alone, the trace file shows the following lines:
=====================
PARSING IN CURSOR #140...784 len=23 dep=0 uid=198 oct=3 lid=198 tim=3471433479182
hv=4235652837 ad='808033f0' sqlid='5bc0v4my7dvr5'
select count(*) from t1
END OF STMT
PARSE #140...784:c=0,e=138,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3724264953,tim=3471433479181
EXEC #140...784:c=0,e=51,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3724264953,tim=3471433479280
WAIT #140...784: nam='SQL*Net message to client' ela= 4 driver id=1650815232 #bytes=1 p3=0
obj#=-1 tim=3471433479330
WAIT #140...784: nam='reliable message' ela= 968 channel context=10126085744 channel
handle=10164814784 broadcast message=10178993320 obj#=-1 tim=3471433480522
WAIT #140...784: nam='enq: KO - fast object checkpoint' ela= 194 name|mode=1263468550 2=65606
0=1 obj#=-1 tim=3471433480789
WAIT #140...784: nam='enq: KO - fast object checkpoint' ela= 125 name|mode=1263468545 2=65606
0=2 obj#=-1 tim=3471433480986
WAIT #140...784: nam='Disk file operations I/O' ela= 6 FileOperation=2 fileno=7 filetype=2
obj#=-1 tim=3471433481040
WAIT #140...784: nam='cell smart table scan' ela= 145 cellhash#=3249924569 p2=0 p3=0 obj#=61471
tim=3471433501731
WAIT #140...784: nam='cell smart table scan' ela= 149 cellhash#=674246789 p2=0 p3=0 obj#=61471
tim=3471433511233
WAIT #140...784: nam='cell smart table scan' ela= 143 cellhash#=822451848 p2=0 p3=0 obj#=61471
tim=3471433516295
WAIT #140...784: nam='cell smart table scan' ela= 244 cellhash#=3249924569 p2=0 p3=0 obj#=61471
tim=3471433561706
WAIT #140...784: nam='cell smart table scan' ela= 399 cellhash#=674246789 p2=0 p3=0
...
Unfortunately, Oracle changed the format of cursor identifiers somewhere in the Oracle 11g life cycle. The full cursor numbers would have caused the trace lines to wrap in the output above, which is why they have been truncated.
This portion of the trace file also shows the enq: KO - fast object checkpoint event, which is used to ensure that any dirty blocks for the scanned object are flushed to disk prior to beginning the scan. By the way, the direct path read event is not completely eliminated on Exadata platforms. In fact, it is possible to use a hint to disable offloading and see how the same statement behaves without offloading:
=====================
PARSING IN CURSOR #140...520 len=75 dep=0 uid=198 oct=3 lid=198 tim=3471777663904 hv=2068810426 ad='9114cd20' sqlid='44xptp5xnz2pu'
select /*+ opt_param('cell_offload_processing','false') */
count(*) from t1
END OF STMT
PARSE #140...520:c=2000,e=1381,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=3724264953,
tim=3471777663903
EXEC #140...520:c=0,e=41,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3724264953,tim=3471777663997
WAIT #140...520: nam='SQL*Net message to client' ela= 4 driver id=1650815232 #bytes=1 p3=0
obj#=61471 tim=3471777664057
WAIT #140...520: nam='enq: KO - fast object checkpoint' ela= 228 name|mode=1263468550 2=65677
0=2 obj#=61471 tim=3471777664460
WAIT #140...520: nam='reliable message' ela= 1036 channel context=10126085744 channel
handle=10164814784 broadcast message=10179010104 obj#=61471 tim=3471777665585
WAIT #140...520: nam='enq: KO - fast object checkpoint' ela= 134 name|mode=1263468550 2=65677
0=1 obj#=61471 tim=3471777665771
WAIT #140...520: nam='enq: KO - fast object checkpoint' ela= 126 name|mode=1263468545 2=65677
0=2 obj#=61471 tim=3471777665950
WAIT #140...520: nam='direct path read' ela= 807 file number=7 first dba=43778051 block cnt=13
obj#=61471 tim=3471777667104
WAIT #140...520: nam='direct path read' ela= 714 file number=7 first dba=43778081 block cnt=15
obj#=61471 tim=3471777668112
WAIT #140...520: nam='direct path read' ela= 83 file number=7 first dba=43778097 block cnt=15
obj#=61471 tim=3471777668322
WAIT #140...520: nam='direct path read' ela= 617 file number=7 first dba=43778113 block cnt=15
obj#=61471 tim=3471777669050
WAIT #140...520: nam='direct path read' ela= 389 file number=7 first dba=43778129 block cnt=15
obj#=61471 tim=3471777669549
Note that we still have the enq: KO – fast object checkpoint events for flushing dirty blocks. So, it is clear that the cell smart table scan event replaces the direct path read event, not the checkpoint-related events.
Parameters
The parameters for this event are not particularly informative. Only the object ID of the table being scanned and the cell hash number are provided.
You should use the obj# reported in the trace to look up the table name in DBA_OBJECTS. Remember that Smart Scans operate on the segment level; you will need to query for the data_object_id rather than the object_id. You will notice that the direct path read event (which cell smart table scan replaces) provides additional information, including the file number, the offset into the file (first dba), and the number of contiguous blocks read (block cnt). On the other hand, the direct path read event gives no indication of how the read requests are routed to the individual cells.
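For example, the obj#=61471 from the earlier trace excerpt can be resolved like this (note the use of data_object_id for the segment-level lookup):

```sql
SQL> select owner, object_name, object_type
  2  from   dba_objects
  3  where  data_object_id = 61471;
```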
The cell hash number reported with many of the wait events can be found in the V$CELL view. This view has only two columns, CELL_PATH and CELL_HASHVAL. The CELL_PATH column actually contains the IP address of the storage cell. If you trace using the KXD and LIBCELL components, you will see a lot more information before and after each wait event. The additional trace information is well suited to giving you extremely in-depth insights into Smart Scan processing. A full explanation of it does not fit the scope of this chapter; please refer back to Chapter 2.
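Resolving the cellhash# values from the trace is then a simple lookup; the hash value below is taken from the earlier trace lines:

```sql
SQL> select cell_path
  2  from   v$cell
  3  where  cell_hashval = 3249924569;
```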
Unlike some of the non-Exadata I/O events, the Smart Scan event is not really suited to time I/O completion. Remember from Chapter 2 that Smart Scans are fired off asynchronously against various threads in the cell—each is a separate logical unit of work. The Oracle software may read the results of these I/O events out of their order of execution. This, as well as the complicated nature of Smart Scan processing, makes it impossible to derive I/O latencies using this event.
cell smart index scan
Time is clocked to the cell smart index scan event when index fast full scans are performed that are offloaded. This event is analogous to cell smart table scan, except that the object being scanned is an index. This access path is not to be confused with the other index access paths available, such as index unique, range, or full scans; those use single-block I/O calls and by definition cannot be offloaded. The cell smart index scan wait replaces the direct path read event and returns data directly to the PGA of the requesting process as opposed to the buffer cache.
Event Meaning
This event does not show up very often on the systems we have observed, probably for several reasons.
In addition, only fast full scans of indexes are eligible for Smart Scans (range scans and full scans are not eligible). As a result of these issues, cell smart index scans are fairly rare compared to cell smart table scans. It is, of course, possible to encourage the feature with hints (such as parallel_index) or by decorating specific indexes with a parallel degree setting of greater than 1. Here is an excerpt from a 10046 trace file showing the event:
=====================
PARSING IN CURSOR #140694139370464 len=100 dep=1 uid=198 oct=3 lid=198 tim=3551928848463
hv=3104270335 ad='85f450b8' sqlid='fg97a72whftzz'
select /*+ monitor gather_plan_statistics parallel_index(t1) */
/* ffs_test_001 */
count(id) from t1
END OF STMT
PARSE #140694139370464:c=0,e=114,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=3371189879,
tim=3551928848462
WAIT #140694139370464: nam='cell smart index scan' ela= 173 cellhash#=674246789 p2=0 p3=0
obj#=78100 tim=3551928872228
WAIT #140694139370464: nam='cell smart index scan' ela= 127 cellhash#=674246789 p2=0 p3=0
obj#=78100 tim=3551928872442
WAIT #140694139370464: nam='cell smart index scan' ela= 8888 cellhash#=674246789 p2=0 p3=0
obj#=78100 tim=3551928881675
WAIT #140694139370464: nam='cell smart index scan' ela= 142 cellhash#=822451848 p2=0 p3=0
obj#=78100 tim=3551928900699
WAIT #140694139370464: nam='cell smart index scan' ela= 400 cellhash#=822451848 p2=0 p3=0
obj#=78100 tim=3551928901202
Note that this trace file was produced by one of the parallel slave processes and not the requesting process. Some of the PX messages have been removed for clarity. The trace produced for the same statement when offloading is disabled should look more familiar. Here is an excerpt, again from a query slave:
=====================
PARSING IN CURSOR #13...92 len=158 dep=1 uid=198 oct=3 lid=198 tim=3553633447559 hv=3263861856 ad='9180bab0' sqlid='35yb4ug18p530'
select /*+ opt_param('cell_offload_processing','false') monitor
gather_plan_statistics parallel_index(t1) */
/* ffs_test_002 */
count(id) from t1
END OF STMT
PARSE #13...92:c=0,e=152,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=3371189879,tim=3553633447557
WAIT #13...92: nam='PX Deq: Execution Msg' ela= 9671 sleeptime/senderid=268566527 passes=1
p3=9216340064 obj#=-1 tim=3553633458202
WAIT #13...92: nam='direct path read' ela= 5827 file number=7 first dba=39308443 block cnt=128 obj#=78100 tim=3553633464791
WAIT #13...92: nam='PX Deq: Execution Msg' ela= 62 sleeptime/senderid=268566527 passes=1
p3=9216340064 obj#=78100 tim=3553633481394
WAIT #13...92: nam='direct path read' ela= 2748 file number=7 first dba=39370251 block cnt=128
obj#=78100 tim=3553633484390
WAIT #13...92: nam='PX Deq: Execution Msg' ela= 64 sleeptime/senderid=268566527 passes=1
p3=9216340064 obj#=78100 tim=3553633497671
WAIT #13...92: nam='direct path read' ela= 4665 file number=7 first dba=39357143 block cnt=128
obj#=78100 tim=3553633502565
WAIT #13...92: nam='PX Deq: Execution Msg' ela= 70 sleeptime/senderid=268566527 passes=1
p3=9216340064 obj#=78100 tim=3553633518439
WAIT #13...92: nam='direct path read' ela= 3001 file number=7 first dba=39337739 block cnt=128
obj#=78100 tim=3553633521674
Compare this to the previous example where the cells offloaded the scan. In that example the cell smart index scan event replaces the direct path read. The enq: KO – fast object checkpoint events for flushing dirty blocks prior to starting the direct path reads are still present. However, they are not shown in this excerpt because they occur in the query coordinator process, not in the parallel slave processes.
Parameters
Just as with the cell smart table scan event, the parameters for cell smart index scan do not contain a lot of detail. The cell hash number and the object ID of the segment being scanned are the only information provided.
cell single block physical read
This event is equivalent to the db file sequential read event used on non-Exadata platforms. Single block reads are used most often for index access paths (both the index block reads and the table block reads via rowids from the index lookups). They can also be used for a wide variety of other operations where it makes sense to read a single block.
Event Meaning
Here is the output of a query that shows the operations that resulted in the cell single block physical read wait event on an active production system:
SQL> select event, operation, count(*) from (
2 select sql_id, event, sql_plan_operation||' '||sql_plan_options operation
3 from DBA_HIST_ACTIVE_SESS_HISTORY
4 where event like 'cell single%')
5 group by operation, event
6 order by 1,2,3
7 /
EVENT OPERATION COUNT(*)
--------------------------------- ---------------------------------------- ----------
cell single block physical read 13321
CREATE TABLE STATEMENT 35
DDL STATEMENT 118
DELETE 269
FIXED TABLE FIXED INDEX 3
FOR UPDATE 2
HASH JOIN 4
HASH JOIN RIGHT OUTER 8
INDEX FULL SCAN 9283
INDEX FULL SCAN (MIN/MAX) 1
INDEX RANGE SCAN 2763
INDEX STORAGE FAST FULL SCAN 6
INDEX STORAGE SAMPLE FAST FULL SCAN 13
INDEX UNIQUE SCAN 1676
INSERT STATEMENT 1181
LOAD AS SELECT 6
LOAD TABLE CONVENTIONAL 92
MERGE 106
SELECT STATEMENT 41
SORT ORDER BY 6
TABLE ACCESS BY GLOBAL INDEX ROWID 10638
TABLE ACCESS BY INDEX ROWID 8714
TABLE ACCESS BY LOCAL INDEX ROWID 10446
TABLE ACCESS CLUSTER 12
TABLE ACCESS STORAGE FULL 776
TABLE ACCESS STORAGE SAMPLE 40
UPDATE 8116
As you can see, row access via an index is the most common operation that generates this event. You should also be aware that Exadata provides a large amount of Flash Cache on each storage cell. For that reason, physical reads (both multi-block and single-block) are considerably faster than on most disk-based storage systems, even without offloading. Here is an excerpt from an AWR report showing a histogram of single-block reads for the instance:
                                                    % of Waits
                                  -----------------------------------------------
                            Total
Event                       Waits  <1ms  <2ms  <4ms  <8ms <16ms <32ms  <=1s   >1s
--------------------------- ----- ----- ----- ----- ----- ----- ----- ----- -----
cell single block physical  2940K  94.4   3.2    .3    .6    .9    .5    .2    .0
Notice that about 95 percent of the cell single block physical read events take less than 1ms. This is fairly representative of several production systems that we have observed.
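You can get a live version of this histogram for your own instance from V$EVENT_HISTOGRAM, where each bucket’s upper bound is given in milliseconds:

```sql
SQL> select wait_time_milli, wait_count
  2  from   v$event_histogram
  3  where  event = 'cell single block physical read'
  4  order  by wait_time_milli;
```

Note that these counters are cumulative since instance startup, whereas the AWR excerpt above covers only the snapshot interval.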
Parameters
The cell single block physical read event provides more information than most cell events. The parameters allow you to tell exactly which object was read, along with the disk and cell where the block was stored.
cell multiblock physical read
This is another renamed event. It is equivalent to the less clearly named db file scattered read event. On non-Exadata platforms, Oracle Database 11gR2 and 12c still use the db file scattered read event whenever they issue a contiguous multi-block buffered read to the operating system. The “scattered” in the old event name reflects how the data read from disk is stored in the buffer cache: scattered around. It is not a reflection of how the data is read from disk, which is actually contiguous.
Event Meaning
This event is generally associated with full table scans and fast full index scans, although it can occur with many other operations. The new name on the Exadata platform is much more descriptive than the older one. For reporting workloads, this wait event is not nearly as prevalent on Exadata platforms as on non-Exadata platforms, because Exadata handles many full scan operations with Smart Scans that have their own wait events (cell smart table scan and cell smart index scan). On Exadata, the cell multiblock physical read event is used for serial full scan operations on tables that are below the threshold for serial direct path reads. That is to say, you will see this event used most often on full scans of relatively small tables. On the other hand, if your queries are not offloaded (because of a very large buffer cache, for instance), you will see plenty of these events as well, since the threshold at which a Smart Scan or direct path read is initiated is much higher in that case.
The event is also used for fast full index scans that are not executed with direct path reads. Here is the output of a query that shows the operations that resulted in the cell multiblock physical read wait event on an active production system:
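The query that produced this output is not reproduced at this point; a reconstruction, modeled on the cell list of blocks query shown later in this chapter, would look like this:

```sql
-- Group ASH samples for the multiblock read event by plan operation.
SELECT event, operation, count(*)
FROM  (SELECT sql_id, event,
              sql_plan_operation || ' ' || sql_plan_options operation
       FROM   dba_hist_active_sess_history
       WHERE  event LIKE 'cell multiblock%')
GROUP BY operation, event
ORDER BY 1, 2, 3;
```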
EVENT OPERATION COUNT(*)
-------------------------------- ---------------------------------------- ----------
cell multiblock physical read 764
DDL STATEMENT 28
INDEX FAST FULL SCAN 2
INDEX STORAGE FAST FULL SCAN 657
INDEX STORAGE SAMPLE FAST FULL SCAN 133
TABLE ACCESS BY INDEX ROWID 74
TABLE ACCESS BY LOCAL INDEX ROWID 1428
TABLE ACCESS STORAGE FULL 5046
TABLE ACCESS STORAGE SAMPLE 916
Parameters
The cell multiblock physical read event also provides more information than most cell events. The parameters in the following list allow you to tell which object was read and identify the disk and cell where the blocks were stored. The total number of bytes read should be a multiple of the block size:
cell list of blocks physical read
This event is a replacement for the db file parallel read event on non-Exadata platforms. It appears that the developers took the opportunity to rename some of the events that are related to disk operations, and this is one of those events. The new name is actually much more descriptive than the previous name, since the wait event has nothing whatsoever to do with parallel query or parallel DML.
Event Meaning
This event is used for multi-block reads of noncontiguous blocks. Such reads are more efficient with asynchronous I/O, which is enabled on Exadata by default. This event can be provoked by several operations; the most common are index range scans, index unique scans, and table access by index rowid. The most common reason for the event is index prefetching. Here is the output of a query that shows the operations that resulted in this wait event on an Exadata system:
SQL> select event, operation, count(*) from (
2 select sql_id, event, sql_plan_operation||' '||sql_plan_options operation
3 from DBA_HIST_ACTIVE_SESS_HISTORY
4 where event like 'cell list%')
5 group by operation, event
6 order by 1,2,3
7 /
EVENT OPERATION COUNT(*)
---------------------------------------- ----------------------------------- -------------
cell list of blocks physical read 2
INDEX RANGE SCAN 156
INDEX STORAGE FAST FULL SCAN 1
INDEX UNIQUE SCAN 66
TABLE ACCESS BY GLOBAL INDEX ROWID 90
TABLE ACCESS BY INDEX ROWID 1273
TABLE ACCESS BY LOCAL INDEX ROWID 2593
TABLE ACCESS STORAGE FULL 20
TABLE ACCESS STORAGE SAMPLE 1
As you can see, the vast majority of these events were motivated by index access paths. By the way, on non-Exadata platforms, noncontiguous multi-block reads still clock time to the old db file parallel read event. It is also possible for this older wait event name to show up on an Exadata platform for some operations.
Parameters
The cell list of blocks physical read event provides more information than most cell events. The following parameters allow you to tell exactly which object was read along with identifying the disk and cell where the block was stored:
cell smart file creation
Exadata has an optimization technique that allows the storage cells to do the initialization of blocks when a data file is created or extended. This occurs when a tablespace is created or a data file is manually added to a tablespace. However, it can also occur when a data file is automatically extended during DML operations.
Event Meaning
You have previously read in this chapter that this event seems out of place in the User I/O class. However, when it occurs because of DML operations, it makes sense to have it in this category. At any rate, offloading the block formatting removes CPU usage and I/O from the database servers and moves the work to the storage tier. When this occurs, time is collected in the cell smart file creation event. This event replaces the Data file init write event that is still used on non-Exadata platforms. Here is the output of a query from a busy production system showing operations that generated this event:
SYS@EXDB1> select event, operation, count(*) from (
2 select sql_id, event, sql_plan_operation||' '||sql_plan_options operation
3 from DBA_HIST_ACTIVE_SESS_HISTORY
4 where event like 'cell smart file%')
5 group by operation, event
6 order by 1,2,3
7 /
EVENT OPERATION COUNT(*)
-------------------------- ------------------------- --------
cell smart file creation 35
DELETE 3
INDEX BUILD NON UNIQUE 5
LOAD AS SELECT 3
LOAD TABLE CONVENTIONAL 1
UPDATE 1
You will notice that on this particular system, the cell smart file creation event was occasionally generated by a DELETE statement. The fact that a DELETE could cause this event might be a little surprising. But remember that this event is actually timing a section of code that does block formatting, not file creation.
Parameters
The only parameter of interest for this event is P1, which shows which cell was being accessed when this event was generated:
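P1 carries a hash value rather than a cell name. If you need to translate it, the V$CELL view exposes both the cell path and its hash; the following sketch assumes access to that view:

```sql
-- Map the cellhash# seen in wait events back to a storage cell.
SELECT cell_path, cell_hashval
FROM   v$cell;
```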
cell statistics gather
The cell statistics gather event records time spent reading from various V$ and X$ tables. Although the event is grouped in the User I/O category, it does not refer to I/O in the sense of reading from and writing to disk.
Event Meaning
Time is clocked to this event when a session is reading from the V$CELL family of views and a few other X$ tables in the same category. The event is miscategorized, in our opinion, and does not really belong in the User I/O category. Here is a typical example:
PARSING IN CURSOR #140003020408112 len=79 dep=0 uid=0 oct=3 lid=0 tim=4163271218132 hv=2136254865 ad='a2c23650' sqlid='fcjwbgdzp9acj'
select count(cell_name),cell_name from V$CELL_THREAD_HISTORY group by cell_name
END OF STMT
PARSE #14...12:c=17997,e=23189,p=5,cr=99,cu=0,mis=1,r=0,dep=0,og=1,plh=3108513074,tim=4
WAIT #14...12: nam='Disk file operations I/O' ela= 34 FileOperation=8 fileno=0
filetype=8 obj#=262 tim=4163271218207
EXEC #14...12:c=0,e=54,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3108513074,tim=4...
WAIT #14...12: nam='SQL*Net message to client' ela= 2 driver id=1650815232 #bytes=1 p3=0
obj#=262 tim=4163271218304
WAIT #14...12: nam='cell statistics gather' ela= 162 cellhash#=0 p2=0 p3=0 obj#=262 tim=4163
WAIT #14...12: nam='cell statistics gather' ela= 248 cellhash#=0 p2=0 p3=0 obj#=262 tim=4163
WAIT #14...12: nam='cell statistics gather' ela= 236 cellhash#=0 p2=0 p3=0 obj#=262 tim=4163
WAIT #14...12: nam='cell statistics gather' ela= 250 cellhash#=0 p2=0 p3=0 obj#=262 tim=4163
Note You can read more about the V$CELL% family of views in Chapter 11.
Parameters
The parameters for this event provide no additional information. In fact, values are not even set for the parameters of this event. Here are the parameter definitions:
Minor Events in the User I/O Class
As you read in the introduction, Oracle 12.1.0.2 is the first release in a long time to introduce new Exadata-related wait events.
Although great care was taken when preparing this chapter, we could not find a way to generate waits for most of them. The two exceptions are cell single block read request and cell multi-block read request, which we observed during execution of DBMS_STATS.GATHER_SCHEMA_STATS. The corresponding SQL_IDs were a sure sign that the events had been triggered as part of the stats-gathering job; the TOP_LEVEL_SQL_ID can be used to confirm this.
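If you want to check which of these events exist in your release, a query along the following lines will list them together with their wait class (the LIKE pattern is an assumption; adjust it as needed):

```sql
-- Look for the new request-style cell events introduced in 12.1.0.2.
SELECT name, wait_class
FROM   v$event_name
WHERE  name LIKE 'cell%read request%'
ORDER  BY name;
```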
Exadata Wait Events in the System I/O Class
The Exadata wait events that are assigned to the System I/O class are of less importance and do not generally show up as major consumers of time. The backup events are the most interesting as they record time for sections of code that have been optimized on the Exadata platform. The others are simply housekeeping events. The non-backup events are listed in Table 10-1, while the backup events are detailed in the following sections.
Table 10-1. Miscellaneous System I/O Class Events
cell manager closing cell
This is a shutdown-related event. The cell hash number is contained in the P1 column of the v$session_wait view for this event. The P2 and P3 columns are unused.

cell manager discovering disks
This is a startup-related event. The cell hash number is contained in the P1 column of the v$session_wait view for this event. The P2 and P3 columns are unused.

cell manager opening cell
This is a startup-related event. The cell hash number is contained in the P1 column of the v$session_wait view for this event. The P2 and P3 columns are unused.
cell smart incremental backup
This event is used to measure time spent waiting on RMAN when doing an incremental level 1 backup. Exadata optimizes incremental backups by offloading much of the processing to the storage tier, and this wait event accounts for time spent waiting on that offloaded processing in the storage cells.
The processing for incremental backups in its unoptimized form is quite resource intensive. The backup process needs to capture every change to the data files since the previous incremental backup. A lot of I/O is required to perform this task, as the data files have to be scanned from start to finish. To reduce the time it takes to do a level 1 backup, Oracle introduced block change tracking long before Exadata. Block change tracking is implemented using a small binary file that is updated when redo is generated. The introduction of the block-change-tracking file sped up incremental backups considerably.
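For reference, block change tracking is switched on with a single statement; the destination file name below is purely illustrative:

```sql
-- Enable block change tracking; the file location is an assumption.
ALTER DATABASE ENABLE BLOCK CHANGE TRACKING
  USING FILE '+DATA/bct.f';
```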
Event Meaning
Interestingly, an incremental level 0 backup does not result in this wait event being written to a trace file, even though the word “incremental” is in the RMAN command. That is because the level 0 backup, despite its name, does not take an incremental backup at all. It generates a full backup that is flagged as a baseline for future incremental backups. Here is an excerpt from a 10046 trace file generated on a process that was creating an incremental level 1 backup, this time on 11.2.0.4 with block change tracking enabled:
*** 2014-07-06 15:23:11.171
*** SESSION ID:(200.471) 2014-07-06 15:23:11.171
*** CLIENT ID:() 2014-07-06 15:23:11.171
*** SERVICE NAME:(SYS$USERS) 2014-07-06 15:23:11.171
*** MODULE NAME:(backup incr datafile) 2014-07-06 15:23:11.171
*** ACTION NAME:(0000048 STARTED16) 2014-07-06 15:23:11.171
WAIT #0: nam='enq: TC - contention' ela= 30442809 name|mode=1413677062 checkpoint ID=65586 0=0
obj#=-1 tim=1404678191171373
WAIT #0: nam='enq: CF - contention' ela= 223 name|mode=1128660997 0=0 operation=0 obj#=-1
tim=1404678191172055
...
WAIT #0: nam='change tracking file synchronous read' ela= 523 block#=9344 blocks=64 p3=0
obj#=-1 tim=1404678192231084
WAIT #0: nam='change tracking file synchronous read' ela= 523 block#=9472 blocks=64 p3=0
obj#=-1 tim=1404678192231665
WAIT #0: nam='change tracking file synchronous read' ela= 501 block#=9920 blocks=64 p3=0
obj#=-1 tim=1404678192232261
WAIT #0: nam='cell smart incremental backup' ela= 150 cellhash#=379339958 p2=0 p3=0
obj#=-1 tim=1404678192234400
WAIT #0: nam='cell smart incremental backup' ela= 170 cellhash#=2133459483 p2=0 p3=0
obj#=-1 tim=1404678192235378
WAIT #0: nam='cell smart incremental backup' ela= 160 cellhash#=3176594409 p2=0 p3=0
obj#=-1 tim=1404678192237663
WAIT #0: nam='KSV master wait' ela= 70 p1=0 p2=0 p3=0 obj#=-1 tim=1404678192243295
WAIT #0: nam='KSV master wait' ela= 1438 p1=0 p2=0 p3=0 obj#=-1 tim=1404678192244762
WAIT #0: nam='ASM file metadata operation' ela= 57 msgop=33 locn=0 p3=0
obj#=-1 tim=1404678192244786
WAIT #0: nam='KSV master wait' ela= 63 p1=0 p2=0 p3=0 obj#=-1 tim=1404678192251632
WAIT #0: nam='KSV master wait' ela= 1340 p1=0 p2=0 p3=0 obj#=-1 tim=1404678192253010
WAIT #0: nam='ASM file metadata operation' ela= 62 msgop=33 locn=0 p3=0
obj#=-1 tim=1404678192253032
WAIT #0: nam='cell smart incremental backup' ela= 198 cellhash#=379339958 p2=0 p3=0
obj#=-1 tim=1404678192255205
WAIT #0: nam='cell smart incremental backup' ela= 214 cellhash#=2133459483 p2=0 p3=0
obj#=-1 tim=1404678192255495
WAIT #0: nam='cell smart incremental backup' ela= 212 cellhash#=3176594409 p2=0 p3=0
obj#=-1 tim=1404678192255802
WAIT #0: nam='cell smart incremental backup' ela= 10 cellhash#=379339958 p2=0 p3=0
obj#=-1 tim=1404678192256723
WAIT #0: nam='cell smart incremental backup' ela= 100 cellhash#=2133459483 p2=0 p3=0
obj#=-1 tim=1404678192256846
You can see the effect of the block-change-tracking file in the change tracking file synchronous read events. The actual backup work, scanning the data files for changes since the last backup, is performed on the cells, as shown by the cell smart incremental backup event. You can gauge the effectiveness of the offloading to the cells in the v$backup_datafile view by putting the blocks_skipped_in_cell column in relation to blocks_read.
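A query along the following lines can be used to quantify how many blocks the cells filtered out during level 1 backups (a sketch; the ratio calculation is one possible way to express the savings):

```sql
-- Relate blocks skipped by the cells to blocks actually read.
SELECT file#,
       blocks_read,
       blocks_skipped_in_cell,
       round(100 * blocks_skipped_in_cell /
             nullif(blocks_read + blocks_skipped_in_cell, 0), 1) pct_skipped
FROM   v$backup_datafile
WHERE  incremental_level = 1;
```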
Parameters
The only parameter used for this event is P1, which shows which cell was responsible for generating the event:
Note The obj# field is a part of many wait events, even some that are not specifically related to an individual object. Be aware that, in some cases, the value may be set by one event and then not cleared appropriately when the wait ends, resulting in meaningless values left in place for the next wait event. In the previous example, the obj# was cleared (set to a value of -1).
cell smart restore from backup
This event is used to measure time spent waiting on RMAN when doing a restore. Exadata optimizes RMAN restores by offloading processing to the storage cells.
Event Meaning
The event actually records time related to file initialization during a restore. Here’s an excerpt from a 10046 trace file taken while a restore was in progress:
WAIT #0: nam='KSV master wait' ela= 68 p1=0 p2=0 p3=0 obj#=-1 tim=1404740066131406
WAIT #0: nam='RMAN backup & recovery I/O' ela= 3320 count=1 intr=256 timeout=4294967295
obj#=-1 tim=1404740066135786
WAIT #0: nam='RMAN backup & recovery I/O' ela= 3160 count=1 intr=256 timeout=4294967295
obj#=-1 tim=1404740066139862
WAIT #0: nam='RMAN backup & recovery I/O' ela= 3201 count=1 intr=256 timeout=4294967295
obj#=-1 tim=1404740066143790
WAIT #0: nam='RMAN backup & recovery I/O' ela= 2 count=1 intr=256 timeout=2147483647
obj#=-1 tim=1404740066143822
WAIT #0: nam='cell smart restore from backup' ela= 853 cellhash#=2133459483 p2=0 p3=0
obj#=-1 tim=1404740066146969
WAIT #0: nam='cell smart restore from backup' ela= 137 cellhash#=379339958 p2=0 p3=0
obj#=-1 tim=1404740066148003
WAIT #0: nam='cell smart restore from backup' ela= 167 cellhash#=3176594409 p2=0 p3=0
obj#=-1 tim=1404740066148948
WAIT #0: nam='cell smart restore from backup' ela= 269 cellhash#=2133459483 p2=0 p3=0
obj#=-1 tim=1404740066149335
WAIT #0: nam='cell smart restore from backup' ela= 261 cellhash#=379339958 p2=0 p3=0
obj#=-1 tim=1404740066149767
WAIT #0: nam='cell smart restore from backup' ela= 375 cellhash#=3176594409 p2=0 p3=0
obj#=-1 tim=1404740066150294
WAIT #0: nam='cell smart restore from backup' ela= 124 cellhash#=3176594409 p2=0 p3=0
obj#=-1 tim=1404740066152236
Parameters
The only parameter used for this event is P1, which shows which cell was responsible for generating the event.
Exadata Wait Events in the Other and Idle Classes
These are relatively minor events that occur primarily during startup and shutdown of storage cells and under fault conditions. You will probably not see them on normally functioning systems. There is one exception to this: the cell smart flash unkeep event. Table 10-2 lists the “cell” wait events in the Other class along with their parameters. A separate section covers cell smart flash unkeep.
Table 10-2. Miscellaneous Other and Idle Class Events
cell manager cancel work request
This event is not very informative, as all three of the parameters (P1, P2, P3) from the v$session_wait view are unused.

cell worker online completion
This appears to be a startup event. The cell hash number is contained in the P1 column of the v$session_wait view for this event. The P2 and P3 columns are unused.

cell worker retry
The cell hash number is contained in the P1 column of the v$session_wait view for this event. The P2 and P3 columns are unused.

cell worker idle
The P1, P2, and P3 columns from the v$session_wait view are unused in this idle event.
cell smart flash unkeep
This event records the time spent waiting when Oracle must flush blocks out of Exadata Smart Flash Cache. This can occur when a table with a storage clause designating that it be pinned in Exadata Smart Flash Cache is truncated or dropped. This is an important distinction: beginning with Exadata 11.2.3.3.0, the cell software automatically caches data from single-block I/O as well as Smart Scans, which allows the next scan to read from disk and flash simultaneously, resulting in much better performance. Prior to 11.2.3.3.0, it was necessary to manually pin tables and partitions in Flash Cache. It is easier to understand this with an example. Consider the following use case: Table T1 in database DB12C has a data object ID of 79208. After the table has been created and subjected to light query activity in the form of Smart Scans, you can observe a pattern similar to this one:
[oracle@enkdb03 ~]$ dcli -g cell_group -l cellmonitor
> "cellcli -e list flashcachecontent where objectNumber = 79208 detail" | grep cache
enkcel04: cachedKeepSize: 0
enkcel04: cachedSize: 4374528
enkcel04: cachedWriteSize: 4358144
enkcel05: cachedKeepSize: 0
enkcel05: cachedSize: 5521408
enkcel05: cachedWriteSize: 5521408
enkcel06: cachedKeepSize: 0
enkcel06: cachedSize: 5177344
enkcel06: cachedWriteSize: 5177344
As you can see, the cache is already populated, thanks to the Smart Scans, without the KEEP keyword having been specified in the CELL_FLASH_CACHE clause. You can also see that the KEEP size is 0, for the same reason.
Event Meaning
Truncating or dropping the table does not have an immediate effect on the Smart Flash Cache at this stage. So, when would you see the event?
Back to the CELL_FLASH_CACHE clause: As soon as the storage clause is changed to KEEP, the cell will cache data from the table, populating the values for cachedKeepSize. In the previous example, it took a few more queries before the keep cache was populated:
[oracle@enkdb03 ~]$ dcli -g cell_group -l cellmonitor
> "cellcli -e list flashcachecontent where objectNumber = 79208 detail" | grep cache
enkcel04: cachedKeepSize: 1239744512
enkcel04: cachedSize: 1243955200
enkcel04: cachedWriteSize: 5365760
enkcel05: cachedKeepSize: 1406525440
enkcel05: cachedSize: 1410203648
enkcel05: cachedWriteSize: 5980160
enkcel06: cachedKeepSize: 1355415552
enkcel06: cachedSize: 1358880768
enkcel06: cachedWriteSize: 6381568
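The storage clause change that triggered this keep caching might look like the following (a sketch using the table from the running example):

```sql
-- Pin table T1 in Exadata Smart Flash Cache.
ALTER TABLE t1 STORAGE (CELL_FLASH_CACHE KEEP);
```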
If you truncate the table now, it will show you the cell smart flash unkeep event. Here is a tkprof’d example for the truncate statement:
truncate table t1
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 6 0 0
Execute 1 0.24 0.57 12 480 5667 0
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 0.24 0.58 12 486 5667 0
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 198
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
enq: IV - contention 10 0.00 0.00
row cache lock 4 0.00 0.00
Disk file operations I/O 1 0.00 0.00
enq: RO - fast object reuse 6 0.00 0.00
reliable message 2 0.00 0.00
log file sync 2 0.00 0.00
cell smart flash unkeep 18 0.01 0.01
cell single block physical read 12 0.00 0.00
local write wait 3 0.00 0.00
gc current grant 2-way 16 0.00 0.00
DFS lock handle 1452 0.00 0.30
gc current grant busy 7 0.00 0.00
SQL*Net message to client 1 0.00 0.00
SQL*Net message from client 1 4.00 4.00
********************************************************************************
When reviewing the raw trace file, you will note that the cell smart flash unkeep event is preceded by a little more than a handful of enq: RO – fast object reuse events, which are used to mark time associated with cleaning up the buffer cache after a drop or truncate. The cell smart flash unkeep is basically an extension of that event to clean up the Exadata Smart Flash Cache on the storage server as well.
Parameters
The only parameter used for this event is P1, which shows which cell was responsible for generating the event:
In addition to the new cell events, there are also a few non-Exadata-specific wait events that you should be aware of. These are events that you may already be familiar with from managing Oracle on other platforms. They happen to also be important in an Exadata environment, so they represent cases in which your existing knowledge and skills carry over and stand you in good stead as you move into managing Exadata.
direct path read
Direct path reads are used by Oracle to read data directly into PGA memory (instead of into the buffer cache). They are an integral part of Exadata offloading because SQL processing can only be offloaded to the storage cells when the direct path read mechanism is used. The direct path read wait event is replaced by the cell smart table scan and cell smart index scan wait events when a query is offloaded. However, the direct path read mechanism is still used by the code covered by those new wait events. That is, either the plan must include a parallel (table) scan, or Oracle must decide to use the serial direct path read mechanism, as described in Chapter 2.
Event Meaning
This event records time that Oracle spends waiting on a direct path read to complete. You should know that the direct path read wait event can be very misleading. As with the Smart Scan events, both the number of events recorded and the timings associated with them can appear to be inaccurate. This is due to the fact that direct path reads are done in an asynchronous and overlapping fashion. In essence, this event is not an I/O latency wait, unlike the classic synchronous I/O events, because of the asynchronous, overlapping way in which Oracle fires off I/O requests. It also bears mentioning that Oracle, from 11gR2 onward, contains an enhancement that causes serial direct path reads to occur more frequently than in previous releases. See MOS Note 793845.1, which briefly mentions this change. Many DBAs must have wondered why the I/O profile changed after their database was migrated from 10.2 to 11.2 or 12c.
Although not as relevant as it is on other platforms, the direct path read wait event does still show up on the Exadata platform for various operations but generally not for full table scans unless the table (or partition) is relatively small. The other case where you might see a direct path read is when the scan of the segment in question is not offloadable, such as for an Index Organized Table (IOT). An example for a 10046 trace and direct path reads is shown in the description of cell smart table scan.
Parameters
The parameters for this event show you exactly which segment (obj) is scanned and which set of blocks were scanned during this event:
As mentioned in the cell smart table scan section, the parameters contain more specific information about which file and object are being accessed. The offset into the file is also provided in the P2 parameter, along with the number of contiguous blocks read in the P3 parameter.
enq: KO - fast object checkpoint
The enq: KO event has a very strange name. Don’t be put off by that. The event is essentially an object checkpoint event. The V$LOCK_TYPE view describes the KO lock as follows:
SQL> select type, name, description from v$lock_type
2 where type = 'KO';
TYPE NAME DESCRIPTION
----- ------------------------------ ---------------------------------------------
KO Multiple Object Checkpoint Coordinates checkpointing of multiple objects
Event Meaning
This event is used when a session is waiting for all dirty blocks to be flushed from the buffer cache for a segment prior to starting a direct path read or cell smart table scan or cell smart index scan. This event is important because the time required to do the checkpoint may outweigh the benefit of the direct path reads. This is unlikely on Exadata storage, though, where the additional Smart Scan benefits are only enabled by the direct path read mechanism. Nevertheless, the execution engine tries to factor the checkpoint cost in before the decision is made to offload the query to the cells. Chapter 2 contains a lot more information about the mechanism.
Parameters
The parameters for this event are not overly helpful, but the event does show which object is scanned. Here are the parameter definitions:
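The name|mode value that accompanies the event in raw trace files packs the two-character lock name and the lock mode into a single number. The following decode (a common technique, shown here against DUAL with a value taken from the trace excerpts in this chapter) should return KO and mode 6:

```sql
-- Unpack the lock name and mode from the name|mode trace value.
SELECT chr(bitand(p1, -16777216)/16777216) ||
       chr(bitand(p1, 16711680)/65536) AS lock_name,
       bitand(p1, 65535)               AS lock_mode
FROM  (SELECT 1263468550 AS p1 FROM dual);
```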
reliable message
The reliable message event is used to record time spent communicating with background processes, like the checkpoint process (CKPT). We have included it here because of its close association with the enq: KO - fast object checkpoint event.
Event Meaning
In Oracle 11.2 and later, this event is the precursor to the enq: KO - fast object checkpoint event (among others). The communication is done using an inter-process communication channel rather than the more usual post mechanism. This communication method allows the sender to request an acknowledgment before it continues; hence the name reliable message. It is generally a very short duration event, as it only records time for communicating between processes. Both the user's foreground process and the CKPT process will wait on this event as they communicate with each other. Here is an excerpt of a 10046 trace file showing a complete reliable message event, taken from an 11.2.0.4 database:
=====================
PARSING IN CURSOR #140457550697520 len=27 dep=0 uid=44 oct=3 lid=44 tim=1404832965816981
hv=521453784 ad='25d376810' sqlid='cdjur50gj9h6s'
select count(*) from bigtab
END OF STMT
PARSE #140457550697520:c=1000,e=1072,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=2140185107,
tim=1404832965816980
EXEC #1...0:c=0,e=35,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=2140185107,tim=1404832965817073
WAIT #1...0: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 p3=0
obj#=18532 tim=1404832965817131
WAIT #1...0: nam='reliable message' ela= 1230 channel context=10035861184 channel
handle=10055026416 broadcast message=10199413976 obj#=18532 tim=1404832965818567
WAIT #1...0: nam='enq: KO - fast object checkpoint' ela= 189 name|mode=1263468550 2=65571 0=1
obj#=18532 tim=1404832965818827
WAIT #1...0: nam='enq: KO - fast object checkpoint' ela= 107 name|mode=1263468545 2=65571 0=2
obj#=18532 tim=1404832965819004
WAIT #1...0: nam='cell smart table scan' ela= 199 cellhash#=2133459483 p2=0 p3=0 obj#=18532
tim=1404832965820790
WAIT #1...0: nam='cell smart table scan' ela= 182 cellhash#=3176594409 p2=0 p3=0 obj#=18532
tim=1404832965821746
WAIT #1...0: nam='cell smart table scan' ela= 163 cellhash#=379339958 p2=0 p3=0 obj#=18532
tim=1404832965822672
Parameters
Here are the parameters for the reliable message event:
Before wrapping up the chapter, we will discuss a few Resource Manager events that you should be aware of. While these are not specific to Exadata, Resource Manager provides key functionality for combining mixed workloads on Exadata. There are actually eight separate events as of release 12.1.0.2. The following query against V$EVENT_NAME shows these events and their parameters:
SQL> select name,parameter1,parameter2,parameter3,wait_class
2 from v$event_name where name like 'resmgr%' order by name;
NAME PARAMETER1 PARAMETER2 PARAMETER3 WAIT_CLASS
------------------------------ --------- -------------------- ---------- ------------
resmgr:become active location Scheduler
resmgr:cpu quantum location consumer group id Scheduler
resmgr:internal state change location Concurrency
resmgr:internal state cleanup location Other
resmgr:large I/O queued location Scheduler
resmgr:pq queued location Scheduler
resmgr:sessions to exit location Concurrency
resmgr:small I/O queued location Scheduler
8 rows selected.
Only three of these events are of interest here.
resmgr:become active
You will see this event when sessions are waiting to begin execution. Consider, for example, a consumer group defined with a maximum number of concurrently active sessions. Any further session will wait on resmgr:become active before it can start its work.
Event Meaning
The event indicates sessions are being held back by Resource Manager from beginning execution. If you create a plan directive limiting the number of concurrent sessions in a consumer group, you will see output similar to this:
SQL> select sid,serial#,username,seq#,event,resource_consumer_group rsrc_cons_grp
  2  from v$session where username = 'LMTD';
SID SERIAL# USERNAME SEQ# EVENT RSRC_CONS_GRP
---------- ---------- ---------- ---------- ------------------------------ -----------------
916 1413 LMTD 31124 cell smart table scan LMTD_GROUP
1045 2667 LMTD 29480 cell smart table scan LMTD_GROUP
1112 20825 LMTD 42 resmgr:become active LMTD_GROUP
1178 91 LMTD 43 resmgr:become active LMTD_GROUP
1239 22465 LMTD 46 resmgr:become active LMTD_GROUP
1311 19703 LMTD 47 resmgr:become active LMTD_GROUP
1374 32129 LMTD 44 resmgr:become active LMTD_GROUP
1432 29189 LMTD 40 resmgr:become active LMTD_GROUP
8 rows selected.
To be more precise, the number of concurrent sessions for consumer group LMTD_GROUP is set to 2. Here is the corresponding configuration example:
BEGIN
dbms_resource_manager.clear_pending_area();
dbms_resource_manager.create_pending_area();
dbms_resource_manager.UPDATE_PLAN_DIRECTIVE(
plan => 'ENKITEC_DBRM',
group_or_subplan => 'LMTD_GROUP',
new_active_sess_pool_p1 => 2);
dbms_resource_manager.validate_pending_area();
dbms_resource_manager.submit_pending_area();
end;
/
With this setting, you will never have more than the defined number of sessions performing work at the same time.
Parameters
Here are the parameters for this event. Note that the obj# parameter exists but is not used:
The location parameter is a numeric value that most likely refers to a location (function) in the Oracle code. There are at least five distinct locations that we have observed. Unfortunately, Oracle does not publicly document where in the Oracle kernel these checks are performed.
resmgr:cpu quantum
This event is used to record forced idle time imposed by Database Resource Manager (DBRM) due to competition with higher-priority work. Said another way, it is the time a process spent waiting for DBRM to allocate it a time slice. Interestingly, as of the time of this writing, no database event reflects throttling at the I/O Resource Manager level.
Event Meaning
DBRM behaves in an analogous manner to CPU scheduling algorithms in that it divides time into units (quantum) and either allows a process to run or not, depending on other workload on the system. Unlike CPU scheduling algorithms, though, DBRM throttling is interjected at key locations in the Oracle code to eliminate the possibility of a process being kicked off of the CPU when it is holding a shared resource such as a latch. This prevents some nasty behavior that may occur on heavily loaded systems such as priority inversion problems. In effect, the processes voluntarily go to sleep when they are not holding these shared resources. There are multiple locations in the code where these checks are implemented. Here’s an excerpt of a 10046 trace file showing the resmgr:cpu quantum event:
PARSING IN CURSOR #140473587700424 len=30 dep=1 uid=208 oct=3 lid=208 tim=4695862695328
hv=2336211758 ad='22465dc70' sqlid='gq01wgy5mzhtf'
SELECT COUNT(*) FROM MARTIN.T1
END OF STMT
EXEC #1...4:c=0,e=97,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=3724264953,tim=4695862695326
WAIT #1...4: nam='enq: KO - fast object checkpoint' ela= 216 name|mode=1263468550 2=65683 0=2
obj#=79208 tim=4695862695834
WAIT #1...4: nam='reliable message' ela= 1126 channel context=10126085744 channel
handle=10164815328 broadcast message=10179052064 obj#=79208 tim=4695862697054
WAIT #1...4: nam='enq: KO - fast object checkpoint' ela= 160 name|mode=1263468550
2=65683 0=1 obj#=79208 tim=4695862709511
WAIT #1...4: nam='enq: KO - fast object checkpoint' ela= 177 name|mode=1263468545
2=65683 0=2 obj#=79208 tim=4695862709811
WAIT #1...4: nam='cell smart table scan' ela= 1830 cellhash#=3249924569 p2=0 p3=0
obj#=79208 tim=4695862713420
WAIT #1...4: nam='cell smart table scan' ela= 326 cellhash#=674246789 p2=0 p3=0 obj#=79208
tim=4695862714610
WAIT #1...4: nam='cell smart table scan' ela= 362 cellhash#=822451848 p2=0 p3=0 obj#=79208
tim=4695862715784
WAIT #1...4: nam='resmgr:cpu quantum' ela= 278337 location=2 consumer group id=80408 =0
obj#=79208 tim=4695863561098
WAIT #1...4: nam='cell smart table scan' ela= 2713 cellhash#=822451848 p2=0 p3=0 obj#=79208
tim=4695863564184
Parameters
Here are the parameters for this event. Note that the obj# parameter exists but is not used:
The location parameter is a numeric value that most likely refers to a location (function) in the Oracle code as described above in the resmgr:become active event.
The consumer group ID in the P2 parameter is self-explanatory: it maps to the CONSUMER_GROUP_ID column in the DBA_RSRC_CONSUMER_GROUPS view and tells you which consumer group a process was assigned to when its CPU usage was curtailed.
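To translate the consumer group ID into a name while sessions are waiting, you can join V$SESSION to DBA_RSRC_CONSUMER_GROUPS. A sketch (results will of course depend on your active resource plan):

```sql
-- Map the P2 value of resmgr:cpu quantum waits to the consumer group name
SELECT s.sid, s.p2 AS consumer_group_id, g.consumer_group
FROM   v$session s
JOIN   dba_rsrc_consumer_groups g
       ON g.consumer_group_id = s.p2
WHERE  s.event = 'resmgr:cpu quantum';
```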
resmgr:pq queued
This event is used to record time spent waiting in the parallel statement queue. Parallel statement queueing is a feature introduced with Oracle Database 11g Release 2. Statement queueing is one part of the functionality enabled by setting parallel_degree_policy to AUTO; in Oracle 12c, you can also set the value to ADAPTIVE for the same purpose. Instead of having its degree of parallelism downgraded when not enough parallel servers are available, as with the old parallel automatic tuning option, a statement waits in the queue until the required degree of parallelism can be met. Refer back to Chapter 6 for more information about parallel execution on Exadata.
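As a sketch, statement queueing can be enabled session-wide via the policy parameter, or requested for an individual statement with the STATEMENT_QUEUING hint (the table name here is illustrative):

```sql
-- Enable automatic DOP, which implies statement queueing, for this session
ALTER SESSION SET parallel_degree_policy = AUTO;

-- Or request queueing for a single statement regardless of the policy
SELECT /*+ PARALLEL(32) STATEMENT_QUEUING */ COUNT(*) FROM martin.t3;
```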
Event Meaning
The parallel statement queuing feature comes with its own wait event. Statements that are queued due to insufficient parallel server processes or other directives clock time to this event. Here is an excerpt of a 10046 trace file showing the resmgr:pq queued event:
PARSING IN CURSOR #140583609334256 len=68 dep=0 uid=205 oct=3 lid=205 tim=4755825333777
hv=2381215807 ad='220a20bc0' sqlid='944dxty6ywy1z'
select /*+ PARALLEL(32) STATEMENT_QUEUING */ count(*) from martin.t3
END OF STMT
PARSE #140583609334256:c=0,e=188,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3978228158,
tim=4755825333776
WAIT #140583609334256: nam='resmgr:pq queued' ela= 25193140 location=1 =0 =0 obj#=-1
tim=4755850527127
WAIT #140583609334256: nam='reliable message' ela= 889 channel context=10126085744 channel
handle=10164818472 broadcast message=10179043672 obj#=-1 tim=4755850529508
WAIT #140583609334256: nam='enq: KO - fast object checkpoint' ela= 197 name|mode=1263468550
2=65690 0=1 obj#=-1 tim=4755850529777
WAIT #140583609334256: nam='enq: KO - fast object checkpoint' ela= 124 name|mode=1263468545
2=65690 0=2 obj#=-1 tim=4755850529959
WAIT #140583609334256: nam='PX Deq: Join ACK' ela= 39 sleeptime/senderid=268500992 passes=1
p3=9216157728 obj#=-1 tim=4755850531333
WAIT #140583609334256: nam='PX Deq: Join ACK' ela= 42 sleeptime/senderid=268500993 passes=1
p3=9216246432 obj#=-1 tim=4755850531420
WAIT #140583609334256: nam='PX Deq: Join ACK' ela= 39 sleeptime/senderid=268500994 passes=1
p3=9216266144 obj#=-1 tim=4755850531507
While testing this use case, we found that the event is written to the trace file only once, at the very beginning. Furthermore, no trace information about the queued statement is emitted until the session actually starts work.
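Because the wait event alone offers little visibility while a statement sits in the queue, another place to look is V$SQL_MONITOR, where queued parallel statements are reported with a status of QUEUED. A minimal sketch:

```sql
-- Statements waiting in the parallel statement queue are flagged as QUEUED
SELECT sid, sql_id, sql_exec_id, status
FROM   v$sql_monitor
WHERE  status = 'QUEUED';
```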
Parameters
Here are the parameters for the resmgr:pq queued event:
The location parameter is a numeric value that most likely refers to a location (function) in the Oracle code as described above in the resmgr:become active event.
Summary
The wait interface has been expanded to cover several Exadata-specific features. In this chapter, you read about the new wait events you should know. By far the most interesting of the new events are cell smart table scan and cell smart index scan. These events cover the time spent waiting on an offloadable read I/O request to a storage cell, and a great deal of processing that occurs at the storage layer is lumped together under them. It is important to understand that these events replace the direct path read event, and that the Smart Scan mechanism of returning data directly to the process PGA is analogous to the way direct path read handles it.