CHAPTER 17


Unlearning Some Things We Thought We Knew

Oracle can do some things very differently when running on Exadata than when running on non-Exadata platforms. The optimizations provided by Exadata are designed to take a different approach than Oracle has traditionally followed. This change means that you need to attack some problems with a different mindset. That is not to say that everything is different—quite the contrary. In fact, most of the fundamental principles remain unchanged. After all, the same database software runs on Exadata that runs on other platforms. But there are some things that are just different. As you read in Chapter 13, you can just take your database and deploy it 1:1 on Exadata. Depending on the type of deployment, this might be OK. However, if you are interested in getting the most out of your investment, you should probably take a step back and review what you can do to further optimize the database on Exadata. Since there are a few things that are different on Exadata compared to a standard, non-Exadata deployment, this chapter is worth reading for phase 2 of your migration. In this chapter, we will focus on how we should change our approach when running databases on Exadata.

A Tale of Two Systems

The way we think about systems running on Exadata largely depends on the workload being executed. Online-Transaction-Processing (OLTP)-oriented workloads tend to focus us on using Smart Flash Cache for speeding up small physical reads. But frankly, this type of workload is not able to take advantage of most of the performance advantages provided by Exadata. Data-Warehouse (DW)-oriented workloads tend to focus on making use of Smart Scans at every opportunity and trying to use all the available resources (CPU resources on the storage and database tiers, for example). This is where Exadata’s built-in advantages can have the most impact. Unfortunately, most systems exhibit characteristics of both DW and OLTP workloads. These “mixed workloads” are the most difficult and, ironically, the most common. They are the most common because it is rare to see an OLTP-oriented system that does not have some reporting component that produces long-running, throughput-sensitive queries. It is also quite common to see DW-oriented workloads that have OLTP-like trickle feeds or similar Extraction, Load, Transformation (ELT) processing needs. These combination systems require the most difficult thought process because, depending on the issue at hand, you will have to be constantly resetting your approach to managing performance. They are also difficult because DW workloads are generally constrained by the data-flow dynamics that control throughput, while OLTP systems are usually constrained by latency issues. Hence, for mixed workloads, you basically need to train yourself to evaluate each scenario and categorize it as either latency-sensitive or throughput-sensitive. This evaluation of workload characteristics should be done prior to beginning any analysis.

If you thought that was an interesting challenge, a third variable can be thrown into the equation. Very few Exadata systems deployed today are home to just one single application. Whereas it used to be quite common to place the critical application on Exadata when the first two generations were made available, the increase in capacity for both Smart Flash Cache as well as disk and the ever-increasing CPU power offered by Intel make Exadata an interesting consolidation platform. This has been recognized by many users and, instead of one database with mixed workload requirements, you quite often have to arrange the needs of multiple databases—potentially with mixed workload requirements—on your hardware. Thankfully, Oracle offers tools to deal with this situation.

OLTP-Oriented Workloads

Even though there is little to say about running OLTP workloads on Exadata, there are a handful of points to keep in mind with this type of system. Since Exadata runs standard Oracle database software, you should not have to adjust your basic approach significantly.

Exadata Smart Flash Cache (ESFC)

The key component of Exadata when it comes to OLTP workloads is Exadata Smart Flash Cache (ESFC), which can significantly reduce disk-access times for small reads. For that reason, it is important to verify that ESFC is working correctly. For this type of workload, you should also expect a large percentage of physical I/O operations to be satisfied by ESFC. This can be inferred fairly easily by looking at the average single-block read times. A single-block read should take approximately 0.5 ms if it is satisfied by Flash Cache. By contrast, single-block reads take on average approximately 5 ms if they are satisfied by actual disk reads. Standard AWR reports provide both average values and a histogram of wait events. If the average single-block read times are well above the 1 ms range, you should be looking for a systemic problem, such as flash cards that are not working or a critical table that has been defined to never be cached using the CELL_FLASH_CACHE NONE storage clause. I/O Resource Management should also be checked. The histograms should be used as well to verify that the average is not covering up a significant number of outliers. Here is the cellcli syntax to check the status of the flash cards:

CellCLI> list flashcache detail
      name:                   dm01cel03_FLASHCACHE
      cellDisk:               FD_00_dm01cel03,FD_01_dm01cel03,FD_02_dm01cel03,
                              FD_03_dm01cel03,FD_04_dm01cel03,FD_05_dm01cel03,
                              FD_06_dm01cel03,FD_07_dm01cel03,FD_08_dm01cel03,
                              FD_09_dm01cel03,FD_10_dm01cel03,FD_11_dm01cel03,
                              FD_12_dm01cel03,FD_13_dm01cel03,FD_14_dm01cel03,FD_15_dm01cel03
      creationTime:           2010-03-22T17:39:46-05:00
      id:                     850be784-714c-4445-91a8-d3c961ad924b
      size:                   365.25G
      status:                 critical

Note that the status attribute on this cell is critical. As you might expect, this is not a good thing. On this particular system, the Flash Cache had basically disabled itself. We noticed it because the single-block read times had slowed down. This example is from an early version of cellsrv. The later versions include a little more information. Here is an example from cellsrv 12.1.2.1.0:

CellCLI> list flashcache detail
         name:                   enkx4cel01_FLASHCACHE
         cellDisk:               FD_04_enkx4cel01,FD_06_enkx4cel01,FD_11_enkx4cel01,
                                 FD_02_enkx4cel01,FD_13_enkx4cel01,FD_12_enkx4cel01,
                                 FD_00_enkx4cel01,FD_14_enkx4cel01,FD_03_enkx4cel01,
                                 FD_09_enkx4cel01,FD_10_enkx4cel01,FD_15_enkx4cel01,
                                 FD_08_enkx4cel01,FD_07_enkx4cel01,FD_01_enkx4cel01,
                                 FD_05_enkx4cel01
         creationTime:           2015-01-19T21:33:37-06:00
         degradedCelldisks:
         effectiveCacheSize:     5.8193359375T
         id:                     3d415a32-f404-4a27-b9f2-f6a0ace2cee2
         size:                   5.8193359375T
         status:                 normal

Notice the new attribute degradedCelldisks. Also notice that the Flash Cache on this cell shows a status of normal. Monitoring storage software behavior is covered in more detail in Chapter 12.
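
If you want to check the read latencies interactively rather than via AWR, the wait event histogram views provide a quick view. The following is a minimal sketch; on a RAC system, you would query gv$event_histogram instead:

SQL> select wait_time_milli, wait_count
  2  from v$event_histogram
  3  where event = 'cell single block physical read';

If the bulk of the waits falls into the 1 ms and 2 ms buckets, ESFC is doing its job; a long tail in the 8 ms and higher buckets points to reads served from spinning disk or an overloaded cell.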

Scalability

Another thing to keep in mind when dealing with OLTP workloads is that the Exadata platform provides exceptional scalability. Upgrading from a half rack to full rack doubles the number of CPUs at both the database layer and the storage layer. The amount of ESFC is also doubled, as is the available memory. This allows Exadata to scale in a nearly linear fashion for many systems.

To add to this point, we can share a little anecdote. When presenting Exadata Smart Scan capabilities during workshops, we used to run a query against a table with all Exadata optimizations disabled, using pure hard-disk access, initially on an X2-2 quarter rack. When the X4-2 was introduced, we were lucky to have access to an X4-2 half rack. CPU differences aside, the difference between using three cells vs. seven cells was staggering. Add to the mix that most of the table was served via ESFC, and you can imagine response times dropping drastically. In order to get back to the timings we were used to, we had to significantly increase the size of the table. This is just one example where Exadata scales very nicely, without any change to the application itself.

Write-Intensive OLTP Workloads

Write-intensive workloads are a subset of OLTP-oriented systems. There are some systems that just constantly perform single-row inserts followed by a commit; in other words, they employ the slow-by-slow (row-by-row) approach. These systems are often limited by the speed at which commits can be done, which often depends on the speed with which writes to the log files can be accomplished. This is one area where Exadata competes with other platforms on a fairly even playing field when operating ESFC in the default write-through mode. There are no major enhancements that make Exadata run orders of magnitude faster for systems that are bottlenecked on write operations. Flash Logging can help here. Beginning with cellsrv 11.2.2.4, Exadata performs redo writes to disk and Flash devices simultaneously. The one that completes first “wins,” allowing the log writer to continue processing while the other write is completed. Additionally, the cache on the disk controllers has doubled to 1GB with the X5 hardware generation.

Smart Flash Logging is no magic bullet, and established methods of tuning, such as minimizing commits, are much more appropriate for these types of systems. You would be amazed what changing from row-by-row processing to set-based processing can do in terms of performance, without even having to think about hardware! Many web sites, most prominently perhaps Tom Kyte’s Ask Tom, have lots of references showing how row-by-row processing, with a commit every time some work has been performed, is suboptimal for performance.
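
To illustrate the point, here is a minimal sketch of the two approaches. The tables staging_data and target_table are hypothetical, used purely for illustration:

SQL> -- slow-by-slow: one commit, and thus one log file sync, per row
SQL> begin
  2    for r in (select id, payload from staging_data) loop
  3      insert into target_table (id, payload) values (r.id, r.payload);
  4      commit;
  5    end loop;
  6  end;
  7  /

PL/SQL procedure successfully completed.

SQL> -- set-based equivalent: one statement, one commit
SQL> insert into target_table (id, payload)
  2  select id, payload from staging_data;

SQL> commit;

The set-based version commits once and lets the database optimize the whole data flow, rather than paying a log file sync for every row.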

Log Writer is not the only component that longs for low-latency I/O performance. Until Exadata version 11.2.3.2.1, using the Smart Flash Cache in write-through mode was the only available option. Beginning with 11.2.3.2.1, it is possible to use Flash Cache in write-back mode. Switching to write-back mode does not remove the need for Flash Log, by the way—after the switch you will still find it defined. Enabling write-back mode can improve performance for write-intensive workloads, as you saw in Chapter 5. However, you should carefully assess whether using the Write Back Flash Cache is worth it, based on the information provided in Chapter 5 on Flash Cache.
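
For reference, here is the outline of switching a cell to write-back mode with cellcli. This is a sketch of the non-rolling method; the Flash Cache is dropped and re-created in the process, so check the documentation for your exact cellsrv version before attempting this on a live system:

CellCLI> drop flashcache
CellCLI> alter cell shutdown services cellsrv
CellCLI> alter cell flashCacheMode=writeback
CellCLI> alter cell startup services cellsrv
CellCLI> create flashcache all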

DW-Oriented Workloads

Exadata was initially designed to speed up long-running queries against large volumes of data. Therefore, it should come as no surprise that data warehousing is where it is important to change some of our basic thought processes. Of course, one of the major techniques is to be constantly looking for opportunities to allow Exadata optimizations to kick in. This means making sure that the application can take advantage of Smart Scans.

Enabling Smart Scans

The most important concept to keep in mind when dealing with DW-oriented workloads is that long-running statements should usually be offloaded. Here are the steps to follow:

  1. Determine whether Smart Scans are being used.
  2. If Smart Scans are not being used, fix things so that they will be used.

These points seem so obvious that it should not really be necessary to repeat them. A large portion of the optimizations built into Exadata work only when Smart Scans are used. One of the first changes you need to make in the way you think is to train yourself to be constantly considering whether Smart Scans are being used appropriately or not. This means you need to have a good understanding of which statements (or parts of statements) can be offloaded and be able to determine whether statements are being offloaded or not. The requirements for Smart Scans and some of the techniques that can be used to verify whether they are being performed or not have been covered extensively throughout the book. But at the risk of being repetitive, here you go again.

Essentially, there are two main prerequisites that must be met in order for Smart Scans to occur. The first is that the optimizer must choose to do a full scan of a table or a materialized view, or the optimizer must choose to do a fast full scan of an index. Note that Smart Scans are not limited to queries or even to sub-queries. The optimizer can also choose to use full scans for DELETEs and UPDATEs when a large percentage of the rows will be affected. However, if your application is doing this, you might want to consider modifying it to do something like a truncate and rebuild. As the saying goes, “it depends”.

The second requirement for Smart Scans is that the scans must be performed using the direct path read mechanism. Note that the optimizer was not mentioned intentionally in the description of the second requirement. This is because the optimizer does not make the decision about whether to use direct path reads or not. It is a heuristic decision that is made after the plan has been determined. As such, it is not directly exposed by any of the tools like explain plan or other performance-related utilities. What this means in practice is that it is easy to verify that the first requirement has been met, but more challenging to verify the second requirement.
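
A simple sanity check for the second requirement is to compare session statistics before and after executing the statement, using the same natural join pattern you will see later in this chapter:

SQL> select name, value
  2  from v$statname natural join v$mystat
  3  where name in ('cell scans', 'table scans (direct read)');

If the cell scans counter does not increase after your full scan, the direct path read requirement was most likely not met and no Smart Scan took place.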

On most Exadata implementations, a fairly high percentage of long-running queries are offloaded. You can check to see what percentage of your long-running SQL statements have been offloaded by selecting all the statements from v$sql (or from the AWR table DBA_HIST_SQLSTAT provided you have the license for it) that have an average run time of over some number of seconds, or that have an average logical I/O value that is greater than some reasonable value. Actually, the logical I/O is a better metric to use, as some of the offloaded statements will run very quickly and may not meet your minimum time criteria, which will give you a distorted perspective on the percentage of statements that are being offloaded. Here is an example (note that the scripts are included in the online code repository):

SQL> @offload_percent
Enter value for sql_text:
Enter value for min_etime:
Enter value for min_avg_lio: 500000

     TOTAL  OFFLOADED OFFLOADED_%
---------- ---------- -----------
        13         11      84.62%

SQL> /
Enter value for sql_text: SELECT%
Enter value for min_etime:
Enter value for min_avg_lio: 500000

     TOTAL  OFFLOADED OFFLOADED_%
---------- ---------- -----------
        11         11        100%

The listing makes use of the offload_percent.sql script, which calculates a percentage of statements currently in the shared pool that have been offloaded. It was initially used to evaluate all statements that had over 500,000 logical I/Os. It was run a second time where the investigation was limited to statements that begin with the word SELECT. In the next listing, you can see the output of a different script (fsxo.sql) that allows you to see the actual statements that contribute to the OFFLOAD_% calculated in the previous listing:

SQL> @fsxo
Enter value for sql_text:
Enter value for sql_id:
Enter value for min_etime:
Enter value for min_avg_lio: 500000
Enter value for offloaded:

SQL_ID         EXECS  AVG_ETIME OFFLOAD IO_SAVED_% SQL_TEXT
----------------------------------------------------------------------------
0bvt5z48t18by      1        .10 Yes         100.00 select count(*) from skew3 whe
0jytfr1y0jdr1      1        .09 Yes         100.00 select count(*) from skew3 whe
12pkwt8zjdhbx      1        .09 Yes         100.00 select count(*) from skew3 whe
2zbt555tg123s      2       4.37 Yes          71.85 select /*+ parallel (a 8) */ a
412n404yughsy      1        .09 Yes         100.00 select count(*) from skew3 whe
5zruc4v6y32f9      5      51.13 No             .00 DECLARE job BINARY_INTEGER :=
6dx247rvykr72      1        .10 Yes         100.00 select count(*) from skew3 whe
6uutdmqr72smc      2      32.83 Yes          71.85 select /* avgskew3.sql */ avg(
7y09dtyuc4dbh      1       2.87 Yes          71.83 select avg(pk_col) from kso.sk
b6usrg82hwsa3      5      83.81 No             .00 call dbms_stats.gather_databas
fvx3v0wpvxvwt      1      11.05 Yes          99.99 select count(*) from skew3 whe
gcq9a53z7szjt      1        .09 Yes         100.00 select count(*) from skew3 whe
gs35v5t21d9yf      1       8.02 Yes          99.99 select count(*) from skew3 whe

13 rows selected.

The fsxo.sql script provides the same limiting factors as the offload_percent.sql script, namely a minimum average elapsed time and a minimum average logical I/O. It also optionally allows you to limit the statements to only those that are offloaded or those that are not offloaded. Please refer to the scripts for further details and keep in mind that these techniques can also be applied to the data recorded by AWR for a historical perspective.
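
For the historical perspective, a sketch against AWR follows. It assumes you are licensed for the Diagnostics Pack and uses the offload-related delta columns of DBA_HIST_SQLSTAT:

SQL> select sql_id,
  2         sum(executions_delta) execs,
  3         round(sum(io_offload_elig_bytes_delta)/power(1024,3),2) offload_elig_gb
  4  from dba_hist_sqlstat
  5  group by sql_id
  6  having sum(io_offload_elig_bytes_delta) > 0
  7  order by 3 desc;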

In the next section, we will discuss some issues that can complicate your efforts to enable Smart Scans.

Things That Can Cripple Smart Scans

There are several common coding “techniques” that either disable Smart Scans completely or cause them to be much less effective than they could be. Some of the techniques are just bad practices, regardless of whether you are on the Exadata platform or not. Others do not carry as significant a penalty on non-Exadata platforms, but, when run on Exadata, they can prevent the storage software from doing all that it could do. This is commonly observed when development takes place on a non-Exadata platform, which makes developing a well-performing application difficult at best. Code written to perform well on a non-Exadata system will most likely use techniques that do not lend themselves to best performance on the Exadata platform, and additional work is needed to optimize the code once it is on Exadata.

Many of these issues have been discussed throughout this book. Here are a few that you should keep in mind because of the fundamentally different behavior on the Exadata platform.

Functions in WHERE Clauses

Oracle provides a large set of functions that can be applied directly in SQL statements. As discussed in Chapter 2, not all of those functions are offloadable. Knowing which functions are not offloadable is important because the use of those functions in WHERE clauses disables predicate filtering that may otherwise provide a massive reduction in the amount of data to be transferred back to the database layer. Obviously, custom-written PL/SQL functions fall into the category of “non-offloadable” functions as well.

This issue is somewhat counterintuitive since you are often doing full table scans anyway with data warehouse systems. On non-Exadata platforms, applying a function in a WHERE clause of a statement that is executed via a full table scan does not impose much of a penalty with regard to the amount of data that must be returned because the database must already return all blocks from the table to the database server. With Exadata, though, applying a function that can disable predicate filtering can impose a huge performance penalty. By the way, using custom PL/SQL functions in a WHERE clause is generally also a bad idea on non-Exadata platforms because additional CPU will be required to process PL/SQL for each row, as opposed to the optimized functions, based on C code, provided by Oracle.

Note: You can query V$SQLFN_METADATA to see which functions are offloadable.

Furthermore, “offloadable” functions can also impose large performance penalties. Here is a very simple example showing the negative effect of an offloadable function in a WHERE clause:

SQL> select /* example001 */ count(*) from SALES where prod_id < 1;

  COUNT(*)
----------
         0

Elapsed: 00:00:00.75
SQL> select /* example001 */ count(*) from SALES where abs(prod_id) < 1;

  COUNT(*)
----------
         0

Elapsed: 00:00:34.26
SQL> @fsx4
Enter value for sql_text: %example001%
Enter value for sql_id:

SQL_ID         CHILD OFFLOAD IO_SAVED_%  AVG_ETIME SQL_TEXT
------------- ------ ------- ---------- ---------- ----------------------------------------
7ttw461bngzn0      0 Yes         100.00        .75 select /* example001 */ count(*) from SA
fktc9145xy6qg      0 Yes          99.98      33.81 select /* example001 */ count(*) from SA

Elapsed: 00:00:00.08
SQL> select name, offloadable from v$sqlfn_metadata
  2  where name = 'ABS';

NAME                           OFF
------------------------------ ---
ABS                            YES
ABS                            YES
ABS                            YES

3 rows selected.

ABS() is an offloadable function, yet when used in the WHERE clause of this particular statement, the result was a large degradation in performance. If you have been following this example in your own environment, you may already have a pretty good idea why. Here is the explanation:

SQL> select name,value from v$statname natural join v$mystat
  2  where name like '%storage index%';

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
cell physical IO bytes saved by storage index                             0

SQL> set timing on
SQL> select /* example001 */ count(*) from SALES where abs(prod_id) < 1;

  COUNT(*)
----------
         0

Elapsed: 00:00:33.90
SQL> select name,value from v$statname natural join v$mystat where name like '%storage index%';

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
cell physical IO bytes saved by storage index                             0

Elapsed: 00:00:00.00
SQL> select /* example001 */ count(*) from SALES where prod_id < 1;

  COUNT(*)
----------
         0

Elapsed: 00:00:00.76
SQL> select name,value from v$statname natural join v$mystat where name like '%storage index%';

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
cell physical IO bytes saved by storage index                    1.4517E+11

Elapsed: 00:00:00.00

Storage indexes are disabled by functions, just like regular indexes. This is not too surprising, but, again, it can easily be missed. When we see a full scan, we have trained ourselves to not worry about functions in the WHERE clause that could disable indexes. Exadata is different.

Chained Rows

This is a very broad generalization, but basically any Oracle processing that requires reading an extra block to complete a row causes the Exadata storage software to revert to block shipping or passthrough mode. You read about this in several places in the previous chapters; Chapter 11 specifically provides most of the details. A simple example is a chained row, but there are other situations that can cause Oracle to revert to passthrough mode. What this means in practice is that some operations that cause slight delays on non-Exadata platforms can have a much more severe impact on performance on Exadata. The primary diagnostic symptom of this issue is the presence of many single-block-read wait events in combination with cell Smart Scan wait events. In such situations, you may find that you are better off not using offloading for the statements in question as an immediate remedy before addressing the problem more thoroughly. Here is an example showing where Oracle spends its time when selecting from a table with chained rows during the first execution. The example is specifically engineered to exaggerate the problem and make it reproducible, as the dictionary information about the table shows.

SQL> select num_rows,chain_cnt,avg_row_len from tabs where table_name = 'CHAINS';

  NUM_ROWS  CHAIN_CNT AVG_ROW_LEN
---------- ---------- -----------
    600000     600000       20005

1 row selected.

SQL> select segment_name,partition_name,round(bytes/power(1024,3),2) gb
  2   from user_segments
  3  where segment_name = 'CHAINS';

SEGMENT_NAME                   PARTITION_NAME                         GB
------------------------------ ------------------------------ ----------
CHAINS                                                             12.75

SQL> alter system flush buffer_cache;

System altered.

SQL> select /*+ gather_plan_statistics monitor */ avg(length(col2)) from chains;

AVG(LENGTH(COL2))
-----------------
             3990

1 row selected.

To investigate where the execution time is spent, the following commands could be used:

SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------
SQL_ID  6xpsmzknmkutw, child number 0
-------------------------------------
select /*+ gather_plan_statistics monitor */ avg(length(col2)) from chains

Plan hash value: 1270987893

-------------------------------------------------------------------------------------
| Id  | Operation                  | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |        |       |       |   450K(100)|          |
|   1 |  SORT AGGREGATE            |        |     1 |  3990 |            |          |
|   2 |   TABLE ACCESS STORAGE FULL| CHAINS |   600K|  2283M|   450K  (1)| 00:00:18 |
-------------------------------------------------------------------------------------

Please note that the above output shows the optimizer’s estimates for time and number of rows returned by the query. If you want the actual statistics, either use SQL Monitor (provided you have the license for it) or pass ALLSTATS LAST as the format parameter to DBMS_XPLAN.DISPLAY_CURSOR. Other useful tools include session snapper and the fsx family of scripts mentioned in Chapter 2. Querying Active Session History (which also requires a license) can also provide interesting insights into what is currently happening, but make sure you filter appropriately. Ultimately, a trace will reveal every single wait that occurred. After processing the raw trace file, the following information has been gathered. The actual processing time is prolonged due to the large number of entries in the trace file:

SQL ID: 6xpsmzknmkutw Plan Hash: 1270987893

select /*+ gather_plan_statistics monitor */ avg(length(col2)) from chains


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2     95.91     497.32    2321346    2321394          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4     95.91     497.32    2321346    2321394          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 65
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
         1          1          1 SORT AGGREGATE (cr=2321394 pr=2321346 pw=0 time=497324130 us)
    600000     600000     600000  TABLE ACCESS STORAGE FULL CHAINS (cr=2321394 pr=2321346 pw=0
                                    time=484769611 us cost=450678 size=2394000000 card=600000)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  row cache lock                                  5        0.00          0.00
  Disk file operations I/O                        2        0.00          0.00
  SQL*Net message to client                       2        0.00          0.00
  cell single block physical read            660743        0.42        448.04
  enq: KO - fast object checkpoint                3        0.00          0.00
  reliable message                                1        0.00          0.00
  cell smart table scan                        1681        0.00          0.45
  latch: redo allocation                          1        0.00          0.00
  SQL*Net message from client                     2       16.25         16.25
********************************************************************************

In this example, a query was executed against a table in which every row was chained (again, a bit of an extreme, but a 100%-reproducible example). Before starting the execution of the main query, a 10046/SQL trace was started, which ran throughout the execution of the statement. You saw the output of tkprof on the trace file, which produced the list of wait events just shown. Notice that the statement had an elapsed time of about 8 minutes (497 seconds), of which the vast majority was spent doing single-block physical reads. The statement used a full table scan and was clearly offloaded in parts, as indicated by the cell smart table scan wait events, but it spent most of its time doing single-block reads. Of course, the single-block reads were a result of the chained rows. The next listing shows the difference between running the query with offloading and without offloading. First, the example with all Exadata features disabled:

SQL> alter session set "_serial_direct_read"=always;

Session altered.

Elapsed: 00:00:00.01
SQL> alter session set cell_offload_processing=false;

Session altered.

Elapsed: 00:00:00.00
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:01.99
SQL> select /*+ gather_plan_statistics monitor */ avg(length(col2)) from chains;

AVG(LENGTH(COL2))
-----------------
             3990

Elapsed: 00:00:50.10

As you can see, with offloading disabled but direct path reads still in use, the execution time is 50 seconds. The rows are still chained—nothing has happened to the table. In the next example, direct path reads are disabled as well:

SQL> alter session set "_serial_direct_read"=never;

Session altered.

Elapsed: 00:00:00.01
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:00.14
SQL> select /*+ gather_plan_statistics monitor */ avg(length(col2)) from chains;

AVG(LENGTH(COL2))
-----------------
             3990

Elapsed: 00:00:28.33

And, finally, everything is set back to default, with cell offloading and Smart Scans enabled:

SQL> alter session set "_serial_direct_read"=always;

Session altered.

Elapsed: 00:00:00.00
SQL> alter session set cell_offload_processing=true;

Session altered.

Elapsed: 00:00:00.00
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:00.08
SQL> select /*+ gather_plan_statistics monitor */ avg(length(col2)) from chains;

AVG(LENGTH(COL2))
-----------------
             3990

Elapsed: 00:07:02.29

Ouch—the last example clearly showed severe performance degradation. However, it is very important at this stage to point out that this is the most extreme example, as every row was chained. In fact, many of the records spanned three blocks. In this extreme case, it is clear that avoiding offloading actually improves the performance significantly. Of course, eliminating the chained rows where possible would be the best solution.
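
If you suspect chained rows in your own tables, the classic approach still works. Here is a sketch, assuming the CHAINED_ROWS table has been created by the utlchain.sql script shipped with the database:

SQL> @?/rdbms/admin/utlchain.sql

Table created.

SQL> analyze table chains list chained rows into chained_rows;

Table analyzed.

SQL> select count(*) from chained_rows;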

Very Busy Cells

When a storage cell becomes very busy with CPU operations, it is possible for the storage software to begin refusing to perform offload activities on some portion of the requests. That is, if a particular cell becomes extremely busy and is getting offloadable requests from a database server that is not particularly busy, it may decide to send some data back to the database server that has not been processed or has only partially been processed. In some cases, column projection may be done but not filtering; in other cases, cellsrv may revert to shipping entire blocks back to the database layer. While this issue does not result in a complete shutdown of offloading, it can reduce the amount of work that is done on the storage tier.

This is an extremely complex topic, and it is difficult to observe the behavior directly. The goal of the feature is to utilize available CPU resources, regardless of whether they are on the database tier or the storage tier. This behavior was introduced in cellsrv version 11.2.2.3.1 with databases running 11.2.0.2 with bundle patch 6 or later. There is a statistic called cell physical IO bytes pushed back due to excessive CPU on cell in 11.2.0.2 with BP 6 and later that shows this is happening. Note that the statistic name later changed to cell physical IO bytes sent directly to DB node to balance CPU. The feature is designed to improve throughput on very busy systems, but it may also cause some degree of instability in the performance of certain statements. It is possible to disable this feature if your cellsrv is erroneously deciding it is too busy to take on additional work; however, in general, if you observe this behavior, you are probably getting close to the limits of the system. Adding additional resources at the storage layer (more storage cells) may be a viable option if reducing the load created is not possible.
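
You can watch for this behavior system-wide by querying the relevant statistic. Since the name changed between versions, as mentioned above, a LIKE-based sketch is the safest approach:

SQL> select name, value
  2  from v$sysstat
  3  where name like 'cell physical IO bytes pushed back%'
  4     or name like 'cell physical IO bytes sent directly to DB node%';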

Hinted Code

Hints are very useful for coercing the optimizer to do what you want it to do. Unfortunately, hints are not well documented and even less well understood. In many cases, hints are used to resolve a problem that is caused by some misconfiguration of the database (and then forgotten during the upgrade, potentially causing problems with the new major release). Their intended purpose is to allow humans to help the optimizer make the right choices in situations where it just cannot do the job effectively (or consistently) on its own. This happens in situations where the optimizer is just not smart enough (yet) to arrive at the best execution plan in a specific situation. However, even when hints are used appropriately and are generating the expected behavior, they can prevent Exadata from taking advantage of some of its built-in abilities. When migrating to Exadata, the best approach is to allow ample time for testing. If your application makes use of hints, one of the important steps in the test plan should be to test its behavior without the hints. This can easily be accomplished by setting the hidden parameter _optimizer_ignore_hints to “true” in a session for the upgrade/regression test. Only after the unnecessary hints have been weeded out should the system go into production on Exadata. Ultimately, it is far easier to remove hints when the migration testing is performed than after the system is live. You simply have more resources and time dedicated to testing!
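
Setting the parameter for a test session is a one-liner. Keep in mind that _optimizer_ignore_hints is a hidden parameter, so restrict its use to test environments:

SQL> alter session set "_optimizer_ignore_hints" = true;

Session altered.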

Indexes

This may seem like a strange topic, but indexes can work against Smart Scans as well. The optimizer will try to use indexes if they are available. In a pure data warehouse environment, analytic indexes may not be necessary at all. You are about to read more about indexes in the next section on mixed workloads, but it is important to understand that index usage often means that offloading is less likely to occur.

Row-at-a-Time Processing

It is amazing how often we see row-at-a-time processing in very large data sets. This type of coding is rarely a good idea on non-Exadata platforms, and it is definitely not a good idea on Exadata. In fact, the differences in performance can be even more dramatic on Exadata because of the optimizations built into the platform.

Other Things to Keep in Mind

There are a few other things that you should keep in mind when working with DW systems: the use of Exadata Smart Flash Cache, compression, and partitioning.

Exadata Smart Flash Cache: To Keep or Not to Keep

Exadata Smart Flash Cache (ESFC) is thought of primarily as providing benefit to latency-sensitive SQL statements, satisfying single-block reads from faster Flash devices. But what is good for single-block I/O is not bad for multi-block I/O: Scanning from Flash can dramatically improve scan performance as well. Up until Exadata version 11.2.3.3.0, Smart Scans ignored ESFC and, by default, scanned using hard disks only. In version 11.2.3.3.0 and later, Smart Scans can and will transparently cache data in the Flash Cache, too. That particular Exadata feature had a radical effect on the way scans are performed on the platform, and scan performance improved dramatically as a result. Transparently storing entire segments or parts thereof in ESFC benefits both Smart Scans and non-offloaded single- and multi-block reads. The new feature largely eliminates the need to fine-tune which objects may use Flash Cache via the segment’s storage clause for those segments critical for processing.

If you want to demonstrate the usefulness of ESFC for table scans in current Exadata versions, you have to reverse the situation from before 11.2.3.3.0: Instead of pinning segments to Flash Cache so that Smart Scans benefit from ESFC, you specifically have to forbid segments from using it. The following two tables are 100% identical, with the exception that table T1_NOCOMPRESS_NOESFC is forbidden to benefit from ESFC.
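
For reference, the prohibition is applied via the segment’s storage clause, as in this sketch against the existing table:

SQL> alter table T1_NOCOMPRESS_NOESFC storage (cell_flash_cache none);

Table altered.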

SQL> select table_name, num_rows, compression, partitioned, cell_flash_cache
  2  from tabs where table_name like 'T1_NOCOMPRESS%';

TABLE_NAME                       NUM_ROWS COMPRESS PAR CELL_FL
------------------------------ ---------- -------- --- -------
T1_NOCOMPRESS_NOESFC             10000000 DISABLED NO  NONE
T1_NOCOMPRESS                    10000000 DISABLED NO  DEFAULT

2 rows selected.

SQL> select segment_name, bytes/power(1024,2) m, blocks
  2  from user_segments where segment_name like 'T1_NOCOMPRESS%';

SEGMENT_NAME                            M     BLOCKS
------------------------------ ---------- ----------
T1_NOCOMPRESS                       13056    1671168
T1_NOCOMPRESS_NOESFC                13056    1671168

2 rows selected.

In preparation for the demonstration, a number of full scans were executed against T1_NOCOMPRESS so as to ensure that the table contents are largely found in Flash Cache. Remember from Chapter 5 that you can use cellcli's list flashcachecontents command to see what is actually cached on a cell. A Smart Scan against T1_NOCOMPRESS will set the baseline for the comparison:

SQL> select count(*) from T1_NOCOMPRESS;

  COUNT(*)
----------
  10000000

Elapsed: 00:00:01.24

The query has deliberately been chosen to not feature a WHERE clause to rule out any optimization by storage indexes. There are no indexes involved either, as you can see from the execution plan:

SQL> select * from table(dbms_xplan.display_cursor)
  2  /

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------
SQL_ID  38ttfy95yg3sd, child number 0
-------------------------------------
select count(*) from T1_NOCOMPRESS

Plan hash value: 3825536868

------------------------------------------------------------------------------------
| Id  | Operation                  | Name          | Rows  | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |               |       |   452K(100)|          |
|   1 |  SORT AGGREGATE            |               |     1 |            |          |
|   2 |   TABLE ACCESS STORAGE FULL| T1_NOCOMPRESS |    10M|   452K  (1)| 00:00:18 |
------------------------------------------------------------------------------------

14 rows selected.

Digging into the execution statistics using session snapper or mystats.sql, you can see that lots of the read requests are optimized. The output below is taken from mystats; statistics not necessary for this discussion have been removed:

------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
STAT    cell IO uncompressed bytes                                          13,653,352,448
STAT    cell blocks helped by minscn optimization                                1,666,679
STAT    cell flash cache read hits                                                  11,576
STAT    cell physical IO bytes eligible for predicate offload               13,653,336,064
STAT    cell physical IO interconnect bytes returned by smart scan             269,067,552
STAT    cell scans                                                                       1
STAT    physical read IO requests                                                   13,048
STAT    physical read bytes                                                 13,653,336,064
STAT    physical read requests optimized                                            11,576
STAT    physical read total IO requests                                             13,048
STAT    physical read total bytes                                           13,653,336,064
STAT    physical read total bytes optimized                                 12,111,839,232
STAT    physical read total multi block requests                                    13,037
STAT    physical reads                                                           1,666,667
STAT    physical reads direct                                                    1,666,667

------------------------------------------------------------------------------------------
3. About
------------------------------------------------------------------------------------------
- MyStats v2.01 by Adrian Billington (http://www.oracle-developer.net)
- Based on the SNAP_MY_STATS utility by Jonathan Lewis

You can see that of the approximately 13GB read using 13,048 I/O requests, a large portion, namely 11,576 requests, was satisfied via Flash Cache. You can put this into perspective by comparing physical read total bytes optimized to physical read total bytes. And, by the way, once the data is in Flash Cache, traditional single-block and multi-block reads will benefit from the fact that the segment is on faster storage, too, at no extra penalty.

On the other hand, if a table cannot benefit from ESFC such as T1_NOCOMPRESS_NOESFC, things look a little different:

SQL> select count(*) from T1_NOCOMPRESS_NOESFC;

  COUNT(*)
----------
  10000000

Elapsed: 00:00:11.27

Repeated executions of the statement will not have an effect: There will not be any caching in Flash Cache, as this operation is administratively prohibited. The execution plan is identical to the first one shown:

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------
SQL_ID  9gdnw7yk14mpw, child number 0
-------------------------------------
select count(*) from T1_NOCOMPRESS_NOESFC

Plan hash value: 4286875364

-------------------------------------------------------------------------------------------
| Id  | Operation                  | Name                 | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                      |       |   452K(100)|          |
|   1 |  SORT AGGREGATE            |                      |     1 |            |          |
|   2 |   TABLE ACCESS STORAGE FULL| T1_NOCOMPRESS_NOESFC |    10M|   452K  (1)| 00:00:18 |
-------------------------------------------------------------------------------------------

14 rows selected.

Interestingly, the optimizer assumes the same elapsed time for the scan, namely 18 seconds. The difference in the execution time can be found in the execution statistics:

------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
STAT    cell IO uncompressed bytes                                          13,653,336,064
STAT    cell physical IO bytes eligible for predicate offload               13,653,336,064
STAT    cell physical IO interconnect bytes returned by smart scan             269,067,184
STAT    cell scans                                                                       1
STAT    physical read IO requests                                                   13,046
STAT    physical read bytes                                                 13,653,336,064
STAT    physical read total IO requests                                             13,046
STAT    physical read total bytes                                           13,653,336,064
STAT    physical read total multi block requests                                    13,037
STAT    physical reads                                                           1,666,667
STAT    physical reads direct                                                    1,666,667

------------------------------------------------------------------------------------------
3. About
------------------------------------------------------------------------------------------
- MyStats v2.01 by Adrian Billington (http://www.oracle-developer.net)
- Based on the SNAP_MY_STATS utility by Jonathan Lewis

The effect of Flash Cache is visible in the I/O-related statistics in V$SQL; you can query PHYSICAL_READ_REQUESTS and OPTIMIZED_PHY_READ_REQUESTS. The output is rearranged for better readability:

SQL> @fsx4.sql
Enter value for sql_text: %esfc_example%
Enter value for sql_id:

SQL_ID        CHILD OFFLOAD IO_SAVED_% AVG_ETIME SQL_TEXT
------------- ------ ------- ---------- ---------- --------------------------------------- ...
aqmusjaqj5yy6      0 Yes          98.03       9.23 select /* esfc_example */ count(*) from ...
g85ux15kbh9hr      0 Yes          98.03       1.23 select /* esfc_example */ count(*) from ...

... SQL_ID        PHYSICAL_READ_REQUESTS OPTIMIZED_PHY_READ_REQUESTS
... ------------- ---------------------- ---------------------------
... aqmusjaqj5yy6                  13046                           0
... g85ux15kbh9hr                  13046                       11576

2 rows selected.

Thankfully, a lot of the complexity around pinning segments to Flash Cache has been resolved with the introduction of Exadata 11.2.3.3. And you do not even have to spend too much time thinking about it. In the output of the query against user_tables shown earlier, you will notice that the cell_flash_cache attribute values are NONE and DEFAULT; neither is set to KEEP. The transparent caching of data for Smart Scans alone made the upgrade to 11.2.3.3 worth it.

Compression

Exadata’s Hybrid Columnar Compression (HCC) is a big step forward in its ability to reduce the size of data stored inside of Oracle databases. The compression ratios that are achievable with HCC turn the traditional concept of information life-cycle management on its head. HCC makes it practical to consider using compression instead of tiered storage or archiving and purging strategies. Because partitions of a table can be defined to use different compression methods, the combination of partitioning and compression can provide a much more robust solution for “archiving” data than actually purging it from the database.
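
To illustrate, here is a sketch of a hypothetical range-partitioned table in which older partitions are compressed more aggressively than the current one, using the same column store syntax you will see later in this chapter:

SQL> create table sales_hist (
  2    sale_id   number,
  3    sale_date date,
  4    amount    number)
  5  partition by range (sale_date) (
  6    partition p_2013 values less than (date '2014-01-01')
  7      column store compress for archive high,
  8    partition p_2014 values less than (date '2015-01-01')
  9      column store compress for query high,
 10    partition p_current values less than (maxvalue));

Table created.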

You should remember, though, that HCC is not appropriate for data that is subject to DML. A better approach is to partition data such that HCC can be applied only to data that is no longer being changed. This leads us to the next topic—partitioning.

Partitioning

Partitioning has been, and still is, a key component of data warehousing systems. The optimizations provided by Exadata do not alleviate the need for a well-thought-out partitioning strategy. Of course, date-based strategies are very useful from a management standpoint. Being able to use more aggressive compression on older data is often a good approach. But partition elimination is still a technique that you will want to use. And, of course, storage indexes can work well with partitioning, providing behavior comparable to partition elimination on additional columns.

You should keep in mind that the sizes of partitions can affect Oracle’s decision to use Smart Scans. When performing a serial scan on a partitioned object, the decision to do direct path reads is based on the individual segment (table, partition, subpartition) size, not the overall size of the object. This can result in situations where scans of some partitions are offloaded, while scans of others are not. This is particularly relevant for colder partitions that have been compressed. Consider a table created with a number of random dates, range partitioned by date using monthly intervals:

SQL> select segment_name, partition_name, bytes/power(1024,2) m
  2  from user_segments where segment_name= 'SMARTSCANHCC';

SEGMENT_NAME                   PARTITION_NAME                          M
------------------------------ ------------------------------ ----------
SMARTSCANHCC                   SYS_P9676                             944
SMARTSCANHCC                   SYS_P9677                             968
SMARTSCANHCC                   SYS_P9678                             384

3 rows selected.

SQL> select partition_name, high_value from user_tab_partitions
  2  where table_name = 'SMARTSCANHCC'

PARTITION_NAME                 HIGH_VALUE
------------------------------ --------------------------------------------------
P_START                        TO_DATE(' 1995-01-01 00:00:00', 'SYYYY-MM-DD HH24:
SYS_P9676                      TO_DATE(' 2014-05-01 00:00:00', 'SYYYY-MM-DD HH24:
SYS_P9677                      TO_DATE(' 2014-09-01 00:00:00', 'SYYYY-MM-DD HH24:
SYS_P9678                      TO_DATE(' 2014-10-01 00:00:00', 'SYYYY-MM-DD HH24:

SQL> select partition_name, last_analyzed, num_rows from user_tab_partitions
  2  where table_name = 'SMARTSCANHCC';

PARTITION_NAME                 LAST_ANALYZED         NUM_ROWS
------------------------------ ------------------- ----------
P_START                        2015-03-15:16:31:52          0
SYS_P9676                      2015-03-15:16:31:55     837426
SYS_P9677                      2015-03-15:16:31:58     863109
SYS_P9678                      2015-03-15:16:31:59     335539

4 rows selected.

The table is interval-partitioned; partition P_START is empty and—thanks to deferred segment creation—has not even been created. The size of the partitions enables Smart Scans. The SQL Monitor report has been cut down to fit the page, and only relevant information is shown:

SQL Monitoring Report

SQL Text
------------------------------
select /*+ monitor gather_plan_statistics */ count(*) from smartscanhcc partition (SYS_P9676)

Global Information
------------------------------
 Status              :  DONE (ALL ROWS)
 Instance ID         :  1
 Session             :  MARTIN (591:27854)
 SQL ID              :  76yr0u2rhkqq8
 SQL Execution ID    :  16777217
 Execution Started   :  03/15/2015 16:32:52
 First Refresh Time  :  03/15/2015 16:32:52
 Last Refresh Time   :  03/15/2015 16:32:53
 Duration            :  1s
 Module/Action       :  SQL*Plus/-
 Service             :  SYS$USERS
 Program             :  [email protected] (TNS V1-V3)
 Fetch Calls         :  1

Global Stats
========================================================================================
| Elapsed |   Cpu   |    IO    | Application | Fetch | Buffer | Read | Read  |  Cell   |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   | Calls |  Gets  | Reqs | Bytes | Offload |
========================================================================================
|    0.33 |    0.13 |     0.21 |        0.00 |     1 |   120K |  940 | 935MB |  97.64% |
========================================================================================

Imagine next that the partition undergoes maintenance and is HCC compressed:

SQL> alter table smartscanhcc modify partition SYS_P9676 column store compress for query high;

Table altered.

SQL> alter table smartscanhcc move partition SYS_P9676;

Table altered.

SQL> select segment_name, partition_name, bytes/power(1024,2) m
  2  from user_segments where segment_name= 'SMARTSCANHCC';

SEGMENT_NAME                   PARTITION_NAME                          M
------------------------------ ------------------------------ ----------
SMARTSCANHCC                   SYS_P9677                             968
SMARTSCANHCC                   SYS_P9678                             384
SMARTSCANHCC                   SYS_P9676                              16

3 rows selected.

As expected, the compressed size of this partition is smaller than it was before. If a user now executes a query against the segment, it is most likely not Smart Scanned. And, indeed, this can be confirmed, for example, by using a SQL Monitor report (alternatively, if you do not have the license to use it, you can query V$SQL):

SQL Monitoring Report

SQL Text
------------------------------
select /*+ monitor gather_plan_statistics */ count(*) from smartscanhcc partition (SYS_P9676)

Global Information
------------------------------
 Status              :  DONE (ALL ROWS)
 Instance ID         :  1
 Session             :  MARTIN (1043:51611)
 SQL ID              :  76yr0u2rhkqq8
 SQL Execution ID    :  16777216
 Execution Started   :  06/09/2015 11:02:38
 First Refresh Time  :  06/09/2015 11:02:38
 Last Refresh Time   :  06/09/2015 11:02:38
 Duration            :  .123527s
 Module/Action       :  SQL*Plus/-
 Service             :  SYS$USERS
 Program             :  [email protected] (TNS V1-V3)
 Fetch Calls         :  1

Global Stats
======================================================================================
| Elapsed |   Cpu   |    IO    | Cluster  |  Other   | Fetch | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) | Waits(s) | Waits(s) | Calls |  Gets  | Reqs | Bytes |
======================================================================================
|    0.12 |    0.02 |     0.09 |     0.00 |     0.01 |     1 |   1245 |   11 |  10MB |
======================================================================================

The missing Cell Offload column is an indicator of a traditional read and the absence of a Smart Scan. There are cases where the absence of a Smart Scan does not matter because the partitions are small. In the above example, the data set was reduced from 944MB to 16MB, which can be read very quickly, as the elapsed time shows.

Mixed Workloads

There is a third type of system that is a combination of the other two. In fact, one could argue that the pure form of the other two (OLTP and DW) rarely exists in the real world. There are many systems that do not fall neatly into the two main categories already described. In fact, most systems display characteristics of both. Consider, for example, the case where an “OLTP” system performs short, distinct little transactions during the day with a lot of reporting during the evening hours. Or take the point of view of the data warehouse where you run lots of reports but have a scheduled ELT (Extraction, Load, Transformation) process that makes generous use of the MERGE statement, which of course requires a lookup.

Combining long-running, throughput-sensitive queries with fast, latency-sensitive statements definitely introduces some additional issues that must be dealt with. One of the main issues in systems of this type is how to deal with indexes.

To Index or Not to Index?

One of the biggest debates we have had during Exadata implementations is whether to drop indexes or not. The problem has been somewhat exacerbated by claims that you do not need any indexes on Exadata. Access paths that use indexes are generally not able to take advantage of Exadata-specific optimizations. Yes, you read that right—“generally,” because offloading can occur in cases where the optimizer chooses to execute a fast full scan on an index, but this is not the most common usage pattern for indexes. The more common pattern is to use them for retrieving relatively few records from a table using an index range scan, and this operation is not currently offloadable. Generally speaking, you will want to use index range scans on selective predicates. However, since Exadata is so effective at scanning disks, in many cases the index-based access paths are no longer faster than the scan-based access operations. The frequency with which a query is executed starts to play an important role here. If you are scanning a multi-million-row table in a couple of seconds, then this is undoubtedly fast. However, if you need to do this 10,000 times during a merge, an index on the lookup columns might speed things up. It is really a case of getting our bearings all over again with respect to when we want to use indexes and when we would expect a full scan to perform better.

One of the things we commonly heard when Exadata first started to appear at customer sites was that indexes were no longer necessary and should be dropped. For pure data warehouse workloads, this may actually be reasonably good advice for analytical indexes. However, we rarely see anything you could call a “pure data warehouse” workload. Most systems have a mix of access patterns, with one set of statements hoping for low latency and another set hoping for high throughput. In these cases, dropping all indexes just will not work, which is why this discussion was saved for this section. The problem with mixed workloads, where it is necessary to keep indexes for specific sets of statements, is that the optimizer is not as well equipped to choose between using and ignoring them as one might hope. There might be a way around this situation, however, by making creative use of invisible indexes. This 11g feature allows you to hide indexes from the optimizer when it develops an execution plan. The indexes are still there and are still maintained, so you should review index usage: not every index needs to be dropped, but, likewise, not every analytical index needs to be kept.
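
Making an index invisible is a simple DDL operation and just as easy to reverse; a quick sketch using the index from the example that follows:

SQL> alter index i_t1_with_indexes_1 invisible;

Index altered.

A session only considers invisible indexes when optimizer_use_invisible_indexes is set to true, which is exactly the switch the procedure shown below flips.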

The following example is a relatively simple implementation showing how to keep indexes in the database but make only selective use of them. The dbm01 database has been modified: two services have been created and started. DSSSRV, as the name implies, is the service users should use for decision-support queries, that is, those with a high demand for throughput and less need for low latency. As you would imagine, the opposite is true for OLTPSRV: users connecting through that service care a lot about low latency but less about bandwidth. A small PL/SQL procedure checks which service a session used to connect and changes the parameter optimizer_use_invisible_indexes accordingly.
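
The services themselves can be created and started with srvctl; a minimal sketch, assuming a two-node RAC database dbm01 with instances dbm011 and dbm012 (the instance names are illustrative):

$ srvctl add service -db dbm01 -service dsssrv -preferred "dbm011,dbm012"
$ srvctl start service -db dbm01 -service dsssrv
$ srvctl add service -db dbm01 -service oltpsrv -preferred "dbm011,dbm012"
$ srvctl start service -db dbm01 -service oltpsrv

The index on table T1_WITH_INDEX is invisible: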

SQL> select index_name, visibility
  2  from user_indexes
  3  where index_name = 'I_T1_WITH_INDEXES_1';

INDEX_NAME                     VISIBILIT
------------------------------ ---------
I_T1_WITH_INDEXES_1            INVISIBLE

The little procedure being used here simply checks the service name and toggles whether the optimizer may use the invisible index:

SQL> create procedure check_service is
  2  begin
  3   if lower(sys_context('userenv','service_name')) = 'dsssrv' then
  4    execute immediate 'alter session set optimizer_use_invisible_indexes = false';
  5   elsif lower(sys_context('userenv','service_name')) = 'oltpsrv' then
  6    execute immediate 'alter session set optimizer_use_invisible_indexes = true';
  7   end if;
  8  end;
  9  /

Procedure created.

The following execution plans show that the index is used based on the service the session connects through. The first example uses the OLTPSRV connection:

SQL> select sys_context('userenv','service_name') from dual;

SYS_CONTEXT('USERENV','SERVICE_NAME')
-----------------------------------------------------------------------------------------
oltpsrv

SQL> exec check_service

SQL> select /* oltpsrv */ count(*) from t1_with_index where id between 200 and 400;

  COUNT(*)
----------
       201

SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------
SQL_ID  46auh11c8ddts, child number 0
-------------------------------------
select /* oltpsrv */ count(*) from t1_with_index where id between 200
and 400

Plan hash value: 2861271559

-----------------------------------------------------------------------------------------
| Id  | Operation         | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                     |       |       |     3 (100)|          |
|   1 |  SORT AGGREGATE   |                     |     1 |     6 |            |          |
|*  2 |   INDEX RANGE SCAN| I_T1_WITH_INDEXES_1 |   202 |  1212 |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">=200 AND "ID"<=400)

20 rows selected.

Notice the index-driven execution plan. When connecting via DSSSRV, the situation changes:

SQL> select sys_context('userenv','service_name') from dual;

SYS_CONTEXT('USERENV','SERVICE_NAME')
--------------------------------------------------------------------------------------------
dsssrv

Elapsed: 00:00:00.00
SQL> exec check_service

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.01
SQL> select /* dsssrv */ count(*) from t1_with_index where id between 200 and 400;

  COUNT(*)
----------
       201

Elapsed: 00:00:00.56
SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------
SQL_ID  0ym2m0whwsn1y, child number 0
-------------------------------------
select /* dsssrv */ count(*) from t1_with_index where id between 200
and 400

Plan hash value: 1131101492

--------------------------------------------------------------------------------------------
| Id  | Operation                  | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |               |       |       |   452K(100)|          |
|   1 |  SORT AGGREGATE            |               |     1 |     6 |            |          |
|*  2 |   TABLE ACCESS STORAGE FULL| T1_WITH_INDEX |   202 |  1212 |   452K  (1)| 00:00:18 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - storage(("ID"<=400 AND "ID">=200))
       filter(("ID"<=400 AND "ID">=200))

21 rows selected.

Unlike the previous example, you do not see any index in the execution plan. If you put the little PL/SQL procedure into a slightly nicer format, you can easily embed it in a logon trigger and control index usage that way, as the following sketch shows.
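
This is a minimal sketch of such a trigger, assuming the CHECK_SERVICE procedure shown earlier is accessible to the trigger owner (in practice, you would limit the trigger to the relevant users or schemas; the trigger name is illustrative):

SQL> create or replace trigger trg_check_service
  2  after logon on database
  3  begin
  4    check_service;
  5  end;
  6  /

Trigger created.

Further optimizer limitations in the context of Exadata are discussed in the next section.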

The Optimizer Doesn’t Know

You have read several times in this book that the optimizer is not aware that it is running on Exadata. In general, the principles that guide the optimizer's decisions are sound, regardless of the storage platform. The fact that the code on the database tier is identical, regardless of whether it is running on Exadata or not, means that an application will behave similarly on Exadata in terms of plan selection. So you should not expect an application to experience a large number of plan changes caused simply by moving to Exadata if you remain on the same version with the same memory settings. Having the same software generate the execution plans helps a lot toward stability! The situation might be different if you are migrating to Exadata from a lower Oracle version, such as an 11.2 to 12.1 migration, or from single instance to RAC, but you would expect similar changes outside the Exadata platform, too.

The downside is that the optimizer is not aware that Exadata has optimizations that can cause full scans to perform much better than on other platforms, apart from the EXADATA system statistics you will read about in the next section. So mixed-workload systems that have many indexes make the optimizer’s job more challenging. In fact, as you might expect, the optimizer will tend to pick index-oriented plans in preference to full scan-based plans in situations where indexes are available, despite the fact that the full scan-based plans are often much faster.

There are several ways to deal with the optimizer’s tendency to prefer index access over full table scans. System statistics, optimizer parameters, and hints all come to mind as potential solutions. You can read more about these in the following sections.

System Statistics

System statistics provide the optimizer with additional information about the “system,” including how long it takes to do a single-block read (typical of index lookups) and how long it takes to do a multi-block read (typical of full table scans). This may appear to be an ideal mechanism for steering the optimizer by giving it the additional information it needs to make the right decisions. Unfortunately, Smart Scans are not based on traditional multi-block reads and, in fact, Smart Scans can be orders of magnitude faster than multi-block reads. Hence, modifying system statistics is probably not the best option in this case.
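
Before changing anything, it is worth checking which system statistics are currently in place; the values the optimizer works with are stored in SYS.AUX_STATS$:

SQL> select pname, pval1
  2  from sys.aux_stats$
  3  where sname = 'SYSSTATS_MAIN';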

In fact, the question of whether or not to gather system statistics in WORKLOAD mode comes up quite often during discussions about Exadata deployments. For the reasons outlined above, it is probably not wise to gather them, as doing so potentially introduces plan regressions. Because WORKLOAD statistics change the costing of every statement in the database, their introduction can have far-reaching effects.

Another alternative exists, however, for database versions 11.2.0.2 BP18 and 11.2.0.3 BP8 and newer, according to Doc ID 1274318.1: gathering system statistics the Exadata way. The document on My Oracle Support specifically states that this is not a generic recommendation; the change should be assessed carefully. To enable Exadata system statistics, you can use the following command:

SQL> exec DBMS_STATS.GATHER_SYSTEM_STATS('EXADATA')

As a result of this call, the database engine is told that it can read more data in a single request on Exadata, thus lowering the cost of full scans. It does not prevent the optimizer from selecting an index, though. The change in the costing model is the reason why you should only introduce the change after careful testing! The aforementioned note also points out that if the application is developed from the ground up on Exadata, the effect of gathering Exadata statistics can be controlled more easily, and any adverse side effects can be caught in testing before going live. Whichever way, careful testing is needed.
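
Should testing reveal adverse side effects, the change is straightforward to back out; deleting the system statistics reverts the optimizer to its default values:

SQL> exec DBMS_STATS.DELETE_SYSTEM_STATS

PL/SQL procedure successfully completed.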

Optimizer Parameters

There are a couple of initialization parameters that can push the optimizer toward or away from index usage. The parameters OPTIMIZER_INDEX_CACHING and OPTIMIZER_INDEX_COST_ADJ can both be used for this purpose. While these are big knobs that can affect the core functionality of the optimizer, they were designed for the very purpose of making indexes more or less attractive to the optimizer. Using the parameters in a limited way, such as with an alter session command before running large batch processes, is a viable approach in some cases. These parameters can also be set at the statement level using the OPT_PARAM hint. Here is a very simple example:

SQL> show parameter optimizer_index

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
optimizer_index_caching              integer     0
optimizer_index_cost_adj             integer     100

SQL> select /*+ parallel(2) gather_plan_statistics monitor chap17-f */
  2  count(*), a.state
  3  from bigt a, t1_sml b
  4  where a.id = b.id
  5  and b.state = 'RARE'
  6  group by a.state
  7  /

no rows selected

Elapsed: 00:00:25.46

SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------
SQL_ID  4q6vqy2r1yn5w, child number 0
-------------------------------------
select /*+ parallel(2) gather_plan_statistics monitor chap17-f */
count(*), a.state from bigt a, t1_sml b where a.id = b.id and b.state =
'RARE' group by a.state

Plan hash value: 1484706486

---------------------------------------------------------------------------------------
| Id  | Operation                                     | Name         | Rows  | Bytes |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                              |              |       |       |
|   1 |  PX COORDINATOR                               |              |       |       |
|   2 |   PX SEND QC (RANDOM)                         | :TQ10003     |     1 |    32 |
|   3 |    HASH GROUP BY                              |              |     1 |    32 |
|   4 |     PX RECEIVE                                |              |     1 |    32 |
|   5 |      PX SEND HASH                             | :TQ10002     |     1 |    32 |
|   6 |       HASH GROUP BY                           |              |     1 |    32 |
|*  7 |        HASH JOIN                              |              |     1 |    32 |
|   8 |         JOIN FILTER CREATE                    | :BF0000      |     8 |   128 |
|   9 |          PX RECEIVE                           |              |     8 |   128 |
|  10 |           PX SEND BROADCAST                   | :TQ10001     |     8 |   128 |
|  11 |            TABLE ACCESS BY INDEX ROWID BATCHED| T1_SML       |     8 |   128 |
|  12 |             SORT CLUSTER BY ROWID             |              |     8 |       |
|  13 |              BUFFER SORT                      |              |       |       |
|  14 |               PX RECEIVE                      |              |     8 |       |
|  15 |                PX SEND HASH (BLOCK ADDRESS)   | :TQ10000     |     8 |       |
|  16 |                 PX SELECTOR                   |              |       |       |
|* 17 |                  INDEX RANGE SCAN             | T1_SML_STATE |     8 |       |
|  18 |         JOIN FILTER USE                       | :BF0000      |   100M|  1525M|
|  19 |          PX BLOCK ITERATOR                    |              |   100M|  1525M|
|* 20 |           TABLE ACCESS STORAGE FULL           | BIGT         |   100M|  1525M|
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   7 - access("A"."ID"="B"."ID")
  17 - access("B"."STATE"='RARE')
  20 - storage(:Z>=:Z AND :Z<=:Z AND SYS_OP_BLOOM_FILTER(:BF0000,"A"."ID"))
       filter(SYS_OP_BLOOM_FILTER(:BF0000,"A"."ID"))

Note
-----
   - dynamic statistics used: dynamic sampling (level=AUTO)
   - Degree of Parallelism is 2 because of hint

SQL> alter session set optimizer_index_cost_adj=10000;

Session altered.

SQL> select /*+ parallel(2) gather_plan_statistics monitor chap17-f */
  2  count(*), a.state
  3  from bigt a, t1_sml b
  4  where a.id = b.id
  5  and b.state = 'RARE'
  6  group by a.state
  7  /

no rows selected

Elapsed: 00:00:15.40

SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------
SQL_ID  4q6vqy2r1yn5w, child number 1
-------------------------------------
select /*+ parallel(2) gather_plan_statistics monitor chap17-f */
count(*), a.state from bigt a, t1_sml b where a.id = b.id and b.state =
'RARE' group by a.state

Plan hash value: 3199786897

------------------------------------------------------------------------------------
| Id  | Operation                          | Name     | Rows  | Bytes | Cost (%CPU)|
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |          |       |       |  2510K(100)|
|   1 |  PX COORDINATOR                    |          |       |       |            |
|   2 |   PX SEND QC (RANDOM)              | :TQ10001 |     1 |    32 |  2510K  (1)|
|   3 |    HASH GROUP BY                   |          |     1 |    32 |  2510K  (1)|
|   4 |     PX RECEIVE                     |          |     1 |    32 |  2510K  (1)|
|   5 |      PX SEND HASH                  | :TQ10000 |     1 |    32 |  2510K  (1)|
|   6 |       HASH GROUP BY                |          |     1 |    32 |  2510K  (1)|
|*  7 |        HASH JOIN                   |          |     1 |    32 |  2510K  (1)|
|   8 |         JOIN FILTER CREATE         | :BF0000  |     8 |   128 |   137   (4)|
|*  9 |          TABLE ACCESS STORAGE FULL | T1_SML   |     8 |   128 |   137   (4)|
|  10 |         JOIN FILTER USE            | :BF0000  |   100M|  1525M|  2510K  (1)|
|  11 |          PX BLOCK ITERATOR         |          |   100M|  1525M|  2510K  (1)|
|* 12 |           TABLE ACCESS STORAGE FULL| BIGT     |   100M|  1525M|  2510K  (1)|
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   7 - access("A"."ID"="B"."ID")
   9 - storage("B"."STATE"='RARE')
       filter("B"."STATE"='RARE')
  12 - storage(:Z>=:Z AND :Z<=:Z AND SYS_OP_BLOOM_FILTER(:BF0000,"A"."ID"))
       filter(SYS_OP_BLOOM_FILTER(:BF0000,"A"."ID"))
Note
-----
   - dynamic statistics used: dynamic sampling (level=AUTO)
   - Degree of Parallelism is 2 because of hint

In this simple example, pushing the optimizer away from indexes with the ALTER SESSION command caused it to pick a plan that was considerably faster. The plans show that the improvement in elapsed time was the result of a full table scan replacing the index access.
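
As mentioned earlier, the same nudge can be scoped to a single statement with the OPT_PARAM hint instead of an ALTER SESSION command; a minimal sketch based on the query above:

SQL> select /*+ parallel(2) opt_param('optimizer_index_cost_adj' 10000) */
  2  count(*), a.state
  3  from bigt a, t1_sml b
  4  where a.id = b.id
  5  and b.state = 'RARE'
  6  group by a.state
  7  /

This confines the change to one statement without touching session or system settings.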

Hints

Of course, hints can also be used to help the optimizer make the right choices, but that is somewhat of a slippery slope, especially in the aforementioned mixed-workload scenarios. Nevertheless, telling Oracle that you would prefer a hash join or that it should ignore a specific index is an option. As mentioned in the previous section, the OPT_PARAM hint can also prove useful for setting some initialization parameters that influence the optimizer's decisions. SQL patches can help you by injecting hints into code outside of your control until a fix is available. Oracle 12c should reduce the necessity to use hints to influence join methods with the introduction of adaptive optimization.
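
A hedged sketch of creating such a SQL patch: in 11.2 and 12.1 this relies on the internal package DBMS_SQLDIAG_INTERNAL, while 12.2 exposes DBMS_SQLDIAG.CREATE_SQL_PATCH as a documented procedure. The SQL text, hint, and patch name below are illustrative:

SQL> begin
  2    dbms_sqldiag_internal.i_create_patch(
  3      sql_text  => 'select count(*) from t1_with_index where id between 200 and 400',
  4      hint_text => 'FULL(t1_with_index)',
  5      name      => 'chap17_full_patch');
  6  end;
  7  /

PL/SQL procedure successfully completed.

The patch stays in place until it is dropped with dbms_sqldiag.drop_sql_patch, making it a reasonable stopgap until the offending code can be fixed properly.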

Using Resource Manager

Unfortunately, it is still a commonly held belief that Oracle databases cannot be configured to adequately handle both DW and OLTP workloads at the same time. And, in truth, keeping them on separate systems does make them easier to manage. The downside of this approach is that it is expensive. Many companies dedicate the majority of their computing resources to moving data between platforms. The power of Exadata makes it tempting to combine these environments. Keep in mind that Exadata has additional capabilities for dividing resources between multiple databases that are not available on other platforms. I/O Resource Manager can prevent long-running DW queries from crippling latency-sensitive statements that are running on the same system. Having a good understanding of Oracle’s resource management capabilities should change the way you think about what is possible in a mixed-workload or consolidated environment. Resource management is covered in depth in Chapter 7.
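
As a taste of what is possible, here is a minimal sketch of an interdatabase IORM plan set on a storage cell via CellCLI (the database names and allocation percentages are illustrative):

CellCLI> alter iormplan                                    -
         dbplan=((name=oltpdb, level=1, allocation=70),   -
                 (name=dwdb,   level=1, allocation=30),   -
                 (name=other,  level=2, allocation=100))
CellCLI> alter iormplan active

With a plan like this in place, the cells prioritize I/O from the OLTP database when the disks are busy, while letting either database use idle capacity freely.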

Summary

Exadata is different from a traditionally deployed Oracle database, and to make the best use of it, you will need to think differently. This does not imply that you have to rewrite your application when moving it to Exadata, but the move is a good opportunity for a general review. Quite often, in today's world, DBAs look after tens or hundreds of databases, and “know your data” becomes wishful thinking in such situations. The DBA might be assigned a problem ticket to close, and the sheer number of tickets to handle often does not allow any deeper analysis of the root cause for as long as the system “ticks along” nicely.

When it is decided that a database will be migrated to Exadata, this frequently implies an upgrade to a newer Oracle version as well. Platform changes and database release changes are the most opportune moments to review a database environment for further performance gains. If it is possible and you are not facing massive time constraints when moving to Exadata, we would like to encourage you not to stop working on the system after the migration has completed successfully, but to continue to push the boundaries of what is possible. The Exadata system is very powerful when using Smart Scans; harness that performance where you can and where it makes sense.
