Exadata Parallel Operations
Exadata doesn’t have a special way of executing parallel operations that is unavailable on other platforms running 11gR2 and 12c. However, parallel processing is a key component of Exadata because efficient handling of data warehouse workloads was a primary design goal for Exadata. In addition, because offloading/Smart Scan depends on direct path reads, which can be used by parallel query slaves, parallel operations take on a whole new importance. Traditionally, the efficient use of parallel query has required careful control of concurrency in order to maximize the use of available resources without overwhelming the system. Oracle’s previous attempts at throttling parallel operations to allow them to be used in multiuser environments have not been entirely successful. Version 11gR2 provided some new capabilities for controlling parallel operations. In particular, a queueing mechanism was introduced that allows the number of concurrent parallel processes to be managed more effectively. This approach appears to be much better suited than previous attempts to allowing a high degree of parallelism without overwhelming the available resources. 11gR2 also introduced the ability for Oracle to automatically calculate a degree of parallelism on a statement-by-statement basis. Version 12c introduced a Parallel Statement Queueing enhancement and process memory usage control: a Database Resource Manager (DBRM) directive that allows critical queries to bypass the parallel statement queue, and a new database parameter that limits the total Program Global Area (PGA) size, the idea being to avoid excessive physical memory usage that leads to high rates of swapping.
In this chapter, we will focus on parallel query. Other forms of parallelism provided by the database, such as recovery parallelism, will not be covered here. We will briefly cover parallel query mechanics and demonstrate specifically how the queueing mechanism and automatic degree of parallelism work with Exadata. We will also briefly cover the 11gR2 In-Memory Parallel Execution feature and discuss how Exadata storage parallelizes I/O operations. We will not cover all the details of parallel query or parallel DML with the assumption that the reader is familiar with basic Oracle parallel concepts. Note also that the discussion and examples will refer to behavior in Oracle Database 11g Release 2 (11.2.0.4) and 12c Release 1 (12.1.0.2.0). In most cases, the comments apply equally to version 11.2.0.1 as well. Exceptions are explicitly called out.
Parameters
Before describing the various new features and how they apply to Exadata, you should review the parameters that affect how parallel queries are handled by the database. Table 6-1 shows the parameters along with a brief description of each one’s purpose.
Table 6-1. Database Parameters Controlling 11gR2 and 12c Parallel Features
Parameter |
Default |
Description |
---|---|---|
parallel_max_servers |
The maximum number of parallel slave processes that may be created on an instance. The default value is calculated as parallel_threads_per_cpu * cpu_count * concurrent_parallel_users * 5. The concurrent_parallel_users value is determined as follows: - If SGA_TARGET is set: 4. - If PGA_AGGREGATE_TARGET is set: 2. - Otherwise: 1. | |
parallel_servers_target |
The upper limit on the number of parallel slaves that may be in use on an instance at any given time if parallel queueing is enabled. The default is calculated by parallel_threads_per_cpu * cpu_count * concurrent_parallel_users * 2. The concurrent_parallel_users value is determined by: - If SGA_TARGET is set: 4. - If PGA_AGGREGATE_TARGET is set: 2. - Else 1. | |
parallel_min_servers |
0 |
The minimum number of parallel slave processes that should be kept running, regardless of usage. Usually set to eliminate the overhead of creating and destroying parallel processes. |
parallel_threads_per_cpu |
2 |
Used in various parallel calculations to represent the number of concurrent processes that a CPU can support. Note that 2 is the Oracle default value; the Oracle Exadata recommendation is 1.
parallel_degree_policy |
MANUAL |
Controls several parallel features, including automatic degree of parallelism (Auto DOP), Parallel Statement Queueing, and In-Memory Parallel Execution. |
parallel_execution_message_size |
16384 |
The size of parallel message buffers in bytes. |
parallel_degree_level |
100 |
New in 12c. The scaling factor for default DOP calculations. 100 represents 100%, so setting it to 50 will reduce the calculated DOP to half. |
parallel_force_local |
FALSE |
Determines whether parallel query slaves will be forced to execute only on the node that initiated the query (TRUE), or whether they will be allowed to spread on to multiple nodes in a RAC cluster (FALSE). |
parallel_instance_group |
Used to restrict parallel slaves to certain instances in a RAC cluster. This is done by specifying a service name that is configured to a limited number of instances. | |
parallel_io_cap_enabled |
FALSE |
This parameter is deprecated, replaced by parallel_degree_limit when it has been set to IO. Used in conjunction with the DBMS_RESOURCE_MANAGER.CALIBRATE_IO function to limit default DOP calculations based on the I/O capabilities of the system. |
parallel_degree_limit |
CPU |
With automatic degree of parallelism, limits the maximum degree of parallelism a statement can use. When set to CPU, the maximum is parallel_threads_per_cpu * cpu_count * instances. When set to IO, the maximum is derived from the I/O calibration results (total system throughput divided by per-process I/O bandwidth). When set to a number, that number is the maximum degree of parallelism for a statement.
pga_aggregate_limit |
New in 12c. This parameter limits the total PGA memory usage of the instance. See Chapter 7 (Resource Management). | |
parallel_adaptive_multi_user |
TRUE |
Old mechanism of automatic parallel query usage tuning. This mechanism works by reducing the requested DOP depending on system load at query startup time. |
parallel_min_time_threshold |
AUTO |
The minimum estimated serial execution time that will trigger Auto DOP. The default is AUTO, which translates to 10 seconds. When the PARALLEL_DEGREE_POLICY parameter is set to AUTO, ADAPTIVE, or LIMITED, parallelism is considered after the set amount of seconds. If all tables referenced use the In-Memory Column Store, this parameter defaults to 1. |
parallel_server |
FALSE |
This parameter has nothing to do with parallel queries. Set to TRUE or FALSE depending on whether the database is RAC enabled or not. This parameter was deprecated long ago and has been replaced by the CLUSTER_DATABASE parameter. |
parallel_server_instances |
1 |
This parameter has nothing to do with parallel queries, either. This parameter is set to the number of instances in a RAC cluster. |
parallel_automatic_tuning |
FALSE |
Deprecated since 10g. This parameter enabled an automatic DOP calculation on objects for which a parallelism attribute is set. |
parallel_min_percent |
0 |
Old throttling mechanism. When Parallel Statement Queueing is not enabled (when PARALLEL_DEGREE_POLICY is set to MANUAL or LIMITED), this represents the minimum percentage of parallel servers that are needed for a parallel statement to execute. |
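To see how these parameters are set on your own system, you can query V$PARAMETER directly. The following is a simple sketch; later in this chapter, we use a similar helper script (parms.sql) to the same effect:

SQL> select name, value, isdefault
  2  from v$parameter
  3  where name like 'parallel%'
  4  order by name;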
The parameters shown in Table 6-2 control distinct features which are enabled when automatic degree of parallelism is enabled (often referred to as “auto DOP”). The parameters are underscore parameters, which means that they should not be used in production before consulting Oracle support and getting their blessing to use them.
Table 6-2. Selected Underscore Database Parameters Controlling Parallel Features
Parameter |
Default |
Description |
---|---|---|
_parallel_statement_queuing |
FALSE |
Related to automatic DOP. If set to TRUE, this enables Parallel Statement Queueing. |
_parallel_cluster_cache_policy |
ADAPTIVE |
Related to automatic DOP. If set to CACHED, this enables In-Memory Parallel Execution. |
_parallel_cluster_cache_pct |
80 |
Percentage of the total buffer cache size in the cluster to be used for In-Memory Parallel Execution. By default, segments larger than 80% of the total buffer cache size are not considered a candidate for In-Memory Parallel Execution. |
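Hidden parameters are not visible in V$PARAMETER. To check their current values, a query against the underlying X$ tables can be used; the following is a sketch, which must be run as SYS:

SQL> select i.ksppinm name, v.ksppstvl value
  2  from x$ksppi i, x$ksppcv v
  3  where i.indx = v.indx
  4  and i.ksppinm in ('_parallel_statement_queuing', '_parallel_cluster_cache_policy');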
Parallelization at the Storage Tier
Exadata has a lot of processing power at the storage layer. The fixed eighth, quarter, half, and full rack configurations up to the X4 generation have more CPU resources available at the storage layer than at the compute layer. With the X5 generation, the storage server went from two six-core CPUs (X4) to two eight-core CPUs (X5), while the compute layer for the two-socket servers went from two 12-core CPUs (X4) to two 18-core CPUs (X5), which means the number of cores is higher on the compute layer with the X5 generation two-socket servers. Needless to say, the new “elastic configurations,” which allow almost any combination of servers to be created, abandon the concept of a balance between the compute and storage layers altogether and should only be considered as a last resort and with great care.
Since Smart Scans offload a lot of processing to the storage cells, every query involving Smart Scans is effectively parallelized across the CPUs on the storage cells. This type of parallelization is completely independent from the traditional database parallel processing. The Smart Scan parallelization occurs even when the activity is driven by a single process on a single database server. This introduces some interesting issues that should be considered with regard to normal parallelization at the database tier. Since one of the primary jobs of a parallelized query is to allow multiple processes to participate in the I/O operations and since the I/O operations are already spread across multiple processes, the degree of parallelism required by statements running on the Exadata platform should be smaller than on other platforms.
Auto DOP
One of the major changes to parallel operations in Oracle Database 11g Release 2 was the addition of a feature affectionately known as Auto DOP (automatic degree of parallelism). It was designed to overcome the problems associated with the fact that there is rarely a single DOP value that is appropriate for all queries touching a particular object. Prior to 11gR2, the DOP could be specified at the statement level via hints or at the object level via the DEGREE and INSTANCE settings. Realistically, using hints at the statement level makes more sense in most situations for the reason just mentioned. However, it requires that the developers understand the platform that the statements will be running on and the workload that the hardware will be supporting at the time of execution, as well as the concurrency of other processes requiring resources. Getting the settings correct can be a tedious trial-and-error process and, unfortunately, the DOP cannot be changed while a statement is running. Once it starts, your only options are to let it complete or kill it, change the DOP settings, and try again. This makes fine-tuning in a “live” environment a painful process.
Operation and Configuration
When Auto DOP is enabled, Oracle evaluates each statement to determine whether it should be run in parallel and, if so, what DOP should be used. Basically, any statement that the optimizer concludes will take longer than 10 seconds to run serially will be a candidate to run in parallel. The 10-second threshold can be controlled by setting the PARALLEL_MIN_TIME_THRESHOLD parameter. This decision is made regardless of whether any of the objects involved in the statement have been decorated with a parallel degree setting or not.
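For example, to lower the threshold so that statements estimated to run longer than five seconds become candidates for parallel execution (the value 5 is hypothetical, for illustration only):

SQL> alter system set parallel_min_time_threshold = 5;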
Auto DOP is enabled by setting the PARALLEL_DEGREE_POLICY parameter to a value of AUTO, LIMITED, or ADAPTIVE (12c). The default setting for this parameter is MANUAL, which disables all three of the new 11gR2 parallel features (Auto DOP, Parallel Statement Queueing, In-Memory Parallel Execution). Unfortunately, PARALLEL_DEGREE_POLICY is one of those parameters that control more than one thing. The following list shows the effects of the various settings for this parameter:
- MANUAL: Disables all three features; parallel execution behaves as in earlier releases.
- LIMITED: Enables Auto DOP only for statements accessing objects decorated with a parallel degree of DEFAULT; Parallel Statement Queueing and In-Memory Parallel Execution remain disabled.
- AUTO: Enables all three features for all statements, regardless of object-level parallel settings.
- ADAPTIVE (12c): Enables the same features as AUTO and additionally allows performance feedback, which can adjust the computed DOP for subsequent executions of a statement.
Although the only documented way to enable Parallel Statement Queueing and In-Memory Parallel Execution is via the all-or-nothing setting of AUTO or ADAPTIVE, the developers have thoughtfully provided hidden parameters that provide independent control of these features. Table 6-3 shows the parameters and how the settings of PARALLEL_DEGREE_POLICY alter the hidden parameters.
Table 6-3. Hidden Parameters Affected by PARALLEL_DEGREE_POLICY
Parallel_Degree_Policy |
Parameter |
Value |
---|---|---|
MANUAL |
_parallel_statement_queuing |
FALSE |
_parallel_cluster_cache_policy |
ADAPTIVE | |
LIMITED |
_parallel_statement_queuing |
FALSE |
_parallel_cluster_cache_policy |
ADAPTIVE | |
AUTO |
_parallel_statement_queuing |
TRUE |
_parallel_cluster_cache_policy |
CACHED | |
ADAPTIVE (12c) |
_parallel_statement_queuing |
TRUE |
_parallel_cluster_cache_policy |
CACHED |
It’s pretty obvious what the _PARALLEL_STATEMENT_QUEUING parameter controls: When it is set to TRUE, queueing is enabled. The purpose of the _PARALLEL_CLUSTER_CACHE_POLICY parameter is a little less obvious. It turns out that it controls In-Memory Parallel Execution; setting it to CACHED enables the feature. You should note that In-Memory Parallel Execution is arguably of less value on the Exadata platform, because the Smart Scan optimizations are not available when parallel processes scan data in the buffer cache. We will discuss that in more detail a little later. In the meantime, here is an example showing Auto DOP in action:
SQL> select owner, table_name, status, last_analyzed, num_rows, blocks, degree
2 from dba_tables where owner = 'MARTIN' and table_name = 'BIGT';
OWNER TABLE_NAME STATUS LAST_ANAL NUM_ROWS BLOCKS DEGREE
------ -------------- -------- --------- --------- ---------- --------
MARTIN BIGT VALID 24-MAR-15 100000000 16683456 1
SQL> alter system set parallel_degree_policy=auto;
System altered.
SQL> select /* frits1 */ avg(id) from martin.bigt;
AVG(ID)
-----------
500000005
SQL> @find_sql
Enter value for sql_text: %frits1%
Enter value for sql_id:
SQL_ID CHILD PLAN_HASH EXECS ETIME AVG_ETIME USERNAME SQL_TEXT
------------- ----- ---------- ----- -------- --------- -------- ---------------------------
djwmfmzgtjfqu 1 3043090422 1 1087.45 1087.45 SYS select /* frits1 */ avg(id)
from martin.bigt
SQL> !cat dplan.sql
set verify off
set pages 9999
set lines 150
select * from table(dbms_xplan.display_cursor('&sql_id','&child_no',''))
/
SQL> @dplan
Enter value for sql_id: djwmfmzgtjfqu
Enter value for child_no:
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------
SQL_ID djwmfmzgtjfqu, child number 1
-------------------------------------
select /* frits1 */ avg(id) from martin.bigt
Plan hash value: 3043090422
------------------------------------------------------------------------------------------------------
|Id|Operation | Name |Rows|Bytes|Cost(%CPU)|Time | TQ |IN-OUT|PQ Distrib |
------------------------------------------------------------------------------------------------------
| 0|SELECT STATEMENT | | | | 104K(100)| | | | |
| 1| SORT AGGREGATE | | 1 | 7 | | | | | |
| 2| PX COORDINATOR | | | | | | | | |
| 3| PX SEND QC (RANDOM) |:TQ10000| 1 | 7 | | |Q1,00| P->S |QC (RAND) |
| 4| SORT AGGREGATE | | 1 | 7 | | |Q1,00| PCWP | |
| 5| PX BLOCK ITERATOR | |100M| 667M| 104K (1)|00:00:05 |Q1,00| PCWC | |
|*6| TABLE ACCESS STORAGE FULL| BIGT |100M| 667M| 104K (1)|00:00:05 |Q1,00| PCWP | |
------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
6 - storage(:Z>=:Z AND :Z<=:Z)
Note
-----
- automatic DOP: Computed Degree of Parallelism is 48 because of degree limit
As you can see, enabling Auto DOP allowed the statement to be parallelized, even though the table was not decorated with a parallel setting (the degree property of the table was set to 1, not a higher value or DEFAULT). Also, notice that the plan output produced by DBMS_XPLAN shows that Auto DOP was enabled, that the calculated DOP was 48, and that the value 48 was capped by a “degree limit.” This limit is set by the parameter PARALLEL_DEGREE_LIMIT. In this case, PARALLEL_DEGREE_LIMIT is set to CPU (the default), which means the limit is PARALLEL_THREADS_PER_CPU * CPU_COUNT * instances, which in this example is 1 * 24 * 2 = 48.
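To verify the inputs to this calculation on your own system, a quick query against V$PARAMETER will do (a sketch; the values will, of course, vary per system):

SQL> select name, value from v$parameter
  2  where name in ('parallel_threads_per_cpu', 'cpu_count', 'parallel_degree_limit');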
I/O Calibration
Oracle Database version 11.2.0.2 introduced a restriction to Auto DOP requiring that the I/O system be calibrated before statements will be automatically parallelized. This restriction is lifted with version 12c. The calibration is done by the DBMS_RESOURCE_MANAGER.CALIBRATE_IO procedure, which generates a random read-only workload and spreads it across all instances in a RAC cluster. The procedure can put a significant load on the system. The documentation recommends running it when the system is idle or very lightly loaded. Here is an example of what happens if the calibration procedure has not been run and PARALLEL_DEGREE_POLICY has been set to limited, auto, or adaptive on 11.2.0.4:
SQL> @dplan
Enter value for sql_id: 05cq2hb1r37tr
Enter value for child_no:
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------
SQL_ID 05cq2hb1r37tr, child number 0
-------------------------------------
select avg(pk_col) from kso.skew a where col1 > 0
Plan hash value: 568322376
---------------------------------------------------------------------------
| Id | Operation | Name | Rows| Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 44298 (100)| |
| 1 | SORT AGGREGATE | | 1 | 11 | | |
|* 2 | TABLE ACCESS FULL| SKEW | 32M| 335M| 44298 (1)| 00:01:29 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("COL1">0)
Note
-----
- automatic DOP: skipped because of IO calibrate statistics are missing
23 rows selected.
As you can see, when the I/O calibration has not been done, Auto DOP is disabled and the optimizer generates a serial execution plan. On version 12.1.0.1 and above, the optimizer performs the automatic DOP calculation even when I/O calibration has not been done. There are two views that provide additional information about the calibration: The V$IO_CALIBRATION_STATUS view shows whether the calibration has been done, and the DBA_RSRC_IO_CALIBRATE view shows the results of the procedure. Here is an example showing how to view the calibration status, count the number of disks in the DATA disk group, and use the CALIBRATE_IO procedure:
SQL> select * from V$IO_CALIBRATION_STATUS;
STATUS CALIBRATION_TIME
------------- -------------------------------------------------------------
NOT AVAILABLE
If you are unsure about the number of disks, you can query the number of cell disks with the DATA prefix as seen by the database. If your DATA disk group has a slightly different name, such as the rack name added to DATA, add that between DATA and CD:
SQL> select count(*) from v$asm_disk where path like '%DATA_CD%';
COUNT(*)
----------
36
SQL> !cat calibrate_io.sql
SET SERVEROUTPUT ON
DECLARE
lat INTEGER;
iops INTEGER;
mbps INTEGER;
BEGIN
-- DBMS_RESOURCE_MANAGER.CALIBRATE_IO (<DISKS>, <MAX_LATENCY>, iops, mbps, lat);
DBMS_RESOURCE_MANAGER.CALIBRATE_IO (&no_of_disks, 10, iops, mbps, lat);
DBMS_OUTPUT.PUT_LINE ('max_iops = ' || iops);
DBMS_OUTPUT.PUT_LINE ('latency = ' || lat);
dbms_output.put_line('max_mbps = ' || mbps);
end;
/
SQL> @calibrate_io
Enter value for no_of_disks: 36
The database procedure will run for some time, around 15 minutes on our system. The procedure causes the database to spawn CSnn slave processes on all instances; the number of these processes is determined by the number-of-disks argument. When the procedure finishes, it displays the results of the calibration. For example:
max_iops = 11237
latency = 8
max_mbps = 5511
PL/SQL procedure successfully completed.
SQL> select * from V$IO_CALIBRATION_STATUS;
STATUS CALIBRATION_TIME CON_ID
------------- ------------------------------------------------------------------- ----------
READY 24-MAY-15 07.17.45.086 AM 0
SQL> select start_time, MAX_IOPS, MAX_MBPS, MAX_PMBPS, LATENCY, NUM_PHYSICAL_DISKS
from DBA_RSRC_IO_CALIBRATE;
START_TIME MAX_IOPS MAX_MBPS MAX_PMBPS LATENCY NUM_PHYSICAL_DISKS
------------------------------ ---------- ---------- ---------- ---------- ------------------
24-MAY-15 07.11.38.691358 AM 11237 5511 400 8 36
The changes require a database restart to take effect. Once restarted, Auto DOP uses the calibration details. In certain cases, it might not be feasible to run the calibration, or a bug (in 11.2.0.2 before bundle patch 4) may cause the calibration to generate numbers too high for Auto DOP to use. In these cases, the important calibration details can be set manually. This is described in My Oracle Support (MOS) note 1269321.1. Here is how it is done:
SQL> delete from resource_io_calibrate$;
SQL> insert into resource_io_calibrate$ values(current_timestamp,
2 current_timestamp, 0, 0, 200, 0, 0);
SQL> commit;
A database restart is required after setting the value. The value of 200 is recommended by Oracle Support for customers running on Exadata, and it is what Oracle uses to test Auto DOP on Exadata.
Auto DOP Wrap-Up
The end result of setting PARALLEL_DEGREE_POLICY to AUTO is that all kinds of statements will be run in parallel, even if no objects have been specifically decorated with a parallel degree setting. This is truly automatic parallel processing because the database decides what to run in parallel and with how many slaves. On top of that, by default, the slaves may be spread across multiple nodes in a RAC database. Unfortunately, this combination of features is a little like the Wild West with things running in parallel all over the place. However, the ability to queue parallel statements does provide some semblance of order, which leads us to the next topic.
We found Auto DOP to be too optimistic about the calculated DOP in many cases. A parameter worth pointing out is PARALLEL_DEGREE_LEVEL, which can be used to dial down the calculated DOP by setting it to a value lower than its default of 100. Other ways of limiting (Auto) DOP are PARALLEL_DEGREE_LIMIT and the Database Resource Manager.
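For example, to cut the computed DOP in half across the board (a sketch; the value 50 is for illustration, pick a scaling factor appropriate for your workload):

SQL> alter system set parallel_degree_level = 50;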
When Oracle first introduced the Parallel Query feature with Oracle version 7, Larry Ellison did a demo using a multiprocessor server, on which he was the only user. The individual CPU utilization was shown graphically, and all the CPUs where running full force when the Parallel Query feature was demoed. We wonder what would have happened if there were other database users in that database and what their experience would be during that demo. Probably their experience would not have been a good one. That is exactly what Parallel Statement Queueing tries to solve.
Oracle’s parallel capabilities have been a great gift, but they have also been a curse, because controlling the beast in an environment where multiple users are trying to share the resources is difficult at best. There have been attempts to come up with a reasonable way of throttling big parallel statements, but to date we do not think those attempts have been overly successful.
One of the most promising aspects of Exadata is its potential to run mixed workloads (OLTP and DW) without crippling one or the other. In order to do that, Oracle needs some mechanism to separate the workloads and, just as importantly, to throttle the resource intensive parallel queries. Parallel Statement Queueing appears to be just such a tool. And when combined with the Resource Manager, it provides a pretty robust mechanism for throttling the workload to a level that the hardware can support.
The Old Way
Before we get to the new Parallel Queueing functionality, we should probably review how it was done in previous versions. The best tool we had at our disposal was Parallel Adaptive Multiuser, which provided the ability to automatically downgrade the degree of parallelism for a given statement based on the workload when a query executes. It was actually a powerful mechanism and it was the best approach we had prior to 11gR2. This feature is enabled by setting the PARALLEL_ADAPTIVE_MULTI_USER parameter to TRUE. This is still the default in 12c, by the way, so this is definitely a parameter that you may want to consider changing. The downside of this approach is that parallelized statements can have wildly varying degrees of parallelism, and thus execution times. As you can imagine, a statement that gets 32 slaves one time and then gets downgraded to serial execution the next time will probably not make the users very happy.
The argument for this type of approach is that queries are going to run slower if the system is busy, regardless of what you do, and that users expect it to run slower when the system is busy. The first part of that statement may be true, but I do not believe the second part is (at least in most cases). The bigger problem with the downgrade mechanism, though, is that the decision about how many slaves to use is based on a single point in time—the point when the parallel statement starts. Recall that once the degree of parallelism (DOP) is set for an execution plan, it cannot be changed. The statement will run to completion with the number of slaves it was assigned, even if additional resources become available while it is running.
Consider the statement that takes one minute to execute with 32 slaves, and suppose that same statement gets downgraded to serial due to a momentarily high load. Now say that a few seconds after it starts, the system load drops back to more normal levels. Unfortunately, the serialized statement will continue to run for nearly 30 minutes with its single process, even though on average the system is not busier than usual. This sort of erratic performance can wreak havoc for those using the system and for those supporting it.
The New Way
Now let’s compare Parallel Adaptive Multiuser (the old way) with the new mechanism introduced in 11gR2 that allows parallel statements to be queued. This mechanism separates long-running parallel queries from the rest of the workload. The mechanics are pretty simple: Turn the feature on, set a target number of parallel slaves using the PARALLEL_SERVERS_TARGET parameter, and run what should be a resource-intensive query. If a statement would push the number of active parallel slaves above the target, it is queued until the required number of slaves becomes available. There are, of course, many details to consider and other control mechanisms that can be applied to manage the process. Let’s look at how it behaves:
SQL> alter system set parallel_degree_policy=auto;
System altered.
SQL> alter system set parallel_servers_target=10;
System altered.
SQL> @parms
Enter value for parameter: parallel%
Enter value for isset:
Enter value for show_hidden:
NAME VALUE ISDEFAUL
-------------------------------------------------- --------- --------
parallel_adaptive_multi_user TRUE FALSE
parallel_automatic_tuning FALSE TRUE
parallel_degree_level 100 TRUE
parallel_degree_limit CPU TRUE
parallel_degree_policy AUTO FALSE
parallel_execution_message_size 16384 FALSE
parallel_force_local FALSE TRUE
parallel_instance_group TRUE
parallel_io_cap_enabled FALSE TRUE
parallel_max_servers 240 FALSE
parallel_min_percent 0 TRUE
parallel_min_servers 0 FALSE
parallel_min_time_threshold AUTO TRUE
parallel_server TRUE TRUE
parallel_server_instances 2 TRUE
parallel_servers_target 10 TRUE
parallel_threads_per_cpu 1 FALSE
In order to execute multiple SQL statements that use automatic DOP and make them queue, we need a little script, which is shown below. The script launches 10 sqlplus processes, which are put into the background using “&.” The “wait” command at the end waits for all backgrounded processes to finish.
T=0
while [ $T -lt 10 ]; do
echo "select avg(id) from t1;" | sqlplus -S ts/ts &
let T=$T+1
done
wait
echo "Finished."
Now the script above is executed. If we look at the DBMS_XPLAN output, we see that Auto DOP calculated 10 slaves for the execution because PARALLEL_SERVERS_TARGET was set to that value above:
SQL> @dplan b46903fft8uz4
Enter value for sql_id: b46903fft8uz4
Enter value for child_no:
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------
SQL_ID b46903fft8uz4, child number 2
-------------------------------------
select avg(id) from t1
Plan hash value: 3110199320
------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows| Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 64140 (100)| | | | |
| 1 | SORT AGGREGATE | | 1 | 6 | | | | | |
| 2 | PX COORDINATOR | | | | | | | | |
| 3 | PX SEND QC (RANDOM) | :TQ10000 | 1 | 6 | | | Q1,00 | P->S | QC (RAND) |
| 4 | SORT AGGREGATE | | 1 | 6 | | | Q1,00 | PCWP | |
| 5 | PX BLOCK ITERATOR | | 100M| 572M| 64140 (1)| 00:00:02 | Q1,00 | PCWC | |
|* 6 | TABLE ACCESS STORAGE FULL| T1 | 100M| 572M| 64140 (1)| 00:00:02 | Q1,00 | PCWP | |
------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
6 - storage(:Z>=:Z AND :Z<=:Z)
Note
-----
- automatic DOP: Computed Degree of Parallelism is 10
Directly after executing the script that starts multiple sqlplus sessions to run the parallelized table scan, we can see statement queueing in effect. First, if we look at the number of parallel query processes active, we see that it does not exceed the limit of 10 that we set:
SQL> select * from v$px_process_sysstat where statistic like '%In Use%';
STATISTIC VALUE CON_ID
------------------------------ ---------- ----------
Servers In Use 10 0
In fact, the reason the limit is not exceeded is Parallel Statement Queueing, which can be seen in V$SQL_MONITOR and in the wait interface:
SQL> select sid, sql_id, sql_exec_id, sql_text from v$sql_monitor where status='QUEUED';
SID SQL_ID SQL_EXEC_ID SQL_TEXT
----- ------------- ----------- ------------------------------
588 b46903fft8uz4 16777250 select avg(id) from t1
396 b46903fft8uz4 16777254 select avg(id) from t1
331 b46903fft8uz4 16777251 select avg(id) from t1
265 b46903fft8uz4 16777253 select avg(id) from t1
463 b46903fft8uz4 16777255 select avg(id) from t1
654 b46903fft8uz4 16777256 select avg(id) from t1
263 b46903fft8uz4 16777252 select avg(id) from t1
200 b46903fft8uz4 16777249 select avg(id) from t1
166 b46903fft8uz4 16777249 select avg(id) from t1
9 rows selected.
SQL> @snapper ash=event+wait_class 1 1 all
Sampling SID all with interval 1 seconds, taking 1 snapshots...
-- Session Snapper v4.11 BETA - by Tanel Poder ( http://blog.tanelpoder.com ) - Enjoy the Most Advanced Oracle Troubleshooting Script on the Planet! :)
---------------------------------------------------------------
Active% | EVENT | WAIT_CLASS
---------------------------------------------------------------
800% | resmgr:pq queued | Scheduler
700% | cell smart table scan | User I/O
300% | ON CPU | ON CPU
-- End of ASH snap 1, end=2015-05-25 07:17:51, seconds=1, samples_taken=5
PL/SQL procedure successfully completed.
There are several things worth mentioning in this listing. To set up the desired conditions, we turned on Auto DOP, which also enables Parallel Statement Queueing, and then set the PARALLEL_SERVERS_TARGET parameter to a very low number (10) in order to trigger queueing more easily. We then used a shell script to execute 10 statements that are subject to Auto DOP. Looking at V$PX_PROCESS_SYSSTAT, there are no more than 10 parallel query servers active, as set with the PARALLEL_SERVERS_TARGET parameter. V$SQL_MONITOR showed that the statements were indeed queueing. This is an important point: All statements using parallel query will show up in the V$SQL_MONITOR view. Note that this view requires the Tuning Pack license. If statements have a status of QUEUED, they are not actually executing but are instead waiting until enough parallel slaves become available. Finally, we ran Tanel Poder's Snapper script to see what event the queued statements were waiting on. As you can see, it was the resmgr:pq queued wait event.
Note There is one other thing you should be aware of regarding the wait events: a wait event change that relates to parallel queueing. This example was created using Oracle Database version 12.1.0.2. If you are using 11.2.0.1, you will see a different set of wait events (there are two). The first is PX Queuing: statement queue. This is the event a statement waits on when it is next to run. The other is enq: JX - SQL statement queue. This is the event a statement waits on when there are other statements ahead of it in the queue. This scheme seems quite unwieldy, which is probably why it was changed in the later release.
There are several mechanisms in place for controlling how the Parallel Statement Queueing feature behaves. The basic approach is to use a first-in, first-out queueing mechanism. But there are ways to prioritize work within the queueing framework. It is also possible to completely bypass the queueing mechanism via a hint. And, conversely, it is possible to enable queueing for a statement via a hint even when the Parallel Statement Queueing feature is not enabled at the database level. There are also a few parameters that affect the queueing behavior. And, finally, the Resource Manager has the capability to affect how statements are queued.
Controlling Queueing with Parameters
There are a handful of parameters that affect how Parallel Statement Queueing behaves. The first two are PARALLEL_MAX_SERVERS, which sets the maximum number of parallel query slaves per instance, and PARALLEL_SERVERS_TARGET, which sets the number of parallel query slaves that may be in use before statements wanting to execute in parallel are queued.
The default value of PARALLEL_MAX_SERVERS is calculated by:
parallel_threads_per_cpu * cpu_count * concurrent_parallel_users * 5
The default value of PARALLEL_SERVERS_TARGET is calculated by:
parallel_threads_per_cpu * cpu_count * concurrent_parallel_users * 2
CONCURRENT_PARALLEL_USERS is determined as follows:
- If SGA_TARGET is set: 4
- If PGA_AGGREGATE_TARGET is set: 2
- Otherwise: 1
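The default calculations above can be sketched in a few lines. This is a simplified model with hypothetical helper names, and the parameter values in the example are illustrative, not taken from a real system:

```python
def concurrent_parallel_users(sga_target_set, pga_aggregate_target_set):
    # Per the rules above: SGA_TARGET takes precedence,
    # then PGA_AGGREGATE_TARGET, otherwise 1.
    if sga_target_set:
        return 4
    if pga_aggregate_target_set:
        return 2
    return 1

def default_parallel_limits(parallel_threads_per_cpu, cpu_count,
                            sga_target_set=True,
                            pga_aggregate_target_set=True):
    users = concurrent_parallel_users(sga_target_set, pga_aggregate_target_set)
    base = parallel_threads_per_cpu * cpu_count * users
    # Same base expression, multiplied by 5 and 2, respectively
    return {"parallel_max_servers": base * 5,
            "parallel_servers_target": base * 2}

# Example: 2 threads per CPU, 8 CPUs, SGA_TARGET set
print(default_parallel_limits(2, 8))
# {'parallel_max_servers': 320, 'parallel_servers_target': 128}
```

Note how the default target (128 here) is already a large share of the maximum (320), which is the point made next.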
This calculated value is almost certainly higher than you would want for most mixed workload systems, as it is geared at completely consuming the available CPU resources with parallel query processes. Allowing long-running parallel statements to consume the server completely means that response time-sensitive, OLTP-type statements could suffer. You should also note that it is possible to have more server processes active than the parameter allows. Since the number of slaves assigned to a query may be twice the DOP, the target can occasionally be exceeded.
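The queueing decision can be modeled as a simple admission check. This is a simplified sketch of the behavior described above, not Oracle's actual implementation; function and variable names are ours:

```python
def admit_or_queue(in_use, dop, servers_target, two_slave_sets=False):
    """Simplified FIFO admission model for Parallel Statement Queueing.

    The admission check is against the statement's DOP, but the actual
    allocation may be 2 x DOP (producer and consumer slave sets), which
    is why PARALLEL_SERVERS_TARGET can occasionally be exceeded.
    """
    allocated = dop * (2 if two_slave_sets else 1)
    if in_use + dop <= servers_target:
        return "ADMIT", in_use + allocated
    return "QUEUE", in_use

# target = 10: a DOP-5 statement with two slave sets is admitted...
state, in_use = admit_or_queue(in_use=4, dop=5, servers_target=10,
                               two_slave_sets=True)
print(state, in_use)   # ADMIT 14 -> target of 10 exceeded
# ...after which the next DOP-5 statement queues
print(admit_or_queue(in_use=14, dop=5, servers_target=10)[0])   # QUEUE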
In 12c, if the PARALLEL_STMT_CRITICAL resource management directive is used (discussed later in this chapter), it is recommended that the PARALLEL_SERVERS_TARGET be set to 50–75% of the PARALLEL_MAX_SERVERS so that critical queries that bypass the parallel queue can make use of the remaining parallel slave processes.
Note The PARALLEL_MAX_SERVERS parameter will be reduced to a value below the PROCESSES parameter if the calculated value of PARALLEL_MAX_SERVERS is greater than the value of PROCESSES. You will see this behavior noted in the alert log at instance startup:
Mon May 06 18:43:06 2013
Adjusting the default value of parameter parallel_max_servers
from 160 to 135 due to the value of parameter processes (150)
Starting ORACLE instance (normal)
Another parameter that deserves some discussion is the hidden parameter, _PARALLEL_STATEMENT_QUEUING, which turns the feature on and off. As already discussed in the Auto DOP section, this parameter is set to TRUE when the PARALLEL_DEGREE_POLICY parameter is set to AUTO. However, the hidden parameter can also be set manually to turn Parallel Queueing off and on independently.
Auto DOP calculations are still a little scary, so it is nice that there is a way to turn on the Parallel Queueing feature without enabling Oracle to take complete control of which statements run in parallel. Of course, since this involves setting a hidden parameter, you should not do this in a production environment without approval from Oracle support. Nevertheless, here is another quick example showing that queueing can be turned on without enabling Auto DOP or In-Memory Parallel Execution:
SQL > alter system set parallel_degree_policy=manual sid='*';
System altered.
SQL > alter table ts.t1 parallel (degree 8);
Table altered.
SQL> @parms
Enter value for parameter: parallel
Enter value for isset:
Enter value for show_hidden:
NAME VALUE
-------------------------------------------------- ------
parallel_adaptive_multi_user TRUE
parallel_automatic_tuning FALSE
parallel_degree_level 100
parallel_degree_limit CPU
parallel_degree_policy MANUAL
parallel_execution_message_size 16384
parallel_force_local TRUE
parallel_instance_group
parallel_io_cap_enabled FALSE
parallel_max_servers 128
parallel_min_percent 0
parallel_min_servers 32
parallel_min_time_threshold AUTO
parallel_server TRUE
parallel_server_instances 2
parallel_servers_target 10
parallel_threads_per_cpu 1
T=0
while [ $T -lt 10 ]; do
echo "select avg(id) from t1;" | sqlplus -S ts/ts &
let T=$T+1
done
wait
echo "Finished."
SQL> select * from v$px_process_sysstat where statistic like '%In Use%';
STATISTIC VALUE CON_ID
------------------------------ ----- ----------
Servers In Use 80 0
First, we set PARALLEL_DEGREE_POLICY to MANUAL. This disables automatic DOP, statement queueing, and In-Memory Parallel Query. Next, we decorated the T1 table with a degree of eight. Then we looked at the values of the parameters that have PARALLEL in the name. For the sake of the example, we set PARALLEL_FORCE_LOCAL to TRUE, so all the sqlplus sessions of the above script will use parallel query slaves on the current instance. Then we executed the script. As you can see, the 10 sessions each allocated their eight parallel query slaves, because we see 80 servers in use.
Now set the undocumented parameter _PARALLEL_STATEMENT_QUEUING to TRUE and re-run the script to create 10 sessions:
SQL> alter system set "_parallel_statement_queuing"=true sid='*';
System altered.
T=0
while [ $T -lt 10 ]; do
echo "select avg(id) from t1;" | sqlplus -S ts/ts &
let T=$T+1
done
wait
echo "Finished."
SQL> select * from v$px_process_sysstat where statistic like '%In Use%';
STATISTIC VALUE CON_ID
------------------------------ ----- ----------
Servers In Use 12 0
This output shows statement queueing enabled without turning on Auto DOP. Please note that this involved a hidden parameter, which means that you should discuss setting this parameter with Oracle Support if you want to apply this technique in production.
Controlling Statement Queueing with Hints
There are two hints that can be used to control Parallel Statement Queueing at the statement level. One hint, NO_STATEMENT_QUEUING, allows the queueing process to be completely bypassed, even if the feature is turned on at the instance level. The other hint, STATEMENT_QUEUING, turns on the queueing mechanism, even if the feature is not enabled at the instance level. The STATEMENT_QUEUING hint provides a documented avenue for using the queuing feature without enabling Auto DOP.
Controlling Queueing with Resource Manager
Oracle’s Database Resource Manager (DBRM) provides additional capability to control Parallel Statement Queueing. While a thorough discussion of DBRM is beyond the scope of this chapter, we will cover some specific features related to parallel query. Chapter 7 covers DBRM in more detail.
Without DBRM, the parallel statement queue behaves strictly as a first-in, first-out (FIFO) queue. DBRM provides several directive attributes that can be used to provide additional control on a consumer group basis. Many of these controls were introduced in version 11.2.0.2. Table 6-4 contains a list of additional capabilities provided by DBRM.
Table 6-4. DBRM Parallel Statement Queueing Controls
Control |
Description |
---|---|
Specify a Timeout |
The PARALLEL_QUEUE_TIMEOUT directive attribute can be used to set a maximum queue time for a consumer group. The time limit is set in seconds and, once it has expired, the statement will terminate with an error (ORA-07454). Note that this directive did not become available until version 11.2.0.2 of the database. |
Specify Maximum DOP |
The PARALLEL_DEGREE_LIMIT_P1 directive attribute sets a maximum number of parallel slaves that may be assigned to an individual statement. This is equivalent to the PARALLEL_DEGREE_LIMIT database parameter but is used to set limits for different sets of users based on consumer groups. |
Manage Order of Dequeueing |
The MGMT_P1, MGMT_P2, ... MGMT_P8 directive attributes can be used to alter the normal FIFO processing. Besides serving as resource percentage allocations for CPU and I/O (intra-database I/O Resource Management), these attributes allow prioritization of dequeueing. Each of the eight attributes essentially provides a distinct dequeueing priority level: All statements with an MGMT_P1 attribute will be dequeued prior to any statement with MGMT_P2. In addition to the dequeueing priority, a probability number can be assigned to regulate dequeueing of statements within the same level. |
Limit Percentage of Parallel Slaves |
The PARALLEL_TARGET_PERCENTAGE directive attribute can be used to limit a consumer group to a percentage of the parallel slaves available to the system. So, if a system allowed 64 slaves to be active before starting to queue and PARALLEL_TARGET_PERCENTAGE was set to 50, the consumer group would only be able to consume 32 slaves. Note that this directive did not become available until version 11.2.0.2 of the database. In 12c, the PARALLEL_SERVER_LIMIT attribute replaced this directive. |
Queue Multiple SQLs as a Set |
The BEGIN_SQL_BLOCK and END_SQL_BLOCK procedures in the DBMS_RESOURCE_MANAGER package work with Parallel Statement Queueing by treating individual statements as if they had been submitted at the same time. The idea is that all the statements in the block are dequeued together, preventing individual statements from being held back in the queue. The mechanism requires surrounding independent SQL statements with calls to the BEGIN and END procedures. Note that these procedures did not become available until version 11.2.0.2 of the database. |
Critical parallel statement prioritization |
With version 12c, the PARALLEL_STMT_CRITICAL directive was introduced, which makes parallel statements bypass the statement queue when the directive is set to BYPASS_QUEUE for a consumer group. All queries issued by users belonging to such a consumer group will bypass the parallel statement queue and execute immediately. With the bypass mechanism, the total number of parallel slaves requested may be greater than PARALLEL_SERVERS_TARGET. Please be aware that critical queries are run regardless of the number of parallel query processes actually in use, which means that if the number of parallel query slaves in use has already reached PARALLEL_MAX_SERVERS, it is possible to encounter downgrades or, even worse, serial execution. This is why you should keep some headroom between PARALLEL_SERVERS_TARGET and PARALLEL_MAX_SERVERS. |
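The MGMT_Pn dequeueing order described in Table 6-4 can be modeled with a priority queue. This is a simplified sketch that ignores the probability-based selection within a level; the statement names are hypothetical:

```python
import heapq

# Each queued statement: (MGMT_Pn level, arrival order, identifier).
# Lower level dequeues first; FIFO within a level.
queue = [(2, 1, "stmt_a"),   # MGMT_P2, arrived first
         (1, 2, "stmt_b"),   # MGMT_P1, arrived second
         (2, 3, "stmt_c")]   # MGMT_P2, arrived third
heapq.heapify(queue)

order = [heapq.heappop(queue)[2] for _ in range(3)]
print(order)   # ['stmt_b', 'stmt_a', 'stmt_c'] -- P1 jumps the FIFO queue
```

Even though stmt_b arrived after stmt_a, its MGMT_P1 level lets it dequeue first; the two MGMT_P2 statements keep their FIFO order relative to each other.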
The DBRM directives can be quite involved. Here is a modified example from the Oracle® Database VLDB and Partitioning Guide 12c Release 1 (12.1.0.2) that shows the directives making use of the Parallel Statement Queueing features:
BEGIN
DBMS_RESOURCE_MANAGER.CLEAR_PENDING_AREA();
DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();
--new plan
DBMS_RESOURCE_MANAGER.CREATE_PLAN(
'DAYTIME_PLAN',
'Plan that prioritizes short running queries'
);
--consumer groups
DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP(
'MEDIUM_TIME',
'Medium: time running between 1 and 10 minutes'
);
DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP(
'LONG_TIME',
'Long: time running more than 10 minutes'
);
--directives
DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
'DAYTIME_PLAN',
'SYS_GROUP',
'Directive for SYS and high priority queries',
MGMT_P1 => 100,
PARALLEL_STMT_CRITICAL => 'BYPASS_QUEUE'
);
DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
'DAYTIME_PLAN',
'OTHER_GROUPS',
'Directive for SQL running less than 1 minute',
MGMT_P2 => 70,
PARALLEL_DEGREE_LIMIT_P1 => 4,
SWITCH_TIME => 60,
SWITCH_ESTIMATE => TRUE,
SWITCH_FOR_CALL => TRUE,
SWITCH_GROUP => 'MEDIUM_TIME'
);
DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
'DAYTIME_PLAN',
'MEDIUM_TIME',
'Directive for SQL running between 1 and 10 minutes',
MGMT_P2 => 20,
PARALLEL_SERVER_LIMIT => 75,
SWITCH_TIME => 600,
SWITCH_ESTIMATE => TRUE,
SWITCH_FOR_CALL => TRUE,
SWITCH_GROUP => 'LONG_TIME'
);
DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
'DAYTIME_PLAN',
'LONG_TIME',
'Directive for SQL running more than 10 minutes',
MGMT_P2 => 10,
PARALLEL_SERVER_LIMIT => 50,
PARALLEL_QUEUE_TIMEOUT => 3600
);
DBMS_RESOURCE_MANAGER.VALIDATE_PENDING_AREA();
DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
END;
/
This example bears some explanation:
There are a few additional settings that have to be made before this plan can truly work. A session needs to be granted a consumer group as its initial consumer group if that group is other than OTHER_GROUPS. This is not the case in our example, so we do not have to use the DBMS_RESOURCE_MANAGER.SET_INITIAL_CONSUMER_GROUP procedure. However, we do want sessions to switch to the MEDIUM_TIME or LONG_TIME group when the rules set in the directives apply. This means we must use the DBMS_RESOURCE_MANAGER_PRIVS.GRANT_SWITCH_CONSUMER_GROUP procedure before the plan works. In this example, the KSO user is granted the switches. This needs to be done for all the database users that are subject to this plan:
BEGIN
DBMS_RESOURCE_MANAGER_PRIVS.GRANT_SWITCH_CONSUMER_GROUP('KSO','MEDIUM_TIME',FALSE);
DBMS_RESOURCE_MANAGER_PRIVS.GRANT_SWITCH_CONSUMER_GROUP('KSO','LONG_TIME',FALSE);
END;
/
Parallel Statement Queueing Wrap-Up
This section has been considerably more detailed than the coverage of the other two new parallel features in 11gR2. It also contains some of the enhancements that were added in version 12c. That is because this feature is a critical component in allowing Exadata to handle mixed workloads effectively. It enables us to effectively handle a mixture of statements that are sensitive to both throughput and response time. Without this feature, it would be very difficult to provide adequate resources without severely compromising one or the other.
In-Memory Parallel Execution
Prior to 11gR2, queries that were parallelized totally ignored the buffer cache. Oracle assumed that parallel queries would only be done on very large tables that would probably never have a large percentage of their blocks in the buffer cache. This assumption led to the conclusion that it would be faster to just read the data from disk. In addition, flooding the buffer cache with a large number of blocks from a full table scan was not desirable, so Oracle developed a mechanism called direct path reads, which bypassed the normal caching mechanism in favor of reading blocks directly into the user's Program Global Area (PGA).
The In-Memory Parallel Execution feature takes a different approach. It attempts to make use of the buffer cache for parallel queries. The feature is cluster-aware and is designed to spread the data across the cluster nodes (that is, the RAC database instances). The data blocks are also affinitized to a single node, reducing the number of communication and data transfers between the nodes. The goal, of course, is to speed up the parallel query by eliminating disk I/O. This can be a viable technique because many systems now have very large amounts of memory, which of course can provide a significant speed advantage over disk operations. There are some downsides to this approach though. The biggest disadvantage with regard to Exadata is that all the Smart Scan optimizations are disabled by this feature. In this respect, the new option introduced in 12c to split the buffer cache into a Big Table Scan Cache and the OLTP cache by means of an initialization parameter offers little consolation—a Smart Scan is most likely faster than an n-way scan across multiple buffer caches.
Note that we have not actually seen In-Memory Parallel Query in the wild on Exadata. This is probably good since many of the optimizations built into Exadata rely on offloading, which depends on direct path reads. Of course, direct path reads will not be done if blocks are being accessed in memory on the database servers. On most platforms, memory access would be much faster than direct path reads from disk. But with Exadata, eliminating the disk I/O also eliminates a large number of CPUs that could be applied to filtering and other operations. This means that accessing the buffer cache for parallel execution could actually be less efficient than allowing the storage servers to participate in the execution of the SQL statement.
Note The 12.1.0.2 database patchset introduced the In-Memory Option (In-Memory Column Store, Vector Processing, and In-Memory Aggregation). Version 12.1.0.1 introduced other In-Memory Caching features (Automatic Big Table Caching and Full Database Caching), which further speed up the analytical capabilities of Exadata while simultaneously running OLTP workloads. Although these are really great new features, covering them is beyond the scope of this chapter.
A little demonstration is probably in order at this point. It is worth noting that getting this feature to kick in at all takes quite a bit of effort. Here are the basic steps we had to take in order to get it to work. First, we had to find a query that the optimizer estimated would run for longer than the number of seconds specified by the PARALLEL_MIN_TIME_THRESHOLD parameter (assuming the statement was not parallelized). The default for this parameter is AUTO, meaning 10 seconds. We set this parameter to one second to make it easier to trigger the In-Memory Parallel behavior. This query also had to be on a table that would almost completely fit in the aggregate buffer cache provided by the combination of all the participating RAC instances. To simplify things, we limited the processing to a single instance by setting the PARALLEL_FORCE_LOCAL parameter to TRUE. Of course, we had to set PARALLEL_DEGREE_POLICY to AUTO to enable the feature. We also set very low values for PARALLEL_SERVERS_TARGET and PARALLEL_DEGREE_LIMIT. Here are the parameter settings and pertinent information about the query we used to test the feature:
SYS@dbm2> @parms
Enter value for parameter: parallel
Enter value for isset:
Enter value for show_hidden:
NAME VALUE ISDEFAULT
-------------------------------------------------- -------- ----------
fast_start_parallel_rollback LOW TRUE
parallel_adaptive_multi_user FALSE FALSE
parallel_automatic_tuning FALSE TRUE
parallel_degree_level 100 TRUE
parallel_degree_limit 8 FALSE
parallel_degree_policy AUTO FALSE
parallel_execution_message_size 16384 FALSE
parallel_force_local TRUE FALSE
parallel_instance_group TRUE
parallel_io_cap_enabled FALSE TRUE
parallel_max_servers 720 TRUE
parallel_min_percent 0 TRUE
parallel_min_servers 0 FALSE
parallel_min_time_threshold 1 FALSE
parallel_server TRUE TRUE
parallel_server_instances 4 TRUE
parallel_servers_target 8 FALSE
parallel_threads_per_cpu 1 FALSE
recovery_parallelism 0 TRUE
19 rows selected.
SYS@dbm2> @pool_mem
AREA MEGS
------------------------------ ----------
384.0
free memory 1,162.5
fixed_sga 7.3
streams pool .0
log_buffer 248.7
shared pool 2,011.3
large pool 26.2
buffer_cache 4,352.0
----------
sum 8,192.0
8 rows selected.
SYS@dbm2> @table_size
Enter value for owner: KSO
Enter value for table_name: SKEWIMPQ
Enter value for type:
OWNER SEGMENT_NAME TYPE TOTALSIZE_MEGS TABLESPACE_NAME
------------ ------------------------------ -------------- -------------- ------------------
KSO SKEWIMPQ TABLE 3,577.0 USERS
--------------
sum 3,577.0
SYS@dbm2> select owner, table_name, status, last_analyzed, num_rows, blocks, degree, cache
2 from dba_tables where owner = 'KSO' and table_name = 'SKEWIMPQ';
OWNER TABLE_NAME STATUS LAST_ANALYZED NUM_ROWS BLOCKS DEGREE CACHE
---------- ---------- -------- ------------------- ---------- ---------- ------ ----
KSO SKEWIMPQ VALID 2015-02-14:06:52:31 89599778 451141 1 Y
So, the buffer cache on this instance is about 4.3G, and the table is about 3.5G. The query we used is simple and will not benefit from storage indexes, as virtually all the records satisfy the single WHERE clause:
SYS@dbm2> select count(*) from kso.skewimpq;
COUNT(*)
----------
89599778
1 row selected.
SYS@dbm2> select count(*) from kso.skewimpq where col1 > 0;
COUNT(*)
----------
89599776
1 row selected.
Now we will show some statistics prior to running the query, after running the query for the first time, and after running the query for the second time:
SYS@dbm2> alter system flush buffer_cache;
System altered.
SYS@dbm2> @mystats
Enter value for name: reads
NAME VALUE
---------------------------------------------------------------------- ---------------
SecureFiles DBFS Link streaming reads 0
cold recycle reads 0
data blocks consistent reads - undo records applied 0
gc cluster flash cache reads failure 0
gc cluster flash cache reads received 0
gc cluster flash cache reads served 0
gc flash cache reads served 0
lob reads 0
physical reads 7
physical reads cache 7
physical reads cache for securefile flashback block new 0
physical reads cache prefetch 0
physical reads direct 0
physical reads direct (lob) 0
physical reads direct for securefile flashback block new 0
physical reads direct temporary tablespace 0
physical reads for flashback new 0
physical reads prefetch warmup 0
physical reads retry corrupt 0
recovery array reads 0
session logical reads 14
session logical reads - IM 0
session logical reads in local numa group 0
session logical reads in remote numa group 0
transaction tables consistent reads - undo records applied 0
25 rows selected.
SYS@dbm2> select avg(pk_col) from kso.skewimpq where col1 > 0;
AVG(PK_COL)
-----------
16228570.2
SYS@dbm2> @mystats
Enter value for name: reads
NAME VALUE
---------------------------------------------------------------------- ---------------
SecureFiles DBFS Link streaming reads 0
cold recycle reads 0
data blocks consistent reads - undo records applied 0
gc cluster flash cache reads failure 0
gc cluster flash cache reads received 0
gc cluster flash cache reads served 0
gc flash cache reads served 0
lob reads 0
physical reads 450216
physical reads cache 450216
physical reads cache for securefile flashback block new 0
physical reads cache prefetch 446512
physical reads direct 0
physical reads direct (lob) 0
physical reads direct for securefile flashback block new 0
physical reads direct temporary tablespace 0
physical reads for flashback new 0
physical reads prefetch warmup 0
physical reads retry corrupt 0
recovery array reads 0
session logical reads 453226
session logical reads - IM 0
session logical reads in local numa group 0
session logical reads in remote numa group 0
transaction tables consistent reads - undo records applied 0
25 rows selected.
A close look at these statistics reveals that all the reads done have gone into the buffer cache (statistic: physical reads cache), while a "normal" full table scan on a large segment would have been done to the session's PGA via direct reads (statistic: physical reads direct). This is also visible via the wait interface: Multiblock reads into the cache show up as "db file scattered read" events on non-Exadata platforms and "cell multiblock physical read" events on Exadata, while reads into the PGA show up as "direct path read" events, which on Exadata can be offloaded as Smart Scans, visible via the "cell smart table scan" or "cell smart index scan" events. This scan essentially filled the buffer cache. Let's execute the same query again and see if we can use the data read into the buffer cache:
SYS@dbm2> select avg(pk_col) from kso.skewimpq where col1 > 0;
AVG(PK_COL)
-----------
16228570.2
SYS@dbm2> @mystats
Enter value for name: reads
NAME VALUE
---------------------------------------------------------------------- ---------------
SecureFiles DBFS Link streaming reads 0
cold recycle reads 0
data blocks consistent reads - undo records applied 0
gc cluster flash cache reads failure 0
gc cluster flash cache reads received 0
gc cluster flash cache reads served 0
gc flash cache reads served 0
lob reads 0
physical reads 528923
physical reads cache 528923
physical reads cache for securefile flashback block new 0
physical reads cache prefetch 518528
physical reads direct 0
physical reads direct (lob) 0
physical reads direct for securefile flashback block new 0
physical reads direct temporary tablespace 0
physical reads for flashback new 0
physical reads prefetch warmup 0
physical reads retry corrupt 0
recovery array reads 0
session logical reads 906435
session logical reads - IM 0
session logical reads in local numa group 0
session logical reads in remote numa group 0
transaction tables consistent reads - undo records applied 0
25 rows selected.
The second execution of the scan of the SKEWIMPQ table doubled the session logical reads statistic (906435 is roughly two times the 453226 logical reads seen after the first run), but the physical reads statistic increased by only 78707 (528923 - 450216), indicating that all the other logical reads were satisfied by the cache.
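The arithmetic behind that conclusion can be spelled out, using the numbers from the two statistics listings:

```python
# Cumulative session statistics taken from the listings above
first_run  = {"physical_reads": 450216, "logical_reads": 453226}
second_run = {"physical_reads": 528923, "logical_reads": 906435}

# Deltas attributable to the second execution of the scan
extra_physical = second_run["physical_reads"] - first_run["physical_reads"]
extra_logical  = second_run["logical_reads"]  - first_run["logical_reads"]

print(extra_physical)                   # 78707 blocks read from disk again
print(extra_logical - extra_physical)   # 374502 reads satisfied from the cache
```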
If DBMS_XPLAN.DISPLAY_CURSOR is used to display the execution plan, Oracle reveals whether In-Memory Parallel Query has been used:
SYS@dbm2> @dplan
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------
SQL_ID 31q77xaa06ggz, child number 2
-------------------------------------
select avg(pk_col) from kso.skewimpq where col1 > 0
Plan hash value: 3471853810
------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 17043 (100)| | | | |
| 1 | SORT AGGREGATE | | 1 | 11 | | | | | |
| 2 | PX COORDINATOR | | | | | | | | |
| 3 | PX SEND QC (RANDOM) | :TQ10000 | 1 | 11 | | | Q1,00 | P->S | QC (RAND) |
| 4 | SORT AGGREGATE | | 1 | 11 | | | Q1,00 | PCWP | |
| 5 | PX BLOCK ITERATOR | | 95M| 1002M| 17043 (1)| 00:00:01 | Q1,00 | PCWC | |
|* 6 | TABLE ACCESS STORAGE FULL| SKEWIMPQ | 95M| 1002M| 17043 (1)| 00:00:01 | Q1,00 | PCWP | |
------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
6 - storage(:Z>=:Z AND :Z<=:Z AND "COL1">0)
filter("COL1">0)
Note
-----
- dynamic statistics used: dynamic sampling (level=AUTO)
- automatic DOP: Computed Degree of Parallelism is 8 because of degree limit
- parallel scans affinitized for buffer cache
SYS@dbm2> @fsx2
Enter value for sql_text:
Enter value for sql_id: 31q77xaa06ggz
SQL_ID AVG_ETIME PX OFFLOAD SQL_TEXT
------------- ---------- ------ ------- --------------------------------------
31q77xaa06ggz 14.37 8 No select avg(pk_col) from kso.skewimpq
where col1 > 0
1 rows selected.
DBMS_XPLAN.DISPLAY_CURSOR shows the statement has been executed in parallel, as can be seen by the rowsources that start with “PX.” The most interesting thing here is in the Note section of the DISPLAY_CURSOR output: The line “parallel scans affinitized for buffer cache” clearly indicates the scan used the buffer cache, which means In-Memory Parallel Query has been used.
Indirect evidence of In-Memory Parallel Query can be found using the fsx2.sql script. When the SQL_ID of our query is entered, we can see the statement was executed with eight parallel query servers (the PX column) and that no Smart Scans were used (the OFFLOAD column), which would have happened with normal (non-In-Memory PX) execution of parallel query scans. Note also that the fsx2.sql script reports an estimated value for AVG_ETIME, which is much more than the actual time the execution took. This occurs because v$sql reports elapsed time as the sum of all the elapsed times of the slave processes. Dividing this number by the number of slaves gives an estimate, but it will not be totally accurate, as slaves can vary greatly in their elapsed times. Let's now compare our In-Memory Parallel Execution to how our system behaves when In-Memory Parallel Execution is not in play. We can disable this feature in a couple of ways. The documented way is to set the PARALLEL_DEGREE_POLICY parameter to MANUAL. However, this also disables Auto DOP and Parallel Statement Queueing. The other way is to set the hidden parameter _PARALLEL_CLUSTER_CACHE_POLICY to ADAPTIVE:
SYS@dbm2> alter system set "_parallel_cluster_cache_policy"=adaptive;
System altered.
SYS@dbm2> @mystats
Enter value for name: reads
NAME VALUE
---------------------------------------------------------------------- ---------------
SecureFiles DBFS Link streaming reads 0
cold recycle reads 0
data blocks consistent reads - undo records applied 0
gc cluster flash cache reads failure 0
gc cluster flash cache reads received 0
gc cluster flash cache reads served 0
gc flash cache reads served 0
lob reads 0
physical reads 0
physical reads cache 0
physical reads cache for securefile flashback block new 0
physical reads cache prefetch 0
physical reads direct 0
physical reads direct (lob) 0
physical reads direct for securefile flashback block new 0
physical reads direct temporary tablespace 0
physical reads for flashback new 0
physical reads prefetch warmup 0
physical reads retry corrupt 0
recovery array reads 0
session logical reads 17
session logical reads - IM 0
session logical reads in local numa group 0
session logical reads in remote numa group 0
transaction tables consistent reads - undo records applied 0
25 rows selected.
SYS@dbm2> select avg(pk_col) from kso.skewimpq where col1 > 0;
AVG(PK_COL)
-----------
16228570.2
SYS@dbm2> @mystats
Enter value for name: reads
NAME VALUE
---------------------------------------------------------------------- ---------------
SecureFiles DBFS Link streaming reads 0
cold recycle reads 0
data blocks consistent reads - undo records applied 0
gc cluster flash cache reads failure 0
gc cluster flash cache reads received 0
gc cluster flash cache reads served 0
gc flash cache reads served 0
lob reads 0
physical reads 450207
physical reads cache 0
physical reads cache for securefile flashback block new 0
physical reads cache prefetch 0
physical reads direct 450207
physical reads direct (lob) 0
physical reads direct for securefile flashback block new 0
physical reads direct temporary tablespace 0
physical reads for flashback new 0
physical reads prefetch warmup 0
physical reads retry corrupt 0
recovery array reads 0
session logical reads 450895
session logical reads - IM 0
session logical reads in local numa group 0
session logical reads in remote numa group 0
transaction tables consistent reads - undo records applied 0
25 rows selected.
SYS@dbm2> @fsx2
Enter value for sql_text: %avg(pk_col)%
Enter value for sql_id:
SQL_ID AVG_ETIME PX OFFLOAD SQL_TEXT
------------- ---------- --- ------- ----------------------------------------
31q77xaa06ggz 14.37 8 No select avg(pk_col) from kso.skewimpq
where col1 > 0
31q77xaa06ggz 5.92 8 Yes select avg(pk_col) from kso.skewimpq
where col1 > 0
2 rows selected.
Notice that with In-Memory Parallel Execution disabled, the statistics show that the numbers of physical reads and logical reads are roughly the same, just as in the previous example where the first run populated the buffer cache. However, instead of the physical reads cache statistic climbing to match the physical reads statistic, it is now physical reads direct that matches it. Notice also that the fsx2.sql script shows a new cursor in the shared pool that was executed with eight parallel slaves and offloaded to the storage tier. This is an important point: In-Memory Parallel Execution bypasses the optimizations that Exadata provides via Smart Scans. That should be obvious, since the feature eliminates most of the disk I/O on which Smart Scans operate, but it is the main reason we believe this feature will not be as useful on the Exadata platform as on other platforms. The reason the offloaded query shows a lower elapsed time (AVG_ETIME) is that a huge part of the processing is offloaded to the storage tier, which means the work of each parallel process is parallelized even further and only the results are sent to the database tier.
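As mentioned earlier, AVG_ETIME for a parallel statement is only an estimate derived from the aggregated slave times in V$SQL. The calculation can be sketched as follows (a simplified version of what scripts like fsx2.sql do; the division by PX_SERVERS_EXECUTIONS is an approximation, not an exact wall-clock time):

```sql
-- Approximate per-execution elapsed time for a parallel statement.
-- ELAPSED_TIME in V$SQL (microseconds) is summed across all PX slaves,
-- so dividing by the number of slave executions yields only an estimate.
select sql_id,
       child_number,
       px_servers_executions,
       round(elapsed_time / 1000000 /
             decode(nvl(px_servers_executions, 0),
                    0, nvl(executions, 1),
                    px_servers_executions), 2) avg_etime_secs
  from v$sql
 where sql_id = '31q77xaa06ggz';
```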
Troubleshooting Parallel Execution
In the days before Oracle version 11.2, getting a clear view of what was going on across the cluster and of the corresponding workload distribution among the parallel servers was quite a challenge when parallel execution issues arose. Even more challenging was answering questions such as "Why is my SQL only partially parallelized?"
The introduction of Real-Time SQL Monitoring fixed some of these instrumentation issues and allowed DBAs and developers to analyze parallel execution in greater detail by providing an interactive display of resource consumption across the parallel servers and SQL statistics for each step of the execution plan. In 11gR2 and 12c, instrumentation improved further with additional columns in the V$SQL_MONITOR view covering resource management and parallel allocation details. This section will not discuss in detail how to interpret a SQL Monitor report. Instead, it will focus on how to use the views that feed a SQL Monitor report, together with other views, for quick troubleshooting and for finding out when downgrades are happening and when In-Memory Parallel Execution is kicking in.
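For example, a quick check for statements that received fewer parallel servers than they requested can be run directly against GV$SQL_MONITOR (these columns are available in 11gR2 and later; RM_CONSUMER_GROUP is populated when Resource Manager is active):

```sql
-- Find monitored statements whose requested DOP was downgraded.
select inst_id,
       sql_id,
       sql_exec_id,
       px_servers_requested,
       px_servers_allocated,
       rm_consumer_group
  from gv$sql_monitor
 where px_servers_allocated < px_servers_requested;
```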
Using the following SQL, you can gather extra information, such as "px_in_memory", which indicates whether a query was affinitized to memory or, in plain terms, was doing In-Memory Parallel Query:
SQL> !cat other_xml.sql
select t.*
from v$sql_plan v,
xmltable(
'/other_xml/info'
passing xmltype(v.other_xml)
columns
info_type varchar2(30) path '@type',
info_value varchar2(30) path '/info'
) t
where v.sql_id = '&sql_id'
and v.child_number = &child_number
and other_xml is not null;
SQL> alter session set parallel_degree_policy=manual;
Session altered.
SQL> select avg(pk_col) from kso.skew2;
AVG(PK_COL)
-----------
62500.2406
SQL> @find_sql
Enter value for sql_text: %skew2%
Enter value for sql_id:
SQL_ID CHILD PLAN_HASH EXECS ETIME AVG_ETIME USERNAME SQL_TEXT
------------- ------ ---------- ----- ------- --------- -------- ---------------------
atb3q75xavzb6 0 4220890033 1 10.13 10.13 SYS select avg(pk_col)
from kso.skew2
SQL> @other_xml
Enter value for sql_id: atb3q75xavzb6
Enter value for child_number: 0
INFO_TYPE INFO_VALUE
------------------------------ ---------------
db_version 12.1.0.2
parse_schema "SYS"
plan_hash_full 1438813450
plan_hash 4220890033
plan_hash_2 1438813450
In this case, the other_xml data does not reveal much; there are no big surprises. However, let's set PARALLEL_DEGREE_POLICY to AUTO and execute the query again:
SQL> alter session set parallel_degree_policy=auto;
Session altered.
SQL> select avg(pk_col) from kso.skew2;
AVG(PK_COL)
-----------
62500.2406
SQL> @find_sql
Enter value for sql_text: %skew2%
Enter value for sql_id:
SQL_ID CHILD PLAN_HASH EXECS ETIME AVG_ETIME USERNAME SQL_TEXT
------------- ------ ---------- ----- ------- --------- -------- --------------------
atb3q75xavzb6 0 4220890033 1 10.13 10.13 SYS select avg(pk_col)
from kso.skew2
atb3q75xavzb6 2 2117817910 1 113.39 113.39 SYS select avg(pk_col)
from kso.skew2
SQL> @other_xml
Enter value for sql_id: atb3q75xavzb6
Enter value for child_number: 2
INFO_TYPE INFO_VALUE
------------------------------ -----------
derived_cpu_dop 15
derived_io_dop 3
dop 15
px_in_memory_imc no
px_in_memory yes
io_rate 200
derived_io_dop 3
cpu_rate 1000
derived_cpu_dop 16
cpu_rate 1000
derived_cpu_dop 16
cpu_rate 1000
derived_cpu_dop 16
db_version 12.1.0.2
parse_schema "SYS"
plan_hash_full 3329629242
plan_hash 2117817910
plan_hash_2 3329629242
This child cursor shows a lot more information related to parallel query. In fact, we can now see that In-Memory Parallel Query was happening, because "px_in_memory" is "yes." We can also see that the degree of parallelism was 15 ("dop" is "15"). When not using Auto DOP, other_xml shows additional information related to parallel execution:
INFO_TYPE INFO_VALUE
------------------------------ ---------------
derived_cpu_dop 0
derived_io_dop 0
dop_reason table property
dop 8
px_in_memory_imc no
px_in_memory no
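Along the same lines, statements currently held back by Parallel Statement Queueing show up in GV$SQL_MONITOR with a STATUS of QUEUED, which gives a quick way to see whether queueing is kicking in (a sketch; sessions waiting in the queue also show the wait event "resmgr:pq queued" in GV$SESSION):

```sql
-- Statements currently waiting in the parallel statement queue.
select inst_id,
       sql_id,
       sql_exec_id,
       to_char(sql_exec_start, 'HH24:MI:SS') exec_start
  from gv$sql_monitor
 where status = 'QUEUED';
```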
Summary
Parallel execution of statements is important for maximizing throughput on the Exadata platform. Oracle Database releases 11gR2 and 12c include several new features and enhancements that make parallel execution more controllable, which is especially important when using the platform with mixed workloads. The Auto DOP feature is designed to make intelligent decisions about DOP automatically, on a statement-by-statement basis. In-Memory Parallel Execution may not be as useful on Exadata as it is on non-Exadata platforms because it disables the optimizations that come along with Smart Scans. Parallel Statement Queueing is a very useful feature, as it allows throughput-oriented work to co-exist with response-time-sensitive work. Integration with the Database Resource Manager (DBRM) further enhances this feature by providing a great deal of additional control and prioritization over the queueing mechanism.