Exadata Parallel Operations
Exadata doesn’t have a special way of executing parallel operations that is unavailable on other platforms running 11gR2 and 12c. However, parallel processing is a key component of Exadata because efficient handling of data warehouse workloads was a primary design goal for Exadata. In addition, because offloading/Smart Scan depends on direct path reads, which can be used by parallel query slaves, parallel operations take on a whole new importance. Traditionally, the efficient use of parallel query has required careful control of concurrency in order to maximize the use of available resources without overwhelming the system. Oracle’s previous attempts at throttling parallel operations to allow them to be used in multiuser environments have not been entirely successful. Version 11gR2 provided some new capabilities for controlling parallel operations. In particular, a queueing mechanism was introduced that allows the number of concurrent parallel processes to be managed more effectively. This approach appears to be much better suited than previous attempts to allowing a high degree of parallelism without overwhelming the available resources. 11gR2 also introduced the ability for Oracle to automatically calculate a degree of parallelism on a statement-by-statement basis. Version 12c introduced a Parallel Statement Queueing enhancement and process memory usage control: a Database Resource Manager (DBRM) directive that allows critical queries to bypass the parallel statement queue, and a new database parameter that limits the total Program Global Area (PGA) size, the idea being to avoid excessive physical memory usage that leads to high rates of swapping.
In this chapter, we will focus on parallel query. Other forms of parallelism provided by the database, such as recovery parallelism, will not be covered here. We will briefly cover parallel query mechanics and demonstrate specifically how the queueing mechanism and automatic degree of parallelism work with Exadata. We will also briefly cover the 11gR2 In-Memory Parallel Execution feature and discuss how Exadata storage parallelizes I/O operations. We will not cover all the details of parallel query or parallel DML with the assumption that the reader is familiar with basic Oracle parallel concepts. Note also that the discussion and examples will refer to behavior in Oracle Database 11g Release 2 (11.2.0.4) and 12c Release 1 (12.1.0.2.0). In most cases, the comments apply equally to version 11.2.0.1 as well. Exceptions are explicitly called out.
Parameters
Before describing the various new features and how they apply to Exadata, you should review the parameters that affect how parallel queries are handled by the database. Table 6-1 shows the parameters along with a brief description of each one’s purpose.
Table 6-1. Database Parameters Controlling 11gR2 and 12c Parallel Features
Parameter |
Default |
Description |
---|---|---|
parallel_max_servers |
The maximum number of parallel slave processes that may be created on an instance. The default value is calculated as parallel_threads_per_cpu * cpu_count * concurrent_parallel_users * 5. The concurrent_parallel_users value is determined as follows: - If SGA_TARGET is set: 4. - If PGA_AGGREGATE_TARGET is set: 2. - Otherwise: 1. | |
parallel_servers_target |
The upper limit on the number of parallel slaves that may be in use on an instance at any given time if parallel queueing is enabled. The default is calculated by parallel_threads_per_cpu * cpu_count * concurrent_parallel_users * 2. The concurrent_parallel_users value is determined by: - If SGA_TARGET is set: 4. - If PGA_AGGREGATE_TARGET is set: 2. - Else 1. | |
parallel_min_servers |
0 |
The minimum number of parallel slave processes that should be kept running, regardless of usage. Usually set to eliminate the overhead of creating and destroying parallel processes. |
parallel_threads_per_cpu |
2 |
Used in various parallel calculations to represent the number of concurrent processes that a CPU can support. Note that 2 is the Oracle default value; the Oracle Exadata recommendation is 1.
parallel_degree_policy |
MANUAL |
Controls several parallel features, including automatic degree of parallelism (Auto DOP), Parallel Statement Queueing, and In-Memory Parallel Execution. |
parallel_execution_message_size |
16384 |
The size of parallel message buffers in bytes. |
parallel_degree_level |
100 |
New in 12c. The scaling factor for default DOP calculations. 100 represents 100%, so setting it to 50 will reduce the calculated DOP to half. |
parallel_force_local |
FALSE |
Determines whether parallel query slaves will be forced to execute only on the node that initiated the query (TRUE), or whether they will be allowed to spread on to multiple nodes in a RAC cluster (FALSE). |
parallel_instance_group |
Used to restrict parallel slaves to certain instances in a RAC cluster. This is done by specifying a service name that is configured to a limited number of instances. | |
parallel_io_cap_enabled |
FALSE |
This parameter is deprecated, replaced by parallel_degree_limit when it has been set to IO. Used in conjunction with the DBMS_RESOURCE_MANAGER.CALIBRATE_IO function to limit default DOP calculations based on the I/O capabilities of the system. |
parallel_degree_limit |
CPU |
With automatic degree of parallelism, limits the maximum degree of parallelism a statement can use. When set to CPU, the maximum is parallel_threads_per_cpu * cpu_count * instances. When set to IO, the maximum is derived from the I/O calibration results (total system throughput divided by per-process I/O bandwidth). When set to a number, that number is the maximum degree of parallelism for a statement.
pga_aggregate_limit |
New in 12c. This parameter limits the total PGA memory usage of the instance. See Chapter 7 (Resource Management). | |
parallel_adaptive_multi_user |
TRUE |
Old mechanism of automatic parallel query usage tuning. This mechanism works by reducing the requested DOP depending on system load at query startup time. |
parallel_min_time_threshold |
AUTO |
The minimum estimated serial execution time that will trigger Auto DOP. The default is AUTO, which translates to 10 seconds. When the PARALLEL_DEGREE_POLICY parameter is set to AUTO, ADAPTIVE, or LIMITED, parallelism is considered after the set amount of seconds. If all tables referenced use the In-Memory Column Store, this parameter defaults to 1. |
parallel_server |
FALSE |
This parameter has nothing to do with parallel queries. Set to TRUE or FALSE depending on whether the database is RAC enabled or not. This parameter was deprecated long ago and has been replaced by the CLUSTER_DATABASE parameter. |
parallel_server_instances |
1 |
This parameter has nothing to do with parallel queries, either. This parameter is set to the number of instances in a RAC cluster. |
parallel_automatic_tuning |
FALSE |
Deprecated since 10g. This parameter enabled an automatic DOP calculation on objects for which a parallelism attribute is set. |
parallel_min_percent |
0 |
Old throttling mechanism. When Parallel Statement Queueing is not enabled (when PARALLEL_DEGREE_POLICY is set to MANUAL or LIMITED), this represents the minimum percentage of parallel servers that are needed for a parallel statement to execute. |
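To see how these parameters are set on your own system, you can query V$PARAMETER directly. The following is a simple sketch; later in this chapter, we use a similar helper script (parms.sql) to the same effect:

SQL> select name, value, isdefault
  2  from v$parameter
  3  where name like 'parallel%'
  4  order by name;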
The parameters shown in Table 6-2 control distinct features which are enabled when automatic degree of parallelism is enabled (often referred to as “auto DOP”). The parameters are underscore parameters, which means that they should not be used in production before consulting Oracle support and getting their blessing to use them.
Table 6-2. Selected Underscore Database Parameters Controlling Parallel Features
Parameter |
Default |
Description |
---|---|---|
_parallel_statement_queuing |
FALSE |
Related to automatic DOP. If set to TRUE, this enables Parallel Statement Queueing. |
_parallel_cluster_cache_policy |
ADAPTIVE |
Related to automatic DOP. If set to CACHED, this enables In-Memory Parallel Execution. |
_parallel_cluster_cache_pct |
80 |
Percentage of the total buffer cache size in the cluster to be used for In-Memory Parallel Execution. By default, segments larger than 80% of the total buffer cache size are not considered a candidate for In-Memory Parallel Execution. |
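Hidden parameters are not visible in V$PARAMETER. To check their current values, a query against the underlying X$ tables can be used; the following is a sketch, which must be run as SYS:

SQL> select i.ksppinm name, v.ksppstvl value
  2  from x$ksppi i, x$ksppcv v
  3  where i.indx = v.indx
  4  and i.ksppinm in ('_parallel_statement_queuing', '_parallel_cluster_cache_policy');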
Parallelization at the Storage Tier
Exadata has a lot of processing power at the storage layer. The fixed eighth, quarter, half, and full rack configurations up to the X4 generation have more CPU resources available at the storage layer than at the compute layer. With the X5 generation, the storage server went from two six-core CPUs (X4) to two eight-core CPUs (X5), while the compute layer for the two-socket servers went from two 12-core CPUs (X4) to two 18-core CPUs (X5), which means the number of cores is higher on the compute layer with the X5 generation two-socket servers. Needless to say, the new “elastic configurations,” which allow almost any combination of servers to be created, abandon the concept of a balance between the compute and storage layers altogether and should only be considered as a last resort and with great care.
Since Smart Scans offload a lot of processing to the storage cells, every query involving Smart Scans is effectively parallelized across the CPUs on the storage cells. This type of parallelization is completely independent from the traditional database parallel processing. The Smart Scan parallelization occurs even when the activity is driven by a single process on a single database server. This introduces some interesting issues that should be considered with regard to normal parallelization at the database tier. Since one of the primary jobs of a parallelized query is to allow multiple processes to participate in the I/O operations and since the I/O operations are already spread across multiple processes, the degree of parallelism required by statements running on the Exadata platform should be smaller than on other platforms.
Auto DOP
One of the major changes to parallel operations in Oracle Database 11g Release 2 was the addition of a feature affectionately known as Auto DOP (automatic degree of parallelism). It was designed to overcome the problems associated with the fact that there is rarely a single DOP value that is appropriate for all queries touching a particular object. Prior to 11gR2, the DOP could be specified at the statement level via hints or at the object level via the DEGREE and INSTANCE settings. Realistically, using hints at the statement level makes more sense in most situations for the reason just mentioned. However, it requires that the developers understand the platform that the statements will be running on and the workload that the hardware will be supporting at the time of execution, as well as the concurrency of other processes requiring resources. Getting the settings correct can be a tedious trial-and-error process and, unfortunately, the DOP cannot be changed while a statement is running. Once it starts, your only options are to let it complete or kill it, change the DOP settings, and try again. This makes fine-tuning in a “live” environment a painful process.
Operation and Configuration
When Auto DOP is enabled, Oracle evaluates each statement to determine whether it should be run in parallel and, if so, what DOP should be used. Basically, any statement that the optimizer concludes will take longer than 10 seconds to run serially will be a candidate to run in parallel. The 10-second threshold can be controlled by setting the PARALLEL_MIN_TIME_THRESHOLD parameter. This decision is made regardless of whether any of the objects involved in the statement have been decorated with a parallel degree setting or not.
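For example, to lower the threshold so that statements estimated to run longer than five seconds become candidates for parallel execution (the value 5 is hypothetical, for illustration only):

SQL> alter system set parallel_min_time_threshold = 5;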
Auto DOP is enabled by setting the PARALLEL_DEGREE_POLICY parameter to a value of AUTO, LIMITED, or ADAPTIVE (12c). The default setting for this parameter is MANUAL, which disables all three of the new 11gR2 parallel features (Auto DOP, Parallel Statement Queueing, In-Memory Parallel Execution). Unfortunately, PARALLEL_DEGREE_POLICY is one of those parameters that control more than one thing. The following list shows the effects of the various settings for this parameter:
- MANUAL: Disables all three features; parallel execution behaves as in earlier releases.
- LIMITED: Enables Auto DOP only for statements accessing objects decorated with a parallel degree of DEFAULT; Parallel Statement Queueing and In-Memory Parallel Execution remain disabled.
- AUTO: Enables all three features for all statements, regardless of object-level parallel settings.
- ADAPTIVE (12c): Enables the same features as AUTO and additionally allows performance feedback, which can adjust the computed DOP for subsequent executions of a statement.
Although the only documented way to enable Parallel Statement Queueing and In-Memory Parallel Execution is via the all-or-nothing setting of AUTO or ADAPTIVE, the developers have thoughtfully provided hidden parameters that provide independent control of these features. Table 6-3 shows the parameters and how the settings of PARALLEL_DEGREE_POLICY alter the hidden parameters.
Table 6-3. Hidden Parameters Affected by PARALLEL_DEGREE_POLICY
Parallel_Degree_Policy |
Parameter |
Value |
---|---|---|
MANUAL |
_parallel_statement_queuing |
FALSE |
_parallel_cluster_cache_policy |
ADAPTIVE | |
LIMITED |
_parallel_statement_queuing |
FALSE |
_parallel_cluster_cache_policy |
ADAPTIVE | |
AUTO |
_parallel_statement_queuing |
TRUE |
_parallel_cluster_cache_policy |
CACHED | |
ADAPTIVE (12c) |
_parallel_statement_queuing |
TRUE |
_parallel_cluster_cache_policy |
CACHED |
It’s pretty obvious what the _PARALLEL_STATEMENT_QUEUING parameter controls: When it is set to TRUE, queueing is enabled. The purpose of the _PARALLEL_CLUSTER_CACHE_POLICY parameter is a little less obvious. It turns out that it controls In-Memory Parallel Execution; setting it to CACHED enables the feature. You should note that In-Memory Parallel Execution is arguably of less value on the Exadata platform, because the Smart Scan optimizations are not available when parallel processes scan data in the buffer cache. We will discuss that in more detail a little later. In the meantime, here is an example showing Auto DOP in action:
SQL> select owner, table_name, status, last_analyzed, num_rows, blocks, degree
2 from dba_tables where owner = 'MARTIN' and table_name = 'BIGT';
OWNER TABLE_NAME STATUS LAST_ANAL NUM_ROWS BLOCKS DEGREE
------ -------------- -------- --------- --------- ---------- --------
MARTIN BIGT VALID 24-MAR-15 100000000 16683456 1
SQL> alter system set parallel_degree_policy=auto;
System altered.
SQL> select /* frits1 */ avg(id) from martin.bigt;
AVG(ID)
-----------
500000005
SQL> @find_sql
Enter value for sql_text: %frits1%
Enter value for sql_id:
SQL_ID CHILD PLAN_HASH EXECS ETIME AVG_ETIME USERNAME SQL_TEXT
------------- ----- ---------- ----- -------- --------- -------- ---------------------------
djwmfmzgtjfqu 1 3043090422 1 1087.45 1087.45 SYS select /* frits1 */ avg(id)
from martin.bigt
SQL> !cat dplan.sql
set verify off
set pages 9999
set lines 150
select * from table(dbms_xplan.display_cursor('&sql_id','&child_no',''))
/
SQL> @dplan
Enter value for sql_id: djwmfmzgtjfqu
Enter value for child_no:
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------
SQL_ID djwmfmzgtjfqu, child number 1
-------------------------------------
select /* frits1 */ avg(id) from martin.bigt
Plan hash value: 3043090422
------------------------------------------------------------------------------------------------------
|Id|Operation | Name |Rows|Bytes|Cost(%CPU)|Time | TQ |IN-OUT|PQ Distrib |
------------------------------------------------------------------------------------------------------
| 0|SELECT STATEMENT | | | | 104K(100)| | | | |
| 1| SORT AGGREGATE | | 1 | 7 | | | | | |
| 2| PX COORDINATOR | | | | | | | | |
| 3| PX SEND QC (RANDOM) |:TQ10000| 1 | 7 | | |Q1,00| P->S |QC (RAND) |
| 4| SORT AGGREGATE | | 1 | 7 | | |Q1,00| PCWP | |
| 5| PX BLOCK ITERATOR | |100M| 667M| 104K (1)|00:00:05 |Q1,00| PCWC | |
|*6| TABLE ACCESS STORAGE FULL| BIGT |100M| 667M| 104K (1)|00:00:05 |Q1,00| PCWP | |
------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
6 - storage(:Z>=:Z AND :Z<=:Z)
Note
-----
- automatic DOP: Computed Degree of Parallelism is 48 because of degree limit
As you can see, enabling Auto DOP allowed the statement to be parallelized, even though the table was not decorated with a parallel setting (the degree property of the table was set to 1, not a higher value or DEFAULT). Also, notice that the plan output produced by DBMS_XPLAN shows that Auto DOP was enabled, that the calculated DOP was 48, and that the value 48 was capped by a “degree limit.” This limit is set by the parameter PARALLEL_DEGREE_LIMIT. In this case, PARALLEL_DEGREE_LIMIT is set to CPU (the default), which means the limit is PARALLEL_THREADS_PER_CPU * CPU_COUNT * instances, which in this example is 1 * 24 * 2 = 48.
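To verify the inputs to this calculation on your own system, a quick query against V$PARAMETER will do (a sketch; the values will, of course, vary per system):

SQL> select name, value from v$parameter
  2  where name in ('parallel_threads_per_cpu', 'cpu_count', 'parallel_degree_limit');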
I/O Calibration
Oracle Database version 11.2.0.2 introduced a restriction to Auto DOP requiring that the I/O system be calibrated before statements will be automatically parallelized. This restriction is lifted with version 12c. The calibration is done by the DBMS_RESOURCE_MANAGER.CALIBRATE_IO procedure, which generates a random read-only workload and spreads it across all instances in a RAC cluster. The procedure can put a significant load on the system. The documentation recommends running it when the system is idle or very lightly loaded. Here is an example of what happens if the calibration procedure has not been run and PARALLEL_DEGREE_POLICY has been set to limited, auto, or adaptive on 11.2.0.4:
SQL> @dplan
Enter value for sql_id: 05cq2hb1r37tr
Enter value for child_no:
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------
SQL_ID 05cq2hb1r37tr, child number 0
-------------------------------------
select avg(pk_col) from kso.skew a where col1 > 0
Plan hash value: 568322376
---------------------------------------------------------------------------
| Id | Operation | Name | Rows| Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 44298 (100)| |
| 1 | SORT AGGREGATE | | 1 | 11 | | |
|* 2 | TABLE ACCESS FULL| SKEW | 32M| 335M| 44298 (1)| 00:01:29 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("COL1">0)
Note
-----
- automatic DOP: skipped because of IO calibrate statistics are missing
23 rows selected.
As you can see, when the I/O calibration has not been done, Auto DOP is disabled and the optimizer generates a serial execution plan. On version 12.1.0.1 and above, the optimizer performs the automatic DOP calculation even when I/O calibration has not been done. There are two views that provide additional information about the calibration: The V$IO_CALIBRATION_STATUS view shows whether the calibration has been done, and the DBA_RSRC_IO_CALIBRATE view shows the results of the procedure. Here is an example showing how to view the calibration status, count the number of disks in the DATA disk group, and use the CALIBRATE_IO procedure:
SQL> select * from V$IO_CALIBRATION_STATUS;
STATUS CALIBRATION_TIME
------------- -------------------------------------------------------------
NOT AVAILABLE
If you are unsure about the number of disks, you can query the number of cell disks with the DATA prefix as seen by the database. If your DATA disk group has a slightly different name, such as the rack name added to DATA, add that between DATA and CD:
SQL> select count(*) from v$asm_disk where path like '%DATA_CD%';
COUNT(*)
----------
36
SQL> !cat calibrate_io.sql
SET SERVEROUTPUT ON
DECLARE
lat INTEGER;
iops INTEGER;
mbps INTEGER;
BEGIN
-- DBMS_RESOURCE_MANAGER.CALIBRATE_IO (<DISKS>, <MAX_LATENCY>, iops, mbps, lat);
DBMS_RESOURCE_MANAGER.CALIBRATE_IO (&no_of_disks, 10, iops, mbps, lat);
DBMS_OUTPUT.PUT_LINE ('max_iops = ' || iops);
DBMS_OUTPUT.PUT_LINE ('latency = ' || lat);
dbms_output.put_line('max_mbps = ' || mbps);
end;
/
SQL> @calibrate_io
Enter value for no_of_disks: 36
The database procedure will run for some time, around 15 minutes on our system. The procedure causes the database to spawn CSnn slave processes on all instances; the number of these processes is determined by the number-of-disks argument. When the procedure finishes, it displays the results of the calibration. For example:
max_iops = 11237
latency = 8
max_mbps = 5511
PL/SQL procedure successfully completed.
SQL> select * from V$IO_CALIBRATION_STATUS;
STATUS CALIBRATION_TIME CON_ID
------------- ------------------------------------------------------------------- ----------
READY 24-MAY-15 07.17.45.086 AM 0
SQL> select start_time, MAX_IOPS, MAX_MBPS, MAX_PMBPS, LATENCY, NUM_PHYSICAL_DISKS
from DBA_RSRC_IO_CALIBRATE;
START_TIME MAX_IOPS MAX_MBPS MAX_PMBPS LATENCY NUM_PHYSICAL_DISKS
------------------------------ ---------- ---------- ---------- ---------- ------------------
24-MAY-15 07.11.38.691358 AM 11237 5511 400 8 36
The changes require a database restart to take effect. Once restarted, Auto DOP uses the calibration details. In certain cases, it might not be feasible to run the calibration, or a bug (in 11.2.0.2 before bundle patch 4) may cause the calibration to generate numbers too high for Auto DOP to use. In these cases, the important calibration details can be set manually. This is described in My Oracle Support (MOS) note 1269321.1. Here is how it is done:
SQL> delete from resource_io_calibrate$;
SQL> insert into resource_io_calibrate$ values(current_timestamp,
2 current_timestamp, 0, 0, 200, 0, 0);
SQL> commit;
A database restart is required after setting the value. The value of 200 is recommended by Oracle Support for customers running on Exadata, and it is what Oracle uses to test Auto DOP on Exadata.
Auto DOP Wrap-Up
The end result of setting PARALLEL_DEGREE_POLICY to AUTO is that all kinds of statements will be run in parallel, even if no objects have been specifically decorated with a parallel degree setting. This is truly automatic parallel processing because the database decides what to run in parallel and with how many slaves. On top of that, by default, the slaves may be spread across multiple nodes in a RAC database. Unfortunately, this combination of features is a little like the Wild West with things running in parallel all over the place. However, the ability to queue parallel statements does provide some semblance of order, which leads us to the next topic.
We found Auto DOP to be too optimistic about the calculated DOP in many cases. A parameter worth pointing out is PARALLEL_DEGREE_LEVEL, which can be used to dial down the calculated DOP by setting it to a value lower than its default of 100. Other ways of limiting (Auto) DOP are PARALLEL_DEGREE_LIMIT and the Database Resource Manager.
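For example, to cut the computed DOP in half across the board (a sketch; the value 50 is for illustration, pick a scaling factor appropriate for your workload):

SQL> alter system set parallel_degree_level = 50;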
When Oracle first introduced the Parallel Query feature with Oracle version 7, Larry Ellison did a demo using a multiprocessor server, on which he was the only user. The individual CPU utilization was shown graphically, and all the CPUs where running full force when the Parallel Query feature was demoed. We wonder what would have happened if there were other database users in that database and what their experience would be during that demo. Probably their experience would not have been a good one. That is exactly what Parallel Statement Queueing tries to solve.
Oracle’s parallel capabilities have been a great gift, but they have also been a curse, because controlling the beast in an environment where multiple users are trying to share the resources is difficult at best. There have been attempts to come up with a reasonable way of throttling big parallel statements, but to date we do not think those attempts have been overly successful.
One of the most promising aspects of Exadata is its potential to run mixed workloads (OLTP and DW) without crippling one or the other. In order to do that, Oracle needs some mechanism to separate the workloads and, just as importantly, to throttle the resource intensive parallel queries. Parallel Statement Queueing appears to be just such a tool. And when combined with the Resource Manager, it provides a pretty robust mechanism for throttling the workload to a level that the hardware can support.
The Old Way
Before we get to the new Parallel Queueing functionality, we should probably review how it was done in previous versions. The best tool we had at our disposal was Parallel Adaptive Multiuser, which provided the ability to automatically downgrade the degree of parallelism for a given statement based on the workload when a query executes. It was actually a powerful mechanism and it was the best approach we had prior to 11gR2. This feature is enabled by setting the PARALLEL_ADAPTIVE_MULTI_USER parameter to TRUE. This is still the default in 12c, by the way, so this is definitely a parameter that you may want to consider changing. The downside of this approach is that parallelized statements can have wildly varying degrees of parallelism, and thus execution times. As you can imagine, a statement that gets 32 slaves one time and then gets downgraded to serial execution the next time will probably not make the users very happy.
The argument for this type of approach is that queries are going to run slower if the system is busy, regardless of what you do, and that users expect it to run slower when the system is busy. The first part of that statement may be true, but I do not believe the second part is (at least in most cases). The bigger problem with the downgrade mechanism, though, is that the decision about how many slaves to use is based on a single point in time—the point when the parallel statement starts. Recall that once the degree of parallelism (DOP) is set for an execution plan, it cannot be changed. The statement will run to completion with the number of slaves it was assigned, even if additional resources become available while it is running.
Consider the statement that takes one minute to execute with 32 slaves, and suppose that same statement gets downgraded to serial due to a momentarily high load. Now say that a few seconds after it starts, the system load drops back to more normal levels. Unfortunately, the serialized statement will continue to run for nearly 30 minutes with its single process, even though on average the system is not busier than usual. This sort of erratic performance can wreak havoc for those using the system and for those supporting it.
The New Way
Now let’s compare Parallel Adaptive Multiuser (the old way) with the new mechanism introduced in 11gR2 that allows parallel statements to be queued. This mechanism separates long-running parallel queries from the rest of the workload. The mechanics are pretty simple: Turn the feature on, set a target number of parallel slaves using the PARALLEL_SERVERS_TARGET parameter, and run what should be a resource-intensive query. If a statement would push the number of active parallel slaves above the target, it is queued until the required number of slaves becomes available. There are, of course, many details to consider and other control mechanisms that can be applied to manage the process. Let’s look at how it behaves:
SQL> alter system set parallel_degree_policy=auto;
System altered.
SQL> alter system set parallel_servers_target=10;
System altered.
SQL> @parms
Enter value for parameter: parallel%
Enter value for isset:
Enter value for show_hidden:
NAME VALUE ISDEFAUL
-------------------------------------------------- --------- --------
parallel_adaptive_multi_user TRUE FALSE
parallel_automatic_tuning FALSE TRUE
parallel_degree_level 100 TRUE
parallel_degree_limit CPU TRUE
parallel_degree_policy AUTO FALSE
parallel_execution_message_size 16384 FALSE
parallel_force_local FALSE TRUE
parallel_instance_group TRUE
parallel_io_cap_enabled FALSE TRUE
parallel_max_servers 240 FALSE
parallel_min_percent 0 TRUE
parallel_min_servers 0 FALSE
parallel_min_time_threshold AUTO TRUE
parallel_server TRUE TRUE
parallel_server_instances 2 TRUE
parallel_servers_target 10 TRUE
parallel_threads_per_cpu 1 FALSE
In order to execute multiple SQL statements that use automatic DOP and make them queue, we need a little script, which is shown below. The script launches 10 sqlplus processes, which are put into the background using “&.” The “wait” command at the end waits for all backgrounded processes to finish.
T=0
while [ $T -lt 10 ]; do
echo "select avg(id) from t1;" | sqlplus -S ts/ts &
let T=$T+1
done
wait
echo "Finished."
Now the script above is executed. If we look at the DBMS_XPLAN output, we see that Auto DOP calculated 10 slaves for the execution because PARALLEL_SERVERS_TARGET was set to that value above:
SQL> @dplan b46903fft8uz4
Enter value for sql_id: b46903fft8uz4
Enter value for child_no:
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------
SQL_ID b46903fft8uz4, child number 2
-------------------------------------
select avg(id) from t1
Plan hash value: 3110199320
------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows| Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 64140 (100)| | | | |
| 1 | SORT AGGREGATE | | 1 | 6 | | | | | |
| 2 | PX COORDINATOR | | | | | | | | |
| 3 | PX SEND QC (RANDOM) | :TQ10000 | 1 | 6 | | | Q1,00 | P->S | QC (RAND) |
| 4 | SORT AGGREGATE | | 1 | 6 | | | Q1,00 | PCWP | |
| 5 | PX BLOCK ITERATOR | | 100M| 572M| 64140 (1)| 00:00:02 | Q1,00 | PCWC | |
|* 6 | TABLE ACCESS STORAGE FULL| T1 | 100M| 572M| 64140 (1)| 00:00:02 | Q1,00 | PCWP | |
------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
6 - storage(:Z>=:Z AND :Z<=:Z)
Note
-----
- automatic DOP: Computed Degree of Parallelism is 10
Directly after executing the script that starts multiple sqlplus sessions to run the parallelized table scan, we can see statement queueing in effect. First, if we look at the number of parallel query processes active, we see that it does not exceed the limit of 10 that we set:
SQL> select * from v$px_process_sysstat where statistic like '%In Use%';
STATISTIC VALUE CON_ID
------------------------------ ---------- ----------
Servers In Use 10 0
In fact, the reason the limit is not exceeded is Parallel Statement Queueing, which can be seen in V$SQL_MONITOR and in the wait interface:
SQL> select sid, sql_id, sql_exec_id, sql_text from v$sql_monitor where status='QUEUED';
SID SQL_ID SQL_EXEC_ID SQL_TEXT
----- ------------- ----------- ------------------------------
588 b46903fft8uz4 16777250 select avg(id) from t1
396 b46903fft8uz4 16777254 select avg(id) from t1
331 b46903fft8uz4 16777251 select avg(id) from t1
265 b46903fft8uz4 16777253 select avg(id) from t1
463 b46903fft8uz4 16777255 select avg(id) from t1
654 b46903fft8uz4 16777256 select avg(id) from t1
263 b46903fft8uz4 16777252 select avg(id) from t1
200 b46903fft8uz4 16777249 select avg(id) from t1
166 b46903fft8uz4 16777249 select avg(id) from t1
9 rows selected.
SQL> @snapper ash=event+wait_class 1 1 all
Sampling SID all with interval 1 seconds, taking 1 snapshots...
-- Session Snapper v4.11 BETA - by Tanel Poder ( http://blog.tanelpoder.com ) - Enjoy the Most Advanced Oracle Troubleshooting Script on the Planet! :)
---------------------------------------------------------------
Active% | EVENT | WAIT_CLASS
---------------------------------------------------------------
800% | resmgr:pq queued | Scheduler
700% | cell smart table scan | User I/O
300% | ON CPU | ON CPU
-- End of ASH snap 1, end=2015-05-25 07:17:51, seconds=1, samples_taken=5
PL/SQL procedure successfully completed.
There are several things worth mentioning in this listing. To set up the desired conditions, we turned on Auto DOP, which also enables Parallel Statement Queueing, and then set the PARALLEL_SERVERS_TARGET parameter to a very low number (10) in order to trigger queueing more easily. We then used a shell script to execute 10 statements that are subject to Auto DOP. Looking at V$PX_PROCESS_SYSSTAT, there are no more than 10 parallel query servers active, as set with the PARALLEL_SERVERS_TARGET parameter. V$SQL_MONITOR showed that the statements were indeed queueing. This is an important point: All statements using parallel query will show up in the V$SQL_MONITOR view. Note that this view requires the Tuning Pack license. If statements have a status of QUEUED, they are not actually executing but are instead waiting until enough parallel slaves become available. Finally, we ran Tanel Poder's Snapper script to see what event the queued statements were waiting on. As you can see, it was the resmgr:pq queued wait event.
Note There is one other thing you should be aware of regarding the wait events: a wait event change that relates to parallel queueing. This example was created using Oracle Database version 12.1.0.2. If you are using 11.2.0.1, you will see a different set of wait events (there are two). The first is PX Queuing: statement queue. This is the event a statement waits on when it is next to run. The other is enq: JX - SQL statement queue. This is the event a statement waits on when there are other statements ahead of it in the queue. This scheme seems quite unwieldy, which is probably why it was changed in the later release.
There are several mechanisms in place for controlling how the Parallel Statement Queueing feature behaves. The basic approach is to use a first-in, first-out queueing mechanism. But there are ways to prioritize work within the queueing framework. It is also possible to completely bypass the queueing mechanism via a hint. And, conversely, it is possible to enable queueing for a statement via a hint even when the Parallel Statement Queueing feature is not enabled at the database level. There are also a few parameters that affect the queueing behavior. And, finally, the Resource Manager has the capability to affect how statements are queued.
Controlling Queueing with Parameters
There are a handful of parameters that affect how Parallel Statement Queueing behaves. The first two are PARALLEL_MAX_SERVERS, which sets the maximum number of parallel query slaves per instance, and PARALLEL_SERVERS_TARGET, which sets the number of parallel query slaves that may be in use before statements wanting to execute in parallel are queued.
The default value of PARALLEL_MAX_SERVERS is calculated by:
parallel_threads_per_cpu * cpu_count * concurrent_parallel_users * 5
The default value of PARALLEL_SERVERS_TARGET is calculated by:
parallel_threads_per_cpu * cpu_count * concurrent_parallel_users * 2
CONCURRENT_PARALLEL_USERS is determined as follows:
- If SGA_TARGET is set: 4
- If PGA_AGGREGATE_TARGET is set: 2
- Otherwise: 1
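The default calculations above can be sketched in a few lines. This is a simplified model with hypothetical helper names, and the parameter values in the example are illustrative, not taken from a real system:

```python
def concurrent_parallel_users(sga_target_set, pga_aggregate_target_set):
    # Per the rules above: SGA_TARGET takes precedence,
    # then PGA_AGGREGATE_TARGET, otherwise 1.
    if sga_target_set:
        return 4
    if pga_aggregate_target_set:
        return 2
    return 1

def default_parallel_limits(parallel_threads_per_cpu, cpu_count,
                            sga_target_set=True,
                            pga_aggregate_target_set=True):
    users = concurrent_parallel_users(sga_target_set, pga_aggregate_target_set)
    base = parallel_threads_per_cpu * cpu_count * users
    # Same base expression, multiplied by 5 and 2, respectively
    return {"parallel_max_servers": base * 5,
            "parallel_servers_target": base * 2}

# Example: 2 threads per CPU, 8 CPUs, SGA_TARGET set
print(default_parallel_limits(2, 8))
# {'parallel_max_servers': 320, 'parallel_servers_target': 128}
```

Note how the default target (128 here) is already a large share of the maximum (320), which is the point made next.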
This calculated value is almost certainly higher than you would want for most mixed workload systems, as it is geared at completely consuming the available CPU resources with parallel query processes. Allowing long-running parallel statements to consume the server completely means that response time-sensitive, OLTP-type statements could suffer. You should also note that it is possible to have more server processes active than the parameter allows. Since the number of slaves assigned to a query may be twice the DOP, the target can occasionally be exceeded.
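The queueing decision can be modeled as a simple admission check. This is a simplified sketch of the behavior described above, not Oracle's actual implementation; function and variable names are ours:

```python
def admit_or_queue(in_use, dop, servers_target, two_slave_sets=False):
    """Simplified FIFO admission model for Parallel Statement Queueing.

    The admission check is against the statement's DOP, but the actual
    allocation may be 2 x DOP (producer and consumer slave sets), which
    is why PARALLEL_SERVERS_TARGET can occasionally be exceeded.
    """
    allocated = dop * (2 if two_slave_sets else 1)
    if in_use + dop <= servers_target:
        return "ADMIT", in_use + allocated
    return "QUEUE", in_use

# target = 10: a DOP-5 statement with two slave sets is admitted...
state, in_use = admit_or_queue(in_use=4, dop=5, servers_target=10,
                               two_slave_sets=True)
print(state, in_use)   # ADMIT 14 -> target of 10 exceeded
# ...after which the next DOP-5 statement queues
print(admit_or_queue(in_use=14, dop=5, servers_target=10)[0])   # QUEUE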
In 12c, if the PARALLEL_STMT_CRITICAL resource management directive is used (discussed later in this chapter), it is recommended that the PARALLEL_SERVERS_TARGET be set to 50–75% of the PARALLEL_MAX_SERVERS so that critical queries that bypass the parallel queue can make use of the remaining parallel slave processes.
Note The PARALLEL_MAX_SERVERS parameter will be reduced to a value below the PROCESSES parameter if the calculated value of PARALLEL_MAX_SERVERS is greater than the value of PROCESSES. You will see this behavior noted in the alert log at instance startup:
Mon May 06 18:43:06 2013
Adjusting the default value of parameter parallel_max_servers
from 160 to 135 due to the value of parameter processes (150)
Starting ORACLE instance (normal)
Another parameter that deserves some discussion is the hidden parameter, _PARALLEL_STATEMENT_QUEUING, which turns the feature on and off. As already discussed in the Auto DOP section, this parameter is set to TRUE when the PARALLEL_DEGREE_POLICY parameter is set to AUTO. However, the hidden parameter can also be set manually to turn Parallel Queueing off and on independently.
Auto DOP calculations are still a little scary, so it is nice that there is a way to turn on the Parallel Queueing feature without enabling Oracle to take complete control of which statements run in parallel. Of course, since this involves setting a hidden parameter, you should not do this in a production environment without approval from Oracle support. Nevertheless, here is another quick example showing that queueing can be turned on without enabling Auto DOP or In-Memory Parallel Execution:
SQL > alter system set parallel_degree_policy=manual sid='*';
System altered.
SQL > alter table ts.t1 parallel (degree 8);
Table altered.
SQL> @parms
Enter value for parameter: parallel
Enter value for isset:
Enter value for show_hidden:
NAME VALUE
-------------------------------------------------- ------
parallel_adaptive_multi_user TRUE
parallel_automatic_tuning FALSE
parallel_degree_level 100
parallel_degree_limit CPU
parallel_degree_policy MANUAL
parallel_execution_message_size 16384
parallel_force_local TRUE
parallel_instance_group
parallel_io_cap_enabled FALSE
parallel_max_servers 128
parallel_min_percent 0
parallel_min_servers 32
parallel_min_time_threshold AUTO
parallel_server TRUE
parallel_server_instances 2
parallel_servers_target 10
parallel_threads_per_cpu 1
T=0
while [ $T -lt 10 ]; do
echo "select avg(id) from t1;" | sqlplus -S ts/ts &
let T=$T+1
done
wait
echo "Finished."
SQL> select * from v$px_process_sysstat where statistic like '%In Use%';
STATISTIC VALUE CON_ID
------------------------------ ----- ----------
Servers In Use 80 0
First, we set PARALLEL_DEGREE_POLICY to MANUAL. This disables automatic DOP, statement queueing, and In-Memory Parallel Query. Next, we decorated the T1 table with a degree of eight. Then we looked at the values of the parameters that have PARALLEL in the name. For the sake of the example, we set PARALLEL_FORCE_LOCAL to TRUE, so all the sqlplus sessions of the above script will use parallel query slaves on the current instance. Then we executed the script. As you can see, the 10 sessions each allocated their eight parallel query slaves, because we see 80 servers in use.
Now set the undocumented parameter _PARALLEL_STATEMENT_QUEUING to TRUE and re-run the script to create 10 sessions:
SQL> alter system set "_parallel_statement_queuing"=true sid='*';
System altered.
T=0
while [ $T -lt 10 ]; do
echo "select avg(id) from t1;" | sqlplus -S ts/ts &
let T=$T+1
done
wait
echo "Finished."
SQL> select * from v$px_process_sysstat where statistic like '%In Use%';
STATISTIC VALUE CON_ID
------------------------------ ----- ----------
Servers In Use 12 0
This output shows statement queueing enabled without turning on Auto DOP. Please note that this involved a hidden parameter, which means that you should discuss setting this parameter with Oracle Support if you want to apply this technique in production.
Controlling Statement Queueing with Hints
There are two hints that can be used to control Parallel Statement Queueing at the statement level. One hint, NO_STATEMENT_QUEUING, allows the queueing process to be completely bypassed, even if the feature is turned on at the instance level. The other hint, STATEMENT_QUEUING, turns on the queueing mechanism, even if the feature is not enabled at the instance level. The STATEMENT_QUEUING hint provides a documented avenue for using the queuing feature without enabling Auto DOP.
Controlling Queueing with Resource Manager
Oracle’s Database Resource Manager (DBRM) provides additional capability to control Parallel Statement Queueing. While a thorough discussion of DBRM is beyond the scope of this chapter, we will cover some specific features related to parallel query. Chapter 7 covers DBRM in more detail.
Without DBRM, the parallel statement queue behaves strictly as a first-in, first-out (FIFO) queue. DBRM provides several directive attributes that can be used to provide additional control on a consumer group basis. Many of these controls were introduced in version 11.2.0.2. Table 6-4 contains a list of additional capabilities provided by DBRM.
Table 6-4. DBRM Parallel Statement Queueing Controls
Control |
Description |
---|---|
Specify a Timeout |
The PARALLEL_QUEUE_TIMEOUT directive attribute can be used to set a maximum queue time for a consumer group. The time limit is set in seconds and, once it has expired, the statement will terminate with an error (ORA-07454). Note that this directive did not become available until version 11.2.0.2 of the database. |
Specify Maximum DOP |
The PARALLEL_DEGREE_LIMIT_P1 directive attribute sets a maximum number of parallel slaves that may be assigned to an individual statement. This is equivalent to the PARALLEL_DEGREE_LIMIT database parameter but is used to set limits for different sets of users based on consumer groups. |
Manage Order of Dequeueing |
The MGMT_P1, MGMT_P2, ... MGMT_P8 directive attributes can be used to alter the normal FIFO processing. Besides serving as resource percentage allocations for CPU and I/O (intra-database I/O Resource Management), these attributes allow prioritization of dequeueing. Each of the eight attributes essentially provides a distinct dequeueing priority level: All statements with an MGMT_P1 attribute will be dequeued prior to any statement with MGMT_P2. In addition to the dequeueing priority, a probability number can be assigned to regulate dequeueing of statements within the same level. |
Limit Percentage of Parallel Slaves |
The PARALLEL_TARGET_PERCENTAGE directive attribute can be used to limit a consumer group to a percentage of the parallel slaves available to the system. So, if a system allowed 64 slaves to be active before starting to queue and PARALLEL_TARGET_PERCENTAGE was set to 50, the consumer group would only be able to consume 32 slaves. Note that this directive did not become available until version 11.2.0.2 of the database. In 12c, the PARALLEL_SERVER_LIMIT attribute replaced this directive. |
Queue Multiple SQLs as a Set |
The BEGIN_SQL_BLOCK and END_SQL_BLOCK procedures in the DBMS_RESOURCE_MANAGER package work with Parallel Statement Queueing by treating individual statements as if they had been submitted at the same time. The idea is that all the statements in the block are dequeued together, preventing individual statements from being held back in the queue. The mechanism requires surrounding independent SQL statements with calls to the BEGIN and END procedures. Note that these procedures did not become available until version 11.2.0.2 of the database. |
Critical parallel statement prioritization |
With version 12c, the PARALLEL_STMT_CRITICAL directive was introduced, which makes parallel statements bypass the statement queue when the directive is set to BYPASS_QUEUE for a consumer group. All queries issued by users belonging to such a consumer group will bypass the parallel statement queue and execute immediately. With the bypass mechanism, the total number of parallel slaves requested may be greater than PARALLEL_SERVERS_TARGET. Please be aware that critical queries are run regardless of the number of parallel query processes actually in use, which means that if the number of parallel query slaves in use has already reached PARALLEL_MAX_SERVERS, it is possible to encounter downgrades or, even worse, serial execution. This is why you should keep some headroom between PARALLEL_SERVERS_TARGET and PARALLEL_MAX_SERVERS. |
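The MGMT_Pn dequeueing order described in Table 6-4 can be modeled with a priority queue. This is a simplified sketch that ignores the probability-based selection within a level; the statement names are hypothetical:

```python
import heapq

# Each queued statement: (MGMT_Pn level, arrival order, identifier).
# Lower level dequeues first; FIFO within a level.
queue = [(2, 1, "stmt_a"),   # MGMT_P2, arrived first
         (1, 2, "stmt_b"),   # MGMT_P1, arrived second
         (2, 3, "stmt_c")]   # MGMT_P2, arrived third
heapq.heapify(queue)

order = [heapq.heappop(queue)[2] for _ in range(3)]
print(order)   # ['stmt_b', 'stmt_a', 'stmt_c'] -- P1 jumps the FIFO queue
```

Even though stmt_b arrived after stmt_a, its MGMT_P1 level lets it dequeue first; the two MGMT_P2 statements keep their FIFO order relative to each other.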
The DBRM directives can be quite involved. Here is a modified example from the Oracle® Database VLDB and Partitioning Guide 12c Release 1 (12.1.0.2) that shows the directives making use of the Parallel Statement Queueing features:
BEGIN
DBMS_RESOURCE_MANAGER.CLEAR_PENDING_AREA();
DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();
--new plan
DBMS_RESOURCE_MANAGER.CREATE_PLAN(
'DAYTIME_PLAN',
'Plan that prioritizes short running queries'
);
--consumer groups
DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP(
'MEDIUM_TIME',
'Medium: time running between 1 and 10 minutes'
);
DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP(
'LONG_TIME',
'Long: time running more than 10 minutes'
);
--directives
DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
'DAYTIME_PLAN',
'SYS_GROUP',
'Directive for SYS and high priority queries',
MGMT_P1 => 100,
PARALLEL_STMT_CRITICAL => 'BYPASS_QUEUE'
);
DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
'DAYTIME_PLAN',
'OTHER_GROUPS',
'Directive for SQL running less than 1 minute',
MGMT_P2 => 70,
PARALLEL_DEGREE_LIMIT_P1 => 4,
SWITCH_TIME => 60,
SWITCH_ESTIMATE => TRUE,
SWITCH_FOR_CALL => TRUE,
SWITCH_GROUP => 'MEDIUM_TIME'
);
DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
'DAYTIME_PLAN',
'MEDIUM_TIME',
'Directive for SQL running between 1 and 10 minutes',
MGMT_P2 => 20,
PARALLEL_SERVER_LIMIT => 75,
SWITCH_TIME => 600,
SWITCH_ESTIMATE => TRUE,
SWITCH_FOR_CALL => TRUE,
SWITCH_GROUP => 'LONG_TIME'
);
DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
'DAYTIME_PLAN',
'LONG_TIME',
'Directive for SQL running more than 10 minutes',
MGMT_P2 => 10,
PARALLEL_SERVER_LIMIT => 50,
PARALLEL_QUEUE_TIMEOUT => 3600
);
DBMS_RESOURCE_MANAGER.VALIDATE_PENDING_AREA();
DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
END;
/
This example bears some explanation:
There are a few additional settings that have to be made before this plan can truly work. A session needs to be granted a consumer group as its initial consumer group if that group is other than OTHER_GROUPS. This is not the case in our example, so we do not have to use the DBMS_RESOURCE_MANAGER.SET_INITIAL_CONSUMER_GROUP procedure. However, we do want sessions to switch to the MEDIUM_TIME or LONG_TIME group when the rules set in the directives apply. This means we must use the DBMS_RESOURCE_MANAGER_PRIVS.GRANT_SWITCH_CONSUMER_GROUP procedure before the plan works. In this example, the KSO user is granted the switches. This needs to be done for all the database users that are subject to this plan:
BEGIN
DBMS_RESOURCE_MANAGER_PRIVS.GRANT_SWITCH_CONSUMER_GROUP('KSO','MEDIUM_TIME',FALSE);
DBMS_RESOURCE_MANAGER_PRIVS.GRANT_SWITCH_CONSUMER_GROUP('KSO','LONG_TIME',FALSE);
END;
/
Parallel Statement Queueing Wrap-Up
This section has been considerably more detailed than the coverage of the other two new parallel features in 11gR2. It also contains some of the enhancements that were added in version 12c. That is because this feature is a critical component in allowing Exadata to handle mixed workloads effectively. It enables us to effectively handle a mixture of statements that are sensitive to both throughput and response time. Without this feature, it would be very difficult to provide adequate resources without severely compromising one or the other.
In-Memory Parallel Execution
Prior to 11gR2, queries that were parallelized totally ignored the buffer cache. Oracle assumed that parallel queries would only be done on very large tables that would probably never have a large percentage of their blocks in the buffer cache. This assumption led to the conclusion that it would be faster to just read the data from disk. In addition, flooding the buffer cache with a large number of blocks from a full table scan was not desirable, so Oracle developed a mechanism called direct path reads, which bypassed the normal caching mechanism in favor of reading blocks directly into the user's Program Global Area (PGA).
The In-Memory Parallel Execution feature takes a different approach. It attempts to make use of the buffer cache for parallel queries. The feature is cluster-aware and is designed to spread the data across the cluster nodes (that is, the RAC database instances). The data blocks are also affinitized to a single node, reducing the number of communication and data transfers between the nodes. The goal, of course, is to speed up the parallel query by eliminating disk I/O. This can be a viable technique because many systems now have very large amounts of memory, which of course can provide a significant speed advantage over disk operations. There are some downsides to this approach though. The biggest disadvantage with regard to Exadata is that all the Smart Scan optimizations are disabled by this feature. In this respect, the new option introduced in 12c to split the buffer cache into a Big Table Scan Cache and the OLTP cache by means of an initialization parameter offers little consolation—a Smart Scan is most likely faster than an n-way scan across multiple buffer caches.
Note that we have not actually seen In-Memory Parallel Query in the wild on Exadata. This is probably good since many of the optimizations built into Exadata rely on offloading, which depends on direct path reads. Of course, direct path reads will not be done if blocks are being accessed in memory on the database servers. On most platforms, memory access would be much faster than direct path reads from disk. But with Exadata, eliminating the disk I/O also eliminates a large number of CPUs that could be applied to filtering and other operations. This means that accessing the buffer cache for parallel execution could actually be less efficient than allowing the storage servers to participate in the execution of the SQL statement.
Note The 12.1.0.2 database patchset introduced the In-Memory Option (In-Memory Column Store, Vector Processing, and In-Memory Aggregation). Version 12.1.0.1 introduced other In-Memory Caching features (Automatic Big Table Caching and Full Database Caching), which further speed up the analytical capabilities of Exadata while simultaneously running OLTP workloads. Although these are really great new features, covering them is beyond the scope of this chapter.
A little demonstration is probably in order at this point. It is worth noting that getting this feature to kick in at all takes quite a bit of effort. Here are the basic steps we had to take in order to get it to work. First, we had to find a query that the optimizer estimated would run for longer than the number of seconds specified by the PARALLEL_MIN_TIME_THRESHOLD parameter (assuming the statement was not parallelized). The default for this parameter is AUTO, meaning 10 seconds. We set this parameter to one second to make it easier to trigger the In-Memory Parallel behavior. This query also had to be on a table that would almost completely fit in the aggregate buffer cache provided by the combination of all the participating RAC instances. To simplify things, we limited the processing to a single instance by setting the PARALLEL_FORCE_LOCAL parameter to TRUE. Of course, we had to set PARALLEL_DEGREE_POLICY to AUTO to enable the feature. We also set very low values for PARALLEL_SERVERS_TARGET and PARALLEL_DEGREE_LIMIT. Here are the parameter settings and pertinent information about the query we used to test the feature:
SYS@dbm2> @parms
Enter value for parameter: parallel
Enter value for isset:
Enter value for show_hidden:
NAME VALUE ISDEFAULT
-------------------------------------------------- -------- ----------
fast_start_parallel_rollback LOW TRUE
parallel_adaptive_multi_user FALSE FALSE
parallel_automatic_tuning FALSE TRUE
parallel_degree_level 100 TRUE
parallel_degree_limit 8 FALSE
parallel_degree_policy AUTO FALSE
parallel_execution_message_size 16384 FALSE
parallel_force_local TRUE FALSE
parallel_instance_group TRUE
parallel_io_cap_enabled FALSE TRUE
parallel_max_servers 720 TRUE
parallel_min_percent 0 TRUE
parallel_min_servers 0 FALSE
parallel_min_time_threshold 1 FALSE
parallel_server TRUE TRUE
parallel_server_instances 4 TRUE
parallel_servers_target 8 FALSE
parallel_threads_per_cpu 1 FALSE
recovery_parallelism 0 TRUE
19 rows selected.
SYS@dbm2> @pool_mem
AREA MEGS
------------------------------ ----------
384.0
free memory 1,162.5
fixed_sga 7.3
streams pool .0
log_buffer 248.7
shared pool 2,011.3
large pool 26.2
buffer_cache 4,352.0
----------
sum 8,192.0
8 rows selected.
SYS@dbm2> @table_size
Enter value for owner: KSO
Enter value for table_name: SKEWIMPQ
Enter value for type:
OWNER SEGMENT_NAME TYPE TOTALSIZE_MEGS TABLESPACE_NAME
------------ ------------------------------ -------------- -------------- ------------------
KSO SKEWIMPQ TABLE 3,577.0 USERS
--------------
sum 3,577.0
SYS@dbm2> select owner, table_name, status, last_analyzed, num_rows, blocks, degree, cache
2 from dba_tables where owner = 'KSO' and table_name = 'SKEWIMPQ';
OWNER TABLE_NAME STATUS LAST_ANALYZED NUM_ROWS BLOCKS DEGREE CACHE
---------- ---------- -------- ------------------- ---------- ---------- ------ ----
KSO SKEWIMPQ VALID 2015-02-14:06:52:31 89599778 451141 1 Y
So, the buffer cache on this instance is about 4.3G, and the table is about 3.5G. The query we used is simple and will not benefit from storage indexes, as virtually all the records satisfy the single WHERE clause:
SYS@dbm2> select count(*) from kso.skewimpq;
COUNT(*)
----------
89599778
1 row selected.
SYS@dbm2> select count(*) from kso.skewimpq where col1 > 0;
COUNT(*)
----------
89599776
1 row selected.
Now we will show some statistics prior to running the query, after running the query for the first time, and after running the query for the second time:
SYS@dbm2> alter system flush buffer_cache;
System altered.
SYS@dbm2> @mystats
Enter value for name: reads
NAME VALUE
---------------------------------------------------------------------- ---------------
SecureFiles DBFS Link streaming reads 0
cold recycle reads 0
data blocks consistent reads - undo records applied 0
gc cluster flash cache reads failure 0
gc cluster flash cache reads received 0
gc cluster flash cache reads served 0
gc flash cache reads served 0
lob reads 0
physical reads 7
physical reads cache 7
physical reads cache for securefile flashback block new 0
physical reads cache prefetch 0
physical reads direct 0
physical reads direct (lob) 0
physical reads direct for securefile flashback block new 0
physical reads direct temporary tablespace 0
physical reads for flashback new 0
physical reads prefetch warmup 0
physical reads retry corrupt 0
recovery array reads 0
session logical reads 14
session logical reads - IM 0
session logical reads in local numa group 0
session logical reads in remote numa group 0
transaction tables consistent reads - undo records applied 0
25 rows selected.
SYS@dbm2> select avg(pk_col) from kso.skewimpq where col1 > 0;
AVG(PK_COL)
-----------
16228570.2
SYS@dbm2> @mystats
Enter value for name: reads
NAME VALUE
---------------------------------------------------------------------- ---------------
SecureFiles DBFS Link streaming reads 0
cold recycle reads 0
data blocks consistent reads - undo records applied 0
gc cluster flash cache reads failure 0
gc cluster flash cache reads received 0
gc cluster flash cache reads served 0
gc flash cache reads served 0
lob reads 0
physical reads 450216
physical reads cache 450216
physical reads cache for securefile flashback block new 0
physical reads cache prefetch 446512
physical reads direct 0
physical reads direct (lob) 0
physical reads direct for securefile flashback block new 0
physical reads direct temporary tablespace 0
physical reads for flashback new 0
physical reads prefetch warmup 0
physical reads retry corrupt 0
recovery array reads 0
session logical reads 453226
session logical reads - IM 0
session logical reads in local numa group 0
session logical reads in remote numa group 0
transaction tables consistent reads - undo records applied 0
25 rows selected.
A close look at these statistics reveals that all the reads done have gone into the buffer cache (statistic: physical reads cache), while a "normal" full table scan on a large segment would have been done to the session's PGA via direct reads (statistic: physical reads direct). This is also visible via the wait interface: Multiblock reads into the cache show up as "db file scattered read" events on non-Exadata platforms and "cell multiblock physical read" events on Exadata, while reads into the PGA show up as "direct path read" events, which on Exadata can be offloaded as Smart Scans, visible via the "cell smart table scan" or "cell smart index scan" events. This scan essentially filled the buffer cache. Let's execute the same query again and see if we can use the data read into the buffer cache:
SYS@dbm2> select avg(pk_col) from kso.skewimpq where col1 > 0;
AVG(PK_COL)
-----------
16228570.2
SYS@dbm2> @mystats
Enter value for name: reads
NAME VALUE
---------------------------------------------------------------------- ---------------
SecureFiles DBFS Link streaming reads 0
cold recycle reads 0
data blocks consistent reads - undo records applied 0
gc cluster flash cache reads failure 0
gc cluster flash cache reads received 0
gc cluster flash cache reads served 0
gc flash cache reads served 0
lob reads 0
physical reads 528923
physical reads cache 528923
physical reads cache for securefile flashback block new 0
physical reads cache prefetch 518528
physical reads direct 0
physical reads direct (lob) 0
physical reads direct for securefile flashback block new 0
physical reads direct temporary tablespace 0
physical reads for flashback new 0
physical reads prefetch warmup 0
physical reads retry corrupt 0
recovery array reads 0
session logical reads 906435
session logical reads - IM 0
session logical reads in local numa group 0
session logical reads in remote numa group 0
transaction tables consistent reads - undo records applied 0
25 rows selected.
The second execution of the scan of the SKEWIMPQ table doubled the session logical reads statistic (906435 is roughly two times the 453226 logical reads seen after the first run), but the physical reads statistic increased by only 78707 (528923 - 450216), indicating that all the other logical reads were satisfied by the cache.
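The arithmetic behind that conclusion can be spelled out, using the numbers from the two statistics listings:

```python
# Cumulative session statistics taken from the listings above
first_run  = {"physical_reads": 450216, "logical_reads": 453226}
second_run = {"physical_reads": 528923, "logical_reads": 906435}

# Deltas attributable to the second execution of the scan
extra_physical = second_run["physical_reads"] - first_run["physical_reads"]
extra_logical  = second_run["logical_reads"]  - first_run["logical_reads"]

print(extra_physical)                   # 78707 blocks read from disk again
print(extra_logical - extra_physical)   # 374502 reads satisfied from the cache
```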
If DBMS_XPLAN.DISPLAY_CURSOR is used to display the execution plan, Oracle reveals whether In-Memory Parallel Query has been used:
SYS@dbm2> @dplan
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------
SQL_ID 31q77xaa06ggz, child number 2
-------------------------------------
select avg(pk_col) from kso.skewimpq where col1 > 0
Plan hash value: 3471853810
------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 17043 (100)| | | | |
| 1 | SORT AGGREGATE | | 1 | 11 | | | | | |
| 2 | PX COORDINATOR | | | | | | | | |
| 3 | PX SEND QC (RANDOM) | :TQ10000 | 1 | 11 | | | Q1,00 | P->S | QC (RAND) |
| 4 | SORT AGGREGATE | | 1 | 11 | | | Q1,00 | PCWP | |
| 5 | PX BLOCK ITERATOR | | 95M| 1002M| 17043 (1)| 00:00:01 | Q1,00 | PCWC | |
|* 6 | TABLE ACCESS STORAGE FULL| SKEWIMPQ | 95M| 1002M| 17043 (1)| 00:00:01 | Q1,00 | PCWP | |
------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
6 - storage(:Z>=:Z AND :Z<=:Z AND "COL1">0)
filter("COL1">0)
Note
-----
- dynamic statistics used: dynamic sampling (level=AUTO)
- automatic DOP: Computed Degree of Parallelism is 8 because of degree limit
- parallel scans affinitized for buffer cache
SYS@dbm2> @fsx2
Enter value for sql_text:
Enter value for sql_id: 31q77xaa06ggz
SQL_ID AVG_ETIME PX OFFLOAD SQL_TEXT
------------- ---------- ------ ------- --------------------------------------
31q77xaa06ggz 14.37 8 No select avg(pk_col) from kso.skewimpq
where col1 > 0
1 rows selected.
DBMS_XPLAN.DISPLAY_CURSOR shows the statement has been executed in parallel, as can be seen by the rowsources that start with “PX.” The most interesting thing here is in the Note section of the DISPLAY_CURSOR output: The line “parallel scans affinitized for buffer cache” clearly indicates the scan used the buffer cache, which means In-Memory Parallel Query has been used.
Indirect evidence of In-Memory Parallel Query can be found using the fsx2.sql script. When the SQL_ID of our query is entered, we can see the statement was executed with eight parallel query servers (the PX column) and that no Smart Scans were used (the OFFLOAD column), which would have happened with normal (non-In-Memory PX) execution of parallel query scans. Note also that the fsx2.sql script reports an estimated value for AVG_ETIME, which is much more than the actual time the execution took. This occurs because v$sql reports elapsed time as the sum of all the elapsed times of the slave processes. Dividing this number by the number of slaves gives an estimate, but it will not be totally accurate, as slaves can vary greatly in their elapsed times. Let's now compare our In-Memory Parallel Execution to how our system behaves when In-Memory Parallel Execution is not in play. We can disable this feature in a couple of ways. The documented way is to set the PARALLEL_DEGREE_POLICY parameter to MANUAL. However, this also disables Auto DOP and Parallel Statement Queueing. The other way is to set the hidden parameter _PARALLEL_CLUSTER_CACHE_POLICY to ADAPTIVE:
SYS@dbm2> alter system set "_parallel_cluster_cache_policy"=adaptive;
System altered.
SYS@dbm2> @mystats
Enter value for name: reads
NAME VALUE
---------------------------------------------------------------------- ---------------
SecureFiles DBFS Link streaming reads 0
cold recycle reads 0
data blocks consistent reads - undo records applied 0
gc cluster flash cache reads failure 0
gc cluster flash cache reads received 0
gc cluster flash cache reads served 0
gc flash cache reads served 0
lob reads 0
physical reads 0
physical reads cache 0
physical reads cache for securefile flashback block new 0
physical reads cache prefetch 0
physical reads direct 0
physical reads direct (lob) 0
physical reads direct for securefile flashback block new 0
physical reads direct temporary tablespace 0
physical reads for flashback new 0
physical reads prefetch warmup 0
physical reads retry corrupt 0
recovery array reads 0
session logical reads 17
session logical reads - IM 0
session logical reads in local numa group 0
session logical reads in remote numa group 0
transaction tables consistent reads - undo records applied 0
25 rows selected.
SYS@dbm2> select avg(pk_col) from kso.skewimpq where col1 > 0;
AVG(PK_COL)
-----------
16228570.2
SYS@dbm2> @mystats
Enter value for name: reads
NAME VALUE
---------------------------------------------------------------------- ---------------
SecureFiles DBFS Link streaming reads 0
cold recycle reads 0
data blocks consistent reads - undo records applied 0
gc cluster flash cache reads failure 0
gc cluster flash cache reads received 0
gc cluster flash cache reads served 0
gc flash cache reads served 0
lob reads 0
physical reads 450207
physical reads cache 0
physical reads cache for securefile flashback block new 0
physical reads cache prefetch 0
physical reads direct 450207
physical reads direct (lob) 0
physical reads direct for securefile flashback block new 0
physical reads direct temporary tablespace 0
physical reads for flashback new 0
physical reads prefetch warmup 0
physical reads retry corrupt 0
recovery array reads 0
session logical reads 450895
session logical reads - IM 0
session logical reads in local numa group 0
session logical reads in remote numa group 0
transaction tables consistent reads - undo records applied 0
25 rows selected.
SYS@dbm2> @fsx2
Enter value for sql_text: %avg(pk_col)%
Enter value for sql_id:
SQL_ID AVG_ETIME PX OFFLOAD SQL_TEXT
------------- ---------- --- ------- ----------------------------------------
31q77xaa06ggz 14.37 8 No select avg(pk_col) from kso.skewimpq
where col1 > 0
31q77xaa06ggz 5.92 8 Yes select avg(pk_col) from kso.skewimpq
where col1 > 0
2 rows selected.
Notice that with In-Memory Parallel Execution disabled, the statistics show that the numbers of physical reads and logical reads are roughly the same, just as in the previous example where the first run populated the buffer cache. However, instead of the physical reads cache statistic climbing to match the physical reads statistic, it is now physical reads direct that matches it. Notice also that the fsx2.sql script shows a new cursor in the shared pool that was executed with eight parallel slaves and offloaded to the storage tier. This is an important point: In-Memory Parallel Execution bypasses the optimizations that Exadata provides via Smart Scans. That should be obvious, since the feature eliminates most of the disk I/O on which Smart Scans operate, but it is the main reason we believe this feature will not be as useful on the Exadata platform as on other platforms. The reason the offloaded query shows a lower elapsed time (AVG_ETIME) is that a huge part of the processing is offloaded to the storage tier, which means the work of each parallel process is parallelized even further and only the results are sent to the database tier.
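As mentioned earlier, AVG_ETIME for a parallel statement is only an estimate derived from the aggregated slave times in V$SQL. The calculation can be sketched as follows (a simplified version of what scripts like fsx2.sql do; the division by PX_SERVERS_EXECUTIONS is an approximation, not an exact wall-clock time):

```sql
-- Approximate per-execution elapsed time for a parallel statement.
-- ELAPSED_TIME in V$SQL (microseconds) is summed across all PX slaves,
-- so dividing by the number of slave executions yields only an estimate.
select sql_id,
       child_number,
       px_servers_executions,
       round(elapsed_time / 1000000 /
             decode(nvl(px_servers_executions, 0),
                    0, nvl(executions, 1),
                    px_servers_executions), 2) avg_etime_secs
  from v$sql
 where sql_id = '31q77xaa06ggz';
```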
Troubleshooting Parallel Execution
In the days before Oracle version 11.2, getting a clear view of what was going on across the cluster and of the corresponding workload distribution among the parallel servers was quite a challenge when parallel execution issues arose. Even more challenging was answering questions such as "Why is my SQL only partially parallelized?"
The introduction of Real-Time SQL Monitoring fixed some of these instrumentation issues and allowed DBAs and developers to analyze parallel execution in greater detail by providing an interactive display of resource consumption across the parallel servers and SQL statistics for each step of the execution plan. In 11gR2 and 12c, instrumentation improved further with additional columns in the V$SQL_MONITOR view covering resource management and parallel allocation details. This section will not discuss in detail how to interpret a SQL Monitor report. Instead, it will focus on how to use the views that feed a SQL Monitor report, together with other views, for quick troubleshooting and for finding out when downgrades are happening and when In-Memory Parallel Execution is kicking in.
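For example, a quick check for statements that received fewer parallel servers than they requested can be run directly against GV$SQL_MONITOR (these columns are available in 11gR2 and later; RM_CONSUMER_GROUP is populated when Resource Manager is active):

```sql
-- Find monitored statements whose requested DOP was downgraded.
select inst_id,
       sql_id,
       sql_exec_id,
       px_servers_requested,
       px_servers_allocated,
       rm_consumer_group
  from gv$sql_monitor
 where px_servers_allocated < px_servers_requested;
```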
Using the following SQL, you can gather extra information, such as "px_in_memory", which indicates whether a query was affinitized to memory or, in plain terms, was doing In-Memory Parallel Query:
SQL> !cat other_xml.sql
select t.*
from v$sql_plan v,
xmltable(
'/other_xml/info'
passing xmltype(v.other_xml)
columns
info_type varchar2(30) path '@type',
info_value varchar2(30) path '/info'
) t
where v.sql_id = '&sql_id'
and v.child_number = &child_number
and other_xml is not null;
SQL> alter session set parallel_degree_policy=manual;
Session altered.
SQL> select avg(pk_col) from kso.skew2;
AVG(PK_COL)
-----------
62500.2406
SQL> @find_sql
Enter value for sql_text: %skew2%
Enter value for sql_id:
SQL_ID CHILD PLAN_HASH EXECS ETIME AVG_ETIME USERNAME SQL_TEXT
------------- ------ ---------- ----- ------- --------- -------- ---------------------
atb3q75xavzb6 0 4220890033 1 10.13 10.13 SYS select avg(pk_col)
from kso.skew2
SQL> @other_xml
Enter value for sql_id: atb3q75xavzb6
Enter value for child_number: 0
INFO_TYPE INFO_VALUE
------------------------------ ---------------
db_version 12.1.0.2
parse_schema "SYS"
plan_hash_full 1438813450
plan_hash 4220890033
plan_hash_2 1438813450
In this case, the other_xml data does not reveal much; there are no big surprises. However, let's set PARALLEL_DEGREE_POLICY to AUTO and execute the query again:
SQL> alter session set parallel_degree_policy=auto;
Session altered.
SQL> select avg(pk_col) from kso.skew2;
AVG(PK_COL)
-----------
62500.2406
SQL> @find_sql
Enter value for sql_text: %skew2%
Enter value for sql_id:
SQL_ID CHILD PLAN_HASH EXECS ETIME AVG_ETIME USERNAME SQL_TEXT
------------- ------ ---------- ----- ------- --------- -------- --------------------
atb3q75xavzb6 0 4220890033 1 10.13 10.13 SYS select avg(pk_col)
from kso.skew2
atb3q75xavzb6 2 2117817910 1 113.39 113.39 SYS select avg(pk_col)
from kso.skew2
SQL> @other_xml
Enter value for sql_id: atb3q75xavzb6
Enter value for child_number: 2
INFO_TYPE INFO_VALUE
------------------------------ -----------
derived_cpu_dop 15
derived_io_dop 3
dop 15
px_in_memory_imc no
px_in_memory yes
io_rate 200
derived_io_dop 3
cpu_rate 1000
derived_cpu_dop 16
cpu_rate 1000
derived_cpu_dop 16
cpu_rate 1000
derived_cpu_dop 16
db_version 12.1.0.2
parse_schema "SYS"
plan_hash_full 3329629242
plan_hash 2117817910
plan_hash_2 3329629242
This child cursor shows a lot more information related to parallel query. In fact, we can now see that In-Memory Parallel Query was happening, because "px_in_memory" is "yes." We can also see that the degree of parallelism was 15 ("dop" is "15"). When not using Auto DOP, other_xml shows additional information related to parallel execution:
INFO_TYPE INFO_VALUE
------------------------------ ---------------
derived_cpu_dop 0
derived_io_dop 0
dop_reason table property
dop 8
px_in_memory_imc no
px_in_memory no
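Along the same lines, statements currently held back by Parallel Statement Queueing show up in GV$SQL_MONITOR with a STATUS of QUEUED, which gives a quick way to see whether queueing is kicking in (a sketch; sessions waiting in the queue also show the wait event "resmgr:pq queued" in GV$SESSION):

```sql
-- Statements currently waiting in the parallel statement queue.
select inst_id,
       sql_id,
       sql_exec_id,
       to_char(sql_exec_start, 'HH24:MI:SS') exec_start
  from gv$sql_monitor
 where status = 'QUEUED';
```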
Summary
Parallel execution of statements is important for maximizing throughput on the Exadata platform. Oracle Database releases 11gR2 and 12c include several new features and enhancements that make parallel execution more controllable, which is especially important when using the platform with mixed workloads. The Auto DOP feature is designed to make intelligent decisions about DOP automatically, on a statement-by-statement basis. In-Memory Parallel Execution may not be as useful on Exadata as it is on non-Exadata platforms because it disables the optimizations that come along with Smart Scans. Parallel Statement Queueing is a very useful feature, as it allows throughput-oriented work to co-exist with response-time-sensitive work. Integration with the Database Resource Manager (DBRM) further enhances this feature by providing a great deal of additional control and prioritization over the queueing mechanism.