Specific to standby databases, performance issues can arise in several phases: reading redo data and transporting it over the network, the redo write phase (for example, because of a bad RAID configuration), and the Redo Apply phase (because of a huge redo volume or improper memory settings); issues can also be caused by bugs. Here we will discuss some of them.
Standby databases are placed in geographically different locations over a WAN for high availability in case of a disaster. Even though you keep your standby database geographically far away, you should have reasonable bandwidth to avoid data lag between the primary and standby databases. This can be a bigger problem if you are using synchronous redo transport with AFFIRM. You can estimate the network bandwidth required to keep up with the redo shipping rate as follows:
Required network bandwidth = ((Redo rate bytes per sec. / 0.7) * 8) / 1,000,000 = bandwidth in Mbps.
Using this formula with your redo generation rate, you can estimate the required network bandwidth. You can get the redo rate in bytes per second from DBA_HIST_SNAPSHOT or from the AWR/Statspack reports.
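As a quick sketch, the formula can be applied with a hypothetical peak redo rate of 2,000,000 bytes per second taken from an AWR report:

```shell
# Worked example of the bandwidth formula above.
# redo_rate_bytes is a hypothetical value; substitute your own AWR figure.
redo_rate_bytes=2000000
awk -v r="$redo_rate_bytes" \
  'BEGIN { printf "Required bandwidth: %.2f Mbps\n", ((r / 0.7) * 8) / 1000000 }'
# prints: Required bandwidth: 22.86 Mbps
```

The 0.7 factor leaves roughly 30 percent headroom, so the link is not sized to run at full saturation.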
Network throughput can be increased by setting the Oracle Net parameters RECV_BUF_SIZE and SEND_BUF_SIZE equal to three times the Bandwidth Delay Product (BDP). To calculate the BDP, the bandwidth of the link and the network Round Trip Time (RTT) are required. RTT is the time for a complete two-way trip over the network from the primary database to the standby database and back.
BDP = (Network speed * RTT) /8
Based on this calculation, the optimal send and receive buffer sizes can be estimated with the following formula:
Socket buffer size = 3 * (Bandwidth speed * RTT) / 8
Or, equivalently:
Socket buffer size = 3 * (BDP)
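As a worked sketch with hypothetical values, a 1 Gbps link and a 31.25 ms RTT give:

```shell
# Worked example of the BDP and socket buffer formulas above.
# The link speed and RTT are illustrative values, not measurements.
awk 'BEGIN {
  bw  = 1000000000       # link speed in bits/sec (1 Gbps, hypothetical)
  rtt = 0.03125          # round-trip time in seconds (31.25 ms, hypothetical)
  bdp = (bw * rtt) / 8   # Bandwidth Delay Product in bytes
  printf "BDP: %d bytes\nSocket buffer size: %d bytes\n", bdp, 3 * bdp
}'
# prints: BDP: 3906250 bytes
#         Socket buffer size: 11718750 bytes
```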
If the computed socket buffer size is 11718750 bytes, it can be set as follows in sqlnet.ora or at the operating system level:
[oracle@oracle-primary admin]$ cat sqlnet.ora | grep BUF_SIZE
RECV_BUF_SIZE=11718750
SEND_BUF_SIZE=11718750
[oracle@oracle-primary admin]$
You can also configure the send and receive buffer sizes for a net service name in the connect descriptor in the client-side tnsnames.ora file as follows:
INDIA =
  (DESCRIPTION =
    (RECV_BUF_SIZE = 11718750)
    (SEND_BUF_SIZE = 11718750)
    (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.180.20)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = india_un)
    )
  )
If you are replicating data remotely, either using database links for materialized views or using Data Guard, the data is transferred over the network in session data units (SDU); if a large amount of redo is being transmitted, you can increase the size of the SDU buffer to improve performance and network utilization. You can configure it in the sqlnet.ora file as DEFAULT_SDU_SIZE, which ranges from 512 to 32767 bytes. The default SDU size of 2048 bytes applies to the client and dedicated server, whereas for the shared server the default SDU is 32767 bytes.
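As a sketch, the client-side sqlnet.ora could raise the SDU to its maximum (the value is chosen for illustration):

```
DEFAULT_SDU_SIZE=32767
```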
On the standby database, you can configure the SDU either in the sqlnet.ora file or in the listener.ora file, where you can specify it for the address in the description as follows:
SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (SDU = 32767)
      (GLOBAL_DBNAME = india_un)
      (SID_NAME = INDIA)
      (ORACLE_HOME = /u01/home/oracle/product/11.2.0/db_1)
    )
  )
If you are using the ARCH redo transport type, consider increasing the value of the LOG_ARCHIVE_MAX_PROCESSES parameter. The default value in 11gR2 is 4 and it can be set from 1 to 30. If you set this parameter to a higher value, all the ARCn processes work in parallel to resolve archive gaps, subject to the number of archive processes and the system configuration.
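As an illustration, the parameter can be changed dynamically; the value 8 here is arbitrary and should come from your own testing:

```
SQL> alter system set log_archive_max_processes=8 scope=both;
```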
Choose the optimal value after several tests of archive gap resolution, as follows:
SQL> select * from V$PGASTAT where name='total PGA allocated';

NAME                      VALUE UNIT
-------------------- ---------- ------------
total PGA allocated   249153536 bytes
The following output can be extracted using the view v$process:
PROGRAM                        PGA_USED_MEM PGA_MAX_MEM
------------------------------ ------------ -----------
oracle@oracle-stby (ARC2)          11270688    12050576
oracle@oracle-stby (ARC1)          11297656    12050576
oracle@oracle-stby (ARC4)          28942512    30924944
oracle@oracle-stby (ARC3)          28942512    30924944
oracle@oracle-stby (ARC0)          28942512    30924944
So every archive process is consuming nearly 30 MB of memory; this figure depends entirely on the memory management in use. Choose the value of LOG_ARCHIVE_MAX_PROCESSES depending on the available resources.
If you are using synchronous redo transport with LGWR, consider decreasing the value of the NET_TIMEOUT attribute to limit the impact of transport stalls on the production database's performance; in 11gR2 this value can be set from 1 to 1200 seconds and the default is 30 seconds. Oracle recommends setting NET_TIMEOUT to 10 seconds or less to avoid disconnections from the standby database.
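A sketch of setting the attribute on the primary, reusing the india_un service from this configuration; the destination number and the other attributes are illustrative:

```
SQL> alter system set log_archive_dest_2='SERVICE=india_un LGWR SYNC AFFIRM
  NET_TIMEOUT=10 VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE)
  DB_UNIQUE_NAME=india_un' scope=both;
```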
Redo data received from the primary database on the standby is applied by the background process MRP0. Redo Apply is a block-for-block physical replication of the primary database. It uses media recovery to read records from the standby redo logfiles into memory and apply them directly to the standby database. If you start MRP with alter database recover managed standby database disconnect from session, only one MRP process is started to perform recovery. For huge OLTP databases there can be a lot of redo to apply on the standby database, and with a single background process recovery can be delayed. So we can initiate parallel recovery, which starts slave processes along with the MRP background process, as follows:
SQL> alter database recover managed standby database using current logfile disconnect from session parallel 5;
SQL> !ps -ef|grep pr0
oracle   32243     1  0 19:33 ?        00:00:00 ora_pr00_INDIA
oracle   32245     1  0 19:33 ?        00:00:00 ora_pr01_INDIA
oracle   32247     1  0 19:33 ?        00:00:00 ora_pr02_INDIA
oracle   32249     1  0 19:34 ?        00:00:00 ora_pr03_INDIA
oracle   32251     1  0 19:34 ?        00:00:00 ora_pr04_INDIA
oracle   32253     1  0 19:34 ?        00:00:00 ora_pr05_INDIA
oracle   32292 31785  0 19:34 pts/2    00:00:00 /bin/bash -c ps -ef|grep pr0
Data Guard wait events are classified into primary- and standby-related wait events. Newer releases have introduced many wait events; some of them are as follows:
ARCH transport: These wait events are specific to sending redo from the primary database to the standby database using ARCH with synchronous or asynchronous redo transport. They are ARCH wait on ATTACH, ARCH wait on SENDREQ, and ARCH wait on DETACH.
If you are using real-time apply with LGWR redo transport, the LNS process works with the standby RFS process for redo transport, and the wait events can be LNS wait on ATTACH, LNS wait on SENDREQ, LNS wait on DETACH, LGWR wait on LNS, LNS wait on LGWR, and LGWR-LNS wait on a channel.
These wait events are applicable even in a normal system and are related to I/O. They are log file sync, log file parallel write, and DB file sequential read.
These wait events will occur in case time is spent on I/O on the standby. They are RFS write, RFS random I/O, and RFS sequential I/O.
Use the following query to get the details about all wait events:
SQL> select event, total_waits, time_waited, total_timeouts from v$system_event order by total_waits desc;

EVENT                                    TOTAL_WAITS TIME_WAITED
---------------------------------------- ----------- -----------
parallel recovery slave next change           260168      381493
control file sequential read                   67687        2373
parallel recovery change buffer free           60053       82981
parallel recovery read buffer free             18975       24964
SQL*Net vector data from client                10534       79202