In many Data Guard installation cases, people may think that seeing the main functions of Data Guard running is enough for a successful deployment of Data Guard. In other words, if redo is being transferred from the primary database to a standby database(s), and is being applied on the standby, it's a smooth Data Guard configuration. However, if the configuration is prepared keeping in mind best practices, which is the topic of this chapter, it will be more robust, effective, and complete.
In this chapter, we will discuss the following topics:
Let's start with configuring a connection failover in a Data Guard environment.
Building a configuration in which database clients are able to automatically connect to a new primary database after a role change is vital in Data Guard. If we skip this important aspect, it may be very hard to configure connections of all database users to the new primary database.
Now, let's learn about the important connection failover terms – Transparent Application Failover (TAF), Fast Connection Failover (FCF), and Fast Application Notification (FAN).
TAF is a connection failover configuration of Oracle Call Interface (OCI) that is used for high-availability environments such as Oracle Data Guard, Oracle Real Application Clusters (RAC), and Oracle Fail Safe.
When using this configuration, clients can automatically establish a prespecified connection to the database after a failure of the database instance. In RAC, this means connecting to one of the surviving instances and in Data Guard it means connecting to the new primary database after failover.
We can configure TAF in two ways – client-side configuration and server-side configuration:
TAF will not only establish a new connection to the database, but also re-run a select
statement and reposition the cursor if preferred. We can configure TAF only for establishing a new connection, which is called session failover, or for the recovery of the session and query, which is called select failover. With the select failover mode, the number of rows fetched by the cursor is tracked and when the connection is lost, it's recovered by TAF by repositioning the cursor. So, the client doesn't restart but resumes fetching rows. This is especially good for long-running, time-critical reports and data extractions.
It's possible to monitor TAF properties of sessions using the V$SESSION
dynamic performance view. The following query result will show the service name, failover type, failover method, and failover failovers, if occurred, for the clients connected to the database:
SQL> SELECT USERNAME, SERVICE, FAILOVER_TYPE, FAILOVER_METHOD, FAILED_OVER FROM V$SESSION;
The client-side TAF is configured using the TNS connection string. The following string is an example of a Data-Guard-enabled client-side TAF configuration where primary and standby databases are 11gR2 RAC with Single Client Access Names (SCAN) being used:
OLTP = (DESCRIPTION = (LOAD_BALANCE=OFF) (FAILOVER=ON) (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = PRIMARY_SCAN)(PORT = 1521)) (ADDRESS = (PROTOCOL = TCP)(HOST = STANDBY_SCAN)(PORT = 1521)) ) (CONNECT_DATA = (SERVICE_NAME = OLTP) (SERVER = DEDICATED) (FAILOVER_MODE = (TYPE = session) (METHOD = BASIC) (RETRIES = 15) (DELAY = 10) ))))
In 11gR2, we configure the server-side TAF using Server Control Utility (SRVCTL). The srvctl add service
command adds a new service, and the srvctl modify service
command changes settings for a predefined service. The following is an example of creating a TAF-enabled database service on a RAC and Data Guard configuration. We must create this service both in the primary and standby hosts with the following commands:
srvctl add service -d ORCL_PRIM -s OLTP -r prim_node1,prim_node2 -l PRIMARY -q TRUE -e SESSION -m BASIC -w 10 -z 15
srvctl add service -d ORCL_STD -s OLTP -r std_node1,std_node2 -l PRIMARY -q TRUE -e SESSION -m BASIC -w 10 -z 15
The following table lists the definitions of the command options:
Option |
Description |
---|---|
|
This gives a unique name for the database. |
|
This gives the service name. |
|
For RAC databases, this gives the list of preferred instances on which the service runs. |
|
This gives the service role. Service is automatically started when the database is in this role. |
|
This indicates whether AQ HA notifications should be enabled for this service. |
|
This gives the failover type – session failover, select failover, or none. |
|
This gives the failover method. If the session failover or select failover type is selected, you should use |
|
This gives the number of failover retries. |
|
This gives the time delay between failover attempts. |
If we're going to use the physical standby database with active Data Guard for reporting, we should create a service for this purpose on both primary and standby hosts. For example:
srvctl add service -d ORCL_PRIM -s REPORTING -r prim_node1,prim_node2 -l PHYSICAL_STANDBY -q TRUE -e SESSION -m BASIC -w 10 -z 15
srvctl add service -d ORCL_STD -s REPORTING-r std_node1,std_node2 -l PHYSICAL_STANDBY -q TRUE -e SESSION -m BASIC -w 10 -z 15
In this case, the service will be created at the cluster level but the service definition will not be applied to the standby database because it's read-only. For this reason, we run the DBMS_SERVICE.CREATE_SERVICE
procedure for the REPORTING
service on the primary database, and the service definition will be replicated to the standby database with Redo Apply.
EXECUTE DBMS_SERVICE.CREATE_SERVICE( service_name => 'reporting' network_name => 'reporting' goal => 'NULL' dtp => 'NULL' aq_ha_notifications => 'TRUE' failover_method => 'BASIC' failover_type => 'SESSION' failover_retries => 15 failover_delay => 10 clb_goal => 'NULL'),
Using the previous server-side TAF configuration (the OLTP
and REPORTING
services), it's not necessary to configure TAF at the client side in the tnsnames.ora
file. The following TNS entry is an example that can be used to connect the OLTP
service:
OLTP= (DESCRIPTION_LIST= (LOAD_BALANCE=OFF) (FAILOVER=ON) (DESCRIPTION= (CONNECT_TIMEOUT=3)(TRANSPORT_CONNECT_TIMEOUT=2)(RETRY_COUNT=3) (ADDRESS_LIST= (LOAD_BALANCE=ON) (ADDRESS=(PROTOCOL=TCP)(HOST=PRIMARY_SCAN)(PORT=1521))) (CONNECT_DATA=(SERVICE_NAME=OLTP))) (DESCRIPTION= (CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3) (ADDRESS_LIST= (LOAD_BALANCE=ON) (ADDRESS=(PROTOCOL=TCP)(HOST= STANDBY_SCAN)(PORT=1521))) (CONNECT_DATA=(SERVICE_NAME=OLTP))))
In this TNS entry, both the primary and standby SCAN
hostnames are involved. Just below DESCRIPTION_LIST
, we can see LOAD_BALANCE=OFF
. This means that the client will try to connect the DESCRIPTION
definitions in order. If it can't connect to the primary database, it'll try to connect to the standby database. However, below the DESCRIPTION
definitions, we see LOAD_BALANCE=ON
. This is about the connection to the RAC database and new connections are going to be assigned to three IP addresses of the SCAN
name randomly.
For each DESCRIPTION
definition, we can see some TNS string parameters set. CONNECT_TIMEOUT
specifies the total time to establish an Oracle net connection to a database. It includes the TRANSPORT_CONNECT_TIMEOUT
value, which is the time taken by a client to establish a TCP connection to the database server. It's possible to set the CONNECT_TIMEOUT
value globally for a database instance in the sqlnet.ora
file using the SQLNET.OUTBOUND_CONNECT_TIMEOUT
parameter. Also, we can set the TCP.CONNECT_TIMEOUT
parameter for a global TRANSPORT_CONNECT_TIMEOUT
value. The last parameter, RETRY_COUNT
, specifies the maximum number of connection attempts for the DESCRIPTION
definition.
In this configuration, the following algorithm will be applied when clients connect to a database using the OLTP
service:
PRIMARY_SCAN
hostname is resolved to three IP addresses.TRANSPORT_CONNECT_TIMEOUT
, which is two seconds, or the IP address responds but a connection can't be established in three seconds (CONNECT_TIMEOUT
), it'll try the next IP address. There will be a maximum of three retry attempts because of the RETRY_COUNT
setting.DESCRIPTION
definition, it'll try to connect the second DESCRIPTION
definition, which is the standby database.STANDBY_SCAN
hostname will be resolved to three IP addresses.Note that automatically controlling the startup of services by assigning a role to the service with SRVCTL is an 11gR2 feature. In earlier releases, we can create a trigger to ensure that the service is started only for the specified database role, such as the following example:
create trigger TAF_TRIGGER after startup on database declare db_role varchar(30); begin select database_role into db_role from v$database; if db_role = 'PRIMARY' then DBMS_SERVICE.START_SERVICE('OLTP'), else DBMS_SERVICE.STOP_SERVICE('OLTP'), end if; end; /