Implementing Data Guard
You were introduced to Data Guard terms and technology in depth in the previous chapter. Now it is time to put theory into practice. A Data Guard implementation always starts with the duplication of the primary database, which is most easily done using Recovery Manager (RMAN). The high-level steps are as follows:
If you have enough network bandwidth, you can clone the database directly over the network without having to take a backup first. With the wider adoption of 10Gb Ethernet networks this becomes a realistic option, except perhaps when long distances are involved.
This chapter will guide you through the process of creating a standby database. There is a slight difference between using Oracle Managed Files (OMF), whether on a file system or in Oracle Automatic Storage Management (ASM), and the traditional file system approach. Both approaches will be covered in this chapter.
Naming considerations
Data Guard naming conventions are often the source of considerable controversy. If you intend to use Data Guard with a database, you should plan for it right from the start, especially when using OMF. In ASM, the OMF directory structure follows this naming convention:
+diskGroupName/db_unique_name/filetype/fileTypeTag.file#.incarnation
The database unique name is of importance here. If you create your database as PROD with a standby database unique name of STDBY, for example, then the ASM files will be stored in +data/prod/datafile/... and +data/stdby/datafile/... respectively. Although there is nothing wrong with that, it can cause confusion when the PROD database has to perform a role change and subsequently operates in the standby role. If your naming standards are flexible enough, you should consider naming the databases you create by location rather than by initial role. The confusion about a database's role is greatly reduced in this case.
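As an illustration, a SYSTEM tablespace data file of a database with a db_unique_name of PROD stored in the DATA disk group might be named like this (the file and incarnation numbers are made up for illustration):

```
+DATA/PROD/DATAFILE/SYSTEM.262.826229601
```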
One possible approach would be as follows:
Throughout the rest of the chapter you will find this principle applied. Figure 10-1 shows the configuration at a high level.
Figure 10-1. The Data Guard configurations to be used in the following sections. The figure shows the initial configuration where the databases in data center A are in the primary role
There are three different types of databases in use:
This naming scheme is only one of many possible conventions. Companies often have naming schemes in use referring to continents, countries, cities, and so on. Regardless of how the naming scheme is implemented, it is important to apply it consistently.
An observer process for lights-out management is started in data center C to observe the CDBFS databases.
Implementing Data Guard on a file system
The implementation of Data Guard using a traditional file system requires a little more thought, since the database administrator cannot rely on the OMF parameters db_create_file_dest and db_create_online_log_dest_n to perform file name conversion on the fly where needed. As you read in Chapter 7, file name conversion has to be taken care of with PDBs. The same applies to a Data Guard environment, but at the CDB level. If you decided to use a different directory structure for your standby database (different mount points, for example), ensure that the relevant file name conversion parameters are set. Although you can also specify the log file definition and a file name conversion pattern during the execution of the RMAN duplicate database command, it is often easier to have the conversion parameters in the initialization file of the standby database.
You begin the duplication process by copying the password file and a parameter file to the standby host. The steps will be shown in more detail in the next section. The files should be copied into the dbs directory of the RDBMS home. You should also add the new ORACLE_SID to the oratab file to make it easier to switch environments.
[oracle@server2 ∼]$ echo "CDBFSDCB:/u01/app/oracle/product/12.1.0.1/dbhome_1:N:" >> /etc/oratab
The initialization file needs to be amended next. The stream editor sed is a great way to create an almost workable initialization file:
[oracle@server2 dbs]$ sed -e "s/${PRIMARY_SID}/${ORACLE_SID}/g" $PRIMARY_INIT_FILE_LOC
> ${ORACLE_HOME}/dbs/init${ORACLE_SID}.ora
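To make this concrete, here is a self-contained sketch with hypothetical values; the PRIMARY_SID and PRIMARY_INIT_FILE_LOC variables are not set by any earlier step and would reflect your own environment:

```shell
# Hypothetical values standing in for your environment
PRIMARY_SID=CDBFSDCA
ORACLE_SID=CDBFSDCB
PRIMARY_INIT_FILE_LOC=/tmp/initCDBFSDCA.ora

# A scratch file standing in for the parameter file copied from the primary
printf "*.db_unique_name='%s'\n*.audit_file_dest='/u01/app/oracle/admin/%s/adump'\n" \
    "$PRIMARY_SID" "$PRIMARY_SID" > "$PRIMARY_INIT_FILE_LOC"

# The substitution from the example: every occurrence of the primary SID
# is rewritten to the standby SID
sed -e "s/${PRIMARY_SID}/${ORACLE_SID}/g" "$PRIMARY_INIT_FILE_LOC"
```

The output still has to be reviewed by hand, which is why the result is only an "almost workable" initialization file.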
Edit the file with your favorite editor and ensure the following parameters are set correctly:
The diagnostic destination introduced with Oracle 11g greatly simplifies the creation of auxiliary directories. The only remaining directory Oracle requires is the audit_file_dest, which can be created with a small script:
[oracle@server2 dbs]$ ADUMP=`grep audit_file_dest init${ORACLE_SID}.ora
> | cut -d= -f2 | tr -d "'"` ; mkdir -p $ADUMP
This example assumes that the directory structure is the same on the standby host with the exception of the top-level directory containing the database files. This directory is CDBFSDCB instead of CDBFSDCA. The directory structure to be put into place on the standby host to support the duplicate database command can easily be scripted on the primary database:
SQL> select distinct 'mkdir -p ' ||
2 replace(substr(name,1,instr(name,'/',-1)),'DCA','DCB') cmd
3 from v$datafile;
CMD
--------------------------------------------------------------------------------
mkdir -p /u01/oradata/CDBFSDCB/PDB1/
mkdir -p /u01/oradata/CDBFSDCB/pdbseed/
mkdir -p /u01/oradata/CDBFSDCB/
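One way to run the generated commands is to spool them into a script and execute it on the standby host. A minimal, self-contained sketch; the /tmp/standby prefix is an assumption so the example can be tried anywhere, and on the real standby host you would run the commands as the oracle user without it:

```shell
# Stand-in for the spooled v$datafile query output shown above
cat > /tmp/mkdirs.sh <<'EOF'
mkdir -p /tmp/standby/u01/oradata/CDBFSDCB/PDB1/
mkdir -p /tmp/standby/u01/oradata/CDBFSDCB/pdbseed/
mkdir -p /tmp/standby/u01/oradata/CDBFSDCB/
EOF
sh /tmp/mkdirs.sh
ls -d /tmp/standby/u01/oradata/CDBFSDCB/*/
```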
You can “union all” additional locations such as online redo log members and temp files, should they not all reside under the CDBFSDCA mount point. These commands are then executed on the standby host. If you are multiplexing the control files in the Fast Recovery Area (FRA), then that full path needs to be created too. The resulting initialization file for the standby database with the unique name CDBFSDCB is shown here:
*.audit_file_dest='/u01/app/oracle/admin/CDBFSDCB/adump'
*.audit_trail='db'
*.compatible='12.1.0.0.0'
*.control_files='/u01/oradata/CDBFSDCB/control01.ctl',
'/u01/fast_recovery_area/CDBFSDCB/control02.ctl'
*.db_block_size=8192
*.db_domain=''
*.db_name='CDBFS'
*.db_unique_name='CDBFSDCB'
*.db_recovery_file_dest='/u01/fast_recovery_area'
*.db_recovery_file_dest_size=4800m
*.diagnostic_dest='/u01/app/oracle'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=CDBFSDCBXDB)'
*.enable_pluggable_database=true
*.log_archive_format='%t_%s_%r.dbf'
*.open_cursors=300
*.pga_aggregate_target=512m
*.processes=300
*.remote_login_passwordfile='EXCLUSIVE'
*.sga_target=1024m
*.undo_tablespace='UNDOTBS1'
*.db_file_name_convert='/u01/oradata/CDBFSDCA','/u01/oradata/CDBFSDCB'
*.log_file_name_convert='/u01/oradata/CDBFSDCA','/u01/oradata/CDBFSDCB'
You should now add the service names to the tnsnames.ora file. The entries used for this example are shown here:
CDBFSDCA =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST =server1.example.com)(PORT = 1521))
)
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = CDBFSDCA)
)
)
CDBFSDCB =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST =server2.example.com)(PORT = 1521))
)
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = CDBFSDCB)
)
)
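Before starting the duplication, it can be worth verifying basic TCP connectivity to both listener endpoints. A small sketch using bash's /dev/tcp feature; the hostnames and port are taken from the example, and on a working setup both lines should report "reachable":

```shell
# Probe both listener ports; a failed redirect means the port is unreachable
for h in server1.example.com server2.example.com; do
  if (exec 3<>"/dev/tcp/$h/1521") 2>/dev/null; then
    echo "$h:1521 reachable"
  else
    echo "$h:1521 not reachable"
  fi
done
```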
The following RMAN example will duplicate the database from the active database, in other words without an intermediate backup. For this to work, each database must be statically registered with the listener on its host. A sample listener configuration on server1 necessary for the duplicate ... from active database command is shown here:
$ cat $ORACLE_HOME/network/admin/listener.ora
[...]
SID_LIST_LISTENER =
(SID_LIST =
(SID_DESC =
(GLOBAL_DBNAME = CDBFSDCA)
(ORACLE_HOME = /u01/app/oracle/product/12.1.0.1/dbhome_1)
(SID_NAME = CDBFSDCA)
)
)
[...]
LISTENER =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1521))
)
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST =server1.example.com)(PORT = 1521))
)
)
[...]
Note that the listener.ora configuration file needs to be modified once more when implementing the Data Guard broker configuration. Make a corresponding change to server2's listener.ora file and reload both listeners before continuing. The duplicate ... from active database command restarts the auxiliary instance a number of times; without static listener registration this will certainly fail.
With all the supporting directories and other infrastructure in place you can now start the RMAN duplicate command. Since the example assumes an identical directory structure the following RMAN commands can be used:
[oracle@server1 ∼]$ . oraenv
ORACLE_SID = [CDBFSDCA] ?
The Oracle base remains unchanged with value /u01/app/oracle
[oracle@server1 ∼]$ rman target sys@cdbfsdca auxiliary sys@cdbfsdcb
Recovery Manager: Release 12.1.0.1.0 - Production on Sun Sep 15 19:48:19 2013
Copyright (c) 1982, 2013, Oracle and/or its affiliates. All rights reserved.
target database Password:
connected to target database: CDBFSDCA (DBID=2052895639)
auxiliary database Password:
connected to auxiliary database (not started)
RMAN> startup clone nomount
[...]
RMAN> run {
allocate channel c1 device type disk;
allocate channel c2 device type disk;
allocate auxiliary channel a1 device type disk;
allocate auxiliary channel a2 device type disk;
duplicate target database for standby from active database;
}
RMAN> 2> 3> 4> 5> 6> 7>
allocated channel: c1
channel c1: SID=133 device type=DISK
[...]
input datafile copy RECID=10 STAMP=826229614 file name=/u01/oradata/CDBFSDCB/PDB1/sysaux01.dbf
datafile 10 switched to datafile copy
input datafile copy RECID=11 STAMP=826229615 file name=/u01/oradata/CDBFSDCB/PDB1/PDB1_users01.dbf
Finished Duplicate Db at 15.09.2013 20:13:37
released channel: c1
released channel: c2
released channel: a1
released channel: a2
RMAN>
Recovery Manager complete.
Don’t forget to add standby redo logs on all databases in the Data Guard configuration! You can query the location and size of the online redo logs from the v$log and v$logfile views on the primary database. In the example, the following standby redo logs were created. The primary database used three online redo log groups, so four standby redo log groups were added, each sized to match the online redo logs (they would be larger in a production environment).
SQL> alter database add standby logfile
2 '/u01/oradata/CDBFSDCA/srl1.log' size 52428800;
Database altered.
Repeat the above command to create four groups of standby redo logs in total. Remember that you need n+1 standby redo log groups, one more than the number of online redo log groups. Even if you are multiplexing your online redo logs, which is less and less common these days, a single member per standby redo log group is sufficient. More information about standby redo logs is available in the v$standby_log view.
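Rather than typing the command four times, you can generate the statements in the shell and paste or pipe them into SQL*Plus. A small sketch; the path and size are taken from the example above:

```shell
# Generate the four ADD STANDBY LOGFILE statements
# (three online redo log groups + 1)
for i in 1 2 3 4; do
  printf "alter database add standby logfile '/u01/oradata/CDBFSDCA/srl%d.log' size 52428800;\n" "$i"
done
```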
You should double-check whether the primary database has force logging enabled, and enable it if needed. The query to use for this verification is “select force_logging from v$database.”
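If the query returns NO, enable force logging on the primary; this ensures that nologging operations are still fully recorded in the redo stream shipped to the standby databases:

```
SQL> alter database force logging;

Database altered.
```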
You can read in the next section how to create a standby database when using ASM. Following that section the two streams will come together again and you will read more about creating a Data Guard broker configuration.
Configuring Data Guard with Oracle Managed Files
One of the easier Data Guard configurations to implement uses Oracle Managed Files. OMF is optional when using a file system and implicit when using Oracle ASM. With an increasing number of Oracle hardware products and appliances using ASM under the covers, there seems to be a benefit in using ASM throughout.
The creation of the standby database begins either with a backup of the primary database, which needs to be made available on the standby system in exactly the same location where it was taken on the primary, or with the backup-less approach using the from active database clause demonstrated above, which will again be used in this scenario. If you are using a tape library the location requirement is not a problem; file system based backups, however, have to use the same directory structure.
Again, you should add the database name to the oratab file.
[oracle@server2 CDB1]$ echo "CDB1DCB:/u01/app/oracle/product/12.1.0.1/dbhome_1:N "
> >> /etc/oratab
Next, copy the password file and the parameter file to the standby host, and rename them to match the instance name. The standby database’s parameter file does not need many changes. A careful search-and-replace operation could again be your starting point, but remember that the db_name parameter must be identical on the primary and all standby databases or you will receive errors. The most common changes include:
You should also review the OMF-related parameters if your ASM disk groups or mount points are named differently from production. As this is a standard deployment, no such changes were needed. The data files are standardized to go into the +DATA disk group, and the database has its Fast Recovery Area set to +RECO. Here is the final parameter file for the standby database using ASM:
*.audit_file_dest='/u01/app/oracle/admin/CDB1/adump'
*.audit_trail='db'
*.compatible='12.1.0.0.0'
*.db_block_size=8192
*.db_create_file_dest='+DATA'
*.db_domain=''
*.db_name='CDB1'
*.db_recovery_file_dest='+RECO'
*.db_recovery_file_dest_size=4800m
*.db_unique_name='CDB1DCB'
*.diagnostic_dest='/u01/app/oracle'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=CDB1DCBXDB)'
*.enable_pluggable_database=true
*.open_cursors=300
*.pga_aggregate_target=384m
*.processes=300
*.remote_login_passwordfile='EXCLUSIVE'
*.sga_target=1152m
*.undo_tablespace='UNDOTBS1'
As was the case earlier, you need to ensure that the directories referenced in the parameter file exist on disk. You should also update the tnsnames.ora file to include the new databases; refer to the previous section for an example.
With the infrastructure in place it is time to start the auxiliary and initiate the duplication. Luckily with Oracle Managed Files you do not need to worry about file name conversions! Consider this example where the primary database CDB1DCA is duplicated to the standby CDB1DCB:
[oracle@server1 dbs]$ rman target sys/xxx@cdb1dca auxiliary sys/xxx@cdb1dcb
Recovery Manager: Release 12.1.0.1.0 - Production on Sun Sep 15 21:43:19 2013
Copyright (c) 1982, 2013, Oracle and/or its affiliates. All rights reserved.
connected to target database: CDB1 (DBID=799950173)
connected to auxiliary database: CDB1 (not mounted)
RMAN> duplicate target database for standby from active database;
Starting Duplicate Db at 15-SEP-13
using target database control file instead of recovery catalog
allocated channel: ORA_AUX_DISK_1
channel ORA_AUX_DISK_1: SID=123 device type=DISK
allocated channel: ORA_AUX_DISK_2
channel ORA_AUX_DISK_2: SID=242 device type=DISK
allocated channel: ORA_AUX_DISK_3
channel ORA_AUX_DISK_3: SID=362 device type=DISK
contents of Memory Script:
{
[...]
Finished Duplicate Db at 15-SEP-13
Please double-check whether the database has standby redo logs; if not, you should create them now. Since you do not need to specify absolute path names with OMF, you can use a small anonymous PL/SQL block to create the files. Assuming again that you have three online redo log groups, you can create the four required standby redo logs as shown:
SQL> begin
2 for i in 1..4 loop
3 execute immediate 'alter database add standby logfile size 52428800';
4 end loop;
5 end;
6 /
PL/SQL procedure successfully completed.
If you are using Oracle Restart, you should now register your new standby database with the framework. The srvctl command allows you to do this in a very elegant way. In the example below, database CDB1 is registered with the unique name CDB1DCB as a physical standby database. The diskgroup option allows Oracle to mount the DATA and RECO disk groups as part of the database start procedure. The management policy automatic, which is the default, instructs Oracle Restart to start the database as part of the server boot process. The mount start option ensures that the standby database is mounted, not opened read-only.
[oracle@server2 ∼]$ srvctl add database -db CDB1DCB -oraclehome $ORACLE_HOME -spfile
> /u01/app/oracle/product/12.1.0.1/dbhome_1/dbs/spfileCDB1DCB.ora
> -role PHYSICAL_STANDBY -startoption mount -dbname CDB1 -diskgroup RECO,DATA
After the command has been executed, you can view the configuration as it is stored in Oracle Restart:
[oracle@server2 ∼]$ srvctl config database -db $ORACLE_SID
Database unique name: CDB1DCB
Database name: CDB1
Oracle home: /u01/app/oracle/product/12.1.0.1/dbhome_1
Oracle user: oracle
Spfile: /u01/app/oracle/product/12.1.0.1/dbhome_1/dbs/spfileCDB1DCB.ora
Password file:
Domain:
Start options: mount
Stop options: immediate
Database role: PHYSICAL_STANDBY
Management policy: AUTOMATIC
Database instance: CDB1DCB
Disk Groups: RECO,DATA
Services:
In the next step you can start the database:
[oracle@server2 ∼]$ srvctl start database -db $ORACLE_SID
[oracle@server2 ∼]$ srvctl status database -db $ORACLE_SID
Database is running.
Congratulations, you have successfully created and registered a standby database in ASM!
CONSIDERATIONS FOR AUTOMATION
The above processes involve quite a few manual steps, and automating them could prove to be a difficult task. The situation looks a little brighter if you have Enterprise Manager, which allows you to create a standby database from an existing primary. The process is controlled via jobs in the Enterprise Manager repository.
To create a standby database this way you need to use the graphical user interface; at the time of writing there did not seem to be an option in emcli to create a standby database automatically.
Creating a Data Guard Broker configuration
After the standby databases have been physically created, they need to be made aware of their primary database. While the following steps can all be performed on the SQL*Plus command line, doing so requires additional skill and knowledge. The Data Guard Broker puts an abstraction layer on top of the SQL commands necessary to modify the initialization parameters and the sequence of commands to be executed during role transitions. A switchover, for example, can be simplified to:
DGMGRL> switchover to <standby database name>;
There is an undeniable elegance in this approach. First- or second-line support should be able to execute the above statement after appropriate sign-off. Should there be a problem during the switchover, the operator can escalate to level-two support to investigate.
If you intend to script the Broker configuration, maybe as part of a standard deployment mechanism, you could use the secure external password store so that the sensitive passwords do not need to appear in the script. The following examples use the secure password store extensively. The wallets were created on each host in the Data Guard configuration, all of them in /home/oracle/tns_admin. To ensure that the wallet works as expected, create a symbolic link for tnsnames.ora from $ORACLE_HOME/network/admin to ∼/tns_admin and export the environment variable TNS_ADMIN to point to ∼/tns_admin. Wallets can be a security problem, which is why you should create the wallet as explained in My Oracle Support note 1114500.1, using orapki with the -auto_login_local flag:
[oracle@server1 tns_admin]$ orapki wallet create -wallet . -auto_login_local
Oracle PKI Tool : Version 12.1.0.1
Copyright (c) 2004, 2012, Oracle and/or its affiliates. All rights reserved.
Enter password:
Enter password again:
Please note that the password must pass a simple validation; if it does not satisfy the criteria, an error is raised. As orapki will tell you, passwords must “have a minimum length of eight characters and contain alphabetic characters combined with numbers or special characters.”
[oracle@server1 tns_admin]$ mkstore -wrl . -createCredential tnsName sys sysPWD
Oracle Secret Store Tool : Version 12.1.0.1
Copyright (c) 2004, 2012, Oracle and/or its affiliates. All rights reserved.
Enter wallet password:
Create credential oracle.security.client.connect_string1
Repeat the call to mkstore for all databases in the configuration. In the next step you need to make Oracle aware of the wallet location. Edit the ∼/tns_admin/sqlnet.ora file and add the following, changing the path to your wallet location.
SQLNET.WALLET_OVERRIDE = TRUE
WALLET_LOCATION =
  (SOURCE =
    (METHOD = FILE)
    (METHOD_DATA =
      (DIRECTORY = /home/oracle/tns_admin)
    )
  )
You can override the location where Oracle expects the Oracle Net configuration by setting the environment variable TNS_ADMIN, as in the following:
[oracle@server1 ∼]$ export TNS_ADMIN=/home/oracle/tns_admin
[oracle@server1 ∼]$ tnsping $ORACLE_SID
TNS Ping Utility for Linux: Version 12.1.0.1.0 - Production on 16-SEP-2013 00:23:55
Copyright (c) 1997, 2013, Oracle. All rights reserved.
Used parameter files:
/home/oracle/tns_admin/sqlnet.ora
If all databases in the configuration have their Broker processes started by setting the initialization parameter dg_broker_start to true, then you should be able to create a new configuration. The Data Guard Broker command to use is create configuration. The syntax is not too complex:
CREATE CONFIGURATION <configuration name> AS
PRIMARY DATABASE IS <database name>
CONNECT IDENTIFIER IS <connect identifier>;
Following our example, the configuration is created using the following little snippet:
[oracle@server1 ∼]$ dgmgrl /@cdb1dca
DGMGRL for Linux: Version 12.1.0.1.0 - 64bit Production
Copyright (c) 2000, 2012, Oracle. All rights reserved.
Welcome to DGMGRL, type "help" for information.
Connected as SYSDBA.
DGMGRL> create configuration apress as
> primary database is "CDB1DCA"
> connect identifier is "CDB1DCA";
Configuration "apress" created with primary database "CDB1DCA"
DGMGRL>
Translated into plain English, the Broker was instructed to create a new configuration named apress using the primary database CDB1DCA. The connect identifier should match an entry in tnsnames.ora; it will be used whenever the Broker connects to a database in the configuration to query or modify properties. In the next step the standby databases are added.
DGMGRL> add database "CDB1DCB"
> as connect identifier is "CDB1DCB"
> maintained as physical;
Database "CDB1DCB" added
DGMGRL> add database "CDB1DCC"
> as connect identifier is "CDB1DCC"
> maintained as physical;
Database "CDB1DCC" added
DGMGRL>
The Broker has been told to add two databases to the configuration as physical standby databases. If you had a logical standby database, you would use the “maintained as logical” clause instead. Now review the configuration you just created:
DGMGRL> show configuration;
Configuration - apress
Protection Mode: MaxPerformance
Databases:
CDB1DCA - Primary database
CDB1DCB - Physical standby database
CDB1DCC - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
DISABLED
As you can see, Oracle has correctly added the two standby databases to the configuration. Thankfully the configuration is not enabled by default; there is some more work to be done first. Every database in the configuration can be queried and managed using the Broker interface. The databases have certain properties, some settable and some read-only, and you control them through the Broker; think of the Broker as another way of executing alter system statements.
Caution Do not try to modify Data Guard related parameters in the initialization files! You would only confuse the Broker. If you do so anyway, you will have to perform a manual reconciliation of the Broker configuration and the database parameters.
Other parameters have a different scope and belong to the configuration as a whole. The best example of these is the protection mode.
To complete the Broker configuration, the following tasks need to be executed:
The Broker syntax is not too daunting, but it is yet another tool to understand in addition to the more common SQL and RMAN. The main verbs you need to be aware of in the next two sections are edit and show. Each verb is followed by an object such as configuration or database; add the database name as defined in the configuration and the property you would like to view or change. The following sections contain all the examples you need to manage, modify, and otherwise maintain the Broker configuration. There is also a very useful help command in dgmgrl to remind yourself of the correct syntax.
Listener configuration
One of the greatest benefits of the Broker over the manual approach is that it fully automates role transitions, during which database restarts are required. As you know, a connection to a database that is shut down requires a static registration in the listener.ora file. Every listener on every host participating in the Data Guard configuration has to be amended; remember that the listener is configured in the Grid home if you are using Oracle Restart. The Broker uses a database property called StaticConnectIdentifier to address databases. It is populated by default and does not normally need to be changed. You can view it with the show database databaseName StaticConnectIdentifier command, as demonstrated here:
DGMGRL> show database "CDB1DCA" staticConnectIdentifier
StaticConnectIdentifier = '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.20)(PORT=1521))
(CONNECT_DATA=( SERVICE_NAME=CDB1DCA_DGMGRL)(INSTANCE_NAME=CDB1DCA)
(SERVER=DEDICATED)))'
DGMGRL>
The service name is the one that has to be registered in the listener.ora file. To follow the example, here is the listener.ora for the current primary database.
[grid@server1 admin]$ nl listener.ora
1 # listener.ora Network Configuration File:
/u01/app/grid/product/12.1.0.1/grid/network/admin/listener.ora
2 # Generated by Oracle configuration tools.
3 SID_LIST_LISTENER =
4 (SID_LIST =
5 (SID_DESC =
6 (GLOBAL_DBNAME = CDB1DCA)
7 (ORACLE_HOME = /u01/app/oracle/product/12.1.0.1/dbhome_1)
8 (SID_NAME = CDB1DCA)
9 )
10 (SID_DESC =
11 (GLOBAL_DBNAME = CDB1DCA_DGMGRL)
12 (ORACLE_HOME = /u01/app/oracle/product/12.1.0.1/dbhome_1)
13 (SID_NAME = CDB1DCA)
14 )
[configuration information removed for clarity]
25 )
26 VALID_NODE_CHECKING_REGISTRATION_LISTENER = SUBNET
27 LISTENER =
28 (DESCRIPTION_LIST =
29 (DESCRIPTION =
30 (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1521))
31 )
32 (DESCRIPTION =
33 (ADDRESS = (PROTOCOL = TCP)(HOST =server1.example.com)( PORT = 1521))
34 )
35 )
36 ADR_BASE_LISTENER = /u01/app/grid
37 ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER = ON
[grid@server1 admin]$
The additional information to be added is the static SID list starting in line 3; Grid Infrastructure does not add database services statically by default. As you can see, the global database name of the second entry (lines 10 to 14) is set to the service name you got from the static connect identifier. If you are using a database domain (initialization parameter db_domain), then append the domain to ${ORACLE_SID}_DGMGRL, for example CDB1DCA_DGMGRL.example.com. Do not forget to reload the listener for the changes to take effect. Be careful though with instances that register dynamically with the listener: it can take a little while until they announce their presence again. You might want to run “alter system register” on these and/or perform the operation during a quieter period. Before you continue with the other listeners you should do a quick connection test: launch SQL*Plus and connect as sys, using the static connect identifier, as sysdba. If you are told you are connected, chances are high that the Broker can connect to the database as well.
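Using the descriptor queried from the StaticConnectIdentifier property shown above, such a connection test could look like this:

```
[oracle@server1 ∼]$ sqlplus /nolog

SQL> connect sys@"(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.20)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=CDB1DCA_DGMGRL)(INSTANCE_NAME=CDB1DCA)(SERVER=DEDICATED)))" as sysdba
Enter password:
Connected.
```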
Standby file management
The management of data files should be automated. By default the DBA is responsible for manually creating data files on the standby databases after they have been added to the primary database. The addition of a new file to the primary database generates redo as well. With automatic standby file management the same statement is replayed on the standby database, and thanks to Oracle Managed Files or the file name conversion parameters in the initialization file the file is created in the correct location.
Before you change the standby file management on all databases, primary and standby alike, you should quickly check if your file name conversion parameters are set.
Note Skip this test if you are using Oracle Managed Files and your disk group names are identical.
The two file system based databases used in the earlier example should have data file and log file name conversion parameters set in both directions. In the example, the primary database did not have the conversion parameters set, which has been rectified:
DGMGRL> show database "CDBFSDCB" dbFileNameConvert
DbFileNameConvert = '/u01/oradata/CDBFSDCA, /u01/oradata/CDBFSDCB'
DGMGRL> show database "CDBFSDCB" logFileNameConvert
LogFileNameConvert = '/u01/oradata/CDBFSDCA, /u01/oradata/CDBFSDCB'
DGMGRL> edit database "CDBFSDCA" set property
> dbFileNameConvert = '/u01/oradata/CDBFSDCB, /u01/oradata/CDBFSDCA';
Warning: ORA-16675: database instance restart required for property value modification
to take effect
Property "dbfilenameconvert" updated
DGMGRL> edit database "CDBFSDCA" set property
> logFileNameConvert = '/u01/oradata/CDBFSDCB, /u01/oradata/CDBFSDCA';
Warning: ORA-16675: database instance restart required for property value modification
to take effect
Property "logfilenameconvert" updated
DGMGRL>
If your disk group names in ASM are the same across all databases in the configuration you do not need to set any file name conversion parameters. Now ensure that standby file management is set to auto:
DGMGRL> show database "CDBFSDCA" StandbyFileManagement
StandbyFileManagement = 'MANUAL'
DGMGRL> edit database "CDBFSDCA" set property
> StandbyFileManagement = auto;
Property "standbyfilemanagement" updated
Repeat the edit command for all standby databases in the configuration.
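For example, for the standby database CDBFSDCB:

```
DGMGRL> edit database "CDBFSDCB" set property
> StandbyFileManagement = auto;
Property "standbyfilemanagement" updated
```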
Configuring redo application delay
Sometimes it is useful to have a third database lag a little behind the primary database to allow more time to detect logical corruption. Humans are humans, and they make mistakes. Maybe someone starts a batch job at the end of the day with the wrong input parameters, or executes a data cleansing operation with the wrong predicates in the where clause. There are many ways your system could end up in a bad state. If one of your standby databases trails behind by, say, 30 minutes, then you have a little extra time to react before that database, too, has to be recreated.
To enable the apply delay, you need to change the database property DelayMins as shown here, where a 30-minute delay is set.
DGMGRL> edit database "CDB1DCC" set property DelayMins = 30;
Property "delaymins" updated
DGMGRL>
Having one standby database in the Data Guard configuration with an apply delay does not impact the ability to fail over to another standby database that is 100% in sync with production. Be aware, though, that a standby with an apply lag cannot be the target of a switchover operation.
Enabling the configuration
With all the above changes made, it is finally time to enable the configuration and to verify the setup. Enable the configuration while connected to the primary database in the Broker command line interface.
[oracle@server1 ∼]$ dgmgrl /@cdb1dca
...
DGMGRL> enable configuration;
If everything went according to plan, you should get the prompt back a few seconds after issuing the statement. To see if there are any irregularities or other problems, execute the show configuration command again:
DGMGRL> show configuration
Configuration - apress
Protection Mode: MaxPerformance
Databases:
CDB1DCA - Primary database
CDB1DCB - Physical standby database
CDB1DCC - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS
The most commonly reported problems at this stage are missing server parameter files or missing standby redo logs. There should be no problems at this stage if you followed the instructions in this chapter.
Changing the protection mode to maximum availability
You read in the previous chapter’s section about protection modes that you need to set the redo transport mode to SYNC if you want any of the higher protection levels. With synchronous shipping of redo, the time it takes for a commit to complete partially depends on network latency. If it takes considerable time for a network packet to travel to a standby database, the overall commit latency will suffer too. The advantage of the synchronous transfer mode, though, is better protection from data loss.
The protection mode Maximum Availability seems to be the best compromise between data protection and continuity of service. To enable it, the databases in the Data Guard configuration need to be switched to ship redo synchronously. Connected to the Broker command line utility, you need to update the LogXptMode database property, as shown here:
DGMGRL> edit database "CDB1DCA" set property LogXptMode=Sync;
Property "logxptmode" updated
DGMGRL> edit database "CDB1DCB" set property LogXptMode=Sync;
Property "logxptmode" updated
You may have noticed that the third standby, which is configured to delay the application of redo by 30 minutes, has not had its log transport property changed. In a delayed configuration it matters little whether the redo information is shipped asynchronously or synchronously.
As soon as the log ship property has been updated you can change the protection mode. The “edit configuration” command helps you do this. In the example the protection mode is upgraded from maximum performance to maximum availability. This upgrade does not require a restart of any database in the configuration.
DGMGRL> edit configuration set protection mode as maxAvailability;
Succeeded.
DGMGRL> show configuration
Configuration - apress
Protection Mode: MaxAvailability
Databases:
CDB1DCA - Primary database
CDB1DCB - Physical standby database
CDB1DCC - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS
The configuration is now changed to maximum availability. As you read in the previous chapter, Data Guard redo can be shipped to a standby that acknowledges receipt of the redo in a memory buffer before it has been written to disk. This mode is only available in Maximum Availability and is called “fastsync”. It corresponds to the options “SYNC NOAFFIRM” in the log_archive_dest_n initialization parameter. Before Oracle 12c you could only have ASYNC NOAFFIRM (logXptMode “async”) or SYNC AFFIRM (logXptMode “sync”). Refer back to the previous chapter for a discussion of the fastsync transport mode.
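If you want to experiment with the new transport mode, it is set via the same LogXptMode property. The following one-liner is a sketch based on the databases used in this chapter:

```
DGMGRL> edit database "CDB1DCB" set property LogXptMode='fastsync';
Property "logxptmode" updated
```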
Enabling Flashback on the standby
With the Broker active and managed recovery running you cannot add standby redo logs or enable flashback on the standby database. Trying to do so results in an error:
SQL> select flashback_on, open_mode from v$database;
FLASHBACK_ON OPEN_MODE
------------------ --------------------
NO MOUNTED
SQL> alter database flashback on;
alter database flashback on
*
ERROR at line 1:
ORA-01153: an incompatible media recovery is active
Please do not stop managed recovery manually with a SQL command! All Data Guard related maintenance has to be performed via the Broker. Instead, change the state of the standby database to “apply-off” using the Broker command line interface; this stops managed recovery.
DGMGRL> edit database "CDB1DCB" set state="apply-off";
Succeeded.
You can then enable the Flashback Database feature using the alter database flashback on command shown earlier, and re-enable managed recovery:
DGMGRL> edit database "CDB1DCB" set state="apply-on";
Succeeded.
Completely removing the Broker configuration
Sometimes it is necessary to completely remove a Data Guard Broker configuration. The main reason for having to remove the Broker configuration is a completely failed role transition paired with a network problem or some other hard-to-explain weirdness. In some rare circumstances the switchover succeeds, but the Broker is unaware that it did.
To remove the configuration you need to stop the Broker on each database in the configuration. If your configuration is not entirely broken, you may still be able to query which databases are currently part of it:
SQL> select DB_UNIQUE_NAME,PARENT_DBUN,DEST_ROLE from V$DATAGUARD_CONFIG;
DB_UNIQUE_NAME PARENT_DBUN DEST_ROLE
------------------------------ ------------------------------ -----------------
CDB1DCA NONE PRIMARY DATABASE
CDB1DCB CDB1DCA PHYSICAL STANDBY
CDB1DCC CDB1DCA PHYSICAL STANDBY
In our scenario there are three databases in the configuration: CDB1DCA, CDB1DCB, and CDB1DCC. The first step in removing the Broker configuration is to stop the Broker processes. On each database, issue this command:
SQL> alter system set dg_broker_start = false;
Then you need to remove the Broker configuration files. Their locations are stored in the initialization parameters dg_broker_config_file1 and dg_broker_config_file2. After the Broker processes are stopped, move these files to a different location. When you subsequently start the Broker on each database, the missing configuration files will be re-created as part of the process.
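Put together, the removal could look like the following sketch. The file names and paths shown are assumptions for illustration and will differ on your system:

```
SQL> alter system set dg_broker_start = false;
SQL> show parameter dg_broker_config_file

-- at the operating system level, move the configuration files out of the way
$ mv $ORACLE_HOME/dbs/dr1CDB1DCA.dat /tmp
$ mv $ORACLE_HOME/dbs/dr2CDB1DCA.dat /tmp

-- restarting the Broker re-creates the missing files
SQL> alter system set dg_broker_start = true;
```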
Performing a graceful switchover operation
Regular disaster recovery tests are essential to build confidence in the procedure and to detect changes to the setup that require either fixing or an update to the documentation. Very often it is not the database that causes problems (Data Guard is very reliable) but hard-coded connection strings in applications. Such problems are easily identified by regular switchover exercises.
With the Broker configuration in place it is very simple to perform the switchover. A first check should be the switchover readiness of the database you want to switch to. In a multi-standby scenario you should switch over to the database that is most in sync with production. Technically there is nothing wrong with switching over to a database that is hours behind the primary, but to save time and nerves it is better not to have to wait for all that redo to be shipped and applied before the switchover can actually occur. If you are unsure about transport and apply lags, you can query the standby databases in your configuration as shown here:
[oracle@server2 ∼]$ dgmgrl /@cdbfsdcb
DGMGRL for Linux: Version 12.1.0.1.0 - 64bit Production
Copyright (c) 2000, 2012, Oracle. All rights reserved.
Welcome to DGMGRL, type "help" for information.
Connected as SYSDBA.
DGMGRL> show database "CDBFSDCB"
Database - CDBFSDCB
Role: PHYSICAL STANDBY
Intended State: APPLY-ON
Transport Lag: 0 seconds (computed 0 seconds ago)
Apply Lag: 0 seconds (computed 0 seconds ago)
Apply Rate: 1.22 MByte/s
Real Time Query: OFF
Instance(s):
CDBFSDCB
Database Status:
SUCCESS
The example below extends the file system scenario; there are two databases in the configuration. Connect to the current primary database and query the state of the configuration:
DGMGRL> show configuration
Configuration - fsconfig
Protection Mode: MaxPerformance
Databases:
CDBFSDCA - Primary database
CDBFSDCB - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS
The Data Guard Broker returned success: there are no known problems that could prevent a switchover from succeeding. To be safe, you can additionally query the database for its readiness to switch roles:
DGMGRL> validate database "CDBFSDCB"
Database Role: Physical standby database
Primary Database: CDBFSDCA
Ready for Switchover: Yes
Ready for Failover: Yes (Primary Running)
Flashback Database Status:
CDBFSDCA: Off
CDBFSDCB: Off
Current Log File Groups Configuration:
Thread # Online Redo Log Groups Standby Redo Log Groups
(CDBFSDCA) (CDBFSDCB)
1 3 2
Future Log File Groups Configuration:
Thread # Online Redo Log Groups Standby Redo Log Groups
(CDBFSDCB) (CDBFSDCA)
1 3 2
As you can see from the above output the database is ready for switchover, so let’s try:
DGMGRL> switchover to "CDBFSDCB"
Performing switchover NOW, please wait...
New primary database "CDBFSDCB" is opening...
Operation requires startup of instance "CDBFSDCA" on database "CDBFSDCA"
Starting instance "CDBFSDCA"...
ORACLE instance started.
Database mounted.
Switchover succeeded, new primary is "CDBFSDCB"
DGMGRL>
Congratulations, you have successfully switched over to the standby database. Many DBAs worry about what could happen during a switchover procedure, and the good news is: not too much. The worst case is a network problem right at the time the switchover is initiated. As part of the switchover, Oracle sends a special token over the network indicating the last redo to be sent. This can be seen in the new primary database’s alert.log:
2013-09-16 11:08:47.856000 +01:00
Media Recovery Log
/u01/fast_recovery_area/CDBFSDCB/archivelog/2013_09_16/o1_mf_1_14_93fp5h61_.arc
Resetting standby activation ID 2052880791 (0x7a5c7997)
Media Recovery End-Of-Redo indicator encountered
Media Recovery Continuing
Media Recovery Waiting for thread 1 sequence 15
SWITCHOVER: received request 'ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY'
from primary database.
ALTER DATABASE SWITCHOVER TO PRIMARY (CDBFSDCB)
[...]
SwitchOver after complete recovery through change 1842430
[...]
Standby became primary SCN: 1842428
2013-09-16 11:08:54.953000 +01:00
Switchover: Complete - Database mounted as primary
SWITCHOVER: completed request from primary database.
The switchover occurs only once all redo information has been received; therefore you cannot incur data loss. As part of the switchover, all databases in the Data Guard configuration become standby databases (again, just a flag in the control file). If the switchover fails for any reason, the database can be converted back to a primary and resume normal operations.
If you made a mistake when statically configuring the DGMGRL services with the listeners, you will get a message that you have to start the old primary manually in order to complete the switchover. This is not a problem; once done, the Broker should return success when you check the status of the configuration. Verify that your configuration is what you expect it to be:
DGMGRL> show configuration
Configuration - fsconfig
Protection Mode: MaxPerformance
Databases:
CDBFSDCB - Primary database
CDBFSDCA - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS
If you were using Block Change Tracking (BCT) on the primary before the switchover to facilitate faster level 1 backups, ensure that you enable it again once the switchover has completed, if you still need it. By default the block change tracking file will not be created and enabled on the new primary. Be careful enabling BCT on the standby database: doing so requires a license for Active Data Guard. If your switchover operation is not part of a scheduled test, then please test the connectivity between your backup solution and the database, the archive log deletion policy, and any other infrastructure tasks that you regularly perform to comply with your standards and outside regulators. Also verify that the log file name and data file name conversion parameters are set on the new standby, and that it has standby redo logs; it always pays off to configure databases in a Data Guard environment for all roles. If you rely on Oracle Clusterware to start and stop databases, review the configuration to ensure that the standby database starts into the appropriate mode. Remember the license implications of having a standby database open read-only while applying logs: you want the primary database to be open in read-write mode.
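Re-enabling block change tracking on the new primary is a single statement. The file location below is an assumption for illustration; with OMF you can omit the using file clause:

```
SQL> alter database enable block change tracking
  2  using file '/u01/oradata/CDBFSDCB/bct.dbf';
```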
Performing a failover operation
A failover operation is not usually something users are happy to test in a non-crisis situation. The decision to fail over has to be made by senior management for an application, and the procedure should have been practiced and documented to avoid mistakes when it happens for real. Coordinating a failover will be even more difficult in a DBaaS environment where all of the application owners have to agree. As with the switchover, the failover target should be selected in such a way that data loss is minimized. The transport lag plays an important role, especially in protection mode Maximum Performance. Remember that the redo is sent directly to standby redo logs, from where it is applied to the standby database with Real Time Apply. With the default settings for the maximum availability mode, transactions commit on the primary only after the standby database has confirmed that the redo has been received and written to the SRLs; the transport lag is therefore probably very small.
You could use the Broker command line interface to check the transport lag before a failover operation. Consider the following example, where database CDBFSDCB is the standby database for CDBFSDCA:
DGMGRL> show database "CDBFSDCB"
Database - CDBFSDCB
Role: PHYSICAL STANDBY
Intended State: APPLY-ON
Transport Lag: 0 seconds (computed 0 seconds ago)
Apply Lag: 0 seconds (computed 0 seconds ago)
Apply Rate: 115.00 KByte/s
Real Time Query: OFF
Instance(s):
CDBFSDCB
Database Status:
SUCCESS
The conditions for a failover seem favorable. As with the graceful switchover operation you can check the destination database’s role change readiness (the primary database has by now crashed):
DGMGRL> validate database "CDBFSDCB"
Database Role: Physical standby database
Primary Database: CDBFSDCA
Warning: primary database was not reachable
Ready for Switchover: No
Ready for Failover: Yes (Primary Not Running)
Temporary Tablespace File Information:
CDBFSDCA TEMP Files: Unknown
CDBFSDCB TEMP Files: 3
Flashback Database Status:
CDBFSDCA: Unknown
CDBFSDCB: On
Data file Online Move in Progress:
CDBFSDCA: Unknown
CDBFSDCB: No
Transport-Related Information:
Transport On: No
Gap Status: Unknown
Transport Lag: 0 seconds
Transport Status: Success
Log Files Cleared:
CDBFSDCA Standby Redo Log Files: Unknown
CDBFSDCB Online Redo Log Files: Cleared
DGMGRL>
The database is indeed ready for a failover, and you are informed that the primary database is not running. Unlike with a switchover, there are two possible ways to perform the failover: the preferred and strongly recommended complete failover, and the emergency immediate failover, to be used only if there is no alternative.
Performing an immediate failover
The immediate failover is the panic button. It is not recommended unless there really is no other way to fail over, because you will almost certainly incur data loss: no additional redo is applied to the standby, so any transport lag translates directly into lost data.
For these reasons the immediate failover is not discussed further here. The only plausible use case is to allow a standby that is lagging behind by design to take over the primary role after a release went wrong or similar logical corruption occurred on the primary; your monitoring tools should have picked that up.
Caution Always use the complete failover! The immediate failover is the last-resort-panic button to be avoided wherever possible.
Performing a complete failover
The complete failover, which Oracle recommends over the previously discussed immediate failover, is the default. During a complete failover, Oracle tries to salvage as much redo from the primary as possible. This includes a new Oracle feature that allows you to manually mount the old primary and flush previously unsent redo; at the time of this writing there was no Broker support for doing so. This can allow zero data loss, even though the Data Guard configuration is not set to such a protection mode. If the old primary cannot be mounted, this operation obviously cannot take place.
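Should you be able to mount the old primary after the crash, the unsent redo can be flushed manually. The statement below is a sketch of the 12c syntax, issued on the mounted old primary with the new primary’s db_unique_name as the target:

```
SQL> alter system flush redo to 'CDBFSDCB';
```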
The failover itself executes very rapidly, as the example below shows:
DGMGRL> show configuration
Configuration - fsconfig
Protection Mode: MaxAvailability
Databases:
CDBFSDCA - Primary database
CDBFSDCB - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
ORA-12543: TNS:destination host unreachable
ORA-16625: cannot reach database "CDBFSDCA"
DGM-17017: unable to determine configuration status
As you can see, the primary database is inaccessible: the listener is down, which implies a severe problem with the host. Further investigation revealed that the whole rack had gone offline due to a problem with the power supply. After careful consideration and application of the standard Business Continuity Plan, the decision was made to fail over to the standby database. The DBA then enters the failover command as demonstrated below; the connection to the Broker is initiated on the standby host to the standby database.
DGMGRL> help failover
Changes a standby database to be the primary database
Syntax:
FAILOVER TO <standby database name> [IMMEDIATE];
DGMGRL> failover to "CDBFSDCB";
Performing failover NOW, please wait...
Failover succeeded, new primary is "CDBFSDCB"
The whole failover operation does not take long. The good news is that the operation did not involve data loss. The new primary database’s alert.log reports:
Data Guard Broker: Beginning failover
ALTER DATABASE FAILOVER TO CDBFSDCB
[...log information about database recovery]
Terminal Recovery finished with No-Data-Loss
2013-09-16 12:05:57.922000 +01:00
Incomplete Recovery applied until change 1870588 time 09/16/2013 11:57:20
Media Recovery Complete (CDBFSDCB)
Terminal Recovery: successful completion
As part of the failover operation the protection level is automatically downgraded, which is reflected in the output of the “show configuration” command.
DGMGRL> show configuration
Configuration - fsconfig
Protection Mode: MaxAvailability
Databases:
CDBFSDCB - Primary database
Warning: ORA-16629: database reports a different protection level from the protection mode
CDBFSDCA - Physical standby database (disabled)
ORA-16661: the standby database needs to be reinstated
Fast-Start Failover: DISABLED
Configuration Status:
WARNING
Two tasks now remain. The message that the database reports a different protection level from the protection mode refers to V$DATABASE.PROTECTION_MODE and PROTECTION_LEVEL; the two are not identical because there is a network connectivity problem between the databases. Additionally, the old primary database will not currently accept archived logs from the new primary: it still thinks it is the primary. Proceed with the reinstatement of the failed primary.
Reinstating the old primary database
Up to Oracle 10g, a failover operation always required a complete rebuild of the failed database. Luckily this changed with the introduction of Flashback Database. In addition to recreating the failed standby, which might still be required, you have the option to reinstate the failed database. Reinstating the database is a lot quicker than a recreation: it simply flashes the database back to the SCN recorded before the standby became a primary (as per V$DATABASE.STANDBY_BECAME_PRIMARY_SCN on the new primary) and converts it to a standby database. If such an operation is possible, the Broker will report the following message when you query the configuration status:
CDBFSDCA - Physical standby database (disabled)
ORA-16661: the standby database needs to be reinstated
If the database cannot be reinstated, you will see the following message instead:
CDBFSDCA - Physical standby database (disabled)
ORA-16795: the standby database needs to be re-created
If you had flashback database enabled and the required flashback and archived redo logs are present you can try to reinstate the database as shown here. The database can obviously be reinstated only if it can be brought online into the mount state, either manually or via the broker.
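Before issuing the reinstate command you may have to bring the failed database into the mount state yourself. A minimal sketch, assuming a single-instance database managed manually rather than by Clusterware:

```
[oracle@server1 ~]$ sqlplus / as sysdba
SQL> startup mount
```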
DGMGRL> reinstate database "CDBFSDCA"
Reinstating database "CDBFSDCA", please wait...
Reinstatement of database "CDBFSDCA" succeeded
Success! The database is now a fully functional member of the Broker configuration again:
DGMGRL> show configuration
Configuration - fsconfig
Protection Mode: MaxAvailability
Databases:
CDBFSDCB - Primary database
CDBFSDCA - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS
Once redo shipping has resumed after reinstating the failed database, the protection_mode and protection_level columns in v$database will both show “MAXIMUM AVAILABILITY”. You can then raise a change request to reverse the roles during the next change window. Before doing so, ensure that the Broker does not complain; missing standby redo logs are a common cause of warnings. Block Change Tracking might need to be enabled, and the Oracle Clusterware metadata for the databases should be reviewed and amended where needed.
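You can verify the protection mode and level from SQL*Plus on either database; a simple sketch:

```
SQL> select protection_mode, protection_level
  2  from v$database;
```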
Creating a lights-out configuration using the Broker
The Data Guard Broker can also be configured for lights-out operation. Lights-out can be understood as an environment where the management of failover operations is delegated to software; in the case of Data Guard this software is called the Observer. The Observer process, which should ideally run in a third site, has the task of monitoring the primary database. Under certain conditions the Observer can initiate the failover operation without any human intervention. Refer back to Figure 10-1 for a description of the architecture. The Observer in this case is located in data center C, and the database configuration consists of the primary database in data center A and a single standby database in data center B. This Broker configuration has no third database, although it is possible to have many standby databases, even a mix of physical and logical.
The Fast-Start Failover (FSFO) configuration is always based on an existing Broker configuration such as the one shown here:
DGMGRL> show configuration verbose
Configuration - fsconfig
Protection Mode: MaxAvailability
Databases:
CDBFSDCB - Primary database
CDBFSDCA - Physical standby database
Properties:
FastStartFailoverThreshold = '30'
OperationTimeout = '30'
TraceLevel = 'USER'
FastStartFailoverLagLimit = '30'
CommunicationTimeout = '180'
ObserverReconnect = '0'
FastStartFailoverAutoReinstate = 'TRUE'
FastStartFailoverPmyShutdown = 'TRUE'
BystandersFollowRoleChange = 'ALL'
ObserverOverride = 'FALSE'
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS
Fast-Start Failover (FSFO) is disabled, as you can see. As a prerequisite to enabling FSFO, the primary and the target standby database must have Flashback Database enabled. Ideally the other databases in the configuration have flashback enabled as well, but this is not mandatory. It is possible to configure FSFO with a configuration using maximum performance (asynchronous redo shipping), but that is not recommended because of the risk of data loss: if you trust software to perform the failover operation, you want to be in a position where no data loss is expected. The previously suggested maximum availability configuration is used in this example, and the Oracle documentation states that it ensures there is no data loss.
In situations where you have multiple standby databases, you need to define which standby database you would like the Observer to fail over to; it can only ever fail over to that one database. Setting the database property FastStartFailoverTarget on the primary database triggers the Broker to update all other databases in the configuration. It needs to be set only if you have more than one standby database. Another parameter to consider is FastStartFailoverThreshold, which defines the maximum duration in seconds during which neither the standby nor the Observer has connectivity with the primary database. The Observer initiates an automatic failover if no communication with the primary is possible for more than FastStartFailoverThreshold seconds. During the process the primary will shut down by default. As part of the failover procedure, the failed primary database is automatically reinstated by the Broker once it enters the mount state, unless you configure it not to be.
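Translated into Broker commands, preparing FSFO might look like the following sketch. The property values shown are illustrative, not recommendations; note that the target is set on the current primary, CDBFSDCB:

```
DGMGRL> edit database "CDBFSDCB" set property FastStartFailoverTarget='CDBFSDCA';
Property "faststartfailovertarget" updated
DGMGRL> edit configuration set property FastStartFailoverThreshold=45;
Property "faststartfailoverthreshold" updated
```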
With the default configuration parameters in place you can enable FSFO, connected as SYS to the primary database:
DGMGRL> ENABLE FAST_START FAILOVER;
Enabled.
DGMGRL> show configuration
Configuration - fsconfig
Protection Mode: MaxAvailability
Databases:
CDBFSDCB - Primary database
Warning: ORA-16819: fast-start failover observer not started
CDBFSDCA - (*) Physical standby database
Warning: ORA-16819: fast-start failover observer not started
Fast-Start Failover: ENABLED
Configuration Status:
WARNING
DGMGRL>
The little asterisk next to CDBFSDCA indicates that this standby database is the preferred failover target.
TROUBLESHOOTING USING THE BROKER LOGFILE
It is possible for the enable fast_start failover command to fail, and when it does the cause is not immediately obvious. The Broker commands “show configuration” and “show database” do not reveal the cause of the failure; in such cases you need to check the Broker log. The log file used by the Broker resides in the trace directory, just like the text representation of the database’s alert.log. If you are unsure where to find the directory on disk, simply query the database:
SQL> select value from v$diag_info
2 where name = 'Diag Trace';
The Broker logfile is named drc${ORACLE_SID}.log. If you tail it just before trying to enable FSFO, you should see a meaningful error, like this one:
09/16/2013 15:25:14
ORA-16693, FSFO requires the target standby to be in flashback mode
The error message indicates the requirement for the target standby database to have Flashback Database enabled.
In the next step you need to start the Observer on the third site. The Observer is simply an instance of the dgmgrl utility constantly checking the standby and primary database, so it is sufficient to install the Oracle client for administrators, or alternatively make use of the Broker binaries in an existing RDBMS server installation. Note that the prompt will not return after you start the Observer, making it advisable to start it in its own screen session. Screen is a little command line utility that, in the author’s opinion, should be installed by default on all machines. Similar in principle to VNC, it allows you to connect to a server (over SSH in this case) and keep your session on the server. Screen lets you execute commands “in the background” using virtual sessions; the biggest benefit is that a loss of network connectivity does not interrupt your work. When you reconnect to the server, you can re-attach to your screen session and resume where you left off. When started as “screen -L”, it records all output in a log file in the current directory. Once you have started the Observer as shown below, you can detach from the screen session, leaving the process running. Refer to the manual page of screen for more information on how to use it effectively.
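A typical screen workflow for the Observer could look like this sketch; the session name is arbitrary:

```
[oracle@server3 ~]$ screen -L -S observer    # start a named session that logs output
  ... start dgmgrl and the observer inside the session ...
  # press Ctrl-a d to detach, leaving the observer running; later:
[oracle@server3 ~]$ screen -r observer       # re-attach to the session
```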
Note that the observer is started from a host in data center C; none of the databases in the Broker configuration are located in this data center. The following example is executed from a screen session to allow the observer to execute even if the session to the database host is lost or disconnected:
[oracle@server3 ∼]$ cd ∼/tns_admin/
[oracle@server3 tns_admin]$ export TNS_ADMIN=$(pwd)
[oracle@server3 tns_admin]$ dgmgrl /@cdbfsdcb
DGMGRL for Linux: Version 12.1.0.1.0 - 64bit Production
Copyright (c) 2000, 2012, Oracle. All rights reserved.
Welcome to DGMGRL, type "help" for information.
Connected as SYSDBA.
DGMGRL> start observer
Observer started
[blinking cursor; the command will not return]
After the Observer has been started the warnings in the Broker configuration disappear. You can query the FSFO configuration in two ways: showing the configuration as demonstrated before or by using the show fast_start failover command:
DGMGRL> show configuration verbose
Configuration - fsconfig
Protection Mode: MaxAvailability
Databases:
CDBFSDCB - Primary database
CDBFSDCA - (*) Physical standby database
(*) Fast-Start Failover target
Properties:
FastStartFailoverThreshold = '30'
OperationTimeout = '30'
TraceLevel = 'USER'
FastStartFailoverLagLimit = '30'
CommunicationTimeout = '180'
ObserverReconnect = '0'
FastStartFailoverAutoReinstate = 'TRUE'
FastStartFailoverPmyShutdown = 'TRUE'
BystandersFollowRoleChange = 'ALL'
ObserverOverride = 'FALSE'
Fast-Start Failover: ENABLED
Threshold: 30 seconds
Target: CDBFSDCA
Observer: server3.example.com
Lag Limit: 30 seconds (not in use)
Shutdown Primary: TRUE
Auto-reinstate: TRUE
Observer Reconnect: (none)
Observer Override: FALSE
Configuration Status:
SUCCESS
The output of the Fast Start Failover configuration is shown next:
DGMGRL> show fast_start failover
Fast-Start Failover: ENABLED
Threshold: 30 seconds
Target: CDBFSDCA
Observer: server3.example.com
Lag Limit: 30 seconds (not in use)
Shutdown Primary: TRUE
Auto-reinstate: TRUE
Observer Reconnect: (none)
Observer Override: FALSE
Configurable Failover Conditions
Health Conditions:
Corrupted Controlfile YES
Corrupted Dictionary YES
Inaccessible Logfile NO
Stuck Archiver NO
Datafile Offline YES
Oracle Error Conditions:
(none)
DGMGRL>
This output summarizes the settings that trigger a failover; the ones shown here are the defaults. Note that a graceful shutdown of the primary will not automatically trigger a failover. Translated into English, the FSFO process will be initiated if the Observer and the standby database cannot “see” the primary database for more than 30 seconds. The lag limit, which defines the maximum time the standby may lag behind the primary and still be an eligible target for an automatic failover, is not used in a Maximum Availability configuration. As part of the automated process the old primary will be shut down to ensure no one can connect to it, and it will subsequently be reinstated. Finally, the Observer will not reconnect to its targets; this is a property you should probably change. The value is in seconds: do not set it so low that frequent reconnections cause unnecessary overhead, but do not set it too high either. Allowing the Observer to reconnect gives you more resilience against transient network glitches.
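Changing the reconnection behavior is a single property edit; the value of 60 seconds below is an example, not a recommendation:

```
DGMGRL> edit configuration set property ObserverReconnect=60;
Property "observerreconnect" updated
```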
The section “Configurable Failover Conditions” allows the administrator to fine-tune the failover conditions, and you can even add your own custom conditions using ORA-xxxxx error codes. If, for example, network connectivity from data center A to the rest of the world now fails, the Broker will automatically fail over to the designated standby database. The output from the Observer process in data center C indicates that there has indeed been a problem:
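Conditions are toggled with the enable/disable fast_start failover condition commands. The sketch below enables one of the predefined health conditions and adds ORA-27102 (out of memory) purely as an illustration of a custom error condition:

```
DGMGRL> enable fast_start failover condition "Stuck Archiver";
Succeeded.
DGMGRL> enable fast_start failover condition 27102;
Succeeded.
```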
DGMGRL> start observer
Observer started
17:55:11.81 Monday, September 16, 2013
Initiating Fast-Start Failover to database "CDBFSDCA"...
Performing failover NOW, please wait...
Failover succeeded, new primary is "CDBFSDCA"
17:55:37.00 Monday, September 16, 2013
The old primary is shut down at this stage, and the Broker logfile recorded more information as to what happened.
09/16/2013 17:55:12
FAILOVER TO CDBFSDCA
Beginning failover to database CDBFSDCA
Notifying Oracle Clusterware to prepare database for FAILOVER
09/16/2013 17:55:19
DMON: Old primary "CDBFSDCB" needs reinstatement
09/16/2013 17:55:33
Warning: Another process is opening pluggable databases
Waiting for open to complete.
Will retry in 15 seconds, maximum wait time is 15 minutes
09/16/2013 17:55:36
Protection mode set to MAXIMUM AVAILABILITY
09/16/2013 17:55:37
Deferring associated archivelog destinations of sites permanently disabled due to Failover
Notifying Oracle Clusterware to buildup primary database after failover
Posting DB_DOWN alert ...
... with reason Data Guard Fast-Start Failover - Primary Disconnected
Notice how the Broker detected that the old primary needs to be reinstated. As soon as the old primary is started again, it will be reinstated:
18:01:19.41 Monday, September 16, 2013
Initiating reinstatement for database "CDBFSDCB"...
Reinstating database "CDBFSDCB", please wait...
Reinstatement of database "CDBFSDCB" succeeded
18:01:58.56 Monday, September 16, 2013
The configuration shows no more warnings, and you can switch over to the old primary during the next change window.
Maintaining archived logs
Maintaining archived redo logs in a Data Guard configuration is something that needs careful planning. A very common reason for standby databases falling out of sync with the primary is the archive log destination filling up. Without space, no additional redo can be applied on the standby, and a flurry of errors will be recorded in the alert.log of both the primary database and the standby database that has run out of space. A fantastic way to deal with archived logs (and not only them) is the Fast Recovery Area (FRA). It was introduced with Oracle 10g as the Flash Recovery Area but was renamed to Fast Recovery Area in a later release; for all practical purposes the two are the same.
The goal of the FRA is to simplify the management of files that need backing up. Files stored in the FRA include
The FRA is defined by two initialization parameters, and can be configured while the database is up and running, as shown here:
SQL> alter system set db_recovery_file_dest_size = 40G;
System altered.
SQL> alter system set db_recovery_file_dest = '/u01/fast_recovery_area';
System altered.
Note that you have to define the size of the FRA before you can set the location. Oracle will then automatically create a directory /u01/fast_recovery_area/db_unique_name to store the contents pertaining to this database. For each file type, you will see an OMF-compliant structure. Archived logs are stored in ./archivelog/yyyy_mm_dd/ with an OMF file name, regardless of the log_archive_format you may have set. If you are on ASM, you set the FRA location to an ASM disk group, +RECO for example, and Oracle will take care of the rest.
So why is the FRA such a great help for the DBA? This has to do with the way the FRA handles space pressure. When space runs very low, the FRA automatically clears files that are no longer needed. It cannot work wonders, though: if there is a guaranteed restore point, or if logs have not yet been applied to the standby, the files cannot be deleted. The same is true for Streams or GoldenGate replication. The mechanism governing the automatic deletion of archived logs depends on the backup strategy. It is an RMAN setting called ARCHIVELOG DELETION POLICY. In the context of Data Guard the policy can be set to either of these values:
Note Refer to Chapter 11 for more information about the Recovery Manager backup strategies available.
If you are implementing a backup strategy where you back up archived logs on the primary database, you could set the deletion policy either to “none” or to “backed up n times to device type sbt.” Setting the policy to “none,” which is the default, implies that archived logs in the FRA can still be deleted! However, to be deleted they have to meet all of the following conditions:
You could use the “backed up n times” policy instead to gain more control over the deletion process. An archived log will only be deleted if the specified number of copies exists on the required medium, tape or disk.
The “applied to/shipped to standby” policies are very useful for managing space pressure. With the “applied on all standby” policy, archived redo logs are deleted only after they have actually been applied on all standby databases. Conversely, the “shipped to all standby” policy marks archived redo logs eligible for deletion as soon as they have been transferred to the remote destination.
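As a sketch of how such a policy could be put in place, the following RMAN commands show the two variants discussed above. The copy count of 2 and the SBT device type are assumptions; adapt them to your own backup strategy:

```sql
# Delete archived logs only after every standby has applied them
CONFIGURE ARCHIVELOG DELETION POLICY TO APPLIED ON ALL STANDBY;

# Alternatively, tie deletion to the backup strategy instead
CONFIGURE ARCHIVELOG DELETION POLICY TO BACKED UP 2 TIMES TO DEVICE TYPE SBT;
```

Whichever variant you choose, the policy only marks files as eligible for deletion; the FRA then purges them automatically when it comes under space pressure.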
Oracle’s recommendation with regard to these policies has been defined in the Maximum Availability Architecture as follows:
The following example clarifies the point. The current space usage in the Fast Recovery Area can be queried from within a database:
SQL> select * from v$flash_recovery_area_usage;
FILE_TYPE PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES CON_ID
----------------------- ------------------ ------------------------- --------------- ----------
CONTROL FILE .38 0 1 0
REDO LOG 71.44 0 7 0
ARCHIVED LOG 1.42 0 11 0
BACKUP PIECE .38 0 1 0
IMAGE COPY 0 0 0 0
FLASHBACK LOG 0 0 0 0
FOREIGN ARCHIVED LOG 0 0 0 0
AUXILIARY DATAFILE COPY 0 0 0 0
8 rows selected.
You can see that the FRA, which had its size artificially reduced for this demonstration, is under significant space pressure. Over the next few minutes a few forced log switches on the primary increased it even further. Luckily the standby database is applying the logs using Real-Time Apply, so the logs do not need to stay on disk for long. And indeed, thanks to the chosen log deletion policy (applied on all standby), the Oracle Managed Files are purged from the FRA as soon as they are no longer needed. This is reflected in the alert.log of the standby database:
[oracle@server1 trace]$ grep "Deleted Oracle managed" alert_CDB1DCA.log
Deleted Oracle managed file +RECO/CDB1DCA/ARCHIVELOG/2013_09_16/thread_1_seq_6.257.826281897
Deleted Oracle managed file +RECO/CDB1DCA/ARCHIVELOG/2013_09_16/thread_1_seq_7.264.826281945
Deleted Oracle managed file +RECO/CDB1DCA/ARCHIVELOG/2013_09_16/thread_1_seq_8.269.826300853
Querying the space usage again, you can see that both the space used by archived logs and the number of archived logs have changed.
SQL> select file_type,percent_space_used,percent_space_reclaimable,number_of_files
2 from V$RECOVERY_AREA_USAGE;
FILE_TYPE PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES
----------------------- ------------------ ------------------------- ---------------
CONTROL FILE 3.6 0 1
REDO LOG 71.4 0 7
ARCHIVED LOG 3.4 .8 3
BACKUP PIECE 0 0 0
IMAGE COPY 0 0 0
FLASHBACK LOG 0 0 0
FOREIGN ARCHIVED LOG 0 0 0
AUXILIARY DATAFILE COPY 0 0 0
8 rows selected.
Caution Oracle’s documentation states that files required for Streams may be deleted from the FRA. Ensure you have backups of these files, for example by creating intra-day archivelog backups.
The above usage of the Fast Recovery Area does not relieve you from the task of monitoring the archive destination. It should, however, allow you to simplify the management of archived redo logs on systems where they are not constantly backed up and purged by RMAN.
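For that monitoring, a query against V$RECOVERY_FILE_DEST gives the overall picture in one row. The view and its columns are standard; the idea of alerting on net usage is a suggestion you should adapt to your environment:

```sql
-- Overall FRA capacity: configured limit, space used, and reclaimable space.
-- A ratio of (space_used - space_reclaimable) / space_limit close to 1
-- indicates real space pressure; pick an alert threshold suitable for your site.
select name,
       round(space_limit/1024/1024/1024, 1)       as limit_gb,
       round(space_used/1024/1024/1024, 1)        as used_gb,
       round(space_reclaimable/1024/1024/1024, 1) as reclaimable_gb,
       number_of_files
  from v$recovery_file_dest;
```

Unlike V$RECOVERY_AREA_USAGE, which breaks usage down per file type, this view is well suited to a simple threshold check in a monitoring script.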
Data Guard specifics for Pluggable Databases
Pluggable Databases are one of the most interesting new features of Oracle 12c, and you can read a lot about them in Chapter 7. There are a few specifics when it comes to Data Guard and Pluggable Databases. No need to worry: all the previous examples have been using a Container Database (CDB). The following cases need closer examination:
As of this writing the scope of Data Guard is the Container Database, not the individual Pluggable Database. As you saw earlier in the chapter, role changes are performed for the Container. Following that logic you will notice that managed recovery is active for the CDB, not selectively for individual PDBs.
Using Data Guard with the Active Data Guard option greatly simplifies management of the standby databases. Without Active Data Guard the behavior is similar to that of transportable tablespaces in releases before 12c: when you plan to plug a transportable tablespace set into the primary, it is your responsibility to make the same set available at the standby. The following examples use the transparent solution with Active Data Guard so as not to extend the chapter even further. Be warned, though, that you need to be appropriately licensed to use the steps described here.
Creating a new Pluggable Database on the primary database
The creation of a new PDB is transparent to the end user. With Active Data Guard implemented, the “create pluggable database” statement is applied on the standby database and the pluggable database is created there as well. Any files that need copying are copied over the network. This is not very spectacular: it simply works, as the example will demonstrate. It does not matter whether the new PDB is created as a clone of an existing PDB or as a clone of the seed PDB. The example assumes that you are familiar with creating a PDB from SQL*Plus; please refer back to Chapter 7 for a refresher on the syntax.
SQL> create pluggable database pdb4
2 admin user PDB4_DBA identified by secretPassword
3 file_name_convert=('/u01/oradata/CDBFSDCA/pdbseed','/u01/oradata/CDBFSDCA/PDB4');
The above is the most basic example possible, where the CDB resides on a file system. Prior to executing the command you need to ensure that the directories referenced in the file_name_convert clause exist. In addition you need db_file_name_convert defined on the standby database to translate the full path. The final parameter relevant to data file creation is standby_file_management, which should always be set to auto in any Data Guard configuration.
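On the standby these two parameters could be set as shown below. The paths are assumptions matching the directory layout used in this example; note that db_file_name_convert is a static parameter, so the change only takes effect after a restart:

```sql
-- On the standby CDBFSDCB: translate primary file paths to standby paths
-- (static parameter: requires scope=spfile and an instance restart)
alter system set db_file_name_convert =
  '/u01/oradata/CDBFSDCA', '/u01/oradata/CDBFSDCB' scope=spfile;

-- Dynamic, and always recommended in a Data Guard configuration
alter system set standby_file_management = auto;
```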
The standby database acknowledges the creation of the PDB in the alert.log:
2013-09-17 11:34:42.864000 +01:00
Recovery copied files for tablespace SYSTEM
Recovery successfully copied file /u01/oradata/CDBFSDCB/PDB4/system01.dbf from
/u01/oradata/CDBFSDCB/pdbseed/system01.dbf
Recovery created file /u01/oradata/CDBFSDCB/PDB4/system01.dbf
Datafile 21 added to flashback set
Successfully added datafile 21 to media recovery
Datafile #21: '/u01/oradata/CDBFSDCB/PDB4/system01.dbf'
2013-09-17 11:34:52.038000 +01:00
Recovery copied files for tablespace SYSAUX
Recovery successfully copied file /u01/oradata/CDBFSDCB/PDB4/sysaux01.dbf from
/u01/oradata/CDBFSDCB/pdbseed/sysaux01.dbf
Recovery created file /u01/oradata/CDBFSDCB/PDB4/sysaux01.dbf
Datafile 22 added to flashback set
Successfully added datafile 22 to media recovery
Datafile #22: '/u01/oradata/CDBFSDCB/PDB4/sysaux01.dbf'
Querying the V$PDBS view you can also see that it has been created.
The effect of plugging in PDBs into the primary
Plugging in a previously unknown PDB is very similar to plugging a transportable tablespace data set into the primary database, only easier. The transportable tablespace scenario has been well known for quite some time and is documented in section 10.3.3 of the Data Guard Concepts and Administration Guide in the 12c documentation set.
The basic tasks for plugging in a PDB are almost the same. First you need to make the PDB’s files available to the primary and standby databases. In this example the DBA has been instructed to plug PDB5 into the primary database CDBFSDCA, for which there is a single standby database CDBFSDCB. The file system based case has deliberately been chosen over ASM and OMF as it requires a little more attention to detail. As part of the clone process the DBA has made the files available in the default location on the primary and the standby database(s).
[oracle@server1 CDBFSDCA]$ ls -l PDB5
total 875548
-rw-r-----. 1 oracle asmadmin 5251072 Sep 17 12:12 PDB5_users01.dbf
-rw-r-----. 1 oracle asmadmin 618668032 Sep 17 12:12 sysaux01.dbf
-rw-r-----. 1 oracle asmadmin 272637952 Sep 17 12:12 system01.dbf
[oracle@server1 CDBFSDCA]$
[oracle@server2 CDBFSDCB]$ ls -l PDB5
total 875604
-rw-r-----. 1 oracle asmadmin 5251072 Sep 17 12:12 PDB5_users01.dbf
-rw-r-----. 1 oracle asmadmin 618668032 Sep 17 12:12 sysaux01.dbf
-rw-r-----. 1 oracle asmadmin 272637952 Sep 17 12:12 system01.dbf
-rw-r-----. 1 oracle asmadmin 20979712 Sep 17 01:12 temp01.dbf
[oracle@server2 CDBFSDCB]$
Remember from Chapter 7 that you need the XML file describing the PDB as well. Without the XML metadata you cannot plug the PDB into the CDB. The XML file resides in /home/oracle/PDB5.xml in the example. With this information it is possible to plug the PDB into the primary CDB:
SQL> create pluggable database PDB5
2 using '/home/oracle/PDB5.xml'
3 nocopy;
Pluggable database created.
The operation completes successfully on both the primary and the standby database, as the standby’s alert.log shows:
2013-09-17 12:20:07.834000 +01:00
Recovery created file /u01/oradata/CDBFSDCB/PDB5/system01.dbf
Datafile 23 added to flashback set
[...]
Successfully added datafile 25 to media recovery
Datafile #25: '/u01/oradata/CDBFSDCB/PDB5/PDB5_users01.dbf'
In scenarios where you have more than one standby database you need to make the PDB available to all standby databases.
Unplugging PDBs from the primary
In the next example pluggable database PDB3 is unplugged from the primary. Prior to this example PDB3 exists in the primary and standby database:
SQL> select name,guid from v$pdbs;
NAME GUID
------------------------------ --------------------------------
PDB$SEED E6483A08FB7940D8E0431464A8C0318D
PDB1 E6484B59FBEC4908E0431464A8C06F2E
PDB2 E689AE04FAB41253E0431564A8C0354E
PDB3 E689AE04FAB51253E0431564A8C0354E
To show that it exists on the standby as well, v$pdbs is queried there for the GUID of PDB3:
SQL> select name,guid from v$pdbs
2 where guid = 'E689AE04FAB51253E0431564A8C0354E';
NAME GUID
------------------------------ --------------------------------
PDB3 E689AE04FAB51253E0431564A8C0354E
Now let’s unplug the PDB:
SQL> alter pluggable database PDB3
2 unplug into '/home/oracle/PDB3.xml';
Pluggable database altered.
SQL> select name,guid from v$pdbs
2 where guid = 'E689AE04FAB51253E0431564A8C0354E';
NAME GUID
------------------------------ --------------------------------
PDB3 E689AE04FAB51253E0431564A8C0354E
SQL> !ls -l /home/oracle/PDB3.xml
-rw-r--r--. 1 oracle asmadmin 3690 Sep 17 12:12 /home/oracle/PDB3.xml
The command completed without a problem. You will notice that it has no effect on the availability of the PDB to the CDB: the unplugged PDB still appears in v$pdbs until it is dropped.
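If you want to verify the state of the unplugged PDB, a query such as the following could be used. On this release the unplugged PDB typically remains visible until dropped; the exact open_mode you see may vary, so verify the behavior on your own system:

```sql
-- The unplugged PDB is still listed in the CDB, even though its
-- metadata has been written to the XML manifest
select name, open_mode
  from v$pdbs
 where name = 'PDB3';
```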
Dropping a PDB from the primary
Dropping the PDB from the primary is mentioned here only for the sake of completeness. The drop is replicated to the standby as part of redo apply and executed seamlessly. Consider the following example where PDB4 is dropped on the primary:
SQL> drop pluggable database pdb4 keep datafiles;
Pluggable database dropped.
The command is executed on the standby database as well:
2013-09-17 11:56:02.048000 +01:00
Recovery dropped temporary tablespace 'TEMP'
Recovery deleting file #22:'/u01/oradata/CDBFSDCB/PDB4/sysaux01.dbf' from controlfile.
Recovery dropped tablespace 'SYSAUX'
Recovery deleting file #21:'/u01/oradata/CDBFSDCB/PDB4/system01.dbf' from controlfile.
Recovery dropped tablespace 'SYSTEM'
2013-09-17 11:56:03.106000 +01:00
Recovery dropped pluggable database 'PDB4'
A quick check against v$pdbs on the primary and standby database reveals that the PDB is indeed gone.
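The check is a simple query by name; an empty result set on both databases confirms the removal:

```sql
-- Run on both the primary and the standby: "no rows selected"
-- means PDB4 has been removed from the container
select name, guid
  from v$pdbs
 where name = 'PDB4';
```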
Summary
Well-thought-out naming conventions make the life of the database administrator a lot easier. With Data Guard a new dimension is added to the scope of the naming convention. Database instances in a Data Guard environment should probably not be named after their role, especially not in a consolidated environment. A location could be a good name; otherwise users might get confused when the database STANDBY1 is in the primary role.
Standby databases are easily created using the RMAN duplicate database command. You read about the two possible cases for database duplication: to file system or to ASM/OMF based storage.
Finally, you read more about configuring a Data Guard environment. The large number of configuration options and choices is nicely hidden from view when using the Data Guard Broker interface. In addition to the Broker command line interface you can use Enterprise Manager, which will pick up any existing Broker configuration and allow you to manage the environment.
Recent versions of Enterprise Manager have had great support for Data Guard, and centrally managed monitoring templates make it easy to alert the database administrator on shift when there are problems. The Enterprise Manager Data Guard pages offer a great view of the various performance metrics in the Data Guard configuration, most notably the transport and apply lags. If you wanted, you could even use Enterprise Manager to monitor and maintain Data Guard configurations, including the initial creation of a standby database.
Less experienced DBAs find role transitions intimidating, but they need not be. Practice, documentation, and training will make a switchover a standard procedure, much like opening and closing a database. The “practice” aspect cannot be overrated in this context. If a failover situation occurs, stress levels are guaranteed to rise; good documentation and confidence will help get the system up and running on the disaster recovery site more easily.
The Broker command line interface is again a great way to simplify role transitions, and the bet is that it will take longer to get approval for the transition than to carry it out. Even though the Oracle database is highly resilient to problems during a role transition, one should not forget to test regularly that applications continue to function after a role reversal. Strict coding standards should prevent hard-coded connection strings from entering production applications, but time pressure can lead to cutting corners, reinforcing the need for regular tests.