Chapter 3. RAC Architecture

This chapter discusses features of the Oracle database that influence the architecture during the design stage of a highly available application based on Oracle Real Application Clusters (RAC). We will also point out where the availability can be limited, such as when a patch or a major upgrade must be performed. We will then have a look at intersite availability, discussing Data Guard for disaster recovery and Oracle Streams for message sharing and replication. A RAC solution is not an island; many components play a role, and many technologies can be leveraged to achieve a robust, highly available, and scalable application.

Availability Considerations

Many users choose a RAC solution because they need their application to be continuously available to their users and to be tolerant of certain types of failures that could otherwise interrupt service. In the age of multitiered, Internet-facing applications, availability and application scalability are crucial to keeping a loyal user base. Research has shown that users can be very impatient. If the web page they are browsing doesn't render completely in the browser in less than approximately 10 seconds, they tend to move on and rarely come back.

Once implemented, the availability of a system can essentially be compromised by two different types of outages: planned and unplanned. The Oracle technology stack offers a wealth of complementary technologies that can protect against almost any type of outage. Table 3-1 lists some technologies that can save you from an unplanned outage when it occurs. Table 3-2 lists technologies that can help you prevent outages from occurring to begin with.

Table 3.1. Technologies that Counter Unplanned Outages

Failure | Technology | Mean time to recovery (estimated)
Node failure | Real Application Clusters | No downtime, but review the implications of instance recovery discussed later in this chapter
Failure occurring on the storage layer | Automatic Storage Management | No downtime when using normal or high redundancy for disk groups
Human error | Flashback Database/Flashback Table | No database downtime for recovering dropped tables; the duration of a Flashback Database operation depends on the amount of data changed
Human error | Data Guard with delayed redo application | Very little downtime during failover operation to standby database
Data corruption | Data Guard | Very little downtime during failover to disaster recovery site; can be automated with fast-start failover
Site failure | Data Guard | Very little downtime during failover to disaster recovery site

Table 3.2. Options to Minimize Planned Outages

Operation | Technology | Mean time to recovery (estimated)
Operating system changes, such as applying security patches | Oracle RAC | No downtime when patching the OS one node at a time; use dynamic service relocation to active nodes
Changes to the Oracle cluster layer | Data Guard graceful switchover | Very little downtime, assuming all standby databases are in sync with production
SAN migration | Oracle ASM | No downtime if database servers are connected simultaneously to both SANs; ASM disks can be dropped and added to an ASM disk group online and concurrently in one operation
Extending storage | Oracle ASM | No downtime as additional storage can be added to ASM disk groups; ASM will automatically start a rebalancing operation to restripe and mirror the data across all new disks
Database patch | RAC | Refer to the section "Online Maintenance and Patching" for a detailed discussion of database patching and its implications
Database major release upgrade | Oracle RAC and Data Guard logical standby database | Users report downtime of 5 to 30 minutes
Platform migration | Transportable database, transportable tablespace, Oracle Streams, and GoldenGate | Highly project-specific downtime, mainly depending on the I/O subsystem and the number/size of data files
Data changes | Many, including dbms_redefinition, online index rebuilds, and edition-based redefinition | Little downtime, depending on the type of change

Deciding the Number of Nodes

A large number of RAC systems consist of only two nodes. You often see this demonstrated during the opening show of hands at Oracle user group meetings in the United Kingdom. Often such systems are used in a way similar to active/passive clusters, where RAC is deployed purely as a high availability solution.

The discussion about the appropriate number of cluster nodes should not be limited to the live production system. Before users adopt RAC, they should also consider the number of nodes for pre-production environments such as staging or quality assurance, as well as for the disaster recovery solution. For many reasons, it is beneficial to have a pre-production environment equivalent to your production environment. Otherwise, you might experience unpleasant surprises when your code is released. You should also test on an exact copy of your production database: if your data is out of sync with production, any testing is likely to be inaccurate. Having a non-production RAC cluster is essential for rolling out patches to production because patches definitely need to be tested to avoid unplanned outages or corruption of the Oracle binaries.

If no additional hardware or budget is available to create a RAC system just for testing, you might create a virtualized RAC system on a spare server. Oracle VM is a free hypervisor that can be used to host a RAC system (you can learn more about virtualization in Chapter 5).

The capabilities of the disaster recovery (DR) site should also be equal to those of the production site; the last thing the DBAs and management need in a crisis situation, such as when the disaster recovery site needs to be activated, is to find out that the DR site can't handle the workload. The common practice of "recycling" the old production environment for DR is definitely not recommended.

Two-Node RAC Clusters

There are a number of advantages to operating two-node RAC installations. First, licensing costs are lower compared to multinode configurations. Second, using only two nodes makes global cache transfers simpler because three-way communication can't happen. The licensing conditions for the Standard Edition RAC don't seem to allow for more than two nodes equipped with recent processor types, so your topology is not as flexible as with the Enterprise Edition.

When designing and implementing such a setup, each individual cluster node needs to be able to cope with the full workload in case of a node failure. Alternatively, the workload has to be reduced in such a scenario, for example by asking users to perform only essential operations. This used to be a much bigger problem when individual RAC nodes were less powerful than they are now; many production RAC nodes today are equipped with 64GB of RAM and two or more quad-core CPUs, and such setups should be more than capable of handling the full workload.

Oracle RAC One Node

With Oracle 11.2, a hybrid between RAC and an active/passive configuration has been introduced to the user community. This hybrid is known as RAC One Node. It features a complete installation of the Oracle RAC software stack, which makes a later transition to full-blown RAC simple in case you need to scale up. One of the main intentions of RAC One Node is to allow for planned downtime. For example, a running database instance can be migrated to another host using the omotion utility with very little impact, and maintenance can then take place on the host the instance was originally running on. In contrast to a RAC system, there is only one instance actively servicing user connections; nevertheless, the database will be created as a RAC database.

Note

This is the main difference from active/passive clusters, where a single-instance database is protected by a high availability framework.

This single instance is registered in Oracle Grid Infrastructure, the metadata repository and high availability framework for Oracle clusters. This approach means that the RDBMS instance benefits from the high availability options Grid Infrastructure provides, namely failure detection and protection, online rolling patches, and rolling upgrades to the underlying operating system.

The base release for RAC One Node, 11.2.0.1, is only available for Linux. It is not supported with Data Guard, nor is it available with the default installation. Users interested in RAC One Node have to download My Oracle Support patch 9004119. Administrators of other platforms will have to wait for the first patch set to be released before they can use RAC One Node. The user guide also suggests that RAC One Node won't support third-party cluster software such as Veritas Storage Foundation for RAC or HP Serviceguard. Once the RAC database has been created on a node, a set of tools installed as part of the aforementioned patch allows the user to convert the system to a RAC One Node database. You call raconeinit to convert the database to RAC One Node. Once the tool finishes, you will notice that the database instance has been renamed to instanceName_1.

As mentioned previously, a new utility called omotion allows users to migrate a RAC One Node instance to another node in the cluster. Migrating the instance is useful for planned maintenance or if your cluster node is running out of resources and can no longer accommodate the database. To mitigate disruption to application users, the use of either Transparent Application Failover or Fast Connection Failover is required (you will learn about both of these later in this chapter). If neither of these technologies is used, ongoing transactions will be allowed to complete within a certain user-definable window; after that, users will be disconnected from the database and receive an ORA-03113 "end-of-file on communication channel" error. This error is caused by the shutdown of the database instance that commences after the grace period.

If you decide that there is a business case for updating from RAC One Node to "full" RAC, another tool called racone2rac can make the transition simple. Just select your RAC One Node database from a text-based menu and allow the tool to convert it to a RAC database. The result of the conversion looks similar to what you would get with a so-called policy-managed RAC database. Policy-managed databases are new with Oracle 11g Release 2, and they mark a shift away from the traditional RAC database, where the database administrator was responsible for administering all aspects of the RAC database, such as initialization parameters, online redo threads, and other structural changes. With policy-managed databases, the DBA only has to define how many nodes he'd like the database to have, and Oracle will take care of the rest. Any missing database components will be created as needed.

Multi-Node RAC Systems

Multinode systems are much more likely than the more common two-node cluster to support the whole workload if an individual node should fail. A correctly designed RAC system should have sufficient redundant capacity available to guarantee the application can continue with as little interruption as possible. Many enterprises therefore implement Real Application Clusters with more than two nodes. In the ideal case, when designing an n-node cluster, the same cluster should be able to support the workload with n−1 nodes available. This implies that there is a completely redundant last node available to the cluster. The redundant node does not have to be idle. For example, it could be used for backup and reporting; the only requirement is that all of its capacity can be made available to the cluster at short notice.

Beginning with Oracle 10.1, it is possible to add nodes to the cluster on-the-fly if your business or management teams come to the conclusion that the existing number of nodes is insufficient to support the workload. With Oracle 11.2, this has been improved and simplified with the introduction of Grid Plug and Play and policy-managed databases.

Online Maintenance and Patching

Regardless of how careful the testing for a new system was prior to its rollout, sooner or later every environment will require patches. This can be due to many reasons; the most common are listed here:

  • An internal error is causing application unavailability.

  • The currently used version is going out of support. For example, Oracle 10g Release 2 is moving out of premier support in 2010 (but Oracle is waiving the fees for extended support for an additional year), which could be a reason for businesses to move on to the next release.

  • A critical security patch needs to be applied.

  • Contractual requirement to perform scheduled maintenance.

Patches can have an effect on availability, even though Oracle has improved the situation by providing online patches and a rolling patch application mechanism. The primary tool for patch application in Oracle is opatch. However, point release (patch set) and major release changes cannot be performed with opatch; you have to use Oracle Universal Installer (OUI) for such tasks.

Warning

With critical, highly available systems, it is important not to patch systems just for the sake of patching them!

Any patching of a system is inherently risky, regardless of how much testing has been performed beforehand. Almost any database administrator can tell a story about a patch application that went wrong. Patching can also mean you have to incur downtime to the node while the cluster binaries are modified on that node.

In the Oracle database universe, there are a number of different types of patches (see Table 3-3 for a list of them).

Table 3.3. Oracle Patch Types

One-off patch: A single patch to fix a specific problem.

Merge level request patch (MLR): If a one-off patch conflicts with an already installed patch, Oracle development can be asked to provide a merged set of these patches.

Critical Patch Update (CPU): Critical Patch Updates are released quarterly and address security issues with Oracle products. Note that CPUs are not restricted to the RDBMS. Most CPUs require users to be on the latest or next-to-latest patch set, so you should regularly check Oracle's web site at www.oracle.com/technology/deploy/security/alerts.htm for more information about critical patches.

Patch Set Update (PSU): Patch Set Updates bundle previously separate patches in given product areas together. For instance, the first PSU for 10g Release 2 (its contents were published in My Oracle Support note 8576156.8) included, among other items, a RAC Recommended Patch Bundle, a Data Guard Logical Recommended Patch Bundle, and a Data Guard Physical Recommended Patch Bundle. Oracle assures users in My Oracle Support note 854428.1 that Patch Set Updates are intended to be low risk and RAC rolling installable. Oracle also claims in the same note that the patch is well tested. Don't take this as an excuse not to test. Patch Set Updates include the latest Critical Patch Update as well.

Bundle patch: Bundle patches were used before Patch Set Updates, and they bundled a number of patches per product area. Bundle patches were available for Data Guard logical and physical standby databases, the Data Guard broker, Clusterware, RAC, and other areas. There don't seem to be new bundle patches at the time of writing because they have been replaced by PSUs.

Patch set: A patch set is used to apply a point release to the database, such as patching an 11.1.0.6 system to 11.1.0.7. At the time of writing, there was no patch set for 11g Release 2 yet. The patch set contents are installed into an existing Oracle home. Rolling application of patch sets has been possible for Cluster Ready Services/Clusterware/Grid Infrastructure since 10.1.0.2 (see "Oracle Clusterware (formerly CRS) Rolling Upgrades" on MOS 338706.1). Oracle Automatic Storage Management can be upgraded in a rolling fashion as well, beginning with 11.1.0.6. Note that there is a high probability of hitting unpublished bug 7436280 when upgrading clustered ASM 11.1.0.6 to 11.1.0.7 and issuing the command to stop the rolling upgrade. A one-off patch is available for 32-bit and 64-bit Linux, as well as some other platforms; this one-off patch is also included in the 11.1.0.7.1 Patch Set Update. RDBMS instances can't be patched in a rolling fashion. Also bear in mind that in RAC the cluster layer needs to be patched to a version equal to or higher than the RDBMS and ASM binaries.

Major release update: A major database release change requires downtime and a lot of testing! An example of a major release change is the migration from 11g Release 1 to 11g Release 2 or from 10g Release 2 to 11g Release 2. You need to check the upgrade guide of your destination release for information about your upgrade paths. In any case, such a release change requires you to install the new software into a separate Oracle home. The downtime required to migrate to the new release can be minimized by using a (transient) logical standby database.

Patching Real Application Clusters Using opatch

In the past, patching a system most often meant downtime. But now, opatch supports rolling upgrades of the cluster, which minimizes the interruption to service availability. Before trying to apply any patch, it is important to verify that the Oracle Inventory containing the metadata about all nodes in the cluster and their patch levels is not corrupt. The global inventory is maintained during all patch operations and lists the available Oracle Homes on a RAC cluster, as well as the patches applied. You run the opatch utility with the lsinventory option to verify that the inventory is not corrupted, as in the following example:

[grid@london1 ˜]$ $ORACLE_HOME/OPatch/opatch lsinventory
Invoking OPatch 11.2.0.1.2

Oracle Interim Patch Installer version 11.2.0.1.2
Copyright (c) 2010, Oracle Corporation.  All rights reserved.
Oracle Home       : /u01/app/grid/product/11.2.0/crs
Central Inventory : /u01/app/oraInventory
   from           : /etc/oraInst.loc
OPatch version    : 11.2.0.1.2
OUI version       : 11.2.0.1.0
OUI location      : /u01/app/grid/product/11.2.0/crs/oui
Log file location : /u01/app/.../opatch/opatch2010-06-14_16-00-57PM.log

Patch history file: /u01/app/.../opatch/opatch_history.txt
Lsinventory Output file location : /u01/app/...opatch/lsinv/lsinventory2010-06-14_16-00-57PM.txt
----------------------------------------------------------------------------
Installed Top-level Products (1):
Oracle Grid Infrastructure                               11.2.0.1.0
There are 1 products installed in this Oracle Home.
Interim patches (2) :
Patch  9343627      : applied on Fri Apr 30 13:11:32 BST 2010
Unique Patch ID:  12381846
   Created on 15 Apr 2010, 11:28:38 hrs PST8PDT
   Bugs fixed:
     9343627, 9262748, 9262722
Patch  9352237      : applied on Fri Apr 30 13:06:29 BST 2010
Unique Patch ID:  12381846
   Created on 25 Mar 2010, 00:05:17 hrs PST8PDT
   Bugs fixed:
     8661168, 8769239, 8898852, 8801119, 9054253, 8706590, 8725286, 8974548
     8778277, 8780372, 8769569, 9027691, 9454036, 9454037, 9454038, 8761974
     7705591, 8496830, 8702892, 8639114, 8723477, 8729793, 8919682, 8818983
     9001453, 8475069, 9328668, 8891929, 8798317, 8820324, 8733749, 8702535
     8565708, 9036013, 8735201, 8684517, 8870559, 8773383, 8933870, 8812705
     8405205, 8822365, 8813366, 8761260, 8790767, 8795418, 8913269, 8897784
     8760714, 8717461, 8671349, 8775569, 8898589, 8861700, 8607693, 8642202
     8780281, 9369797, 8780711, 8784929, 8834636, 9015983, 8891037, 8828328
     8570322, 8832205, 8665189, 8717031, 8685253, 8718952, 8799099, 8633358
     9032717, 9321701, 8588519, 8783738, 8796511, 8782971, 8756598, 9454385
     8856497, 8703064, 9066116, 9007102, 8721315, 8818175, 8674263, 9352237
     8753903, 8720447, 9057443, 8790561, 8733225, 9197917, 8928276, 8991997,
     837736

Rac system comprising of multiple nodes
  Local node = london1
  Remote node = london2
  Remote node = london3
  Remote node = london4
-----------------------------------------------------------------------------
OPatch succeeded.

You can find the important lines here near the bottom of the output; the opatch utility discovered correctly that the system in question is a four-node RAC system. The utility also correctly detected that the command was run from the local node, london1. Do not proceed with a patch if the information reported by opatch is inaccurate because then opatch won't be able to properly register the patch information in the global inventory.

Opatch Operation Modes

OPatch can operate in four different modes for Real Application Clusters:

  • All node patch

  • Rolling patch

  • Minimum Downtime patch

  • Local patch

All Node Patch Mode

In this mode, opatch treats the cluster exactly like a single-instance installation and patches all nodes while the database is down. This patch mode has the heaviest impact on availability and can only be used during an agreed downtime window; otherwise, it is not really suitable for a RAC system.

Rolling Patch Mode

All nodes in the cluster are patched, but only one at a time to avoid downtime. Process-wise, the first node is brought down for patching. When patched, it's started again. OPatch will ask you if you want to continue with the next node, and so on until all nodes are patched. There is no downtime because there is always at least one node up and servicing user requests.

Unfortunately not all patches are rolling patches. To find out if a patch is a rolling patch, you can use opatch's query option, as in this example:

[grid@london1 PSU1]$ $ORACLE_HOME/OPatch/opatch query 
> -is_rolling_patch 9343627/

Invoking OPatch 11.2.0.1.2
Oracle Interim Patch Installer version 11.2.0.1.2
Copyright (c) 2010, Oracle Corporation.  All rights reserved.

Oracle Home       : /u01/app/grid/product/11.2.0/crs
Central Inventory : /u01/app/oraInventory
   from           : /etc/oraInst.loc
OPatch version    : 11.2.0.1.2
OUI version       : 11.2.0.1.0
OUI location      : /u01/app/grid/product/11.2.0/crs/oui
Log file location : /u01/.../opatch/opatch2010-06-14_16-41-58PM.log
Patch history file: /u01/.../cfgtoollogs/opatch/opatch_history.txt
---------------------------------------------------------------------------
Patch is a rolling patch: true
OPatch succeeded.

Minimum Downtime Patch Mode

Another patching mechanism is referred to as minimum downtime patching. With this patching strategy, you divide your cluster into two sets of nodes. When patching, the first set of nodes is shut down and patched. Once the patching is completed, the second set of nodes is shut down. Now the previously patched first set of nodes is started, followed by the patch application to the second set. Once the patch to the second set of nodes is completed, the second set of nodes is brought up again, resulting in a patched cluster.
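As a brief sketch (the patch number and staging directory are hypothetical, and you should confirm in the patch README that it supports this mode), minimum downtime patching is requested through an opatch option; opatch then prompts you to define which nodes belong to the first and second set:

[oracle@london1 9876543]$ $ORACLE_HOME/OPatch/opatch apply -minimize_downtime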

Local Patch Mode

Finally, individual nodes in the cluster can be patched by specifying the -local flag with the opatch command. A strategy similar to the rolling patch can be employed with this approach. In the end, you should verify that the patch information is correctly recorded in the Oracle inventory of each node.
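A minimal sketch of the local approach (patch number hypothetical): the patch is applied to the local Oracle home only, the same command is then repeated on every other node, and the inventory is checked on each node afterwards:

[oracle@london1 9876543]$ $ORACLE_HOME/OPatch/opatch apply -local
[oracle@london1 9876543]$ $ORACLE_HOME/OPatch/opatch lsinventory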

Which patch mode you should use depends on the availability requirements and the preference of the user or the business. Minimum downtime patching and rolling patching are similar in the way they make the cluster partially unavailable. Some businesses have the luxury of a weekend downtime window, so they can use an all-node patch, which saves time and resources.

Instance Recovery in RAC

Another factor influencing the availability of a RAC system is instance recovery. As noted previously, an instance failure won't cause a complete outage of the application, as it would with single-instance Oracle, where availability drops to zero straight away and users notice the failure. In the single-instance scenario, the memory structures that make up the Oracle System Global Area (SGA) are gone and need to be reinitialized when the instance starts. Once that happens, the system monitor (SMON) process has to perform instance recovery to get the database back into a consistent state and available to users. Unfortunately, this can take some time.

Instance recovery in RAC is very different from the single-instance scenario, and it won't cause an outage. The instance failure will be detected by another instance in the cluster. Once the failure is detected, there is a window during which the RAC system will briefly appear frozen to users while the background processes remaster the failed resources across the cluster. The length of this window depends on the activity of the system, the number of blocks mastered by the failed instance, and the number of global resources, such as enqueues, that were held by the failed instance.

Instance recovery is performed by a surviving instance on behalf of the failed instance. There are two streams of work that need to be done:

  • Enqueue Remastering: The Global Enqueue Service Monitor has to recover global resources such as enqueues (mainly TX and TM, the transaction and table modification enqueues) and cache resources from the buffer cache.

  • Database Recovery: The SMON process performs database recovery similar to how it's handled in a single-instance crash scenario.

Some of these operations can be performed in parallel.

Enqueue Remastering

The first step during instance recovery is to remaster global enqueues. This task is performed by the Global Enqueue Service (GES). You can think of enqueues as global locks; some of the enqueue metadata held in the global resource directory is lost as a consequence of the instance crash, and it has to be recovered by the surviving instances working together. The LMON process, the Global Enqueue Services Monitor, is responsible for this work.

The global resource directory (GRD) is maintained jointly by all cluster nodes in their SGAs. Cluster operations requiring access to resources in the cluster are synchronized, and this is exactly what the GRD is for. The GRD is maintained by the Global Cache Service and Global Enqueue Service processes, which are responsible for managing blocks in the shared buffer caches and global locks. Each resource in the cluster is mastered by a specific instance, and all other instances know the resource master for a resource.

The time it takes to recover the global enqueues depends on the workload of the system, as well as on the number of failed instances and the amount of pending work. On a busy system, you will most likely find that TM and TX enqueues, which are table/partition modification enqueues and transaction enqueues respectively, take the longest to remaster. In the past—that is, in the days of Oracle Parallel Server—users experimented with setting the initialization parameter dml_locks to 0. You also occasionally find this advice on the Internet, but be careful! While this setting effectively disables enqueues, which makes remastering a quick process, it also imposes several restrictions that make such an approach untenable for today's environments. Specifically, setting dml_locks to 0 imposes the following restrictions:

  • You cannot use drop table or create index statements.

  • Explicit lock statements are no longer allowed. For example, you cannot issue a LOCK TABLE statement.

  • You cannot use Enterprise Manager on any instance with dml_locks set to 0.

The tradeoffs here make experimenting with the parameter not worth your while. As discussed previously, RAC instance recovery can have an effect on your application. Therefore, you should thoroughly test your application's exposure to instance failure before rolling it out into production.

The second step during instance recovery is to remaster global cache resources, which are the blocks in the buffer cache. The Global Cache Service (GCS) is responsible for this task. To speed the process up, only those resources that have lost their master are remastered. This approach marks an evolution from earlier releases, where all resources were remastered; obviously, the earlier approach took longer to complete. While resources are being remastered, the LMS Global Cache Services process doesn't handle additional requests for accessing blocks in the buffer cache. Any sessions lucky enough to already hold the resource handles they need, and that don't require any further resources (blocks), can carry on working. All other sessions must wait until the instance recovery process has finished.

Database Recovery

While the global cache resources (database blocks) are being remastered, the server monitor process (SMON) starts building what is referred to as a recovery set. The recovery set is the set of blocks that need recovery; SMON needs to merge the online redo logs of all threads to complete it.

SMON also performs the next stage of instance recovery. The resources previously identified by SMON as in need of recovery (the recovery set) must be claimed to prevent other instances from accessing them while they wait to be recovered.

At this stage, the global resource directory is unfrozen. The global resources have been remastered (distributed among the surviving cluster nodes), and all the blocks in need of recovery are locked down for SMON to apply redo to them. You are ready to roll forward now. All other blocks in the buffer cache are accessible to users again, and the system becomes partially available. As with single-instance Oracle, some blocks are dirty because their corresponding transactions have not completed. Changes to those blocks need to be rolled back by applying undo information to them. The system will be fully available again as soon as all the blocks in the recovery set are actually recovered and the resource handles to them have been released.

This lengthy discussion illustrates why you probably want to keep the instance recovery process as short as possible. First and foremost, you'd prefer to keep users connected and able to carry out work in the application. Unfortunately, there is no RAC equivalent to the fast_start_mttr_target parameter we set in single-instance Oracle to govern a target time for instance startup after a database crash. The fast_start_mttr_target parameter supersedes the log_checkpoint_interval, log_checkpoint_timeout, and fast_start_io_target parameters, so it should be used instead. Setting fast_start_mttr_target allows Oracle to populate the V$INSTANCE_RECOVERY view, which can give the administrator an indication about the probable instance recovery time.
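As a short illustration (the target value is arbitrary), the parameter can be set dynamically, and V$INSTANCE_RECOVERY then reports Oracle's current estimate for the instance you are connected to:

SQL> ALTER SYSTEM SET fast_start_mttr_target = 60 SCOPE=BOTH SID='*';

SQL> SELECT target_mttr, estimated_mttr FROM v$instance_recovery;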

Note

As with any recovery scenario, you can set recovery_parallelism to a non-default value to spawn multiple recovery slaves during the recovery process (this requires parallel_max_servers to be > 0) to speed things up.

Oracle 10g introduced an underscore parameter called _fast_start_instance_recovery_target that allows you to influence the time needed to perform instance recovery. As with any underscore parameter, you need to check with Oracle support first to see whether setting it has any negative effects. The _fast_start_instance_recovery_target parameter governs the maximum amount of time between the point when an instance recovery starts and the rolling forward ends; in other words, it governs the time interval for which the system is not available.

Failover Considerations

When designing a RAC-based application, developers and architects need to address the fact that even the best hardware is not 100% reliable. Using RAC mitigates the implications of an instance failure, but some additional measures need to be taken on the application side to mask such a failure from the application and ultimately the user accessing it.

Unfortunately, not all applications are RAC aware. Commercial, off-the-shelf packages often don't come with RAC support. In such scenarios, an active/passive cluster (see Chapter 2 for more details) or a RAC One Node system might be a suitable alternative if the vendor can't be persuaded to certify a RAC installation.

Applications available with RAC support provided by the vendor or internally developed applications can use various high-availability technologies to mask instance failure. These technologies include Transparent Application Failover (TAF) and Fast Connection Failover (FCF).

Often applications are developed on single-instance Oracle. This is not a problem because an application can always be enabled to run on RAC. But it should be immediately apparent that it is easier to add failover support to applications developed in-house or to applications for which source code is available. When designing a new application, failover should be considered right from the start because the cost of introducing such features during a redesign can be huge.

Transparent Application Failover

As explained in Chapter 1, TAF allows database sessions to use OCI libraries to fail over to one of the surviving nodes of the cluster in the case of a node failure. Applications using the JDBC thin driver cannot benefit from this feature because that driver does not use OCI. TAF is a client-side feature: in the event of a node failure, notifications are sent out to trigger actions on the client side.

Failover Modes

Transparent Application Failover can be configured to work in two modes, or it can be deactivated. If we count deactivated as a mode, it means TAF can be assigned the following three options:

  • Session failover

  • Select failover

  • None (default)

The session failover mode is probably the most basic case because it doesn't require any code changes for implementation. All that is required is a suitable local naming definition in the form of a tnsnames.ora file or a TAF policy attached to a database service. TAF will automatically reconnect the user session using the same connection string if a node failure occurs.

The select failover mode allows clients to continue fetching rows from an open cursor after the session has been re-established on another node. Internally, TAF will re-execute the same query, discarding the rows already returned to the user and thereby making the failover transparent. There is a slight overhead associated with this because Oracle has to keep track of which data has already been transmitted.

If you explicitly specify none as the failover type, you prevent any failover from happening. The only reason to specify none, rather than omitting the FAILOVER_MODE clause altogether, is to explicitly deactivate TAF.

Note

Any DML changes that are created by transactions in progress can't be recovered by TAF. In such a scenario, the not-yet-committed transaction will be rolled back when the session is re-established on another instance.

In addition to node failures, TAF can also be configured to work with Data Guard configurations and active/passive clusters, thereby providing a failover capability in Data Guard scenarios.

Failover Methods

With the failover mode specified, users can further define a failover method, which dictates how TAF re-establishes the session on the surviving instance. The method can be defined independently of the failover type; the following options are available:

  • Basic

  • Preconnect

As its name suggests, the basic option instructs the client to establish a new connection only after the node failed. This can potentially lead to a large number of new connection requests to the surviving instance. In the case of a two-node RAC, this might cause performance degradation until all user connections are re-established. If you consider using this approach, you should test for potential performance degradation during the design stage.

The preconnect option is slightly more difficult to configure. When you specify the preconnect parameter, the client is instructed to preconnect a session to a backup instance to speed up session failover. You need to bear in mind that these preconnections increase the number of sessions to the cluster. In addition, you also need to define what the backup connection should be.
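To illustrate (host, service, and alias names are hypothetical), a client-side TAF policy is defined in the FAILOVER_MODE clause of a tnsnames.ora entry; this sketch combines the select type with the basic method and adds retry settings:

REPORTING =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = london-cluster-scan)(PORT = 1521))
    (CONNECT_DATA =
      (SERVICE_NAME = reporting)
      (FAILOVER_MODE = (TYPE = SELECT)(METHOD = BASIC)(RETRIES = 180)(DELAY = 5))
    )
  )

Specifying METHOD = PRECONNECT instead would additionally require a BACKUP parameter naming the alias of the backup connection.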

Fast Connection Failover and Fast Application Notification

Fast Connection Failover provides a means to transparently recover from instance failures for clients with support for Fast Application Notification (FAN). FAN is a framework for publishing UP and DOWN events for cluster reconfiguration. Such events could be changes to the availability of database services and instances.

The following clients have integrated support for Fast Connection Failover:

  • Java Database Connection Driver (JDBC)

  • Oracle Universal Connection Pool (UCP)

  • Clients using the Oracle Call Interface

  • Oracle Data Provider for .NET

The easiest way to use FAN is to use one of the preceding clients. That way, no code changes have to be made to the application because these clients have integrated support for FAN events. In addition to protection from node failures, applications based on FAN and FCF can also use the load balancing advisory introduced in Oracle 10g Release 2 to make best use of resources in the cluster.

The great advantage of using FAN lies in its design. Instead of requiring the application to poll the database for information, the notification framework advises the application of cluster changes. It is less likely that an application will connect to hung or otherwise overloaded nodes, or try to use a stale connection pointing to a failed database instance.

Database services need to be defined to enable FAN. A database service is the primary tool for workload management in Real Application Clusters. It allows users to group similar workloads together or to subdivide the cluster into disjoint units. Batch, reporting, and payroll services are all examples of the sort of service we are talking about. Services are always associated with database instances. They can also be associated with specific attributes to enable high availability technologies such as TAF and FAN, as the sketch below shows.
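As a hedged sketch (database, instance, and service names are hypothetical), an administrator-managed 11.2 service with a server-side TAF policy could be created with srvctl; the -e, -m, -z, and -w flags define the failover type, method, retries, and delay:

[oracle@london1 ~]$ srvctl add service -d PROD -s REPORTING -r PROD1 -a PROD2 \
    -P BASIC -e SELECT -m BASIC -z 180 -w 5
[oracle@london1 ~]$ srvctl start service -d PROD -s REPORTING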

Note

You can learn more about database services in Chapter 11, which discusses workload management.

Fast Connection Failover is based on the high availability framework. Clients with support for FAN events receive and process them depending on the event sent. Technically, FAN and FCF rely on events published by the Oracle Notification Services background process, which is automatically installed and configured as part of the node apps in RAC. A JDBC connection pool is an example of a client that uses these events. Upon receiving a DOWN event from the framework, connections in the pool pertaining to the failed instance or service will be marked as invalid by the connection pool and then cleared up. To make up for the reduced number of sessions, additional sessions will be established on the other cluster nodes, as defined by the service. The prerequisites for using FCF are simple, and they can be met by standalone Java applications or connection pools.

Beginning with Oracle 11.1.0.7, the implicit connection cache previously required for FCF to work is deprecated. As with most Oracle features, it won't go away in the next release, but you can expect to see it removed eventually. The new Universal Connection Pool (UCP) feature supersedes the implicit connection cache's functionality.

When specifying the db_domain initialization parameter, it is possible that FCF does not work in 11g Release 2's base release. This is due to an unpublished bug: "8779597 - [11GR2-LNX-090731] A NUMBER OF JDBC CONNECTION IS INCORRECT WITH FAN UP EVENT." A one-off patch is available, and the bug will be fixed in the first patch set to 11.2.0.2.

Interestingly, FCF is not limited to events published within the cluster. The Data Guard broker also publishes FAN events after a failover operation. Even single-instance Oracle, when registered with Oracle Restart, can receive events. Setting your connection string properly to reference both the primary and standby database hosts can make session failover after Data Guard role changes extremely efficient and elegant.

Scalability Considerations

One of the best definitions of scalability (a term which is used in many contexts) is found in the Oracle Performance Tuning Guide for database 11g Release 2. It states the following:

 

Scalability is a system's ability to process more workload, with a proportional increase in system resource usage. In other words, in a scalable system, if you double the workload, then the system uses twice as many system resources.

 
—Oracle Database Performance Tuning Guide, 11g Release 2 (11.2), Chapter 2, "Understanding Scalability"

Any system designed should adhere to the principles of scalability. Designing scalable RAC applications is not too different from designing efficient single-instance Oracle database applications. An application that scales well in a non-clustered configuration is likely to scale well on RAC, too. The requirement for writing a scalable application is always a given, even if only a small user base is expected initially.

Scalability Enhancers

Any application accessible over the Internet 365 days a year 24 hours a day poses special challenges to the architect. Many requirements are unclear at first, and usage patterns are difficult to predict until the application has reached an equilibrium state, at which time usage statistics can be interpolated into the future. Many application development cycles tend to be very short: the goal is to stay competitive in a highly volatile market. Similarly, testing time is limited. The short cycles and limited testing time require prudent planning and modern software development technologies to ensure that new releases don't compromise the overall system performance.

The Oracle RAC option potentially offers greatly increased scalability. In a correctly designed RAC cluster, additional hardware resources should yield increases in overall throughput. In other words, if you add another server to the cluster, it will either be able to handle an increased workload, or response times will be reduced. In addition, many large tasks can be divided into subtasks and executed concurrently on multiple nodes. Database services are the primary tool for mapping workloads to database nodes.

Note

The following discussion assumes that the hardware stack is adequately sized and does not include a limiting factor. Scalability cannot be viewed for isolated components in the application stack: every part of the architecture needs to be able to deal with the anticipated workload.

The introduction of the Cache Fusion cluster interconnect algorithm in Oracle 9i put RAC a quantum leap ahead of Oracle Parallel Server, allowing most applications to run on RAC with little or no modification. With Cache Fusion, the buffer caches of all database instances effectively behave as one shared, cluster-wide cache. Resources such as blocks in the buffer cache and (global) enqueues are mastered by individual nodes and are controlled through the Global Resource Directory. In Oracle 8i OPS and earlier, it was important to isolate workloads from one another because inter-instance block transfers using the block ping method (you can learn more about this method in Chapter 1) were very resource intensive and to be avoided, if possible. The only way to avoid such block transfers was to assign similar workloads to the same instance. Cache Fusion addresses the problem of workload isolation by allowing transactions to run on multiple nodes without fear of expensive cache transfers. High-speed interconnects such as InfiniBand or 10 Gigabit Ethernet further reduce cache transfer times. However, RAC users should still be cautious when employing parallel operations across cluster nodes because this can potentially lead to interconnect congestion. Cache Fusion mainly helps when scaling OLTP-like workloads.

Today's databases tend to grow ever larger, and this trend doesn't apply only to data warehouses. With the decreased cost of storage and the demand by businesses for storing highly detailed information as opposed to aggregates, the maintenance of tables and indexes has become a challenge. Databases should therefore undergo careful capacity planning in the design stage, especially in terms of expected data growth. Segments that are expected to grow considerably should be partitioned right from the start to allow better performance and/or manageability. If you find that some segments are growing more quickly than initially expected, you can introduce partitioning to them by employing the DBMS_REDEFINITION package for an online table redefinition.
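The following sketch shows the general DBMS_REDEFINITION workflow for introducing partitioning online; the schema, table, and partitioning scheme are hypothetical, and the original table is assumed to have a usable primary key:

-- 1. Verify that the table can be redefined using its primary key.
SQL> EXEC DBMS_REDEFINITION.CAN_REDEF_TABLE('APP', 'ORDERS', DBMS_REDEFINITION.CONS_USE_PK);

-- 2. Create an interim table with the desired partitioning scheme and matching columns.
SQL> CREATE TABLE app.orders_interim (
  2    order_id   NUMBER,
  3    order_date DATE,
  4    amount     NUMBER
  5  )
  6  PARTITION BY RANGE (order_date) (
  7    PARTITION p2010 VALUES LESS THAN (DATE '2011-01-01'),
  8    PARTITION pmax  VALUES LESS THAN (MAXVALUE)
  9  );

-- 3. Start the redefinition; data is copied while the original stays online.
SQL> EXEC DBMS_REDEFINITION.START_REDEF_TABLE('APP', 'ORDERS', 'ORDERS_INTERIM');

-- 4. Copy dependent objects such as indexes, constraints, triggers, and grants.
SQL> VARIABLE errs NUMBER
SQL> EXEC DBMS_REDEFINITION.COPY_TABLE_DEPENDENTS('APP', 'ORDERS', 'ORDERS_INTERIM', num_errors => :errs);

-- 5. Finish: the partitioned interim table swaps places with the original.
SQL> EXEC DBMS_REDEFINITION.FINISH_REDEF_TABLE('APP', 'ORDERS', 'ORDERS_INTERIM');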

Long-running batches in data warehouses and hybrid workloads can benefit from the parallel execution feature, which is discussed in great detail in Chapter 14. Parallel execution makes maximum use of the available resources in order to complete a task, such as a data load or a long-running report.

To help cope with data growth, Oracle 11g Release 1 enhanced the compression functionality already present in the database kernel with the advanced compression feature. Prior to Oracle 11g, compression offered a way to deal with rapid data growth, but it required bulk inserts or direct loads to compress data and did not work on large objects and some user-defined data types. Targets for compression were tables, table partitions, and materialized views. The compression feature extended the index compression that was already part of the RDBMS kernel. The easiest ways to use compression in data warehousing were to use the SQL*Loader command-line tool for direct path loads or to reorganize objects using CREATE TABLE AS SELECT statements. Alternatively, data could be inserted either in parallel with the APPEND hint or by using DBMS_REDEFINITION to compress tables/partitions online. Compressing data in tables can save space, improve cache efficiency, and reduce I/O operations when retrieving data. However, there is an overhead associated with the compression and decompression of the data blocks.

Oracle 11g lifted many restrictions in the advanced compression option, making compression useful for OLTP workloads by introducing support for conventional insert and update operations. Instead of compressing each row as it is changed, Oracle now compresses a block in batch operations once enough uncompressed data has accumulated in it, minimizing the impact on the system.
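For illustration (object names are hypothetical; the syntax shown is the 11.2 form, while 11.1 uses COMPRESS FOR ALL OPERATIONS), OLTP compression is declared at the table or partition level and requires the Advanced Compression option:

SQL> CREATE TABLE app.orders_hist COMPRESS FOR OLTP
  2  AS SELECT * FROM app.orders;

-- Only newly inserted data is compressed; existing rows stay as they are until moved.
SQL> ALTER TABLE app.audit_trail COMPRESS FOR OLTP;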

Note

Advanced compression offers far more than table compression. For example, it allows for Secure File de-duplication and compression, Data Pump compression, and the compression of the redo stream in Data Guard, topics that are outside the scope of this book.

The Oracle database offers many additional features that can improve scalability for RAC clusters, including the following:

  • Automatic memory management

  • Reverse key indexes

  • Automatic Segment Space Management

  • Sequence caching

  • Locally Managed Tablespaces

  • Automatic Storage Management

  • Result Cache

Most of these features can simply be implemented by the database administrator. In the case of third-party applications, however, it is important to confirm with the vendor that the application will be still supported if you use them.
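Two of the features above require nothing more than slightly different DDL; as a brief sketch with hypothetical object names, a reverse key index scatters sequential key values across index leaf blocks to avoid a right-hand hot spot, and a large, non-ordered sequence cache reduces cross-instance contention on sequence maintenance:

SQL> CREATE INDEX app.orders_pk_ix ON app.orders (order_id) REVERSE;

SQL> CREATE SEQUENCE app.orders_seq CACHE 1000 NOORDER;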

Scalability Inhibitors

There are some pitfalls you must avoid when developing applications for RAC. Let's ignore unsuitable and non-scalable hardware for the moment and look at some other causes of poor scalability. Poor application design, for example, contributes massively to it. The following paragraphs list some common problems that should be avoided.

Many developers are taught object-oriented programming languages such as Java or C#, which allow designers to develop abstract base types and use features such as inheritance to subclass those types for their individual needs. The object-oriented approach to development needs to be translated into the relational model when using a database as a backend. There are many frameworks available to perform this task, but many of them tend to be generic in nature, and typically they don't fully support the capabilities of a modern database management system. In fact, many frameworks use the database as little more than a glorified file system, which makes tuning applications based on these frameworks difficult.

When developing applications for the Oracle database, it is important to pay attention to schema design. Oracle offers a wealth of data and segment types to choose from, depending on usage requirements. Suboptimal schema design can cause queries to be expensive or to require lots of physical and logical I/Os, which ultimately hampers scalability. As a rule of thumb, the application schema should be normalized; it is sometimes acceptable to de-normalize the schema for performance reasons. However, we want to emphasize that suitable schema design yields the greatest performance benefit.

Caching information in the application server layer is a design approach we often encounter. Such caching is done with performance in mind. It is well intended, but typically it only masks the underlying problem in the short term. The problem that led to the creation of the cache will return when the volume of data exceeds the cache's size.

Ineffective transaction design is another performance inhibitor. Ineffective design often manifests itself in the form of LOCK TABLE and SELECT FOR UPDATE statements. These can cause locking and serialization problems that only get worse in a RAC environment.

Some applications that use legacy technology can present scalability problems. Web servers that serve dynamic content and utilize the Common Gateway Interface to connect to an RDBMS instance often establish a new connection to the database for each web page served to the client. The pattern in these applications is to connect, perform a little bit of work, and then disconnect. Unsurprisingly, such applications can cause a considerable load on the database server. Prior to Oracle 11g, this load could be somewhat alleviated by using multiple shared servers and dispatchers. A better way to support these applications, however, is to configure the Database Resident Connection Pool (DRCP) introduced with Oracle 11g Release 1. DRCP has been designed specifically for applications that do not or cannot use connection pooling. It works by providing a pool of dedicated connections within the database, rather than in the middle tier. Oracle's Real User Experience Insight product uses DRCP to great effect to connect its user interface (written in PHP) to the repository database. DRCP's memory requirements are lower than those of both dedicated server and shared server connections, which allows applications to scale well to large user numbers.
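A minimal DRCP sketch (host and service names are hypothetical): a DBA starts the default pool once per database, and clients simply request a pooled server in their connect string:

SQL> EXEC DBMS_CONNECTION_POOL.START_POOL();

-- EZConnect syntax requesting a pooled server:
--   london-cluster-scan:1521/reporting:POOLED
-- or, in tnsnames.ora: (CONNECT_DATA = (SERVICE_NAME = reporting)(SERVER = POOLED))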

Unsuitable or incorrectly specified hardware can also contribute to scalability problems. Saturated storage controllers or undersized write caches can lead to degraded I/O performance. Similarly, overcommitting memory on a database server can lead to a situation where the kernel daemon responsible for freeing memory drives the load average so high that the system becomes unresponsive and has to be rebooted.

Standby Databases

Real Application Clusters is primarily a high availability and scalability solution. Among its main benefits, it protects the system from loss of service in case of instance failure, which would otherwise cause an unplanned outage in a single-instance setup.

However, a catastrophic site failure is usually not covered by RAC, except in the rare case of an extended distance cluster implementation. Also, common human errors such as the following can cause logical corruption in the database:

  • Dropping a table while the Flashback Table feature is deactivated

  • Updating data with an incorrectly specified where clause

  • Confusing the production environment with the development or test environment

Without standby databases, either a Flashback Database operation or a point-in-time restore is needed to recover from such mishaps. With today's large database systems, a point-in-time restore is usually not an option because of the large mean time to recovery associated with a full restore and recovery operation. Add to this the fact that RAC has often been chosen for high availability in the first place, and it becomes clear why a few hours of downtime cannot be tolerated. The Flashback Database feature introduced in Oracle 10g has greatly reduced the recovery time in many cases. In fact, the Flashback Database feature has become invaluable, not only for recovery of the live production database but, as we will see later in this chapter, for Data Guard scenarios as well.
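A brief, hedged illustration of the flashback features just mentioned (object names are hypothetical; Flashback Table to a timestamp requires row movement to be enabled, and Flashback Database requires flashback logging and, in RAC, that all instances are shut down with a single instance mounted):

SQL> FLASHBACK TABLE app.orders TO BEFORE DROP;

SQL> ALTER TABLE app.orders ENABLE ROW MOVEMENT;
SQL> FLASHBACK TABLE app.orders TO TIMESTAMP SYSTIMESTAMP - INTERVAL '15' MINUTE;

SQL> STARTUP MOUNT
SQL> FLASHBACK DATABASE TO TIMESTAMP SYSTIMESTAMP - INTERVAL '1' HOUR;
SQL> ALTER DATABASE OPEN RESETLOGS;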

Introduction to Oracle Standby Databases

To protect from site failures or human error, additional precautions need to be taken in a RAC environment to provide disaster recoverability. Oracle Enterprise Edition has offered the functionality now known as Oracle Data Guard since version 7.3 to address this requirement.

In Oracle 7 and 8/8i, this feature was simply referred to as a standby database. The principle behind the technology is simple yet effective: an identical copy of the live (or primary) database, known as the standby database, is instantiated in a remote data center. The standby database is constantly in a state of media recovery, unless it is opened for read-only access. Without the Active Data Guard option introduced in Oracle 11.1, there is an additional caveat you need to be aware of: while the database is opened in read-only mode, it doesn't apply changes received from the primary database.

While not a problem per se, the fact that changes are not being applied can extend the time needed to transition to the standby database. This is because additional archived redo logs have to be applied unless you are willing to incur data loss. Without Active Data Guard, the database has to be in mount state for managed recovery to work. The mount state prevents users, except for those with sysdba privilege, from connecting to the database. Any attempt to do so will result in an ORA-1033 "Oracle initialization or shutdown in progress" message.

When the standby feature was introduced in Oracle 7.3, maintaining a standby database was a highly manual process: the database administrator was in charge of transferring archived redo logs generated on the primary database to the standby site using utilities such as rcp, ftp, or rsync. Once the logs were on the standby site, the standby database had to be placed in recovery mode. The only possible action the administrator could take was to activate the standby database in order for it to assume the primary role. This process, in which the DBA copied the logs, was referred to as manual recovery.

Beginning with Oracle 8i, the standby database uses managed recovery to stay in sync with the primary database. Using Oracle Net communication, the primary database ships changes to the standby database, where they are subsequently applied to the data files to keep the systems in sync. The application of changes can be delayed to protect the system from the aforementioned user errors. A standby database can also be used for reporting or for backing up data; this removes some of the load from the primary database.

A further milestone was reached with Oracle 9i, which introduced the logical standby database and graceful switchover operations. It was also in Oracle 9i that the standby database feature was renamed to Data Guard. Users of Data Guard were also given another choice for transmitting redo information to the standby. In addition to the archiver, which traditionally shipped information to the standby database after an online redo log was archived, the log writer process could be used to perform the same task. Standby redo logs were introduced as the counterpart to the primary database's online redo logs. Instead of having to wait for a complete archived redo log to arrive, the redo stream could be written into a standby redo log, thus reducing the risk of data loss. Oracle 9i Database also introduced the Data Guard broker with support for Enterprise Manager, as well as a command-line tool to simplify the setup and management of standby databases.

Another noteworthy evolution came with Oracle 10g, when the Real Time Apply feature was integrated into the database kernel. Using standby redo logs on the standby database server, the redo stream arriving on the destination could be applied to the standby database immediately, without having to wait for the standby redo log to be archived and applied. This further reduces the possibility of data loss.
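On the standby site, real-time apply is a matter of pointing managed recovery at the current standby redo log; a short sketch (run on the standby database, which must have standby redo logs configured):

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE
  2  USING CURRENT LOGFILE DISCONNECT FROM SESSION;

-- To stop managed recovery, for example before a role change:
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;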

Figure 3-1 illustrates the concepts for Oracle 11g, where redo generated by user activity on the primary database is transported via the Log Network Server (LNS0) process—not the log writer, as in previous versions—to the standby database's Remote File Server (RFS) process. In turn, the RFS process writes the redo stream into standby redo logs. The managed recovery process (MRP0) on the standby database applies the new information as soon as it arrives. Once filled, the standby redo log is archived by one of the standby database's archiver processes.


Figure 3.1. The Data Guard physical standby configuration with asynchronous redo transport and real time apply under 11.1; note that 11.2 uses NSAn processes instead of LNSn to ship redo asynchronously.

Additional protection against data corruption was introduced in Oracle 11g Release 1. Setting the new db_lost_write_protect initialization parameter helps detect lost writes, and Data Guard will also try to repair corrupt blocks detected during redo apply on a physical standby database by fetching a good copy from the primary, and vice versa. This is referred to as Automatic Block Media Recovery.
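
Lost write detection itself is enabled through a single initialization parameter. The following is a minimal sketch that would typically be run on both the primary and the standby database:

SQL> -- TYPICAL covers read-write tablespaces; FULL extends the checks to read-only tablespaces
SQL> ALTER SYSTEM SET db_lost_write_protect = TYPICAL SCOPE=BOTH;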

With Oracle 11g Release 2, the previous limit of 10 archive log destinations (both local and remote) has been lifted, and up to 30 standby databases can now form part of a Data Guard configuration. A cascaded standby configuration, in which a standby database passes redo on to another standby database, is not supported if the primary database is a RAC 11.2 database.

Types of Standby Database

You can choose from the following four different types of standby databases:

  • Physical standby database

  • Snapshot standby database

  • Logical standby database

  • Transient logical standby database

Each of these will be covered in more detail in the sections that follow.

Physical Standby Database

The physical standby database was the first standby database option available. In all aspects, a physical standby database is an identical bit-for-bit copy of the primary database. All schema structures, database users, and segments are identical down to the block level.

A physical standby database is kept in sync with production through the application of redo (referred to as "redo apply"). The process employed is the same one that a database administrator would use to recover a live database after a media failure. Apart from disaster recovery scenarios, a standby database can be used for reporting and backup. With a little bit of manual effort, a physical standby database can also be used for testing hot fixes on production systems. You do this by activating it for read-write access after having taken it out of the Data Guard configuration. Once the testing of the hot fix on like-for-like production data is complete, the Flashback Database feature can be used to flash the database back to the point in time prior to its activation, followed by its conversion back to a physical standby. The downside to this procedure is that, while the database is open for read-write access, it won't receive any archived logs from the (real) primary database, so all redo generated in the meantime must be shipped and applied during gap resolution, which can cause a lot of network traffic once the database has been converted back to a physical standby.
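
The following is a minimal sketch of that manual procedure, run on the standby database and assuming Flashback Database is enabled there; the restore point name is arbitrary:

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
SQL> CREATE RESTORE POINT before_hotfix GUARANTEE FLASHBACK DATABASE;
SQL> ALTER DATABASE ACTIVATE STANDBY DATABASE;
SQL> ALTER DATABASE OPEN;
-- ... test the hot fix against production-like data, then revert ...
SQL> SHUTDOWN IMMEDIATE
SQL> STARTUP MOUNT
SQL> FLASHBACK DATABASE TO RESTORE POINT before_hotfix;
SQL> ALTER DATABASE CONVERT TO PHYSICAL STANDBY;
SQL> SHUTDOWN IMMEDIATE
SQL> STARTUP MOUNT
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT;
SQL> DROP RESTORE POINT before_hotfix;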

Snapshot Standby Database

The snapshot standby database achieves the exact same result as the physical standby database opened read-write for testing, as just described. However, the snapshot database doesn't require that the administrator worry about the fine print and all the details. The snapshot standby will receive archived logs from production, significantly reducing the overhead during gap resolution. However, the archived redo logs received from production aren't applied until after the snapshot standby database has been converted back to a physical standby database, so the time it takes for the standby database to get back in sync with production is proportional to the amount of redo to be applied.
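
In contrast to the manual procedure sketched earlier, the round trip collapses to a pair of CONVERT commands. This sketch assumes a fast recovery area is configured and redo apply has been stopped on the mounted standby:

SQL> ALTER DATABASE CONVERT TO SNAPSHOT STANDBY;
SQL> ALTER DATABASE OPEN;
-- ... read-write testing ...
SQL> SHUTDOWN IMMEDIATE
SQL> STARTUP MOUNT
SQL> ALTER DATABASE CONVERT TO PHYSICAL STANDBY;
-- another restart in mount state is required before redo apply can be restarted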

When upgrading a database with a standby database(s) in place, the redo transport mechanism will ensure that the dictionary changes are propagated to all destinations for as long as the catalog upgrade script is executing on the primary site. This is true for both physical and snapshot standby database configurations. All you need to do is ensure that the Oracle binaries on the standby servers exactly match the binaries on the primary database.

You can have both single-instance standby databases and multi-node RAC systems for your disaster recovery solution. However, bear in mind that your standby database must be able to cope with the workload in a disaster recovery situation; you are well advised to use identical hardware for both production and standby environments. If you use identical hardware, and your standby database is a RAC database as well, then all instances can receive redo from the primary database, thereby spreading the load. However, only one instance can apply redo.

Logical Standby Database

A logical standby database differs from a physical standby database in that it is not an exact 1:1 copy of the live database. Every logical standby database starts its existence exactly like a physical standby database, but it is then converted for read-write access. At this stage, the primary database and the logical standby begin to deviate. Physical (and snapshot) standby databases stay synchronized through the application of redo; a logical standby database stays synchronized by executing the same SQL statements as the primary database. This mechanism is often referred to as SQL Apply.

Internally, SQL Apply uses the LogMiner feature to extract SQL statements from the redo stream. It then applies SQL statements rather than redo to the standby database. Therefore, a logical standby database has the same data structure as the primary database, but the physical representation of the data in the database is likely to be different. There are also some restrictions as to which data types are supported on the primary database, although the list of supported types keeps growing from release to release.

Another big difference between a physical and logical standby database is the fact that a logical standby database is open read-write while it still receives changes from production. A logical standby database is unlikely to be used for disaster recovery purposes. Its main purpose is to provide an environment in which reporting can be offloaded from production as changes from the live system are fed into the database. This provides a high degree of data accuracy. The fact that the logical standby database is open read-write means that additional data structures such as indexes and materialized views can be created to speed up queries that would otherwise be too expensive to maintain on the primary database.

Finally, logical standby databases can be used as part of the process to upgrade primary databases to newer releases or to apply patch sets with almost no downtime. This little-used technique is referred to as a rolling upgrade in the Oracle documentation. The transient logical standby database discussed in the next section is what you really want to use when performing a rolling upgrade in your own environment.

Transient Logical Standby Database

Oracle has recognized that few businesses are willing to set up a logical standby database only for the rolling upgrade of an Oracle database. Setting up a logical standby database is not a trivial task, and maintaining a logical standby database requires close monitoring to check that all transactions have been applied. For this reason, Oracle 11g Release 1 provides the capability to transiently convert a physical standby database into a logical standby database for the duration of a rolling upgrade. After the upgrade, the logical standby database is converted back to its original role as a physical standby database.

This type of standby database is not listed under the name transient logical standby in the documentation; however, the technique is described in Chapter 12 of the Oracle Data Guard Concepts and Administration Guide, which covers rolling upgrades of a database. The rolling upgrade steps are identical to those for using a logical standby database; however, the setup of the logical standby database is greatly simplified.

Active Data Guard

The majority of standby databases are probably physical standby databases running in remote disaster recovery data centers, waiting to be activated. Many users of such configurations have remarked that this isn't the best use of resources. While it is possible to use the standby database as a source for backups, the tapes must be shipped from the DR site back to the primary site whenever a restore is needed. The other option, using the standby database for reporting and ad-hoc queries that couldn't possibly be executed on the primary database, also has a downside: while the database is open in read-only mode, no archived logs are applied, so the standby falls behind the primary and its data becomes stale.

Beginning with 11g Release 1, Oracle addressed these concerns with the Active Data Guard option, which needs to be acquired on top of Enterprise Edition. When purchased, this option offers the following benefits over and above what Data Guard already provides:

  • Physical Standby databases in read-only mode can have managed recovery enabled. This allows users to offload their queries to current production data, thus combining the best of both worlds. Oracle refers to this feature as Real Time Query.

  • This option also allows the use of block change tracking for faster incremental backups on the standby database.

Active Data Guard can also be used as a scalability tool for web-based content. For example, multiple standby databases can be opened read-only with the Active Data Guard option enabled to provide real-time data access to web servers, significantly scaling read access. Updates to the data occur only on the primary database through a controlled user interface, and changes are immediately propagated to the reader farm through Real Time Query.
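
Enabling Real Time Query on an existing physical standby essentially amounts to opening the database read-only and restarting managed recovery. A minimal sketch on the standby (an Active Data Guard license is assumed) might look like this:

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
SQL> ALTER DATABASE OPEN READ ONLY;
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT;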

Role Transitions

Data Guard supports two types of role transitions: a graceful switchover and a failover. During a switchover operation, the primary database ensures that no data loss occurs by performing a log switch and waiting for the redo to be applied on all databases that form part of the Data Guard configuration. Only then will it transform into a standby database itself. At this stage, all the databases forming part of the Data Guard configuration are physical standby databases. The administrator then chooses one of the other standby databases to assume the primary role.

A switchover is an elegant solution that helps to minimize time required for performing the following maintenance operations:

  • Replacing hardware

  • Migrating to another storage technology, such as Automatic Storage Management

  • Migrating to another storage array

  • Moving the data center

  • Changing the word size of a database

  • Upgrading database and Clusterware versions

  • Upgrading the operating system

Although not entirely free of downtime, a switchover provides a proven technology for performing such tasks.

A failover indicates a more severe situation in which the primary database is no longer available for a switchover, possibly because of a site failure or the loss of the storage backend. In such a situation, the administrator should try salvaging as many outstanding archived logs as possible. The DBA should also minimize or, preferably, eliminate the gap to be resolved on the standby database before activating one of the standby databases for read-write access. Data loss may occur, depending on the protection mode the Data Guard configuration is using.

Prior to the introduction of the Flashback Database feature, activating a standby database always implied that the primary database had to be completely rebuilt by restoring from a backup. Today, however, if the failed primary database has flashback enabled, then the time and effort required for a reinstantiation of the database can be greatly reduced, assuming that it can be started up without problems. For example, once the failed database has been restarted after a complete data center power outage, it is possible to flash it back to the system change number that existed prior to the activation of the new primary. From there, the database can be converted to a physical standby with only a few commands. When the situation clears, the former primary database can become the target of a graceful switchover, restoring the roles that were in place before the failover.
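
A hedged outline of reinstating a failed primary after a failover is shown below; the SCN is obtained from the new primary, and the placeholder must be replaced with the actual value:

-- on the new primary
SQL> SELECT TO_CHAR(standby_became_primary_scn) FROM v$database;

-- on the failed former primary, once it can be mounted again
SQL> FLASHBACK DATABASE TO SCN <standby_became_primary_scn>;
SQL> ALTER DATABASE CONVERT TO PHYSICAL STANDBY;
SQL> SHUTDOWN IMMEDIATE
SQL> STARTUP MOUNT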

Role transitions are slightly different, depending on the type of standby database (logical or physical) to assume the primary role, and each database administrator should be familiar with the steps necessary to carry out the different role transitions. Regular disaster recovery tests should be performed to ensure that the standby database(s) and the entire dependent infrastructure, such as load balancers and application servers, can execute the workload in the disaster recovery center. Easily accessible documentation is important because it can help the more junior team members perform this task and keep a cool head during a crisis.

Note

Switchover operations involving RAC used to require that all but one instance were shut down prior to issuing the switchover command sequence. This was true until Oracle 11.2, when this restriction was partially removed.

Data Protection Modes

Data Guard offers three different data protection modes. Depending on the business requirements, Data Guard can be set up for maximum performance without affecting the operation of the primary database; alternatively, it can be set up to ensure zero data loss. There are advantages and disadvantages to all three options (see Table 3-4 for a description of available modes).

Table 3.4. The Data Guard Operation Modes

Operation Mode

Description

Maximum Protection

This mode provides the highest level of protection from data loss, ensuring that there will be zero data loss if the primary database fails. To achieve this level of protection, the standby database has to acknowledge that the redo generated by a transaction on the primary database has been written into its standby redo logs (in addition to the primary database's online redo logs) before the transaction can commit on the primary database. If the primary database can't write its redo to at least one synchronized standby database, it will shut down to prevent data loss from occurring. Obtaining a zero data loss guarantee comes at a price: the application's commit time can increase compared to the other protection modes, and the primary database may shut down as a consequence of a network problem.

Maximum Performance

This mode dictates that the performance and availability of the primary database are not affected by the standby database. The default protection mode, maximum performance, has no redo write dependency between the primary and standby databases; in other words, the primary database commits transactions independently of the standby database. Because redo can therefore lag behind on the standby, many businesses introduce regular, forced log switches on the primary database using the ARCHIVE_LAG_TARGET initialization parameter to bound potential data loss (see the example following this table).

Maximum Availability

This is a hybrid mode that strikes a balance between the other two modes. For transactions to commit on the primary database, at least one synchronized standby database must have received the redo in its standby redo logs. If it's not possible to write the redo stream to at least one synchronized database, the primary will perform as if it were configured in maximum performance mode.
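
The protection mode itself is set on the primary database, and the ARCHIVE_LAG_TARGET parameter mentioned in the table forces regular log switches. The following sketch assumes standby redo logs and synchronous redo transport are already configured before raising the mode to maximum availability:

SQL> ALTER SYSTEM SET archive_lag_target = 900 SCOPE=BOTH;  -- force a log switch at least every 15 minutes
SQL> ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE AVAILABILITY;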

The Data Guard Broker

The Data Guard broker is an integral part of the replication framework that lets you define Data Guard configurations, with support for all types of standby databases. The broker is installed by default with the RDBMS binaries. Its architecture was added to Data Guard in Oracle 9i, and it is a mature but little-used feature. From a user's point of view, the Data Guard broker simplifies the setup, maintenance, and monitoring of Data Guard configurations, as well as role transitions; RAC, of course, is fully supported. The tight integration with Enterprise Manager (Grid Control, not DBConsole!) also allows the creation of physical and logical standby databases with a few mouse clicks and keystrokes. The Enterprise Manager integration is easier to use and more feature rich than the command-line interface that is otherwise available to the administrator, but it requires additional infrastructure.

When used, the Data Guard broker relies on its own binary configuration files and additional background processes to set the relevant initialization parameters at instance start; it also monitors the databases in the configuration. In RAC, the configuration files need to be on shared storage; ASM, raw devices, and cluster file systems are all suitable candidates. You don't have to replicate the configuration on each database; rather, the broker automatically preserves the single image view of your Data Guard configuration by replicating changes to all databases involved. You should not issue SQL commands through SQL*Plus to modify the Data Guard configuration, because your changes are likely to be overwritten the next time the Data Guard broker starts: once the broker, always the broker.
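
As a sketch, the broker-related initialization parameters on a RAC database might be set as follows before the broker is started; the disk group and file names are examples only:

SQL> ALTER SYSTEM SET dg_broker_config_file1 = '+DATA/prod/dr1prod.dat' SCOPE=BOTH SID='*';
SQL> ALTER SYSTEM SET dg_broker_config_file2 = '+FRA/prod/dr2prod.dat'  SCOPE=BOTH SID='*';
SQL> ALTER SYSTEM SET dg_broker_start = TRUE SCOPE=BOTH SID='*';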

Conceptually, the main objects the Data Guard broker operates on are configurations, databases, instances, and properties. To begin, a configuration is created with the primary database only. Up to 30 standby databases can then be added, along with their respective attributes. Data Guard broker will automatically detect whether a database is made up of multiple instances and register them with their database. Once all databases are added in and the administrator is happy with the setup, the configuration can be enabled. From there, the Data Guard broker takes command over the Data Guard environment.
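
Using the dgmgrl command-line interface, creating and enabling a configuration follows this general pattern; the configuration name, database names, and connect identifiers below are hypothetical:

DGMGRL> CONNECT sys@prod
DGMGRL> CREATE CONFIGURATION 'prod_dg' AS PRIMARY DATABASE IS 'prod' CONNECT IDENTIFIER IS prod;
DGMGRL> ADD DATABASE 'prodsby' AS CONNECT IDENTIFIER IS prodsby MAINTAINED AS PHYSICAL;
DGMGRL> ENABLE CONFIGURATION;
DGMGRL> SHOW CONFIGURATION;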

The database object has properties and state information associated with it. As noted earlier, you no longer set initialization parameters directly. Instead, you change the state of the database to enable/disable log shipping or enable/disable managed recovery. The list of database properties you can either read or modify has grown with the evolution of Data Guard. Important properties you can modify include the following (among other options):

  • Synchronous or asynchronous sending of redo to the standby database

  • The delay for applying redo to the standby database; this setting is useful for preventing data corruption caused by human error in a multi-standby database environment

  • Compression of the redo. Beginning with Oracle 11.1, archived logs can be compressed during gap resolution; Oracle 11g Release 2 adds compression of the redo stream, but a license for the Advanced Compression option is required.

  • Database file name conversion

  • Log file name conversion

  • The preferred apply instance for RAC

  • Apply parallelism

When using Enterprise Manager, you don't have to remember the property names. Instead, you use the graphical user interface (GUI) to manage your Data Guard configuration.
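
On the command line, the same properties are changed with dgmgrl's EDIT commands. A few hedged examples follow; the database and instance names are hypothetical:

DGMGRL> EDIT DATABASE 'prodsby' SET PROPERTY LogXptMode = 'SYNC';
DGMGRL> EDIT DATABASE 'prodsby' SET PROPERTY DelayMins = '60';
DGMGRL> EDIT DATABASE 'prodsby' SET PROPERTY PreferredApplyInstance = 'prodsby1';
DGMGRL> EDIT DATABASE 'prodsby' SET PROPERTY ApplyParallel = 'AUTO';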

The Data Guard broker has built-in support to modify redo transport and apply services for RAC and single-instance installations. As an additional benefit, the Data Guard broker allows you to set up automatic failover for unsupervised Data Guard configurations; this is referred to as Fast Start Failover. With this feature, an observer process running on hardware separate from the database servers monitors the primary database, and it can be configured to automatically fail over to the standby when certain conditions are met. In the context of RAC, the Data Guard broker considers a RAC database available as long as at least one instance responds to user requests. Beginning with Oracle 11g Release 1, an API to control the Fast Start Failover option is available in the form of the DBMS_DG package. After the failover is complete, the Data Guard broker posts a Fast Application Notification (FAN) event indicating that the new primary database is ready for use. FAN uses the Oracle Notification Service (ONS) process to post the event, so the database needs to be either a RAC database or a single-instance database integrated with Oracle Restart.
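
Fast Start Failover itself is enabled from dgmgrl, with the observer started on a separate machine; a minimal sketch is shown here:

DGMGRL> EDIT CONFIGURATION SET PROPERTY FastStartFailoverThreshold = 30;
DGMGRL> ENABLE FAST_START FAILOVER;
DGMGRL> START OBSERVER;   -- run on a host separate from both primary and standby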

Note

Oracle Restart is a new feature introduced with Grid Infrastructure 11.2. In simple terms, it allows resources such as ASM disk groups, the ASM instance, and RDBMS instances to be registered with the local Grid Infrastructure stack, similar to the way resources are managed in RAC. The aim is to make custom startup scripts obsolete by starting all resources and their dependencies on a database server through Grid Infrastructure. Oracle Restart was also known as single-instance high availability (SIHA).

Extended Distance Clusters

Extended distance clusters, or stretch clusters, are a special form of RAC system. Most RAC systems are designed to reside in a single data center, with the database servers, application servers, switches, and storage located close to one another. As the name suggests, extended distance clusters are different. Figure 3-2 illustrates some of the key concepts that apply.


Figure 3.2. An architectural overview of an extended distance RAC

Using the same basic building blocks as "normal" clusters, the individual nodes are geographically dispersed, offering protection from a local site failure. Extended distance clusters compete with Data Guard as a means of protecting users from site failure; however, Data Guard physical standby databases are normally located much further away from the primary and, when configured correctly, can satisfy disaster recovery requirements better than an extended distance cluster.

An extended distance cluster relies on a dedicated low-latency, high-throughput cluster interconnect even more than a RAC cluster housed in a single cabinet does. Such a dedicated interconnect becomes increasingly expensive to lease as the distance between the nodes grows.

Extended distance clusters are set up across two sites in most scenarios. Each site has its own storage, which is mirrored either at the SAN level or through Automatic Storage Management. When ASM mirroring is used, all disks of a site's SAN are aggregated into their own failure group, and the disk groups used by the cluster must use normal redundancy. The extended distance cluster feature became usable for practical purposes with the release of Oracle 10.2.0.3 Enterprise Edition. Oracle 11g Release 1 lifted the requirement that hardware had to be located in the same server room or cabinet for Standard Edition RAC, allowing users to roll out extended distance clusters with a Standard Edition license. Oracle 11.1 also introduced an ASM feature called Preferred Mirror Read, which allows the local instance to read data from the local SAN. In Oracle 10g, ASM could only ever read the primary copy of an extent, which could be located on the remote SAN, thus introducing unneeded latency.
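
In a two-site configuration, the preferred mirror read feature is enabled per ASM instance through the ASM_PREFERRED_READ_FAILURE_GROUPS parameter; the disk group and failure group names below are examples only:

-- on the ASM instance at site A
SQL> ALTER SYSTEM SET asm_preferred_read_failure_groups = 'DATA.SITEA' SID='+ASM1';
-- on the ASM instance at site B
SQL> ALTER SYSTEM SET asm_preferred_read_failure_groups = 'DATA.SITEB' SID='+ASM2';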

Special attention is required for the deployment and configuration of the voting disks, which are used, among other tasks, to manage node membership. Since Oracle 10g Release 2, it has been possible to define three voting disks in the cluster. For extended RAC, the first and second voting disks should be configured to reside on each site's local storage array. The third voting disk can be placed on inexpensive storage, such as an NFS mount on an independent (that is, non-clustered) system, to provide the quorum.

Oracle Streams

Oracle Streams is a highly flexible messaging system built into the Oracle database Enterprise Edition. To a lesser degree, it is also available with Standard Edition since Oracle 11g Release 1. Using the Oracle Streams technology, it is possible to capture, propagate, and apply data definition changes, data manipulation changes, and user-defined messages from a source database to one or multiple destinations. It is even possible to use Oracle Streams to send data back to the source database; however, we, the authors, struggle to identify a use case for this. Oracle Streams also offers support for heterogeneous messaging, and you could also employ it to send data to a non-Oracle database.

It might help you to think of Oracle Streams as a technology similar to Data Guard's logical standby database, because it uses similar concepts under the hood. However, Oracle Streams provides more flexibility for sharing information with other databases, and it allows you to share information across platform boundaries. Cross-platform support exists for Data Guard physical standby databases as well, but you cannot use that feature to replicate only a subset of the database. This makes Oracle Streams an excellent tool for platform migrations and database consolidation, far more suitable than the classic export/import method traditionally used. Oracle Streams also offers advantages over the transportable tablespace/database feature because the source database can stay online and remain usable while the changes made to it are applied to the new destination database.

There are many other uses for Oracle Streams as a technology. It is vastly superior to the old Advanced Replication toolset, and it can be employed to trickle-feed data warehouses, to migrate between platforms, and to provide notification services and message exchange in general. It is possible to modify the data stream at almost any stage using flexible rule sets or handler functions, which makes on-the-fly data transformation possible. Interestingly, Oracle Streams also provides the foundation for other Oracle technologies, such as Change Data Capture (CDC).

The remainder of this section will focus on Oracle Streams as a replication and high availability tool. Replicating data from a source to a destination database is performed in three major processing steps:

  1. Capturing messages

  2. Propagating messages to the destination(s)

  3. Applying messages

Depending on the capture type, the propagation step can be omitted. Unlike a Data Guard broker configuration, an Oracle Streams setup can be configured and maintained from Enterprise Manager DBConsole.

Streams Processing

Capturing data marks the beginning of the stream's data flow. Capture can be performed either on the source database or on the destination database. Capturing data on the source is referred to as local capture, whereas capturing data on the destination database is referred to as downstream capture. The capture process uses the LogMiner functionality to extract SQL commands from the redo logs, hence the similarity to a Data Guard logical standby database.

Note

Oracle 11g provides another capture type called synchronous capture. This chapter does not discuss this feature because it has a number of limitations, such as not being able to capture DDL changes. However, it is available in Standard Edition and Standard Edition One.

Supplemental logging on the captured objects (or, depending on the scope of the replication, the whole database) writes into the redo logs the additional information the capture process needs to construct logical change records. A logical change record is an internal representation of a database change, such as a DDL or DML command. This supplemental information is also required by the apply process to apply the change records on the destination database (this will be covered in more detail momentarily). Supplemental logging creates overhead; sufficient testing should ensure that the database can handle the additional load.
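
As a hedged illustration, supplemental logging can be enabled at the database or table level with commands like the following; the hr.employees table is only an example, and the DBMS_STREAMS_ADM setup procedures typically add the required supplemental logging for you:

SQL> ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;
SQL> ALTER TABLE hr.employees ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS;
SQL> SELECT supplemental_log_data_min FROM v$database;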

Regardless of whether a local or downstream capture process is used, user-definable rules determine which information should be captured from the redo logs. It's this freedom to choose what information gets captured that makes Oracle Streams more flexible than a logical standby database. The user defines which tablespace, schema, or tables to capture changes from; the user can also define negative rules to filter information. Once extracted, the capture process transforms the information into a logical change record (LCR) and enqueues it into a buffered queue that draws its memory from the SGA.

Oracle Streams uses queueing to propagate or stage LCRs extracted from the redo logs. The queues are buffered, but their contents can "spill" to disk if too much memory is consumed. Again, you can define rules to filter out particular LCRs and exclude them from propagation. Oracle employs a throttling mechanism if the enqueue rate is too high.

Propagation between source and destination queues is handled by database scheduler jobs, and Oracle Streams guarantees message delivery. When local capture is used, propagation transmits the information from the source to the destination over database links. When downstream capture is employed, propagation is not necessary because the apply process dequeues the messages that the capture process added to the local queue.

Once the information reaches the destination database queue, the apply process can dequeue the LCRs and apply them to the database. Alternatively, it is possible to pass an LCR to your own code for inspection or modification before it is applied to the database. Such a procedure is referred to as an apply handler in the Oracle documentation, and it is especially useful for data warehouse feeds or complex requirements where data must be cleansed. You can have such handlers for data manipulation changes, data definition changes, and other user-defined message types. As with the capture and propagation processes, you can define a rule set to discard certain LCRs before they are applied to the database objects.

The apply process is a critical part of the Oracle Streams high availability and replication framework. Many views exist to monitor its performance. Viewing the contents of LCRs is especially useful in error situations. By default, the apply process will stop when it encounters an error, such as a foreign key violation or a tablespace with no space left in it. All errors will be reported to the so-called error queue, where they are available for review by the administrator. It is possible to retry individual transactions or all of the enqueued erroneous transactions after the underlying problem has been fixed.
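
As a hedged sketch of how an administrator might review and retry failed transactions, the error queue is exposed through the DBA_APPLY_ERROR view and the DBMS_APPLY_ADM package; the transaction ID shown is purely illustrative:

SQL> SELECT apply_name, local_transaction_id, error_message
  2    FROM dba_apply_error;

SQL> -- retry a single transaction once the root cause has been fixed
SQL> EXEC DBMS_APPLY_ADM.EXECUTE_ERROR('1.12.3456')

SQL> -- or retry everything currently in the error queue
SQL> EXEC DBMS_APPLY_ADM.EXECUTE_ALL_ERRORS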

Oracle Streams Prerequisites

Technically, Oracle Streams uses an administrator account most often named strmadmin. (You might remember the earlier repadmin account for advanced replication.) However, if your environment has strict security requirements, individual accounts can be used for capturing, propagating, and applying data. Next, bi-directional database links need to be configured between source and destination to serve two purposes:

  1. During the initial setup, Data Pump can be used to initialize the streams configuration. In this case, the database links serve as the equivalent to the command-line network_link parameter.

  2. Streams allows bi-directional replication.

At this stage, the capture and propagation processes are created so they can capture changes prior to the next step: object instantiation. Once the prerequisites are in place, you need to instantiate the objects to be replicated on the destination database. Data volumes permitting, this can be implemented on-the-fly using Data Pump or through other techniques, such as RMAN. Finally, the apply process is created to receive changes from the source database. Once the propagation is enabled, data will start to flow. The Oracle Streams Replication Administrator's Guide also recommends creating a dedicated tablespace for the streams administration account to separate the I/O workload if messages spill to disk, which happens when the propagation can't keep up or flow control is enabled for other reasons.

In addition to the Enterprise Manager Grid Control/DBConsole GUI, Oracle added a number of MAINTAIN_* procedures to the DBMS_STREAMS_ADM package in 10g Release 2. These procedures combine the individual instantiation steps and can be run after the database links and the strmadmin administrator accounts have been created. It is highly recommended that you take a look at this package. Note that many of the instantiation procedures have a perform_actions parameter that can be set to false; in that case, the result of the call is a text file containing the steps necessary to implement Streams. Using DBMS_STREAMS_ADM offers another benefit: if anything goes wrong during execution, the problem and the progress made are recorded in DBA_RECOVERABLE_SCRIPT_ERRORS, and the operation can be resumed at the stage where it failed.
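
For illustration only, the following sketch calls DBMS_STREAMS_ADM.MAINTAIN_SCHEMAS with perform_actions set to FALSE so that it merely generates such a script; the schema, database, and directory object names are hypothetical:

BEGIN
  DBMS_STREAMS_ADM.MAINTAIN_SCHEMAS(
    schema_names                 => 'hr',
    source_directory_object      => 'SRC_DPUMP_DIR',
    destination_directory_object => 'DST_DPUMP_DIR',
    source_database              => 'prod',
    destination_database         => 'rep',
    perform_actions              => FALSE,           -- only generate the script
    script_name                  => 'maintain_hr.sql',
    script_directory_object      => 'SRC_DPUMP_DIR');
END;
/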

Before using Oracle Streams, it is important to review the My Oracle Support notes about Oracle Streams patches. The main document seems to be note 437838.1, which is titled "Streams Specific Patches." A number of notes advise those who adopted Oracle Streams to apply patches on top of their patch set. This has become more difficult since the introduction of PSUs: additional one-off patches can cause patch conflicts, and they may have to be replaced by merge label request (MLR) patches. Oracle recommends always applying the latest PSU. You should also consult My Oracle Support for the recommended setup information for your database version and Oracle Streams.

Cluster Topologies

We've covered a lot of introductory information so far. Now it's time to bring it all together for the big picture. Consider a common multitiered, web-based application acting as a company's intranet. Figure 3-3 shows a simplified view of such an application. The necessary DR environment and the network switches/firewalls that you'd undoubtedly find have been omitted for clarity. Also, the number of load balancers, web servers, and RAC nodes is arbitrary, so you might find that your application needs more (or possibly less) computing power.


Figure 3.3. A simplified view of a three-tiered application

First, there is the "outside" world, where the clients are located; they connect through the public network switch, which grants access to the load-balancing layer. Load balancers should be employed to distribute the incoming connection requests evenly across the web server layer. This prevents a single application server from being overloaded with requests while others sit idle. Many load balancers employ a DNS round-robin algorithm, whereas a few products are capable of measuring the load on each web server and routing the request to the least-loaded one.

If network encryption is used, it can be extremely beneficial to use load balancers with hardware encryption support, offloading much of the CPU work required to decrypt the incoming data stream.

Second, there is the next layer, where you find the web application servers. A multitude of products is available to perform this task, and it will be necessary to choose which programming language to write the application in. There are limitations governing which platform can be used, especially with respect to Microsoft's .NET framework, where platform independence exists only in a rudimentary fashion. For example, the Mono framework on Linux lags several versions behind the current Windows version (4.0). More choice is available for Java-based applications because the Java Development Kit is available on most platforms. Solutions range from small servlet and JavaServer Pages (JSP) containers such as Apache Tomcat; to Java Enterprise Edition-capable web application servers such as JBoss; to commercial solutions such as WebLogic (formerly a BEA product) and IBM's WebSphere, among other application servers. Oracle seems to have discontinued its own application server in favor of WebLogic, which it gained through the acquisition of BEA.

At the bottom end of the application stack, we find the Oracle RAC database and the storage area network.

Depending on your security requirements, additional firewalls can be placed between the different application layers, but these have been omitted from the discussion in this chapter for the sake of clarity.

Summary

In this chapter, we investigated a number of topics relevant to the deployment of an Oracle RAC-based application. The most important subject designers and developers should have on their agenda is the continuous availability of the application, regardless of failure. RAC is an excellent high availability option, but it requires a little effort to enable applications to make the best use of it. This is easier for applications developed in-house than for applications that rely on third-party tools, especially if no access to the source code is available and no agreement has been made with the third-party vendor to RAC-enable the application.

If packaged applications can't use connection pools to benefit from events published by the FAN framework, Transparent Application Failover in session mode can provide at least a little bit of resilience, provided that the application uses the Oracle Call Interface libraries.

Instance recovery is another factor influencing application availability. As we saw, a failure of a busy instance can impact other nodes until the global resource directory is reconstructed, and the recovery set is applied to the database.

Writing a scalable application should be the design goal for any architect or developer. Experience has taught us that scalability is difficult to retrofit into an existing codebase; therefore it should be addressed from the outset. The Oracle database provides many features that can be used to improve scalability, depending on a given environment's requirements. Unfortunately, many application development frameworks don't allow fine-grained access to the generated code, which can make it difficult to ensure scalability.

We discussed failures within the cluster in depth in the "Instance Recovery in RAC" section; however, what if the entire site fails? Oracle Data Guard standby databases are well suited to provide protection in such scenarios; to a lesser extent, extended distance clusters can provide such protection as well. Finally, we examined Oracle Streams and how it can help you achieve high availability for your applications.

The contents presented in this chapter should aid you in designing a highly available, scalable, and well-protected application on RAC, maximizing your investment in this technology.
