System Architecture

Sun Cluster 3.0, released in December 2000, is the result of nearly six years of research, development, and testing, stemming from the Solaris Multicomputer (Solaris MC) project that Sun Labs started in early 1995. New global features—global disks, tapes, and CD-ROMs, a global file service, and global networking—augment the Sun Cluster 3.0 support for highly available (HA) applications. These global features enable Sun Cluster 3.0 to provide all the functionality of its predecessor, Sun Cluster 2.2, and to support a new class of scalable applications. Examples of supported applications are the Network File System (NFS), the industry-standard relational database management systems (RDBMSs), and parallel applications such as the Oracle 8i Parallel Server (Oracle 8i OPS) and Oracle 9i Real Application Clusters (Oracle 9i RAC).

Enterprise Infrastructure

You should never consider deployment of a Sun Cluster system in isolation. The well-known cliché that “a chain is only as strong as its weakest link” applies equally well to the infrastructure within an enterprise. A cluster provides a platform to host highly available services. These services, in turn, must communicate with other components within the organization and be accessible to the users who interact with them. It is important, therefore, to ensure that the surrounding infrastructure is reliable and available too. Networks that link the architectural tiers within the organization should have alternative paths so that data can flow unimpeded between the various components. Similarly, power and cooling are needed to ensure that systems continue to run and do not overheat. All of this must occur in an environment secure from malicious damage or unauthorized observation. A secure environment ensures that the data is safe from unauthorized manipulation or inadvertent corruption.

A Sun Cluster system must be run by trained system administrators who understand the technology. The system administrators must also understand the need to clearly define and carefully follow change management procedures. Through the SunUP™ Network program, and by focusing on people and processes in addition to the product, IT departments can ensure that Sun Cluster 3.0 systems deliver the expected level of availability. The SunUP organization, which is part of the Sun Worldwide Quality organization, works with customers and third-party partners to develop products and services that enhance availability in Sun computer systems. The Sun BluePrints program works in alliance with the SunUP program to produce best-practice documentation for system administrators.

Service Point Architecture

The Sun™ Service Point architecture is Sun Microsystems' vision of how modern data centers can meet the challenges placed on them: tightened IT budgets, the Internet, the many facets of e-business, and the demand for services that are as reliable and available as a telephone dial tone. Sun Cluster 3.0 forms a key part of this architecture, which brings together the concepts of resource consolidation and service-level management. Sun terms such a consolidated environment a SunPlex™. This environment is administered through a common set of tools, such as the Sun™ Management Center 3.0 software. The Service Point architecture enables IT departments to get the most out of their resources and to contain provisioning, implementation, change, and ongoing management costs. The case studies in Chapters 5 and 6 implement the Sun Service Point architecture and embody the SunUP configuration best practices.

Fault Tolerant Systems

Although Sun Cluster 3.0 makes applications highly available, you should draw a clear distinction between Sun Cluster 3.0 and fault-tolerant solutions such as the Netra ft™ server. If an unrecoverable hardware component failure, such as a processor failure or an uncorrectable ECC memory error, occurs on a cluster node, that server panics. Depending on the type of service, you, the system administrator, may need to restart the services running on that node on another node within the cluster. Consequently, users of the service may experience a pause in the service or an outage that requires reconnection to the service. The advantages of a Sun Cluster solution over a fault-tolerant solution are threefold. First, a Sun Cluster solution is considerably less expensive because it is based on general-purpose Solaris servers. Second, it can be used on a wider range of server platforms, thus enabling greater application scalability. Third, it can recover from software faults (bugs) that would otherwise be a single point of failure in a fault-tolerant system.

High Availability Versus Disaster Recovery

Because the primary goal of Sun Cluster 3.0 is to provide a platform for highly available and scalable applications, you must clearly distinguish between the goals of the Sun Cluster product and those of a disaster recovery strategy. Disaster recovery policies often trigger formal business processes that involve a large number of staff working in a “firefighting” mode. A simple software failure that causes an application failover to a second node should not provoke such a complex response.

A genuine disaster recovery policy must take into account failures other than those outlined previously. These failures include data corruption and deletion, both accidental and malicious, plus the occasions that require rapid rollback of data or application changes.

When combined with the Sun StorEdge Network Data Replicator (SNDR) software, Sun Cluster 3.0 provides a suitable combination of high availability and disaster recovery capabilities.

Sun Cluster 3.0 relies heavily on the private interconnects between its nodes for communication of “heartbeat” messages and the data required for the new global functionality. Increasing the node separation to afford protection from local disasters such as floods, fires, or power failures introduces latency into the internode communication. This results in potential performance penalties for applications that make any substantial use of internode communication for synchronization of data or state. Given that almost all applications have some data that must be read from a cluster file system or written to it, the node separation will inevitably affect system performance.
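To see why distance matters, consider simple propagation delay: signals travel through optical fiber at roughly two-thirds of the speed of light. The figures below are illustrative estimates, not measured Sun Cluster values:

    t_{prop} = \frac{d}{v}, \qquad v \approx \tfrac{2}{3}c \approx 2 \times 10^{8}\ \mathrm{m/s}

    d = 500\ \mathrm{m} \Rightarrow t_{prop} \approx 2.5\ \mu\mathrm{s}, \qquad d = 100\ \mathrm{km} \Rightarrow t_{prop} \approx 500\ \mu\mathrm{s}

Every synchronous exchange pays at least one round trip, 2t_{prop}, and mirrored writes and coherency protocols typically pay several, so the per-operation penalty grows rapidly with node separation.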

Separating cluster nodes over a long distance also incurs substantial additional costs for the dedicated fiber communications links needed to support the disk mirroring and for the related public and private network infrastructure. Because the data must be mirrored across sites to ensure that it survives a site failure, every write transaction must complete on both halves of the mirror before any disk write returns. FIGURE 3-1 shows the I/O overhead of campus clustering versus replication.

Figure 3-1. I/O Overhead of Campus Clustering Versus Replication


This requirement not only increases latency but also places a substantial demand on bandwidth, because whole disk blocks must be transmitted across the public network to the remote site regardless of the size of the change. The extended interconnects impact system performance even if the system uses the HAStorage data service; see “Application Performance”. The extended node separation also adds latency to the internal Sun Cluster 3.0 coherency traffic; see “File and Attribute Caches”. Consequently, Sun Cluster 3.0 currently limits the internode separation of campus clusters to 500 meters.
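The scale of both effects can be estimated with simple arithmetic. The following sketch is a back-of-the-envelope illustration only; the write rate, block size, and distances are assumed figures and do not model the actual Sun Cluster transport:

    # Rough estimate of the cost of synchronously mirroring writes to a
    # remote site. All workload figures are illustrative assumptions.

    FIBER_SPEED_M_PER_S = 2.0e8  # roughly 2/3 of c in optical fiber

    def added_write_latency_ms(distance_m, round_trips=1):
        """Extra latency per write from propagation delay alone."""
        one_way_s = distance_m / FIBER_SPEED_M_PER_S
        return round_trips * 2 * one_way_s * 1000.0

    def replication_bandwidth_mbit(writes_per_sec, block_bytes=8192):
        """Bandwidth consumed when every write ships a whole disk block."""
        return writes_per_sec * block_bytes * 8 / 1e6

    # The 500-meter Sun Cluster 3.0 campus limit versus a 100 km metro link.
    for d in (500, 100000):
        print("%7d m: +%.3f ms per write" % (d, added_write_latency_ms(d)))

    # 1,000 writes/s of 8-Kbyte blocks, even if each write changes one byte.
    print("link bandwidth: %.0f Mbit/s" % replication_bandwidth_mbit(1000))

At the 500-meter limit, the propagation term (about 0.005 ms) is negligible beside millisecond-class disk service times; at metropolitan distances it becomes a significant component of every mirrored write.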

You should also consider failures that result in all or part of a disk mirror requiring resynchronization. The additional latency affects both the performance of the system and the completion time of the resynchronization. When dark fiber is used for intersite links, test the links to ensure that they can sustain full bandwidth, to minimize the period of disruption.
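A first-order estimate of the resynchronization window, again using illustrative rather than measured figures, is simply the quantity of data to be copied divided by the throughput of the slowest element in the path:

    t_{resync} \approx \frac{S}{\min(B_{link}, B_{disk})}

For example, resynchronizing S = 500\ \mathrm{GB} over an uncontended 1\ \mathrm{Gbit/s} link (about 125 MB/s) takes roughly 4000 seconds, or just over an hour, during which the data is not fully redundant. Any shortfall in sustained link bandwidth lengthens this exposure proportionally.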

Data Deletion and Corruption Recovery

As the previous section noted, a genuine disaster recovery policy must also account for data corruption and deletion, both accidental and malicious, plus the occasional need to rapidly roll back data or application changes. A Sun Cluster 3.0 system, even with mirrored storage, cannot recover from such errors. A corrupted or deleted file is, by definition, corrupted or deleted on all of the mirrors. The only way to recover data integrity is to restore the file from tape or from a locally held file system snapshot. In an RDBMS system, you may subsequently have to reapply transactions to the recovered file or files.

You can use a number of technologies to facilitate recovery from data corruption or deletion, the most basic being a tape backup and recovery system. Other alternatives include:

  • The fssnap(1M) feature in the Solaris 8 update 3 operating environment, which enables you to take snapshots of a file system. The fssnap(1M) snapshots are a point-in-time copy of data created on locally attached storage; a short usage sketch follows this list.

  • Sun StorEdge™ Instant Image 3.0 software, which enables you to take snapshots of data on both raw devices and file systems. This feature cannot be used with Oracle 8i OPS or Oracle 9i RAC systems. In contrast to the SNDR software, the Sun StorEdge Instant Image snapshots are a point-in-time copy of data created on locally attached storage. You can quickly refresh these snapshots by copying only the changed disk blocks.

  • Sun StorEdge Network Data Replicator 3.0 software, which enables you to replicate raw devices or file systems, synchronously or asynchronously, to remote sites on a continuous basis. This feature cannot be used with Oracle 8i OPS or Oracle 9i RAC systems. In contrast to the Sun StorEdge Instant Image software, the SNDR software replicates data continuously to storage connected to a host at a remote site.
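The following minimal sketch illustrates the fssnap(1M) approach from the first bullet: it creates a UFS snapshot and mounts it read-only so that files can be copied back or written to tape. The file system, backing-store, and mount-point paths are hypothetical, and the commands must be run as root on a host running the Solaris 8 update 3 or later operating environment:

    # Minimal sketch: create a UFS snapshot with fssnap(1M) and mount it
    # read-only for file recovery or backup. Paths are hypothetical.
    import subprocess

    FILESYSTEM = "/export/home"         # live file system to snapshot
    BACKING_STORE = "/var/tmp/home.bs"  # holds copied-on-write blocks
    SNAP_MOUNT = "/mnt/home-snap"       # where the snapshot is mounted

    # fssnap prints the snapshot device name, for example /dev/fssnap/0.
    snap_dev = subprocess.run(
        ["fssnap", "-F", "ufs", "-o", "bs=" + BACKING_STORE, FILESYSTEM],
        check=True, capture_output=True, text=True).stdout.strip()

    # Mount the point-in-time image read-only; the live file system
    # stays online and writable while files are recovered from the copy.
    subprocess.run(["mkdir", "-p", SNAP_MOUNT], check=True)
    subprocess.run(["mount", "-F", "ufs", "-o", "ro", snap_dev, SNAP_MOUNT],
                   check=True)
    print("snapshot of", FILESYSTEM, "available at", SNAP_MOUNT)

When recovery is complete, deleting the snapshot with fssnap -d on the original mount point releases the backing store.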

Most relational database management systems (RDBMSs) have built-in replication technology. Oracle 9i, for example, provides Oracle Data Guard, which enables you to ship complete archived redo log files to one or more remote sites and then replay the log files at a suitable time. This capability enables you to replay transactions up to the point just before a specific error, thus avoiding a corruption or deletion. Oracle can also replicate changes to one or more tables to tables in remote databases.

A suitable disaster recovery architecture should, therefore, include the Sun StorEdge Instant Image 3.0 and Sun StorEdge Network Data Replicator 3.0 software products, together with application-level replication, in addition to the Sun Cluster 3.0 software that provides high availability.
