This chapter addresses the following topics:
The need for a business to have a highly available, clustered system
System failures that influence business decisions
Factors to consider when designing a clustered system
Failure modes specific to clusters, synchronization, and arbitration
To understand why you are designing a clustered system, you must first understand the business need for such a system. Understanding the complex failures that can occur in such systems informs the decision to use a cluster and helps you design a system that can handle those failures. As you design your clustered system, you must also consider data synchronization, arbitration, caching, and timing, as well as the failures specific to clusters: split brain, multiple instances, and amnesia.
Once you are familiar with the building blocks, issues, and features that enable you to design an entire clustered system, you can analyze the solutions that the Sun Cluster 3.0 software offers to see how they meet your enterprise business needs and your backup, restore, and recovery requirements.
The sections in this chapter are:
Business Reasons for Clustered Systems
Failures in Complex Systems
Data Synchronization
Arbitration Schemes
Data Caches
Timeouts
Failures in Clustered Systems
Summary