Cluster Software

The Sun Cluster 3.0 software includes the highly available Network File System (NFS) data service package, SUNWscnfs, without additional licensing charges. The HA-NFS data service is currently a failover service—the Sun Cluster 3.x software road map includes a scalable NFS agent in the future.

A failover service executes on a single cluster node until the fault monitor agent detects a service anomaly. Once a fault is detected on a data service, the system restarts the data service on the same node or an alternate node. The Firm has decided to use the Sun Cluster 3.0 HA-NFS service to serve two separate file systems:

  • The workspace file system includes the developers' home directories and workspaces used to store any work in progress.

  • The build file system contains the configuration controlled sources, build environment, test suites, and test results.

This requirement allows the Firm to distribute the NFS service between two cluster nodes by creating two separate HA-NFS data services: workspace and build. During normal operations, the system shares the NFS load across the nodes, based on the file systems. The only caveat with this configuration is that a failure on a cluster node reduces the performance of the surviving node because it must host two HA-NFS data services at the same time. The Firm considered this and believes that their configuration will be adequate, even in the degraded mode.
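
A minimal sketch of how two such failover resource groups might be created with the scrgadm(1M) command is shown below. The node names, resource group names, and logical hostnames (node1, nfs-workspace-rg, workspace-server, and so on) are hypothetical, and the storage resources and extension properties required by a real HA-NFS configuration are omitted for brevity.

  # Register the HA-NFS resource type once per cluster
  scrgadm -a -t SUNW.nfs

  # Create a failover resource group for the workspace service on both nodes
  scrgadm -a -g nfs-workspace-rg -h node1,node2

  # Add the logical hostname that clients use to reach the workspace service
  scrgadm -a -L -g nfs-workspace-rg -l workspace-server

  # Add the HA-NFS resource itself, then bring the group online
  scrgadm -a -j nfs-workspace-res -g nfs-workspace-rg -t SUNW.nfs
  scswitch -Z -g nfs-workspace-rg

  # Repeat the same steps for the build service, using nfs-build-rg and build-server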

Software Configuration

The primary service provided by this cluster is NFS, which has been a standard part of the Sun operating environment since 1985. The Sun Cluster 3.0 HA-NFS agent includes the fault monitors and recovery mechanisms required to provide failover NFS service. This service is easily installed on Sun Cluster 3.0.

NFS Overview

The NFS environment provides transparent access to remote files over a network. File systems that reside on remote systems appear to be local. Clients access remote file systems by using either the mount command or the automounter.
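
For example, a client administrator can mount an exported file system explicitly with the mount command; the server name and paths here are hypothetical:

  # Mount a remote file system over NFS (hypothetical server and paths)
  mount -F nfs home-server:/export/home /mnt/home

  # Release it when it is no longer needed
  umount /mnt/home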

The terms client and server describe the roles that a computer plays when sharing file systems. If a file system resides on a computer disk and that computer makes the file system available to other computers on the network, that computer acts as a server. The computers that are accessing the file system are called clients. The NFS service enables any computer to access file systems of any other computer and, at the same time, provides access to its own file systems. Thus, a computer can play the role of client, server, or both at any given time on a network.

Note

Cluster nodes should not be NFS clients of file systems that are served by the cluster.


The objects that can be shared with the NFS service include any whole or partial directory tree, or a file hierarchy, including a single file. Unlike the Sun Cluster 3.0 GFS, NFS cannot share peripheral devices such as modems and printers.

In most UNIX system environments, a shared file hierarchy corresponds to a file system or to a portion of a file system. However, NFS support works across operating systems, and the concept of a file system might be meaningless in other, non-UNIX environments. Therefore, the term file system used throughout this book refers to a file or file hierarchy that can be shared and mounted over the NFS environment.

NFS Characteristics

The NFS protocol enables multiple client retries and easy crash recovery. The client provides all of the information the server needs to perform the requested operation, and it retries the request until the server acknowledges the request or until the request times out. The server acknowledges writes only after the data is flushed to nonvolatile storage.
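
The client retry behavior is tunable through NFS mount options. The following example uses standard Solaris mount_nfs(1M) options with illustrative values and hypothetical names:

  # Hard-mount so the client retries indefinitely until the server responds;
  # timeo is the initial timeout in tenths of a second, retrans the number of
  # retransmissions attempted before a timeout is reported (illustrative values)
  mount -F nfs -o hard,intr,timeo=11,retrans=5 build-server:/global/build /build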

The multithreaded kernel does not require multiple nfsd or asynchronous block I/O daemon (biod) processes; both are implemented as operating system kernel threads. There are no biod processes on the client and only one nfsd process on the server.

Random patterns are characteristic of NFS traffic. NFS generates requests of many types, in bursts. The capacity of an NFS server must address the sporadic nature of NFS demands. Demand varies widely but is relatively predictable during normal activity.

Most requests from applications follow this pattern:

  1. The user reads in sections of the application binary and executes the code pages, leading to a user dialog that specifies a data set on which to operate.

  2. The application reads the data set from the remote disk.

  3. The user can then interact with the application, manipulating the in-memory representation of the data. This phase continues for most of the runtime of the application.

  4. The modified data set is saved to disk.

More sections of the application binary can be paged in as the application continues to run.

The NFS client negotiates with the server about using NFS version 2 or NFS version 3. If the server supports NFS version 3, version 3 becomes the default.
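
You can verify which version and transport a client actually negotiated, or force a version 2 mount, as sketched below; the server name and mount point are hypothetical:

  # Show the NFS version and transport negotiated for each mounted file system
  nfsstat -m

  # Force an NFS version 2 mount, for example when testing older clients
  mount -F nfs -o vers=2 workspace-server:/global/workspace /workspace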

NFS version 3 contains several features to improve performance, reduce server load, and reduce network traffic. Since NFS version 3 is faster for I/O writes and uses fewer operations over the network, the network is used more efficiently. Note that higher throughput may make the network busier. NFS version 3 maintains the stateless server design and simple crash recovery of version 2, along with its approach to building a distributed file system from cooperating protocols. TABLE 5-2 lists the high-level features of NFS versions 2 and 3.

Table 5-2. High-Level Features of NFS Versions 2 and 3
Feature                       NFS v2       NFS v3
Stateless protocol            Yes          Yes
Default transport protocol    UDP/IP       TCP/IP
Maximum transfer size         64 Kbytes    4 Gbytes
Maximum file size             4 Gbytes     1 Tbyte
Asynchronous writes           No           Optional

The Firm's workload is mostly small, attribute-intensive text files. This workload justifies the use of a slower, more cost-effective network infrastructure for connecting the clients and the server. In this case, the Firm uses FastEthernet for its client or public networks.

Arbitration

NFS is a stateless protocol, which makes detection of a failed server slightly more difficult. The system must monitor four processes on an NFS server to determine whether it is providing all NFS services properly:

  1. The network file system daemon, nfsd, handles file I/O requests: reads, writes, attribute lookups, and so forth.

  2. The mount daemon, mountd, processes file system mount requests and allows or denies a client access to a file system. The system accesses mountd only when a client mounts or unmounts a file system; nfsd handles the subsequent file I/O requests.

  3. The network lock daemon, lockd, handles file and record locking requests. The paragraphs that follow describe lockd and the network status monitor daemon, statd(1M), in detail.

  4. The network status monitor daemon, statd, recovers file and record locks after a server or client failure. The paragraphs that follow describe statd in detail.

The starting order of these processes is important. When the system enters multiuser mode at run level 2, it starts statd before lockd. When the system extends multiuser mode by offering network services at run level 3, it starts nfsd before mountd.

The Sun Cluster 3.0 HA-NFS agent monitors all four of these daemons. If a daemon stops running, the agent attempts to restart the daemon. The operation of these daemons is independent of the file systems served. Any NFS file server has all four daemons running. Only one instance of each daemon is running, no matter how many file systems are being served.
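
A quick manual check that all four daemons are running on the active node can be made with pgrep, as shown below. This is only a convenience check; the HA-NFS fault monitor performs its own probes.

  # List the NFS service daemons; each name should appear exactly once
  pgrep -l -x 'nfsd|mountd|lockd|statd'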

The system must export each file system being served by using the share(1M) command. For convenience, you can enter share commands in the /etc/dfs/dfstab file so that the system starts the NFS service and exports specified file systems automatically at reboot. The Firm's configuration specifies two different resource groups. Each group serves a different file system—the workspace file system and the build file system. The agent start and stop scripts share or unshare the appropriate file system by having a separate dfstab for each resource group—/etc/dfs/dfstab.workspace and /etc/dfs/dfstab.build.
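
For example, the two dfstab files might contain entries like the following; the global mount points, descriptions, and share options are assumptions rather than the Firm's exact configuration:

  # /etc/dfs/dfstab.workspace -- shared by the workspace resource group
  share -F nfs -o rw -d "developer home directories and workspaces" /global/workspace

  # /etc/dfs/dfstab.build -- shared by the build resource group
  share -F nfs -o rw -d "sources, build environment, and test suites" /global/build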

Each client connects to the server through a logical IP address that follows the resource groups. For example, the workspace-server logical IP address belongs to the resource group that serves the workspace files and is associated with the dfstab.workspace share commands. Similarly, the build-server logical IP address is associated with the dfstab.build share commands. From the client perspective, the workspace and build file systems are served by two different servers regardless of the state of the cluster. This greatly simplifies client file system management.
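
On the clients, the corresponding vfstab entries reference only the logical hostnames, so they never change when a service fails over from one node to the other; the mount points and options below are illustrative:

  # /etc/vfstab excerpt (device, fsck device, mount point, type, pass, boot, options)
  workspace-server:/global/workspace  -  /workspace  nfs  -  yes  rw,hard,intr
  build-server:/global/build          -  /build      nfs  -  yes  rw,hard,intr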

Synchronization

NFS is a stateless protocol, although some clients may want to use file or record locking on files served by NFS. The NFS protocol has a built-in method of file and record lock synchronization that is independent of the cluster framework. The network lockd and the network statd manage the locks.

The lockd is part of the NFS lock manager, which supports NFS file record locking operations. The lock manager does the following tasks:

  • Forwards fcntl(2) locking requests for NFS-mounted file systems to the lock manager on the NFS server.

  • Generates local file locking operations in response to requests forwarded from lock managers running on NFS client machines.

State information kept by the lock manager about these locking requests can be lost if lockd is killed or the operating system is rebooted, or if the system fails over the HA-NFS service to another node in the cluster. The system can recover some of this information as follows.

When the server lock manager restarts, it waits for a grace period during which all client-site lock managers can submit reclaim requests. In addition, statd notifies the client-site lock managers of the restart so that they promptly resubmit previously granted lock requests. If a lock daemon fails to secure a previously granted lock at the server site, it sends SIGLOST to the requesting process.

Network Status Monitor

The statd is an intermediate version of the status monitor. It interacts with lockd to provide the crash and recovery functions for the locking services on NFS. The statd keeps track of the clients with processes that hold locks on a server. When the server reboots after a crash, statd sends a message to the statd on each client notifying it that the server has rebooted. The client statd processes then notify the client lockd that the server has rebooted. The client lockd then attempts to reclaim the lock(s) from the server.

The statd on the client also informs the statd on the server(s) holding locks for the client when the client has rebooted. In this case, the server statd notifies the server lockd to release all locks held by the rebooting client, allowing other processes to lock those files.
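
On a standalone Solaris server, statd records the hosts it must notify in the /var/statmon directory; listing it shows which clients currently hold locks that would have to be reclaimed after a crash or failover:

  # Hosts monitored by statd; each entry names a client or server to notify on restart
  ls /var/statmon/sm /var/statmon/sm.bak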

Note

The NFS protocol is stateless—a client cannot directly detect a server crash; it simply retries its requests until the service recovers.

