Compute Node Kernel interfaces
This chapter describes the kernel interfaces that the CNK provides for applications that run on compute nodes. It includes the following information:
5.1 Lightweight principles
The CNK is designed as a simple and lightweight kernel to maximize performance and reliability for high-performance computing (HPC) applications. It provides an environment for running user processes that is similar to Linux. It is not a full Linux kernel implementation. Instead, it implements a subset of the Linux functionality and a subset of the Portable Operating System Interface (POSIX) functionality.
5.2 Kernel access
The following interfaces can be used to access the CNK services:
Application programming interface (API) provided by the library
System programming interface (SPI) provided by the kernel
System call (syscall) interface provided by the kernel
5.2.1 Application programming interfaces
The library provides various APIs. Each API can be used to send system calls to the CNK. Some of these APIs are lightweight wrappers to the kernel system calls. Some APIs provide more library functionality and might call more than one system call per invocation. This section describes three groups of supported APIs:
File I/O and directory operations
Sockets
Processes and threads
File I/O and directory operations
Instead of executing the system call on the compute node, the CNK might send the system call to the I/O node for execution. This is described as function-shipping the system call. Depending on the targeted file system, the CNK might function-ship system calls that are invoked by these APIs to the Common Input Output Services (CIOS) service on the I/O node. The CIOS is a user-level process that services applications in the compute node. The decision to function-ship a specific request depends on the file system that is targeted by the API and the specific system call that is used by the API. Table 5-1 shows the File I/O and directory operations.
Table 5-1 File I/O and directory operations
Function prototype
Header required
Description and type
int access(const char *path, int
mode);
<unistd.h>
Determines the accessibility of a file. Mode: R_OK, X_OK, F_OK; returns 0 on success or -1 on error.
int chmod(const char *path, mode_t mode);
<sys/types.h>
<sys/stat.h>
Changes the access permissions of the file named by path.
Mode: S_ISUID, S_ISGID, S_ISVTX, S_IRWXU, S_IRUSR, S_IWUSR, S_IXUSR, S_IRWXG, S_IRGRP, S_IWGRP, S_IXGRP, S_IRWXO, S_IROTH, S_IWOTH, and S_IXOTH. Returns 0 on success or -1 on error.
int chown(const char *path, uid_t
owner, gid_t group);
<sys/types.h>
<sys/stat.h>
Changes the owner and group of a file.
int close(int fd);
<unistd.h>
Closes a file descriptor. Returns 0 on success or -1 on error.
int dup(int fd);
<unistd.h>
Duplicates an open descriptor. Returns a new file descriptor on success or -1 on error.
int dup2(int fd, int fd2);
<unistd.h>
Duplicates an open descriptor onto the descriptor number fd2. Returns fd2 on success or -1 on error.
int fchmod(int fd, mode_t mode);
<sys/types.h>
<sys/stat.h>
Changes the mode of a file. Returns 0 on success or -1 on error.
int fchown(int fd, uid_t owner,
gid_t group);
<sys/types.h>
<unistd.h>
Changes the owner and group of
a file. Returns 0 on success or -1 on error.
int fcntl(int fd, int cmd, int arg);
<sys/types.h>
<unistd.h>
<fcntl.h>
Manipulates a file descriptor. Supported commands are F_GETFL, F_DUPFD, F_GETLK, F_SETLK, F_SETLKW, F_GETLK64, F_SETLK64, F_SETLKW64.
int fstat(int fd, struct stat *buf);
<sys/types.h>
<sys/stat.h>
Gets the file status. Returns 0 on success or -1 on error.
int stat64(const char *path, struct
stat64 *buf);
<sys/types.h>
<sys/stat.h>
Gets the file status.
int statfs(const char *path, struct
statfs *buf);
<sys/vfs.h>
Gets the file system statistics.
long fstatfs64(unsigned int fd, size_t sz, struct statfs64 *buf);
<sys/vfs.h>
Gets the file system statistics.
int fsync(int fd);
<unistd.h>
Synchronizes changes to a file. Returns 0 on success or -1 on error.
int ftruncate(int fd, off_t length);
<sys/types.h>
<unistd.h>
Truncates a file to a specified length. Returns 0 on success or -1 on error.
int ftruncate64(int fildes, off64_t
length);
<unistd.h>
Truncates a file to a specified length for files larger than 2 GB. Returns 0 on success or -1 on error.
int lchown(const char *path, uid_t
owner, gid_t group);
<sys/types.h>
<unistd.h>
Changes the owner and group of a symbolic link. Returns 0 on success or -1 on error.
int link(const char *existingpath,
const char *newpath);
<unistd.h>
Links to a file. Returns 0 on success or -1 on error.
off_t lseek(int fd, off_t offset, int
whence);
<sys/types.h>
<unistd.h>
Moves the read/write file offset. Returns the resulting offset, measured in bytes from the beginning of the file, on success or -1 on error.
int _llseek(unsigned int fd,
unsigned long offset_high,
unsigned long offset_low, loff_t
*result, unsigned int whence);
<unistd.h>
<sys/types.h>
<linux/unistd.h>
<errno.h>
Moves the read/write file offset.
int lstat(const char *path, struct
stat *buf);
<sys/types.h>
<sys/stat.h>
Gets the symbolic link status. Returns 0 on success or -1 on error.
int lstat64(const char *path, struct
stat64 *buf);
<sys/types.h>
<sys/stat.h>
Gets the symbolic link status. Determines the size of a file larger than 2 GB.
int open(const char *path, int
oflag, mode_t mode);
<sys/types.h>
<sys/stat.h>
<fcntl.h>
Opens a file. oflag: O_RDONLY, O_WRONLY, O_RDWR, O_APPEND, O_CREAT, O_EXCL, O_TRUNC, O_NOCTTY, and O_SYNC. mode: S_IRWXU, S_IRUSR, S_IWUSR, S_IXUSR, S_IRWXG, S_IRGRP, S_IWGRP, S_IXGRP, S_IRWXO, S_IROTH, S_IWOTH, and S_IXOTH.
Returns the file descriptor on success or -1 on error.
ssize_t pread(int fd, void *buf,
size_t nbytes, off64_t offset);
<unistd.h>
Reads from a file at offset. Returns the number of bytes read on success, 0 if end of file, or -1 on error.
ssize_t pwrite(int fd, const void
*buf, size_t nbytes, off64_t
offset);
<unistd.h>
Writes to a file at offset. Returns the number of bytes written on success or -1 on error.
ssize_t read(int fd, void *buf,
size_t nbytes);
<unistd.h>
Reads from a file. Returns the number of bytes read on success, 0 if end of file, or -1 on error.
int readlink(const char *path, char
*buf, int bufsize);
<unistd.h>
Reads the contents of a symbolic link. Returns the number of bytes read on success or -1 on error.
ssize_t readv(int fd, const struct iovec iov[], int iovcnt);
<sys/types.h>
<sys/uio.h>
Reads a vector. Returns the number of bytes read on success or -1 on error.
int rename(const char *oldname,
const char *newname);
<stdio.h>
Renames a file. Returns 0 on success or -1 on error.
int stat(const char *path, struct
stat *buf);
<sys/types.h>
<sys/stat.h>
Gets the file status. Returns 0 on success or -1 on error.
int stat64(const char *path, struct
stat64 *buf);
<sys/types.h>
<sys/stat.h>
Gets the file status.
int statfs (char *path, struct statfs
*buf);
<sys/types.h>
<sys/stat.h>
Gets the file system statistics.
long statfs64 (const char *path,
size_t sz, struct statfs64 *buf);
<sys/statfs.h>
Gets the file system statistics.
int symlink(const char
*actualpath, const char
*sympath);
<unistd.h>
Makes a symbolic link to a file. Returns 0 on success or -1 on error.
int truncate(const char *path,
off_t length);
<sys/types.h>
<unistd.h>
Truncates a file to a specified length. Returns 0 on success or -1 on error.
int truncate64(const char *path, off64_t length);
<unistd.h>
<sys/types.h>
Truncates a file to a specified length.
mode_t umask(mode_t cmask);
<sys/types.h>
<sys/stat.h>
Sets and gets the file mode creation mask. Returns the previous file mode creation mask.
int unlink(const char *path);
<unistd.h>
Removes a directory entry. Returns 0 on success or -1 on error.
int utime(const char *path, const
struct utimbuf *times);
<sys/types.h>
<utime.h>
Sets the file access and modification times. Returns 0 on success or -1 on error.
ssize_t write(int fd, const void
*buff, size_t nbytes);
<unistd.h>
Writes to a file. Returns the number of bytes written on success or -1 on error.
ssize_t writev(int fd, const struct iovec iov[], int iovcnt);
<sys/types.h>
<sys/uio.h>
Writes a vector. Returns the number of bytes written on success or -1 on error.
int chdir(const char *path);
<unistd.h>
Changes the working directory. Returns 0 on success or -1 on error.
char *getcwd(char *buf, size_t
size);
<unistd.h>
Gets the path name of the current working directory. Returns the buf value on success or NULL on error.
int getdents(int fildes, char **buf,
unsigned nbyte);
<sys/types.h>
Gets the directory entries in a file system. Returns 0 on success or -1 on error.
int getdents64(unsigned int fd,
struct dirent *dirp, unsigned int
count);
<sys/dirent.h>
Gets the directory entries in a file system.
int mkdir(const char *path,
mode_t mode);
<sys/types.h>
<sys/stat.h>
Makes a directory; mode: S_IRUSR, S_IWUSR, S_IXUSR, S_IRGRP, S_IWGRP, S_IXGRP, S_IROTH, S_IWOTH, and S_IXOTH. Returns 0 on success or -1 on error.
int rmdir(const char *path);
<unistd.h>
Removes a directory. Returns 0 on success or -1 on error.
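As a minimal sketch of how several calls from Table 5-1 combine, the following C function writes a buffer to a file, rewinds with lseek(), and reads the data back, checking each return value along the way. The function name roundtrip_file and the scratch path supplied by the caller are hypothetical and chosen only for illustration. Whether each call runs on the compute node or is function-shipped to the I/O node depends on the target file system.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: writes msg to path, reads it back into out
 * (capacity cap), and returns the number of bytes read or -1 on error. */
ssize_t roundtrip_file(const char *path, const char *msg, char *out, size_t cap)
{
    size_t len = strlen(msg);
    ssize_t nread = -1;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
    if (fd == -1)
        return -1;

    /* write() reports a byte count; a short write is treated as an error here.
     * lseek() reports the new offset, which is 0 after rewinding. */
    if (write(fd, msg, len) == (ssize_t)len &&
        lseek(fd, 0, SEEK_SET) == 0) {
        nread = read(fd, out, cap);
    }

    if (close(fd) == -1)
        nread = -1;
    unlink(path);   /* remove the scratch file */
    return nread;
}
```

A caller might pass a scratch path and a small buffer, then compare the returned byte count with the length of the message that was written.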
Sockets
The socket support allows the creation of both outbound and inbound socket connections with standard Linux APIs. For example, an outbound socket can be created by calling the socket() function, followed by the connect() function. An inbound socket can be created by calling the socket() function followed by the bind(), listen(), and accept() functions.
Communication through the socket is provided by the glibc send(), recv(), and select() function calls. These function calls run the socketcall() system call with different parameters.
The CNK provides socket support through the standard Linux socketcall() system call. The CNK function-ships the socketcall() system call to the CIOS, which performs the requested operation.
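The inbound and outbound sequences described above can be sketched in one C function that uses only standard calls. The helper name loopback_echo is hypothetical, and the loopback address is used only so that the sketch is self-contained on a standard Linux system; this chapter does not state whether loopback connections are available under the CNK, so treat that part as an assumption.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: opens a listener on 127.0.0.1 with a system-chosen
 * port, connects to it from the same process, sends msg, and receives it
 * into out. Returns the number of bytes received or -1 on error. */
ssize_t loopback_echo(const char *msg, char *out, size_t cap)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof addr;
    ssize_t got = -1;
    int lfd = -1, cfd = -1, afd = -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                      /* let the system pick a port */

    /* Inbound side: socket(), bind(), listen(). */
    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd == -1) goto done;
    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) == -1) goto done;
    if (listen(lfd, 1) == -1) goto done;
    /* Learn which port was assigned to the listener. */
    if (getsockname(lfd, (struct sockaddr *)&addr, &alen) == -1) goto done;

    /* Outbound side: socket(), connect(). */
    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd == -1) goto done;
    if (connect(cfd, (struct sockaddr *)&addr, sizeof addr) == -1) goto done;

    /* accept() returns a new connected descriptor for the inbound side. */
    afd = accept(lfd, NULL, NULL);
    if (afd == -1) goto done;

    if (send(cfd, msg, strlen(msg), 0) != (ssize_t)strlen(msg)) goto done;
    got = recv(afd, out, cap, 0);

done:
    if (afd != -1) close(afd);
    if (cfd != -1) close(cfd);
    if (lfd != -1) close(lfd);
    return got;
}
```

In a real application the two sides normally live in different processes or on different hosts; the single-process form is used here only to keep the example short.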
Table 5-2 summarizes the supported socket APIs.
Table 5-2 Supported socket APIs
Function prototype
Header required
Description and type
int accept(int sockfd, struct
sockaddr *addr, socklen_t
*addrlen);
<sys/types.h>
<sys/socket.h>
Extracts the connection request on the queue of pending connections. Creates a new connected socket. Returns a file descriptor on success or -1 on error.
int bind(int sockfd, const struct
sockaddr *my_addr, socklen_t
addrlen);
<sys/types.h>
<sys/socket.h>
Assigns a local address. Returns 0 on success or -1 on error.
int connect(int socket, const
struct sockaddr *address,
socklen_t address_len);
<sys/types.h>
<sys/socket.h>
Connects a socket. Returns 0 on success or -1 on error.
int getpeername(int socket, struct
sockaddr *restrict address,
socklen_t *restrict address_len);
<sys/socket.h>
Gets the name of the peer socket. Returns 0 on success or -1 on error.
int getsockname(int socket, struct
sockaddr *restrict address,
socklen_t *restrict address_len);
<sys/types.h>
<sys/socket.h>
Gets the local name (address) to which the socket is bound. Returns 0 on success or -1 on error.
int getsockopt(int s, int level, int
optname, void *optval, socklen_t
*optlen);
<sys/socket.h>
Retrieves options that are associated with a socket. Returns 0 on success or -1 on error.
int listen(int sockfd, int backlog);
<sys/types.h>
<sys/socket.h>
Marks the socket as one that accepts incoming connections. Returns 0 on success or -1 on error.
int poll(struct pollfd fds[], nfds_t nfds, int timeout);
<poll.h>
The poll() function provides applications with a mechanism for multiplexing input/output over a set of file descriptors.
ssize_t recv(int s, void *buf, size_t
len, int flags);
<sys/types.h>
<sys/socket.h>
Receives a message only from a connected socket. Returns the number of bytes received on success, 0 if the peer has performed an orderly shutdown, or -1 on error.
ssize_t recvfrom(int s, void *buf,
size_t len, int flags, struct
sockaddr *from, socklen_t
*fromlen);
<sys/types.h>
<sys/socket.h>
Receives a message from a socket regardless of whether it is connected. Returns the number of bytes received on success or -1 on error.
ssize_t recvmsg(int s, struct
msghdr *msg, int flags);
<sys/types.h>
<sys/socket.h>
Receives a message from a socket regardless of whether it is connected. Returns the number of bytes received on success or -1 on error.
ssize_t send(int socket, const
void *buffer, size_t length, int
flags);
<sys/types.h>
<sys/socket.h>
Sends a message only to a connected socket. Returns the number of bytes sent on success or -1 on error.
ssize_t sendto(int socket, const
void *message, size_t length, int
flags, const struct sockaddr
*dest_addr, socklen_t dest_len);
<sys/types.h>
<sys/socket.h>
Sends a message on a socket. Returns the number of bytes sent on success or -1 on error.
ssize_t sendmsg(int s, const
struct msghdr *msg, int flags);
<sys/types.h>
<sys/socket.h>
Sends a message on a socket. Returns the number of bytes sent on success or -1 on error.
int setsockopt(int s, int level, int
optname, const void *optval,
socklen_t optlen);
<sys/types.h>
<sys/socket.h>
Manipulates options that are associated with a socket. Returns 0 on success or -1 on error.
int shutdown(int s, int how);
<sys/socket.h>
Causes all or part of a connection on the socket to shut down. Returns 0 on success or -1 on error.
int socket(int domain, int type, int
protocol);
<sys/types.h>
<sys/socket.h>
Opens a socket. Returns a file descriptor on success or -1 on error.
int socketpair(int d, int type, int
protocol, int sv[2]);
<sys/types.h>
<sys/socket.h>
Creates an unnamed pair of connected sockets. Returns 0 on success or -1 on error.
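As an illustration of the poll() entry in Table 5-2, the following C sketch waits for a descriptor to become readable. The helper name wait_readable and its return convention are hypothetical and not part of any CNK interface.

```c
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if fd becomes readable within timeout_ms
 * milliseconds, 0 on timeout, or -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd p;
    p.fd = fd;
    p.events = POLLIN;      /* interested in data to read */
    p.revents = 0;

    int rc = poll(&p, 1, timeout_ms);
    if (rc <= 0)
        return rc;          /* 0 = timeout, -1 = error */

    /* An event was reported; anything other than POLLIN is treated as an error. */
    return (p.revents & POLLIN) ? 1 : -1;
}
```

One way to exercise the helper is with a socketpair(): it reports a timeout while the pair is idle and reports readiness after one byte is sent on the other end.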
Processes and threads
This section shows the supported APIs that are associated with the control and access of processes and threads executing on the compute nodes. Additional APIs might operate correctly if they use the system calls that are supported by the CNK.
Table 5-3 lists the supported process and thread management APIs.
Table 5-3 Supported APIs for managing threads that run on compute nodes
Function prototype
Header required
Description and type
gid_t getgid(void);
<unistd.h>
Gets the group ID.
pid_t getpid(void);
<unistd.h>
Gets the process ID. This ID is a nonzero value that uniquely identifies a process within a node. This ID is not unique across all of the processes in a job.
int getrlimit(int resource, struct rlimit *rlp);
<sys/resource.h>
Gets information about resource limits.
int getrusage(int who, struct
rusage *r_usage);
<sys/resource.h>
Gets information about resource use. All time reported is attributed to the user application, so the reported system time is always zero.
uid_t getuid(void);
<unistd.h>
Gets the user ID.
int setrlimit(int resource, const
struct rlimit *rlp);
<sys/resource.h>
Sets resource limits. Only
RLIMIT_CORE can be set.
clock_t times(struct tms *buf);
<sys/times.h>
Gets the process times. All time reported is attributed to the user application, so the reported system time is always zero.
int brk(void *end_data_segment);
<unistd.h>
Changes the allocated size in the heap segment.
void exit(int status);
<stdlib.h>
Terminates a process.
int uname(struct utsname *buf);
<sys/utsname.h>
Gets the name of the current system and other information, for example, the version and release.
void *mmap(void *addr, size_t len, int prot, int flags,
int fildes, off_t off);
<sys/mman.h>
Establishes a mapping between a process address space and a file, shared memory object, or typed memory object.
int shm_open(const char *name, int oflag, mode_t mode);
<sys/mman.h>
<sys/stat.h>
<fcntl.h>
Creates and opens a new, or opens an existing, POSIX shared memory object.
int pthread_create(pthread_t *thread, const pthread_attr_t *attr, void *(*start_routine) (void *), void *arg);
<pthread.h>
Starts a new thread in the calling process.
void pthread_exit(void *retval);
<pthread.h>
Terminates the calling thread.
int pthread_yield(void);
<sched.h>
Forces the running thread to relinquish the processor.
int pthread_setschedprio(pthread_t thread, int prio);
<pthread.h>
Sets the scheduling priority of the thread.
int nanosleep(const struct timespec *req, struct timespec *rem);
<time.h>
Suspends the execution of the calling thread until either at least the time specified in *req has elapsed or a signal is delivered.
int kill(pid_t pid, int sig);
<signal.h>
Sends a signal. A signal can be sent only to the same process.
int sigaction(int signum, const
struct sigaction *act, struct
sigaction *oldact);
<signal.h>
Allows the calling process to examine and specify the action to be associated with a specific signal.
typedef void (*sighandler_t)(int);
sighandler_t signal(int signum, sighandler_t handler);
<signal.h>
This interface is supported for existing applications. Use the sigaction interface for new applications.
5.2.2 System programming interface
The SPI that is provided by the CNK allows low-level access to Blue Gene/Q-specific interfaces. Many of the SPIs are implemented using special internal Blue Gene/Q system calls. Some of the SPIs are implemented in the user state and do not require entry into the kernel.
For information about kernel SPIs, see the installed documentation in the /bgsys/drivers/ppcfloor/spi/doc/html directory. This information is also available on the Knowledge Center tab in Navigator.
The following tables list the header files that contain the SPIs and describe the SPIs in the files.
Table 5-4 lists the supported SPI header files.
Table 5-4 SPI header files and the interfaces they provide
Interface file
Description
/spi/include/kernel/collective.h
Allocates the collective class route IDs, and sets the configuration of collective class routes.
/spi/include/kernel/gi.h
Allocates the global interrupt class route IDs, and sets the configuration of global interrupt class routes.
/spi/include/kernel/location.h
Provides location information, including the node location in the block, process information within the node, the core in the node, and the hardware thread in the core.
/spi/include/kernel/memory.h
Manages regions of memory within the compute node. Opens the persistent memory handle with the persist_open kernel function.
/spi/include/kernel/process.h
Retrieves information about the process. This information includes how many processes are configured per node, how many processors are assigned to a process, and which hardware threads are assigned to the process.
/spi/include/kernel/spec.h
Controls the speculative execution of threads.
/spi/include/kernel/thread.h
Retrieves scheduler information about active and runnable pthreads on a hardware thread.
/spi/include/kernel/MU.h
Controls and retrieves information from the messaging unit hardware.
/spi/include/kernel/rdma.h
Interfaces for an abbreviated version of OFED RDMA CM from the compute node to its I/O node.
/spi/include/kernel/sendx.h
Provides extensions to the light-weight kernel for user-defined function-shipping exchanges with a dynamically loaded library attached to the sysiod daemon on the I/O node.
The extensions are in a derived plug-in class that is coded by the user on the I/O node where the base class is defined in /ramdisk/include/services/Plugin.h.
The function-ship operations range from simple message passing to the more complex operations of RDMA and using file descriptors.
/spi/include/l1p/pprefetch.h
Controls the perfect prefetcher hardware.
/spi/include/l1p/sprefetch.h
Controls the stream prefetcher hardware.
/spi/include/l1p/flush.h
Causes an L1P flush of all pending load and store operations to the L2 cache.
/spi/include/l2/atomic.h
L2 atomic operations.
/spi/include/l2/barrier.h
L2 atomic-based barrier operations.
/spi/include/l2/lock.h
L2 atomic-based lock operations.
5.3 System calls
The system call is the lowest-level interface that an application can use to access kernel functions. It is typically best to use the library APIs as the primary interface to the kernel (see 5.2.1, “Application programming interfaces”). However, direct execution of system calls is allowed and is sometimes required. Example 5-1 shows a direct invocation of a system call.
Example 5-1 Direct invocation of a system call
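#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/types.h>

int main(int argc, char *argv[])
{
    pid_t tid;

    /* Invoke the gettid system call directly through syscall(). */
    tid = (pid_t)syscall(SYS_gettid);
    printf("tid = %ld\n", (long)tid);
    return 0;
}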
Supported Linux APIs
The following system calls are supported by the CNK.
ftruncate64
futex
getcwd
getdents
getdents64
getgroups
getitimer
getpid
getrlimit
getrusage
gettid
gettimeofday
ioctl
kill
lseek
lstat
lstat64
mkdir
mmap
mremap
munmap
nanosleep
open
poll
prctl
pread64
pwrite64
read
readlink
readv
rename
rmdir
rt_sigaction
rt_sigprocmask
sched_get_priority_max
sched_get_priority_min
sched_getaffinity
sched_getparam
sched_setscheduler
sched_yield
setitimer
setrlimit
sigaction
signals
sigprocmask
socketcall
stat
stat64
statfs
statfs64
symlink
 
time
times
tmwrite
truncate
truncate64
uid
umask
uname
unlink
utime
write
writev
 
All other system calls return the errno value ENOSYS.
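Because unsupported calls are reported through ENOSYS, an application that invokes system calls directly can test for that value and choose a fallback. The following C sketch shows one way to do so for a system call that takes no arguments; the helper name checked_syscall0 and its out-parameter convention are hypothetical.

```c
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: invokes a system call that takes no arguments.
 * Sets *unsupported to 1 when the call fails with ENOSYS, otherwise to 0,
 * and returns the raw result of syscall(). */
long checked_syscall0(long number, int *unsupported)
{
    errno = 0;
    long rc = syscall(number);
    *unsupported = (rc == -1 && errno == ENOSYS) ? 1 : 0;
    return rc;
}
```

For example, calling checked_syscall0(SYS_getpid, &flag) on a kernel that supports getpid leaves flag at 0 and returns the process ID.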
Additional information about system calls
For more information about Linux system calls, see the syscalls(2) manual page.