We assume that the kernel noticed the arrival of a signal and invoked one of the functions mentioned in the previous section to prepare the process descriptor of the process that is supposed to receive the signal. If that process was not running on the CPU at that moment, however, the kernel deferred the task of delivering the signal. We now turn to the activities that the kernel performs to ensure that pending signals of a process are handled.
As mentioned in Section 4.8, the kernel
checks the value of the sigpending
flag of the
process descriptor before allowing the process to resume its
execution in User Mode. Thus, the kernel checks for the existence of
pending signals every time it finishes handling an interrupt or an
exception.
To handle the nonblocked pending signals, the kernel invokes the
do_signal( )
function, which receives two
parameters:
regs
The address of the stack area where the User Mode register contents of the current process are saved.
oldset
The address of a variable where the function is supposed to save the bit mask array of blocked signals. It is NULL if there is no need to save the bit mask array.
The do_signal( )
function starts by checking
whether the function itself was triggered by an interrupt; if so, it
simply returns. Otherwise, if the function was triggered by an
exception that was raised while the process was running in User Mode,
the function continues executing:
if ((regs->xcs & 3) != 3) return 1;
However, as we’ll see in Section 10.3.4, this does not mean that a system call cannot be interrupted by a signal.
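The test shown above looks at the two low-order bits of the saved cs register, which record the privilege level the CPU was running at when the interrupt or exception occurred. The following is a minimal user-space sketch of that test, not kernel code; the helper name saved_in_user_mode and the sample selector values 0x23 and 0x10 are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring the (regs->xcs & 3) test: returns 1
 * when the saved code segment selector carries privilege level 3
 * (User Mode) and 0 otherwise. */
int saved_in_user_mode(uint32_t xcs)
{
    return (xcs & 3) == 3;
}

/* Usage example with two illustrative selector values. */
void demo_saved_in_user_mode(void)
{
    assert(saved_in_user_mode(0x23) == 1);  /* low bits 11: User Mode   */
    assert(saved_in_user_mode(0x10) == 0);  /* low bits 00: Kernel Mode */
}
```

When the helper returns 0, the real function gives up at once, which is the `return 1` branch of the statement above.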
If the oldset
parameter is
NULL
, the function initializes it with the address
of the current->blocked
field:
if (!oldset) oldset = &current->blocked;
The heart of the do_signal( )
function consists of
a loop that repeatedly invokes the dequeue_signal( )
function until no nonblocked pending signals are left.
The return code of dequeue_signal( )
is stored in
the signr
local variable. If its value is 0, it
means that all pending signals have been handled and
do_signal( )
can finish. As long as a nonzero
value is returned, a pending signal is waiting to be handled.
dequeue_signal( )
is invoked again after
do_signal( )
handles the current signal.
The dequeue_signal( ) function always considers the lowest-numbered nonblocked pending signal. It updates the data structures to indicate that the signal is no longer pending and returns its number. This task involves clearing the corresponding bit in current->pending.signal and updating the value of current->sigpending. In the mask parameter, each bit that is set represents a blocked signal:
    sig = 0;
    /* x holds only the pending signals that are not blocked */
    if ((x = current->pending.signal.sig[0] & ~mask->sig[0]) != 0)
        sig = 1 + ffz(~x);
    else if ((x = current->pending.signal.sig[1] & ~mask->sig[1]) != 0)
        sig = 33 + ffz(~x);
    if (sig) {
        sigdelset(&current->pending.signal, sig);
        recalc_sigpending(current);
    }
    return sig;
The collection of currently pending signals is ANDed with the nonblocked signals (the complement of mask). If anything is left, it represents a signal that should be delivered to the process. The ffz( ) function returns the index of the first zero bit in its parameter; applied to ~x, it yields the index of the lowest bit set in x, and this value is used to compute the lowest-numbered signal to be delivered.
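The selection rule can be tried out in user space. The sketch below is a model under stated assumptions, not the kernel function: struct model_sigset, lowest_set_bit( ), and lowest_deliverable( ) are hypothetical names, and a portable loop stands in for ffz( ).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a two-word signal set. */
struct model_sigset {
    uint32_t sig[2];
};

/* Index of the lowest set bit of a nonzero word. */
int lowest_set_bit(uint32_t word)
{
    int index = 0;

    assert(word != 0);
    while ((word & 1u) == 0) {
        word >>= 1;
        index++;
    }
    return index;
}

/* Returns the lowest-numbered pending signal that is not blocked, or 0
 * if there is none. Signals are numbered from 1, so bit 0 of sig[0] is
 * signal 1 and bit 0 of sig[1] is signal 33. */
int lowest_deliverable(const struct model_sigset *pending,
                       const struct model_sigset *blocked)
{
    uint32_t x;

    if ((x = pending->sig[0] & ~blocked->sig[0]) != 0)
        return 1 + lowest_set_bit(x);
    if ((x = pending->sig[1] & ~blocked->sig[1]) != 0)
        return 33 + lowest_set_bit(x);
    return 0;
}

/* Usage example: signals 2 and 10 are pending and signal 2 is blocked,
 * so signal 10 is the one selected. */
void demo_lowest_deliverable(void)
{
    struct model_sigset pending = { { (1u << 1) | (1u << 9), 0 } };
    struct model_sigset blocked = { { 1u << 1, 0 } };

    assert(lowest_deliverable(&pending, &blocked) == 10);
}
```

Masking before searching for the lowest bit is what keeps a blocked low-numbered signal from hiding a deliverable higher-numbered one.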
Let’s see how the do_signal( )
function handles any pending signal whose number is returned by
dequeue_signal( )
. First, it checks whether the
current
receiver process is being monitored by
some other process; in this case, do_signal( )
invokes notify_parent( )
and schedule( )
to make the monitoring process aware of the signal
handling.
Then do_signal( )
loads the ka
local variable with the address of the k_sigaction
data structure of the signal to be handled:
ka = &current->sig->action[signr-1];
Depending on the contents, three kinds of actions may be performed: ignoring the signal, executing a default action, or executing a signal handler.
When
a delivered signal is explicitly ignored, the do_signal( )
function normally just continues with a new execution of
the loop and therefore considers another pending signal. One
exception exists, as described earlier:
    if (ka->sa.sa_handler == SIG_IGN) {
        if (signr == SIGCHLD)
            while (sys_wait4(-1, NULL, WNOHANG, NULL) > 0)
                /* nothing */;
        continue;
    }
If the signal delivered is SIGCHLD
, the
sys_wait4( )
service routine of the
wait4( )
system call is invoked to force the
process to read information about its children, thus cleaning up
memory left over by the terminated child processes (see
Section 3.5).
If
ka->sa.sa_handler
is equal to
SIG_DFL
, do_signal( )
must
perform the default action of the signal. The only exception comes
when the receiving process is init, in which
case the signal is discarded, as described earlier in
Section 10.1.1:
if (current->pid == 1) continue;
For other processes, since the default action depends on the type of
signal, the function executes a switch
statement
based on the value of signr
.
The signals whose default action is “ignore” are easily handled:
case SIGCONT: case SIGCHLD: case SIGWINCH: continue;
The signals whose default action is
“stop” may stop the current
process. To do this, do_signal( )
sets the state
of current
to TASK_STOPPED
and
then invokes the schedule( )
function (see
Section 11.2.2). The do_signal( )
function also sends a SIGCHLD
signal
to the parent process of current
, unless the
parent has set the SA_NOCLDSTOP
flag of
SIGCHLD
:
    case SIGTSTP: case SIGTTIN: case SIGTTOU:
        if (is_orphaned_pgrp(current->pgrp))
            continue;
        /* fall through */
    case SIGSTOP:
        current->state = TASK_STOPPED;
        current->exit_code = signr;
        if (current->p_pptr->sig &&
            !(SA_NOCLDSTOP &
              current->p_pptr->sig->action[SIGCHLD-1].sa.sa_flags))
            notify_parent(current, SIGCHLD);
        schedule( );
        continue;
The difference between SIGSTOP
and the other
signals is subtle: SIGSTOP
always stops the
process, while the other signals stop the process only if it is not
in an “orphaned process group.” The
POSIX standard specifies that a process group is
not orphaned as long as there is a process in
the group that has a parent in a different process group but in the
same session.
The signals whose default action is
“dump” may create a
core
file in the process working directory; this
file lists the complete contents of the process’s
address space and CPU registers. After do_signal( )
creates the core file, it kills the process. The default
action of the remaining 18 signals is
“terminate,” which consists of just
killing the process:
    exit_code = signr;    /* set before the switch statement is entered */
    case SIGQUIT: case SIGILL: case SIGTRAP:
    case SIGABRT: case SIGFPE: case SIGSEGV:
    case SIGBUS: case SIGSYS: case SIGXCPU: case SIGXFSZ:
        if (do_coredump(signr, regs))
            exit_code |= 0x80;
        /* fall through */
    default:
        sigaddset(&current->pending.signal, signr);
        recalc_sigpending(current);
        current->flags |= PF_SIGNALED;
        do_exit(exit_code);
The do_exit( )
function receives as its input
parameter the signal number ORed with a flag set when a core dump has
been performed. That value is used to set the exit code of the
process. The function terminates the current process, and hence never
returns (see Chapter 20).
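The four kinds of default action can be summarized in a small user-space model. This is only a sketch of the classification described above, assuming a Linux system whose signal.h defines all the signal names used; enum default_action, classify_default( ), and model_exit_code( ) are hypothetical names, and the orphaned process group test for SIGTSTP, SIGTTIN, and SIGTTOU is left out.

```c
#include <assert.h>
#include <signal.h>

/* Hypothetical labels for the four kinds of default action. */
enum default_action { ACT_IGNORE, ACT_STOP, ACT_DUMP, ACT_TERMINATE };

/* Classifies a signal the same way the switch statement does. */
enum default_action classify_default(int signr)
{
    switch (signr) {
    case SIGCONT: case SIGCHLD: case SIGWINCH:
        return ACT_IGNORE;
    case SIGTSTP: case SIGTTIN: case SIGTTOU: case SIGSTOP:
        return ACT_STOP;
    case SIGQUIT: case SIGILL: case SIGTRAP: case SIGABRT:
    case SIGFPE: case SIGSEGV: case SIGBUS: case SIGSYS:
    case SIGXCPU: case SIGXFSZ:
        return ACT_DUMP;
    default:
        return ACT_TERMINATE;
    }
}

/* Value passed to do_exit( ) in this model: the signal number, ORed
 * with 0x80 when a core dump was actually written. */
int model_exit_code(int signr, int core_dumped)
{
    int exit_code = signr;

    assert(signr > 0 && signr < 0x80);
    if (classify_default(signr) == ACT_DUMP && core_dumped)
        exit_code |= 0x80;
    return exit_code;
}
```

For instance, a process killed by SIGSEGV after a successful dump gets SIGSEGV | 0x80 in this model, while a process killed by SIGTERM gets just SIGTERM.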
If
a handler has been established for the signal, the
do_signal( )
function must enforce its execution.
It does this by invoking handle_signal( )
:
handle_signal(signr, ka, &info, oldset, regs); return 1;
Notice how do_signal( )
returns after having
handled a single signal. Other pending signals won’t
be considered until the next invocation of do_signal( )
. This approach ensures that real-time signals will be
dealt with in the proper order.
Executing a signal handler is a rather complex task because of the need to juggle stacks carefully while switching between User Mode and Kernel Mode. We explain exactly what is entailed here.
Signal handlers are functions defined by User Mode processes and
included in the User Mode code segment. The handle_signal( )
function runs in Kernel Mode while signal handlers run in
User Mode; this means that the current process must first execute the
signal handler in User Mode before being allowed to resume its
“normal” execution. Moreover, when
the kernel attempts to resume the normal execution of the process,
the Kernel Mode stack no longer contains the hardware context of the
interrupted program because the Kernel Mode stack is emptied at every
transition from User Mode to Kernel Mode.
An additional complication is that signal handlers may invoke system calls. In this case, after the service routine executes, control must be returned to the signal handler instead of to the code of the interrupted program.
The solution adopted in Linux consists of copying the hardware
context saved in the Kernel Mode stack onto the User Mode stack of
the current process. The User Mode stack is also modified in such a
way that, when the signal handler terminates, the sigreturn( )
system call is automatically invoked to copy the hardware
context back on the Kernel Mode stack and restore the original
content of the User Mode stack.
Figure 10-2 illustrates the flow of execution of the
functions involved in catching a signal. A nonblocked signal is sent
to a process. When an interrupt or exception occurs, the process
switches into Kernel Mode. Right before returning to User Mode, the
kernel executes the do_signal( )
function, which
in turn handles the signal (by invoking handle_signal( )
) and sets up the User Mode stack (by invoking
setup_frame( )
or setup_rt_frame( )
). When the process switches again to User Mode, it starts
executing the signal handler because the handler’s
starting address was forced into the program counter. When that
function terminates, the return code placed on the User Mode stack by
the setup_frame( )
or setup_rt_frame( )
function is executed. This code invokes the
sigreturn( )
system call, whose service routine
copies the hardware context of the normal program in the Kernel Mode
stack and restores the User Mode stack back to its original state (by
invoking restore_sigcontext( )
). When the system
call terminates, the normal program can thus resume its execution.
Let’s now examine in detail how this scheme is carried out.
To properly set the User Mode stack
of the process, the handle_signal( )
function
invokes either setup_frame( )
(for signals that do
not require a siginfo_t
table; see
Section 10.4 later in this chapter)
or setup_rt_frame( )
(for signals that do require
a siginfo_t
table). To choose among these two
functions, the kernel checks the value of the
SA_SIGINFO
flag in the sa_flags
field of the sigaction
table associated with the
signal.
The setup_frame( )
function receives four
parameters, which have the following meanings:
sig
Signal number
ka
Address of the k_sigaction table associated with the signal
oldset
Address of a bit mask array of blocked signals
regs
Address in the Kernel Mode stack area where the User Mode register contents are saved
The setup_frame( )
function pushes onto the User
Mode stack a data structure called a
frame
,
which contains the information needed to handle the signal and to
ensure the correct return to the sys_sigreturn( )
function. A frame is a sigframe
table that
includes the following fields (see Figure 10-3):
pretcode
Return address of the signal handler function; it points to the
retcode
field (later in this list) in the same
table.
sig
The signal number; this is the parameter required by the signal handler.
sc
Structure of type sigcontext containing the hardware context of the User Mode process right before switching to Kernel Mode (this information is copied from the Kernel Mode stack of current). It also contains a bit array that specifies the blocked regular signals of the process.
fpstate
Structure of type _fpstate
that may be used to
store the floating point registers of the User Mode process (see
Section 3.3.4).
extramask
Bit array that specifies the blocked real-time signals.
retcode
Eight-byte code issuing a sigreturn( )
system
call; this code is executed when returning from the signal handler.
The setup_frame( )
function starts by invoking
get_sigframe( )
to compute the first memory
location of the frame. That memory location is usually[71]
in the User Mode stack, so the function returns the value:
(regs->esp - sizeof(struct sigframe)) & 0xfffffff8
Since stacks grow toward lower addresses, the initial address of the frame is obtained by subtracting its size from the address of the current stack top and aligning the result to a multiple of 8.
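The address computation can be checked with ordinary integer arithmetic. The sketch below is a user-space model of the expression shown above; the function name model_frame_address( ) and the sample stack top and frame size are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Subtracts the frame size from the current stack top and rounds the
 * result down to a multiple of 8, as in the expression above. */
uint32_t model_frame_address(uint32_t esp, uint32_t frame_size)
{
    assert(esp >= frame_size);
    return (esp - frame_size) & 0xfffffff8u;
}

/* Usage example with a hypothetical stack top and a 100-byte frame. */
void demo_frame_address(void)
{
    uint32_t esp = 0xbffff7c6u;
    uint32_t frame = model_frame_address(esp, 100u);

    assert(frame % 8 == 0);         /* aligned to 8 bytes             */
    assert(frame + 100u <= esp);    /* frame fits below the stack top */
}
```

Because the mask only clears low-order bits, alignment can move the frame further down the stack but never up into the area already in use.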
The returned address is then verified by means of the
access_ok
macro; if it is valid, the function
repeatedly invokes __put_user( )
to fill all the
fields of the frame. Once this is done, it modifies the
regs
area of the Kernel Mode stack, thus ensuring
that control is transferred to the signal handler when
current
resumes its execution in User Mode:
    regs->esp = (unsigned long) frame;
    regs->eip = (unsigned long) ka->sa.sa_handler;
The setup_frame( )
function terminates by
resetting the segmentation registers saved on the Kernel Mode stack
to their default value. Now the information needed by the signal
handler is on the top of the User Mode stack.
The setup_rt_frame( )
function is very similar to
setup_frame( )
, but it puts on the User Mode stack
an extended frame
(stored in the
rt_sigframe
data structure) that also includes the
content of the siginfo_t
table associated with the
signal.
After setting up the User
Mode stack, the handle_signal( )
function checks
the values of the flags associated with the signal.
If the received signal has the SA_ONESHOT
flag
set, it must be reset to its default action so that further
occurrences of the same signal will not trigger the execution of the
signal handler:
if (ka->sa.sa_flags & SA_ONESHOT) ka->sa.sa_handler = SIG_DFL;
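The effect of this flag is visible from User Mode. The sketch below assumes a POSIX system on which SA_RESETHAND is the standard name for the flag Linux calls SA_ONESHOT; the function and variable names are hypothetical, and SIGUSR1 is an arbitrary choice.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t oneshot_calls;

static void oneshot_handler(int sig)
{
    (void)sig;
    oneshot_calls++;
}

/* Installs a one-shot handler for SIGUSR1, delivers the signal once,
 * and returns 1 if the disposition is back to SIG_DFL afterward
 * (-1 on setup failure). */
int handler_is_reset_after_delivery(void)
{
    struct sigaction sa, cur;
    sigset_t unblock;

    /* Make sure the signal can be delivered right away. */
    sigemptyset(&unblock);
    sigaddset(&unblock, SIGUSR1);
    sigprocmask(SIG_UNBLOCK, &unblock, NULL);

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = oneshot_handler;
    sa.sa_flags = SA_RESETHAND;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    oneshot_calls = 0;
    raise(SIGUSR1);                 /* the handler runs before raise() returns */
    assert(oneshot_calls == 1);

    if (sigaction(SIGUSR1, NULL, &cur) != 0)
        return -1;
    return cur.sa_handler == SIG_DFL;
}
```

A second SIGUSR1 sent after this function returns would trigger the default action, which for this signal terminates the process.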
Moreover, if the signal does not have the
SA_NODEFER
flag set, the signals in the
sa_mask
field of the sigaction
table must be blocked during the execution of the signal handler:
    if (!(ka->sa.sa_flags & SA_NODEFER)) {
        spin_lock_irq(&current->sigmask_lock);
        sigorsets(&current->blocked, &current->blocked, &ka->sa.sa_mask);
        sigaddset(&current->blocked, sig);
        recalc_sigpending(current);
        spin_unlock_irq(&current->sigmask_lock);
    }
As described earlier, the recalc_sigpending( )
function checks whether the process has nonblocked pending signals
and sets its sigpending
field accordingly.
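A User Mode program can observe this extended mask from inside its handler. The following sketch assumes a POSIX system and uses hypothetical function and variable names; it installs a SIGUSR1 handler without SA_NODEFER and with SIGUSR2 in sa_mask, then checks that both signals are blocked while the handler runs and unblocked again once it has returned.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t usr1_blocked_in_handler;
static volatile sig_atomic_t usr2_blocked_in_handler;

static void mask_probe_handler(int sig)
{
    sigset_t now;

    (void)sig;
    sigprocmask(SIG_BLOCK, NULL, &now);   /* query only, no change */
    usr1_blocked_in_handler = sigismember(&now, SIGUSR1);
    usr2_blocked_in_handler = sigismember(&now, SIGUSR2);
}

/* Returns 1 if both signals were blocked inside the handler and are
 * unblocked again after it returns (-1 on setup failure). */
int mask_is_extended_then_restored(void)
{
    struct sigaction sa, old;
    sigset_t clear, after;

    /* Start with both signals unblocked. */
    sigemptyset(&clear);
    sigaddset(&clear, SIGUSR1);
    sigaddset(&clear, SIGUSR2);
    sigprocmask(SIG_UNBLOCK, &clear, NULL);

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = mask_probe_handler;
    sa.sa_flags = 0;                      /* SA_NODEFER is not set */
    sigemptyset(&sa.sa_mask);
    sigaddset(&sa.sa_mask, SIGUSR2);
    if (sigaction(SIGUSR1, &sa, &old) != 0)
        return -1;

    usr1_blocked_in_handler = usr2_blocked_in_handler = 0;
    raise(SIGUSR1);                       /* handler runs here */

    sigprocmask(SIG_BLOCK, NULL, &after);
    sigaction(SIGUSR1, &old, NULL);       /* restore previous disposition */

    return usr1_blocked_in_handler == 1 && usr2_blocked_in_handler == 1 &&
           sigismember(&after, SIGUSR1) == 0 &&
           sigismember(&after, SIGUSR2) == 0;
}

/* Usage example. */
void demo_mask_probe(void)
{
    assert(mask_is_extended_then_restored() == 1);
}
```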
The function then returns to do_signal( ), which also returns immediately.
When
do_signal( )
returns, the current process resumes
its execution in User Mode. Because of the preparation by
setup_frame( )
described earlier, the
eip
register points to the first instruction of
the signal handler, while esp
points to the first
memory location of the frame that has been pushed on top of the User
Mode stack. As a result, the signal handler is executed.
When the signal handler terminates, the return address on top of the
stack points to the code in the retcode
field of
the frame. For signals without a siginfo_t table, the code is equivalent to the following assembly language instructions:
    popl %eax
    movl $__NR_sigreturn, %eax
    int $0x80
Therefore, the signal number (that is, the sig
field of the frame) is discarded from the stack, and the
sigreturn( )
system call is then invoked.
The sys_sigreturn( )
function computes the address
of the pt_regs
data structure
regs
, which contains the hardware context of the
User Mode process (see Section 9.2.3).
From the value stored in the esp
field, it can
thus derive and check the frame address inside the User Mode stack:
    frame = (struct sigframe *)(regs.esp - 8);
    if (verify_area(VERIFY_READ, frame, sizeof(*frame))) {
        force_sig(SIGSEGV, current);
        return 0;
    }
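The constant 8 in this computation can be traced step by step. The sketch below is a user-space model that assumes the handler returns with a plain ret instruction and that the pretcode and sig fields each occupy four bytes at the start of the frame; the function names and the sample frame address are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { SLOT_SIZE = 4 };   /* size of one 32-bit stack slot */

/* esp after the handler's ret instruction has popped pretcode. */
uint32_t esp_after_handler_return(uint32_t esp)
{
    return esp + SLOT_SIZE;
}

/* esp after the popl %eax in retcode has popped the sig field. */
uint32_t esp_after_popl(uint32_t esp)
{
    return esp + SLOT_SIZE;
}

/* The frame address derived from the esp value saved on entering
 * the sigreturn( ) system call. */
uint32_t frame_from_saved_esp(uint32_t saved_esp)
{
    assert(saved_esp >= 8);
    return saved_esp - 8;
}

/* Usage example: follow esp from handler entry to sigreturn( ). */
void demo_frame_recovery(void)
{
    uint32_t frame = 0xbffff760u;   /* hypothetical frame address */
    uint32_t esp = frame;           /* value loaded by setup_frame( ) */

    esp = esp_after_handler_return(esp);
    esp = esp_after_popl(esp);
    assert(frame_from_saved_esp(esp) == frame);
}
```

Two four-byte pops separate the frame start from the saved stack pointer, so subtracting 8 recovers the address that setup_frame( ) originally stored in esp.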
Then the function copies the bit array of signals that were blocked
before invoking the signal handler from the sc
field of the frame to the blocked
field of
current
. As a result, all signals that have been
masked for the execution of the signal handler are unblocked. The
recalc_sigpending( )
function is then invoked.
The sys_sigreturn( )
function must at this point
copy the process hardware context from the sc
field of the frame to the Kernel Mode stack and remove the frame from
the User Mode stack; it performs these two tasks by invoking the
restore_sigcontext( )
function.
If the signal was sent by a system call like
rt_sigqueueinfo( )
that required a
siginfo_t
table to be associated with the signal,
the mechanism is very similar. The return code in the
retcode
field of the extended frame invokes the
rt_sigreturn( )
system call; the corresponding
sys_rt_sigreturn( )
service routine copies the
process hardware context from the extended frame to the Kernel Mode
stack and restores the original User Mode stack content by removing
the extended frame from it.
The
request associated with a system call cannot always be immediately
satisfied by the kernel; when this happens, the process that issued
the system call is put in a TASK_INTERRUPTIBLE
or
TASK_UNINTERRUPTIBLE
state.
If the process is put in a TASK_INTERRUPTIBLE
state and some other process sends a signal to it, the kernel puts it
in the TASK_RUNNING
state without completing the
system call (see Section 4.8). When this
happens, the system call service routine does not complete its job,
but returns an EINTR
,
ERESTARTNOHAND
, ERESTARTSYS
, or
ERESTARTNOINTR
error code. The signal is delivered
to the process while switching back to User Mode.
In practice, the only error code a User Mode process can get in this
situation is EINTR
, which means that the system
call has not been completed. (The application programmer may check
this code and decide whether to reissue the system call.) The
remaining error codes are used internally by the kernel to specify
whether the system call may be reexecuted automatically after the
signal handler termination.
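The EINTR case is easy to reproduce from User Mode. The sketch below assumes a POSIX system; the function names are hypothetical, and the 50-millisecond timer period is an arbitrary choice. It blocks in read( ) on an empty pipe and lets a SIGALRM handler, installed without SA_RESTART, interrupt the call.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

static void wakeup_handler(int sig)
{
    (void)sig;                      /* its only job is to interrupt read() */
}

/* Returns the errno value left by a read() that was interrupted by a
 * caught signal, or -1 if setup fails or read() unexpectedly succeeds. */
int errno_after_interrupted_read(void)
{
    struct sigaction sa, old;
    struct itimerval timer, off;
    sigset_t unblock;
    int fds[2];
    char byte;
    ssize_t n;
    int saved_errno;

    if (pipe(fds) != 0)
        return -1;

    sigemptyset(&unblock);
    sigaddset(&unblock, SIGALRM);
    sigprocmask(SIG_UNBLOCK, &unblock, NULL);

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = wakeup_handler;
    sa.sa_flags = 0;                /* SA_RESTART deliberately not set */
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, &old);

    /* Periodic timer, so a signal that fires too early is followed by
     * another one while read() is sleeping. */
    memset(&timer, 0, sizeof(timer));
    timer.it_value.tv_usec = 50000;
    timer.it_interval.tv_usec = 50000;
    setitimer(ITIMER_REAL, &timer, NULL);

    n = read(fds[0], &byte, 1);     /* sleeps until the signal arrives */
    saved_errno = errno;

    memset(&off, 0, sizeof(off));
    setitimer(ITIMER_REAL, &off, NULL);   /* stop the timer */
    sigaction(SIGALRM, &old, NULL);
    close(fds[0]);
    close(fds[1]);

    return n == -1 ? saved_errno : -1;
}

/* Usage example. */
void demo_interrupted_read(void)
{
    assert(errno_after_interrupted_read() == EINTR);
}
```

An application that wants the call to complete would typically loop, reissuing read( ) for as long as it fails with EINTR.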
Table 10-6 lists the error codes related to unfinished system calls and their impact for each of the three possible signal actions. The terms that appear in the entries are defined in the following list:
Terminate
The system call will not be automatically reexecuted; the process will resume its execution in User Mode at the instruction following the int $0x80 one and the eax register will contain the -EINTR value.
Reexecute
The kernel forces the User Mode process to reload the eax register with the system call number and to reexecute the int $0x80 instruction; the process is not aware of the reexecution and the error code is not passed to it.
Depends
The system call is reexecuted only if the SA_RESTART flag of the delivered signal is set; otherwise, the system call terminates with a -EINTR error code.
Table 10-6. Reexecution of system calls
Columns list the error codes and their impact on system call execution.
| Signal action | EINTR | ERESTARTSYS | ERESTARTNOHAND | ERESTARTNOINTR |
|---|---|---|---|---|
| Default | Terminate | Reexecute | Reexecute | Reexecute |
| Ignore | Terminate | Reexecute | Reexecute | Reexecute |
| Catch | Terminate | Depends | Terminate | Reexecute |
When delivering a signal, the kernel must be sure that the process
really issued a system call before attempting to reexecute it. This
is where the orig_eax
field of the
regs
hardware context plays a critical role.
Let’s recall how this field is initialized when the
interrupt or exception handler starts:
Interrupt
The field contains the IRQ number associated with the interrupt minus 256 (see Section 4.6.1.4).
0x80 exception
The field contains the system call number (see Section 9.2.2).
Other exceptions
The field contains the value -1 (see Section 4.5.1).
Therefore, a non-negative value in the orig_eax
field means that the signal has woken up a
TASK_INTERRUPTIBLE
process that was sleeping in a
system call. The service routine recognizes that the system call was
interrupted, and thus returns one of the previously mentioned error
codes.
If the signal is explicitly ignored or if its default action is
enforced, do_signal( )
analyzes the error code of
the system call to decide whether the unfinished system call must be
automatically reexecuted, as specified in Table 10-6. If the call must be restarted, the function
modifies the regs
hardware context so that, when
the process is back in User Mode, eip
points to
the int $0x80
instruction and
eax
contains the system call number:
    if (regs->orig_eax >= 0) {
        if (regs->eax == -ERESTARTNOHAND ||
            regs->eax == -ERESTARTSYS ||
            regs->eax == -ERESTARTNOINTR) {
            regs->eax = regs->orig_eax;
            regs->eip -= 2;
        }
    }
The regs->eax
field is filled with the return
code of a system call service routine (see Section 9.2.2).
If the signal is caught, handle_signal( )
analyzes
the error code and, possibly, the SA_RESTART
flag
of the sigaction
table to decide whether the
unfinished system call must be reexecuted:
    if (regs->orig_eax >= 0) {
        switch (regs->eax) {
        case -ERESTARTNOHAND:
            regs->eax = -EINTR;
            break;
        case -ERESTARTSYS:
            if (!(ka->sa.sa_flags & SA_RESTART)) {
                regs->eax = -EINTR;
                break;
            }
            /* fallthrough */
        case -ERESTARTNOINTR:
            regs->eax = regs->orig_eax;
            regs->eip -= 2;
        }
    }
If the system call must be restarted, handle_signal( )
proceeds exactly as do_signal( )
;
otherwise, it returns an -EINTR
error code to the
User Mode process.
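Both fix-up paths can be exercised outside the kernel with a simplified register structure. The sketch below is a user-space model, not kernel code: struct model_regs, the M_ constants, and the function names are hypothetical, and the numeric values of the M_ constants are placeholders chosen for this example. The subtraction of 2 from eip matches the length of the int $0x80 instruction, which is two bytes long.

```c
#include <assert.h>

/* Placeholder error codes for this model. */
enum {
    M_EINTR = 4,
    M_ERESTARTSYS = 512,
    M_ERESTARTNOINTR = 513,
    M_ERESTARTNOHAND = 514
};

enum { INT80_LENGTH = 2 };      /* int $0x80 occupies two bytes */

struct model_regs {
    long eax;                   /* return code of the service routine   */
    long orig_eax;              /* system call number, or negative      */
    unsigned long eip;          /* address of the next user instruction */
};

static void model_restart(struct model_regs *regs)
{
    regs->eax = regs->orig_eax;     /* reload the system call number */
    regs->eip -= INT80_LENGTH;      /* point back at int $0x80       */
}

/* Signal ignored or default action enforced. */
void model_fixup_not_caught(struct model_regs *regs)
{
    if (regs->orig_eax < 0)
        return;
    if (regs->eax == -M_ERESTARTNOHAND ||
        regs->eax == -M_ERESTARTSYS ||
        regs->eax == -M_ERESTARTNOINTR)
        model_restart(regs);
}

/* Signal caught; sa_restart is nonzero when SA_RESTART is set. */
void model_fixup_caught(struct model_regs *regs, int sa_restart)
{
    if (regs->orig_eax < 0)
        return;
    switch (regs->eax) {
    case -M_ERESTARTNOHAND:
        regs->eax = -M_EINTR;
        break;
    case -M_ERESTARTSYS:
        if (!sa_restart) {
            regs->eax = -M_EINTR;
            break;
        }
        /* fall through */
    case -M_ERESTARTNOINTR:
        model_restart(regs);
        break;
    }
}

/* Usage example: the "Depends" entry of Table 10-6. */
void demo_restart_depends(void)
{
    struct model_regs a = { -M_ERESTARTSYS, 3, 0x1002 };
    struct model_regs b = { -M_ERESTARTSYS, 3, 0x1002 };

    model_fixup_caught(&a, 0);      /* SA_RESTART clear: -EINTR */
    assert(a.eax == -M_EINTR && a.eip == 0x1002);
    model_fixup_caught(&b, 1);      /* SA_RESTART set: reexecute */
    assert(b.eax == 3 && b.eip == 0x1000);
}
```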
[71] Linux allows processes to specify an alternate stack for their
signal handlers by invoking the sigaltstack( )
system call; this feature is also requested by the X/Open standard.
When an alternate stack is present, the get_sigframe( )
function returns an address inside that stack. We
don’t discuss this feature further, since it is
conceptually similar to regular signal handling.