Chapter 15. Job Control

Job control, a feature standardized by POSIX.1 and mandated by many standards, allows a single terminal to run multiple jobs. Each job is a group of one or more processes, usually connected by pipes. Mechanisms are provided to move jobs between the foreground and the background and to prevent background jobs from accessing the terminal.

Job Control Basics

Recall from Chapter 10 that each active terminal runs a single group of processes, called a session. Each session is made up of process groups, and each process group contains one or more individual processes.

One of the process groups in a session is the foreground process group. The rest are background process groups. The foreground process group may be changed to any process group belonging to the session, allowing the user to switch among foreground process groups. Processes that are members of the foreground process group are often called foreground processes; processes that are not are called background processes.
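A process can check which side of this division it is on by comparing its own process group with the terminal's foreground process group. The following is a minimal sketch using the standard tcgetpgrp() and getpgrp() calls; the helper name inForeground() is ours, chosen for illustration, and does not appear in ladsh.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the caller belongs to the foreground
 * process group of the terminal open on fd, 0 if it is a background
 * process, and -1 if fd does not refer to the caller's controlling
 * terminal (tcgetpgrp() fails in that case). */
int inForeground(int fd) {
    pid_t fg = tcgetpgrp(fd);

    if (fg == -1)
        return -1;

    return fg == getpgrp();
}
```

A program would typically pass STDIN_FILENO; a return of -1 usually means that standard input has been redirected away from the terminal.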

Restarting Processes

Every process is in one of three states: running, stopped, or zombied. Running processes are terminated by calling the exit() system call or by being sent a fatal signal. Processes are moved between the running and stopped states exclusively through signals generated by another process, the kernel, or themselves.[1]

When a process receives SIGCONT, the kernel moves it from the stopped state to the running state; if the process is already running, the signal does not affect its state. The process may catch the signal, with the kernel moving the process to the running state before delivering the signal.
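To make the stopped-to-running transition concrete, here is a small sketch in which a child stops itself and the parent restarts it. The function name and the exit code 42 are arbitrary choices for this illustration.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Illustrative only: the child stops itself, the parent observes the
 * stop, sends SIGCONT, and collects the child's exit code.
 * Returns the exit code (42 here) or -1 on failure. */
int resumeSelfStoppedChild(void) {
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        raise(SIGSTOP);     /* the child cannot restart itself */
        _exit(42);          /* reached only after a SIGCONT arrives */
    }

    /* WUNTRACED makes waitpid() report stopped children too */
    if (waitpid(pid, &status, WUNTRACED) != pid || !WIFSTOPPED(status)) {
        kill(pid, SIGKILL);
        waitpid(pid, &status, 0);
        return -1;
    }

    kill(pid, SIGCONT);     /* move the child back to the running state */

    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;

    return WEXITSTATUS(status);
}
```

The child never reaches _exit() on its own, which is the point of the footnote: only another process (or the kernel) can deliver the SIGCONT that restarts it.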

Stopping Processes

Four signals move a running process to the stopped state. SIGSTOP is never generated by the kernel. It is provided to allow users to stop arbitrary processes. It cannot be caught or ignored; it always stops the target process. The other three signals that stop processes, SIGTSTP, SIGTTIN, and SIGTTOU, may be generated by the terminal on which the process is running or by another process. Although these signals behave similarly, they are generated under different circumstances.


SIGTSTP
This signal is sent to every process in a terminal’s foreground process group when a user presses the terminal’s suspend key.[2]

SIGTTIN
When a background process attempts to read from the terminal, it is sent SIGTTIN.

SIGTTOU
This signal is normally generated by a background process attempting to write to its terminal. The signal is generated only if the terminal’s TOSTOP attribute is set, as discussed on page 387.

This signal is also generated by a background process calling any of tcflush(), tcflow(), tcsetattr(), tcsetpgrp(), tcdrain(), or tcsendbreak().
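Whether background writes generate SIGTTOU depends on the TOSTOP attribute, which a program can toggle through the termios interface. The sketch below assumes only the standard tcgetattr() and tcsetattr() calls; the helper name setTostop() is hypothetical.

```c
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper: turn the TOSTOP attribute on or off for the
 * terminal open on fd. Returns 0 on success, or -1 if fd is not a
 * terminal or the attributes cannot be changed. */
int setTostop(int fd, int enable) {
    struct termios tio;

    if (tcgetattr(fd, &tio) != 0)
        return -1;

    if (enable)
        tio.c_lflag |= TOSTOP;      /* background writes raise SIGTTOU */
    else
        tio.c_lflag &= ~TOSTOP;     /* background writes are allowed */

    return tcsetattr(fd, TCSANOW, &tio) != 0 ? -1 : 0;
}
```

Because tcsetattr() is itself one of the calls listed above, a background process that runs this helper receives SIGTTOU unless it ignores or blocks that signal.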

[2] Normally, the suspend key is Ctrl-Z. The stty program allows users to change the suspend key for a terminal, and Chapter 16 details how a program can change it.

The default action of each of these three signals is to stop the process. They all may be caught or ignored. In both cases, the process is not stopped.

Handling Job Control Signals

Although many applications can be stopped and restarted with no ill effects, other processes need to handle process stops and starts. Most editors, for example, need to modify many of the terminal parameters while they are running. When users suspend the process, they expect their terminal to be restored to its default state.

When a process needs to perform actions before being suspended, it needs to provide a signal handler for SIGTSTP. This lets the kernel notify the process that it needs to suspend itself.

Upon receiving SIGTSTP, the process should immediately perform whatever actions it needs to take in order to allow suspension (such as restoring the terminal to its original state) and suspend itself. The simplest way for the process to suspend itself is by sending itself SIGSTOP. Most shells display messages that indicate which signal caused the process to stop, however, and if the process sent itself SIGSTOP, it would look different from most suspended processes. To avoid this nuisance, most applications reset their SIGTSTP handler to SIG_DFL and send themselves a SIGTSTP.

Processes that require special code for clean suspensions normally need to perform special actions when they are restarted. This is easily done by providing a signal handler for SIGCONT, which performs such actions. If the process suspends itself with SIGTSTP, such special actions probably include setting a signal handler for SIGTSTP.

The following code provides a simple signal handler for both SIGCONT and SIGTSTP. When the user suspends or restarts the process, the process displays a message before stopping or continuing.

/* monitor.c */

#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

void catchSignal(int sigNum, int useDefault);

/* handles both SIGTSTP and SIGCONT */
void handler(int signum) {
    if (signum == SIGTSTP) {
        write(STDOUT_FILENO, "got SIGTSTP\n", 12);
        /* restore the default action, then really suspend */
        catchSignal(SIGTSTP, 1);
        kill(getpid(), SIGTSTP);
    } else {
        write(STDOUT_FILENO, "got SIGCONT\n", 12);
        /* reinstall our handler so the next suspend is caught */
        catchSignal(SIGTSTP, 0);
    }
}

/* install handler() for sigNum, or the default action if useDefault */
void catchSignal(int sigNum, int useDefault) {
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));

    if (useDefault)
        sa.sa_handler = SIG_DFL;
    else
        sa.sa_handler = handler;

    if (sigaction(sigNum, &sa, NULL)) perror("sigaction");
}

int main() {
    catchSignal(SIGTSTP, 0);
    catchSignal(SIGCONT, 0);

    while (1);

    return 0;
}

Job Control in ladsh

Adding job control is the final addition to ladsh, our simple shell; the complete source code appears in Appendix B. The first step is to add a member to each of struct childProgram, struct job, and struct jobSet. As ladsh has not been discussed for a while, it may help to refer back to page 149, where these data structures were first introduced. Here is the final definition of struct childProgram:

35: struct childProgram {
36:     pid_t pid;              /* 0 if exited */
37:     char ** argv;           /* program name and arguments */
38:     int numRedirections;    /* elements in redirection array */
39:     struct redirectionSpecifier * redirections;  /* I/O redirs */
40:     glob_t globResult;      /* result of parameter globbing */
41:     int freeGlob;           /* should we free globResult? */
42:     int isStopped;          /* is the program currently stopped? */
43: };

We already differentiate between running children and terminated children through the pid member of struct childProgram: it is zero if the child has terminated and contains a valid pid otherwise. The new member, isStopped, is nonzero if the process has been stopped and zero otherwise. Note that its value is meaningless if the pid member is zero.

An analogous change needs to be made to struct job. It previously kept track of the number of programs in a job and how many of those processes were still running. Its new member, stoppedProgs, records how many of the job’s processes are currently stopped. It could be calculated from the isStopped members of the children that make up the job, but it is convenient to track it separately. This change gives the final form for struct job:

45: struct job {
46:     int jobId;              /* job number */
47:     int numProgs;           /* number of programs in job */
48:     int runningProgs;       /* number of programs running */
49:     char * text;            /* name of job */
50:     char * cmdBuf;          /* buffer various argv's point to */
51:     pid_t pgrp;             /* process group ID for the job */
52:     struct childProgram * progs; /* array of programs in job */
53:     struct job * next;      /* to track background commands */
54:     int stoppedProgs;      /* num of programs alive, but stopped */
55: };
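For comparison, here is how the stopped count could be derived from the per-child flags instead of being stored. The two structures below are trimmed stand-ins that keep only the members the loop needs (they are not the ladsh definitions), and countStoppedProgs() is a hypothetical helper.

```c
#include <sys/types.h>

/* Trimmed stand-ins for the ladsh structures, for illustration only */
struct childProgram {
    pid_t pid;              /* 0 if exited */
    int isStopped;          /* meaningful only when pid is nonzero */
};

struct job {
    int numProgs;                   /* number of programs in job */
    struct childProgram * progs;    /* array of programs in job */
};

/* Count the children that are still alive but stopped. */
int countStoppedProgs(const struct job * job) {
    int i, count = 0;

    for (i = 0; i < job->numProgs; i++)
        if (job->progs[i].pid && job->progs[i].isStopped)
            count++;

    return count;
}
```

Storing the count in stoppedProgs avoids rerunning this loop on every status change, at the cost of keeping two pieces of state in sync.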

Like previous versions of ladsh, ladsh4.c ignores SIGTTOU. It does this to allow tcsetpgrp() to be used even when the shell is not a foreground process. As the shell will have proper job control now, however, we do not want our children to ignore the signal. As soon as a new process is fork()ed by runCommand(), it sets the handler for SIGTTOU to SIG_DFL. This allows the terminal driver to suspend background processes that attempt to write to (or otherwise manipulate) the terminal. Here is the code that begins creating the child process, where SIGTTOU is reset and some additional synchronization work is performed:

514:       pipe(controlfds);
516:       if (!(newJob.progs[i].pid = fork())) {
517:           signal(SIGTTOU, SIG_DFL);
519:           close(controlfds[1]);
520:          /* this read will return 0 when the write side closes */
521:           read(controlfds[0], &len, 1);
522:           close(controlfds[0]);

The controlfds pipe is used to suspend the child process until after the shell has moved the child into the proper process group. By closing the write side of the pipe and reading from the read side, the child process stops until the parent closes the write side of the pipe, which happens after the setpgid() call on line 546. This type of mechanism is necessary to ensure that the child gets moved into the process group before the exec() occurs. If we waited until after the exec(), we would have no assurance that the process was in the right process group before it started accessing the terminal (which may not be allowed).
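The same handshake can be tried in isolation. The sketch below is not the ladsh code: the function name is ours, and the child reports its process group through its exit code at the point where a shell would call exec().

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Illustrative only: fork a child, hold it on a pipe until the parent
 * has placed it in its own process group, then let it continue.
 * Returns 0 if the child found itself leading its own group. */
int startChildInOwnGroup(void) {
    int controlfds[2], status;
    char dummy;
    pid_t pid;

    if (pipe(controlfds))
        return -1;

    pid = fork();
    if (pid < 0) {
        close(controlfds[0]);
        close(controlfds[1]);
        return -1;
    }

    if (pid == 0) {
        close(controlfds[1]);
        /* blocks until every write side is closed, then returns 0 */
        if (read(controlfds[0], &dummy, 1) < 0)
            _exit(2);
        close(controlfds[0]);
        /* a shell would exec() here; we report the group instead */
        _exit(getpgrp() == getpid() ? 0 : 1);
    }

    close(controlfds[0]);
    if (setpgid(pid, pid)) {        /* child becomes its own group leader */
        close(controlfds[1]);
        waitpid(pid, &status, 0);
        return -1;
    }
    close(controlfds[1]);           /* releases the child */

    if (waitpid(pid, &status, 0) != pid)
        return -1;

    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

The read() in the child returns only after both copies of the write side are closed, so the child cannot get past it before the parent's setpgid() has completed.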

ladsh checks for terminated children in two places. The primary place is when it wait()s for processes in the foreground process group. When the foreground process has terminated or been stopped, ladsh checks for changes in the states of its background processes through the checkJobs() function. Both of these code paths need to be modified to handle stopped children, as well as terminated ones.

Adding the WUNTRACED flag to the waitpid() call, which waits on foreground processes, allows it to notice stopped processes, as well. When a process has been stopped rather than terminated, the child’s isStopped flag is set and the job’s stoppedProgs count is incremented. If all the programs in the job have been stopped, ladsh moves itself back to the foreground and waits for a user’s command. Here is how the portion of ladsh’s main loop that waits on the foreground process now looks:

708:         /* a job is running in the foreground; wait for it */
709:         i = 0;
710:         while (!jobList.fg->progs[i].pid ||
711:                jobList.fg->progs[i].isStopped) i++;
713:         waitpid(jobList.fg->progs[i].pid, &status, WUNTRACED);
715:         if (WIFSIGNALED(status) &&
716:                 (WTERMSIG(status) != SIGINT)) {
717:             printf("%s\n", strsignal(WTERMSIG(status)));
718:         }
720:         if (WIFEXITED(status) || WIFSIGNALED(status)) {
721:             /* the child exited */
722:             jobList.fg->runningProgs--;
723:             jobList.fg->progs[i].pid = 0;
725:             if (!jobList.fg->runningProgs) {
726:                 /* child exited */
728:                 removeJob(&jobList, jobList.fg);
729:                 jobList.fg = NULL;
731:                 /* move the shell to the foreground */
732:                 if (tcsetpgrp(0, getpid()))
733:                     perror("tcsetpgrp");
734:             }
735:         } else {
736:             /* the child was stopped */
737:             jobList.fg->stoppedProgs++;
738:             jobList.fg->progs[i].isStopped = 1;
740:             if (jobList.fg->stoppedProgs ==
741:                                 jobList.fg->runningProgs) {
742:                 printf("\n" JOB_STATUS_FORMAT,
743:                        jobList.fg->jobId,
744:                        "Stopped", jobList.fg->text);
745:                 jobList.fg = NULL;
746:             }
747:         }
749:             if (!jobList.fg) {
750:                 /* move the shell to the foreground */
751:                 if (tcsetpgrp(0, getpid()))
752:                     perror("tcsetpgrp");
753:             }
754:         }

Similarly, background tasks may be stopped by signals. We again add WUNTRACED to the waitpid(), which checks the states of background processes. When a background process has been stopped, the isStopped flag and stoppedProgs counter are updated, and if the entire job has been stopped, a message is printed.
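A background check of this kind must not block the shell. One way to write it, sketched here under the assumption that the check polls with WNOHANG (the text above names only WUNTRACED), is to loop until waitpid() has nothing more to report. The function name and its counters are ours.

```c
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: collect every pending state change among our
 * children without blocking. Adds to *exited for each child that
 * terminated and to *stopped for each child that stopped.
 * Returns the number of state changes seen. */
int pollChildren(int * exited, int * stopped) {
    int status, changes = 0;
    pid_t pid;

    /* WNOHANG: return 0 at once if no child has changed state */
    while ((pid = waitpid(-1, &status, WNOHANG | WUNTRACED)) > 0) {
        if (WIFSTOPPED(status))
            (*stopped)++;
        else
            (*exited)++;        /* normal exit or fatal signal */
        changes++;
    }

    return changes;
}
```

A shell would map each returned pid back to its job at the point where this sketch only counts, updating isStopped and stoppedProgs as described above.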

The final ability that ladsh requires is to be able to move jobs between running in the foreground, running in the background, and being stopped. Two built-in commands allow this: fg and bg. They are limited versions of the normal shell commands that go by the same name. Both take a single parameter, which is a job number preceded by a % (for compatibility with standard shells). The fg command moves the specified job to the foreground; bg sets it running in the background.
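Parsing the job argument takes a single sscanf() call, because a doubled %% in the format matches a literal percent sign. A minimal sketch follows; the wrapper name parseJobSpec() is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: parse a job specifier such as "%3".
 * Stores the job number in *jobNum and returns 0 on success,
 * or returns -1 if arg does not start with '%' and a number. */
int parseJobSpec(const char * arg, int * jobNum) {
    if (!arg || sscanf(arg, "%%%d", jobNum) != 1)
        return -1;

    return 0;
}
```

This check is deliberately loose: trailing characters after the number, as in "%3x", are accepted.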

Both chores are done by sending SIGCONT to every process in the process group being activated. Although ladsh could send the signal to each process through separate kill() calls, it is slightly simpler to send it to the entire process group using a single kill(). Here is the implementation of the fg and bg built-in commands:

461:     } else if (!strcmp(newJob.progs[0].argv[0], "fg") ||
462:                !strcmp(newJob.progs[0].argv[0], "bg")) {
463:         if (!newJob.progs[0].argv[1] || newJob.progs[0].argv[2]) {
464:             fprintf(stderr,
465:                     "%s: exactly one argument is expected\n",
466:                     newJob.progs[0].argv[0]);
467:             return 1;
468:         }
470:         if (sscanf(newJob.progs[0].argv[1], "%%%d", &jobNum) != 1) {
471:              fprintf(stderr, "%s: bad argument '%s'\n",
472:                      newJob.progs[0].argv[0],
473:                      newJob.progs[0].argv[1]);
474:              return 1;
475:         }
477:         for (job = jobList->head; job; job = job->next)
478:             if (job->jobId == jobNum) break;
480:         if (!job) {
481:             fprintf(stderr, "%s: unknown job %d\n",
482:                     newJob.progs[0].argv[0], jobNum);
483:             return 1;
484:         }
486:         if (*newJob.progs[0].argv[0] == 'f') {
487:             /* Make this job the foreground job */
489:             if (tcsetpgrp(0, job->pgrp))
490:                 perror("tcsetpgrp");
491:             jobList->fg = job;
492:         }
494:         /* Restart the processes in the job */
495:         for (i = 0; i < job->numProgs; i++)
496:             job->progs[i].isStopped = 0;
498:         kill(-job->pgrp, SIGCONT);
500:         job->stoppedProgs = 0;
502:         return 0;
503:     }

Job control was the final ability that ladsh required in order to be usable. It is still missing many features present in regular shells, such as shell and environment variables, but it illustrates all the low-level tasks that shells perform. The complete source code to the final version of ladsh appears in Appendix B.

[1] Stopped processes cannot generate signals, however, so they cannot restart themselves either.

