9. Processes and System Calls: Breaking boundaries

System calls are your hotline to the OS

C programs rely on the operating system for pretty much everything. They make system calls if they want to talk to the hardware. System calls are just functions that live inside the operating system’s kernel. Most of the code in the C Standard Library depends on them. Whenever you call printf() to display something on the command line, somewhere at the back of things, a system call will be made to the operating system to send the string of text to the screen.

Let’s look at an example of a system call. We’ll begin with one called (appropriately) system().

system() takes a single string parameter and executes it as if you had typed it on the command line:

The system() function is an easy way of running other programs from your code—particularly if you’re creating a quick prototype and you’d sooner call external programs rather than write lots and lots of C code.
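As a rough sketch of the kind of program described here, the code below stitches a comment into an echo command and hands the whole string to system(). The helper name log_comment(), the buffer size, and the exact shape of the echo command are assumptions made for illustration; only system(), echo, and the reports.log filename come from the text.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: append `comment` to `logfile` by running echo
   through system(). Returns whatever system() returns, or -1 if the
   command would not fit in the buffer. */
int log_comment(const char *comment, const char *logfile)
{
    char cmd[256];
    int n = snprintf(cmd, sizeof(cmd), "echo '%s' >> %s", comment, logfile);
    if (n < 0 || (size_t)n >= sizeof(cmd))
        return -1;          /* command was truncated, so don't run it */
    return system(cmd);     /* the shell interprets the whole string */
}
```

Calling log_comment("all clear", "reports.log") should add one line to the end of reports.log, much as if you had typed the echo command yourself.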

Let’s compile the program and then watch it in action:

Now, when you look in the same directory as the program, there’s a new file that’s been created called reports.log:

The program worked. It read a comment from the command line and called the echo command to add the comment to the end of the file.

Even though you could have written the whole program in C, by using system(), you simplified the program and got it working with very little work.

Q:
Does the system() function get compiled into my program?
A:
No. The system() function—like all system calls—doesn’t live in your program. It lives in the main operating system.
Q:
So, when I make a system call, I’m making a call to some external piece of code, like a library?
A:
Kind of. But the details depend on the operating system. On some operating systems, the code for a system call lives inside the kernel of the operating system. On other operating systems, it might simply be stored in some dynamic library.

Then someone busted into the system...

There’s a downside to the system() function. It’s quick and easy to use, but it’s also kinda sloppy. Before getting into the problems with system(), let’s see what it takes to break the program.

The code worked by stitching together a string containing a command, like this:

But what if someone entered a comment like this?

By injecting some command-line code into the text, you can make the program run whatever code you like:

Is this a big problem? If a user can run guard_log, she can just as easily run some other program. But what if your code has been called from a web server? Or if it’s processing data from a file?
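To see why this matters, it can help to look at the string before it ever reaches system(). The sketch below only builds the command and runs nothing. The helper name build_log_command() and the sample comment are assumptions, chosen to match the idea of sneaking a directory listing into the command.

```c
#include <stdio.h>

/* Build the command string only, so the result can be inspected
   without running anything. Returns the length snprintf() reports. */
int build_log_command(char *cmd, size_t size, const char *comment)
{
    return snprintf(cmd, size, "echo '%s' >> reports.log", comment);
}
```

With the comment ' && ls / && echo ' the stitched string becomes echo '' && ls / && echo '' >> reports.log, which the shell reads as three separate commands instead of one.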

Security’s not the only problem

This example injects a piece of code to list the contents of the root directory, but it could have deleted files or launched a virus. But you shouldn’t just worry about security.

What if the comments contain apostrophes?
That might break the quotes in the command.
What if the PATH variable causes the system() function to call the wrong program?
What if the program we’re calling needs to have a specific set of environment variables set up first?

The system() function is easy to use, but most of the time, you’re going to need something more structured—some way of calling a specific program, with a set of command-line arguments and maybe even some environment variables.

Geek Bits

What’s the kernel?

On most machines, system calls are functions that live inside the kernel of the operating system. But what is the kernel? You never actually see the kernel on the screen, but it’s always there, controlling your computer. The kernel is the most important program on your computer, and it’s in charge of three things:

Processes

No program can run on the system without the kernel loading it into memory. The kernel creates processes and makes sure they get the resources they need. The kernel also watches for processes that become too greedy or crash.

Memory

Your machine has a limited supply of memory, so the kernel has to carefully ration the amount of memory each process can take. The kernel can increase the virtual memory size by quietly loading and unloading sections of memory to disk.

Hardware

The kernel uses device drivers to talk to the equipment that’s plugged into the computer. Your program can use the keyboard and the screen and the graphics processor without knowing too much about them, because the kernel talks to them on your behalf.

System calls are the functions that your program uses to talk to the kernel.

The exec() functions give you more control

When you call the system() function, the operating system has to interpret the command string and decide which programs to run and how to run them. And that’s where the problem is: the operating system needs to interpret the string, and you’ve already seen how easy it is to get that wrong. So, the solution is to remove the ambiguity and tell the operating system precisely which program you want to run. That’s what the exec() functions are for.

exec() functions replace the current process

A process is just a program running in memory. If you type taskmgr on Windows or ps -ef on most other machines, you’ll see the processes running on your system. The operating system tracks each process with a number called the process identifier (PID).

The exec() functions replace the current process by running some other program. You can say which command-line arguments or environment variables to use, and when the new program starts it will have exactly the same PID as the old one. It’s like a relay race, where your program hands over its process to the new program.

A process is a program running in memory.

There are many exec() functions

Over time, programmers have created several different versions of exec(). Each version has a slightly different name and its own set of parameters. Even though there are lots of versions, there are really just two groups of exec() functions: the list functions and the array functions.

The exec() functions are in unistd.h.

The list functions: execl(), execlp(), execle()

The list functions accept command-line arguments as a list of parameters, like this:

The program.
This might be the full pathname of the program, for execl() and execle(), or just a command name to search for, for execlp(). Either way, the first parameter tells the exec() function what program it will run.
The command-line arguments.
You need to list one by one the command-line arguments you want to use. Remember: by convention, the first command-line argument is the name of the program. That means the first two parameters passed to a list version of exec() will normally be the same string.
NULL.
That’s right. After the last command-line argument, you need a NULL. This tells the function that there are no more arguments.
Environment variables (maybe).
If you call an exec() function whose name ends with ...e(), you can also pass an array of environment variables. This is just an array of strings like "POWER=4", "SPEED=17", "PORT=OPEN", ..., and the last item in the array must be NULL.

Watch it!

Spaces in command line arguments can confuse MinGW.

If you pass two arguments “I like” and “turtles,” MinGW programs might send three arguments: “I,” “like,” and “turtles.”

The array functions: execv(), execvp(), execve()

If you already have your command-line arguments stored in an array, you might find these two versions easier to use:

The only difference between these two functions is that execvp will search for the program using the PATH variable.
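Here is a small sketch of the array versions, assuming you already have the arguments in a NULL-terminated array. The helper name run_array() and its search_path flag are inventions for this example, not part of unistd.h.

```c
#include <unistd.h>

/* Run `program` with the arguments in args[], which must end with NULL.
   If search_path is nonzero, execvp() looks the program up on the PATH;
   otherwise execv() expects a full pathname.
   Returns only if the exec call fails, and then returns -1. */
int run_array(const char *program, char *const args[], int search_path)
{
    if (search_path)
        return execvp(program, args);
    return execv(program, args);
}
```

Just as with the list functions, args[0] would normally be the name of the program.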

You can figure out which exec() function you need by constructing the name. Each function name is exec followed by one or two characters, which must be l, v, p, or e. The characters tell you which feature you want to use. So, for the execle() function:

execle = exec + l + e = LIST of arguments + an ENVIRONMENT

The l and v characters always come before p and e, and the p and e characters are optional.

Uses	Character
List of args	l
Array/vector of args	v
Search the path	p
Environment vars	e

Passing environment variables

Every process has a set of environment variables. These are the values you see when you type set or env on the command line, and they usually tell the process useful information, such as the location of the home directory or where to find the commands. C programs can read environment variables with the getenv() function, which is declared in stdlib.h. The diner_info program uses getenv() in just this way.
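A program that reads its environment might look something like the sketch below. The variable name JUICE, the output format, and the helper name describe_order() are all assumptions made for illustration; the one real piece is getenv(), which returns NULL when a variable isn't set.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: write a short summary into `out`, using the
   JUICE environment variable if it is set. */
int describe_order(char *out, size_t size, const char *diners)
{
    const char *juice = getenv("JUICE");   /* NULL if JUICE is not set */
    if (!juice)
        juice = "(none)";
    return snprintf(out, size, "Diners: %s\nJuice: %s\n", diners, juice);
}
```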

If you want to run a program using command-line arguments and environment variables, you can do it like this:

The execle() function will set the command-line arguments and environment variables and then replace the current process with diner_info.
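A call like that might be sketched as follows. The helper name run_diner_info(), the argument "4", and the JUICE value are assumptions; what matters is the order of the parameters: the program, the arguments, a NULL, and then the environment array, which also ends with NULL.

```c
#include <unistd.h>

/* Replace the current process with the program at `path`, passing one
   argument and a one-variable environment.
   Returns only if the exec call fails, and then returns -1. */
int run_diner_info(const char *path)
{
    char *my_env[] = {"JUICE=peach and apple", NULL};  /* hypothetical value */
    return execle(path, path, "4", (char *)NULL, my_env);
}
```

The new program sees only the variables in my_env, not the environment of the program that launched it.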

But what if there’s a problem?

If there’s a problem calling the program, the existing process will keep running. That’s useful, because it means that if you can’t start that second process, you’ll be able to recover from the error and give the user more information on what went wrong. And luckily, the C Standard Library provides some built-in code to help you with that.

Watch it!

If you’re passing an environment on Cygwin, be sure to include a PATH variable.

On Cygwin, the PATH variable is needed when programs are loaded. So, if you’re passing environment variables on Cygwin, be sure to include PATH=/usr/bin.

Most system calls go wrong in the same way

Because system calls depend on something outside your program, they might go wrong in some way that you can’t control. To deal with this problem, most system calls go wrong in the same way.

Take the execle() call, for example. It’s really easy to see when an exec() call goes wrong. If an exec() call is successful, the current program stops running. So, if the program runs anything after the call to exec(), there must have been a problem:

But just telling if a system call worked is not enough. You normally want to know why a system call failed. That’s why most system calls follow the golden rules of failure.

The Golden Rules of Failure

Tidy up as much as you can.
Set the errno variable to an error value.
Return -1.

The errno variable is a global variable that’s defined in errno.h, along with a whole bunch of standard error values, like EPERM (operation not permitted), ENOENT (no such file or directory), and ESRCH (no such process).

Now you could check the value of errno against each of these values, or you could look up a standard piece of error text using a function in string.h called strerror():
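A minimal sketch of both ideas, assuming a made-up helper called try_exec(): anything that runs after the exec call means the call failed, and strerror() turns errno into readable text.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Try to replace this process with the program at `path`. If control
   ever reaches the line after execl(), the call failed: display the
   reason and return the errno value. */
int try_exec(const char *path)
{
    execl(path, path, (char *)NULL);
    int err = errno;        /* save it before another call can change it */
    puts(strerror(err));
    return err;
}
```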

So, if the system can’t find the program you are running and it sets the errno variable to ENOENT, the above code will display this message:

No such file or directory

Different machines have different commands to tell you about their network configuration. On Linux and Mac machines, there’s the /sbin/ifconfig program, and on Windows there’s a command called ipconfig that’s stored somewhere on the command path.

This program tries to run the /sbin/ifconfig program and, if that fails, it will try the ipconfig command. There’s no need to pass arguments to either command. Think carefully. What type of exec() commands will you need?
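One possible way to write it is sketched below; treat it as an illustration, not the only answer. The program with a known location gets execl(), and the one that has to be found on the command path gets execlp(). The helper name show_network_config() and its parameters are assumptions, added so the two names aren't hardcoded.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Try a full pathname first, then fall back to a PATH search.
   Returns 1 only if both exec calls fail. */
int show_network_config(const char *full_path, const char *command)
{
    if (execl(full_path, full_path, (char *)NULL) == -1) {
        if (execlp(command, command, (char *)NULL) == -1) {
            fprintf(stderr, "Cannot run %s: %s\n", command, strerror(errno));
            return 1;
        }
    }
    return 0;   /* not reached: a successful exec never returns */
}
```

You would call it as show_network_config("/sbin/ifconfig", "ipconfig").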

Q:
Isn’t system() just easier to use than exec()?
A:
Yes. But because the operating system needs to interpret the string you pass to system(), it can be a bit buggy. Particularly if you create the command string dynamically.
Q:
Why are there so many exec() functions?
A:
Over time, people wanted to create processes in different ways. The different versions of exec() were created for more flexibility.
Q:
Do I always have to check the return value of a system call? Doesn’t it make the program really long?
A:
If you make system calls and don’t check for errors, your code will be shorter. But it will probably also have more bugs. It is better to think about errors when you first write code. It will make it much easier to catch bugs later on.
Q:
If I call an exec() function, can I do anything afterward?
A:
No. If the exec() function is successful, it will change the process so that it runs the new program instead of your program. That means the program containing the exec() call will stop as soon as it runs the exec() function.

System calls are functions that live in the operating system.
When you make a system call, you are calling code outside your program.
system() is a system call to run a command string.
system() is easy to use, but it can cause bugs.
The exec() system calls let you run programs with more control.
There are several versions of the exec() system call.
System calls usually, but not always, return -1 if there’s a problem.
They will also set the errno variable to an error number.

The guys over at Starbuzz have come up with a new order-generation program that they call coffee:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
  char *w = getenv("EXTRA");
  if (!w)
    w = getenv("FOOD");
  if (!w)
    w = argv[argc - 1];
  char *c = getenv("EXTRA");
  if (!c)
    c = argv[argc - 1];
  printf("%s with %s\n", c, w);
  return 0;
}

To try it out, they’ve created this test program. Can you match up these code fragments to the output they produce?

Read the news with RSS

RSS feeds are a common way for websites to publish their latest news stories. Each RSS feed is just an XML file containing a summary of stories and links. Of course, it’s possible to write a C program that will read RSS files straight off the Web, but it involves a few programming ideas that you haven’t seen yet. But that’s not a problem if you can find another program that will handle the RSS processing for you.

Do this!

Download RSS Gossip from https://github.com/dogriffiths/rssgossip/zipball/master. Also, if you don’t have Python installed, you can get it here: http://www.python.org/.

RSS Gossip is a small Python script that can search RSS feeds for stories containing a piece of text. To run the script, you will need Python installed. Once you have Python and rssgossip.py, you can search for stories like this:

The editor wants a program on his machine that can search a lot of RSS feeds all at the same time. You could do that if you ran the rssgossip.py script several times, once for each RSS feed. Fortunately, the out-of-work actors have made a start on the program for you. Trouble is, they’re having problems creating the exec() call that runs the rssgossip.py script. Think carefully about what you need to do to run the script, and then complete the newshound code.

And for extra bonus points...

What will the program do when it runs?

When you compile and run the program, it looks like it works:

The newshound program has run the rssgossip.py script using data from the array of RSS feeds.

Actually there is a problem.

Although the newshound program managed to run the rssgossip.py script, it looks like it didn’t manage to run the script for all of the feeds. In fact, the only news it displayed came from the first feed on the list. That meant the other news stories matching the search terms were missed.

Brain Power

Look at the code of the newshound program again and think about how it works. Why do you think it failed to run the rssgossip.py script for any of the other newsfeeds?

exec() is the end of the line for your program

The exec() functions replace the current process by running a new program. But what happens to the original program? It terminates, and it terminates immediately. That’s why the program only ran the rssgossip.py script for the first newsfeed. After it had called execle() the first time, the newshound program terminated.

But if you want to start another process and keep your original process running, how do you do it?

fork() will clone your process

You’re going to get around this problem by using a system call named fork().

fork() makes a complete copy of the current process. The brand-new copy will be running the same program, on the same line number. It will have exactly the same variables that contain exactly the same values. The only difference is that the copy process will have a different process identifier from the original.

The original process is called the parent process, and the newly created copy is called the child process.

But how can cloning the current process fix the problems with exec()? Let’s see.

Watch it!

Unlike Linux and the Mac, Windows doesn’t support fork() natively.

To use fork() on a Windows machine, you should first install Cygwin.

Running a child process with fork() + exec()

The trick is to only call an exec() function on a child process. That way, your original parent process will be able to continue running. Let’s look at the process step by step.

1. Make a copy

Begin by making a copy of your current process by calling the fork() system call.

The processes need some way of telling which of them is the parent process and which is the child, so the fork() function returns 0 to the child process, and it will return a nonzero value to the parent process.
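A small sketch of that test is shown below. The helper name fork_and_report() and the messages are made up; pid_t is the type fork() returns. The child leaves with _exit() so that it doesn't carry on running the parent's code.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Fork once. The child prints a line and exits inside this function;
   the parent gets the child's PID back, or -1 if the fork failed. */
pid_t fork_and_report(void)
{
    fflush(stdout);     /* so buffered text isn't printed by both processes */
    pid_t pid = fork();
    if (pid == -1) {
        fprintf(stderr, "Can't fork process: %s\n", strerror(errno));
        return -1;
    }
    if (pid == 0) {
        /* fork() returned 0, so this is the child */
        printf("child: my PID is %d\n", (int)getpid());
        fflush(stdout);
        _exit(0);
    }
    /* fork() returned a nonzero value, so this is the parent */
    printf("parent: my child has PID %d\n", (int)pid);
    return pid;
}
```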

2. If you’re the child process, call exec()

At this point, you have two identical processes running, both of them using identical code. But the child process (the one that received a 0 from the fork() call) now needs to replace itself by calling exec():

Now you have two separate processes: the child process is running the rssgossip.py script, and the original parent process is free to continue doing something else.
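Putting the two steps together for the news search might look something like this sketch. It assumes the script finds its feed in an environment variable called RSS_FEED and takes the search phrase as its argument; the helper name search_feeds() and its parameters are also assumptions, so check them against your copy of the script.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Start one child process per feed. Each child replaces itself with
   `python script phrase`, with RSS_FEED set to its own feed.
   Returns the number of children started, or -1 if a fork fails. */
int search_feeds(const char *python, const char *script, const char *phrase,
                 const char *feeds[], int count)
{
    int started = 0;
    for (int i = 0; i < count; i++) {
        char var[255];
        snprintf(var, sizeof(var), "RSS_FEED=%s", feeds[i]);
        char *vars[] = {var, NULL};

        pid_t pid = fork();
        if (pid == -1) {
            fprintf(stderr, "Can't fork process: %s\n", strerror(errno));
            return -1;
        }
        if (pid == 0) {
            /* Child: execle() only comes back if it failed. */
            execle(python, python, script, phrase, (char *)NULL, vars);
            fprintf(stderr, "Can't run script: %s\n", strerror(errno));
            _exit(127);   /* stop the child from running the parent's loop */
        }
        started++;        /* parent: carry on with the next feed */
    }
    return started;
}
```

Because only the child calls execle(), the parent survives every trip around the loop and can start a process for each feed.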

You call fork() like this:

pid_t pid = fork();

fork() will actually return an integer value that is 0 for the child process and positive for the parent process. The parent process will receive the process identifier of the child process. If the fork fails, no child is created and fork() returns -1 to the original process.

But what is pid_t? Different operating systems use different kinds of integers to store process IDs: some might use shorts and some might use ints. So pid_t is always set to the type that the operating system uses.

Now, if you compile and run the code, this happens:

By fork-ing a copy of itself and then exec-ing the Python script in a separate process, the newshound program is able to run a separate process for each of the RSS feeds. And the great thing is that these processes will all run at the same time.

That’s a lot faster than reading the newsfeeds one at a time. By learning how to create and run separate processes with fork() and exec(), not only can you make the most of your existing software, but you can also improve the performance of your code.

Q:
Does system() run programs in a separate process?
A:
Yes. But system() gives you less control over exactly how the program runs.
Q:
Isn’t fork-ing processes really inefficient? I mean, it copies an entire process, and then a moment later we replace the child process by doing an exec()?
A:
Operating systems use lots of tricks to make fork-ing processes really quick. For example, the operating system cheats and avoids making an actual copy of the parent process’s data. Instead, the child and parent processes share the same data.
Q:
But what if one of the processes changes some data in memory? Won’t that screw things up?
A:
It would, but the operating system will catch that a piece of memory is going to change, and then it will make a separate copy of that piece of memory for the child process.
Q:
That technique sounds quite cool. Does it have a name?
A:
Yes; it’s called “copy-on-write.”
Q:
Is a pid_t just an int?
A:
It depends on the platform. The only thing you know is that it will be some integer type.
Q:
I stored the result of a fork() call in an int, and it worked just fine.
A:
It’s best to always use pid_t to store process IDs. If you don’t, you might cause problems with other system calls or if your code is compiled on another machine.
Q:
Why doesn’t Windows support the fork() system call?
A:
Windows manages processes very differently from other operating systems, and the kinds of tricks fork() needs to do in order to work efficiently are very hard to do on Windows. This may be why there isn’t a version of fork() built in.
Q:
But Cygwin lets me do fork()s on Windows, right?
A:
Yes. The gurus who work on Cygwin did a lot of work to make Windows processes look like processes that are used on Unix, Linux, and the Mac. But because they still need to rely on Windows to create the underlying processes, fork() on Cygwin can be a little slower than fork() on other platforms.
Q:
So, if I’m just interested in writing code to work on Windows, is there something else I should use instead?
A:
Yes. There’s a function called CreateProcess() that’s like an enhanced version of system(). To find out more, go to http://msdn.microsoft.com and search for “CreateProcess.”
Q:
Won’t the output of the various feeds get mixed up?
A:
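The stories from different feeds may appear in any order, because the processes run at the same time. But the operating system will usually make sure that each string is printed completely.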

Your C Toolbox

You’ve got Chapter 9 under your belt, and now you’ve added processes and system calls to your toolbox. For a complete list of tooltips in the book, see Appendix B.
