It’s time to think outside the box.
You’ve already seen that you can build complex applications by connecting small tools together on the command line. But what if you want to use other programs from inside your own code? In this chapter, you’ll learn how to use system services to create and control processes. That will give your programs access to email, the Web, and any other tool you’ve got installed. By the end of the chapter, you’ll have the power to go beyond C.
C programs rely on the operating system for pretty much everything. They make system calls if they want to talk to the hardware. System calls are just functions that live inside the operating system’s kernel. Most of the code in the C Standard Library depends on them. Whenever you call printf() to display something on the command line, somewhere at the back of things, a system call will be made to the operating system to send the string of text to the screen.
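Let’s look at a function that asks the operating system to do some work for you. We’ll begin with one called (appropriately) system(). Strictly speaking, system() is a C Standard Library function rather than a system call, but it relies on system calls to do its job. system() takes a single string parameter and executes it as if you had typed it on the command line: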
The system() function is an easy way of running other programs from your code, particularly if you’re creating a quick prototype and you’d sooner call external programs than write lots and lots of C code.
There’s a downside to the system() function. It’s quick and easy to use, but it’s also kinda sloppy. Before getting into the problems with system(), let’s see what it takes to break the program.
The code worked by stitching together a string containing a command, like this:
But what if someone entered a comment like this?
By injecting some command-line code into the text, you can make the program run whatever code you like:
Is this a big problem? If a user can run guard_log, she can just as easily run some other program. But what if your code has been called from a web server? Or if it’s processing data from a file?
This example injects a piece of code to list the contents of the root directory, but it could have deleted files or launched a virus. But you shouldn’t just worry about security.
What if the comments contain apostrophes?
That might break the quotes in the command.
What if the PATH variable causes the system() function to call the wrong program?
What if the program we’re calling needs to have a specific set of environment variables set up first?
The system() function is easy to use, but most of the time, you’re going to need something more structured: some way of calling a specific program, with a set of command-line arguments and maybe even some environment variables.
What’s the kernel?
On most machines, system calls are functions that live inside the kernel of the operating system. But what is the kernel? You never actually see the kernel on the screen, but it’s always there, controlling your computer. The kernel is the most important program on your computer, and it’s in charge of three things:
Processes
No program can run on the system without the kernel loading it into memory. The kernel creates processes and makes sure they get the resources they need. The kernel also watches for processes that become too greedy or crash.
Memory
Your machine has a limited supply of memory, so the kernel has to carefully ration the amount of memory each process can take. The kernel can increase the virtual memory size by quietly loading and unloading sections of memory to disk.
Hardware
The kernel uses device drivers to talk to the equipment that’s plugged into the computer. Your program can use the keyboard and the screen and the graphics processor without knowing too much about them, because the kernel talks to them on your behalf.
System calls are the functions that your program uses to talk to the kernel.
When you call the system() function, the operating system has to interpret the command string and decide which programs to run and how to run them. And that’s where the problem is: the operating system needs to interpret the string, and you’ve already seen how easy it is to get that wrong. So, the solution is to remove the ambiguity and tell the operating system precisely which program you want to run. That’s what the exec() functions are for.
A process is just a program running in memory. If you type taskmgr on Windows or ps -ef on most other machines, you’ll see the processes running on your system. The operating system tracks each process with a number called the process identifier (PID).
The exec() functions replace the current process by running some other program. You can say which command-line arguments or environment variables to use, and when the new program starts it will have exactly the same PID as the old one. It’s like a relay race, where your program hands over its process to the new program.
A process is a program running in memory.
Over time, programmers have created several different versions of exec(). Each version has a slightly different name and its own set of parameters. Even though there are lots of versions, there are really just two groups of exec() functions: the list functions and the array functions.
The exec() functions are in unistd.h.
The list functions accept command-line arguments as a list of parameters, like this:
The program.
This might be the full pathname of the program (execl()/execle()) or just a command name to search for (execlp()), but the first parameter tells the exec() function what program it will run.
The command-line arguments.
You need to list one by one the command-line arguments you want to use. Remember: the first command-line argument is always the name of the program. That means the first two parameters passed to a list version of exec() should always be the same string.
NULL.
That’s right. After the last command-line argument, you need a NULL. This tells the function that there are no more arguments.
Environment variables (maybe).
If you call an exec() function whose name ends with ...e(), you can also pass an array of environment variables. This is just an array of strings like "POWER=4", "SPEED=17", "PORT=OPEN", ....
If you already have your command-line arguments stored in an array, you might find these two versions easier to use:
The only difference between these two functions, execv() and execvp(), is that execvp() will search for the program using the PATH variable.
Every process has a set of environment variables. These are the values you see when you type set or env on the command line, and they usually tell the process useful information, such as the location of the home directory or where to find the commands. C programs can read environment variables with the getenv() function. You can see getenv() being used in the diner_info program.
If you want to run a program using command-line arguments and environment variables, you can do it like this:
The execle() function will set the command-line arguments and environment variables and then replace the current process with diner_info.
If there’s a problem calling the program, the existing process will keep running. That’s useful, because it means that if you can’t start that second process, you’ll be able to recover from the error and give the user more information on what went wrong. And luckily, the C Standard Library provides some built-in code to help you with that.
Because system calls depend on something outside your program, they might go wrong in some way that you can’t control. To deal with this problem, most system calls go wrong in the same way.
Take the execle() call, for example. It’s really easy to see when an exec() call goes wrong. If an exec() call is successful, the current program stops running. So, if the program runs anything after the call to exec(), there must have been a problem:
But just telling if a system call worked is not enough. You normally want to know why a system call failed. That’s why most system calls follow the golden rules of failure.
Tidy up as much as you can.
Set the errno variable to an error value.
Return –1.
The errno variable is a global variable that’s defined in errno.h, along with a whole bunch of standard error values, like:
Now you could check the value of errno against each of these values, or you could look up a standard piece of error text using a function in string.h called strerror():
So, if the system can’t find the program you are running and it sets the errno variable to ENOENT, the above code will display this message:
No such file or directory
RSS feeds are a common way for websites to publish their latest news stories. Each RSS feed is just an XML file containing a summary of stories and links. Of course, it’s possible to write a C program that will read RSS files straight off the Web, but it involves a few programming ideas that you haven’t seen yet. But that’s not a problem if you can find another program that will handle the RSS processing for you.
Do this!
Download RSS Gossip from https://github.com/dogriffiths/rssgossip/zipball/master. Also, if you don’t have Python installed, you can get it here: http://www.python.org/.
RSS Gossip is a small Python script that can search RSS feeds for stories containing a piece of text. To run the script, you will need Python installed. Once you have Python and rssgossip.py, you can search for stories like this:
The exec() functions replace the current process by running a new program. But what happens to the original program? It terminates, and it terminates immediately. That’s why the program only ran the rssgossip.py script for the first newsfeed. After it had called execle() the first time, the newshound program terminated.
But if you want to start another process and keep your original process running, how do you do it?
You’re going to get around this problem by using a system call named fork(). fork() makes a complete copy of the current process. The brand-new copy will be running the same program, on the same line number. It will have exactly the same variables that contain exactly the same values. The only difference is that the copy process will have a different process identifier from the original.
The original process is called the parent process, and the newly created copy is called the child process.
But how can cloning the current process fix the problems with exec()? Let’s see.
The trick is to only call an exec() function on a child process. That way, your original parent process will be able to continue running. Let’s look at the process step by step.
Begin by making a copy of your current process by calling the fork() system call.
The processes need some way of telling which of them is the parent process and which is the child, so the fork() function returns 0 to the child process, and it returns a nonzero value to the parent process (the child’s PID, or -1 if the fork failed).
At this point, you have two identical processes running, both of them using identical code. But the child process (the one that received a 0 from the fork() call) now needs to replace itself by calling exec():
Now you have two separate processes: the child process is running the rssgossip.py script, and the original parent process is free to continue doing something else.
You’ve got Chapter 9 under your belt, and now you’ve added processes and system calls to your toolbox. For a complete list of tooltips in the book, see Appendix B.