Unix systems provide a family of functions that replace the execution context of a process with a new context described by an executable file. The names of these functions start with the prefix exec, followed by one or two letters; therefore, a generic function in the family is usually referred to as an exec function.
The exec functions are listed in Table 20-7; they differ in how the parameters are interpreted.
Table 20-7. The exec functions

| Function name | PATH search | Command-line arguments | Environment array |
|---|---|---|---|
| execl( ) | No | List | No |
| execlp( ) | Yes | List | No |
| execle( ) | No | List | Yes |
| execv( ) | No | Array | No |
| execvp( ) | Yes | Array | No |
| execve( ) | No | Array | Yes |
The first parameter of each function denotes the pathname of the file to be executed. The pathname can be absolute or relative to the process’s current directory. Moreover, if the name does not include any / characters, the execlp( ) and execvp( ) functions search for the executable file in all directories specified by the PATH environment variable.
Besides the first parameter, the execl( ), execlp( ), and execle( ) functions include a variable number of additional parameters. Each points to a string describing a command-line argument for the new program; as the "l" character in the function names suggests, the parameters are organized in a list terminated by a NULL value. Usually, the first command-line argument duplicates the executable filename. Conversely, the execv( ), execvp( ), and execve( ) functions specify the command-line arguments with a single parameter; as the "v" character in the function names suggests, the parameter is the address of a vector of pointers to command-line argument strings. The last component of the array must be NULL.
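To make the list/vector distinction concrete, here is a minimal sketch, assuming a POSIX system with a shell reachable through PATH, that runs a command once with execlp( ) and once with execvp( ). The helper names run_list_form( ), run_vector_form( ), and wait_exit_code( ) are hypothetical, introduced only for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Wait for a child and return its exit code, or -1 on error. */
static int wait_exit_code(pid_t pid)
{
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/* List form: every argument is a separate parameter, and the list is
   terminated by a null pointer (cast, because the parameters are variadic). */
static int run_list_form(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* The first argument ("sh") duplicates the executable name, by convention. */
        execlp("sh", "sh", "-c", "exit 7", (char *)NULL);
        _exit(127);               /* reached only if execlp() failed */
    }
    return wait_exit_code(pid);
}

/* Vector form: a single parameter pointing to a NULL-terminated array. */
static int run_vector_form(void)
{
    char *const argv[] = { "sh", "-c", "exit 9", NULL };
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);               /* reached only if execvp() failed */
    }
    return wait_exit_code(pid);
}
```

Both calls locate sh through PATH; they differ only in how the command-line arguments are packaged.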
The execle( ) and execve( ) functions receive as their last parameter the address of an array of pointers to environment strings; as usual, the last component of the array must be NULL. The other functions take the environment for the new program from the external environ global variable, which is defined in the C library.
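A short sketch of the difference, assuming /bin/sh exists: the hypothetical helper check_greeting( ) passes an explicit environment through execle( ) when it is given an array, and otherwise calls execl( ), which hands the child whatever environ currently holds. Note that execle( ) takes an absolute pathname, since it performs no PATH search.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>               /* setenv(), used in the usage note below */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run /bin/sh with a script that succeeds only if $GREETING is "hello".
   With envp == NULL the child receives the caller's environ (via execl());
   otherwise it receives exactly the strings in envp (via execle()). */
static int check_greeting(char *const envp[])
{
    const char *script = "test \"$GREETING\" = hello";
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (envp == NULL)
            execl("/bin/sh", "sh", "-c", script, (char *)NULL);
        else
            execle("/bin/sh", "sh", "-c", script, (char *)NULL, envp);
        _exit(127);               /* reached only if the exec call failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Given the array { "GREETING=hello", NULL }, check_greeting( ) returns 0; given an empty array it returns 1, because the new program sees no GREETING at all. After setenv("GREETING", "hello", 1), check_greeting(NULL) also returns 0, since execl( ) forwards environ.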
All exec functions, with the exception of execve( ), are wrapper routines defined in the C library and use execve( ), which is the only system call offered by Linux to deal with program execution.
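The following sketch shows how a routine in the spirit of execvp( ) can be layered on execve( ). It is not the C library's actual code: my_execvp( ) is a hypothetical, simplified version that assumes a fallback search path of /bin:/usr/bin when PATH is unset, and it omits details such as the /bin/sh fallback that execvp( ) applies when execve( ) reports ENOEXEC.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;   /* the environment that execvp() would pass on */

/* Hypothetical, simplified execvp(): search PATH, then call execve().
   Like the real function, it returns only on failure (-1, errno set). */
static int my_execvp(const char *file, char *const argv[])
{
    /* A name containing '/' is used as is, without any PATH search. */
    if (strchr(file, '/') != NULL)
        return execve(file, argv, environ);

    const char *dir = getenv("PATH");
    if (dir == NULL)
        dir = "/bin:/usr/bin";            /* assumed default search path */

    int result_errno = ENOENT;
    char candidate[PATH_MAX];
    for (;;) {
        const char *colon = strchr(dir, ':');
        size_t len = colon ? (size_t)(colon - dir) : strlen(dir);

        /* Build "<dir>/<file>"; an empty PATH entry means the current directory. */
        int n = snprintf(candidate, sizeof candidate, "%.*s%s%s",
                         (int)len, dir, len ? "/" : "", file);
        if (n > 0 && (size_t)n < sizeof candidate) {
            execve(candidate, argv, environ);   /* returns only on failure */
            if (errno == EACCES)
                result_errno = EACCES;          /* remember it, keep searching */
            else if (errno != ENOENT && errno != ENOTDIR)
                return -1;                      /* any other error ends the search */
        }
        if (colon == NULL)
            break;
        dir = colon + 1;
    }
    errno = result_errno;
    return -1;
}

/* Helper: run argv[0] via my_execvp() in a child; return its exit code. */
static int run_via_my_execvp(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        my_execvp(argv[0], argv);
        _exit(127);                 /* conventional "command not found" code */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Every successful path ends inside execve( ); the wrapper itself only decides which pathname to try next.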
The sys_execve( ) service routine receives the following parameters:
- The address of the executable file pathname (in the User Mode address space).
- The address of a NULL-terminated array (in the User Mode address space) of pointers to strings (again in the User Mode address space); each string represents a command-line argument.
- The address of a NULL-terminated array (in the User Mode address space) of pointers to strings (again in the User Mode address space); each string represents an environment variable in the NAME=value format.
The function copies the executable file pathname into a newly allocated page frame. It then invokes the do_execve( ) function, passing to it the pointers to the page frame, to the arrays of pointers, and to the location of the Kernel Mode stack where the User Mode register contents are saved. In turn, do_execve( ) performs the following operations:
1. Statically allocates a linux_binprm data structure, which will be filled with data concerning the new executable file.
2. Invokes path_init( ), path_walk( ), and dentry_open( ) to get the dentry object, the file object, and the inode object associated with the executable file. On failure, returns the proper error code.
3. Verifies that the executable file is not being written by checking the i_writecount field of the inode; stores -1 in that field to forbid further write accesses.
4. Invokes the prepare_binprm( ) function to fill the linux_binprm data structure. This function, in turn, performs the following operations:
   - Checks whether the permissions of the file allow its execution; if not, returns an error code.
   - Initializes the e_uid and e_gid fields of the linux_binprm structure, taking into account the values of the setuid and setgid flags of the executable file. These fields represent the effective user and group IDs, respectively. Also checks process capabilities (a compatibility hack explained earlier in Section 20.1.1).
   - Fills the buf field of the linux_binprm structure with the first 128 bytes of the executable file. These bytes include the magic number of the executable format and other information suitable for recognizing the executable file.
5. Copies the file pathname, command-line arguments, and environment strings into one or more newly allocated page frames. (Eventually, they are assigned to the User Mode address space.)
6. Invokes the search_binary_handler( ) function, which scans the formats list and tries to apply the load_binary method of each element, passing to it the linux_binprm data structure. The scan of the formats list terminates as soon as a load_binary method succeeds in acknowledging the executable format of the file.
7. If the executable file format is not present in the formats list, releases all allocated page frames and returns the error code -ENOEXEC, meaning that Linux cannot recognize the executable file format.
8. Otherwise, returns the code obtained from the load_binary method associated with the executable format of the file.
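The scan performed by search_binary_handler( ) can be modeled in user space. The sketch below is a toy, not kernel code: struct binprm, struct binfmt, the two loader functions, and the formats list are hypothetical stand-ins that only inspect magic numbers in a 128-byte buffer, returning -ENOEXEC to mean "not my format".

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define BINPRM_BUF_SIZE 128   /* the first 128 bytes of the file */

/* Hypothetical model of linux_binprm: only the buf field is kept. */
struct binprm {
    unsigned char buf[BINPRM_BUF_SIZE];
};

/* Hypothetical model of one element of the formats list. */
struct binfmt {
    const char *name;
    int (*load_binary)(struct binprm *bprm);  /* -ENOEXEC means "not mine" */
    struct binfmt *next;
};

static int load_elf(struct binprm *bprm)
{
    /* ELF magic: 0x7f followed by 'E', 'L', 'F' (split to end the hex escape). */
    if (memcmp(bprm->buf, "\x7f" "ELF", 4) != 0)
        return -ENOEXEC;
    return 0;   /* a real handler would now map the segments, and so on */
}

static int load_script(struct binprm *bprm)
{
    if (bprm->buf[0] != '#' || bprm->buf[1] != '!')
        return -ENOEXEC;
    return 0;
}

static struct binfmt script_format = { "script", load_script, NULL };
static struct binfmt elf_format    = { "elf",    load_elf,    &script_format };
static struct binfmt *formats = &elf_format;

/* Try each handler in turn; stop at the first one that does not answer
   -ENOEXEC, and report its name through *matched when requested. */
static int search_binary_handler(struct binprm *bprm, const char **matched)
{
    for (struct binfmt *fmt = formats; fmt != NULL; fmt = fmt->next) {
        int retval = fmt->load_binary(bprm);
        if (retval != -ENOEXEC) {
            if (matched != NULL)
                *matched = fmt->name;
            return retval;
        }
    }
    return -ENOEXEC;   /* no handler recognized the format */
}
```

A buffer starting with the ELF magic is claimed by the first handler, one starting with #! falls through to the second, and anything else yields -ENOEXEC.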
The load_binary method corresponding to an executable file format performs the following operations (we assume that the executable file is stored on a filesystem that allows file memory mapping and that it requires one or more shared libraries):
1. Checks some magic numbers stored in the first 128 bytes of the file to identify the executable format. If the magic numbers don’t match, returns the error code -ENOEXEC.
2. Reads the header of the executable file. This header describes the program’s segments and the shared libraries requested.
3. Gets from the executable file the pathname of the program interpreter, which is used to locate the shared libraries and map them into memory.
4. Gets the dentry object (as well as the inode object and the file object) of the program interpreter.
5. Checks the execution permissions of the program interpreter.
6. Copies the first 128 bytes of the program interpreter into a buffer.
7. Performs some consistency checks on the program interpreter type.
8. Invokes the flush_old_exec( ) function to release almost all resources used by the previous computation; in turn, this function performs the following operations:
   - If the table of signal handlers is shared with other processes, allocates a new table and decrements the usage counter of the old one; this is done by invoking the make_private_signals( ) function.
   - Invokes the exec_mmap( ) function to release the memory descriptor, all memory regions, and all page frames assigned to the process and to clean up the process’s Page Tables.
   - Updates the table of signal handlers by resetting each signal that has a handler installed to its default action (signals being ignored stay ignored). This is done by invoking the release_old_signals( ) and flush_signal_handlers( ) functions.
   - Sets the comm field of the process descriptor with the last component of the executable file pathname (truncated to 15 characters).
   - Invokes the flush_thread( ) function to clear the values of the floating point registers and debug registers saved in the TSS segment.
   - Invokes the de_thread( ) function to detach the process from the old thread group (see Section 3.2.2).
   - Invokes the flush_old_files( ) function to close all open files whose corresponding flag is set in the files->close_on_exec field of the process descriptor (see Section 12.2.6).[136]

   Now we have reached the point of no return: the function cannot restore the previous computation if something goes wrong.
9. Sets up the new personality of the process, that is, the personality field in the process descriptor.
10. Clears the PF_FORKNOEXEC flag in the process descriptor. This flag, which is set when a process is forked and cleared when it executes a new program, is required for process accounting.
11. Invokes the setup_arg_pages( ) function to allocate a new memory region descriptor for the process’s User Mode stack and to insert that memory region into the process’s address space. setup_arg_pages( ) also assigns the page frames containing the command-line arguments and the environment variable strings to the new memory region.
12. Invokes the do_mmap( ) function to create a new memory region that maps the text segment (that is, the code) of the executable file. The initial linear address of the memory region depends on the executable format, since the program’s executable code is usually not relocatable. Therefore, the function assumes that the text segment is loaded starting from some specific logical address offset (and thus from some specified linear address). ELF programs are loaded starting from linear address 0x08048000.
13. Invokes the do_mmap( ) function to create a new memory region that maps the data segment of the executable file. Again, the initial linear address of the memory region depends on the executable format, since the executable code expects to find its variables at specified offsets (that is, at specified linear addresses). In an ELF program, the data segment is loaded right after the text segment.
14. Allocates additional memory regions for any other specialized segments of the executable file. Usually, there are none.
15. Invokes a function that loads the program interpreter. If the program interpreter is an ELF executable, the function is named load_elf_interp( ). In general, the function performs the operations in Steps 11 through 13, but for the program interpreter instead of the file to be executed. The initial addresses of the memory regions that will include the text and data of the program interpreter are specified by the program interpreter itself; however, they are very high (usually above 0x40000000) to avoid collisions with the memory regions that map the text and data of the file to be executed (see Section 20.1.4).
16. Stores in the binfmt field of the process descriptor the address of the linux_binfmt object of the executable format.
17. Determines the new capabilities of the process.
18. Creates specific program interpreter tables and stores them on the User Mode stack between the command-line arguments and the array of pointers to environment strings (see Figure 20-1).
19. Sets the values of the start_code, end_code, end_data, start_brk, brk, and start_stack fields of the process’s memory descriptor.
20. Invokes the do_brk( ) function to create a new anonymous memory region mapping the bss segment of the program. (When the process writes into a variable, it triggers demand paging, and thus the allocation of a page frame.) The size of this memory region was computed when the executable program was linked. The initial linear address of the memory region must be specified, since the program’s executable code is usually not relocatable. In an ELF program, the bss segment is loaded right after the data segment.
21. Invokes the start_thread( ) macro to modify the values of the User Mode registers eip and esp saved on the Kernel Mode stack, so that they point to the entry point of the program interpreter and to the top of the new User Mode stack, respectively.
22. If the process is being traced, sends the SIGTRAP signal to it.
23. Returns the value 0 (success).
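On current Linux systems with glibc 2.16 or later, a process can read back part of the program interpreter tables that the kernel stored on its User Mode stack (the ELF auxiliary vector) by calling getauxval( ). The sketch below assumes such a system; the exact set of entries in a modern kernel may differ from the one described here. AT_ENTRY is the entry point of the program itself, to which the interpreter eventually jumps, whereas AT_BASE is the address at which the interpreter was loaded (0 when there is no interpreter). show_auxv( ) and has_program_interpreter( ) are hypothetical helper names.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/auxv.h>
#include <unistd.h>

/* Print a few auxiliary-vector entries of the current process. */
static void show_auxv(void)
{
    /* Entry point of the executable itself (the interpreter jumps here last). */
    printf("AT_ENTRY  = %#lx\n", getauxval(AT_ENTRY));
    /* Load address of the program interpreter; 0 if there is none. */
    printf("AT_BASE   = %#lx\n", getauxval(AT_BASE));
    /* Address of the program headers as mapped in memory. */
    printf("AT_PHDR   = %#lx\n", getauxval(AT_PHDR));
    /* Page size reported by the kernel to User Mode. */
    printf("AT_PAGESZ = %lu\n", getauxval(AT_PAGESZ));
}

/* Nonzero if a program interpreter was loaded for this process. */
static int has_program_interpreter(void)
{
    return getauxval(AT_BASE) != 0;
}
```

In a dynamically linked program, AT_BASE and AT_ENTRY point into two different mappings: the interpreter's and the executable's.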
When the execve( ) system call terminates and the calling process resumes its execution in User Mode, the execution context is dramatically changed: the code that invoked the system call no longer exists. In this sense, we could say that execve( ) never returns on success. Instead, a new program to be executed is mapped in the address space of the process. However, the new program cannot yet be executed, since the program interpreter must still take care of loading the shared libraries.[137]
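Because a successful execve( ) never returns, any code that follows the call runs only on failure. The sketch below, which assumes a Linux system with a writable /tmp, creates an executable file that has neither ELF magic nor a #! line and tries to run it, so that no load_binary method acknowledges the format and execve( ) fails with ENOEXEC (or with EACCES if /tmp is mounted noexec). execve( ) is called directly because execlp( ) and execvp( ) would instead retry such a file with /bin/sh. exec_unknown_format( ) is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create an executable file in an unknown format and try to execve() it.
   Returns the errno value left by the failed call. */
static int exec_unknown_format(void)
{
    char path[] = "/tmp/not-a-binary-XXXXXX";
    static const char text[] = "plain text: no ELF magic, no #! line\n";

    int fd = mkstemp(path);
    if (fd < 0)
        return errno;
    if (write(fd, text, sizeof text - 1) < 0 || fchmod(fd, 0700) < 0) {
        int err = errno;
        close(fd);
        unlink(path);
        return err;
    }
    close(fd);   /* executing a file still open for writing fails with ETXTBSY */

    char *const argv[] = { path, NULL };
    char *const envp[] = { NULL };
    execve(path, argv, envp);

    /* Only reached because execve() failed: it never returns on success. */
    int err = errno;
    unlink(path);
    return err;
}
```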
Although the program interpreter runs in User Mode, we briefly sketch out here how it operates. Its first job is to set up a basic execution context for itself, starting from the information stored by the kernel in the User Mode stack between the array of pointers to environment strings and arg_start. Then the program interpreter must examine the program to be executed to identify which shared libraries must be loaded and which functions in each shared library are effectively requested. Next, the interpreter issues several mmap( ) system calls to create memory regions mapping the pages that will hold the library functions (text and data) actually used by the program. Then the interpreter updates all references to the symbols of the shared library, according to the linear addresses of the library’s memory regions. Finally, the program interpreter terminates its execution by jumping to the main entry point of the program to be executed. From now on, the process will execute the code of the executable file and of the shared libraries.
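One way to observe the interpreter's work from inside a running program is the dl_iterate_phdr( ) function provided by glibc, which visits every ELF object currently mapped in the process: the program itself, the program interpreter, and each shared library. A minimal sketch, assuming Linux with glibc; print_object( ) and list_loaded_objects( ) are hypothetical helper names.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stdio.h>

/* Called once per loaded object; prints its load address and name. */
static int print_object(struct dl_phdr_info *info, size_t size, void *data)
{
    int *count = data;
    (void)size;
    /* The main program is usually reported with an empty name. */
    const char *name = (info->dlpi_name != NULL && info->dlpi_name[0] != '\0')
                           ? info->dlpi_name : "(main program)";
    printf("%#18lx  %s\n", (unsigned long)info->dlpi_addr, name);
    ++*count;
    return 0;   /* 0 means "keep iterating" */
}

/* Print every loaded object and return how many there are. */
static int list_loaded_objects(void)
{
    int count = 0;
    dl_iterate_phdr(print_object, &count);
    return count;
}
```

For a dynamically linked program the output typically includes the program interpreter and libc; a statically linked program reports little beyond itself.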
As you may have noticed, executing a program is a complex activity that involves many facets of kernel design, such as process abstraction, memory management, system calls, and filesystems. It is the kind of topic that makes you realize what a marvelous piece of work Linux is!
[136] These flags can be read and modified by means of the fcntl( ) system call.
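A sketch of the close-on-exec flag in action, assuming a POSIX system with /bin/sh: it sets or clears FD_CLOEXEC on descriptor 7 through fcntl( ) and then runs a shell that tries to write to that descriptor. The helper name shell_write_fd7_status( ) and the choice of descriptor 7 are arbitrary.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Put /dev/null on descriptor 7, set or clear its close-on-exec flag, and
   exec a shell that writes to descriptor 7. Returns the shell's exit code:
   0 if descriptor 7 was still open in the new program, nonzero otherwise. */
static int shell_write_fd7_status(int close_on_exec)
{
    int fd = open("/dev/null", O_WRONLY);
    if (fd < 0)
        return -1;
    if (fd != 7) {
        if (dup2(fd, 7) < 0) {          /* the new descriptor 7 starts without FD_CLOEXEC */
            close(fd);
            return -1;
        }
        close(fd);
    }

    int flags = fcntl(7, F_GETFD);
    if (flags < 0 || fcntl(7, F_SETFD, close_on_exec ? (flags | FD_CLOEXEC)
                                                     : (flags & ~FD_CLOEXEC)) < 0) {
        close(7);
        return -1;
    }

    pid_t pid = fork();
    if (pid == 0) {
        execl("/bin/sh", "sh", "-c", "true >&7", (char *)NULL);
        _exit(127);                     /* reached only if execl() failed */
    }
    close(7);                           /* the parent no longer needs it */
    if (pid < 0)
        return -1;
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

With the flag cleared the shell writes to /dev/null and exits with 0; with the flag set the descriptor is closed during execve( ), so the redirection fails and the shell reports a nonzero status.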
[137] Things are much simpler if the executable file is statically linked, that is, if no shared library is requested. The load_binary method just maps the text, data, bss, and stack segments of the program into the process memory regions, and then sets the User Mode eip register to the entry point of the new program.