Of the six typical cases mentioned earlier in Section 8.1, in which a process gets new memory regions, the first one, issuing a fork( ) system call, requires the creation of a whole new address space for the child process. Conversely, when a process terminates, the kernel destroys its address space. In this section, we discuss how these two activities are performed by Linux.
In Section 3.4.1, we mentioned that the kernel invokes the copy_mm( ) function while creating a new process. This function creates the process address space by setting up all Page Tables and memory descriptors of the new process.
Each process usually has its own address space, but lightweight processes can be created by calling clone( ) with the CLONE_VM flag set. These processes share the same address space; that is, they are allowed to address the same set of pages.
Following the COW approach described earlier, traditional processes inherit the address space of their parent: pages stay shared as long as they are only read. When one of the processes attempts to write one of them, however, the page is duplicated; after some time, a forked process usually gets its own address space that is different from that of the parent process. Lightweight processes, on the other hand, use the address space of their parent process. Linux implements them simply by not duplicating address space. Lightweight processes can be created considerably faster than normal processes, and the sharing of pages can also be considered a benefit so long as the parent and children coordinate their accesses carefully.
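The effect of this policy can be observed from User Mode. The following is a minimal sketch, assuming a POSIX system; the helper name parent_value_after_child_write( ) is hypothetical and introduced only for illustration. It forks a child that overwrites a variable and then reports the value the parent still sees. The program cannot show whether the underlying page frame was physically shared before the write, only that the write stays private to the child.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that overwrites a variable, then
 * return the value the parent still sees after the child has exited.
 * Returns -1 if fork() or waitpid() fails. */
int parent_value_after_child_write(void)
{
    volatile int value = 1;          /* lives in a private, writable page */
    int status = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;                   /* fork failed */

    if (pid == 0) {
        value = 2;                   /* the write lands in the child's own copy */
        _exit(value == 2 ? 0 : 1);   /* the child does see its own write */
    }

    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    return value;                    /* the parent's copy is untouched */
}
```

Had the child been created with clone( ) and the CLONE_VM flag instead, both processes would address the same pages, and the parent would observe the child's write.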
If the new process has been created by means of the clone( ) system call and if the CLONE_VM flag of the flag parameter is set, copy_mm( ) gives the clone (tsk) the address space of its parent (current):
    if (clone_flags & CLONE_VM) {
        atomic_inc(&current->mm->mm_users);
        tsk->mm = current->mm;
        tsk->active_mm = current->mm;
        return 0;
    }
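The sharing step amounts to bumping a usage counter and copying a pointer. The following User Mode sketch models it with simplified stand-in types; struct mm_model, struct task_model, CLONE_VM_MODEL, and share_mm_if_requested( ) are names invented for this illustration, not kernel identifiers, and a plain increment replaces the kernel's atomic_inc( ).

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures; only the fields
 * discussed in the text are modeled. */
struct mm_model {
    int mm_users;                 /* number of tasks sharing this address space */
};

struct task_model {
    struct mm_model *mm;
    struct mm_model *active_mm;
};

#define CLONE_VM_MODEL 0x100UL    /* illustrative flag bit for this sketch */

/* Returns 1 if the child now shares the parent's descriptor, or 0 if
 * the caller still has to build a new address space. */
int share_mm_if_requested(unsigned long clone_flags,
                          struct task_model *parent,
                          struct task_model *child)
{
    if (!(clone_flags & CLONE_VM_MODEL))
        return 0;

    parent->mm->mm_users++;       /* plain increment; the kernel uses atomic_inc() */
    child->mm = parent->mm;       /* no copy: both tasks point at one descriptor */
    child->active_mm = parent->mm;
    return 1;
}
```

Because nothing is duplicated, the cost of this path does not depend on how many memory regions the parent owns.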
If the CLONE_VM flag is not set, copy_mm( ) must create a new address space (even though no memory is allocated within that address space until the process requests an address). The function allocates a new memory descriptor, stores its address in the mm field of the new process descriptor tsk, and then initializes its fields:
    tsk->mm = kmem_cache_alloc(mm_cachep, SLAB_KERNEL);
    tsk->active_mm = tsk->mm;
    memcpy(tsk->mm, current->mm, sizeof(*tsk->mm));
    atomic_set(&tsk->mm->mm_users, 1);
    atomic_set(&tsk->mm->mm_count, 1);
    init_rwsem(&tsk->mm->mmap_sem);
    tsk->mm->page_table_lock = SPIN_LOCK_UNLOCKED;
    tsk->mm->pgd = pgd_alloc(tsk->mm);
Remember that the pgd_alloc( ) macro allocates a Page Global Directory for the new process.
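The pattern used here is "copy everything, then reset what must not be inherited." A minimal User Mode sketch of that pattern follows. All names in it (struct mm_model, PGD_ENTRIES_MODEL, new_mm_from_parent( ), free_mm_model( )) are hypothetical, malloc( ) and calloc( ) stand in for the slab allocator and pgd_alloc( ), and the locks are left out entirely.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for a memory descriptor. */
struct mm_model {
    int mm_users;
    int mm_count;
    unsigned long start_code;   /* example of a field worth inheriting */
    unsigned long end_code;
    void *pgd;                  /* stand-in for the Page Global Directory */
};

#define PGD_ENTRIES_MODEL 1024  /* illustrative size, not tied to any architecture */

/* Build a child descriptor from the parent's; returns NULL on
 * allocation failure. */
struct mm_model *new_mm_from_parent(const struct mm_model *parent)
{
    struct mm_model *mm = malloc(sizeof(*mm));
    if (mm == NULL)
        return NULL;

    memcpy(mm, parent, sizeof(*mm));   /* inherit every field first */
    mm->mm_users = 1;                  /* then reset the per-descriptor state */
    mm->mm_count = 1;

    /* The child needs its own, initially empty, top-level table. */
    mm->pgd = calloc(PGD_ENTRIES_MODEL, sizeof(unsigned long));
    if (mm->pgd == NULL) {
        free(mm);
        return NULL;
    }
    return mm;
}

void free_mm_model(struct mm_model *mm)
{
    if (mm != NULL) {
        free(mm->pgd);
        free(mm);
    }
}
```

Unlike the kernel excerpt, the sketch checks the result of each allocation, since a User Mode program has no caller further up that handles the failure.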
The dup_mmap( ) function is then invoked to duplicate both the memory regions and the Page Tables of the parent process:
    down_write(&current->mm->mmap_sem);
    dup_mmap(tsk->mm);
    up_write(&current->mm->mmap_sem);
    copy_segments(tsk, tsk->mm);
The dup_mmap( ) function inserts the new memory descriptor tsk->mm in the global list of memory descriptors. Then it scans the list of regions owned by the parent process, starting from the one pointed to by current->mm->mmap. It duplicates each vm_area_struct memory region descriptor encountered and inserts the copy in the list of regions owned by the child process.
Right after inserting a new memory region descriptor, dup_mmap( ) invokes copy_page_range( ) to create, if necessary, the Page Tables needed to map the group of pages included in the memory region and to initialize the new Page Table entries. In particular, any page frame corresponding to a private, writable page (VM_SHARED flag off and VM_MAYWRITE flag on) is marked as read-only for both the parent and the child, so that it will be handled with the Copy On Write mechanism. Before terminating, dup_mmap( ) also creates the red-black tree of memory regions of the child process by invoking the build_mmap_rb( ) function.
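The list walk and the Copy On Write test can be sketched in User Mode as follows. This is an illustration under stated assumptions, not the kernel's code: struct region_model and the functions are invented names, the two flag constants are illustrative bit values, and the sketch only counts the regions that would need Copy On Write treatment instead of touching any Page Table.

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative flag bits for this sketch. */
#define VM_SHARED_MODEL   0x08UL
#define VM_MAYWRITE_MODEL 0x20UL

/* Simplified stand-in for a memory region descriptor. */
struct region_model {
    unsigned long vm_start, vm_end, vm_flags;
    struct region_model *vm_next;
};

/* A region needs Copy On Write handling when it is private
 * (shared flag off) and may be written (may-write flag on). */
int region_needs_cow(unsigned long vm_flags)
{
    return (vm_flags & (VM_SHARED_MODEL | VM_MAYWRITE_MODEL))
           == VM_MAYWRITE_MODEL;
}

void free_regions(struct region_model *head)
{
    while (head != NULL) {
        struct region_model *next = head->vm_next;
        free(head);
        head = next;
    }
}

/* Duplicate the parent's region list in order. On success, *cow_regions
 * holds the number of copied regions that need Copy On Write handling.
 * On allocation failure, returns NULL and sets *cow_regions to -1. */
struct region_model *dup_regions(const struct region_model *parent,
                                 int *cow_regions)
{
    struct region_model *head = NULL;
    struct region_model **tail = &head;   /* where the next copy is linked */

    *cow_regions = 0;
    for (; parent != NULL; parent = parent->vm_next) {
        struct region_model *copy = malloc(sizeof(*copy));
        if (copy == NULL) {
            free_regions(head);
            *cow_regions = -1;
            return NULL;
        }
        *copy = *parent;                  /* duplicate the descriptor */
        copy->vm_next = NULL;
        *tail = copy;                     /* append to the child's list */
        tail = &copy->vm_next;

        if (region_needs_cow(copy->vm_flags))
            (*cow_regions)++;
    }
    return head;
}
```

Note that a shared, writable region fails the test: writes to it are meant to be visible to every process mapping it, so its pages are never duplicated.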
Finally, copy_mm( ) invokes copy_segments( ), which initializes the architecture-dependent portion of the child’s memory descriptor. Essentially, if the parent has a custom LDT, a copy of it is also assigned to the child.
When a process terminates, the kernel invokes the exit_mm( ) function to release the address space owned by that process:
    mm_release();
    if (tsk->mm) {
        atomic_inc(&tsk->mm->mm_count);
        mm = tsk->mm;
        tsk->mm = NULL;
        enter_lazy_tlb(mm, current, smp_processor_id());
        mmput(mm);
    }
The mm_release( ) function wakes up any process sleeping in the tsk->vfork_done completion (see Section 5.3.8). Typically, the corresponding wait queue is nonempty only if the exiting process was created by means of the vfork( ) system call (see Section 3.4.1). The processor is also put in lazy TLB mode (see Chapter 2).
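The interplay of the two counters in the exit_mm( ) excerpt can be modeled in User Mode. The sketch below rests on an assumption about mmput( ) that the excerpt does not show: that it decrements mm_users and, when the counter reaches zero, tears down the memory regions and drops one reference on mm_count. All types and function names are invented for the illustration, and two integer fields record what a real kernel would actually release.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a memory descriptor. */
struct mm_model {
    int mm_users;           /* tasks using the User Mode address space */
    int mm_count;           /* references to the descriptor itself */
    int regions_torn_down;  /* set when the last user has gone */
    int descriptor_freed;   /* set when the last reference has gone */
};

struct task_model {
    struct mm_model *mm;
};

/* Assumed behavior: drop one reference to the descriptor. */
void mmdrop_model(struct mm_model *mm)
{
    if (--mm->mm_count == 0)
        mm->descriptor_freed = 1;
}

/* Assumed behavior: drop one user; the last user releases the
 * regions and the reference held on behalf of all users. */
void mmput_model(struct mm_model *mm)
{
    if (--mm->mm_users == 0) {
        mm->regions_torn_down = 1;
        mmdrop_model(mm);
    }
}

/* Follows the ordering of the excerpt: take an extra reference so the
 * descriptor survives while the CPU is in lazy TLB mode, detach it
 * from the task, then drop the task's use of the address space. */
void exit_mm_model(struct task_model *tsk)
{
    struct mm_model *mm = tsk->mm;

    if (mm == NULL)
        return;             /* nothing to release */
    mm->mm_count++;
    tsk->mm = NULL;
    mmput_model(mm);
}
```

In this model, the descriptor of a process that was the only user of its address space is still allocated when exit_mm_model( ) returns; it goes away only when the extra reference is dropped later.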