Here is a solution to these issues: memory mapping via the powerful mmap(2) system call, which enables the developer to map content directly into the process virtual address space (VAS). This content can be file data, hardware device (adapter) memory regions, or just generic memory regions. In this chapter, we shall focus only on using mmap(2) to map a regular file's content into the process VAS. Before getting into how mmap(2) solves the memory wastage issue we just discussed, we first need to understand more about using the mmap(2) system call itself.
The signature of the mmap(2) system call is shown here:
#include <sys/mman.h>
void *mmap(void *addr, size_t length, int prot, int flags,
           int fd, off_t offset);
We want to map a given region of a file, from a given offset and for length bytes into our process VAS; a simplistic view of what we want to achieve is depicted in this diagram:
To achieve this file mapping to the process VAS, we use the mmap(2) system call. Glancing at its signature, it's quite obvious what we need to do first: open the file to be mapped via open(2) (in the appropriate mode: read-only or read-write, depending on what you want to do), thereby obtaining a file descriptor, and pass this descriptor as the fifth parameter to mmap(2). The file region to be mapped into the process VAS is specified via the sixth and second parameters respectively: the file offset at which the mapping should begin and the length (in bytes).
The first parameter, addr, is a hint to the kernel as to where in the process VAS the mapping should be created; the recommendation is to pass 0 (NULL) here, allowing the OS to decide the location of the new mapping. This is the correct, portable way to use mmap(2); however, some applications (and, yes, some malicious security hacks too!) use this parameter to try to predict where the mapping will occur. In any case, the actual (virtual) address where the mapping is created within the process VAS is the return value from mmap(2). On failure, mmap(2) returns MAP_FAILED (that is, (void *) -1), not NULL, with errno set; this must be checked for.
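The open-then-map sequence described above can be sketched as follows. This is a minimal illustration, not code from this chapter: count_byte_in_file() is a hypothetical helper name, and the MAP_PRIVATE value passed for the fourth (flags) parameter, as well as the munmap(2) cleanup call, are assumptions made only so that the example is complete.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * count_byte_in_file(): hypothetical helper (not from the text).
 * Maps the whole of 'path' read-only and counts the bytes equal to 'ch'.
 * Returns the count, or -1 on any error. An empty file is treated as an
 * error here, because mmap(2) rejects a zero-length mapping.
 */
long count_byte_in_file(const char *path, unsigned char ch)
{
    int fd = open(path, O_RDONLY);      /* read-only open matches PROT_READ */
    if (fd == -1)
        return -1;

    struct stat sb;
    if (fstat(fd, &sb) == -1 || sb.st_size == 0) {
        close(fd);
        return -1;
    }
    size_t length = (size_t)sb.st_size;

    /* addr = NULL: let the kernel pick the location; offset 0: start of file.
     * MAP_PRIVATE is an assumption for this sketch (flags are not yet covered). */
    void *addr = mmap(NULL, length, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                          /* the mapping outlives the descriptor */
    if (addr == MAP_FAILED)             /* failure is MAP_FAILED, not NULL */
        return -1;

    const unsigned char *p = addr;      /* file content is now plain memory */
    long count = 0;
    for (size_t i = 0; i < length; i++)
        if (p[i] == ch)
            count++;

    munmap(addr, length);               /* tear the mapping down when done */
    return count;
}
```

A caller might, for example, invoke count_byte_in_file("/etc/hostname", '\n') to count the lines in that file; note that no read(2) call or user-space buffer is involved.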
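Another point to note is that most mappings (and always file mappings) are performed to page granularity, that is, in multiples of the page size; thus, the return address is page-aligned. For the same reason, the offset parameter must itself be a multiple of the page size, which can be queried at runtime via sysconf(_SC_PAGESIZE).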
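One practical consequence of page granularity is that mapping a region that begins at an arbitrary file offset takes a little arithmetic, because the offset handed to mmap(2) has to be a multiple of the page size. The sketch below shows one way to do this; the helper names page_align_down(), page_round_up(), and map_file_range() are hypothetical, and MAP_PRIVATE is again an assumed value for the flags parameter.

```c
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Round 'offset' down to the start of the page that contains it. */
off_t page_align_down(off_t offset, long page_size)
{
    return offset - (offset % page_size);
}

/* Round 'length' up to a whole number of pages. */
size_t page_round_up(size_t length, long page_size)
{
    size_t ps = (size_t)page_size;
    return ((length + ps - 1) / ps) * ps;
}

/*
 * map_file_range(): hypothetical helper (not from the text).
 * Maps 'length' bytes of 'fd' starting at an arbitrary 'offset' by moving
 * the mapping's start back to the enclosing page boundary.
 * Returns a pointer to the byte at 'offset', or NULL on error.
 * '*base' and '*maplen' receive the values to pass to munmap(2) later.
 */
void *map_file_range(int fd, off_t offset, size_t length,
                     void **base, size_t *maplen)
{
    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0 || offset < 0 || length == 0)
        return NULL;

    off_t aligned = page_align_down(offset, page_size);
    size_t delta = (size_t)(offset - aligned);   /* bytes skipped in page */

    void *addr = mmap(NULL, length + delta, PROT_READ, MAP_PRIVATE,
                      fd, aligned);
    if (addr == MAP_FAILED)
        return NULL;

    *base = addr;                 /* page-aligned start of the mapping */
    *maplen = length + delta;
    return (char *)addr + delta;  /* the byte the caller actually asked for */
}
```

The caller reads through the returned pointer but must unmap using the page-aligned base and the adjusted length, which is why both are handed back.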
The third parameter to mmap(2), prot, is an integer bitmask that specifies the memory protections of the given region (recall we have already come across memory protections in Chapter 4, Dynamic Memory Allocation, in the Memory protection section). It can either be just PROT_NONE (implying no permissions) or the bitwise OR of the remaining bits; this table enumerates the bits and their meaning:
Protection bit | Meaning |
PROT_NONE | No access allowed on the page(s) |
PROT_READ | Reads allowed on the page(s) |
PROT_WRITE | Writes allowed on the page(s) |
PROT_EXEC | Execute access allowed on the page(s) |
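To make the bitmask nature of prot concrete, here is a small sketch that decodes a prot value into the familiar rwx notation (similar in spirit to the permissions column of /proc/PID/maps). The helper name prot_to_string() is hypothetical and is used only for illustration.

```c
#include <sys/mman.h>

/*
 * prot_to_string(): hypothetical helper (not from the text).
 * Writes a 3-character "rwx"-style string plus a terminating NUL
 * into 'out', testing each protection bit of 'prot' in turn.
 */
void prot_to_string(int prot, char out[4])
{
    out[0] = (prot & PROT_READ)  ? 'r' : '-';
    out[1] = (prot & PROT_WRITE) ? 'w' : '-';
    out[2] = (prot & PROT_EXEC)  ? 'x' : '-';
    out[3] = '\0';
}
```

Passing PROT_READ | PROT_WRITE yields the string rw-, while PROT_NONE, having none of the bits set, yields ---.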
The page protections must be compatible with the mode in which the file was opened via open(2): the file must be open for reading in all cases, and a writable shared mapping additionally requires that it was opened read-write (O_RDWR); otherwise, mmap(2) fails with EACCES. Also note that, on some hardware architectures (i386, for example), writable memory implies readable memory (that is, PROT_WRITE => PROT_READ). Do not rely on this; explicitly specify whether the mapped pages are readable or not (the same holds true for executable pages too: it must be specified, the text segment being the canonical example). Why would you use PROT_NONE? A guard page is one realistic example (recall the Stack guards section from Chapter 14, Multithreading with Pthreads Part I - Essentials).
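The relationship between the open(2) mode and the requested protections can be checked with a short experiment. The following is a sketch under stated assumptions: try_shared_map() is a hypothetical helper, and it uses the MAP_SHARED value for the flags parameter, since that is the case in which a PROT_WRITE request is checked against the descriptor's open mode.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * try_shared_map(): hypothetical helper (not from the text).
 * Opens 'path' with 'oflags', attempts a MAP_SHARED mapping of its first
 * 'length' bytes with protections 'prot', then unmaps it again.
 * Returns 0 on success, or the errno value of the call that failed.
 */
int try_shared_map(const char *path, int oflags, int prot, size_t length)
{
    int fd = open(path, oflags);
    if (fd == -1)
        return errno;

    int err = 0;
    void *addr = mmap(NULL, length, prot, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED)
        err = errno;              /* save before close(2) can change errno */
    else
        munmap(addr, length);

    close(fd);
    return err;
}
```

On Linux, one would expect a call with O_RDONLY and PROT_READ | PROT_WRITE to return EACCES, whereas the same protections with O_RDWR succeed; a file opened O_WRONLY cannot be mapped at all, as a file mapping requires the descriptor to be open for reading.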