A swap-in must take place when a process attempts to address a page within its address space that has been swapped out to disk. The Page Fault exception handler triggers a swap-in operation when all of the following conditions hold (see Section 8.4.2):
The page including the address that caused the exception is a valid one—that is, it belongs to a memory region of the current process.
The page is not present in memory; that is, the Present flag in the Page Table entry is cleared.
The Page Table entry associated with the page is not null, which means it contains a swapped-out page identifier.
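The three conditions can be read as a single predicate. The following is a minimal user-space sketch, not kernel code: the `struct region` type, the helper names, and the simplified 32-bit entry in which bit 0 plays the role of the Present flag are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified Page Table entry: bit 0 is the Present flag. */
typedef uint32_t toy_pte_t;
#define TOY_PTE_PRESENT 0x1u

/* Hypothetical stand-in for a memory region covering [start, end). */
struct region {
    unsigned long start;
    unsigned long end;
};

/* Condition 1: the faulting address belongs to a region of the process. */
bool address_in_region(const struct region *r, unsigned long addr)
{
    return r != NULL && addr >= r->start && addr < r->end;
}

/* True only when all three swap-in conditions hold at once. */
bool needs_swap_in(const struct region *r, unsigned long addr, toy_pte_t pte)
{
    bool valid = address_in_region(r, addr);            /* condition 1 */
    bool not_present = (pte & TOY_PTE_PRESENT) == 0;    /* condition 2 */
    bool non_null = pte != 0;                           /* condition 3 */

    return valid && not_present && non_null;
}
```

In this model, an entry that is entirely zero describes a page that was never mapped, so it fails the third condition even though its Present bit is cleared.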
As described in Section 8.4.3, the handle_pte_fault( ) function, invoked by the do_page_fault( ) exception handler, checks whether the Page Table entry is non-null. If so, it invokes the do_swap_page( ) function to swap in the required page.
The do_swap_page( ) function acts on the following parameters:
mm
Memory descriptor address of the process that caused the Page Fault exception
vma
Memory region descriptor address of the region that includes address
address
Linear address that causes the exception
page_table
Address of the Page Table entry that maps address
orig_pte
Content of the Page Table entry that maps address
write_access
Flag denoting whether the attempted access was a read or a write
Contrary to other functions, do_swap_page( ) never returns 0. It returns 1 if the page is already in the swap cache (minor fault), 2 if the page was read from the swap area (major fault), and -1 if an error occurred while performing the swap-in. It essentially executes the following steps:
Releases the page_table_lock spin lock of the memory descriptor (it was acquired by the caller function handle_pte_fault( )).
Gets the swapped-out page identifier from orig_pte.
Invokes lookup_swap_cache( ) to check whether the swap cache already contains a page corresponding to the swapped-out page identifier; if the page is already in the swap cache, it jumps to Step 6 (the step that marks the page as accessed and locks it).
Invokes the swapin_readahead( ) function to read from the swap area a group of at most 2^n pages, including the requested one. The value n is stored in the page_cluster variable, and is usually equal to 3.[113] Each page is read by invoking the read_swap_cache_async( ) function.
Invokes read_swap_cache_async( ) once more to swap in precisely the page accessed by the process that caused the Page Fault. This step might appear redundant, but it isn't. The swapin_readahead( ) function might fail to read the requested page, for instance because page_cluster is set to 0 or because the function tried to read a group of pages that includes a defective page slot (SWAP_MAP_BAD). On the other hand, if swapin_readahead( ) succeeded, this invocation of read_swap_cache_async( ) terminates quickly because it finds the page in the swap cache.
If, despite all efforts, the requested page was not added to the swap cache, another kernel control path might have already swapped in the requested page on behalf of a clone of this process. This case is checked by temporarily acquiring the page_table_lock spin lock and comparing the entry to which page_table points with orig_pte. If they differ, the page has already been swapped in by another kernel control path, so the function returns 1 (minor fault); otherwise, it returns -1 (failure).
At this point, we know that the page is in the swap cache. Invokes mark_page_accessed( ) (see the later Section 16.7.2) and locks the page.
Acquires the page_table_lock spin lock.
Checks whether another kernel control path has swapped in the requested page on behalf of a clone of this process. If so, releases the page_table_lock spin lock, unlocks the page, and returns 1 (minor fault).
Invokes swap_free( ) to decrement the usage counter of the page slot corresponding to entry, the swapped-out page identifier obtained from orig_pte.
Checks whether the swap areas are at least 50 percent full (nr_swap_pages, the number of free page slots, is smaller than half of total_swap_pages). If so, checks whether the page is owned only by the process that caused the fault (or one of its clones); if this is the case, removes the page from the swap cache.
Increments the rss field of the process's memory descriptor.
Unlocks the page.
Updates the Page Table entry so the process can find the page. The function accomplishes this by writing the physical address of the requested page and the protection bits found in the vm_page_prot field of the memory region into the Page Table entry addressed by page_table. Moreover, if the access that caused the fault was a write and the faulting process is the unique owner of the page, the function also sets the Dirty flag and the Read/Write flag to prevent a useless Copy on Write fault.
Releases the mm->page_table_lock spin lock and returns 1 (minor fault) or 2 (major fault).
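The control flow of the steps above, together with the three return values, can be condensed into a small user-space model. This is only a sketch under stated assumptions, not the kernel implementation: `struct toy_fault`, `toy_do_swap_page()`, and the `SWAPIN_*` names are hypothetical, the flag values follow the 80x86 Page Table entry layout, and spin locks, page locking, read-ahead, swap_free( ), and the rss counter are deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return values described in the text. */
enum { SWAPIN_ERROR = -1, SWAPIN_MINOR = 1, SWAPIN_MAJOR = 2 };

/* 80x86 Page Table entry flags used by this model. */
#define PTE_RW    0x002u
#define PTE_DIRTY 0x040u

/* Hypothetical toy state standing in for the kernel data structures. */
struct toy_fault {
    uint32_t *page_table;  /* address of the live Page Table entry */
    uint32_t orig_pte;     /* entry content seen when the fault occurred */
    uint32_t frame_addr;   /* page-aligned physical address of the page */
    uint32_t prot;         /* protection bits, as taken from vm_page_prot */
    bool in_swap_cache;    /* would lookup_swap_cache( ) find the page? */
    bool read_ok;          /* would read_swap_cache_async( ) succeed? */
    bool write_access;     /* was the faulting access a write? */
    bool sole_owner;       /* is the faulting process the unique owner? */
};

int toy_do_swap_page(struct toy_fault *f)
{
    int ret = SWAPIN_MINOR;

    if (!f->in_swap_cache) {
        if (!f->read_ok) {
            /* Read failed: if the live entry changed, another control
             * path already swapped the page in; otherwise report failure. */
            return (*f->page_table != f->orig_pte) ? SWAPIN_MINOR
                                                   : SWAPIN_ERROR;
        }
        f->in_swap_cache = true;
        ret = SWAPIN_MAJOR;          /* the page came from the swap area */
    }

    /* Recheck for a concurrent swap-in before touching the entry. */
    if (*f->page_table != f->orig_pte)
        return SWAPIN_MINOR;

    /* Build the new entry: physical address plus protection bits. */
    uint32_t pte = f->frame_addr | f->prot;
    if (f->write_access && f->sole_owner)
        pte |= PTE_RW | PTE_DIRTY;   /* avoids a later Copy on Write fault */
    *f->page_table = pte;

    return ret;
}
```

Note that the model never returns 0, which matches the description of do_swap_page( ): every path ends in a minor fault, a major fault, or an error.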
[113] The system administrator may tune this value by writing into the /proc/sys/vm/page-cluster file. Swap-in read-ahead can be disabled by setting page_cluster to 0.
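To make the effect of page_cluster concrete, the sketch below computes which page slots a read-ahead group of 2^n slots would cover. It is an illustration only: `struct swap_window`, `readahead_window()`, and the `max_slots` parameter are hypothetical names, and the choice to align the group on a 2^n boundary is an assumption of this sketch rather than something stated above. Free, reserved, or defective slots, which would shorten the group in practice, are ignored.

```c
/* Hypothetical description of a read-ahead group of page slots. */
struct swap_window {
    unsigned long first;  /* index of the first slot in the group */
    unsigned long count;  /* number of slots in the group */
};

/*
 * Returns the group of at most 2^page_cluster slots that contains
 * `offset`, clipped so it never extends past `max_slots`.
 * Assumption: the group starts on a multiple of 2^page_cluster.
 */
struct swap_window readahead_window(unsigned long offset,
                                    unsigned int page_cluster,
                                    unsigned long max_slots)
{
    struct swap_window w;

    /* Round the offset down to a multiple of 2^page_cluster. */
    w.first = (offset >> page_cluster) << page_cluster;
    w.count = 1UL << page_cluster;

    if (w.first >= max_slots)
        w.count = 0;                        /* nothing valid to read */
    else if (w.first + w.count > max_slots)
        w.count = max_slots - w.first;      /* clip at the end of the area */

    return w;
}
```

With the usual value of 3, a fault on slot 13 yields the group of slots 8 through 15; with page_cluster set to 0 the group shrinks to the single requested slot, so no neighboring pages are read.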