Table of Contents for Understanding the Linux Kernel, 2nd Edition
by Marco Cesati, Daniel P. Bovet
Preface
The Audience for This Book
Organization of the Material
Level of Description
Overview of the Book
Background Information
Conventions in This Book
How to Contact Us
Acknowledgments
1. Introduction
Linux Versus Other Unix-Like Kernels
Hardware Dependency
Linux Versions
Basic Operating System Concepts
Multiuser Systems
Users and Groups
Processes
Kernel Architecture
An Overview of the Unix Filesystem
Files
Hard and Soft Links
File Types
File Descriptor and Inode
Access Rights and File Mode
File-Handling System Calls
Opening a file
Accessing an opened file
Closing a file
Renaming and deleting a file
An Overview of Unix Kernels
The Process/Kernel Model
Process Implementation
Reentrant Kernels
Process Address Space
Synchronization and Critical Regions
Nonpreemptive kernels
Interrupt disabling
Semaphores
Spin locks
Avoiding deadlocks
Signals and Interprocess Communication
Process Management
Zombie processes
Process groups and login sessions
Memory Management
Virtual memory
Random access memory usage
Kernel Memory Allocator
Process virtual address space handling
Swapping and caching
Device Drivers
2. Memory Addressing
Memory Addresses
Segmentation in Hardware
Segmentation Registers
Segment Descriptors
Fast Access to Segment Descriptors
Segmentation Unit
Segmentation in Linux
Paging in Hardware
Regular Paging
Extended Paging
Hardware Protection Scheme
An Example of Regular Paging
Three-Level Paging
The Physical Address Extension (PAE) Paging Mechanism
Hardware Cache
Translation Lookaside Buffers (TLB)
Paging in Linux
The Linear Address Fields
Page Table Handling
Reserved Page Frames
Process Page Tables
Kernel Page Tables
Provisional kernel Page Tables
Final kernel Page Table when RAM size is less than 896 MB
Final kernel Page Table when RAM size is between 896 MB and 4096 MB
Final kernel Page Table when RAM size is more than 4096 MB
Fix-Mapped Linear Addresses
Handling the Hardware Cache and the TLB
Handling the hardware cache
Handling the TLB
3. Processes
Processes, Lightweight Processes, and Threads
Process Descriptor
Process State
Identifying a Process
Process descriptors handling
The current macro
The process list
Doubly linked lists
The list of TASK_RUNNING processes
The pidhash table and chained lists
Parenthood Relationships Among Processes
How Processes Are Organized
Wait queues
Handling wait queues
Process Resource Limits
Process Switch
Hardware Context
Task State Segment
The thread field
Performing the Process Switch
Saving the FPU, MMX, and XMM Registers
Creating Processes
The clone( ), fork( ), and vfork( ) System Calls
Kernel Threads
Creating a kernel thread
Process 0
Process 1
Other kernel threads
Destroying Processes
Process Termination
Process Removal
4. Interrupts and Exceptions
The Role of Interrupt Signals
Interrupts and Exceptions
IRQs and Interrupts
The Advanced Programmable Interrupt Controller (APIC)
Exceptions
Interrupt Descriptor Table
Hardware Handling of Interrupts and Exceptions
Nested Execution of Exception and Interrupt Handlers
Initializing the Interrupt Descriptor Table
Interrupt, Trap, and System Gates
Preliminary Initialization of the IDT
Exception Handling
Saving the Registers for the Exception Handler
Entering and Leaving the Exception Handler
Interrupt Handling
I/O Interrupt Handling
Interrupt vectors
IRQ data structures
IRQ distribution in multiprocessor systems
Saving the registers for the interrupt handler
The do_IRQ( ) function
Reviving a lost interrupt
Interrupt service routines
Dynamic allocation of IRQ lines
Interprocessor Interrupt Handling
Softirqs, Tasklets, and Bottom Halves
Softirqs
The softirq kernel threads
Tasklets
Bottom Halves
Extending a bottom half
Returning from Interrupts and Exceptions
The ret_from_exception( ) Function
The ret_from_intr( ) Function
The ret_from_sys_call( ) Function
The ret_from_fork( ) Function
5. Kernel Synchronization
Kernel Control Paths
When Synchronization Is Not Necessary
Synchronization Primitives
Atomic Operations
Memory Barriers
Spin Locks
Read/Write Spin Locks
Getting and releasing a lock for reading
Getting and releasing a lock for writing
The Big Reader Lock
Semaphores
Getting and releasing semaphores
Read/Write Semaphores
Completions
Local Interrupt Disabling
Global Interrupt Disabling
Disabling Deferrable Functions
Synchronizing Accesses to Kernel Data Structures
Choosing Among Spin Locks, Semaphores, and Interrupt Disabling
Protecting a data structure accessed by exceptions
Protecting a data structure accessed by interrupts
Protecting a data structure accessed by deferrable functions
Protecting a data structure accessed by exceptions and interrupts
Protecting a data structure accessed by exceptions and deferrable functions
Protecting a data structure accessed by interrupts and deferrable functions
Protecting a data structure accessed by exceptions, interrupts, and deferrable functions
Examples of Race Condition Prevention
Reference Counters
The Global Kernel Lock
Memory Descriptor Read/Write Semaphore
Slab Cache List Semaphore
Inode Semaphore
6. Timing Measurements
Hardware Clocks
Real Time Clock
Time Stamp Counter
Programmable Interval Timer
CPU Local Timers
The Linux Timekeeping Architecture
Timekeeping Architecture in Uniprocessor Systems
PIT’s interrupt service routine
The TIMER_BH bottom half
Timekeeping Architecture in Multiprocessor Systems
Initialization of the timekeeping architecture
The local timer interrupt handler
CPU’s Time Sharing
Updating the Time and Date
Updating System Statistics
Checking the Current Process CPU Resource Limit
Keeping Track of System Load
Profiling the Kernel Code
Checking the NMI Watchdogs
Software Timers
Dynamic Timers
Dynamic timers and race conditions
Dynamic timers handling
An Application of Dynamic Timers
System Calls Related to Timing Measurements
The time( ), ftime( ), and gettimeofday( ) System Calls
The adjtimex( ) System Call
The setitimer( ) and alarm( ) System Calls
7. Memory Management
Page Frame Management
Page Descriptors
Memory Zones
Non-Uniform Memory Access (NUMA)
Initialization of the Memory Handling Data Structures
Requesting and Releasing Page Frames
Kernel Mappings of High-Memory Page Frames
Permanent kernel mappings
Temporary kernel mappings
The Buddy System Algorithm
Data structures
Allocating a block
Freeing a block
Memory Area Management
The Slab Allocator
Cache Descriptor
Slab Descriptor
General and Specific Caches
Interfacing the Slab Allocator with the Buddy System
Allocating a Slab to a Cache
Releasing a Slab from a Cache
Object Descriptor
Aligning Objects in Memory
Slab Coloring
Local Array of Objects in Multiprocessor Systems
Allocating an Object in a Cache
The uniprocessor case
The multiprocessor case
Releasing an Object from a Cache
The uniprocessor case
The multiprocessor case
General Purpose Objects
Noncontiguous Memory Area Management
Linear Addresses of Noncontiguous Memory Areas
Descriptors of Noncontiguous Memory Areas
Allocating a Noncontiguous Memory Area
Releasing a Noncontiguous Memory Area
8. Process Address Space
The Process’s Address Space
The Memory Descriptor
Memory Descriptor of Kernel Threads
Memory Regions
Memory Region Data Structures
Memory Region Access Rights
Memory Region Handling
Finding the closest region to a given address: find_vma( )
Finding a region that overlaps a given interval: find_vma_intersection( )
Finding a free interval: arch_get_unmapped_area( )
Inserting a region in the memory descriptor list: insert_vm_struct( )
Allocating a Linear Address Interval
Releasing a Linear Address Interval
First phase: scanning the memory regions
Second phase: updating the Page Tables
Page Fault Exception Handler
Handling a Faulty Address Outside the Address Space
Handling a Faulty Address Inside the Address Space
Demand Paging
Copy On Write
Handling Noncontiguous Memory Area Accesses
Creating and Deleting a Process Address Space
Creating a Process Address Space
Deleting a Process Address Space
Managing the Heap
9. System Calls
POSIX APIs and System Calls
System Call Handler and Service Routines
Initializing System Calls
The system_call( ) Function
Parameter Passing
Verifying the Parameters
Accessing the Process Address Space
Dynamic Address Checking: The Fixup Code
The exception tables
Generating the exception tables and the fixup code
Kernel Wrapper Routines
10. Signals
The Role of Signals
Actions Performed upon Delivering a Signal
Data Structures Associated with Signals
Operations on Signal Data Structures
Generating a Signal
The send_sig_info( ) and send_sig( ) Functions
The force_sig_info( ) and force_sig( ) Functions
Delivering a Signal
Ignoring the Signal
Executing the Default Action for the Signal
Catching the Signal
Setting up the frame
Evaluating the signal flags
Starting the signal handler
Terminating the signal handler
Reexecution of System Calls
System Calls Related to Signal Handling
The kill( ) System Call
Changing a Signal Action
Examining the Pending Blocked Signals
Modifying the Set of Blocked Signals
Suspending the Process
System Calls for Real-Time Signals
11. Process Scheduling
Scheduling Policy
Process Preemption
How Long Must a Quantum Last?
The Scheduling Algorithm
Data Structures Used by the Scheduler
Process descriptor
CPU’s data structures
The schedule( ) Function
Direct invocation
Lazy invocation
Actions performed by schedule( ) before a process switch
Actions performed by schedule( ) after a process switch
How good is a runnable process?
Scheduling on multiprocessor systems
Performance of the Scheduling Algorithm
The algorithm does not scale well
The predefined quantum is too large for high system loads
I/O-bound process boosting strategy is not optimal
Support for real-time applications is weak
System Calls Related to Scheduling
The nice( ) System Call
The getpriority( ) and setpriority( ) System Calls
System Calls Related to Real-Time Processes
The sched_getscheduler( ) and sched_setscheduler( ) system calls
The sched_getparam( ) and sched_setparam( ) system calls
The sched_yield( ) system call
The sched_get_priority_min( ) and sched_get_priority_max( ) system calls
The sched_rr_get_interval( ) system call
12. The Virtual Filesystem
The Role of the Virtual Filesystem (VFS)
The Common File Model
System Calls Handled by the VFS
VFS Data Structures
Superblock Objects
Inode Objects
File Objects
dentry Objects
The dentry Cache
Files Associated with a Process
Filesystem Types
Special Filesystems
Filesystem Type Registration
Filesystem Mounting
Mounting the Root Filesystem
Mounting a Generic Filesystem
Unmounting a Filesystem
Pathname Lookup
Standard Pathname Lookup
Parent Pathname Lookup
Lookup of Symbolic Links
Implementations of VFS System Calls
The open( ) System Call
The read( ) and write( ) System Calls
The close( ) System Call
File Locking
Linux File Locking
File-Locking Data Structures
FL_FLOCK Locks
FL_POSIX Locks
13. Managing I/O Devices
I/O Architecture
I/O Ports
Accessing I/O ports
I/O Interfaces
Custom I/O interfaces
General-purpose I/O interfaces
Device Controllers
Mapping addresses of I/O shared memory
Accessing the I/O shared memory
Direct Memory Access (DMA)
Putting DMA to work
Device Files
Old-Style Device Files
Devfs Device Files
VFS Handling of Device Files
Device Drivers
Levels of Kernel Support
Buffering Strategies of Device Drivers
Registering a Device Driver
Initializing a Device Driver
Monitoring I/O Operations
Polling mode
Interrupt mode
Block Device Drivers
Keeping Track of Block Device Drivers
Initializing a Block Device Driver
Sectors, Blocks, and Buffers
Buffer Heads
An Overview of Block Device Driver Architecture
Request descriptors
Request queue descriptors
Block device low-level driver descriptor
The ll_rw_block( ) Function
Scheduling the activation of the strategy routine
Extending the request queue
Low-Level Request Handling
Block and Page I/O Operations
Block I/O operations
Page I/O operations
Character Device Drivers
14. Disk Caches
The Page Cache
The address_space Object
Page Cache Data Structures
The page hash table
The lists of page descriptors in the address_space object
Page descriptor fields related to the page cache
Page Cache Handling Functions
The Buffer Cache
Buffer Head Data Structures
The list of unused buffer heads
Lists of buffer heads for cached buffers
The hash table of cached buffer heads
Buffer usage counter
Buffer Pages
Allocating buffer pages
The getblk( ) Function
Writing Dirty Buffers to Disk
The bdflush kernel thread
The kupdate kernel thread
The sync( ), fsync( ), and fdatasync( ) system calls
15. Accessing Files
Reading and Writing a File
Reading from a File
The readpage method for regular files
The readpage method for block device files
Read-Ahead of Files
The accessed page is locked (synchronous read-ahead)
The accessed page is unlocked (asynchronous read-ahead)
Writing to a File
The prepare_write and commit_write methods for regular files
The prepare_write and commit_write methods for block device files
Memory Mapping
Memory Mapping Data Structures
Creating a Memory Mapping
Destroying a Memory Mapping
Demand Paging for Memory Mapping
Flushing Dirty Memory Mapping Pages to Disk
Direct I/O Transfers
16. Swapping: Methods for Freeing Memory
What Is Swapping?
Which Kind of Page to Swap Out
How to Distribute Pages in the Swap Areas
How to Select the Page to Be Swapped Out
When to Perform Page Swap Out
Swap Area
Swap Area Descriptor
Swapped-Out Page Identifier
Activating and Deactivating a Swap Area
The sys_swapon( ) service routine
The sys_swapoff( ) service routine
The try_to_unuse( ) function
Allocating and Releasing a Page Slot
The scan_swap_map( ) function
The get_swap_page( ) function
The swap_free( ) function
The Swap Cache
Swap Cache Helper Functions
Transferring Swap Pages
The rw_swap_page( ) Function
The read_swap_cache_async( ) Function
The rw_swap_page_nolock( ) Function
Swapping Out Pages
The try_to_swap_out( ) Function
Swapping in Pages
The do_swap_page( ) Function
Reclaiming Page Frame
Outline of the Page Frame Reclaiming Algorithm
The Least Recently Used (LRU) Lists
Moving pages across the LRU lists
The try_to_free_pages( ) Function
The shrink_caches( ) Function
The shrink_cache( ) Function
Reclaiming Page Frames from the Dentry and Inode Caches
Reclaiming page frames from the dentry cache
Reclaiming page frames from the inode cache
The kswapd Kernel Thread
17. The Ext2 and Ext3 Filesystems
General Characteristics of Ext2
Ext2 Disk Data Structures
Superblock
Group Descriptor and Bitmap
Inode Table
How Various File Types Use Disk Blocks
Regular file
Directory
Symbolic link
Device file, pipe, and socket
Ext2 Memory Data Structures
The ext2_sb_info and ext2_inode_info Structures
Bitmap Caches
Creating the Ext2 Filesystem
Ext2 Methods
Ext2 Superblock Operations
Ext2 Inode Operations
Ext2 File Operations
Managing Ext2 Disk Space
Creating Inodes
Deleting Inodes
Data Blocks Addressing
File Holes
Allocating a Data Block
Releasing a Data Block
The Ext3 Filesystem
Journaling Filesystems
The Ext3 Journaling Filesystem
The Journaling Block Device Layer
Log records
Atomic operation handles
Transactions
How Journaling Works
18. Networking
Main Networking Data Structures
Network Architectures
Network Interface Cards
BSD Sockets
INET Sockets
The Destination Cache
Routing Data Structures
The Forwarding Information Base (FIB)
The routing cache
The neighbor cache
The Socket Buffer
System Calls Related to Networking
The socket( ) System Call
Socket initialization
Socket’s files
The bind( ) System Call
The connect( ) System Call
Writing Packets to a Socket
Transport layer: the udp_sendmsg( ) function
Transport and network layers: the ip_build_xmit( ) function
Data link layer: composing the hardware header
Data link layer: enqueueing the socket buffer for transmission
Sending Packets to the Network Card
Receiving Packets from the Network Card
19. Process Communication
Pipes
Using a Pipe
Pipe Data Structures
The pipefs special filesystem
Creating and Destroying a Pipe
Reading from a Pipe
Writing into a Pipe
FIFOs
Creating and Opening a FIFO
System V IPC
Using an IPC Resource
The ipc( ) System Call
IPC Semaphores
Undoable semaphore operations
The queue of pending requests
IPC Messages
IPC Shared Memory
Swapping out pages of IPC shared memory regions
Demand paging for IPC shared memory regions
20. Program Execution
Executable Files
Process Credentials and Capabilities
Process capabilities
Command-Line Arguments and Shell Environment
Libraries
Program Segments and Process Memory Regions
Execution Tracing
Executable Formats
Execution Domains
The exec Functions
A. System Startup
Prehistoric Age: The BIOS
Ancient Age: The Boot Loader
Booting Linux from Floppy Disk
Booting Linux from Hard Disk
Middle Ages: The setup( ) Function
Renaissance: The startup_32( ) Functions
Modern Age: The start_kernel( ) Function
B. Modules
To Be (a Module) or Not to Be?
Module Implementation
Module Usage Counter
Exporting Symbols
Module Dependency
Linking and Unlinking Modules
Linking Modules on Demand
The modprobe Program
The request_module( ) Function
C. Source Code Structure
21. Bibliography
Books on Unix Kernels
Books on the Linux Kernel
Books on PC Architecture and Technical Manuals on Intel Microprocessors
Other Online Documentation Sources
Index
Colophon
Understanding the Linux Kernel, 2nd Edition
Daniel P. Bovet
Marco Cesati
Editor: Andy Oram
Copyright © 2002 O'Reilly Media, Inc.