Chapter 2. Getting Started with the Kernel

In this chapter, we introduce some of the Basics of the Linux kernel: where to get its source, how to compile it, and how to install the new kernel. We then go over some kernel assumptions, differences between the kernel and user-space programs, and common methods used in the kernel.

The kernel has some intriguing differences over other beasts, but certainly nothing that cannot be tamed. Let's tackle it.

Obtaining the Kernel Source

The current Linux source code is always available in both a complete tarball and an incremental patch from the official home of the Linux kernel, http://www.kernel.org.

Unless you have a specific reason to work with an older version of the Linux source, you always want the latest code. The repository at kernel.org is the place to get it, along with additional patches from a number of leading kernel developers.

Installing the Kernel Source

The kernel tarball is distributed in both GNU zip (gzip) and bzip2 format. Bzip2 is the default and preferred format, as it generally compresses quite a bit better than gzip. The Linux kernel tarball in bzip2 format is named linux-x.y.z.tar.bz2, where x.y.z is the version of that particular release of the kernel source. After downloading the source, uncompressing and untarring it is simple. If your tarball is compressed with bzip2, run

$ tar xvjf linux-x.y.z.tar.bz2

If it is compressed with GNU zip, run

$ tar xvzf linux-x.y.z.tar.gz

This uncompresses and untars the source to the directory linux-x.y.z.

Using Patches

Throughout the Linux kernel community, patches are the lingua franca of communication. You will distribute your code changes in patches as well as receive code from others as patches. More relevant to the moment is the incremental patches that are provided to move from one version of the kernel source to another. Instead of downloading each large tarball of the kernel source, you can simply apply an incremental patch to go from one version to the next. This saves everyone bandwidth and you time. To apply an incremental patch, from inside your kernel source tree, simply run

$ patch –p1 < ../patch-x.y.z

Generally, a patch to a given version of the kernel is applied against the previous version.

Generating and applying patches is discussed in much more depth in later chapters.

The Kernel Source Tree

The kernel source tree is divided into a number of directories, most of which contain many more subdirectories. The directories in the root of the source tree, along with their descriptions, are listed in Table 2.1.

Table 2.1. Directories in the Root of the Kernel Source Tree

Directory

Description

arch

Architecture-specific source

crypto

Crypto API

Documentation

Kernel source documentation

drivers

Device drivers

fs

The VFS and the individual file systems

include

Kernel headers

init

Kernel boot and initialization

ipc

Interprocess communication code

kernel

Core subsystems, such as the scheduler

lib

Helper routines

mm

Memory management subsystem and the VM

net

Networking subsystem

scripts

Scripts used to build the kernel

security

Linux Security Module

sound

Sound subsystem

usr

Early user-space code (called initramfs)

A number of files in the root of the source tree deserve mention. The file COPYING is the kernel license (the GNU GPL v2). CREDITS is a listing of developers with a more than trivial amount of code in the kernel. MAINTAINERS lists the names of the individuals who maintain subsystems and drivers in the kernel. Finally, Makefile is the base kernel Makefile.

Building the Kernel

Building the kernel is easy. In fact, it is surprisingly easier than compiling and installing other system-level components, such as glibc. The 2.6 kernel series introduces a new configuration and build system, which makes the job even easier and is a welcome improvement over 2.4.

Because the Linux source code is available, it follows that you are able to configure and custom tailor it before compiling. Indeed, it is possible to compile support into your kernel for just the features and drivers you require. Configuring the kernel is a required step before building it. Because the kernel offers a myriad of features and supports tons of varied hardware, there is a lot to configure. Kernel configuration is controlled by configuration options, which are prefixed by CONFIG in the form CONFIG_FEATURE. For example, symmetrical multiprocessing (SMP) is controlled by the configuration option CONFIG_SMP. If this option is set, SMP is enabled; if unset, SMP is disabled. The configure options are used both to decide which files to build and to manipulate code via preprocessor directives.

Configuration options that control the build process are either Booleans or tristates. A Boolean option is either yes or no. Kernel features, such as CONFIG_PREEMPT, are usually Booleans. A tristate option is one of yes, no, or module. The module setting represents a configuration option that is set, but is to be compiled as a module (that is, a separate dynamically loadable object). In the case of tristates, a yes option explicitly means to compile the code into the main kernel image and not a module. Drivers are usually represented by tristates.

Configuration options can also be strings or integers. These options do not control the build process but instead specify values that kernel source can access as a preprocessor macro. For example, a configuration option can specify the size of a statically allocated array.

Vendor kernels, such as those provided by Novell and Red Hat, are precompiled as part of the distribution. Such kernels typically enable a good cross section of the needed kernel features and compile nearly all the drivers as modules. This provides for a great base kernel with support for a wide range of hardware as separate modules. Unfortunately, as a kernel hacker, you will have to compile your own kernels and learn what modules to include or not include on your own.

Thankfully, the kernel provides multiple tools to facilitate configuration. The simplest tool is a text-based command-line utility:

$ make config

This utility goes through each option, one by one, and asks the user to interactively select yes, no, or (for tristates) module. Because this takes a long time, unless you are paid by the hour, you should use an ncurses-based graphical utility:

$ make menuconfig

Or an X11-based graphical utility:

$ make xconfig

Or, even better, a gtk+-based graphical utility:

$ make gconfig

These three utilities divide the various configuration options into categories, such as “Processor type and features.” You can move through the categories, view the kernel options, and of course change their values.

The command

$ make defconfig

creates a configuration based on the defaults for your architecture. Although these defaults are somewhat arbitrary (on i386, they are rumored to be Linus's configuration!), they provide a good start if you have never configured the kernel before. To get off and running quickly, run this command and then go back and ensure that configuration options for your hardware are enabled.

The configuration options are stored in the root of the kernel source tree, in a file named .config. You may find it easier (as most of the kernel developers do) to just edit this file directly. It is quite easy to search for and change the value of the configuration options. After making changes to your configuration file, or when using an existing configuration file on a new kernel tree, you can validate and update the configuration:

$ make oldconfig

You should always run this before building a kernel, in fact. After the kernel configuration is set, you can build it:

$ make

Unlike kernels before 2.6, you no longer need to run make dep before building the kernel—the dependency tree is maintained automatically. You also do not need to specify a specific build type, such as bzImage, or build modules separately, as you did in old versions. The default Makefile rule will handle everything!

Minimizing Build Noise

A trick to minimize build noise, but still see warnings and errors, is to redirect the output from make(1):

$ make > ../some_other_file

If you do need to see the build output, you can read the file. Because the warnings and errors are output to standard error, however, you normally do not need to. In fact, I just do

$ make > /dev/null

which redirects all the worthless output to that big ominous sink of no return, /dev/null.

Spawning Multiple Build Jobs

The make(1) program provides a feature to split the build process into a number of jobs. Each of these jobs then runs separately and concurrently, significantly speeding up the build process on multiprocessing systems. It also improves processor utilization because the time to build a large source tree also includes some time spent in I/O wait (time where the process is idle waiting for an I/O request to complete).

By default, make(1) spawns only a single job. Makefiles all too often have their dependency information screwed up. With incorrect dependencies, multiple jobs can step on each other's toes, resulting in errors in the build process. The kernel's Makefiles, naturally, have no such coding mistakes. To build the kernel with multiple jobs, use

$ make -jn

where n is the number of jobs to spawn. Usual practice is to spawn one or two jobs per processor. For example, on a dual processor machine, one might do

$ make –j4

Using utilities such as the excellent distcc(1) or ccache(1) can also dramatically improve kernel build time.

Installing the Kernel

After the kernel is built, you need to install it. How it is installed is very architecture and boot loader dependent—consult the directions for your boot loader on where to copy the kernel image and how to set it up to boot. Always keep a known-safe kernel or two around in case your new kernel has problems!

As an example, on an x86 using grub, you would copy arch/i386/boot/bzImage to /boot, name it something like vmlinuz-version, and edit /boot/grub/grub.conf with a new entry for the new kernel. Systems using LILO to boot would instead edit /etc/lilo.conf and then rerun lilo(8).

Installing modules, thankfully, is automated and architecture-independent. As root, simply run

% make modules_install

to install all the compiled modules to their correct home in /lib.

The build process also creates the file System.map in the root of the kernel source tree. It contains a symbol lookup table, mapping kernel symbols to their start addresses. This is used during debugging to translate memory addresses to function and variable names.

A Beast of a Different Nature

The kernel has several differences compared to normal user-space applications that, although not making it necessarily harder to program than user-space, certainly provide unique challenges to kernel development.

These differences make the kernel a beast of a different nature. Some of the usual rules are bent; other rules are entirely new. Although some of the differences are obvious (we all know the kernel can do anything it wants), others are not so obvious. The most important of these differences are

  • The kernel does not have access to the C library.

  • The kernel is coded in GNU C.

  • The kernel lacks memory protection like user-space.

  • The kernel cannot easily use floating point.

  • The kernel has a small fixed-size stack.

  • Because the kernel has asynchronous interrupts, is preemptive, and supports SMP, synchronization and concurrency are major concerns within the kernel.

  • Portability is important.

Let's briefly look at each of these issues because all kernel development must keep them in mind.

No libc

Unlike a user-space application, the kernel is not linked against the standard C library (or any other library, for that matter). There are multiple reasons for this, including some chicken-and-the-egg situations, but the primary reason is speed and size. The full C library—or even a decent subset of it—is too large and too inefficient for the kernel.

Do not fret: Many of the usual libc functions have been implemented inside the kernel. For example, the common string manipulation functions are in lib/string.c. Just include <linux/string.h> and have at them.

Of the missing functions, the most familiar is printf(). The kernel does not have access to printf(), but it does have access to printk(). The printk() function copies the formatted string into the kernel log buffer, which is normally read by the syslog program. Usage is similar to printf():

printk("Hello world! A string: %s and an integer: %d
", a_string, an_integer);

One notable difference between printf() and printk() is that printk() allows you to specify a priority flag. This flag is used by syslogd(8) to decide where to display kernel messages. Here is an example of these priorities:

printk(KERN_ERR "this is an error!
");

We will use printk() throughout this book. Later chapters have more information on printk().

GNU C

Like any self-respecting Unix kernel, the Linux kernel is programmed in C. Perhaps surprisingly, the kernel is not programmed in strict ANSI C. Instead, where applicable, the kernel developers make use of various language extensions available in gcc (the GNU Compiler Collection, which contains the C compiler used to compile the kernel and most everything else written in C on a Linux system).

The kernel developers use both ISO C99[1] and GNU C extensions to the C language. These changes wed the Linux kernel to gcc, although recently other compilers, such as the Intel C compiler, have sufficiently supported enough gcc features that they too can compile the Linux kernel. The ISO C99 extensions that the kernel uses are nothing special and, because C99 is an official revision of the C language, are slowly cropping up in a lot of other code. The more interesting, and perhaps unfamiliar, deviations from standard ANSI C are those provided by GNU C. Let's look at some of the more interesting extensions that may show up in kernel code.

Inline Functions

GNU C supports inline functions. An inline function is, as its name suggests, inserted inline into each function call site. This eliminates the overhead of function invocation and return (register saving and restore), and allows for potentially more optimization because the compiler can optimize the caller and the called function together. As a downside (nothing in life is free), code size increases because the contents of the function are copied to all the callers, which increases memory consumption and instruction cache footprint. Kernel developers use inline functions for small time-critical functions. Making large functions inline, especially those that are used more than once or are not time critical, is frowned upon by the kernel developers.

An inline function is declared when the keywords static and inline are used as part of the function definition. For example:

static inline void dog(unsigned long tail_size)

The function declaration must precede any usage, or else the compiler cannot make the function inline. Common practice is to place inline functions in header files. Because they are marked static, an exported function is not created. If an inline function is used by only one file, it can instead be placed toward the top of just that file.

In the kernel, using inline functions is preferred over complicated macros for reasons of type safety.

Inline Assembly

The gcc C compiler enables the embedding of assembly instructions in otherwise normal C functions. This feature, of course, is used in only those parts of the kernel that are unique to a given system architecture.

The asm() compiler directive is used to inline assembly code.

The Linux kernel is programmed in a mixture of C and assembly, with assembly relegated to low-level architecture and fast path code. The vast majority of kernel code is programmed in straight C.

Branch Annotation

The gcc C compiler has a built-in directive that optimizes conditional branches as either very likely taken or very unlikely taken. The compiler uses the directive to appropriately optimize the branch. The kernel wraps the directive in very easy-to-use macros, likely() and unlikely().

For example, consider an if statement such as the following:

if (foo) {
        /* ... */
}

To mark this branch as very unlikely taken (that is, likely not taken):

/* we predict foo is nearly always zero ... */
if (unlikely(foo)) {
        /* ... */
}

Conversely, to mark a branch as very likely taken:

/* we predict foo is nearly always nonzero ... */
if (likely(foo)) {
        /* ... */
}

You should only use these directives when the branch direction is overwhelmingly a known priori or when you want to optimize a specific case at the cost of the other case. This is an important point: These directives result in a performance boost when the branch is correctly predicted, but a performance loss when the branch is mispredicted. A very common usage for unlikely() and likely() is error conditions. As one might expect, unlikely() finds much more use in the kernel because if statements tend to indicate a special case.

No Memory Protection

When a user-space application attempts an illegal memory access, the kernel can trap the error, send SIGSEGV, and kill the process. If the kernel attempts an illegal memory access, however, the results are less controlled. (After all, who is going to look after the kernel?) Memory violations in the kernel result in an oops, which is a major kernel error. It should go without saying that you must not illegally access memory, such as dereferencing a NULL pointer—but within the kernel, the stakes are much higher!

Additionally, kernel memory is not pageable. Therefore, every byte of memory you consume is one less byte of available physical memory. Keep that in mind next time you have to add one more feature to the kernel!

No (Easy) Use of Floating Point

When a user-space process uses floating-point instructions, the kernel manages the transition from integer to floating point mode. What the kernel has to do when using floating-point instructions varies by architecture, but the kernel normally catches a trap and does something in response.

Unlike user-space, the kernel does not have the luxury of seamless support for floating point because it cannot trap itself. Using floating point inside the kernel requires manually saving and restoring the floating point registers, among possible other chores. The short answer is: Don't do it; no floating point in the kernel.

Small, Fixed-Size Stack

User-space can get away with statically allocating tons of variables on the stack, including huge structures and many-element arrays. This behavior is legal because user-space has a large stack that can grow in size dynamically (developers of older, less intelligent operating systems—say, DOS—might recall a time when even user-space had a fixed-sized stack).

The kernel stack is neither large nor dynamic; it is small and fixed in size. The exact size of the kernel's stack varies by architecture. On x86, the stack size is configurable at compile-time and can be either 4 or 8KB. Historically, the kernel stack is two pages, which generally implies that it is 8KB on 32-bit architectures and 16KB on 64-bit architectures—this size is fixed and absolute. Each process receives its own stack.

The kernel stack is discussed in much greater detail in later chapters.

Synchronization and Concurrency

The kernel is susceptible to race conditions. Unlike a single-threaded user-space application, a number of properties of the kernel allow for concurrent access of shared resources and thus require synchronization to prevent races. Specifically,

  • Linux is a preemptive multi-tasking operating system. Processes are scheduled and rescheduled at the whim of the kernel's process scheduler. The kernel must synchronize between these tasks.

  • The Linux kernel supports multiprocessing. Therefore, without proper protection, kernel code executing on two or more processors can access the same resource.

  • Interrupts occur asynchronously with respect to the currently executing code. Therefore, without proper protection, an interrupt can occur in the midst of accessing a shared resource and the interrupt handler can then access the same resource.

  • The Linux kernel is preemptive. Therefore, without protection, kernel code can be preempted in favor of different code that then accesses the same resource.

Typical solutions to race conditions include spinlocks and semaphores.

Later chapters provide a thorough discussion of synchronization and concurrency.

Portability Is Important

Although user-space applications do not have to aim for portability, Linux is a portable operating system and should remain one. This means that architecture-independent C code must correctly compile and run on a wide range of systems, and that architecture-dependent code must be properly segregated in system-specific directories in the kernel source tree.

A handful of rules—such as remain endian neutral, be 64-bit clean, do not assume the word or page size, and so on—go a long way. Portability is discussed in extreme depth in a later chapter.

So Here We Are

The kernel is indeed a unique and inimitable beast: No memory protection, no tried-and-true libc, a small stack, a huge source tree. The Linux kernel plays by its own rules, running with the big boys and stopping just long enough to break the customs with which we are familiar. Despite this, however, the kernel is just a program. It is not very different from the usual, the accustomed, the status quo. Do not be afraid: Stand up to it, call it names, push it around.

Realizing that the kernel is not as daunting as first appearances might suggest is the first step on the road to having everything just make sense. To reach that utopia, however, you have to jump in, read the source, hack the source, and not be disheartened.

The introduction in the previous chapter and the basics in this chapter will—I hope—lay the foundation for the monument of knowledge we will construct throughout the rest of this book. In the following chapters, we will look at specific concepts of and subsystems in the kernel.



[1] ISO C99 is the latest major revision to the ISO C standard. C99 adds numerous enhancements to the previous major revision, ISO C90, including named structure initializers and a complex type. The latter of which you cannot use safely from within the kernel.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset