Appendix B. Kernel Random Number Generator

The Linux kernel implements a strong random number generator that is theoretically capable of generating true random numbers. The random number generator gathers environmental noise from device drivers into an entropy pool. This pool is accessible from both user processes and the kernel as a source of data that is not only random but also non-deterministic to an outside attacker. Such numbers are of use in various applications, most notably cryptography.

True random numbers differ from the pseudo-random numbers generated by functions such as those found in the C library. Pseudo-random numbers are created by a deterministic function. Although the function may generate a sequence of numbers that exhibit some properties of a true random number, they are only statistically random. Pseudo-random numbers are deterministic: Knowing one number in the sequence provides information about the rest of the sequence. In fact, the initial value of the sequence (known as the seed) usually determines the entire sequence. For applications that need truly random and non-deterministic numbers, such as cryptography, a pseudo-random number is usually unacceptable.

As opposed to a pseudo-random number, a true random is produced independently of its generating function. Further, knowing some value in a sequence of truly random numbers does not enable an external party to deduce future values from the generator because the generator is non-deterministic.

From physics, entropy is a measurement of disorder and randomness in a system. In thermodynamics, entropy is measured in energy per unit temperature (Joules/Kelvin). When Claude Shannon[1], the founder of information theory, looked for a term to represent randomness in information, the great mathematician John von Neumann[2] supposedly suggested he use the term entropy because no one really understand what that meant anyhow. Shannon agreed, and today the term is sometimes called Shannon entropy. In hindsight, some scientists find the dual use confusing, and prefer simply the term uncertainty when discussing information. Kernel hackers, on the other hand, think entropy sounds cool and encourage its use.

In discussions of random number generators, Shannon entropy is an important property. It is measured in bits per symbol; high entropy implies there is little useful information (but lots of random junk) in a sequence of characters. The kernel maintains an entropy pool that is fed data obtained from non-deterministic device events. Ideally, this pool is entirely random. To help keep track of the entropy in the pool, the kernel keeps a measurement of the uncertainty of the data in the pool. As the kernel feeds data into the pool, it estimates the amount of randomness in the data that it is adding. Conversely, the kernel keeps track of entropy as it is removed from the pool. This measurement is called the entropy estimate. Optionally, the kernel can refuse a request for a random number if the entropy estimate is zero.

The kernel random number generator was introduced in kernel version 1.3.30 and lives at drivers/char/random.c in the kernel source.

Design and Implementation

Computers are predictable devices. Indeed, it is hard to find randomness in a system whose behavior is entirely programmed. The environment of the machine, however, is full of noise that is accessible and non-deterministic. Such sources include the timing of various hardware devices and user interaction. For example, the time between key presses, the movement of the mouse, the timing—particularly the low-order bits of such timing—between certain interrupts, and the time taken to complete a block I/O request are all both non-deterministic and not measurable by an outside attacker. Randomness from these values is taken and fed into the entropy pool. The pool grows to become a random and unpredictable mixture of noise. As the values are added to the pool, an estimate of the randomness is calculated and a tally is kept. This allows the kernel to keep track of the entropy in the pool. Figure B.1 is a diagram of the flow of entropy into and out of the pool.

The flow of entropy into and out of the kernel entropy pool.

Figure B.1. The flow of entropy into and out of the kernel entropy pool.

The kernel provides a set of interfaces to allow access to the entropy pool, both from within the kernel and from user-space. When the interfaces are accessed, the kernel first takes the SHA hash of the pool. SHA (Secure Hash Algorithm) is a message digest algorithm developed by the National Security Agency (NSA) and made a U.S. federal standard by NIST (via FIPS 186). A message digest is an algorithm that takes a variable-sized input (small or large) and outputs a fixed-size hash value (typically 128 or 160 bits) that is a “digest” of the original input. From the outputted hash value, the input cannot be reconstructed. Further, trivial manipulations to the input (for example, changing a single character) result in a radically different hash value. Message digest algorithms have various uses, including data verification and fingerprinting. Other message digest algorithms include MD4 and MD5. The SHA hash, not the raw contents of the pool, is returned to the user; the contents of the entropy pool are never directly accessible. It is assumed impossible to derive any information about the state of the pool from the SHA hash. Therefore, knowing some values from the pool does not lend any knowledge to past or future values. Nonetheless, the kernel can use the entropy estimate to refuse to return data if the pool has zero entropy. As entropy is read from the pool, the entropy estimate is decreased in response to how much information is now known about the pool.

When the estimate reaches zero, the kernel can still return random numbers. Theoretically, however, an attacker is then capable of inferring future output given prior output. This would require that the attacker have nearly all the prior outputs from the entropy pool and that the attacker successfully perform cryptanalysis on SHA. Because SHA is believed to be secure, this possibility is not feasible. To high-security cryptography users who accept no risk, however, the entropy estimate ensures the strength of the random numbers. To the vast majority of users this extra assurance is not needed.

The Dilemma of System Startup

When the kernel first boots, it completes a series of actions that are almost entirely predictable. Consequently, an attacker is able to infer much about the state of the entropy pool at boot. Worse, each boot is largely similar to the next and the pool would initialize to largely the same contents on each boot. This reduces the accuracy of the entropy estimate, which has no way of knowing that the entropy contributed during the boot sequence is less predictable than entropy contributed at other times.

To offset this problem, most Linux systems save some information from the entropy pool across system shutdowns. They do this by saving the contents of the entropy pool on each shutdown. When the system boots, the data is read and fed into /dev/urandom. This effectively loads the previous contents of the pool into the current pool, without increasing the entropy estimate.

Therefore, an attacker cannot predict the state of the entropy pool without knowledge of both the current state of the system and the previous state of the system.

Interfaces to Input Entropy

The kernel exports a family of interfaces to facilitate feeding data into the entropy pool. They are called by the appropriate kernel subsystems or drivers. They are

void add_interrupt_randomness(int irq)
void add_keyboard_randomness(unsigned char scancode)
void add_mouse_randomness(__u32 mouse_data)

add_interrupt_randomness() is called by the interrupt system whenever an interrupt is received whose handler was registered with SA_SAMPLE_RANDOM. The parameter irq is the interrupt number. The random number generator uses the timing between interrupts as a source of noise. Note that not all devices are suitable for this; if the device generates interrupts deterministically (for example, the timer interrupt) or may be influenced by an outside attacker (for example, a network device) it should not feed the pool. An acceptable device is a hard disk, which generates interrupts at an unpredictable rate.

add_keyboard_randomness() uses the scancode and the timing between successive key presses to feed the entropy pool. Interestingly, the routine is smart enough to ignore autorepeat (when the user holds a key down) because both the scancode and timing interval would then be constant, contributing no entropy. The sole parameter is the scancode of the pressed key.

add_mouse_randomness() uses the mouse position as well as the timing between interrupts to feed the pool. The parameter mouse_data is the hardware-reported position of the mouse.

All three of these routines add the supplied data to the entropy pool, calculate an estimate of the entropy of the given data, and increment the entropy estimate by this amount.

All these exported interfaces use the internal function add_timer_randomness() to feed the pool. This function calculates the timing between successive events of the same time and adds the delay to the pool. For example, the timing between two successive disk interrupts is largely random—especially when measured very precisely. Often, the low-order bits are electrical noise. After this function feeds the pool, it calculates how much randomness was present in the data. It does this by calculating the first-, second-, and third-order deltas from the previous timing and first- and second-order deltas. The greatest of these deltas, rounded down to 12 bits, is used as the entropy estimate.

Interfaces to Output Entropy

The kernel exports one interface for obtaining random data from within the kernel:

void get_random_bytes(void *buf, int nbytes)

This function stores nbytes worth of random data in the buffer pointed to by buf. The function returns the values, whether or not the entropy estimate is zero. This is less of a concern to the kernel than user-space cryptography applications. The random data is suitable for a handful of kernel tasks, most notably in networking, where the random data seeds the initial TCP sequence number.

Kernel code can do the following to receive a word-size helping of randomness:

unsigned long rand;

get_random_bytes(&rand, sizeof (unsigned long));

For user-space processes, the kernel provides two character devices, /dev/random and /dev/urandom, for obtaining entropy. The first, /dev/random, is more suitable when very strong random numbers are desired, as in high-security cryptographic applications. It returns only up to the maximum number of bits of random data noted by the entropy estimate. When the entropy estimate reaches zero, /dev/random blocks and does not return the remaining data until the entropy estimate is sufficiently positive. Conversely, the device /dev/urandom does not have this feature and is generally just as secure. Both devices return numbers from the same pool.

Reading from either file is simple. Here is an example of a user-space function that an application can use to extract a word of entropy:

unsigned long get_random(void)
{
        unsigned long seed;
        int fd;

        fd = open("/dev/urandom", O_RDONLY);
        if (fd == -1) {
                perror("open");
                return 0;
        }
        if (read(fd, &seed, sizeof (seed)) < 0) {
                perror("read");
                seed = 0;
        }
        if (close(fd))
                perror("close");

        return seed;
}

Alternatively, you can easily read bytes bytes into the file file by using the dd(1) program:

dd if=/dev/urandom of=file count=1 bs=bytes


[1] Claude E. Shannon (April 30, 1916–February 24, 2001) was an engineer at Bell Labs whose most famous work, A Mathematical Theory of Communication, published in 1948, introduced the concept of information theory and Shannon entropy. Shannon also enjoyed riding his unicycle.

[2] John von Neumann (December 28, 1903–February 8, 1957) was a member of the Institute for Advanced Study at Princeton. In his life, he made numerous contributions to mathematics, economics, and computer science. Some of his most important contributions were game theory, Neumann algebras, and the von Neumann bottleneck.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset