Chapter 19. Random Numbers

The word random means different things to different programmers at different times. For most applications, the pseudo-random numbers provided by the C library are quite sufficient. Because they allow you to reproduce conditions if necessary (perhaps for debugging purposes), they are preferable to truly random numbers.

But certain applications, including cryptography, require truly random numbers for best results. The Linux kernel samples events from the unpredictable outside world to provide cryptographically strong random numbers.

Computers are predictable. Most of the tasks that we want computers to do are tasks in which predictability is the most important thing. Even when bugs appear in your program, you want them to appear predictably so that you can find them and squash them.

Pseudo-Random Numbers

Sometimes you want to provide the appearance of unpredictability, however. The C library contains functions for generating a well-respected series of pseudo-random numbers. These functions are easy to use and are the same on every Unix platform. Here is an example of how these functions are typically used:

#include <stdlib.h>
#include <time.h>...
    srand(time(NULL) + getpid());
    for (...; ...; ...) {
        do_something(rand());
}

It is common to seed the pseudo-random number generator with the current date, as returned by the time() function. time() returns the number of seconds since January 1, 1970, so the seed changes once per second and is therefore unique over a long time span (approximately 49,710 days on a 32-bit computer). If you want to keep a program from acting the same for two people who start it in the same second, add the current process ID to the time.

The numbers subsequently returned by the rand() function satisfy the mathematical property of random distribution but not of high entropy: For large enough samples, they are relatively well distributed within the space of possible 32-bit numbers, but it is possible to infer other numbers given one number. This means that these kinds of pseudo-random numbers are useful for almost every application that requests a random distribution of numbers. This includes games, Monte Carlo methods (here it is important to save the seed so that you or others can verify your results), and protocols that handle collision by inserting random delays.

Note that for debugging purposes, you may want to save the seed you used when you called srand() so that if a bug occurs that relies on data produced by the rand() function, you can use the same seed to produce the same stream of random numbers to reproduce the bug.

Cryptography and Random Numbers

The authors of this book are not cryptography experts. Writing cryptographic software is a particularly subtle pursuit, and anyone who attempts it without proper research will fail to write robust and secure cryptographic applications. This chapter has two, and only two, purposes:

  • To convince people who are not experts in cryptography that this is an area best left to experts.

  • To let experts in cryptography know that a particularly useful tool is available for their use.

If you are not an expert in cryptography but need to use it, we suggest Applied Cryptography [Schneier, 1996] as an excellent overview and introduction to the topic.

Cryptography is generally no different from other software in its predictability requirements; when you give a program the key to decrypt data, you want it to decrypt the data the same way every time. There is one exception: choosing a truly random key. No matter how sophisticated an encryption algorithm is, it is worthless if an attacker can guess what key was used to generate the data. For example, many encrypted messages include some sort of timestamp that tells approximately when they were created. If you then use the current time to seed a common pseudo-random number generator, it would not take long for the attacker to decrypt the data by simply using the time when the message was created to seed common pseudo-random number generators and try keys based on those numbers.

It does not work much better to ask a human to provide a key. In general, people pick keys that are hardly random. Their keys are generally related to natural language text, which is highly predictable in terms of information theory. Natural language is said to have low entropy; a truly random key has high entropy.

If every computer had a small radiation source, the unpredictable amount of time between the particles emitted by decaying atoms could be used to produce truly random numbers, numbers that could not be predicted based on any other information. No other known data would be sufficient to predict which numbers might have been created by the emission of radiation.

Since most computers are not equipped with such devices, Linux improvises. Ted Ts’o wrote code that examines timings of external events (mouse clicks, keyboard keypresses, and so on) and extracts information from them, storing it in an entropy pool. There are components of human (and other external) interaction with the computer that are essentially random. The code that fills the entropy pool attempts to distinguish the amount of entropy that is being added, which allows the programmer to determine the amount of entropy used to generate random information. Recently, many computers have started to provide hardware-based sources of cryptographically random data; on Linux, this random data is fed into the system entropy pool as needed so that all Linux programs can use the same interface regardless of the hardware they use.

Programmers who need random numbers based on unpredictable events can take random numbers from the entropy pool through one of two similar devices: /dev/random and /dev/urandom. The /dev/random device returns only as many bytes of random data as the device currently estimates are in the entropy pool. The /dev/urandom device does not try to offer any guarantees about the amount of entropy in the information it returns; it generates as much random data as you want, based on the entropy pool. Whenever either device is read, it subtracts the number of bytes read from the entropy count.

When you read data from either device, it does not simply return the data that is in the entropy pool. It returns data stirred by a one-way hash algorithm that does not reveal the state of the pool in its output.

  • Use neither /dev/random nor /dev/urandom for data you want to replicate. They are particularly useless sources of data for Monte Carlo methods; even 1,2... n-1,n would be better; it is at least a repeatable series.

  • If you need only a certain amount of entropy, but need more raw data than entropy, you can read a small amount of data from one of the random devices (depending on what quality you need guaranteed) and then extend it with a hash function such as MD5 or SHA.

The source code to the random driver, drivers/char/random.c, includes considerable documentation on the details. If you do intend to write cryptographic code that uses the data provided by one of the interfaces, we recommend that you read that documentation first.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset