RC4 is a stream cipher that was developed by Rivest and has been widely used because of its speed and simplicity. The algorithm was originally secret, but it was leaked to the Internet in 1994 and has since been extensively analyzed. In particular, certain statistical biases were found in the keystream it generated, especially in the initial bits. Therefore, often a version called RC4-drop[n] is used, in which the first bits are dropped before starting the keystream. However, this version is still not recommended for situations requiring high security.
To start the generation of the keystream for RC4, the user chooses a key, which is a binary string between 40 and 256 bits long. This is put into the Key Scheduling Algorithm. It starts with an array consisting of the numbers from 0 to 255, regarded as 8-bit bytes, and outputs a permutation of these entries, as follows:
This algorithm starts by initializing the entries in as for running from 0 through 255. Suppose the user-supplied key is (let’s say the key length is 40).
The algorithm starts with and . The value of is updated to . Then and are swapped, so now and .
We now move to . The value is updated to , so is swapped with itself, which means it is not changed here.
We now move to . The value is updated to , so is swapped with , yielding and .
We now move to . The value is updated to , so and are swapped, yielding and .
Let’s look at one more value of , namely, . The value is updated to (recall that became 2 earlier), and we obtain and .
This process continues through and yields an array of length 256 consisting of a permutation of the numbers from 0 through 255.
The array is entered into the Pseudorandom Generation Algorithm.
This algorithm runs as long as needed and each round outputs a number between 0 and 255, regarded as an 8-bit byte. This byte is XORed with the corresponding byte of the plaintext to yield the ciphertext.
Weaknesses. Generally, the keystream that is output by a stream cipher should be difficult to distinguish from a randomly generated bitstream. For example, the R Game (see Section 4.5) could be played, and the probability of winning should be negligibly larger than 1/2. For RC4, there are certain observable biases. The second byte in the output should be 0 with probability 1/256. However, Mantin and Shamir [Mantin-Shamir] showed that this byte is 0 with twice that probability. Moreover, they found that the probability that the first two bytes are simultaneously 0 is instead of the expected .
Biases have also been found in the state that is output by the Key Scheduling Algorithm. For example, the probability that is about 37% larger than the expected probability of , while the probability that is 26% less than expected.
Although any key length from 40 to 255 bits can be chosen, the use of small key sizes is not recommended because the algorithm can succumb to a brute force attack.