1 Classical Information Theory and Classical Error Correction

Markus Grassl

Max‐Planck‐Institut für die Physik des Lichts, Staudstraße 2, 91058 Erlangen, Germany

1.1 Introduction

Information theory establishes a framework for any kind of communication and information processing. It allows us to derive bounds on the complexity or cost of tasks such as storing information using the minimal amount of space or sending data over a noisy channel, and it provides the means to quantify information. So one may ask, "How much does someone know about what somebody else knows?" – a question that is important in a cryptographic context. Before studying the new aspects that quantum mechanics adds to information theory in later chapters, we will have a brief look at the basics of classical information theory in the next section.

While information theory provides an answer to the question of how fast one can send information over a noisy channel, it usually does not give a constructive solution to this task. This is the problem that error correction deals with. In Section 1.3, we give a short introduction to linear block codes, laying the groundwork for the discussion of error-correcting codes for quantum systems in Chapter 7.

1.2 Basics of Classical Information Theory

1.2.1 Abstract Communication System

The foundations of information theory were laid by Claude Shannon in his landmark paper "A Mathematical Theory of Communication" (1). In that paper, Shannon introduces the basic mathematical concepts for communication systems and proves two fundamental coding theorems. Here, we mainly follow his approach.

The first important observation of Shannon is that although the process of communication is intended to transfer a message with some meaning, the design of a communication system can abstract from any meaning:

The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. Frequently the messages have meaning; that is they refer to or are correlated according to some system with certain physical or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem. The significant aspect is that the actual message is one selected from a set of possible messages. The system must be designed to operate for each possible selection, not just the one which will actually be chosen since this is unknown at the time of design.

Additionally, we can, to some extent, abstract from the physical channel that is used to transmit the message. For this, we introduce a transmitter and a receiver that convert the messages into some physical signal and vice versa. The general layout of such a communication system is illustrated in Figure 1.1. Given a channel and an information source, the basic problem is to transmit the messages produced by the information source through the channel as efficiently and as reliably as possible. Efficiency means that we can send as much information as possible per use of the channel, and reliability means that, despite the disturbance due to the noise added by the channel, the original message is (with high probability) reproduced by the receiver. Shannon has shown that one can treat the two problems separately. First, we will consider a noiseless channel, which transmits every input perfectly, and then we will deal with noisy channels. For simplicity, we will consider only discrete channels here, that is, both the input and the output of the channel, as well as those of the transmitter and receiver, are symbols of a finite discrete set.


Figure 1.1 Schematic diagram of a general communication system.

1.2.2 The Discrete Noiseless Channel

For a channel that transmits its inputs perfectly, the design of a communication system aims to maximize its efficiency, that is, the amount of information that can be sent through the channel per time. Usually, it is assumed that the transmission takes the same amount of time for each channel input; then we want to maximize the throughput per channel use. Otherwise, we first have to consider how many symbols we can send through the channel per time. Following (1), we define the capacity of the discrete noiseless channel as

$$C = \lim_{T \to \infty} \frac{\log_2 N(T)}{T},$$

where $N(T)$ denotes the number of allowed signal sequences of duration $T$.

If we use the symbols $S_1, S_2, \ldots, S_n$ with durations $t_1, t_2, \ldots, t_n$, then we get the recursive equation

$$N(t) = N(t - t_1) + N(t - t_2) + \cdots + N(t - t_n),$$

as we can partition the sequences of duration $t$ by, say, the last symbol. For large $t$, $N(t)$ tends to $X_0^t$, where $X_0$ is the largest real solution of the characteristic equation

$$X^{-t_1} + X^{-t_2} + \cdots + X^{-t_n} = 1. \tag{1.1}$$

Summarizing, we get that the capacity of the noiseless channel is $C = \log_2 X_0$.
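
As an illustration, the following Python sketch finds the largest real root $X_0$ of the characteristic equation (1.1) by bisection and returns $C = \log_2 X_0$; the symbol durations in the examples are chosen arbitrarily for illustration.

```python
from math import log2

def noiseless_capacity(durations, lo=1.0 + 1e-9, hi=16.0, iters=200):
    """Capacity of a noiseless channel whose symbols have the given durations.

    Solves the characteristic equation sum_i X**(-t_i) = 1 (Eq. (1.1)) for its
    largest real root X_0 by bisection and returns C = log2(X_0).
    """
    f = lambda x: sum(x ** (-t) for t in durations) - 1.0   # decreasing for x > 1
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:    # the sum still exceeds 1, so the root lies to the right
            lo = mid
        else:
            hi = mid
    return log2((lo + hi) / 2)

# Two symbols of equal duration 1: C = log2(2) = 1 bit per time unit.
print(noiseless_capacity([1, 1]))
# Unequal (made-up) durations: the root is the golden ratio, C is about 0.694.
print(noiseless_capacity([1, 2]))
```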

In order to maximize the efficiency of the communication system, we additionally need a measure for the amount of information that is produced by the source. Recall that we abstract from the meaning of a message, that is, a single message does not provide any information. Instead, we always consider a set of possible symbols, and each of the symbols will occur with some probability. The less frequent a symbol, the more surprising is its occurrence, and hence the more information it bears. The amount of information of a source that emits the symbol $s_i$ with probability $p_i$ is measured by its entropy $H = -\sum_i p_i \log_2 p_i$, given in bits per symbol.

In this definition, we have assumed that the symbols are emitted independently, that is, the probability of receiving a sequence $s_{i_1} s_{i_2} \cdots s_{i_N}$ of length $N$ is given by

$$p(s_{i_1} s_{i_2} \cdots s_{i_N}) = p(s_{i_1})\, p(s_{i_2}) \cdots p(s_{i_N}).$$

If there are correlations between the symbols, the entropy per symbol decreases when the original symbols are combined into larger blocks that are treated as new symbols. This is due to the fact that the entropy is maximal when all probabilities are equal. As an example, we may consider any natural language. The entropy depends on whether we consider the single-letter entropy, the entropy of pairs or triples of letters, whole words, or even sentences. Of course, the entropy also depends on the language itself.
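
To make the notion of per-symbol entropy concrete, the following sketch estimates the entropy of single letters and the per-letter entropy of letter pairs for a short sample string; the sample text and the helper names are chosen purely for illustration. For correlated text, the estimate based on pairs is typically lower than the single-letter estimate.

```python
from collections import Counter
from math import log2

def entropy(probabilities):
    """Shannon entropy H = -sum_i p_i log2 p_i in bits."""
    return -sum(p * log2(p) for p in probabilities if p > 0)

def empirical_entropy(text, block=1):
    """Per-letter entropy estimated from blocks of `block` consecutive letters."""
    blocks = [text[i:i + block] for i in range(len(text) - block + 1)]
    counts = Counter(blocks)
    total = sum(counts.values())
    return entropy(c / total for c in counts.values()) / block

sample = "the quick brown fox jumps over the lazy dog " * 20
print("single letters:", empirical_entropy(sample, block=1))
print("letter pairs  :", empirical_entropy(sample, block=2))
```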

From now on, we fix the alphabet of the information source and assume that there are no correlations between the symbols. A very important concept in information theory is that of $\epsilon$-typical words.

Asymptotically, this means that a sequence either occurs with negligible probability, that is, it is nontypical, or it is a so-called $\epsilon$-typical word; the $\epsilon$-typical words are approximately equally probable.

Fixing the length $N$, one can order the sequences by decreasing probability. For $0 < q < 1$, we define $n(q)$ as the minimum number of sequences of length $N$ that accumulate a total probability $q$. Shannon has shown that in the limit of large $N$, the fraction $\log_2 n(q)/N$ is independent of $q$:

The quantity $\log_2 n(q)$ can be interpreted as the number of bits that are required to describe a sequence when considering only the most probable sequences with total probability $q$. From Theorem 1.1, we get that even for finite length $N$, almost all words can be described in this way. The bounds for sending arbitrary sequences through the channel are given by Shannon's first fundamental coding theorem:

The small defect $\epsilon$ compared to the maximal achievable transmission speed is due to the small extra information that is needed to encode the nontypical words of the source. An efficient scheme for encoding the output of the source is, for example, the so-called Huffman coding (2). In view of Theorem 1.1, one can also ignore the nontypical words, whose total probability is negligible, in the encoding, resulting in a small error (lossy data compression).
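
A compact sketch of Huffman's procedure, which repeatedly merges the two least probable symbols, is given below; the symbol probabilities are chosen arbitrarily for illustration. The resulting average codeword length lies between the entropy of the source and the entropy plus one bit.

```python
import heapq

def huffman_code(probabilities):
    """Build a binary Huffman code; returns a dict mapping symbol -> bit string."""
    # Each heap entry: (probability, tie-breaker, {symbol: partial codeword}).
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p0, _, code0 = heapq.heappop(heap)   # least probable subtree
        p1, _, code1 = heapq.heappop(heap)   # second least probable subtree
        merged = {s: "0" + c for s, c in code0.items()}
        merged.update({s: "1" + c for s, c in code1.items()})
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

probs = {"a": 0.45, "b": 0.25, "c": 0.15, "d": 0.10, "e": 0.05}
code = huffman_code(probs)
avg_len = sum(probs[s] * len(code[s]) for s in probs)
print(code, "average length:", avg_len)   # about 2.0 bits; the entropy is about 1.98 bits
```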

1.2.3 The Discrete Noisy Channel

A discrete noisy channel maps an input symbol $x$ from the (finite) input alphabet $\mathcal{X}$ to an output symbol $y$ from the output alphabet $\mathcal{Y}$. A common assumption is that the channel is memoryless, that is, the probability of observing a symbol $y$ depends only on the last channel input $x$ and nothing else. The sizes of the input and output alphabets need not be the same, as depicted in Figure 1.2. Given the channel output $y$, the task for the receiver is to determine the most likely input $x$ to the channel. For this, we consider how much information the channel output provides about the channel input. First, we define some general quantities for pairs of random variables (see, e.g., (3)).


Figure 1.2 Schematic representation of a discrete memoryless channel. Arrows correspond to transitions with nonzero probability $p(y|x)$.

For the joint entropy, we consider the channel input and the channel output together as one symbol.

The conditional entropy is a measure for the information that we additionally get when considering both $X$ and $Y$ together and not only $X$. This is reflected by the following chain rule (see (3), Theorem 2.2.1).

Another important quantity in information theory is the mutual information.

The relationship between entropy and mutual information is illustrated in Figure 1.3. From Theorem 1.4, we get the following equivalent expressions for the mutual information:

$$I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) = H(X) + H(Y) - H(X,Y).$$
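
These relations can be checked numerically. The following sketch evaluates the entropies and the mutual information for the joint distribution of a binary symmetric channel with crossover probability 0.1 and a uniform input; the numbers are chosen purely for illustration.

```python
from math import log2

def H(dist):
    """Shannon entropy of a probability distribution given as an iterable."""
    return -sum(p * log2(p) for p in dist if p > 0)

# Joint distribution p(x, y) of a BSC with crossover probability 0.1 and uniform input.
p = 0.1
joint = {(x, y): 0.5 * (p if x != y else 1 - p) for x in (0, 1) for y in (0, 1)}

p_x = [sum(v for (x, _), v in joint.items() if x == b) for b in (0, 1)]
p_y = [sum(v for (_, y), v in joint.items() if y == b) for b in (0, 1)]

H_xy, H_x, H_y = H(joint.values()), H(p_x), H(p_y)
print("H(X) =", H_x, "  H(Y|X) =", H_xy - H_x)   # chain rule: H(X,Y) = H(X) + H(Y|X)
print("I(X;Y) =", H_x + H_y - H_xy)              # equals 1 - h(0.1), about 0.531 bits
```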

With this preparation, we are ready to define the capacity of a noisy discrete memoryless channel.


Figure 1.3 Relationship between entropy and mutual information.

The justification of this definition is provided by Shannon's second fundamental coding theorem.

For the proof of this theorem, one considers a suitable ensemble of encoding schemes and averages the frequency of errors over this ensemble. This average can be made arbitrarily small, implying that at least one of the encoding schemes must have a negligible error probability.

Before we turn our attention to the explicit construction of error-correcting codes, we consider a particularly interesting channel.


Figure 1.4 The binary symmetric channel (BSC) and its generalization, the uniform symmetric channel (USC). Each symbol is transmitted correctly with probability $1 - p$. If an error occurs, each of the other symbols is equally likely.

The generalization of the BSC to more than two symbols is shown in Figure 1.4. Again, a symbol is transmitted correctly with probability $1 - p$. If an error occurs, each of the, say, $M - 1$ other symbols is equally likely, that is, each occurs with probability $p/(M - 1)$. These types of channels are extremal in the sense that the transition probabilities only depend on whether a symbol is transmitted correctly or not. Hence, an incorrect symbol bears minimal information about the input symbol. Any deviation from this symmetry results in an increased capacity.
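
For such a uniform symmetric channel with $M$ symbols, a standard result (see, e.g., (3)) states that the capacity is achieved by the uniform input distribution and equals $\log_2 M$ minus the entropy of a single row of the transition matrix; for $M = 2$ this is the familiar BSC capacity $1 - h(p)$. A small sketch with illustrative parameters:

```python
from math import log2

def usc_capacity(M, p):
    """Capacity (bits per use) of the uniform symmetric channel with M symbols.

    Each symbol is received correctly with probability 1 - p; otherwise each of
    the M - 1 other symbols occurs with probability p / (M - 1). For symmetric
    channels the capacity is log2(M) minus the entropy of one row of the
    transition matrix, achieved by the uniform input distribution.
    """
    row = [1 - p] + [p / (M - 1)] * (M - 1)
    return log2(M) + sum(q * log2(q) for q in row if q > 0)

print(usc_capacity(2, 0.1))   # BSC: 1 - h(0.1), about 0.531 bits per use
print(usc_capacity(4, 0.1))   # 4-ary uniform symmetric channel
```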

1.3 Linear Block Codes

1.3.1 Repetition Code

When sending information over a noisy channel, on the highest level of abstraction, we distinguish only whether a symbol is transmitted correctly or not. The difference between the input sequence and the output sequence is then measured by the Hamming distance.

In order to be able to correct errors, we use only a subset of all possible sequences. In particular, we may take a subset of all possible sequences of length $n$.

The simplest code that can be used to detect or correct errors is the repetition code. A repetition code with rate $R = 1/2$ transmits every symbol twice. At the receiver, the two symbols are compared, and if they differ, an error is detected. Using this code over a binary symmetric channel with error probability $p$, the probability of an undetected error is $p^2$. Sending more than two copies of each symbol, we can decrease the probability of an undetected error even further. But at the same time, the rate of the code decreases, since the number of codewords remains fixed while the length of the code increases.

A repetition code can not only be used to detect errors but also to correct them. For this, we send three copies of each symbol, that is, we have a repetition code with rate $R = 1/3$. At the receiver, the three symbols are compared. If at most one symbol is wrong, the two error-free symbols agree, and we assume that the corresponding symbol is correct. Again, increasing the number of copies sent increases the number of errors that can be corrected. For the general situation, we consider the distance between two words of the block code $C$.
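
The following simulation sketch illustrates both uses of the binary repetition code described above, error detection with two copies and majority-vote correction with three copies; the channel model, the error probability, and all helper names are chosen for illustration. A Hamming distance helper counts the residual bit errors.

```python
import random

def hamming_distance(u, v):
    """Number of positions in which two sequences differ."""
    return sum(a != b for a, b in zip(u, v))

def bsc(bits, p, rng):
    """Binary symmetric channel: flip each bit independently with probability p."""
    return [b ^ (rng.random() < p) for b in bits]

rng = random.Random(1)
p = 0.1
message = [rng.randrange(2) for _ in range(10000)]

# Rate-1/2 repetition code: send each bit twice, detect single errors.
received = [bsc([b, b], p, rng) for b in message]
undetected = sum(r[0] == r[1] and r[0] != b for r, b in zip(received, message))
print("undetected error rate:", undetected / len(message), " expected p^2 =", p * p)

# Rate-1/3 repetition code: send each bit three times, decode by majority vote.
received = [bsc([b, b, b], p, rng) for b in message]
decoded = [int(sum(r) >= 2) for r in received]
print("bit error rate after majority vote:",
      hamming_distance(decoded, message) / len(message),
      " expected 3p^2 - 2p^3 =", 3 * p * p - 2 * p ** 3)
```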

The error‐correcting ability of a code is related to its minimum distance.


Figure 1.5 Geometry of the codewords. Any sphere of radius $d - 1$ around a codeword contains exactly one codeword. The spheres of radius $\lfloor (d-1)/2 \rfloor$ around the codewords are disjoint.

1.3.2 Finite Fields

For a general block code $C$ over an alphabet $\mathcal{A}$, we have to make a list of all codewords, that is, the size of the description of the code is proportional to the size of the code. In order to get a more efficient description—and thereby more efficient algorithms for encoding and decoding—we impose some additional structure. In particular, we require that the elements of the alphabet have the algebraic structure of a field, that is, we can add, subtract, and multiply any two elements, and every nonzero element has a multiplicative inverse. First, we consider finite fields whose size is a prime number.

Table 1.1 The extended Euclidean algorithm (see (4)).


The smallest field is the binary field $\mathbb{F}_2$, which has only the two elements 0 and 1. Note that the integers modulo a composite number do not form a field, as some nonzero elements do not have a multiplicative inverse. For example, for the integers modulo 4 we have $2 \cdot 2 = 4 \equiv 0 \pmod{4}$, so the element 2 cannot have a multiplicative inverse.
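
The extended Euclidean algorithm of Table 1.1 computes, besides the greatest common divisor $\gcd(a,b)$, coefficients $s$ and $t$ with $sa + tb = \gcd(a,b)$; for a prime $p$ and an integer $a$ not divisible by $p$, the inverse of $a$ modulo $p$ is then $s \bmod p$. A sketch (the function names are chosen for illustration):

```python
def extended_gcd(a, b):
    """Return (g, s, t) with g = gcd(a, b) and s*a + t*b = g."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t

def inverse_mod(a, p):
    """Multiplicative inverse of a modulo the prime p."""
    g, s, _ = extended_gcd(a % p, p)
    if g != 1:
        raise ValueError("element has no multiplicative inverse")
    return s % p

print(inverse_mod(3, 7))     # 5, since 3 * 5 = 15 is congruent to 1 modulo 7
try:
    inverse_mod(2, 4)        # the integers modulo 4 are not a field
except ValueError as err:
    print(err)
```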

In order to construct a field whose size is not a prime number, one uses the following construction.

It can be shown that for any prime number $p$ and any positive integer $m$, there exists an irreducible polynomial of degree $m$ over $\mathbb{F}_p$, that is, for any prime power $q = p^m$, there exists a finite field of that size. Furthermore, it can be shown that any finite field can be obtained by the construction of Proposition 1.2. Hence, we get (see, e.g., (5))
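
As a concrete instance of this construction, the following sketch realizes the field with four elements as binary polynomials of degree less than 2, multiplied modulo the irreducible polynomial $x^2 + x + 1$; the encoding of the elements as 2-bit integers is an arbitrary choice made here for illustration.

```python
# Elements of the field with 4 elements as 2-bit integers b1*x + b0.
IRRED = 0b111   # the irreducible polynomial x^2 + x + 1 over the binary field

def gf4_add(a, b):
    """Addition is coefficient-wise XOR (the field has characteristic 2)."""
    return a ^ b

def gf4_mul(a, b):
    """Carry-less polynomial product followed by reduction modulo IRRED."""
    prod = 0
    for i in range(2):
        if (b >> i) & 1:
            prod ^= a << i
    if (prod >> 2) & 1:           # reduce the term of degree 2, if present
        prod ^= IRRED
    return prod

# Multiplication table of {0, 1, x, x + 1}, encoded as 0, 1, 2, 3.
for a in range(4):
    print([gf4_mul(a, b) for b in range(4)])
# Every nonzero element has an inverse, e.g. x * (x + 1) = x^2 + x = 1 mod IRRED.
```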

1.3.3 Generator and Parity Check Matrix

In order to get a more efficient description of a block code $C$ of length $n$, we consider only codes whose alphabet is a finite field $\mathbb{F}_q$. Furthermore, we require that the code $C$ forms a linear vector space over the field $\mathbb{F}_q$, that is,

$$\alpha c_1 + \beta c_2 \in C \quad \text{for all } c_1, c_2 \in C \text{ and all } \alpha, \beta \in \mathbb{F}_q.$$

This implies that the code has $q^k$ elements for some integer $k$ with $0 \le k \le n$. We will use the notation $C = [n, k]_q$. Instead of listing all $q^k$ elements, it is sufficient to specify a basis of $k$ linearly independent vectors in $C$. Alternatively, the linear space $C$ can be given as the solution space of $n - k$ linearly independent homogeneous equations.

The generator matrix $G$ with its $k \times n$ entries provides a compact description of a code with $q^k$ elements. Moreover, encoding of an information sequence $u$ of length $k$ corresponds to the linear map given by $G$, that is,

$$u \mapsto c = uG.$$

The parity check matrix $H$ can be used to check whether a vector lies in the code.

Defining the parity check matrix $H$ as a matrix with $n$ columns and $n - k$ rows, and not as its transpose, is motivated by the following.
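
As a tiny illustration of the generator and parity check matrix formalism, consider the binary repetition code of length 3 (an example chosen here for simplicity) with generator matrix $G = (1\ 1\ 1)$ and a parity check matrix $H$ whose rows are $(1\ 1\ 0)$ and $(0\ 1\ 1)$:

```python
import numpy as np

# Binary repetition code of length 3 (illustrative example).
G = np.array([[1, 1, 1]])            # k x n generator matrix
H = np.array([[1, 1, 0],
              [0, 1, 1]])            # (n - k) x n parity check matrix

def encode(u):
    """Encode an information word u of length k as c = u G (arithmetic modulo 2)."""
    return (np.array(u) @ G) % 2

def in_code(c):
    """A vector lies in the code iff its syndrome H c^T vanishes."""
    return not np.any((H @ np.array(c)) % 2)

assert not np.any((G @ H.T) % 2)     # G H^T = 0, so every codeword passes the check
print(encode([1]), in_code([1, 1, 1]), in_code([1, 0, 1]))
```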

As we have seen in Theorem 1.6, the minimum distance of a code is a criterion for its error‐correcting ability. For linear codes, the minimum distance equals the minimum Hamming weight of the nonzero codewords as

$$d_H(c_1, c_2) = \operatorname{wgt}(c_1 - c_2), \quad \text{where } c_1 - c_2 \in C \text{ for any } c_1, c_2 \in C.$$

The minimum Hamming weight of a linear code can be computed using the parity check matrix.
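
In its standard form, the statement is that the minimum distance equals the smallest number of columns of $H$ that are linearly dependent. For small codes this can be checked by brute force, as in the following sketch, which uses as an example a parity check matrix whose seven columns are the binary expansions of 1 to 7; the helper name is chosen for illustration.

```python
from itertools import combinations
import numpy as np

def min_distance_from_parity_check(H):
    """Smallest number of columns of H (over the binary field) that sum to zero.

    This number equals the minimum Hamming weight, and hence the minimum
    distance, of the binary linear code defined by H.
    """
    n = H.shape[1]
    for w in range(1, n + 1):
        for cols in combinations(range(n), w):
            if not np.any(H[:, list(cols)].sum(axis=1) % 2):
                return w
    return None   # the code contains only the zero word

# Columns are the binary expansions of 1, ..., 7 (most significant bit on top).
H = np.array([[(i >> j) & 1 for i in range(1, 8)] for j in (2, 1, 0)])
print(min_distance_from_parity_check(H))   # 3, so this code corrects a single error
```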

1.3.4 Hamming Codes

The last proposition can be used to construct codes. For a single-error-correcting code, we require $d \ge 3$. This implies that any two columns of $H$ have to be linearly independent, that is, no column is a scalar multiple of another column. If we fix the redundancy $m = n - k$, it is possible to find $(q^m - 1)/(q - 1)$ vectors with this property, which can be combined into a parity check matrix $H$. This construction gives the following class of single-error-correcting codes (see (6,7)).

For binary Hamming codes, the parity check matrix $H$ consists of all $2^m - 1$ nonzero binary vectors of length $m$ as its columns. If we order those columns in such a way that the $i$th column equals the binary expansion of $i$, error correction is particularly easy. If $e_i$ is an error of weight 1 at position $i$, then the syndrome $s = H e_i^\top$ equals the $i$th column of $H$ and hence the binary expansion of $i$. Therefore, the syndrome directly provides the position of the error.
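
A sketch of this decoding rule for the binary Hamming code of length 7 (all variable names are chosen for illustration):

```python
import numpy as np

m = 3
n = 2 ** m - 1
# The i-th column of H (i = 1, ..., 7) is the binary expansion of i, MSB in the top row.
H = np.array([[(i >> j) & 1 for i in range(1, n + 1)] for j in reversed(range(m))])

def decode(received):
    """Correct a single error: the syndrome is the binary expansion of its position."""
    syndrome = (H @ received) % 2
    position = int("".join(map(str, syndrome)), 2)   # 0 means "no error detected"
    corrected = received.copy()
    if position:
        corrected[position - 1] ^= 1                 # flip the erroneous bit
    return corrected

codeword = np.zeros(n, dtype=int)                    # the all-zero codeword
received = codeword.copy()
received[4] ^= 1                                     # introduce an error at position 5
print(decode(received))                              # recovers the all-zero codeword
```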

Usually, a received vector will be decoded as the codeword that is closest in Hamming distance. In general, decoding an arbitrary linear binary code is an NP-hard problem (8). More precisely, it was shown that it is NP-complete to decide whether there is a vector $e$ that corresponds to a given syndrome $s$ and whose Hamming weight is at most $w$. Hence, we cannot expect to have an efficient general algorithm for decoding.

Instead, by exhaustive search we can precompute an error vector of minimal Hamming weight for each syndrome. For this, we first arrange all codewords as the first row of an array, with the all-zero codeword as the first element. Among the remaining vectors of length $n$, we pick a vector $e$ of minimal Hamming weight. This vector is the first element of the next row in our array. The remaining entries of this row are obtained by adding the vector $e$ to the corresponding codeword in the first row. This guarantees that all elements of a row correspond to the same syndrome. We proceed until all $q^n$ vectors have been arranged into an array with $q^{n-k}$ rows and $q^k$ columns, the so-called standard array. The elements in the first column of the standard array are called coset leaders, having minimal Hamming weight among all vectors in a row. Table 1.2 shows the standard array of a binary code $C$ of length 7 with $2^3 = 8$ codewords, which is the dual of the Hamming code of Example 1.4. Actually, the code $C$ is a subcode of the Hamming code. In the first row, the 8 codewords are listed. For the next seven rows, the coset leader is the unique vector of Hamming weight 1 in each coset, reflecting the fact that the code can correct a single error. For the following seven rows, the coset leader has weight 2, but each of these cosets contains three vectors of weight 2. Hence, decoding succeeds only in one out of three cases. The final row even contains seven vectors of weight 3.

Table 1.2 Standard array for decoding the code $C$, the dual of a binary Hamming code.

0000000 0001111 0110011 0111100 1010101 1011010 1100110 1101001
0000001 0001110 0110010 0111101 1010100 1011011 1100111 1101000
0000010 0001101 0110001 0111110 1010111 1011000 1100100 1101011
0000100 0001011 0110111 0111000 1010001 1011110 1100010 1101101
0001000 0000111 0111011 0110100 1011101 1010010 1101110 1100001
0010000 0011111 0100011 0101100 1000101 1001010 1110110 1111001
0100000 0101111 0010011 0011100 1110101 1111010 1000110 1001001
1000000 1001111 1110011 1111100 0010101 0011010 0100110 0101001
0110000 0111111 0000011 0001100 1100101 1101010 1010110 1011001
1000001 1001110 1110010 1111101 0010100 0011011 0100111 0101000
1000010 1001101 1110001 1111110 0010111 0011000 0100100 0101011
1000100 1001011 1110111 1111000 0010001 0011110 0100010 0101101
1001000 1000111 1111011 1110100 0011101 0010010 0101110 0100001
1010000 1011111 1100011 1101100 0000101 0001010 0110110 0111001
1100000 1101111 1010011 1011100 0110101 0111010 0000110 0001001
1110000 1111111 1000011 1001100 0100101 0101010 0010110 0011001
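
The construction of the standard array and the corresponding coset leader decoding can be sketched as follows for the code of Table 1.2, generated by the rows of the parity check matrix of the binary Hamming code of length 7; all helper names are chosen for illustration.

```python
from itertools import product

# Generator matrix of the code of Table 1.2: rows of the Hamming parity check matrix.
G = [(1, 0, 1, 0, 1, 0, 1),
     (0, 1, 1, 0, 0, 1, 1),
     (0, 0, 0, 1, 1, 1, 1)]

def add(u, v):
    """Componentwise addition modulo 2."""
    return tuple(a ^ b for a, b in zip(u, v))

codewords = []
for coeffs in product((0, 1), repeat=3):            # all linear combinations of the rows
    word = (0,) * 7
    for c, row in zip(coeffs, G):
        if c:
            word = add(word, row)
    codewords.append(word)

# Build the standard array: a minimum-weight vector of each remaining coset is its leader.
remaining = set(product((0, 1), repeat=7)) - set(codewords)
leaders = {c: (0,) * 7 for c in codewords}          # maps every vector to its coset leader
while remaining:
    leader = min(remaining, key=sum)                # a vector of minimal Hamming weight
    for c in codewords:
        v = add(leader, c)
        leaders[v] = leader
        remaining.discard(v)

def decode(received):
    """Add (= subtract, modulo 2) the coset leader of the received word's coset."""
    return add(received, leaders[tuple(received)])

print(decode((0, 0, 0, 1, 1, 1, 0)))                # one error in 0001111, decoded correctly
```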

1.4 Further Aspects

We have seen that the Hamming code is a code for which the correction of errors is rather simple, but it can only correct a single error. On the other hand, for an arbitrary linear code, the problem of error correction is NP-complete. Luckily, there are other families of error-correcting codes for which efficient algorithms exist that correct at least all errors of bounded weight. More about these codes can be found in any textbook on coding theory, in particular the book by MacWilliams and Sloane (7), which is an excellent reference for the theory of error-correcting codes.

References

  1. Shannon, C.E. (1948) A mathematical theory of communication. Bell Syst. Tech. J., 27 (3), 379–423, 623–656, doi: 10.1002/j.1538-7305.1948.tb01338.x.
  2. Huffman, D.A. (1952) A method for the construction of minimum-redundancy codes. Proc. Inst. Radio Eng., 40, 1098–1101.
  3. Cover, T.M. and Thomas, J.A. (1991) Elements of Information Theory, John Wiley & Sons, Inc., New York.
  4. Aho, A.V., Hopcroft, J.E., and Ullman, J.D. (1974) The Design and Analysis of Computer Algorithms, Addison-Wesley, Reading, MA.
  5. Jungnickel, D. (1993) Finite Fields: Structure and Arithmetics, BI-Wissenschaftsverlag, Mannheim.
  6. Hamming, R.W. (1986) Coding and Information Theory, Prentice-Hall, Englewood Cliffs, NJ.
  7. MacWilliams, F.J. and Sloane, N.J.A. (1977) The Theory of Error-Correcting Codes, North-Holland, Amsterdam.
  8. Berlekamp, E.R., McEliece, R.J., and van Tilborg, H.C.A. (1978) On the inherent intractability of certain coding problems. IEEE Trans. Inf. Theory, 24 (3), 384–386.