Markus Grassl
Max‐Planck‐Institut für die Physik des Lichts, Staudstraße 2, 91058 Erlangen, Germany
Information theory establishes a framework for any kind of communication and information processing. It allows us to derive bounds on the complexity or cost of tasks such as storing information using the minimal amount of space or sending data over a noisy channel, and it provides the means to quantify information. So one may ask, “How much does someone know about what somebody else knows?” – a question that is important in cryptographic contexts. Before studying the new aspects that quantum mechanics adds to information theory in later chapters, we will have a brief look at the basics of classical information theory in the next section.
While information theory provides an answer to the question of how fast one can send information over a noisy channel, it usually does not give a constructive solution to this task. This is the problem that error correction deals with. In Section 1.3, we give a short introduction to linear block codes, laying the groundwork for the discussion of error-correcting codes for quantum systems in Chapter 7.
The foundations of information theory have been laid by Claude Shannon in his landmark paper “A Mathematical Theory of Communication” (1). In that paper, Shannon introduces the basic mathematical concepts for communication systems and proves two fundamental coding theorems. Here, we mainly follow his approach.
The first important observation of Shannon is that although the process of communication is intended to transfer a message with some meaning, the design of a communication system can abstract from any meaning:
The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. Frequently the messages have meaning; that is they refer to or are correlated according to some system with certain physical or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem. The significant aspect is that the actual message is one selected from a set of possible messages. The system must be designed to operate for each possible selection, not just the one which will actually be chosen since this is unknown at the time of design.
Additionally, we can, to some extent, abstract from the physical channel that is used to transmit the message. For this, we introduce a transmitter and a receiver that convert the messages into some physical signal and vice versa. The general layout of such a communication system is illustrated in Figure 1.1. Given a channel and an information source, the basic problem is to transmit the messages produced by the information source through the channel as efficiently and as reliably as possible. Efficient means that we can send as much information as possible per use of the channel, and reliable means that, despite the disturbance due to the noise added by the channel, the original message is (with high probability) reproduced by the receiver. Shannon has shown that one can treat the two problems separately. First, we will consider a noiseless channel, which transmits every input perfectly, and then we will deal with noisy channels. For simplicity, we will consider only discrete channels here, that is, both the input and the output of the channel, as well as those of the transmitter and receiver, are symbols of a finite discrete set.
For a channel that transmits its inputs perfectly, the design of a communication system aims to maximize its efficiency, that is, the amount of information that can be sent through the channel per unit of time. Usually, it is assumed that the transmission takes the same amount of time for each channel input. Then, we want to maximize the throughput per channel use. Otherwise, we first have to consider how many symbols we can send through the channel per unit of time. Following (1), we define
If we use symbols with durations t_1, ..., t_n, then the number N(t) of allowed sequences of total duration t satisfies the recursive equation

N(t) = N(t - t_1) + N(t - t_2) + ... + N(t - t_n),

as we can partition the sequences of duration t by, say, the last symbol. For large t, N(t) tends to X_0^t, where X_0 is the largest real solution of the characteristic equation

X^(-t_1) + X^(-t_2) + ... + X^(-t_n) = 1.
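Shannon's result states that the capacity of such a noiseless channel is C = log2 X_0 bits per unit of time. As a small numerical sketch (the two symbol durations below are hypothetical), the largest root of the characteristic equation can be found by bisection, since the left-hand side is strictly decreasing in X:

```python
import math

def largest_root(durations, lo=1.0, hi=4.0, iters=100):
    """Find the largest real solution X0 of sum(X**(-t)) = 1 by bisection.
    f(X) = sum(X**(-t)) - 1 is strictly decreasing for X > 1, so the
    interval [lo, hi] must bracket the root."""
    f = lambda x: sum(x ** (-t) for t in durations) - 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical example: two symbols with durations 1 and 2 time units.
# The characteristic equation 1/X + 1/X^2 = 1 has the golden ratio as its
# largest root, so the capacity is log2((1 + sqrt(5))/2) bits per time unit.
X0 = largest_root([1, 2])
capacity = math.log2(X0)
print(round(X0, 3), round(capacity, 3))   # 1.618 0.694
```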
Summarizing, we get
In order to maximize the efficiency of the communication system, we additionally need a measure for the amount of information that is produced by the source. Recall that we abstract from the meaning of a message, that is, a single message by itself does not provide any information. Instead, we always consider a set of possible symbols, each of which occurs with some probability. The less frequent a symbol, the more surprising its occurrence, and hence the more information it bears. The amount of information of a source is described as follows:
In this definition, we have assumed that the symbols are emitted independently, that is, the probability of receiving a sequence s_1 s_2 ... s_n of length n is given by the product p(s_1) p(s_2) ... p(s_n).
If there are some correlations between the symbols, the entropy of the source decreases when the original symbols are combined into new compound symbols. This is due to the fact that the entropy is maximal when all probabilities are equal. As an example, we may consider any natural language. The entropy depends on whether we consider the single-letter entropy, or the entropy of pairs or triples of letters, whole words, or even sentences. Of course, the entropy also depends on the language itself.
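As a small illustration (a sketch, not from the text), the entropy of a finite probability distribution can be computed directly from its definition as the negative sum of p_i log2 p_i:

```python
import math

def entropy(probs):
    """Shannon entropy in bits: -sum(p * log2 p); zero-probability
    symbols contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# The uniform distribution maximizes the entropy: log2(4) = 2 bits.
print(entropy([0.25] * 4))
# A biased source carries less information per symbol.
print(entropy([0.5, 0.25, 0.125, 0.125]))   # 1.75 bits
```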
From now on, we fix the alphabet of the information source and assume that there are no correlations between the symbols. A very important concept in information theory is that of typical words.
Asymptotically, this means that a sequence either occurs with negligible probability, that is, it is nontypical, or it is a so-called typical word; the typical words all occur with approximately the same probability.
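This can be made quantitative in a small numerical sketch (the source parameters below are hypothetical): for a memoryless binary source, one can sort all sequences of length n by probability and count how many of the most probable ones are needed to accumulate a given total probability q; the logarithm of that count, divided by n, approaches the entropy of the source as n grows (the convergence is slow).

```python
import math
from itertools import product

# Hypothetical parameters: an i.i.d. binary source with p(1) = 0.2,
# blocks of length n = 16, target probability q = 0.5.
p, n, q = 0.2, 16, 0.5

# Probabilities of all 2^16 sequences, sorted in decreasing order.
probs = sorted((p ** sum(s) * (1 - p) ** (n - sum(s))
                for s in product([0, 1], repeat=n)), reverse=True)

# Count the minimum number of most probable sequences with total probability q.
total, count = 0.0, 0
for pr in probs:
    total += pr
    count += 1
    if total >= q:
        break

h_source = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
print(count, math.log2(count) / n, h_source)
```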
Fixing the length n, one can order the sequences of length n by decreasing probability. For 0 < q < 1, we define n(q) as the minimum number of sequences of length n that accumulate a total probability of at least q. Shannon has shown that in the limit of large n, the quantity (log n(q))/n is independent of q:
The quantity log n(q) can be interpreted as the number of bits that are required to describe a sequence when considering only the most probable sequences with total probability q. From Theorem 1.1, we get that even for finite length n, almost all words can be described in this way. The bounds for sending arbitrary sequences through the channel are given by Shannon's first fundamental coding theorem:
The small defect compared to the maximal achievable transmission speed is due to the small extra information that is needed to encode the nontypical words of the source. An efficient scheme for encoding the output of the source is, for example, the so-called Huffman coding (2). In view of Theorem 1.1, one can also ignore the nontypical words that have a negligible total probability in the encoding, resulting in a small error (lossy data compression).
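A minimal sketch of Huffman's construction (the function and symbol names are my own, not notation from the text): repeatedly merge the two least probable nodes, prefixing their codewords with 0 and 1, respectively.

```python
import heapq

def huffman_code(freqs):
    """Build a Huffman code for {symbol: probability}: repeatedly merge the
    two least probable nodes; returns {symbol: bit string}."""
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)              # tie-breaker so that dicts are never compared
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, tie, merged))
        tie += 1
    return heap[0][2]

freqs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}   # hypothetical source
code = huffman_code(freqs)
avg_len = sum(freqs[s] * len(w) for s, w in code.items())
print(code, avg_len)
```

For dyadic probabilities, as in this hypothetical source, the average codeword length of 1.75 bits meets the entropy of the source exactly.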
A discrete noisy channel maps an input symbol from the (finite) input alphabet to an output symbol from the output alphabet. A common assumption is that the channel is memoryless, that is, the probability of observing an output symbol depends only on the current channel input. The sizes of the input and output alphabets need not be the same, as depicted in Figure 1.2. Given the channel output, the task of the receiver is to determine the most likely input to the channel. For this, we consider how much information the channel output provides about the channel input. First, we define some general quantities for pairs of random variables (see, e.g., (3)).
For the joint entropy, we consider the channel input and the channel output together as one symbol.
The conditional entropy is a measure for the information that we additionally get when considering both X and Y together rather than X alone. This is reflected by the following chain rule (see (3), Theorem 2.2.1).
Another important quantity in information theory is the mutual information.
The relationship between entropy and mutual information is illustrated in Figure 1.3. From Theorem 1.4, we get the following equivalent expressions for the mutual information:
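The identity I(X;Y) = H(X) + H(Y) - H(X,Y) can be verified numerically. The following sketch uses a hypothetical joint distribution, a uniform input sent through a binary symmetric channel with error probability 0.1:

```python
import math

def h(probs):
    """Entropy of a list of probabilities, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), with joint[x][y] the joint distribution."""
    px = [sum(row) for row in joint]            # marginal of X
    py = [sum(col) for col in zip(*joint)]      # marginal of Y
    pxy = [p for row in joint for p in row]     # joint as a flat list
    return h(px) + h(py) - h(pxy)

# Hypothetical example: uniform input through a BSC with error probability 0.1,
# for which the mutual information is 1 - h2(0.1), about 0.531 bits.
p = 0.1
joint = [[0.5 * (1 - p), 0.5 * p],
         [0.5 * p, 0.5 * (1 - p)]]
print(round(mutual_information(joint), 3))   # 0.531
```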
With this preparation, we are ready to define the capacity of a noisy discrete memoryless channel.
The justification of this definition is provided by Shannon's second fundamental coding theorem.
For the proof of this theorem, one considers a particular set of encoding schemes and then averages the frequency of errors. This average can be made arbitrarily small, implying that at least one of the encoding schemes must have a negligible error probability.
Before we turn our attention to the explicit construction of error-correcting codes, we consider a particularly interesting channel.
The generalization of the BSC to more than two symbols is shown in Figure 1.4. Again, a symbol is transmitted correctly with probability 1 - p. If an error occurs, each of the, say, m - 1 other symbols is equally likely, that is, it occurs with probability p/(m - 1). These types of channels are extremal in the sense that the transition probabilities only depend on whether a symbol is transmitted correctly or not. Hence, an incorrect symbol bears minimal information about the input symbol. Any deviation from this symmetry results in an increased capacity.
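For these symmetric channels, the capacity has a standard closed form (not derived in the text): C = log2 m - h2(p) - p log2(m - 1), where h2 is the binary entropy function, achieved by the uniform input distribution. A small sketch:

```python
import math

def h2(p):
    """Binary entropy function, in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def symmetric_capacity(m, p):
    """Capacity (bits per use) of the m-ary symmetric channel: a symbol is
    transmitted correctly with probability 1-p; otherwise each of the other
    m-1 symbols occurs with probability p/(m-1)."""
    return math.log2(m) - h2(p) - p * math.log2(m - 1)

print(round(symmetric_capacity(2, 0.1), 3))   # BSC: 1 - h2(0.1), about 0.531
print(round(symmetric_capacity(4, 0.1), 3))   # quaternary symmetric channel
```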
When sending information over a noisy channel, on the highest level of abstraction, we distinguish only whether a symbol is transmitted correctly or not. Then the difference between the input sequence and the output sequence is measured by the Hamming distance.
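A direct implementation of the Hamming distance (a trivial sketch):

```python
def hamming_distance(x, y):
    """Number of positions in which two equal-length sequences differ."""
    assert len(x) == len(y)
    return sum(a != b for a, b in zip(x, y))

print(hamming_distance("10110", "11010"))   # 2
```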
In order to be able to correct errors, we use only a subset of all possible sequences. In particular, we may take a subset of all possible sequences of length n.
The simplest code that can be used to detect or correct errors is the repetition code. A repetition code with rate 1/2 transmits every symbol twice. At the receiver, the two symbols are compared, and if they differ, an error is detected. Using this code over a binary symmetric channel with error probability p, the probability of an undetected error is p^2, as an error goes unnoticed only when both copies are flipped. Sending more than two copies of each symbol, we can decrease the probability of an undetected error even further. But at the same time, the rate of the code decreases, since the number of codewords remains fixed while the length of the code increases.
A repetition code can not only be used to detect errors but also to correct them. For this, we send three copies of each symbol, that is, we have a repetition code with rate 1/3. At the receiver, the three symbols are compared. If at most one symbol is wrong, the two error-free symbols agree, and we assume that the corresponding symbol is correct. Again, increasing the number of copies sent increases the number of errors that can be corrected. For the general situation, we consider the distance between two words of the block code.
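The repetition codes just described can be sketched in a few lines (the helper names are my own); the decoder takes a majority vote over each block of r copies, correcting up to (r-1)/2 errors per block:

```python
from collections import Counter

def encode_repetition(bits, r=3):
    """Repetition code of rate 1/r: send r copies of every symbol."""
    return [b for b in bits for _ in range(r)]

def decode_repetition(received, r=3):
    """Majority vote over each block of r copies."""
    return [Counter(received[i:i + r]).most_common(1)[0][0]
            for i in range(0, len(received), r)]

sent = encode_repetition([1, 0, 1])        # [1,1,1, 0,0,0, 1,1,1]
sent[1] ^= 1                               # flip one bit in the first block
print(decode_repetition(sent))             # [1, 0, 1]: the error is corrected
```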
The error‐correcting ability of a code is related to its minimum distance.
For a general block code over an alphabet, we have to make a list of all codewords, that is, the description of the code is proportional to its size. In order to get a more efficient description (and thereby more efficient algorithms for encoding and decoding), we impose some additional structure. In particular, we require that the elements of the alphabet have the algebraic structure of a field, that is, we can add, subtract, and multiply any two elements, and every nonzero element has a multiplicative inverse. First, we consider a finite field whose size is a prime number.
The smallest field is the binary field, which has only the two elements 0 and 1. Note that the integers modulo a composite number do not form a field, as some nonzero elements do not have a multiplicative inverse. For example, for the integers modulo 4 we have 2 * 2 = 4 = 0 (mod 4), so 2 cannot have a multiplicative inverse.
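This can be checked by brute force (a small sketch): in the integers modulo a prime, every nonzero element has a multiplicative inverse, while modulo 4 the element 2 has none.

```python
def inverses_mod(n):
    """For each nonzero a in Z_n, the multiplicative inverse if one exists."""
    return {a: next((b for b in range(1, n) if (a * b) % n == 1), None)
            for a in range(1, n)}

print(inverses_mod(5))   # every nonzero element is invertible: Z_5 is a field
print(inverses_mod(4))   # 2 has no inverse (2*2 = 0 mod 4): Z_4 is not a field
```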
In order to construct a field whose size is not a prime number, one uses the following construction.
It can be shown that for any prime number p and for any positive integer m, there exists an irreducible polynomial of degree m over the field with p elements; that is, for any prime power p^m, there exists a finite field of that size. Furthermore, it can be shown that any finite field can be obtained by the construction of Proposition 1.2. Hence, we get (see, e.g., (5))
In order to get a more efficient description of a block code of length n, we consider only codes whose alphabet is a finite field. Furthermore, we require that the code forms a linear vector space over the field, that is,
This implies that the code has q^k elements for some k with 0 <= k <= n. We will use the notation [n, k]. Instead of listing all q^k elements, it is sufficient to specify a basis of k linearly independent vectors. Alternatively, the linear space can be given as the solution of n - k linearly independent homogeneous equations.
The k x n generator matrix G provides a compact description of a code with q^k elements. Moreover, encoding an information sequence u of length k corresponds to the linear map given by u -> uG, that is,
The parity check matrix H can be used to check whether a vector lies in the code.
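As a concrete sketch (the specific matrices are my own example, the [7, 4] binary Hamming code in systematic form, not taken from the text), encoding is the map u -> uG, and membership in the code is tested with the parity check matrix:

```python
# Generator matrix G = [I | P] and parity check matrix H = [P^T | I] of the
# [7,4] binary Hamming code in systematic form; by construction G H^T = 0.
G = [[1, 0, 0, 0, 0, 1, 1],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 1, 1, 0],
     [0, 0, 0, 1, 1, 1, 1]]
H = [[0, 1, 1, 1, 1, 0, 0],
     [1, 0, 1, 1, 0, 1, 0],
     [1, 1, 0, 1, 0, 0, 1]]

def matvec(M, v):
    """Matrix-vector product over GF(2)."""
    return [sum(m * x for m, x in zip(row, v)) % 2 for row in M]

u = [1, 0, 1, 1]                                                     # information word
c = [sum(ui * gi for ui, gi in zip(u, col)) % 2 for col in zip(*G)]  # c = uG
print(c, matvec(H, c))   # the zero syndrome [0, 0, 0] shows c lies in the code
```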
The definition of the parity check matrix as a matrix with n columns and n - k rows, and not as its transpose, is motivated by the following.
As we have seen in Theorem 1.6, the minimum distance of a code is a criterion for its error-correcting ability. For linear codes, the minimum distance equals the minimum Hamming weight of the nonzero codewords, as the distance of two codewords is the weight of their difference, which is again a codeword.
The minimum Hamming weight of a linear code can be computed using the parity check matrix.
The last proposition can be used to construct codes. For a single-error-correcting code, we require a minimum distance of at least 3. This implies that any two columns in H have to be linearly independent, that is, no column is a scalar multiple of another column. If we fix the redundancy m = n - k, it is possible to find (q^m - 1)/(q - 1) vectors with this property, which can be combined to a parity check matrix H. This construction gives the following class of single-error-correcting codes (see (6,7)).
For binary Hamming codes, the parity check matrix consists of all nonzero vectors of length m as columns. If we order those columns in such a way that the ith column equals the binary expansion of i, error correction is particularly easy. If e is an error of weight 1 at position i, then the syndrome equals the ith column of H and hence the binary expansion of i. Therefore, the syndrome directly provides the position of the error.
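A sketch of this decoder for the binary Hamming code of length 7 (m = 3); the column ordering makes the syndrome, read as a binary number, the position of the error:

```python
# Parity check matrix whose i-th column is the binary expansion of i (i = 1..7).
H = [[(i >> 2) & 1 for i in range(1, 8)],
     [(i >> 1) & 1 for i in range(1, 8)],
     [i & 1 for i in range(1, 8)]]

def correct(y):
    """Compute the syndrome; read as a binary number, it is the position of a
    single error (0 means no error detected)."""
    s = [sum(hj * yj for hj, yj in zip(row, y)) % 2 for row in H]
    pos = 4 * s[0] + 2 * s[1] + s[2]
    if pos:
        y = y[:]
        y[pos - 1] ^= 1   # flip the erroneous bit
    return y

c = [0, 0, 0, 0, 0, 0, 0]          # a codeword (here the all-zero word)
y = c[:]; y[4] ^= 1                # single error in position 5
print(correct(y) == c)             # True
```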
Usually, a received vector will be decoded as the codeword that is closest in Hamming distance. In general, decoding an arbitrary linear binary code is an NP-hard problem (8). More precisely, it was shown that it is an NP-complete problem to decide whether there is a vector that corresponds to a given syndrome and whose Hamming weight is at most some given bound. Hence, we cannot expect to have an efficient general algorithm for decoding.
Instead, by exhaustive search we can precompute an error vector of minimal Hamming weight corresponding to each syndrome. For this, we first arrange all codewords as the first row of an array, with the all-zero codeword as the first element. Among the remaining vectors of length n, we pick a vector with minimal Hamming weight. This vector is the first element of the next row in our array. The remaining entries of this row are obtained by adding the vector to the corresponding codeword in the first row. This guarantees that all elements of a row correspond to the same syndrome. We proceed until all vectors have been arranged into an array with 2^(n-k) rows and 2^k columns, the so-called standard array. The elements in the first column of the standard array are called coset leaders; they have minimal Hamming weight among all vectors in their row. Table 1.2 shows the standard array of a binary [7, 3] code, which is the dual of the Hamming code of Example 1.4. Actually, the code is a subcode of the Hamming code. In the first row, the 8 codewords are listed. For the next seven rows, the coset leader is the unique vector of Hamming weight 1 in each coset, reflecting the fact that the code can correct a single error. For the following seven rows, the coset leader has weight 2, but each of these cosets contains three vectors of weight 2. Hence, decoding succeeds only in one out of three cases. In the final row, there are even seven vectors of weight 3.
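The coset leaders can also be found programmatically. The following sketch (the matrix and helper names are my own) enumerates all vectors in order of increasing Hamming weight and keeps the first representative of each syndrome; the rows of H generate the [7, 4] Hamming code, so H is a parity check matrix of the [7, 3] code of Table 1.2.

```python
from itertools import combinations

# Parity check matrix of the [7,3] code of Table 1.2; its rows generate the
# dual code, the [7,4] binary Hamming code.
H = [[1, 0, 0, 0, 0, 1, 1],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 1, 1, 0],
     [0, 0, 0, 1, 1, 1, 1]]

def syndrome(v):
    return tuple(sum(h * x for h, x in zip(row, v)) % 2 for row in H)

# Enumerate all 2^7 vectors by increasing Hamming weight; the first vector
# seen for each syndrome is a coset leader (an error of minimal weight).
leaders = {}
for w in range(8):
    for pos in combinations(range(7), w):
        v = tuple(1 if i in pos else 0 for i in range(7))
        leaders.setdefault(syndrome(v), v)

def decode(y):
    """Subtract (over GF(2): add) the coset leader of y's syndrome."""
    e = leaders[syndrome(y)]
    return tuple((a + b) % 2 for a, b in zip(y, e))

print(len(leaders))                      # 16 cosets, one leader each
print(decode((0, 0, 0, 1, 1, 1, 0)))    # corrects 0001110 to the codeword 0001111
```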
Table 1.2 Standard array for decoding the [7, 3] code, the dual of a binary Hamming code.
0000000 | 0001111 | 0110011 | 0111100 | 1010101 | 1011010 | 1100110 | 1101001 |
0000001 | 0001110 | 0110010 | 0111101 | 1010100 | 1011011 | 1100111 | 1101000 |
0000010 | 0001101 | 0110001 | 0111110 | 1010111 | 1011000 | 1100100 | 1101011 |
0000100 | 0001011 | 0110111 | 0111000 | 1010001 | 1011110 | 1100010 | 1101101 |
0001000 | 0000111 | 0111011 | 0110100 | 1011101 | 1010010 | 1101110 | 1100001 |
0010000 | 0011111 | 0100011 | 0101100 | 1000101 | 1001010 | 1110110 | 1111001 |
0100000 | 0101111 | 0010011 | 0011100 | 1110101 | 1111010 | 1000110 | 1001001 |
1000000 | 1001111 | 1110011 | 1111100 | 0010101 | 0011010 | 0100110 | 0101001 |
0110000 | 0111111 | 0000011 | 0001100 | 1100101 | 1101010 | 1010110 | 1011001 |
1000001 | 1001110 | 1110010 | 1111101 | 0010100 | 0011011 | 0100111 | 0101000 |
1000010 | 1001101 | 1110001 | 1111110 | 0010111 | 0011000 | 0100100 | 0101011 |
1000100 | 1001011 | 1110111 | 1111000 | 0010001 | 0011110 | 0100010 | 0101101 |
1001000 | 1000111 | 1111011 | 1110100 | 0011101 | 0010010 | 0101110 | 0100001 |
1010000 | 1011111 | 1100011 | 1101100 | 0000101 | 0001010 | 0110110 | 0111001 |
1100000 | 1101111 | 1010011 | 1011100 | 0110101 | 0111010 | 0000110 | 0001001 |
1110000 | 1111111 | 1000011 | 1001100 | 0100101 | 0101010 | 0010110 | 0011001 |
We have seen that the Hamming code is a code for which the correction of errors is rather simple, but it can only correct a single error. On the other hand, for an arbitrary linear code, the problem of error correction is NP-complete. Fortunately, there are other families of error-correcting codes for which efficient algorithms exist to correct at least all errors of bounded weight. More about these codes can be found in any textbook on coding theory, such as the book by MacWilliams and Sloane (7), which is an excellent reference for the theory of error-correcting codes.