20.1 Probability Review

In this section we briefly introduce the concepts from probability needed for what follows. An understanding of probability and the various identities that arise is essential for the development of entropy.

Consider an experiment $X$ with possible outcomes in a finite set $\mathcal{X}$. For example, $X$ could be flipping a coin and $\mathcal{X} = \{\text{heads}, \text{tails}\}$. We assume each outcome is assigned a probability. In the present example, $p(X = \text{heads}) = 1/2$ and $p(X = \text{tails}) = 1/2$. Often, the outcome $X$ of an experiment is called a random variable.

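In general, for each $x \in \mathcal{X}$, denote the probability that $X = x$ by

$$p_X(x) = p_x = p(X = x).$$

Note that $\sum_{x \in \mathcal{X}} p_x = 1$. If $A \subseteq \mathcal{X}$, let

$$p(A) = \sum_{x \in A} p_x,$$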

which is the probability that X takes a value in A.
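As a minimal sketch of these definitions, the following Python snippet stores a distribution as a dictionary and computes $p(A)$ by summing over the outcomes in $A$. The fair six-sided die, the event $A = \{2, 4, 6\}$, and the helper name `prob` are illustrative assumptions, not part of the text above.

```python
from fractions import Fraction

# Assumed example: a fair six-sided die, so each outcome x has p_x = 1/6.
p = {x: Fraction(1, 6) for x in range(1, 7)}

def prob(event, dist):
    """Return p(A): the sum of p_x over the outcomes x in the event A."""
    return sum(dist[x] for x in event)

total = sum(p.values())        # the probabilities of all outcomes sum to 1
p_even = prob({2, 4, 6}, p)    # probability that X takes a value in A

print(total, p_even)
```

Exact fractions are used instead of floating-point numbers so that identities such as "the probabilities sum to 1" can be checked without rounding error.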

Often one performs an experiment that measures several different events. These events may or may not be related, but they may be lumped together to form a new random event. For example, if we have two random events $X$ and $Y$ with possible outcomes $\mathcal{X}$ and $\mathcal{Y}$, respectively, then we may create a new random event $Z = (X, Y)$ that groups the two events together. In this case, the new event $Z$ has a set of possible outcomes $\mathcal{Z} = \mathcal{X} \times \mathcal{Y}$, and $Z$ is sometimes called a joint random variable.

Example

Draw a card from a standard deck. Let $X$ be the suit of the card, so $\mathcal{X} = \{\text{clubs}, \text{diamonds}, \text{hearts}, \text{spades}\}$. Let $Y$ be the value of the card, so $\mathcal{Y} = \{\text{two}, \text{three}, \ldots, \text{ace}\}$. Then $Z = (X, Y)$ gives the 52 possibilities for the card. Note that if $x \in \mathcal{X}$ and $y \in \mathcal{Y}$, then $p((X, Y) = (x, y)) = p(X = x, Y = y)$ is simply the probability that the card drawn has suit $x$ and value $y$. Since all cards are equally probable, this probability is $1/52$, which is the probability that $X = x$ (namely $1/4$) times the probability that $Y = y$ (namely $1/13$). As we discuss later, this means $X$ and $Y$ are independent.

Example

Roll a die. Suppose we are interested in two things: whether the number of dots is odd and whether the number is at least 2. Let $X = 0$ if the number of dots is even and $X = 1$ if the number of dots is odd. Let $Y = 0$ if the number of dots is less than 2 and $Y = 1$ if the number of dots is at least 2. Then $Z = (X, Y)$ gives us the results of both experiments together. Note that the probability that the number of dots is odd and less than 2 is $p(Z = (1, 0)) = 1/6$. This is not equal to $p(X = 1)\,p(Y = 0)$, which is $(1/2)(1/6) = 1/12$. This means that $X$ and $Y$ are not independent. As we'll see, this is closely related to the fact that knowing $X$ gives us information about $Y$.
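The die example can be checked by enumerating the six faces. The sketch below assumes a fair die and builds the distribution of $Z = (X, Y)$ face by face; the names `X`, `Y`, `p_Z`, `p_X1`, and `p_Y0` are illustrative choices.

```python
from fractions import Fraction

faces = range(1, 7)
p_face = Fraction(1, 6)   # assumption: the die is fair

def X(n):
    """1 if the number of dots is odd, 0 if it is even."""
    return n % 2

def Y(n):
    """1 if the number of dots is at least 2, 0 otherwise."""
    return 1 if n >= 2 else 0

# Joint distribution of Z = (X, Y), accumulated one face at a time.
p_Z = {}
for n in faces:
    z = (X(n), Y(n))
    p_Z[z] = p_Z.get(z, Fraction(0)) + p_face

p_X1 = sum(p_face for n in faces if X(n) == 1)   # p(X = 1)
p_Y0 = sum(p_face for n in faces if Y(n) == 0)   # p(Y = 0)

# Joint probability of "odd and less than 2" versus the product of the two
# individual probabilities.
print(p_Z[(1, 0)], p_X1 * p_Y0)
```

The pair $(0, 0)$ never appears as a key in `p_Z`, since no face is both even and less than 2.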

We denote

$$p_{X,Y}(x, y) = p(X = x, Y = y).$$

Note that we can recover the probability that X=x as

$$p_X(x) = \sum_{y \in \mathcal{Y}} p_{X,Y}(x, y).$$
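A short sketch of this marginalization, using the joint table from the die example (fair die assumed). The function name `marginal` and its `axis` argument are hypothetical helpers introduced only for illustration.

```python
from collections import defaultdict
from fractions import Fraction

# Joint distribution of the die example, keyed by (x, y).
p_XY = {
    (0, 0): Fraction(0),      # even and less than 2: impossible
    (0, 1): Fraction(1, 2),   # even and at least 2: faces 2, 4, 6
    (1, 0): Fraction(1, 6),   # odd and less than 2: face 1
    (1, 1): Fraction(1, 3),   # odd and at least 2: faces 3, 5
}

def marginal(joint, axis):
    """Sum the joint probabilities over the coordinate that is not `axis`."""
    out = defaultdict(Fraction)   # missing keys start at Fraction(0)
    for pair, prob in joint.items():
        out[pair[axis]] += prob
    return dict(out)

p_X = marginal(p_XY, 0)   # sums over y for each x
p_Y = marginal(p_XY, 1)   # sums over x for each y
print(p_X, p_Y)
```

The same function recovers either marginal, since the sum over $x$ gives $p_Y(y)$ in exactly the same way.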

We say that two random events X and Y are independent if

$$p_{X,Y}(x, y) = p_X(x)\,p_Y(y)$$

for all $x \in \mathcal{X}$ and all $y \in \mathcal{Y}$. In the card example above, the suit of a card and the value of the card were independent, whereas in the die example $X$ and $Y$ were not.
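The definition can be tested directly by comparing every joint probability with the product of the marginals. The following sketch does this for the card and die examples; `is_independent` is a hypothetical helper, and a uniformly shuffled standard deck and a fair die are assumed.

```python
from fractions import Fraction
from itertools import product

suits = ["clubs", "diamonds", "hearts", "spades"]
values = ["two", "three", "four", "five", "six", "seven", "eight",
          "nine", "ten", "jack", "queen", "king", "ace"]

# Card example: all 52 (suit, value) pairs are equally probable.
cards = {(s, v): Fraction(1, 52) for s, v in product(suits, values)}

# Die example: X = parity (1 if odd), Y = 1 if at least 2 dots.
die = {(0, 1): Fraction(1, 2), (1, 0): Fraction(1, 6), (1, 1): Fraction(1, 3)}

def is_independent(joint):
    """Check p_{X,Y}(x, y) == p_X(x) * p_Y(y) for every pair (x, y)."""
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    # Pairs missing from the table are treated as having probability 0.
    p_X = {x: sum(joint.get((x, y), 0) for y in ys) for x in xs}
    p_Y = {y: sum(joint.get((x, y), 0) for x in xs) for y in ys}
    return all(joint.get((x, y), 0) == p_X[x] * p_Y[y]
               for x in xs for y in ys)

print(is_independent(cards), is_independent(die))
```

Note that the check runs over all pairs, including those with joint probability 0. In the die example the pair $(0, 0)$ alone is enough to rule out independence.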

We are also interested in the probabilities for $Y$ given that $X = x$ has occurred. If $p_X(x) > 0$, define the conditional probability of $Y = y$ given that $X = x$ to be

$$p_Y(y|x) = \frac{p_{X,Y}(x, y)}{p_X(x)}.$$

One way to think of this is that we have restricted to the set where $X = x$. This has total probability $p_X(x) = \sum_{y} p_{X,Y}(x, y)$. The fraction of this sum that comes from $Y = y$ is $p_Y(y|x)$.
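The "restrict, then renormalize" view translates directly into code. This sketch, again using the die example with a fair die assumed, keeps only the entries with $X = x$ and divides by their total; `conditional_Y_given_X` is a hypothetical helper name.

```python
from fractions import Fraction

# Joint distribution of the die example, keyed by (x, y).
p_XY = {
    (0, 0): Fraction(0),
    (0, 1): Fraction(1, 2),
    (1, 0): Fraction(1, 6),
    (1, 1): Fraction(1, 3),
}

def conditional_Y_given_X(joint, x):
    """Return the distribution y -> p_Y(y|x); requires p_X(x) > 0."""
    # Restrict to the set where X = x and compute its total probability.
    p_x = sum(p for (a, _), p in joint.items() if a == x)
    if p_x == 0:
        raise ValueError("p_X(x) must be positive")
    # The fraction of that total contributed by each y.
    return {y: p / p_x for (a, y), p in joint.items() if a == x}

given_odd = conditional_Y_given_X(p_XY, 1)
given_even = conditional_Y_given_X(p_XY, 0)
print(given_odd, given_even)
```

Given that the number of dots is even, the number is certainly at least 2, so all the conditional probability sits on $Y = 1$.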

Note that X and Y are independent if and only if

$$p_Y(y|x) = p_Y(y)$$

for all $y$ and all $x$ with $p_X(x) > 0$. In other words, the probability of $y$ is unaffected by what happens with $X$.

There is a nice way to go from the conditional probability of Y given X to the conditional probability of X given Y.

Bayes’s Theorem

If $p_X(x) > 0$ and $p_Y(y) > 0$, then

$$p_X(x|y) = \frac{p_X(x)\,p_Y(y|x)}{p_Y(y)}.$$

The proof consists of simply writing the conditional probabilities in terms of their definitions.
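Concretely, substituting the definition of $p_Y(y|x)$ into the right-hand side gives $p_{X,Y}(x, y)/p_Y(y)$, which is the definition of $p_X(x|y)$. As a numerical sanity check (not a proof), the sketch below compares both sides on the die example, assuming a fair die; the function names are illustrative.

```python
from fractions import Fraction

# Joint distribution of the die example, keyed by (x, y).
p_XY = {
    (0, 0): Fraction(0),
    (0, 1): Fraction(1, 2),
    (1, 0): Fraction(1, 6),
    (1, 1): Fraction(1, 3),
}
p_X = {x: sum(p for (a, _), p in p_XY.items() if a == x) for x in (0, 1)}
p_Y = {y: sum(p for (_, b), p in p_XY.items() if b == y) for y in (0, 1)}

def p_Y_given_X(y, x):
    """Conditional probability p_Y(y|x) from its definition."""
    return p_XY[(x, y)] / p_X[x]

def p_X_given_Y(x, y):
    """Conditional probability p_X(x|y) from its definition."""
    return p_XY[(x, y)] / p_Y[y]

def bayes(x, y):
    """Right-hand side of Bayes's theorem."""
    return p_X[x] * p_Y_given_X(y, x) / p_Y[y]

# Both marginals are positive here, so the theorem applies to every pair.
agree = all(p_X_given_Y(x, y) == bayes(x, y) for x in (0, 1) for y in (0, 1))
print(agree, bayes(1, 0))
```

The value `bayes(1, 0)` is the probability that the number is odd given that it is less than 2, which is 1 because the only such face is 1.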
