In this section we briefly introduce the concepts from probability needed for what follows. An understanding of probability and the various identities that arise is essential for the development of entropy.
Consider an experiment with possible outcomes in a finite set $\mathcal{X}$. For example, the experiment could be flipping a fair coin, with $\mathcal{X} = \{\text{heads}, \text{tails}\}$. We assume each outcome is assigned a probability. In the present example, $P(\text{heads}) = 1/2$ and $P(\text{tails}) = 1/2$. Often, the outcome of an experiment is called a random variable.
In general, for each $x \in \mathcal{X}$, denote the probability that $X = x$ by
$$p_X(x) = P(X = x).$$
Note that $\sum_{x \in \mathcal{X}} p_X(x) = 1$. If $A \subseteq \mathcal{X}$, let
$$P(A) = \sum_{x \in A} p_X(x),$$
which is the probability that $X$ takes a value in $A$.
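As a quick sanity check, the definitions above can be sketched in a few lines of Python for the coin-flip example (the names `p` and `prob_of_event` are illustrative, not from the text):

```python
# A fair coin flip, with the probability of each outcome stored in a
# dictionary (`p` and `prob_of_event` are illustrative names).
p = {"heads": 0.5, "tails": 0.5}

def prob_of_event(p, A):
    """P(A): the probability that the outcome lies in the subset A."""
    return sum(p[x] for x in A)

# The full outcome set has probability 1.
total = prob_of_event(p, {"heads", "tails"})
```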
Often one performs an experiment in which one measures several different quantities. These may or may not be related, but they may be lumped together to form a new random event. For example, if we have two random events $X$ and $Y$ with possible outcomes $\mathcal{X}$ and $\mathcal{Y}$, respectively, then we may create a new random event $Z = (X, Y)$ that groups the two events together. In this case, $Z$ has the set of possible outcomes $\mathcal{X} \times \mathcal{Y}$, and is sometimes called a joint random variable.
Draw a card from a standard deck. Let $X$ be the suit of the card, so $X \in \{\clubsuit, \diamondsuit, \heartsuit, \spadesuit\}$. Let $Y$ be the value of the card, so $Y \in \{A, 2, 3, \dots, 10, J, Q, K\}$. Then $Z = (X, Y)$ gives the 52 possibilities for the card. Note that if $x$ is a suit and $y$ is a value, then $P(Z = (x, y))$ is simply the probability that the card drawn has suit $x$ and value $y$. Since all cards are equally probable, this probability is $1/52$, which is the probability that $X = x$ (namely $1/4$) times the probability that $Y = y$ (namely $1/13$). As we discuss later, this means $X$ and $Y$ are independent.
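The factoring of probabilities in the card example can be checked by brute force over the whole deck; this sketch uses `fractions.Fraction` for exact arithmetic (the variable names are illustrative):

```python
from fractions import Fraction
from itertools import product

# Suits and values of a standard deck.
suits = ["clubs", "diamonds", "hearts", "spades"]
values = ["A", "2", "3", "4", "5", "6",
          "7", "8", "9", "10", "J", "Q", "K"]

# All 52 cards, each equally likely.
deck = list(product(suits, values))
p_joint = {card: Fraction(1, 52) for card in deck}

# Marginals: 13 cards per suit, 4 cards per value.
p_suit = Fraction(1, 4)
p_value = Fraction(1, 13)

# Independence: the joint probability factors for every card.
factors = all(p_joint[card] == p_suit * p_value for card in deck)
```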
Roll a die. Suppose we are interested in two things: whether the number of dots is odd and whether the number is at least 2. Let $X = 0$ if the number of dots is even and $X = 1$ if the number of dots is odd. Let $Y = 0$ if the number of dots is less than 2 and $Y = 1$ if the number of dots is at least 2. Then $Z = (X, Y)$ gives us the results of both experiments together. Note that the probability that the number of dots is odd and less than 2 is $P(Z = (1, 0)) = 1/6$, since only the face with one dot qualifies. This is not equal to $P(X = 1)\,P(Y = 0)$, which is $(1/2)(1/6) = 1/12$. This means that $X$ and $Y$ are not independent. As we'll see, this is closely related to the fact that knowing $Y$ gives us information about $X$.
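The die example can be verified the same way; this sketch assumes a fair die, and the function names `X` and `Y` mirror the random variables above:

```python
from fractions import Fraction

faces = range(1, 7)  # a fair die; each face has probability 1/6

def X(n):
    """1 if the number of dots is odd, 0 if even."""
    return n % 2

def Y(n):
    """0 if fewer than 2 dots, 1 if at least 2."""
    return 0 if n < 2 else 1

# P(X = 1, Y = 0): only the face with one dot is odd and less than 2.
p_joint = Fraction(sum(1 for n in faces if X(n) == 1 and Y(n) == 0), 6)

# P(X = 1) * P(Y = 0) = (1/2)(1/6) = 1/12, which differs from 1/6.
p_prod = (Fraction(sum(1 for n in faces if X(n) == 1), 6)
          * Fraction(sum(1 for n in faces if Y(n) == 0), 6))
```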
We denote
$$p_{X,Y}(x, y) = P(X = x,\, Y = y).$$
Note that we can recover the probability that $X = x$ as
$$p_X(x) = \sum_{y \in \mathcal{Y}} p_{X,Y}(x, y).$$
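Recovering a marginal from a joint distribution can be sketched as follows, using the joint distribution of the die example (only pairs with nonzero probability are stored; `marginal_X` is an illustrative name):

```python
from fractions import Fraction

# Joint distribution from the die example; pairs with probability 0
# (such as (0, 0): even and fewer than 2 dots) are omitted.
p_XY = {
    (1, 0): Fraction(1, 6),  # face 1: odd, fewer than 2 dots
    (0, 1): Fraction(3, 6),  # faces 2, 4, 6: even, at least 2 dots
    (1, 1): Fraction(2, 6),  # faces 3, 5: odd, at least 2 dots
}

def marginal_X(joint, x):
    """Recover P(X = x) by summing the joint probabilities over all y."""
    return sum(p for (xx, _), p in joint.items() if xx == x)
```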
We say that two random events $X$ and $Y$ are independent if
$$p_{X,Y}(x, y) = p_X(x)\, p_Y(y)$$
for all $x \in \mathcal{X}$ and all $y \in \mathcal{Y}$. In the preceding example, the suit of a card and the value of the card were independent.
We are also interested in the probabilities for $X$ given that $Y = y$ has occurred. If $p_Y(y) \neq 0$, define the conditional probability of $X = x$ given that $Y = y$ to be
$$p_{X\mid Y}(x \mid y) = \frac{p_{X,Y}(x, y)}{p_Y(y)}.$$
One way to think of this is that we have restricted to the set of outcomes where $Y = y$. This set has total probability $p_Y(y) = \sum_{x \in \mathcal{X}} p_{X,Y}(x, y)$. The fraction of this sum that comes from $X = x$ is $p_{X\mid Y}(x \mid y)$.
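A minimal sketch of this computation, again on the die example (`conditional` is an illustrative name):

```python
from fractions import Fraction

# Joint distribution from the die example.
p_XY = {
    (1, 0): Fraction(1, 6),
    (0, 1): Fraction(3, 6),
    (1, 1): Fraction(2, 6),
}

def conditional(joint, x, y):
    """P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y)."""
    p_y = sum(p for (_, yy), p in joint.items() if yy == y)
    return joint.get((x, y), Fraction(0)) / p_y

# Knowing the roll shows fewer than 2 dots forces the face to be 1,
# which is odd, so conditional(p_XY, 1, 0) equals 1.
```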
Note that $X$ and $Y$ are independent if and only if
$$p_{X\mid Y}(x \mid y) = p_X(x)$$
for all $x$ and all $y$ with $p_Y(y) \neq 0$. In other words, the probability of $X = x$ is unaffected by what happens with $Y$.
There is a nice way to go from the conditional probability of $X$ given $Y$ to the conditional probability of $Y$ given $X$, known as Bayes's theorem. If $p_X(x) \neq 0$ and $p_Y(y) \neq 0$, then
$$p_{Y\mid X}(y \mid x) = \frac{p_{X\mid Y}(x \mid y)\, p_Y(y)}{p_X(x)}.$$
The proof consists of simply writing the conditional probabilities in terms of their definitions.
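Concretely, the chain of equalities is:

```latex
p_{Y\mid X}(y \mid x)
  = \frac{p_{X,Y}(x, y)}{p_X(x)}
  = \frac{p_{X\mid Y}(x \mid y)\, p_Y(y)}{p_X(x)},
```

where the second equality uses $p_{X,Y}(x, y) = p_{X\mid Y}(x \mid y)\, p_Y(y)$, which is just the definition of $p_{X\mid Y}(x \mid y)$ rearranged.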