1.4 Several random variables
1.4.1 Two discrete random variables
Suppose that with each elementary event ω in Ω, we can associate a pair of integers $m = \tilde{m}(\omega)$ and $n = \tilde{n}(\omega)$. We write
$$p(m, n) = P(\tilde{m} = m,\ \tilde{n} = n).$$
Strictly speaking, p(m, n) should be written as $p_{\tilde{m},\tilde{n}}(m, n)$ for reasons discussed earlier, but this degree of pedantry in the notation is rarely necessary. Clearly
$$p(m, n) \ge 0 \qquad \text{and} \qquad \sum_m \sum_n p(m, n) = 1.$$
The sequence (p(m, n)) is said to be a bivariate (probability) density (function) or bivariate pdf and is called the joint pdf of the random variables m and n (strictly $\tilde{m}$ and $\tilde{n}$). The corresponding joint distribution function, joint cdf or joint df is
$$F(m, n) = P(\tilde{m} \le m,\ \tilde{n} \le n) = \sum_{m' \le m} \sum_{n' \le n} p(m', n').$$
Clearly, the density of m (called its marginal density) is
$$p(m) = \sum_n p(m, n).$$
We can also define a conditional distribution for n given m (strictly for $\tilde{n}$ given $\tilde{m}$) by allowing
$$p(n \mid m) = \frac{p(m, n)}{p(m)} \qquad (\text{provided } p(m) \ne 0)$$
to define the conditional (probability) density (function) or conditional pdf. This represents our judgement as to the chance that $\tilde{n}$ takes the value n given that $\tilde{m}$ is known to have the value m. If it is necessary to make our notation absolutely precise, we can always write
$$p_{\tilde{n} \mid \tilde{m}}(n \mid m)$$
so, for example, $p_{\tilde{m} \mid \tilde{n}}(4 \mid 3)$ is the probability that $\tilde{m}$ is 4 given $\tilde{n}$ is 3, whereas $p_{\tilde{n} \mid \tilde{m}}(4 \mid 3)$ is the probability that $\tilde{n}$ is 4 given that $\tilde{m}$ takes the value 3, but it should be emphasized that we will not often need to use the subscripts. Evidently
$$\sum_n p(n \mid m) = 1$$
and
$$p(m, n) = p(n \mid m)\, p(m).$$
We can also define a conditional distribution function or conditional df by
$$F(n \mid m) = P(\tilde{n} \le n \mid \tilde{m} = m) = \sum_{n' \le n} p(n' \mid m).$$
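Because everything in the discrete case reduces to finite sums, these definitions translate directly into a few lines of code. The following Python fragment is a minimal illustrative sketch; the joint pdf is invented for the purpose, not taken from the text:

```python
# Hypothetical joint pdf p(m, n) on {0, 1} x {0, 1, 2}, stored as a dictionary.
p_joint = {
    (0, 0): 0.10, (0, 1): 0.20, (0, 2): 0.10,
    (1, 0): 0.15, (1, 1): 0.25, (1, 2): 0.20,
}
assert abs(sum(p_joint.values()) - 1.0) < 1e-12  # probabilities sum to one

def p_m(m):
    """Marginal density p(m) = sum over n of p(m, n)."""
    return sum(p for (mm, _), p in p_joint.items() if mm == m)

def p_n_given_m(n, m):
    """Conditional density p(n | m) = p(m, n) / p(m), provided p(m) != 0."""
    return p_joint.get((m, n), 0.0) / p_m(m)

print(p_m(0))             # 0.40
print(p_n_given_m(1, 0))  # 0.20 / 0.40 = 0.50

# For each fixed m, the conditional probabilities over n sum to one.
for m in (0, 1):
    assert abs(sum(p_n_given_m(n, m) for n in (0, 1, 2)) - 1.0) < 1e-12
```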
1.4.2 Two continuous random variables
As in Section 1.3, we have begun by restricting ourselves to integer values, which is more or less enough to deal with any discrete cases that arise. More generally, we can suppose that with each elementary event ω in Ω, we can associate a pair of real numbers $x = \tilde{x}(\omega)$ and $y = \tilde{y}(\omega)$. In this case, we define the joint distribution function or joint df as
$$F(x, y) = P(\tilde{x} \le x,\ \tilde{y} \le y).$$
Clearly the df of x is
$$F(x) = P(\tilde{x} \le x) = \lim_{y \to \infty} F(x, y)$$
and that of y is
$$F(y) = P(\tilde{y} \le y) = \lim_{x \to \infty} F(x, y).$$
It is usually the case that when neither x nor y is discrete there is a function p(x, y) such that
$$F(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} p(\xi, \eta)\, \mathrm{d}\eta\, \mathrm{d}\xi,$$
in which case p(x, y) is called a joint (probability) density (function) or joint pdf. When this is so, the joint distribution is said to be continuous (or more strictly to be absolutely continuous). We can find the density from the df by
$$p(x, y) = \frac{\partial^2 F(x, y)}{\partial x\, \partial y}.$$
Clearly,
$$p(x) = \int p(x, y)\, \mathrm{d}y$$
and
$$p(y) = \int p(x, y)\, \mathrm{d}x.$$
The last formula is the continuous analogue of
$$p(n) = \sum_m p(m, n)$$
in the discrete case.
By analogy with the discrete case, we define the conditional density of y given x (strictly of $\tilde{y}$ given $\tilde{x}$) as
$$p(y \mid x) = \frac{p(x, y)}{p(x)}$$
provided $p(x) \ne 0$. We can then define the conditional distribution function by
$$F(y \mid x) = P(\tilde{y} \le y \mid \tilde{x} = x) = \int_{-\infty}^{y} p(\eta \mid x)\, \mathrm{d}\eta.$$
There are difficulties in the notion of conditioning on the event that $\tilde{x} = x$, because this event has probability zero for every x in the continuous case, and it can help to regard the above distribution as the limit of the distribution which results from conditioning on the event that $\tilde{x}$ is between x and $x + \delta x$, that is
$$P(\tilde{y} \le y \mid x \le \tilde{x} \le x + \delta x),$$
as $\delta x \to 0$.
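As a quick numerical sanity check on these formulas, here is a short Python sketch; the joint density p(x, y) = x + y on the unit square is an invented example, chosen only because it integrates to one:

```python
import numpy as np
from scipy.integrate import trapezoid

p = lambda x, y: x + y           # joint density on 0 <= x, y <= 1
ys = np.linspace(0.0, 1.0, 2001)

def marginal(x):
    """p(x) = integral of p(x, y) dy, computed by the trapezoidal rule."""
    return trapezoid(p(x, ys), ys)

x0 = 0.3
print(marginal(x0))              # ~0.8, matching the exact marginal x + 1/2

# The conditional density p(y | x) = p(x, y) / p(x) integrates to one in y.
cond = p(x0, ys) / marginal(x0)
print(trapezoid(cond, ys))       # ~1.0
```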
1.4.3 Bayes’ Theorem for random variables
It is worth noting that conditioning the random variable y by the value of x does not change the relative sizes of the probabilities of those pairs (x, y) that can still occur. That is to say, the probability p(y|x) is proportional to p(x, y) and the constant of proportionality is just what is needed so that the conditional probabilities sum or integrate to unity. Thus,
$$p(y \mid x) \propto p(x, y).$$
Moreover,
$$p(x, y) = p(x)\, p(y \mid x) = p(y)\, p(x \mid y).$$
It is clear that
$$p(y \mid x) = \frac{p(x, y)}{p(x)} = \frac{p(y)\, p(x \mid y)}{p(x)},$$
so that
$$p(y \mid x) \propto p(y)\, p(x \mid y).$$
This is, of course, a form of Bayes’ Theorem, and is in fact the commonest way in which it occurs in this book. Note that it applies equally well if the variables x and y are continuous or if they are discrete. The constant of proportionality is
$$\frac{1}{p(x)} = \left\{ \int p(y)\, p(x \mid y)\, \mathrm{d}y \right\}^{-1}$$
in the continuous case or
$$\frac{1}{p(x)} = \left\{ \sum_y p(y)\, p(x \mid y) \right\}^{-1}$$
in the discrete case.
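The mechanics are easy to see in a small discrete computation. The following Python sketch uses invented numbers: y is one of two hypotheses with prior p(y), and x is a single observation with known likelihoods p(x | y):

```python
prior = {"A": 0.7, "B": 0.3}          # p(y)
likelihood = {"A": 0.2, "B": 0.9}     # p(x | y) for the observed x

# Bayes' Theorem: p(y | x) is proportional to p(y) p(x | y).
unnormalized = {y: prior[y] * likelihood[y] for y in prior}
constant = sum(unnormalized.values())  # p(x) = sum over y of p(y) p(x | y)
posterior = {y: q / constant for y, q in unnormalized.items()}

print(constant)   # 0.41
print(posterior)  # {'A': 0.341..., 'B': 0.658...}
```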
1.4.4 Example
A somewhat artificial example of the use of this formula in the continuous case is as follows. Suppose y is the time before the first occurrence of a radioactive decay which is measured by an instrument, but that, because there is a delay built into the mechanism, the decay is recorded as having taken place at a time x > y. We actually have the value of x, but would like to say what we can about the value of y on the basis of this knowledge. We might, for example, have
$$p(y) = \mathrm{e}^{-y} \quad (0 < y < \infty), \qquad p(x \mid y) = \mathrm{e}^{-(x - y)} \quad (y < x < \infty).$$
Then
$$p(y \mid x) \propto p(y)\, p(x \mid y) = \mathrm{e}^{-y}\, \mathrm{e}^{-(x - y)} = \mathrm{e}^{-x} \propto 1 \qquad (0 < y < x).$$
Often we will find that it is enough to get a result up to a constant of proportionality, but if we need the constant, it is very easy to find it because we know that the integral (or the sum in the discrete case) must be one. Thus, in this case, the constant density on the interval (0, x) must be
$$p(y \mid x) = 1/x \qquad (0 < y < x),$$
so that, given the recorded time x, the actual decay time y is uniformly distributed over (0, x).
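This conclusion is easy to check by simulation. Here is a Monte Carlo sketch (my own, not from the text): draw y from a standard exponential distribution, add an independent standard exponential delay to get x, and look at y among samples where x is close to a fixed value x0; those values should be roughly uniform on (0, x0):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
y = rng.exponential(1.0, n)        # true decay time, p(y) = exp(-y)
x = y + rng.exponential(1.0, n)    # recorded time, p(x | y) = exp(-(x - y))

x0, eps = 2.0, 0.01                # condition on x in (x0 - eps, x0 + eps)
y_cond = y[np.abs(x - x0) < eps]

# Uniform(0, x0) has mean x0/2 = 1.0 and variance x0**2/12 = 0.333...
print(y_cond.mean(), y_cond.var())
```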
1.4.5 One discrete variable and one continuous variable
We also encounter cases where we have two random variables, one of which is continuous and one of which is discrete. All the aforementioned definitions and formulae extend in an obvious way to such a case provided we are careful, for example, to use integration for continuous variables but summation for discrete variables. In particular, the formulation
$$p(y \mid x) \propto p(y)\, p(x \mid y)$$
for Bayes’ Theorem is valid in such a case.
It may help to consider an example (again a somewhat artificial one). Suppose k is the number of successes in n Bernoulli trials, so $k \sim \mathrm{B}(n, \pi)$, but that the value of π is unknown, your beliefs about it being uniformly distributed over the interval [0, 1] of possible values, so that $p(\pi) = 1$ for $0 \le \pi \le 1$. Then
$$p(\pi, k) = p(\pi)\, p(k \mid \pi) = \binom{n}{k} \pi^k (1 - \pi)^{n - k} \qquad (0 \le \pi \le 1;\ k = 0, 1, \dots, n),$$
so that
$$p(\pi \mid k) \propto \pi^k (1 - \pi)^{n - k}.$$
The constant can be found by integration if it is required. Alternatively, a glance at Appendix A will show that, given k, π has a beta distribution
$$\pi \mid k \sim \mathrm{Be}(k + 1,\ n - k + 1)$$
and that the constant of proportionality is the reciprocal of the beta function $B(k + 1, n - k + 1)$. Thus, this beta distribution should represent your beliefs about π after you have observed k successes in n trials. This example has a special importance in that it is the one which Bayes himself discussed.
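To make this concrete, here is a small numerical check in Python (the particular n and k are invented for illustration): normalizing the product of a uniform prior and the binomial likelihood numerically does reproduce the Be(k + 1, n − k + 1) density:

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.stats import beta, binom

n, k = 10, 7
pis = np.linspace(0.0, 1.0, 2001)

# Unnormalized posterior: uniform prior (constant) times binomial likelihood.
unnorm = binom.pmf(k, n, pis)
posterior = unnorm / trapezoid(unnorm, pis)  # normalize numerically

# Largest pointwise discrepancy from the Be(k + 1, n - k + 1) density.
print(np.max(np.abs(posterior - beta.pdf(pis, k + 1, n - k + 1))))  # small
```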
1.4.6 Independent random variables
The idea of independence extends from independence of events to independence of random variables. The basic idea is that y is independent of x if being told that x has any particular value does not affect your beliefs about the value of y. Because of complications involving events of probability zero, it is best to adopt the formal definition that x and y are independent if
$$p(x, y) = p(x)\, p(y)$$
for all values x and y. This definition works equally well in the discrete and the continuous cases (and indeed in the case where one random variable is continuous and the other is discrete). It trivially suffices that p(x, y) be a product of a function of x and a function of y.
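In the discrete case this definition can be checked directly, as in the following sketch (the joint pmf is invented and happens to factorize):

```python
# Compute the marginals from a given joint pmf and test whether
# p(x, y) = p(x) p(y) holds for every pair (x, y).
p_joint = {(0, 0): 0.10, (0, 1): 0.30, (1, 0): 0.15, (1, 1): 0.45}

p_x = {x: sum(p for (xx, _), p in p_joint.items() if xx == x) for x in (0, 1)}
p_y = {y: sum(p for (_, yy), p in p_joint.items() if yy == y) for y in (0, 1)}

independent = all(
    abs(p - p_x[x] * p_y[y]) < 1e-12 for (x, y), p in p_joint.items()
)
print(independent)  # True: this joint pmf factorizes as p(x) p(y)
```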
All the above generalizes in a fairly obvious way to the case of more than two random variables, and the notions of pairwise and mutual independence go through from events to random variables easily enough. However, we will find that we do not often need such generalizations.