2.1 Nature of Bayesian inference

2.1.1 Preliminary remarks

In this section, a general framework for Bayesian statistical inference will be provided. In broad outline, we take prior beliefs about various possible hypotheses and then modify these prior beliefs in the light of relevant data which we have collected in order to arrive at posterior beliefs. (The reader may prefer to return to this section after reading Section 2.2, which deals with one of the simplest special cases of Bayesian inference.)

2.1.2 Posterior is prior times likelihood

Almost all of the situations we will think of in this book fit into the following pattern. Suppose that you are interested in the values of k unknown quantities

$$\theta_1, \theta_2, \ldots, \theta_k$$

(where k can be one or more than one) and that you have some a priori beliefs about their values which you can express in terms of the pdf

$$p(\theta) = p(\theta_1, \theta_2, \ldots, \theta_k)$$

Now suppose that you then obtain some data relevant to their values. More precisely, suppose that we have n observations

$$X_1, X_2, \ldots, X_n$$

which have a probability distribution that depends on these k unknown quantities as parameters, so that the pdf (continuous or discrete) of the vector X depends on the vector θ in a known way. Usually the components of θ and X will be integers or real numbers, so that the components of X are random variables, and so the dependence of X on θ can be expressed in terms of a pdf

$$p(X \mid \theta) = p(X_1, X_2, \ldots, X_n \mid \theta_1, \theta_2, \ldots, \theta_k)$$

You then want to find a way of expressing your beliefs about θ taking into account both your prior beliefs and the data. Of course, it is possible that your prior beliefs about θ may differ from mine, but very often we will agree on the way in which the data are related to θ [i.e. on the form of p(X|θ)]. If this is so, we will differ in our posterior beliefs (i.e. in our beliefs after we have obtained the data), but it will turn out that if we can collect enough data, then our posterior beliefs will usually become very close.

The basic tool we need is Bayes’ Theorem for random variables (generalized to deal with random vectors). From this theorem, we know that

$$p(\theta \mid X) \propto p(\theta)\, p(X \mid \theta)$$

Now we know that p(X|θ) considered as a function of X for fixed θ is a density, but we will find that we often want to think of it as a function of θ for fixed X. When we think of it in that way it does not have quite the same properties – for example, there is no reason why it should sum (or integrate) to unity. Thus, in the extreme case where p(X|θ) turns out not to depend on θ, it is easily seen that it can quite well sum (or integrate) to ∞. When we are thinking of p(X|θ) as a function of θ we call it the likelihood function. We sometimes write

$$l(\theta \mid X) = p(X \mid \theta)$$

Just as we sometimes write $p_X(x)$ to avoid ambiguity, if we really need to avoid ambiguity we write

$$l_{\theta \mid X}(\theta \mid X)$$

but this will not usually be necessary. Sometimes it is more natural to consider the log-likelihood function

$$L(\theta \mid X) = \log l(\theta \mid X)$$

With this definition and the definition of p(θ) as the prior pdf for θ and of p(θ|X) as the posterior pdf for θ given X, we may think of Bayes’ Theorem in the more memorable form

$$p(\theta \mid X) \propto p(\theta)\, l(\theta \mid X), \qquad \text{that is,} \qquad \text{posterior} \propto \text{prior} \times \text{likelihood}.$$

This relationship summarizes the way in which we should modify our beliefs in order to take into account the data we have available.
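
As a concrete illustration of this relationship, the following minimal sketch tabulates a prior, a likelihood and the resulting posterior on a grid for a single parameter. The Beta(2, 2) prior and the binomial data (7 successes in 10 trials) are hypothetical choices made purely for illustration; they are not drawn from the text.

```python
import numpy as np
from scipy.stats import beta, binom

# Grid of candidate values for a single parameter theta in (0, 1).
theta = np.linspace(0.001, 0.999, 999)
dtheta = theta[1] - theta[0]

# Hypothetical prior beliefs p(theta): a Beta(2, 2) density.
prior = beta.pdf(theta, 2, 2)

# Hypothetical data: 7 successes in 10 Bernoulli trials; p(X | theta)
# viewed as a function of theta is the likelihood l(theta | X).
likelihood = binom.pmf(7, 10, theta)

# Posterior is proportional to prior times likelihood; normalize on the grid.
unnormalized = prior * likelihood
posterior = unnormalized / (unnormalized.sum() * dtheta)

print("posterior mean ≈", (theta * posterior).sum() * dtheta)
```

The only Bayesian step is the product `prior * likelihood`; the remaining lines merely normalize a density tabulated on a grid.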

2.1.3 Likelihood can be multiplied by any constant

Note that because of the way we write Bayes’ Theorem with a proportionality sign, it does not alter the result if we multiply l(θ|X) by any constant or indeed more generally by anything which is a function of X alone. Accordingly, we can regard the definition of the likelihood as being any constant multiple of p(X|θ) rather than necessarily equalling p(X|θ) (and similarly the log-likelihood is undetermined up to an additive constant). Sometimes the integral

$$\int l(\theta \mid X)\, d\theta$$

(interpreted as a multiple integral if k > 1 and interpreted as a summation or multiple summation in the discrete case), taken over the admissible range of θ, is finite, although we have already noted that this is not always the case. When it is, it is occasionally convenient to refer to the quantity

$$\frac{l(\theta \mid X)}{\int l(\theta \mid X)\, d\theta}$$

We shall call this the standardized likelihood, that is, the likelihood scaled so that it integrates to unity and can thus be thought of as a density.
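
To see numerically that a constant multiple of the likelihood changes nothing, the sketch below (reusing the hypothetical grid and binomial data from the earlier example) checks that the standardized likelihood is unaffected when the likelihood is scaled by an arbitrary factor.

```python
import numpy as np
from scipy.stats import binom

# Same hypothetical grid and data as in the earlier sketch.
theta = np.linspace(0.001, 0.999, 999)
dtheta = theta[1] - theta[0]
likelihood = binom.pmf(7, 10, theta)

def standardized(lik, dtheta):
    """Scale a likelihood tabulated on a grid so that it integrates to one."""
    return lik / (lik.sum() * dtheta)

# Multiplying the likelihood by any constant (or anything depending on X
# alone) leaves the standardized likelihood unchanged.
assert np.allclose(standardized(likelihood, dtheta),
                   standardized(42.0 * likelihood, dtheta))
```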

2.1.4 Sequential use of Bayes’ Theorem

It should also be noted that the method can be applied sequentially. Thus, if you have an initial sample of observations X, you have

$$p(\theta \mid X) \propto p(\theta)\, l(\theta \mid X)$$

Now suppose that you have a second set of observations Y distributed independently of the first sample. Then

$$p(\theta \mid X, Y) \propto p(\theta)\, p(X, Y \mid \theta)$$

But independence implies

$$p(X, Y \mid \theta) = p(X \mid \theta)\, p(Y \mid \theta)$$

from which it is obvious that

$$p(\theta \mid X, Y) \propto p(\theta)\, l(\theta \mid X)\, l(\theta \mid Y)$$

and hence

$$p(\theta \mid X, Y) \propto p(\theta \mid X)\, l(\theta \mid Y)$$

So we can find your posterior for θ given X and Y by treating your posterior given X as the prior for the observation Y. This formula will work irrespective of the temporal order in which X and Y are observed, and this fact is one of the advantages of the Bayesian approach.
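
The following sketch checks this numerically on a grid, again with a hypothetical Beta(2, 2) prior and two hypothetical binomial samples chosen only for illustration: updating on X and Y together gives the same posterior as updating on X first and then treating that posterior as the prior for Y.

```python
import numpy as np
from scipy.stats import beta, binom

theta = np.linspace(0.001, 0.999, 999)
dtheta = theta[1] - theta[0]

def normalize(density):
    """Rescale an unnormalized density tabulated on the grid."""
    return density / (density.sum() * dtheta)

prior = beta.pdf(theta, 2, 2)      # hypothetical prior p(theta)
lik_x = binom.pmf(7, 10, theta)    # likelihood from a first sample X
lik_y = binom.pmf(3, 5, theta)     # likelihood from an independent sample Y

# All the data at once: p(theta | X, Y) is proportional to
# p(theta) l(theta | X) l(theta | Y).
batch = normalize(prior * lik_x * lik_y)

# Sequentially: the posterior given X serves as the prior for Y.
posterior_x = normalize(prior * lik_x)
sequential = normalize(posterior_x * lik_y)

assert np.allclose(batch, sequential)
```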

2.1.5 The predictive distribution

Occasionally (e.g. when we come to consider Bayesian decision theory and empirical Bayes methods), we need to consider the marginal distribution

$$p(X) = \int p(\theta)\, p(X \mid \theta)\, d\theta$$

which is called the predictive distribution of X, since it represents our current predictions of the value of X taking into account both the uncertainty about the value of θ and the residual uncertainty about X when θ is known.
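
As a rough numerical illustration, the sketch below approximates this integral on a grid by averaging the sampling distribution p(X|θ) over the hypothetical Beta(2, 2) prior used in the earlier examples; here X is taken to be the number of successes in 10 future Bernoulli trials, again purely for illustration.

```python
import numpy as np
from scipy.stats import beta, binom

theta = np.linspace(0.001, 0.999, 999)
dtheta = theta[1] - theta[0]
prior = beta.pdf(theta, 2, 2)      # hypothetical prior p(theta), as before

# Predictive probability of each possible count x in 10 future trials:
# p(x) = integral of p(theta) p(x | theta) d(theta), approximated on the grid.
x = np.arange(11)
predictive = np.array([(prior * binom.pmf(k, 10, theta)).sum() * dtheta
                       for k in x])

print(predictive.round(3), "sum ≈", predictive.sum().round(3))
```

The printed probabilities sum to (approximately) one, as a marginal distribution for X must.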

One valuable use of the predictive distribution is in checking your underlying assumptions. If, for example, p(X) turns out to be small (in some sense) for the observed value of X, it might suggest that the form of the likelihood you have adopted was suspect. Some people have suggested that another thing you might re-examine in such a case is the prior distribution you have adopted, although there are logical difficulties about this if p(θ) just represents your prior beliefs. It might, however, be the case that seeing an observation the possibility of which you had rather neglected causes you to think more fully and thus bring out beliefs which were previously lurking below the surface.

There are actually two cases in which we might wish to consider a distribution for X taking into account both the uncertainty about the value of θ and the residual uncertainty about X when θ is known, depending on whether the distribution for θ under consideration does or does not take into account some current observations. Some authors reserve the term ‘predictive distribution’ for the former case and use the term preposterior distribution in cases where we do not yet have any observations to take into account. In this book, the term ‘predictive distribution’ is used in both cases.

2.1.6 A warning

The theory described earlier relies on the possibility of specifying the likelihood as a function, or equivalently on being able to specify the density p(X|θ) of the observations X save for the fact that the k parameters $\theta_1, \theta_2, \ldots, \theta_k$ are unknown. It should be borne in mind that these assumptions about the form of the likelihood may be unjustified, and a blind following of the procedure described earlier can never lead to their being challenged (although the point made earlier in connection with the predictive distribution can be of help). It is all too easy to adopt a model because of its convenience and to neglect the absence of evidence for it.
