7.1 The likelihood principle

7.1.1 Introduction

This section would logically come much earlier in the book than it is placed, but it is important to have some examples of Bayesian procedures firmly in place before considering this material. The basic result is due to Birnbaum (1962), and a more detailed consideration of these issues can be found in Berger and Wolpert (1988).

The nub of the argument here is that in drawing any conclusion from an experiment only the actual observation x made (and not the other possible outcomes that might have occurred) is relevant. This is in contrast to methods by which, for example, a null hypothesis is rejected because the probability of a value as large as or larger than that actually observed is small, an approach which leads to Jeffreys’ criticism that was mentioned in Section 4.1 when we first considered hypothesis tests, namely, that ‘a hypothesis that may be true may be rejected because it has not predicted observable results that have not occurred’. Virtually all of the ideas discussed in this book abide by this principle, which is known as the likelihood principle (there are some exceptions, for example Jeffreys’ rule is not in accordance with it). We shall show that it follows from two other principles, called the conditionality principle and the sufficiency principle, both of which are hard to argue against.

In this section, we shall write x for a particular piece of data, not necessarily one-dimensional, the density p(x|θ) of which depends on an unknown parameter θ. For simplicity, we shall suppose that x and θ are discrete (although they may be more than one-dimensional). The triple E = {x̃, θ, p(x|θ)} represents the essential features of an experiment to gain information about θ, and accordingly we shall refer to E as an experiment. Note that the random variable x̃ is a feature of the experiment, not the particular value x that may be observed when the experiment is carried out on a particular occasion. If such an experiment is carried out and the value x is observed, then we shall write Ev(E, x) for the evidence provided about the value of θ by carrying out experiment E and observing the value x. This ‘evidence’ is not presumed to be in any particular form. To a Bayesian, it would normally be the posterior distribution of θ or some feature of it, but for the moment we are not restricting ourselves to Bayesian inference, and a classical statistician might consider evidence to be made up of significance levels, confidence intervals and the like, while the notation does not rule out some form of evidence that would be new to both.

For example, you might be interested in the proportion θ of defective articles coming from a factory. A possible experiment E would consist in observing a fixed number n of articles chosen at random and noting the number x that are defective, so that p(x|θ) is a family of binomial B(n, θ) densities. To have a definite experiment, it is necessary to give n a specific value, for example, n = 100; once n is known, E is fully determined. If we then observe that x = 3, then Ev(E, 3) denotes the conclusions we arrive at about the value of θ.
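For a Bayesian, Ev(E, 3) would typically be the posterior distribution of θ given x = 3. The following Python fragment is a minimal numerical sketch of that idea; the uniform prior and the grid approximation are assumptions made here for illustration, not part of the discussion above.

```python
from math import comb

# Experiment E: observe the number x of defectives among n = 100 articles,
# so p(x | theta) = C(n, x) * theta^x * (1 - theta)^(n - x).
n, x = 100, 3

# Grid of candidate values for theta (the proportion defective).
grid = [i / 1000 for i in range(1, 1000)]

# Likelihood of each theta given the observation x = 3.
likelihood = [comb(n, x) * th**x * (1 - th)**(n - x) for th in grid]

# Under a uniform prior, the grid-approximated posterior is just the
# normalized likelihood; to a Bayesian this posterior is Ev(E, 3).
total = sum(likelihood)
posterior = [lik / total for lik in likelihood]

# The posterior mode is near x / n = 0.03.
print(grid[posterior.index(max(posterior))])
```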

7.1.2 The conditionality principle

The conditionality principle can be explained as the assertion that if you have decided which of two experiments you performed by tossing a coin, then once you tell me the end result of the experiment, it will not make any difference to any inferences I make about an unknown parameter θ whether or not I know which way the coin landed and hence which experiment was actually performed (assuming that the probability of the coin’s landing ‘heads’ does not in any way depend on θ). For example, if we are told that an analyst has reported on the chemical composition of a sample, then it is irrelevant whether we had always intended to ask him or her to analyze the sample or had tossed a coin to decide whether to ask that scientist or the one in the laboratory next door to analyze it. Put this way, the principle should seem plausible, and we shall now try to formalize it.

We first need to define a mixed experiment. Suppose that there are two experiments, E1 = {ỹ, θ, p1(y|θ)} and E2 = {z̃, θ, p2(z|θ)}, and that the random variable k is such that P(k = 1) = P(k = 2) = ½, whatever θ is and independently of y and z. Then the mixed experiment E* consists of carrying out E1 if k = 1 and E2 if k = 2. It can also be defined as the triple E* = {x̃, θ, p(x|θ)} where

x = (1, y) if k = 1  or  x = (2, z) if k = 2

and

p(x|θ) = ½ p1(y|θ) if x = (1, y)
p(x|θ) = ½ p2(z|θ) if x = (2, z).

We only need to assume the following rather weak form of the principle:

Weak conditionality principle. If E1, E2 and E* are as defined earlier, then

Ev{E*, (1, y)} = Ev(E1, y)  and  Ev{E*, (2, z)} = Ev(E2, z),

that is, the evidence about θ from E* is just the evidence from the experiment actually performed.
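As a small numerical check of this statement, the sketch below compares the posterior obtained from a component experiment with the posterior obtained from the corresponding mixed experiment; the particular binomial experiments, the uniform prior and the grid approximation are all assumptions made for illustration. The factor ½ contributed by the coin toss cancels when the posterior is normalized, so (for a Bayesian at least) the mixed experiment and the experiment actually performed give the same evidence.

```python
from math import comb

# Hypothetical component experiments: E1 observes y ~ B(10, theta) and
# E2 observes z ~ B(25, theta).  The mixed experiment E* tosses a fair coin
# (independently of theta) and then performs E1 or E2, so that
# p((1, y) | theta) = (1/2) * p1(y | theta).
grid = [i / 1000 for i in range(1, 1000)]

def posterior(likelihood):
    """Grid posterior under a uniform prior: the normalized likelihood."""
    total = sum(likelihood)
    return [lik / total for lik in likelihood]

y = 7  # suppose the coin gave k = 1 and E1 then yielded y = 7

# Evidence (posterior) from E1 considered on its own ...
post_e1 = posterior([comb(10, y) * th**y * (1 - th)**(10 - y) for th in grid])

# ... and from the mixed experiment E*, whose density carries the factor 1/2.
post_mixed = posterior([0.5 * comb(10, y) * th**y * (1 - th)**(10 - y)
                        for th in grid])

# The constant 1/2 cancels on normalization, so the two posteriors agree,
# exactly as the weak conditionality principle requires.
print(max(abs(a - b) for a, b in zip(post_e1, post_mixed)))  # 0.0 up to rounding
```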

7.1.3 The sufficiency principle

The sufficiency principle says that if t(x) is sufficient for θ given x, then any inference we may make about θ may be based on the value of t, and once we know t we have no need of the value of x. We have already seen in Section 2.9 that Bayesian inference satisfies the sufficiency principle. The form in which the sufficiency principle will be used in this section is as follows:

7.1.3.1 Weak sufficiency principle

Consider the experiment E = {x̃, θ, p(x|θ)} and suppose that t = t(x) is sufficient for θ given x. Then if t(x1) = t(x2)

Ev(E, x1) = Ev(E, x2).

This clearly implies that, as stated in Corollary 2.1 in Section 2.9.3, ‘For any prior distribution, the posterior distribution of θ given x is the same as the posterior distribution of θ given a sufficient statistic t’. In Bayesian statistics, inference is based on the posterior distribution, but this principle makes it clear that even if we had some other method of arriving at conclusions, x1 and x2 would still lead to the same conclusions.
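A quick numerical illustration may help; the Bernoulli sequences and the uniform prior below are assumptions made for this sketch. Two different sequences of trials with the same value of the sufficient statistic lead to exactly the same posterior, as the weak sufficiency principle requires.

```python
# Two different sequences of ten Bernoulli trials with the same value of the
# sufficient statistic t(x) = number of successes = 4.
x1 = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
x2 = [0, 1, 0, 1, 0, 1, 0, 1, 0, 0]

grid = [i / 1000 for i in range(1, 1000)]

def post(seq):
    """Grid posterior for theta under a uniform prior, given the full sequence."""
    successes = sum(seq)
    lik = [th**successes * (1 - th)**(len(seq) - successes) for th in grid]
    total = sum(lik)
    return [l / total for l in lik]

# The likelihood depends on the data only through t(x) = sum(x), so x1 and x2
# lead to identical posteriors: Ev(E, x1) = Ev(E, x2).
print(max(abs(a - b) for a, b in zip(post(x1), post(x2))))  # 0.0
```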

7.1.4 The likelihood principle

For the moment, we will state what the likelihood principle is – its implications will be explored later.

7.1.4.1 Likelihood principle

Consider two different experiments E1 = {ỹ, θ, p1(y|θ)} and E2 = {z̃, θ, p2(z|θ)}, where θ is the same quantity in each experiment. Suppose that there are particular possible outcomes y* of experiment E1 and z* of experiment E2 such that

p1(y* | θ) = c p2(z* | θ)

for some constant c, that is, the likelihoods of θ as given by these possible outcomes of the two experiments are proportional, so that

l1(θ | y*) ∝ l2(θ | z*).

Then

Ev(E1, y*) = Ev(E2, z*).
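The classic illustration, which anticipates the discussion of stopping rules later in this chapter, contrasts a binomial experiment with a negative binomial one; the specific numbers below are assumptions made for this sketch rather than an example taken from the text. Both designs can yield 9 successes and 3 failures, and their likelihoods differ only by a constant factor, so the likelihood principle demands that they provide the same evidence about θ; for a Bayesian they certainly do, since the constant cancels on normalization.

```python
from math import comb

# E1: fix n = 12 Bernoulli trials and count the successes; observe y* = 9,
#     so p1(y* | theta) = C(12, 9) * theta^9 * (1 - theta)^3.
# E2: sample until the 3rd failure; observe z* = 9 successes on the way,
#     so p2(z* | theta) = C(11, 9) * theta^9 * (1 - theta)^3.
# The likelihoods are proportional, with c = C(12, 9) / C(11, 9) = 4.
grid = [i / 1000 for i in range(1, 1000)]

def posterior(lik):
    total = sum(lik)
    return [l / total for l in lik]

post1 = posterior([comb(12, 9) * th**9 * (1 - th)**3 for th in grid])
post2 = posterior([comb(11, 9) * th**9 * (1 - th)**3 for th in grid])

# The constant of proportionality cancels on normalization, so a Bayesian
# reaches exactly the same posterior from either experiment.
print(max(abs(a - b) for a, b in zip(post1, post2)))  # 0.0 up to rounding
```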

The following theorem [due to Birnbaum (1962)] shows that the likelihood principle follows from the other two principles described earlier.

Theorem 7.1 The likelihood principle follows from the weak conditionality principle and the weak sufficiency principle.

Proof. If E1 and E2 are the two experiments about θ figuring in the statement of the likelihood principle, consider the mixed experiment E* which arose in connection with the weak conditionality principle. Define a statistic t by

t(x) = (1, y*) if x = (2, z*)
t(x) = x       otherwise.

(Note that if experiment 2 is performed and we observe the value z*, then by the assumption of the likelihood principle there is a value y* such that p1(y* | θ) = c p2(z* | θ), so we can take this value of y* in the proof.) Now note that if t ≠ (1, y*) then

P(x = t | t, θ) = 1,

whereas if x = (1, y*) then

P(x = (1, y*) | t = (1, y*), θ) = ½ p1(y* | θ) / {½ p1(y* | θ) + ½ p2(z* | θ)} = c/(1 + c),

and if t = (1, y*) but x = (2, z*) then

P(x = (2, z*) | t = (1, y*), θ) = ½ p2(z* | θ) / {½ p1(y* | θ) + ½ p2(z* | θ)} = 1/(1 + c),

while for t = (1, y*) and all other x we have P(x | t, θ) = 0. In no case does p(x | t, θ) depend on θ and hence, from the definition given when sufficiency was first introduced in Section 2.9, t is sufficient for θ given x. It follows from the weak sufficiency principle that Ev{E*, (1, y*)} = Ev{E*, (2, z*)}. But the weak conditionality principle now ensures that

Ev(E1, y*) = Ev{E*, (1, y*)} = Ev{E*, (2, z*)} = Ev(E2, z*),

establishing the likelihood principle.
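The sufficiency step in the proof can also be checked numerically. Reusing the binomial and negative binomial densities from the earlier sketch (again purely for illustration), with c = C(12, 9)/C(11, 9) = 4, the conditional probability of x = (1, y*) given t = (1, y*) comes out as c/(1 + c) whatever the value of θ:

```python
from math import comb

# Reuse the binomial / negative binomial pair from the earlier sketch, so that
# p1(y* | theta) = c * p2(z* | theta) with c = C(12, 9) / C(11, 9) = 4.
def p1(theta):  # P(y = y* | theta) under E1
    return comb(12, 9) * theta**9 * (1 - theta)**3

def p2(theta):  # P(z = z* | theta) under E2
    return comb(11, 9) * theta**9 * (1 - theta)**3

c = comb(12, 9) / comb(11, 9)

for theta in (0.3, 0.5, 0.7, 0.9):
    # P(x = (1, y*) | t = (1, y*), theta) in the mixed experiment E*.
    prob = 0.5 * p1(theta) / (0.5 * p1(theta) + 0.5 * p2(theta))
    print(theta, prob, c / (1 + c))  # last two columns agree for every theta
```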

Corollary 7.1 If E = {x̃, θ, p(x|θ)} is an experiment, then Ev(E, x) should depend on E and x only through the likelihood

l(θ | x) ∝ p(x | θ).

Proof. For any one particular possible value x1 of x, define

y = 1 if x = x1
y = 0 otherwise,

so that P(y = 1 | θ) = p(x1 | θ) (since we have assumed for simplicity that everything is discrete, this will not, in general, be zero). Now let the experiment E1 consist simply of observing y, that is, of noting whether or not x = x1. Then the likelihood principle ensures that Ev(E, x1) = Ev(E1, 1), and E1 depends solely on p(x1 | θ) and hence solely on the likelihood of the observation actually made.

Converse 7.1 If the likelihood principle holds, then so do the weak conditionality principle and the weak sufficiency principle.

Proof. Using the notation introduced earlier for the mixed experiment, we see that if x=(1, y) then

p(x | θ) = ½ p1(y | θ) ∝ p1(y | θ)

and so by the likelihood principle Ev{E*, (1, y)} = Ev(E1, y), implying the weak conditionality principle. Moreover, if t is a sufficient statistic and t(x1) = t(x2), then x1 and x2 have proportional likelihood functions, so that the likelihood principle implies the weak sufficiency principle.

7.1.5 Discussion

From the formulation of Bayesian inference as ‘posterior is proportional to prior times likelihood’, it should be clear that Bayesian inference obeys the likelihood principle. It is not logically necessary that, if you find the arguments for the likelihood principle convincing, you have to accept Bayesian inference; there are some authors, for example, Edwards (1992), who have argued for a non-Bayesian form of inference based on the likelihood. Nevertheless, I think that Savage was right in saying in the discussion on Birnbaum (1962) that ‘… I suspect that once the likelihood principle is widely recognized, people will not long stop at that halfway house but will go forward and accept the implications of personalistic probability for statistics’.

Conversely, much of classical statistics notably fails to obey the likelihood principle – any use of tail areas (e.g. the probability of observing a value as large as that seen or greater) evidently involves matters other than the likelihood of the observations actually made. Another quotation from Savage, this time from Savage et al. (1962), may help to point to some of the difficulties that arise in connection with confidence intervals.

Imagine, for example, that two Meccans carefully drawn at random differ from each other in height by only 0.01 mm. Would you offer 19 to 1 odds that the standard deviation of the height of Meccans is less than 1.13 mm? That is the 95 per cent upper confidence limit computed with one degree of freedom. No, I think you would not have enough confidence in that limit to offer odds of 1 to 1.

In fact, the likelihood principle has serious consequences for both classical and Bayesian statisticians, and some of these consequences will be discussed in Sections 7.2–7.4. For classical statisticians, one of the most serious is the stopping rule principle, while for Bayesians one of the most serious is that Jeffreys’ rule for finding reference priors is incompatible with the likelihood principle.
