4.1 Hypothesis testing

4.1.1 Introduction

If preferred, the reader may begin with the example at the end of this section, then return to the general theory at the beginning.

4.1.2 Classical hypothesis testing

Most simple problems in which tests of hypotheses arise are of the following general form. There is one unknown parameter θ which is known to be from a set Θ, and you want to know whether θ ∈ Θ0 or θ ∈ Θ1, where

$$\Theta_0 \cup \Theta_1 = \Theta, \qquad \Theta_0 \cap \Theta_1 = \emptyset.$$

Usually, you are able to make use of a set of observations x = (x1, x2, …, xn) whose density p(x|θ) depends on θ. It is convenient to denote the set of all possible observations x by 𝒳.

In the language of classical statistics, it is usual to refer to

$$H_0: \theta \in \Theta_0 \quad \text{(the null hypothesis)}$$

and to

$$H_1: \theta \in \Theta_1 \quad \text{(the alternative hypothesis)}$$

and to say that if you decide to reject H0 when it is true then you have made a Type I error while if you decide not to reject H0 when it is false then you have made a Type II error.

A test is decided by a rejection region R, where

$$R = \{x : \text{observing } x \text{ would lead to rejection of } H_0\}.$$

Classical statisticians then say that decisions between tests should be based on the probabilities of Type I errors, that is,

$$P(x \in R \mid \theta) \qquad \text{for } \theta \in \Theta_0,$$

and of Type II errors, that is,

$$P(x \notin R \mid \theta) \qquad \text{for } \theta \in \Theta_1.$$

In general, the smaller the probability of Type I error, the larger the probability of Type II error, and vice versa. Consequently, classical statisticians recommend a choice of R which in some sense represents an optimal balance between the two types of error. Very often R is chosen so that the probability of a Type II error is as small as possible, subject to the requirement that the probability of a Type I error never exceeds some fixed value α, known as the size of the test. This theory, which is largely due to Neyman and Pearson, is to be found in most books on statistical inference; it appears in its fullest form in Lehmann (1986).
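To make the trade-off concrete, here is a minimal sketch (our own illustration, not from the text), assuming a single observation x ~ N(θ, 1), a test of θ = 0 against θ > 0, and rejection regions of the form R = {x : x > c}:

```python
from scipy.stats import norm

# One-sided test of H0: theta = 0 against H1: theta > 0 based on a
# single observation x ~ N(theta, 1), with rejection region R = {x > c}.

def type_i_error(c, theta0=0.0):
    """P(x in R | theta0): probability of rejecting H0 when it is true."""
    return norm.sf(c, loc=theta0, scale=1.0)

def type_ii_error(c, theta1):
    """P(x not in R | theta1): probability of failing to reject H0 when it is false."""
    return norm.cdf(c, loc=theta1, scale=1.0)

# A size-0.05 test: choose c so that the Type I error probability is 0.05.
c = norm.isf(0.05)                      # c ~= 1.645
print(type_i_error(c))                  # 0.05 by construction
print(type_ii_error(c, theta1=2.0))     # ~0.36: the trade-off in action
```

Raising c shrinks the Type I error probability but inflates the Type II error probability, which is exactly the balance the Neyman–Pearson theory is designed to manage.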

4.1.3 Difficulties with the classical approach

Other points will be made later about the comparison between the classical and the Bayesian approaches, but one thing to note at the outset is that, in the classical approach, we consider the probability (for various values of θ) of a set R to which the vector x of observations does, or does not, belong. Consequently, we are concerned not merely with the single vector of observations we actually made but also with others we might have made but did not. Thus, classically, if we suppose that x ~ N(θ, 1) and we wish to test whether θ = 0 or θ > 0 is true (negative values being supposed impossible), then we reject H0 on the basis of a single observation x = 3 because the probability that an N(0, 1) random variable is 3 or greater is 0.001 350, even though we certainly did not make an observation greater than 3. This aspect of the classical approach led Jeffreys (1961, Section 7.2) to remark:

What the use of P implies, therefore, is that a hypothesis that may be true may be rejected because it has not predicted observable results that have not occurred.

Note, however, that the form of the model, in this case the assumption of normally distributed observations of unit variance, does depend on an assumption about the whole distribution of all possible observations.
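For reference, the tail probability quoted above is easy to reproduce (a short sketch using SciPy):

```python
from scipy.stats import norm

# Classical P-value for the observation x = 3 under H0: theta = 0
# with unit variance: P(X >= 3) for X ~ N(0, 1).
print(norm.sf(3.0))   # 0.001350 (to six decimal places)
```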

4.1.4 The Bayesian approach

The Bayesian approach is in many ways more straightforward. All we need to do is to calculate the posterior probabilities

$$p_0 = P(\theta \in \Theta_0 \mid x), \qquad p_1 = P(\theta \in \Theta_1 \mid x)$$

and decide between H0 and H1 accordingly. (We note that p0 + p1 = 1 as Θ0 ∪ Θ1 = Θ and Θ0 ∩ Θ1 = ∅.)

Although posterior probabilities of hypotheses are our ultimate goal, we also need the prior probabilities

$$\pi_0 = P(\theta \in \Theta_0), \qquad \pi_1 = P(\theta \in \Theta_1)$$

to find them. (We note that π0 + π1 = 1, just as p0 + p1 = 1.) It is also useful to consider the prior odds on H0 against H1, namely

$$\pi_0/\pi_1,$$

and the posterior odds on H0 against H1, namely

$$p_0/p_1.$$

(The notion of odds was introduced in the very first section of this book.) Observe that if your prior odds are close to 1, then you regard H0 as more or less as likely as H1 a priori, while if the ratio is large you regard H0 as relatively likely, and when it is small you regard it as relatively unlikely. Similar remarks apply to the interpretation of the posterior odds.

It is also useful to define the Bayes factor B in favour of H0 against H1 as

$$B = \frac{p_0/p_1}{\pi_0/\pi_1} = \frac{p_0 \pi_1}{p_1 \pi_0},$$

that is, the ratio of the posterior odds to the prior odds.

The interest in the Bayes factor is that it can sometimes be interpreted as the ‘odds in favour of H0 against H1 that are given by the data’. It is worth noting that because p0/p1 = B(π0/π1) and p1 = 1 − p0, we can find the posterior probability p0 of H0 from its prior probability and the Bayes factor by

$$p_0 = \left[1 + \frac{\pi_1}{\pi_0} B^{-1}\right]^{-1}.$$
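As a quick numerical check of this odds-to-probability bookkeeping (our own sketch; the function name is hypothetical):

```python
def posterior_prob_h0(prior_prob_h0, bayes_factor):
    """Posterior probability of H0 from its prior probability and the
    Bayes factor B, via p0 = [1 + (pi1/pi0) / B]^(-1)."""
    prior_odds = prior_prob_h0 / (1.0 - prior_prob_h0)
    posterior_odds = bayes_factor * prior_odds
    return posterior_odds / (1.0 + posterior_odds)

# With even prior odds (pi0 = 0.5), a Bayes factor of 10 gives p0 = 10/11 ~= 0.909.
print(posterior_prob_h0(0.5, 10.0))
```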

The aforementioned interpretation is clearly valid when the hypotheses are simple, that is, when

$$\Theta_0 = \{\theta_0\}, \qquad \Theta_1 = \{\theta_1\}$$

for some θ0 and θ1. For if so, then p0 ∝ π0 p(x|θ0) and p1 ∝ π1 p(x|θ1), so that

$$\frac{p_0}{p_1} = \frac{\pi_0}{\pi_1}\cdot\frac{p(x \mid \theta_0)}{p(x \mid \theta_1)}$$

and hence the Bayes factor is

$$B = \frac{p(x \mid \theta_0)}{p(x \mid \theta_1)}.$$

It follows that B is the likelihood ratio of H0 against H1, which most statisticians (whether Bayesian or not) view as the odds in favour of H0 against H1 that are given by the data.
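For instance (a sketch of our own, not from the text), with a single N(θ, 1) observation and the simple hypotheses θ0 = 0 and θ1 = 1, the Bayes factor is just the ratio of the two normal densities at the observed x:

```python
from scipy.stats import norm

def bayes_factor_simple(x, theta0=0.0, theta1=1.0, scale=1.0):
    """Likelihood ratio B = p(x|theta0) / p(x|theta1) for a single
    N(theta, scale^2) observation and two simple hypotheses."""
    return norm.pdf(x, loc=theta0, scale=scale) / norm.pdf(x, loc=theta1, scale=scale)

print(bayes_factor_simple(0.2))   # > 1: the data favour theta0
print(bayes_factor_simple(0.8))   # < 1: the data favour theta1
```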

However, the interpretation is not quite as simple when H0 and H1 are composite, that is, contain more than one member. In such a case, it is convenient to write

$$\rho_0(\theta) = \pi(\theta)/\pi_0 \qquad (\theta \in \Theta_0)$$

and

$$\rho_1(\theta) = \pi(\theta)/\pi_1 \qquad (\theta \in \Theta_1),$$

where π(θ) is the prior density of θ, so that ρ0 is the restriction of π to Θ0 renormalized to give a probability density over Θ0, and similarly for ρ1. We then have

$$p_0 = P(\theta \in \Theta_0 \mid x) \propto \int_{\Theta_0} p(x \mid \theta)\,\pi(\theta)\,d\theta = \pi_0 \int_{\Theta_0} p(x \mid \theta)\,\rho_0(\theta)\,d\theta,$$

the constant of proportionality depending solely on x. Similarly,

$$p_1 \propto \pi_1 \int_{\Theta_1} p(x \mid \theta)\,\rho_1(\theta)\,d\theta,$$

and hence the Bayes factor is

$$B = \frac{\int_{\Theta_0} p(x \mid \theta)\,\rho_0(\theta)\,d\theta}{\int_{\Theta_1} p(x \mid \theta)\,\rho_1(\theta)\,d\theta},$$

which is the ratio of ‘weighted’ (by ρ0 and ρ1) likelihoods of Θ0 and Θ1.
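As a numerical illustration (our own, with all distributional choices assumed for the sketch): take a single observation x ~ N(θ, 1), a standard normal prior for θ, and the composite hypotheses H0: θ < 0 against H1: θ ≥ 0; then B is a ratio of two one-dimensional integrals.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

x_obs = 0.5   # the single observation

def weighted_likelihood(a, b):
    """Integral of p(x|theta) * rho(theta) over (a, b), where rho is the
    N(0, 1) prior restricted to (a, b) and renormalized."""
    prior_mass = norm.cdf(b) - norm.cdf(a)
    integrand = lambda t: norm.pdf(x_obs, loc=t) * norm.pdf(t) / prior_mass
    return quad(integrand, a, b)[0]

B = weighted_likelihood(-np.inf, 0.0) / weighted_likelihood(0.0, np.inf)
print(B)   # ~0.57 here: the data mildly favour H1
```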

Because this expression for the Bayes factor involves ρ0 and ρ1 as well as the likelihood function p(x|θ) itself, the Bayes factor cannot be regarded as a measure of the relative support for the hypotheses provided solely by the data. Sometimes, however, B will be relatively little affected within reasonable limits by the choice of ρ0 and ρ1, and then we can regard B as a measure of relative support for the hypotheses provided by the data. When this is so, the Bayes factor is reasonably objective and might, for example, be included in a scientific report, so that different users of the data could determine their personal posterior odds by multiplying their personal prior odds by the factor.

It may be noted that the Bayes factor is referred to by a few authors simply as the factor. Jeffreys (1961) denoted it by K, but did not give it a name. A number of authors, most notably Peirce (1878) and (independently) Good (1950, 1983 and elsewhere), refer to the logarithm of the Bayes factor as the weight of evidence. The point of taking the logarithm is, of course, that if you have several experiments about two simple hypotheses, then the Bayes factors multiply, and so the weight of evidence adds.
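A tiny numerical illustration of this additivity (our own, assuming two independent experiments about the same pair of simple hypotheses):

```python
import math

# Bayes factors from independent experiments about the same simple
# hypotheses multiply...
b1, b2 = 3.0, 4.0
combined = b1 * b2                     # 12.0

# ...so the weights of evidence (log Bayes factors) add.
print(math.log(b1) + math.log(b2))     # equals math.log(combined)
print(math.log(combined))
```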

4.1.5 Example

According to Watkins (1986, Section 13.3), the electroweak theory predicted the existence of a new particle, the W particle, of a mass m of 82.4 ± 1.1 GeV. Experimental results showed that such a particle existed and had a mass of 82.1 ± 1.7 GeV. If we take the mass to have a normal prior and likelihood and assume that the values after the ± signs represent known standard deviations, and if we are prepared to take both the theory and the experiment into account, then we can conclude that the posterior for the mass is N(θ1, φ1) where

$$\phi_1 = (1.1^{-2} + 1.7^{-2})^{-1} = 0.85 = (0.92)^2, \qquad \theta_1 = 0.85\left(\frac{82.4}{1.1^2} + \frac{82.1}{1.7^2}\right) = 82.3$$

(following the procedure of Section 2.2 on ‘Normal Prior and Likelihood’). Suppose that for some reason it was important to know whether or not this mass was less than 83.0 GeV. Then, since the prior distribution is N(82.4, 1.1²), the prior probability π0 of this hypothesis is given by

$$\pi_0 = P(m < 83.0) = \Phi\left(\frac{83.0 - 82.4}{1.1}\right) = \Phi(0.55),$$

where Φ is the distribution function of the standard normal distribution. From tables of the normal distribution, it follows that π0 = 0.7088, so that the prior odds are

$$\frac{\pi_0}{\pi_1} = \frac{0.7088}{0.2912} = 2.43.$$

Similarly, the posterior probability of the hypothesis that m < 83.0 is p0 = Φ((83.0 − 82.3)/0.92) = Φ(0.76) = 0.7764, and hence the posterior odds are

$$\frac{p_0}{p_1} = \frac{0.7764}{0.2236} = 3.47.$$

Thus, the Bayes factor is

$$B = \frac{p_0/p_1}{\pi_0/\pi_1} = \frac{3.47}{2.43} = 1.43.$$

In this case, the experiment has not much altered beliefs about the hypothesis under discussion, and this is represented by the nearness of B to 1.
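The whole calculation is easily reproduced numerically (a sketch using SciPy; the small discrepancy with the figures above comes from their use of rounded normal tables):

```python
from scipy.stats import norm

# Prior (theory): m ~ N(82.4, 1.1^2); experiment: 82.1 with standard deviation 1.7.
prior_mean, prior_sd = 82.4, 1.1
data_mean, data_sd = 82.1, 1.7

# Normal prior and likelihood: precisions add, and the posterior mean is the
# precision-weighted average of the prior mean and the observation.
post_var = 1.0 / (prior_sd**-2 + data_sd**-2)
post_mean = post_var * (prior_mean / prior_sd**2 + data_mean / data_sd**2)
post_sd = post_var**0.5                    # ~0.92, with post_mean ~82.3

# Prior and posterior probabilities of the hypothesis m < 83.0.
pi0 = norm.cdf(83.0, loc=prior_mean, scale=prior_sd)   # ~0.707
p0 = norm.cdf(83.0, loc=post_mean, scale=post_sd)      # ~0.772

prior_odds = pi0 / (1 - pi0)                           # ~2.42
posterior_odds = p0 / (1 - p0)                         # ~3.39
print(posterior_odds / prior_odds)                     # B ~ 1.40 (1.43 above,
                                                       # via rounded tables)
```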

4.1.6 Comment

A point about hypothesis tests well worth making is that they ‘are traditionally used as a method for testing between two terminal acts [but that] in actual practice [they] are far more commonly used [when we are] given the outcome of a sample [to decide whether] any final or terminal decision [should] be reached or should judgement be suspended until more sample evidence is available’ (Schlaifer, 1961, Section 13.2).
