4.1 Hypothesis testing
4.1.1 Introduction
If preferred, the reader may begin with the example at the end of this section, then return to the general theory at the beginning.
4.1.2 Classical hypothesis testing
Most simple problems in which tests of hypotheses arise are of the following general form. There is one unknown parameter θ which is known to be from a set Θ, and you want to know whether θ ∈ Θ0 or θ ∈ Θ1, where

Θ0 ∪ Θ1 = Θ and Θ0 ∩ Θ1 = ∅,

so that Θ1 is the complement of Θ0.
Usually, you are able to make use of a vector x = (x1, x2, …, xn) of observations whose density p(x|θ) depends on θ. It is convenient to denote the set of all possible observations by X.
In the language of classical statistics, it is usual to refer to

H0: θ ∈ Θ0

as the null hypothesis and to

H1: θ ∈ Θ1

as the alternative hypothesis, and to say that if you decide to reject H0 when it is true then you have made a Type I error, while if you decide not to reject H0 when it is false then you have made a Type II error.
A test is decided by a rejection region R, where

R = {x : observing x leads to rejection of H0}.
Classical statisticians then say that decisions between tests should be based on the probabilities of Type I errors, that is,

α(θ) = P(x ∈ R | θ) for θ ∈ Θ0,

and of Type II errors, that is,

β(θ) = P(x ∉ R | θ) for θ ∈ Θ1.
In general, the smaller the probability of Type I error, the larger the probability of Type II error, and vice versa. Consequently, classical statisticians recommend a choice of R which in some sense represents an optimal balance between the two types of error. Very often R is chosen so that the probability of a Type II error is as small as possible subject to the requirement that the probability of a Type I error is always less than or equal to some fixed value α, known as the size of the test. This theory, which is largely due to Neyman and Pearson, appears in most books on statistical inference and is found in its fullest form in Lehmann (1986).
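As a concrete sketch of the Neyman–Pearson recipe (the hypotheses θ = 0 versus θ = 2, the unit variance, and the size α = 0.05 are illustrative choices, not taken from the text), the following fixes a one-sided rejection region of size α for a single observation x ~ N(θ, 1) and then evaluates the resulting probability of a Type II error:

```python
from math import erf, sqrt

def norm_cdf(x):
    # Distribution function of the standard normal, Phi(x)
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def norm_ppf(p, lo=-10.0, hi=10.0):
    # Inverse of norm_cdf by bisection (accurate enough for illustration)
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if norm_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

alpha = 0.05               # size of the test (bound on the Type I error probability)
c = norm_ppf(1.0 - alpha)  # rejection region R = {x : x > c}; c is about 1.645
beta = norm_cdf(c - 2.0)   # Type II error probability when theta = 2
```

Shrinking α pushes c to the right and so increases β, which is the trade-off described above.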
4.1.3 Difficulties with the classical approach
Other points will be made later about the comparison between the classical and the Bayesian approaches, but one thing to note at the outset is that, in the classical approach, we consider the probability (for various values of θ) of a set R to which the vector x of observations does, or does not, belong. Consequently, we are concerned not merely with the single vector of observations we actually made but also with others we might have made but did not. Thus, classically, if we suppose that x ~ N(θ, 1) and we wish to test whether θ = 0 or θ > 0 is true (negative values being supposed impossible), then we reject H0 on the basis of a single observation x = 3 because the probability that an N(0, 1) random variable is 3 or greater is 0.001 350, even though we certainly did not make an observation greater than 3. This aspect of the classical approach led Jeffreys (1961, Section 7.2) to remark:
What the use of P implies, therefore, is that a hypothesis that may be true may be rejected because it has not predicted observable results that have not occurred.
Note, however, that the form of the model, in this case the assumption of normally distributed observations of unit variance, does depend on an assumption about the whole distribution of all possible observations.
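The tail probability quoted above is easy to check numerically; this sketch uses only the standard library (the helper name norm_cdf is our own):

```python
from math import erf, sqrt

def norm_cdf(x):
    # Distribution function of the standard normal, Phi(x)
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# Classical P-value for a single observation x = 3 under H0: theta = 0,
# i.e. the probability that an N(0, 1) variable is 3 or greater
p_value = 1.0 - norm_cdf(3.0)   # approximately 0.001 350
```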
4.1.4 The Bayesian approach
The Bayesian approach is in many ways more straightforward. All we need to do is to calculate the posterior probabilities

p0 = P(θ ∈ Θ0 | x) and p1 = P(θ ∈ Θ1 | x)

and decide between H0 and H1 accordingly. (We note that p0 + p1 = 1 as Θ0 ∪ Θ1 = Θ and Θ0 ∩ Θ1 = ∅.)
Although posterior probabilities of hypotheses are our ultimate goal, we also need prior probabilities

π0 = P(θ ∈ Θ0) and π1 = P(θ ∈ Θ1)

to find them. (We note that π0 + π1 = 1 just as p0 + p1 = 1.) It is also useful to consider the prior odds on H0 against H1, namely

π0/π1,

and the posterior odds on H0 against H1, namely

p0/p1.
(The notion of odds was originally introduced in the very first section of this book.) Observe that if your prior odds are close to 1, then you regard H0 as more or less as likely as H1 a priori, while if the ratio is large you regard H0 as relatively likely, and when it is small you regard it as relatively unlikely. Similar remarks apply to the interpretation of the posterior odds.
It is also useful to define the Bayes factor B in favour of H0 against H1 as

B = (p0/p1) / (π0/π1) = (p0 π1) / (p1 π0).
The interest in the Bayes factor is that it can sometimes be interpreted as the ‘odds in favour of H0 against H1 that are given by the data’. It is worth noting that because p0/p1 = B(π0/π1) and p1 = 1 − p0 (while π1 = 1 − π0), we can find the posterior probability p0 of H0 from its prior probability and the Bayes factor by

p0 = 1 / (1 + (π1/π0)(1/B)).
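The passage from prior probability and Bayes factor to posterior probability can be coded directly; a minimal sketch (the numbers in the usage line are illustrative):

```python
def posterior_prob_H0(pi0, B):
    # p0 = 1 / (1 + (pi1/pi0)(1/B)), with pi1 = 1 - pi0; equivalently,
    # posterior odds = B * prior odds, then convert odds back to a probability
    prior_odds = pi0 / (1.0 - pi0)
    posterior_odds = B * prior_odds
    return posterior_odds / (1.0 + posterior_odds)

p0 = posterior_prob_H0(0.5, 9.0)   # even prior odds with B = 9 gives p0 = 0.9
```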
The aforementioned interpretation is clearly valid when the hypotheses are simple, that is,

Θ0 = {θ0} and Θ1 = {θ1}

for some θ0 and θ1. For if so, then p0 ∝ π0 p(x|θ0) and p1 ∝ π1 p(x|θ1), so that

p0/p1 = (π0/π1) p(x|θ0)/p(x|θ1),

and hence, the Bayes factor is

B = p(x|θ0) / p(x|θ1).
It follows that B is the likelihood ratio of H0 against H1 which most statisticians (whether Bayesian or not) view as the odds in favour of H0 against H1 that are given by the data.
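For simple hypotheses, then, computing B amounts to evaluating a likelihood ratio. A minimal sketch, assuming x ~ N(θ, 1) and an illustrative pair of hypotheses and observed value:

```python
from math import exp, pi, sqrt

def norm_pdf(x, mu, sigma):
    # Density of N(mu, sigma^2) at x
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2.0 * pi))

# Simple hypotheses H0: theta = 0 versus H1: theta = 1, with x ~ N(theta, 1)
x = 2.0                                             # illustrative observation
B = norm_pdf(x, 0.0, 1.0) / norm_pdf(x, 1.0, 1.0)   # likelihood ratio
# B = exp(-1.5), about 0.22: these data tell in favour of H1
```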
However, the interpretation is not quite as simple when H0 and H1 are composite, that is, contain more than one member. In such a case, it is convenient to write

ρ0(θ) = p(θ)/π0 (for θ ∈ Θ0)

and

ρ1(θ) = p(θ)/π1 (for θ ∈ Θ1),

where p(θ) is the prior density of θ, so that ρ0 is the restriction of p(θ) to Θ0 renormalized to give a probability density over Θ0, and similarly for ρ1. We then have

p0 = P(θ ∈ Θ0 | x) ∝ ∫Θ0 p(x|θ) p(θ) dθ = π0 ∫Θ0 p(x|θ) ρ0(θ) dθ,

the constant of proportionality depending solely on x. Similarly,

p1 ∝ π1 ∫Θ1 p(x|θ) ρ1(θ) dθ,

and hence, the Bayes factor is

B = (p0/p1) / (π0/π1) = ∫Θ0 p(x|θ) ρ0(θ) dθ / ∫Θ1 p(x|θ) ρ1(θ) dθ,

which is the ratio of ‘weighted’ (by ρ0 and ρ1) likelihoods of H0 and H1.
Because this expression for the Bayes factor involves ρ0 and ρ1 as well as the likelihood function itself, the Bayes factor cannot be regarded as a measure of the relative support for the hypotheses provided solely by the data. Sometimes, however, B will be relatively little affected, within reasonable limits, by the choice of ρ0 and ρ1, and then we can regard B as a measure of relative support for the hypotheses provided by the data. When this is so, the Bayes factor is reasonably objective and might, for example, be included in a scientific report, so that different users of the data could determine their personal posterior odds by multiplying their personal prior odds by the factor.
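For composite hypotheses the weighted likelihoods are integrals, which can be approximated numerically. In this sketch (all choices illustrative, not from the text) the prior for θ is N(0, 1), the hypotheses are Θ0 = {θ : θ < 0} and Θ1 = {θ : θ > 0}, x ~ N(θ, 1), and ρ0, ρ1 are the renormalized restrictions of the prior:

```python
from math import exp, pi, sqrt

def norm_pdf(x, mu, sigma):
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2.0 * pi))

def weighted_likelihood(x, lo, hi, n=20_000):
    # Midpoint approximation to the integral over (lo, hi) of
    # p(x|theta) * rho(theta) dtheta, where rho is the N(0, 1) prior
    # restricted to (lo, hi) and renormalized to a probability density
    h = (hi - lo) / n
    thetas = [lo + (i + 0.5) * h for i in range(n)]
    num = sum(norm_pdf(x, t, 1.0) * norm_pdf(t, 0.0, 1.0) for t in thetas) * h
    den = sum(norm_pdf(t, 0.0, 1.0) for t in thetas) * h  # prior mass of the set
    return num / den

x = 1.5                                                   # illustrative observation
B = weighted_likelihood(x, -8.0, 0.0) / weighted_likelihood(x, 0.0, 8.0)
# B is well below 1: an observation of 1.5 tells against theta < 0
```

(The range is truncated at ±8 because the N(0, 1) prior puts negligible mass beyond that.)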
It may be noted that the Bayes factor is referred to by a few authors simply as the factor. Jeffreys (1961) denoted it by K, but did not give it a name. A number of authors, most notably Peirce (1878) and (independently) Good (1950, 1983 and elsewhere), refer to the logarithm of the Bayes factor as the weight of evidence. The point of taking the logarithm is, of course, that if you have several experiments about two simple hypotheses, then the Bayes factors multiply, and so the weight of evidence adds.
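The point about logarithms is easy to verify: for independent experiments bearing on the same pair of simple hypotheses, Bayes factors multiply, so their logarithms (the weights of evidence) add. The factor values below are illustrative:

```python
from math import log

# Bayes factors from two independent experiments about the same
# pair of simple hypotheses (illustrative values)
B1, B2 = 3.0, 4.0
combined_B = B1 * B2                  # Bayes factors multiply: 12.0
combined_weight = log(B1) + log(B2)   # weights of evidence add: log 12
```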
4.1.5 Example
According to Watkins (1986, Section 13.3), the electroweak theory predicted the existence of a new particle, the W particle, of a mass m of 82.4 ± 1.1 GeV. Experimental results showed that such a particle existed and had a mass of 82.1 ± 1.7 GeV. If we take the mass to have a normal prior and likelihood and assume that the values after the ± signs represent known standard deviations, and if we are prepared to take both the theory and the experiment into account, then we can conclude that the posterior for the mass is N(θ1, φ1), where

φ1 = (1/1.1² + 1/1.7²)⁻¹ ≅ 0.92², θ1 = φ1 (82.4/1.1² + 82.1/1.7²) ≅ 82.3

(following the procedure of Section 2.2 on ‘Normal Prior and Likelihood’). Suppose that for some reason it was important to know whether or not this mass was less than 83.0 GeV. Then, since the prior distribution is N(82.4, 1.1²), the prior probability of this hypothesis is given by

π0 = P(m < 83.0) = Φ((83.0 − 82.4)/1.1) = Φ(0.55),

where Φ is the distribution function of the standard normal distribution. From tables of the normal distribution, it follows that π0 ≅ 0.71, so that the prior odds are

π0/π1 ≅ 0.71/0.29 ≅ 2.4.

Similarly, the posterior probability of the hypothesis that m < 83.0 is p0 = Φ((83.0 − 82.3)/0.92) = Φ(0.76) ≅ 0.78, and hence the posterior odds are

p0/p1 ≅ 0.78/0.22 ≅ 3.5.

Thus, the Bayes factor is

B = (p0/p1) / (π0/π1) ≅ 3.5/2.4 ≅ 1.4.
In this case, the experiment has not much altered beliefs about the hypothesis under discussion, and this is represented by the nearness of B to 1.
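The whole calculation can be reproduced in a few lines, assuming, with Watkins (1986), a theoretical value of 82.4 ± 1.1 GeV and an experimental value of 82.1 ± 1.7 GeV (the normal distribution function is coded from the standard error function):

```python
from math import erf, sqrt

def norm_cdf(x):
    # Distribution function of the standard normal, Phi(x)
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# Theory (prior): N(82.4, 1.1^2); experiment (likelihood): 82.1 with sd 1.7
prior_mean, prior_sd = 82.4, 1.1
obs, obs_sd = 82.1, 1.7

# Normal prior and likelihood: precisions add, means combine precision-weighted
post_prec = 1.0 / prior_sd**2 + 1.0 / obs_sd**2
post_var = 1.0 / post_prec
post_mean = post_var * (prior_mean / prior_sd**2 + obs / obs_sd**2)

# Hypothesis H0: m < 83.0 GeV
pi0 = norm_cdf((83.0 - prior_mean) / prior_sd)       # prior probability
p0 = norm_cdf((83.0 - post_mean) / sqrt(post_var))   # posterior probability

prior_odds = pi0 / (1.0 - pi0)
posterior_odds = p0 / (1.0 - p0)
B = posterior_odds / prior_odds                      # Bayes factor, about 1.4
```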
4.1.6 Comment
A point about hypothesis tests well worth making is that they ‘are traditionally used as a method for testing between two terminal acts [but that] in actual practice [they] are far more commonly used [when we are] given the outcome of a sample [to decide whether] any final or terminal decision [should] be reached or should judgement be suspended until more sample evidence is available’ (Schlaifer, 1961, Section 13.2).