6.1 The Elements of a Test of Hypothesis

Suppose building specifications in a certain city require that the average breaking strength of residential sewer pipe be more than 2,400 pounds per foot of length (i.e., per linear foot). Each manufacturer who wants to sell pipe in that city must demonstrate that its product meets the specification. Note that we are interested in making an inference about the mean μ of a population. However, in this example we are less interested in estimating the value of μ than we are in testing a hypothesis about its value—that is, we want to decide whether the mean breaking strength of the pipe exceeds 2,400 pounds per linear foot.

A statistical hypothesis is a statement about the numerical value of a population parameter.

The method used to reach a decision is based on the rare-event concept explained in earlier chapters. We define two hypotheses: (1) The null hypothesis represents the status quo to the party performing the sampling experiment—the hypothesis that will be assumed to be true unless the data provide convincing evidence that it is false. (2) The alternative, or research, hypothesis is that which will be accepted only if the data provide convincing evidence of its truth. From the point of view of the city conducting the tests, the null hypothesis is that the manufacturer’s pipe does not meet specifications unless the tests provide convincing evidence otherwise. The null and alternative hypotheses are therefore

Null hypothesis (H0): μ ≤ 2,400 (i.e., the manufacturer’s pipe does not meet specifications)

Alternative (research) hypothesis (Ha): μ > 2,400 (i.e., the manufacturer’s pipe meets specifications)

The null hypothesis, denoted H0, represents the hypothesis that will be assumed to be true unless the data provide convincing evidence that it is false. This usually represents the “status quo” or some statement about the population parameter that the researcher wants to test.

The alternative (research) hypothesis, denoted Ha, represents the hypothesis that will be accepted only if the data provide convincing evidence of its truth. This usually represents the values of a population parameter for which the researcher wants to gather supporting evidence.

How can the city decide when enough evidence exists to conclude that the manufacturer’s pipe meets specifications? Because the hypotheses concern the value of the population mean μ, it is reasonable to use the sample mean x̄ to make the inference, just as we did when we formed confidence intervals for μ in Sections 5.2 and 5.3. The city will conclude that the pipe meets specifications only when the sample mean x̄ convincingly indicates that the population mean exceeds 2,400 pounds per linear foot.

“Convincing” evidence in favor of the alternative hypothesis will exist when the value of x̄ exceeds 2,400 by an amount that cannot be readily attributed to sampling variability. To decide, we compute a test statistic, i.e., a numerical value computed from the sample. Here, the test statistic is the z-value that measures the distance between the value of x̄ and the value of μ specified in the null hypothesis. When the null hypothesis contains more than one value of μ, as in this case (H0: μ ≤ 2,400), we use the value of μ closest to the values specified in the alternative hypothesis. The idea is that if the hypothesis that μ equals 2,400 can be rejected in favor of μ > 2,400, then any value of μ less than or equal to 2,400 can certainly be rejected. Thus, the test statistic is

$$z = \frac{\bar{x} - 2{,}400}{\sigma_{\bar{x}}} = \frac{\bar{x} - 2{,}400}{\sigma/\sqrt{n}}$$

Note that a value of z = 1 means that x̄ is 1 standard deviation above μ = 2,400, a value of z = 1.5 means that x̄ is 1.5 standard deviations above μ = 2,400, and so on. How large must z be before the city can be convinced that the null hypothesis can be rejected in favor of the alternative and conclude that the pipe meets specifications?

The test statistic is a sample statistic, computed from information provided in the sample, that the researcher uses to decide between the null and alternative hypotheses.

Figure 6.1

The sampling distribution of x̄, assuming μ = 2,400

If you examine Figure 6.1, you will note that the chance of observing x̄ more than 1.645 standard deviations above 2,400 is only .05, if in fact the true mean μ is 2,400. Thus, if the sample mean is more than 1.645 standard deviations above 2,400, either H0 is true and a relatively rare event has occurred (.05 probability), or Ha is true and the population mean exceeds 2,400. Because we would most likely reject the notion that a rare event has occurred, we would reject the null hypothesis (μ ≤ 2,400) and conclude that the alternative hypothesis (μ > 2,400) is true. What is the probability that this procedure will lead us to an incorrect decision?

Such an incorrect decision—deciding that the null hypothesis is false when in fact it is true—is called a Type I error. As indicated in Figure 6.1, the risk of making a Type I error is denoted by the symbol α—that is,

α = P(Type I error) = P(Rejecting the null hypothesis when in fact the null hypothesis is true)

A Type I error occurs if the researcher rejects the null hypothesis in favor of the alternative hypothesis when, in fact, H0 is true. The probability of committing a Type I error is denoted by α.

In our example,

α = P(z > 1.645 when in fact μ = 2,400) = .05
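The link between α = .05 and the cutoff 1.645 can be checked numerically. The following is a minimal sketch using the standard normal distribution from Python's standard library; the variable names are illustrative and not part of the text.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# Critical value: the z-value with area alpha in the upper tail.
z_crit = std_normal.inv_cdf(1 - alpha)

# Upper-tail area beyond the rounded cutoff used in the text.
tail_area = 1 - std_normal.cdf(1.645)

print(round(z_crit, 3))     # 1.645
print(round(tail_area, 3))  # 0.05
```

Choosing a smaller α (say .01) in this sketch moves the cutoff farther into the tail, which makes the null hypothesis harder to reject.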

We now summarize the elements of the test:

H0: μ ≤ 2,400 (Pipe does not meet specifications.)
Ha: μ > 2,400 (Pipe meets specifications.)
Test statistic: $z = (\bar{x} - 2{,}400)/\sigma_{\bar{x}}$
Rejection region: z > 1.645, which corresponds to α = .05

Note that the rejection region refers to the values of the test statistic for which we will reject the null hypothesis.

The rejection region of a statistical test is the set of possible values of the test statistic for which the researcher will reject H0 in favor of Ha.

To illustrate the use of the test, suppose we test 50 sections of sewer pipe and find the mean and standard deviation for these 50 measurements to be

x̄ = 2,460 pounds per linear foot
s = 200 pounds per linear foot

As in the case of estimation, we can use s to approximate σ when s is calculated from a large set of sample measurements.

The test statistic is

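$$z = \frac{\bar{x} - 2{,}400}{\sigma_{\bar{x}}} = \frac{\bar{x} - 2{,}400}{\sigma/\sqrt{n}} \approx \frac{\bar{x} - 2{,}400}{s/\sqrt{n}}$$

Substituting x̄ = 2,460, n = 50, and s = 200, we have

$$z \approx \frac{2{,}460 - 2{,}400}{200/\sqrt{50}} = \frac{60}{28.28} = 2.12$$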

Therefore, the sample mean lies $2.12\sigma_{\bar{x}}$ above the hypothesized value of μ, 2,400, as shown in Figure 6.2. Because this value of z exceeds 1.645, it falls into the rejection region. That is, we reject the null hypothesis that μ = 2,400 and conclude that μ > 2,400. Thus, it appears that the company’s pipe has a mean strength that exceeds 2,400 pounds per linear foot.
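The arithmetic behind this test statistic can be reproduced in a few lines. This is a small sketch; the function name z_statistic is hypothetical, and s stands in for σ as the text describes for large samples.

```python
import math

def z_statistic(sample_mean, mu0, s, n):
    """Distance of the sample mean from mu0, in standard errors."""
    standard_error = s / math.sqrt(n)  # s approximates sigma for large n
    return (sample_mean - mu0) / standard_error

standard_error = 200 / math.sqrt(50)
z = z_statistic(sample_mean=2460, mu0=2400, s=200, n=50)

print(round(standard_error, 2))  # 28.28
print(round(z, 2))               # 2.12
```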

Figure 6.2

Location of the test statistic for a test of the hypothesis H0: μ = 2,400

How much faith can be placed in this conclusion? What is the probability that our statistical test could lead us to reject the null hypothesis (and conclude that the company’s pipe meets the city’s specifications) when in fact the null hypothesis is true? The answer is α=.05—that is, we selected the level of risk, α, of making a Type I error when we constructed the test. Thus, the chance is only 1 in 20 that our test would lead us to conclude the manufacturer’s pipe satisfies the city’s specifications when in fact the pipe does not meet specifications.
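The "1 in 20" interpretation of α can be illustrated by simulation. The sketch below is not part of the original example. It assumes the null value μ = 2,400 is true, treats σ = 200 as known, and draws each sample mean directly from its approximately normal sampling distribution. The seed and number of trials are arbitrary choices.

```python
import math
import random
from statistics import NormalDist

random.seed(1)  # arbitrary seed, only for reproducibility

mu0, sigma, n = 2400, 200, 50          # assumed true mean, sigma, sample size
standard_error = sigma / math.sqrt(n)
z_crit = NormalDist().inv_cdf(0.95)    # rejection region: z > z_crit

trials = 20000
rejections = 0
for _ in range(trials):
    # One simulated experiment: a sample mean drawn when H0 is true.
    x_bar = random.gauss(mu0, standard_error)
    z = (x_bar - mu0) / standard_error
    if z > z_crit:
        rejections += 1  # H0 rejected although it is true: a Type I error

type1_rate = rejections / trials
print(f"Proportion of false rejections: {type1_rate:.3f}")
```

Under these assumptions the proportion of false rejections should settle near .05, the level of risk selected when the test was constructed.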

Now, suppose the sample mean breaking strength for the 50 sections of sewer pipe turned out to be x̄ = 2,430 pounds per linear foot. Assuming that the sample standard deviation is still s = 200, the test statistic is

$$z = \frac{2{,}430 - 2{,}400}{200/\sqrt{50}} = \frac{30}{28.28} = 1.06$$

Therefore, the sample mean x̄ = 2,430 is only 1.06 standard deviations above the null hypothesized value of μ = 2,400. As shown in Figure 6.3, this value does not fall into the rejection region (z > 1.645). Therefore, we know that we cannot reject H0 using α = .05. Even though the sample mean exceeds the city’s specification of 2,400 by 30 pounds per linear foot, it does not exceed the specification by enough to provide convincing evidence that the population mean exceeds 2,400.

Biography Egon S. Pearson (1895–1980)

The Neyman-Pearson Lemma

Egon Pearson was the only son of noteworthy British statistician Karl Pearson (see Biography, p. 463). As you might expect, Egon developed an interest in the statistical methods developed by his father and, upon completing graduate school, accepted a position to work for Karl in the Department of Applied Statistics at University College, London. Egon is best known for his collaboration with Jerzy Neyman (see Biography) on the development of the theory of hypothesis testing. One of the basic concepts in the Neyman-Pearson approach was that of the “null” and “alternative” hypotheses. Their famous Neyman-Pearson lemma was published in Biometrika in 1928. Egon Pearson had numerous other contributions to statistics and was known as an excellent teacher and lecturer. In his last major work, Egon fulfilled a promise made to his father by publishing an annotated version of Karl Pearson’s lectures on the early history of statistics.

A Type II error occurs if the researcher accepts the null hypothesis when, in fact, H0 is false. The probability of committing a Type II error is denoted by β.

Should we accept the null hypothesis H0: μ ≤ 2,400 and conclude that the manufacturer’s pipe does not meet specifications? To do so would be to risk a Type II error, that of concluding that the null hypothesis is true (the pipe does not meet specifications) when in fact it is false (the pipe does meet specifications). We denote the probability of committing a Type II error by β. In practice, β is often difficult to determine precisely because it depends on the true (unknown) value of μ. Rather than make a decision (accept H0) for which the probability of error (β) is unknown, we avoid the potential Type II error by avoiding the conclusion that the null hypothesis is true. Instead, we will simply state that the sample evidence is insufficient to reject H0 at α = .05. Because the null hypothesis is the “status-quo” hypothesis, the effect of not rejecting H0 is to maintain the status quo. In our pipe-testing example, the effect of having insufficient evidence to reject the null hypothesis that the pipe does not meet specifications is probably to prohibit the use of the manufacturer’s pipe unless and until there is sufficient evidence that the pipe does meet specifications. That is, until the data indicate convincingly that the null hypothesis is false, we usually maintain the status quo implied by its truth.
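To see why β cannot be pinned down without knowing the true mean, consider the following sketch. It is an illustration under stated assumptions rather than part of the example: the candidate true means (2,420, 2,450, 2,500) are hypothetical, σ = 200 is treated as known, and the sampling distribution of x̄ is taken to be normal.

```python
import math
from statistics import NormalDist

def type2_error_prob(mu_true, mu0=2400, sigma=200, n=50, alpha=0.05):
    """P(test statistic misses the rejection region) if the mean is mu_true."""
    standard_error = sigma / math.sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha)
    # H0 is rejected when x_bar exceeds this cutoff on the original scale.
    cutoff = mu0 + z_crit * standard_error
    # beta is the chance that x_bar falls at or below the cutoff under mu_true.
    return NormalDist(mu_true, standard_error).cdf(cutoff)

for mu_true in (2420, 2450, 2500):  # hypothetical true means, all in Ha
    print(mu_true, round(type2_error_prob(mu_true), 3))
```

Each assumed value of μ gives a different β, and β shrinks as the true mean moves farther above 2,400. Since the true mean is exactly what the test is trying to learn, a single value of β is not available when the decision is made.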

Figure 6.3

Location of test statistic when x̄ = 2,430

Table 6.1 summarizes the four possible outcomes (i.e., conclusions) of a test of hypothesis. The “true state of nature” columns in Table 6.1 refer to the fact that either the null hypothesis H0 is true or the alternative hypothesis Ha is true. Note that the true state of nature is unknown to the researcher conducting the test. The “decision” rows in Table 6.1 refer to the action of the researcher, assuming that he or she will conclude either that H0 is true or that Ha is true, based on the results of the sampling experiment. Note that a Type I error can be made only when the null hypothesis is rejected in favor of the alternative hypothesis, and a Type II error can be made only when the null hypothesis is accepted. Our policy will be to make a decision only when we know the probability of making the error that corresponds to that decision. Because α is usually specified by the analyst, we will generally be able to reject H0 (accept Ha) when the sample evidence supports that decision. However, because β is usually not specified, we will generally avoid the decision to accept H0, preferring instead to state that the sample evidence is insufficient to reject H0 when the test statistic is not in the rejection region.

Table 6.1 Conclusions and Consequences for a Test of Hypothesis

                                True State of Nature
Conclusion                      H0 True                         Ha True
Accept H0 (Assume H0 True)      Correct decision                Type II error (probability β)
Reject H0 (Assume Ha True)      Type I error (probability α)    Correct decision

Caution

Be careful not to “accept H0” when conducting a test of hypothesis because the measure of reliability, β = P(Type II error), is almost always unknown. If the test statistic does not fall into the rejection region, it is better to state the conclusion as “insufficient evidence to reject H0.”

The elements of a test of hypothesis are summarized in the following box. Note that the first four elements are all specified before the sampling experiment is performed. In no case will the results of the sample be used to determine the hypotheses; the data are collected to test the predetermined hypotheses, not to formulate them.

Elements of a Test of Hypothesis

  1. Null hypothesis (H0): A theory about the specific values of one or more population parameters. The theory generally represents the status quo, which we adopt until it is proven false. The theory is always stated as H0: parameter = value.

  2. Alternative (research) hypothesis (Ha): A theory that contradicts the null hypothesis. The theory generally represents that which we will adopt only when sufficient evidence exists to establish its truth.

  3. Test statistic: A sample statistic used to decide whether to reject the null hypothesis.

  4. Rejection region: The numerical values of the test statistic for which the null hypothesis will be rejected. The rejection region is chosen so that the probability is α that it will contain the test statistic when the null hypothesis is true, thereby leading to a Type I error. The value of α is usually chosen to be small (e.g., .01, .05, or .10) and is referred to as the level of significance of the test.

  5. Assumptions: Clear statement(s) of any assumptions made about the population(s) being sampled.

  6. Experiment and calculation of test statistic: Performance of the sampling experiment and determination of the numerical value of the test statistic.

  7. Conclusion:

    1. If the numerical value of the test statistic falls in the rejection region, we reject the null hypothesis and conclude that the alternative hypothesis is true. We know that the hypothesis-testing process will lead to this conclusion incorrectly (Type I error) only 100α% of the time when H0 is true.

    2. If the test statistic does not fall in the rejection region, we do not reject H0. Thus, we reserve judgment about which hypothesis is true. We do not conclude that the null hypothesis is true because we do not (in general) know the probability β that our test procedure will lead to an incorrect acceptance of H0 (Type II error).
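The elements in the box can be strung together in a short sketch for the sewer-pipe example. The function name upper_tailed_z_test and its return format are hypothetical. The sketch assumes a large sample, so that s can stand in for σ and the z statistic is approximately standard normal. Note that the hypotheses, test statistic, and rejection region are fixed before any data are supplied.

```python
import math
from statistics import NormalDist

def upper_tailed_z_test(sample_mean, s, n, mu0, alpha=0.05):
    """Large-sample test of H0: mu <= mu0 against Ha: mu > mu0."""
    # Rejection region (element 4): z > z_crit, chosen from alpha alone.
    z_crit = NormalDist().inv_cdf(1 - alpha)
    # Test statistic (elements 3 and 6), with s approximating sigma.
    z = (sample_mean - mu0) / (s / math.sqrt(n))
    # Conclusion (element 7): never "accept H0", since beta is unknown.
    if z > z_crit:
        conclusion = "reject H0"
    else:
        conclusion = "insufficient evidence to reject H0"
    return z, z_crit, conclusion

for x_bar in (2460, 2430):  # the two sample means used in this section
    z, z_crit, conclusion = upper_tailed_z_test(x_bar, s=200, n=50, mu0=2400)
    print(f"x_bar = {x_bar}: z = {z:.2f}, cutoff = {z_crit:.3f}, {conclusion}")
```

The wording of the second outcome follows the policy stated above: when the test statistic is not in the rejection region, judgment is reserved rather than declaring H0 true.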

As with confidence intervals, the methodology for testing hypotheses varies depending on the target population parameter. In this chapter, we develop methods for testing a population mean, a population proportion, and (optionally) a population variance. As a reminder, the key words and the type of data associated with these target parameters are again listed in the accompanying box.

Determining the Target Parameter

Parameter    Key Words or Phrases                      Type of Data
μ            Mean; average                             Quantitative
p            Proportion; percentage; fraction; rate    Qualitative
σ²           Variance; variability; spread             Quantitative