Chapter 25

Ten Distributions Worth Knowing

In This Chapter

Delving into distributions that often describe your data

Digging into distributions that arise during statistical significance testing

This chapter describes ten statistical distribution functions you’ll probably encounter in biological research. For each one I provide a graph of what that distribution looks like as well as some useful or interesting facts and formulas.

You find two general types of distributions here:

Distributions that describe random fluctuations in observed data: Your experimental data will often conform to one of the first seven common distributions. These distributions have one or two adjustable parameters that let them “fit” the fluctuations in your observed data.

Common test statistic distributions: The last three distributions (the Student t, chi-square, and Fisher F distributions) don’t describe your observed data. Instead, they describe how a test statistic (calculated as part of a statistical significance test) will fluctuate if the null hypothesis is true; that is, if the apparent effects in your data (differences between groups, associations between variables, and so on) are due only to random fluctuations. They’re therefore used to obtain p values, which indicate the statistical significance of the apparent effects. (See Chapter 3 for more information on significance testing and p values.)

This chapter provides very short tables of critical values for the t, chi-square, and F distributions. A critical value is the value that your calculated test statistic must exceed in order for you to declare significance at the p < 0.05 level. For example, the two-sided critical value for the normal distribution is 1.96 at the 0.05 significance level.

The Uniform Distribution

The uniform distribution is one of the simplest distributions: a continuous number between 0 and 1 or (more generally) between a and b, with all values within that range equally likely (see Figure 25-1). The uniform distribution has a mean value of $(a + b)/2$ and a standard deviation of $(b - a)/\sqrt{12}$. The uniform distribution arises in the following contexts:

Round-off errors are uniformly distributed. For example, a weight recorded as 85 kilograms can be thought of as a uniformly distributed random variable between 84.5 and 85.5 kilograms, with a standard deviation of about 0.29 kilogram.

The p value from any exact significance test is uniformly distributed between 0 and 1 if, and only if, the null hypothesis is true.
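The round-off example can be checked with a few lines of Python. This is only a minimal sketch using the standard uniform-distribution formulas, mean = (a + b)/2 and SD = (b − a)/√12; the function name is made up for illustration.

```python
import math

def uniform_mean_sd(a, b):
    """Return the mean and standard deviation of a uniform distribution on [a, b]."""
    mean = (a + b) / 2
    sd = (b - a) / math.sqrt(12)
    return mean, sd

# A weight recorded as 85 kg could lie anywhere from 84.5 to 85.5 kg.
mean, sd = uniform_mean_sd(84.5, 85.5)
print(f"mean = {mean:.1f} kg, SD = {sd:.2f} kg")
```

For a rounding interval one unit wide, the SD is always 1/√12, or about 0.29 of that unit.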

Illustration by Wiley, Composition Services Graphics

Figure 25-1: The uniform distribution.

Tip: The Excel formula =RAND() generates a random number drawn from the standard uniform distribution.

The Normal Distribution

The normal distribution is the king of statistical distributions. It describes variables whose fluctuations are the combined result of many independent causes. Figure 25-2 shows the shape of the normal distribution for various values of the mean and standard deviation.

Many other distributions (binomial, Poisson, Student t, chi-square, Fisher F) become nearly normal-shaped for large samples.

Tip: The Excel statement =NORMSINV(RAND()) generates a normally distributed random number, with mean = 0 and SD = 1.
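The same inverse-CDF trick works outside Excel. The sketch below is a rough Python analogue of =NORMSINV(RAND()), assuming Python 3.8 or later for statistics.NormalDist; the function name, seed, and sample size are arbitrary choices for illustration.

```python
import random
from statistics import NormalDist

def standard_normal_draw(rng):
    """Turn one uniform random number into one standard normal random number."""
    u = rng.random()
    while u == 0.0:          # the inverse CDF is undefined at exactly 0
        u = rng.random()
    return NormalDist(mu=0.0, sigma=1.0).inv_cdf(u)

rng = random.Random(12345)   # fixed seed so the run is repeatable
draws = [standard_normal_draw(rng) for _ in range(50_000)]

sample_mean = sum(draws) / len(draws)
sample_sd = (sum((d - sample_mean) ** 2 for d in draws) / (len(draws) - 1)) ** 0.5
print(f"mean = {sample_mean:.3f}, SD = {sample_sd:.3f}")
```

The printed mean and SD should land close to 0 and 1. In everyday work, random.gauss(0, 1) is the more direct route.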

Figure 25-2: The normal distribution.

The Log-Normal Distribution

If a set of numbers (x) is log-normally distributed, then the logarithms of those numbers will be normally distributed (see the preceding section). Many enzyme and antibody concentrations are log-normally distributed. Hospital lengths of stay, charges, and costs are approximately log-normal.

You should suspect log-normality if the standard deviation of a set of numbers is comparable in magnitude to the mean of those numbers. Figure 25-3 shows the relationship between the normal and log-normal distributions.

Figure 25-3: The log-normal distribution.

If a set of log-normal numbers has a mean A and standard deviation D, then the natural logarithms of those numbers will have a standard deviation $s = \sqrt{\ln[1 + (D/A)^2]}$ and a mean $m = \ln(A) - s^2/2$.
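As a quick check on that conversion, here is a small Python sketch that turns a mean A and standard deviation D into the log-scale parameters and then converts back using the standard log-normal moment formulas. The function name and the values A = 10 and D = 8 are hypothetical.

```python
import math

def lognormal_log_params(A, D):
    """Mean m and SD s of the natural logs, given mean A and SD D of the raw numbers."""
    s = math.sqrt(math.log(1 + (D / A) ** 2))
    m = math.log(A) - s ** 2 / 2
    return m, s

m, s = lognormal_log_params(10.0, 8.0)

# Round trip: the mean and SD of a log-normal variable with log-scale parameters m and s.
back_mean = math.exp(m + s ** 2 / 2)
back_sd = back_mean * math.sqrt(math.exp(s ** 2) - 1)
print(f"m = {m:.4f}, s = {s:.4f}, recovered mean = {back_mean:.2f}, recovered SD = {back_sd:.2f}")
```

Note that s is the square root of the logarithm term; the logarithm itself is the variance of the logs.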

The Binomial Distribution

The binomial distribution gives the probability of getting x successes out of N independent tries when the probability of success on one try is p. (See Chapter 3 for an introduction to probability.) The binomial distribution describes, for example, the probability of getting x heads out of N flips of a fair (p = 0.5) or lopsided (p ≠ 0.5) coin. Figure 25-4 shows the frequency distributions of three binomial distributions, all having p = 0.7 but having different N values.

Figure 25-4: The binomial distribution.

The formula for the probability of getting x successes in N tries when the probability of success on one try is p is $\Pr(x, N, p) = p^x (1 - p)^{N - x} \, N!/[x!\,(N - x)!]$.
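The binomial formula translates almost directly into code. The following Python sketch (the function name is made up; math.comb needs Python 3.8 or later) evaluates it for N = 10 and p = 0.7.

```python
import math

def binomial_pmf(x, N, p):
    """Probability of exactly x successes in N independent tries, each with success probability p."""
    # math.comb(N, x) equals N! / [x! (N - x)!]
    return math.comb(N, x) * p ** x * (1 - p) ** (N - x)

probs = [binomial_pmf(x, 10, 0.7) for x in range(11)]
for x, pr in enumerate(probs):
    print(f"x = {x:2d}  Pr = {pr:.4f}")
```

The probabilities over x = 0 through N add up to 1, and the most likely count here is x = 7, the value nearest Np.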

As N gets large, the binomial distribution’s shape approaches that of a normal distribution with mean = Np and standard deviation = $\sqrt{Np(1 - p)}$. (I talk about the normal distribution earlier in this chapter.)

Technical stuff: The arc-sine of the square root of a set of proportions is approximately normally distributed, with a standard deviation of $1/(2\sqrt{N})$ radians, where N is the number of tries behind each proportion. Using this “transformation,” you can analyze data consisting of observed proportions (such as fraction of subjects responding to a treatment) with t tests, ANOVAs, regression models, and other methods designed for normally distributed data.
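One way to see the arc-sine transformation at work is to compute the exact standard deviation of the transformed proportion from the binomial probabilities and compare it with the usual approximation, 1/(2√N) radians. This Python sketch does that; the function name and the choice of N = 100 and p = 0.3 are assumptions for illustration.

```python
import math

def arcsine_sqrt_sd(N, p):
    """Exact SD of asin(sqrt(x/N)) when x is binomial with N tries and success probability p."""
    pmf = [math.comb(N, x) * p ** x * (1 - p) ** (N - x) for x in range(N + 1)]
    y = [math.asin(math.sqrt(x / N)) for x in range(N + 1)]
    mean = sum(w * v for w, v in zip(pmf, y))
    var = sum(w * (v - mean) ** 2 for w, v in zip(pmf, y))
    return math.sqrt(var)

N = 100
exact_sd = arcsine_sqrt_sd(N, 0.3)
approx_sd = 1 / (2 * math.sqrt(N))
print(f"exact SD = {exact_sd:.4f}, approximate SD = {approx_sd:.4f}")
```

The appeal of the transformation is that the approximate SD depends only on N, not on p, which is what lets methods that assume equal variances be applied.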

The Poisson Distribution

The Poisson distribution gives the probability of observing exactly N independent random events in some interval of time or region of space if the mean event rate is m. It describes, for example, fluctuations in the number of nuclear decay counts per minute and the number of pollen grains per square centimeter on a microscope slide. Figure 25-5 shows the Poisson distribution for three different values of the mean event rate.

Figure 25-5: The Poisson distribution.

The formula is $\Pr(N, m) = m^N e^{-m}/N!$
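The Poisson formula is equally short in code. This Python sketch (the function name and the mean rate of 3 are illustrative choices) tabulates the first few probabilities.

```python
import math

def poisson_pmf(N, m):
    """Probability of observing exactly N events when the mean event rate is m."""
    return m ** N * math.exp(-m) / math.factorial(N)

m = 3.0
probs = [poisson_pmf(N, m) for N in range(60)]   # counts above 59 are negligible for m = 3
for N in range(8):
    print(f"N = {N}  Pr = {probs[N]:.4f}")
```

For large N or m, the powers and factorials get unwieldy, so working with logarithms (for example, via math.lgamma) is the safer approach.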

As m gets large, the Poisson distribution’s shape approaches that of a normal distribution (which I discuss earlier in this chapter), with mean = m and standard deviation = $\sqrt{m}$.

Technical stuff: The square roots of a set of Poisson-distributed numbers are approximately normally distributed, with a standard deviation of 1/2.
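The square-root rule can be verified numerically by summing over the Poisson probabilities. In this hedged Python sketch, the function name, the cutoff n_max, and the mean rate of 20 are all assumptions; probabilities are computed on the log scale to avoid overflow.

```python
import math

def sqrt_poisson_sd(m, n_max=400):
    """SD of sqrt(N) when N is Poisson with mean m (sum truncated at n_max)."""
    def pmf(n):
        # log-scale form of m^n e^(-m) / n!
        return math.exp(n * math.log(m) - m - math.lgamma(n + 1))
    weights = [pmf(n) for n in range(n_max + 1)]
    roots = [math.sqrt(n) for n in range(n_max + 1)]
    mean = sum(w * r for w, r in zip(weights, roots))
    var = sum(w * (r - mean) ** 2 for w, r in zip(weights, roots))
    return math.sqrt(var)

sd_20 = sqrt_poisson_sd(20.0)
print(f"SD of sqrt(N) for m = 20: {sd_20:.3f}")
```

The result sits close to 1/2 whatever the mean rate, provided the rate isn't very small.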

The Exponential Distribution

If a set of events follows the Poisson distribution (which I discuss earlier in this chapter), the time intervals between consecutive events follow the exponential distribution and vice versa. Figure 25-6 shows the shape of two different exponential distributions.

Figure 25-6: The exponential distribution.

Tip: The Excel statement = –LN(RAND()) makes exponentially distributed random numbers with mean = 1.
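A Python counterpart of = –LN(RAND()) might look like the sketch below. The function name, seed, and sample size are arbitrary; the only wrinkle is keeping the uniform number away from 0, where the logarithm is undefined.

```python
import math
import random

def exponential_draw(rng, mean=1.0):
    """Draw one exponentially distributed number with the given mean."""
    u = 1.0 - rng.random()        # lies in (0, 1], so log(u) is always defined
    return -mean * math.log(u)

rng = random.Random(2024)          # fixed seed so the run is repeatable
draws = [exponential_draw(rng) for _ in range(100_000)]
sample_mean = sum(draws) / len(draws)
print(f"sample mean = {sample_mean:.3f}")
```

Python's built-in random.expovariate(1.0) does the same job in one call.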

The Weibull Distribution

This function describes failure times for people or devices (such as light bulbs), where the failure rate can be constant or can change over time depending on the shape parameter, k. The failure rate is proportional to time raised to the k – 1 power, as shown in Figure 25-7a.

If k < 1, the failure rate declines over time (with lots of early failures).

If k = 1, the failure rate is constant over time (corresponding to an exponential distribution).

If k > 1, the failure rate increases over time (as items wear out).

Figure 25-7b shows the corresponding cumulative survival curves.

Figure 25-7: The Weibull distribution.

This distribution leads to survival curves of the form $S(t) = e^{-(t/\lambda)^k}$, where λ is a scale parameter. Such curves are widely used in industrial statistics, but survival methods that don’t assume any particular formula for the survival curve are more common in biostatistics.
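The three cases for k can be explored with a short Python sketch. It assumes the common two-parameter form of the Weibull distribution, with shape k and a scale parameter called lam here; the function names and the time points are illustrative.

```python
import math

def weibull_hazard(t, k, lam=1.0):
    """Failure rate at time t > 0: proportional to t raised to the k - 1 power."""
    return (k / lam) * (t / lam) ** (k - 1)

def weibull_survival(t, k, lam=1.0):
    """Fraction still surviving at time t."""
    return math.exp(-((t / lam) ** k))

for k in (0.5, 1.0, 2.0):
    rates = [weibull_hazard(t, k) for t in (0.5, 1.0, 2.0)]
    surv = [weibull_survival(t, k) for t in (0.5, 1.0, 2.0)]
    print(f"k = {k}: failure rates {rates}, survival {surv}")
```

With k = 1 the failure rate stays flat and the survival curve is a plain exponential decay; with k below or above 1 the rate falls or rises over time.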

The Student t Distribution

This family of distributions is most often used when comparing means between two groups or between two paired measurements. Figure 25-8 shows the shape of the Student t distribution for various degrees of freedom. (See Chapter 12 for more info about t tests and degrees of freedom.)

Figure 25-8: The Student t distribution.

As the degrees of freedom increase, the shape of the Student t distribution approaches that of the normal distribution that I discuss earlier in this chapter.

Table 25-1 shows the “critical” t value for various degrees of freedom.

Tip: Random fluctuations cause t to exceed the critical t value (on either the positive or negative side) only 5 percent of the time. If the t value from your Student t test exceeds this value, the test is significant at p < 0.05.

Table 25-1 Critical Values of Student t for p = 0.05

Degrees of Freedom    tcrit
1                     12.71
2                     4.30
3                     3.18
4                     2.78
5                     2.57
6                     2.45
8                     2.31
10                    2.23
20                    2.09
50                    2.01
∞                     1.96

Tip: For other p and df values, the Excel formula =TINV(p, df) gives the critical Student t value.
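If Excel isn't handy, a two-sided critical t value can be approximated from first principles. The Python sketch below is not how Excel computes TINV; it simply integrates the Student t density with Simpson's rule and searches for the cutoff by bisection. The function names, step count, and search range are assumptions.

```python
import math

def t_pdf(t, df):
    """Density of the Student t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

def two_tailed_p(t, df, steps=2000):
    """Probability that |T| exceeds |t|, using Simpson's rule on the interval 0 to |t|."""
    t = abs(t)
    if t == 0.0:
        return 1.0
    h = t / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * (total * h / 3)

def t_critical(p, df):
    """Two-sided critical value: the t whose two-tailed probability equals p."""
    lo, hi = 0.0, 1000.0               # assumed search range
    for _ in range(50):
        mid = (lo + hi) / 2
        if two_tailed_p(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for df in (1, 10, 50):
    print(f"df = {df:2d}  tcrit = {t_critical(0.05, df):.2f}")
```

The values for 1, 10, and 50 degrees of freedom should agree with Table 25-1 to two decimal places. A statistics library is the better choice for real work.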

The Chi-Square Distribution

This family of distributions is used for testing goodness-of-fit between observed and expected event counts and for testing for association between categorical variables. Figure 25-9 shows the shape of the chi-square distribution for various degrees of freedom. (See Chapter 13 for more info about the chi-square test and degrees of freedom.)

Figure 25-9: The chi-square distribution.

As the degrees of freedom increase, the shape of the chi-square distribution approaches that of the normal distribution that I discuss earlier in this chapter.

Table 25-2 shows the “critical” chi-square value for various degrees of freedom.

Tip: Random fluctuations cause chi-square to exceed the critical chi-square value only 5 percent of the time. If the chi-square value from your test exceeds the critical value, the test is significant at p < 0.05.

Table 25-2 Critical Values of Chi-Square for p = 0.05

Degrees of Freedom    χ²crit
1                     3.84
2                     5.99
3                     7.81
4                     9.49
5                     11.07
6                     12.59
7                     14.07
8                     15.51
9                     16.92
10                    18.31

Tip: For other p and df values, the Excel formula =CHIINV(p, df) gives the critical χ² value.
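Critical chi-square values can also be approximated without Excel. This Python sketch, which is not Excel's algorithm, evaluates the chi-square cumulative distribution with the standard series for the incomplete gamma function and then finds the cutoff by bisection. The function names and the upper search bound are assumptions, and the approach is only suited to moderate df and p values.

```python
import math

def chi2_cdf(x, df):
    """Probability that a chi-square variable with df degrees of freedom is below x."""
    if x <= 0:
        return 0.0
    a, z = df / 2, x / 2
    term = 1.0 / a
    total = term
    n = 0
    while term > total * 1e-16:        # series for the regularized incomplete gamma function
        n += 1
        term *= z / (a + n)
        total += term
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))

def chi2_critical(p, df):
    """Value that a chi-square variable exceeds with probability p."""
    lo, hi = 0.0, 10.0 * df + 50.0     # assumed search range
    for _ in range(50):
        mid = (lo + hi) / 2
        if 1.0 - chi2_cdf(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for df in (1, 2, 10):
    print(f"df = {df:2d}  critical chi-square = {chi2_critical(0.05, df):.2f}")
```

The results for 1, 2, and 10 degrees of freedom should match Table 25-2.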

The Fisher F Distribution

This family of distributions is most frequently used to get p values from an analysis of variance (ANOVA). Figure 25-10 shows the shape of the Fisher F distribution for various degrees of freedom. (See Chapter 12 for more info about ANOVAs and degrees of freedom.)

Figure 25-10: The Fisher F distribution.

Figure 25-11 shows the “critical” Fisher F value for various degrees of freedom.

Tip: Random fluctuations cause F to exceed the critical F value only 5 percent of the time. If the F value from your ANOVA exceeds this value, the test is significant at p < 0.05.

Figure 25-11: Critical values of Fisher F for p = 0.05.

For other values of p, df1, and df2, the Excel formula =FINV(p, df1, df2) will give the critical F value.
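For a rough cross-check on a critical F value, simulation is one option. The Python sketch below relies on the fact that an F variable is the ratio of two independent chi-square variables, each divided by its degrees of freedom, and estimates the cutoff from simulated ratios. It is a Monte Carlo approximation, not Excel's method; the function name, seed, and number of draws are arbitrary.

```python
import random

def f_critical_mc(p, df1, df2, n_draws=200_000, seed=2024):
    """Monte Carlo estimate of the value that F(df1, df2) exceeds with probability p."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_draws):
        # A chi-square variable with df degrees of freedom is a gamma variable
        # with shape df/2 and scale 2.
        chi2_num = rng.gammavariate(df1 / 2, 2.0)
        chi2_den = rng.gammavariate(df2 / 2, 2.0)
        ratios.append((chi2_num / df1) / (chi2_den / df2))
    ratios.sort()
    return ratios[int((1 - p) * n_draws)]

estimate = f_critical_mc(0.05, 2, 10)
print(f"estimated critical F for df1 = 2, df2 = 10: {estimate:.2f}")
```

The estimate should fall near 4.10, the exact critical value for 2 and 10 degrees of freedom, with a little simulation noise in the second decimal place.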
