Chapter 25
Ten Distributions Worth Knowing
In This Chapter
Delving into distributions that often describe your data
Digging into distributions that arise during statistical significance testing
This chapter describes ten statistical distribution functions you’ll probably encounter in biological research. For each one I provide a graph of what that distribution looks like as well as some useful or interesting facts and formulas.
You find two general types of distributions here:
Distributions that describe random fluctuations in observed data: Your experimental data will often conform to one of the first seven common distributions. These distributions have one or two adjustable parameters that let them “fit” the fluctuations in your observed data.
Common test statistic distributions: The last three distributions — the Student t, chi-square, and Fisher F distributions — don’t describe your observed data; they describe how a test statistic (calculated as part of a statistical significance test) will fluctuate if the null hypothesis is true — that is, if the apparent effects in your data (differences between groups, associations between variables, and so on) are due only to random fluctuations. So, they’re used to obtain p values, which indicate the statistical significance of the apparent effects. (See Chapter 3 for more information on significance testing and p values.)
This chapter provides very short tables (and, for the F distribution, a figure) of critical values for the t, chi-square, and F distributions — the value that your calculated test statistic must exceed in order for you to declare significance at the p < 0.05 level. For example, the critical value for the normal distribution is 1.96 for the 0.05 significance level.
The Uniform Distribution
The uniform distribution is one of the simplest distributions — a continuous number between 0 and 1 or (more generally) between a and b, with all values within that range equally likely (see Figure 25-1). The uniform distribution has a mean value of (a + b)/2 and a standard deviation of (b – a)/√12. The uniform distribution arises in the following contexts:
Round-off errors are uniformly distributed. For example, a weight recorded as 85 kilograms can be thought of as a uniformly distributed random variable between 84.5 and 85.5 kilograms, with a standard deviation of 1/√12, or about 0.29 kilogram.
The p value from any exact significance test is uniformly distributed between 0 and 1 if, and only if, the null hypothesis is true.
Illustration by Wiley, Composition Services Graphics
Figure 25-1: The uniform distribution.
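The mean and standard-deviation formulas above are easy to check numerically. Here's a minimal Python sketch (the book itself contains no code; the function names and the 100,000-draw sample size are my own illustrative choices, built around the 85-kilogram round-off example):

```python
import math
import random

def uniform_mean(a, b):
    """Mean of a uniform distribution on [a, b]: (a + b)/2."""
    return (a + b) / 2

def uniform_sd(a, b):
    """Standard deviation of a uniform distribution on [a, b]: (b - a)/sqrt(12)."""
    return (b - a) / math.sqrt(12)

# Round-off example from this section: a weight recorded as 85 kg
a, b = 84.5, 85.5
print(uniform_mean(a, b))          # 85.0
print(round(uniform_sd(a, b), 2))  # 0.29

# Monte Carlo sanity check with 100,000 random draws
random.seed(1)
draws = [random.uniform(a, b) for _ in range(100_000)]
sample_mean = sum(draws) / len(draws)
```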
The Normal Distribution
The normal distribution is the king of statistical distributions. It describes variables whose fluctuations are the combined result of many independent causes. Figure 25-2 shows the shape of the normal distribution for various values of the mean and standard deviation.
Many other distributions (binomial, Poisson, Student t, chi-square, Fisher F) become nearly normal-shaped for large samples.
Illustration by Wiley, Composition Services Graphics
Figure 25-2: The normal distribution.
The Log-Normal Distribution
If a set of numbers (x) is log-normally distributed, then the logarithms of those numbers will be normally distributed (see the preceding section). Many enzyme and antibody concentrations are log-normally distributed. Hospital lengths of stay, charges, and costs are approximately log-normal.
You should suspect log-normality if the standard deviation of a set of numbers is comparable in magnitude to the mean of those numbers. Figure 25-3 shows the relationship between the normal and log-normal distributions.
Illustration by Wiley, Composition Services Graphics
Figure 25-3: The log-normal distribution.
If a set of log-normal numbers has a mean A and standard deviation D, then the natural logarithms of those numbers will have a standard deviation s = √(ln[1 + (D/A)²]), and a mean m = ln(A) – s²/2.
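One way to see that these conversion formulas hang together is to reverse them: the mean of a log-normal distribution is e^(m + s²/2), so plugging m and s back in should recover A. A short Python check (the $10,000 mean and $8,000 standard deviation for hospital charges are hypothetical numbers of mine):

```python
import math

def lognormal_params(A, D):
    """Convert the mean A and SD D of log-normal data into the
    mean m and SD s of the natural logs of those data."""
    s = math.sqrt(math.log(1 + (D / A) ** 2))
    m = math.log(A) - s ** 2 / 2
    return m, s

# Hypothetical example: hospital charges with mean $10,000 and SD $8,000
m, s = lognormal_params(10_000, 8_000)

# Reversing the conversion should recover A, since the mean of a
# log-normal distribution is exp(m + s^2/2)
print(round(math.exp(m + s ** 2 / 2)))  # 10000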
The Binomial Distribution
The binomial distribution tells the probability of getting x successes out of N independent tries when the probability of success on one try is p. (See Chapter 3 for an introduction to probability.) The binomial distribution describes, for example, the probability of getting x heads out of N flips of a fair (p = 0.5) or lopsided (p ≠ 0.5) coin. Figure 25-4 shows the frequency distributions of three binomial distributions, all having p = 0.7 but having different N values.
Illustration by Wiley, Composition Services Graphics
Figure 25-4: The binomial distribution.
The formula for the probability of getting x successes in N tries, when the probability of success on one try is p, is Pr(x, N, p) = {N!/[x!(N – x)!]} p^x (1 – p)^(N–x).
As N gets large, the binomial distribution’s shape approaches that of a normal distribution with mean = Np and standard deviation = √[Np(1 – p)]. (I talk about the normal distribution earlier in this chapter.)
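The binomial formula is straightforward to compute with Python's standard library, where math.comb supplies the N!/[x!(N – x)!] term. A quick sketch (the coin-flip numbers and the N = 100, p = 0.7 approximation example are my own choices):

```python
import math

def binom_pmf(x, N, p):
    """Probability of x successes in N tries: C(N, x) * p^x * (1 - p)^(N - x)."""
    return math.comb(N, x) * p ** x * (1 - p) ** (N - x)

# Fair-coin example: chance of exactly 5 heads in 10 flips
print(round(binom_pmf(5, 10, 0.5), 4))  # 0.2461

# Normal approximation for large N: mean = Np, SD = sqrt(Np(1 - p))
N, p = 100, 0.7
approx_mean, approx_sd = N * p, math.sqrt(N * p * (1 - p))
print(round(approx_mean), round(approx_sd, 2))  # 70 4.58
```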
The Poisson Distribution
The Poisson distribution gives the probability of observing exactly N independent random events in some interval of time or region of space if the mean event rate is m. It describes, for example, fluctuations in the number of nuclear decay counts per minute and the number of pollen grains per square centimeter on a microscope slide. Figure 25-5 shows the Poisson distribution for three different values of the mean event rate.
Illustration by Wiley, Composition Services Graphics
Figure 25-5: The Poisson distribution.
The formula is Pr(N, m) = m^N e^(–m)/N!
As m gets large, the Poisson distribution’s shape approaches that of a normal distribution (which I discuss earlier in this chapter), with mean = m and standard deviation = √m.
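The Poisson formula is just as easy to evaluate with the standard library. Here's a minimal sketch (the mean rate of 3 counts per minute is a hypothetical example of mine):

```python
import math

def poisson_pmf(N, m):
    """Probability of exactly N events when the mean count is m: m^N e^-m / N!."""
    return m ** N * math.exp(-m) / math.factorial(N)

# Hypothetical example: with a mean of 3 decay counts per minute,
# the chance of seeing exactly 2 counts in a given minute
print(round(poisson_pmf(2, 3), 4))  # 0.224

# The probabilities over all possible counts sum to 1
total = sum(poisson_pmf(n, 3) for n in range(50))
```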
The Exponential Distribution
If a set of events follows the Poisson distribution (which I discuss earlier in this chapter), the time intervals between consecutive events follow the exponential distribution and vice versa. Figure 25-6 shows the shape of two different exponential distributions.
Illustration by Wiley, Composition Services Graphics
Figure 25-6: The exponential distribution.
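The Poisson-exponential connection is easy to demonstrate by simulation: if events occur at a mean rate of m per unit time, the gaps between consecutive events are exponentially distributed with mean 1/m. A Python sketch (the rate of 3 and the sample size are arbitrary choices of mine):

```python
import random

# If events occur at a mean rate of `rate` per unit time (a Poisson
# process), the gaps between consecutive events are exponentially
# distributed with mean 1/rate
random.seed(42)
rate = 3.0
gaps = [random.expovariate(rate) for _ in range(100_000)]
mean_gap = sum(gaps) / len(gaps)
print(round(mean_gap, 2))  # close to 1/3
```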
The Weibull Distribution
This function describes failure times for people or devices (such as light bulbs), where the failure rate can be constant or can change over time depending on the shape parameter, k. The failure rate is proportional to time raised to the k – 1 power, as shown in Figure 25-7a.
If k < 1, the failure rate declines over time (with lots of early failures).
If k = 1, the failure rate is constant over time (corresponding to an exponential distribution).
If k > 1, the failure rate increases over time (as items wear out).
Figure 25-7b shows the corresponding cumulative survival curves.
Illustration by Wiley, Composition Services Graphics
Figure 25-7: The Weibull distribution.
This distribution leads to survival curves of the form S(t) = e^(–(t/λ)^k), where λ is a scale parameter; such curves are widely used in industrial statistics. But survival methods that don’t assume any particular formula for the survival curve are more common in biostatistics.
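Here's a tiny Python sketch of that survival curve; setting k = 1 reduces it to the exponential case described above (the numeric values are purely illustrative):

```python
import math

def weibull_survival(t, lam, k):
    """Weibull survival curve: S(t) = exp(-(t/lam)**k)."""
    return math.exp(-((t / lam) ** k))

# With shape k = 1 the Weibull reduces to an exponential survival
# curve, so S(lam) equals e^-1 regardless of the scale lam
print(round(weibull_survival(2.0, 2.0, 1.0), 4))  # 0.3679

# With k > 1 (wear-out failures), survival falls off at later times
print(weibull_survival(1.0, 2.0, 3.0) > weibull_survival(3.0, 2.0, 3.0))  # True
```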
The Student t Distribution
This family of distributions is most often used when comparing means between two groups or between two paired measurements. Figure 25-8 shows the shape of the Student t distribution for various degrees of freedom. (See Chapter 12 for more info about t tests and degrees of freedom.)
Illustration by Wiley, Composition Services Graphics
Figure 25-8: The Student t distribution.
As the degrees of freedom increase, the shape of the Student t distribution approaches that of the normal distribution that I discuss earlier in this chapter.
Table 25-1 shows the “critical” t value for various degrees of freedom.
Table 25-1 Critical Values of Student t for p = 0.05
Degrees of Freedom | t_crit
1 | 12.71
2 | 4.30
3 | 3.18
4 | 2.78
5 | 2.57
6 | 2.45
8 | 2.31
10 | 2.23
20 | 2.09
50 | 2.01
∞ | 1.96
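You can reproduce the last row of Table 25-1 with Python's standard library: as the degrees of freedom go to infinity, the t distribution becomes the normal distribution, whose two-sided 0.05 critical value is its 97.5th percentile.

```python
from statistics import NormalDist

# Two-sided p = 0.05 puts 0.025 into each tail, so the critical
# value is the 97.5th percentile of the standard normal
z_crit = NormalDist().inv_cdf(0.975)
print(round(z_crit, 2))  # 1.96 -- the infinite-df row of Table 25-1
```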
The Chi-Square Distribution
This family of distributions is used for testing goodness-of-fit between observed and expected event counts and for testing for association between categorical variables. Figure 25-9 shows the shape of the chi-square distribution for various degrees of freedom. (See Chapter 13 for more info about the chi-square test and degrees of freedom.)
Illustration by Wiley, Composition Services Graphics
Figure 25-9: The chi-square distribution.
As the degrees of freedom increase, the shape of the chi-square distribution approaches that of the normal distribution that I discuss earlier in this chapter.
Table 25-2 shows the “critical” chi-square value for various degrees of freedom.
Table 25-2 Critical Values of Chi-Square for p = 0.05
Degrees of Freedom | χ²_crit
1 | 3.84
2 | 5.99
3 | 7.81
4 | 9.49
5 | 11.07
6 | 12.59
7 | 14.07
8 | 15.51
9 | 16.92
10 | 18.31
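The first row of Table 25-2 connects back to Table 25-1: a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so its critical value is 1.96² ≈ 3.84. A quick standard-library check:

```python
from statistics import NormalDist

# Chi-square with 1 df is the square of a standard normal, so its
# 0.05 critical value is the square of the normal critical value
z = NormalDist().inv_cdf(0.975)
print(round(z ** 2, 2))  # 3.84 -- the 1-df row of Table 25-2
```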
The Fisher F Distribution
This family of distributions is most frequently used to get p values from an analysis of variance (ANOVA). Figure 25-10 shows the shape of the Fisher F distribution for various degrees of freedom. (See Chapter 12 for more info about ANOVAs and degrees of freedom.)
Illustration by Wiley, Composition Services Graphics
Figure 25-10: The Fisher F distribution.
Figure 25-11 shows the “critical” Fisher F value for various degrees of freedom.
Illustration by Wiley, Composition Services Graphics
Figure 25-11: Critical values of Fisher F for p = 0.05.
For other values of p, df1, and df2, the Excel formula =FINV(p, df1, df2) gives the critical F value. (In current versions of Excel, =F.INV.RT(p, df1, df2) is the equivalent; FINV is retained for compatibility.)
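If you'd rather do the lookup in Python than in Excel, SciPy offers the same calculation (assuming SciPy is installed; it isn't part of the standard library):

```python
from scipy.stats import f

# Upper-tail critical value, equivalent to Excel's =FINV(0.05, 1, 10).
# With df1 = 1, this also equals the square of the 10-df t critical value.
print(round(f.isf(0.05, 1, 10), 2))  # 4.96
```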