Chapter 3
Getting Statistical: A Short Review of Basic Statistics
In This Chapter
Getting a handle on probability, randomness, sampling, and inference
Tackling hypothesis testing
Knowing about nonparametric statistical tests
This chapter provides a brief overview of some basic concepts that are often taught in a one-semester introductory statistics course. They form a conceptual framework for topics that I cover in more depth throughout this book. Here, you get the scoop on probability, randomness, populations, samples, statistical inference, hypothesis testing, and nonparametric statistics.
Note: I can only summarize the concepts here; they’re covered in much more depth in Statistics For Dummies, 2nd Edition, and Statistics II For Dummies, both written by Deborah J. Rumsey, PhD, and published by Wiley. So you may want to skim through this chapter to get an idea of what topics you’re already comfortable with and which ones you need to brush up on.
Taking a Chance on Probability
Defining probability without using some word that means the same (or nearly the same) thing can be hard. Probability is the degree of certainty, the chance, or the likelihood that something will happen. Of course, if you then try to define chance or likelihood or certainty, you may wind up using the word probability in the definition.
Don’t worry; I clear up the basics of probability in the following sections. I explain how to define probability as a number, provide a few simple rules of probability, and compare probability to odds (these two terms are related but not the same thing).
Thinking of probability as a number
Probability describes the relative frequency of the occurrence of an event (like getting heads on a coin flip or drawing the ace of spades from a deck of cards). Probability is a number between 0 and 1, although in casual conversation, you often see probabilities expressed as percentages, often followed by the word chance instead of probability. For example: If the probability of rain is 0.7, you may hear someone say that there’s a 70 percent chance of rain.
A probability of 0 means that the event definitely won’t occur.
A probability of 1 (or 100 percent) means that the event definitely will occur.
A probability between 0 and 1 (like 0.7) means that the event will occur some part of the time (like 70 percent) in the long run.
The probability of one particular thing happening out of N equally likely things that could happen is 1/N. So with a deck of 52 different cards, the probability of drawing any one specific card (like the ace of spades) is 1/52.
Following a few basic rules
Here are three basic rules, or formulas, of probabilities — I call them the not rule, the and rule, and the or rule. In the formulas that follow, I use Prob as an abbreviation for probability, expressed as a fraction (between 0 and 1).
The not rule: The probability of some event X not happening is 1 minus the probability of X happening:
Prob(not X) = 1 – Prob(X)
So if the probability of rain tomorrow is 0.7, then the probability of no rain tomorrow is 1 – 0.7, or 0.3.
The and rule: For two independent events, X and Y, the probability of event X and event Y both happening is equal to the product of the probability of each of the two events:
Prob(X and Y) = Prob(X) × Prob(Y)
So, if you flip a fair coin and then draw a card from a deck, what’s the probability of getting heads on the coin flip and then drawing the ace of spades? The probability of getting heads in a fair coin flip is 1/2, and the probability of drawing the ace of spades from a deck of cards is 1/52, so the probability of having both of these things happen is (1/2)(1/52), or 1/104, or 0.0096 (approximately).
The or rule: For two independent events, X and Y, the probability of one or the other or both events happening is given by a more complicated formula, which can be derived from the preceding two rules.
Prob(X or Y) = 1 – (1 – Prob(X)) × (1 – Prob(Y))
Suppose you roll a pair of dice. What’s the probability of at least one of the dice coming up a 4? If the dice aren’t loaded, there’s a 1/6 chance (a probability of 0.167, approximately) of getting a 4 (or any other specified number) on any die you roll, so the probability of getting a 4 on at least one of the two dice is 1 – (1 – 0.167) × (1 – 0.167), which works out to 1 – 0.833 × 0.833, or 0.31, approximately.
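The three rules are easy to sketch in a few lines of Python (the function names here are my own, for illustration only, not from any statistics library):

```python
# A sketch of the not, and, and or rules from this section.
# These helper names are illustrative, not from any standard library.

def prob_not(p):
    """The 'not' rule: Prob(not X) = 1 - Prob(X)."""
    return 1 - p

def prob_and(p_x, p_y):
    """The 'and' rule for two independent events."""
    return p_x * p_y

def prob_or(p_x, p_y):
    """The 'or' rule for two independent events: at least one happens."""
    return 1 - (1 - p_x) * (1 - p_y)

# Heads on a fair coin flip AND drawing the ace of spades:
print(round(prob_and(1/2, 1/52), 4))   # 0.0096
# At least one 4 when rolling two fair dice:
print(round(prob_or(1/6, 1/6), 2))     # 0.31
```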
Comparing odds versus probability
You see the word odds used a lot in this book, especially in Chapter 14 (on the fourfold cross-tab table) and Chapter 20 (on logistic regression). Odds and probability are related, but the two words are not synonymous. The odds of an event are defined as its probability of occurring divided by its probability of not occurring:
Odds = Probability/(1 – Probability)
With a little algebra (which you don’t need to worry about), you can solve this formula for probability as a function of odds:
Probability = Odds/(1 + Odds)
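If you like to check formulas by machine, the two conversions can be sketched as a pair of Python functions (the names are mine, for illustration only):

```python
# Converting between probability and odds; these helper names are
# illustrative, not part of any standard library.

def odds_from_prob(p):
    """Odds = Probability / (1 - Probability)."""
    return p / (1 - p)

def prob_from_odds(odds):
    """Probability = Odds / (1 + Odds)."""
    return odds / (1 + odds)

print(odds_from_prob(0.75))  # 3.0 (a 75% chance is odds of 3 to 1)
print(prob_from_odds(2))     # 0.666..., or two-thirds
```

Running one function on the output of the other gets you back where you started, which is a quick sanity check that the two formulas really are inverses.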
Table 3-1 shows how probability and odds are related.
Table 3-1 The Relationship between Probability and Odds
Probability | Odds | Interpretation
1.0 | Infinity | The event will definitely occur.
0.9 | 9 | The event will occur 90% of the time (is nine times as likely to occur as not).
0.75 | 3 | The event will occur 75% of the time (is three times as likely to occur as not).
0.667 | 2 | The event will occur two-thirds of the time (is twice as likely to occur as not).
0.5 | 1.0 | The event will occur half the time (is equally likely to occur or not occur).
0.333 | 0.5 | The event will occur one-third of the time (is half as likely to occur as not).
0.25 | 0.3333 | The event will occur 25% of the time (is one-third as likely to occur as not).
0.1 | 0.1111 | The event will occur 10% of the time (is one-ninth as likely to occur as not).
0 | 0 | The event definitely will not occur.
Some Random Thoughts about Randomness
Like probability (which I cover earlier in this chapter), random is a word we use all the time and have some intuitive feel for, but it's hard to pin down in precise language. You can talk about random events, random variables, and the random fluctuations in the data you acquire in your experiments. For a sequence of random numbers, random means that the numbers show no pattern that can be used to predict what the next number will be.
The first step in analyzing a set of data is to get a good idea of what the data looks like. This is the job of descriptive statistics — to show you how a set of numbers is spread around and to show you the relationship between two or more sets of data. The basic tool for describing the distribution of values for some variable in a sample of subjects is the histogram, or frequency distribution graph (I describe histograms in more detail in Chapter 8). Histograms help you visualize the distributions of two types of variables:
Categorical: For categorical variables (such as gender or race), a histogram is simply a bar chart showing how many observations fall into each category, like the distribution of race in a sample of subjects, as shown in Figure 3-1a.
Continuous: To make a histogram of a continuous variable (such as weight or blood hemoglobin), you divide the range of values into some convenient interval, count how many observations fall within each interval, and then display those counts in a bar chart, as shown in Figure 3-1b (which shows the distribution of hemoglobin for a sample of subjects).
Illustration by Wiley, Composition Services Graphics
Figure 3-1: Histograms of categorical (a) and continuous (b) data.
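The binning step for a continuous variable can be sketched in a few lines of Python; the hemoglobin values (in g/dL) below are made up for illustration:

```python
# A minimal sketch of histogram binning for a continuous variable,
# using made-up hemoglobin values rather than real subject data.

def bin_counts(values, low, high, width):
    """Count how many values fall in each interval [low, low+width), etc."""
    n_bins = int((high - low) / width)
    counts = [0] * n_bins
    for v in values:
        if low <= v < high:
            counts[int((v - low) / width)] += 1
    return counts

hemoglobin = [11.2, 12.8, 13.1, 13.5, 13.9, 14.2, 14.6, 15.3, 15.8, 16.9]
print(bin_counts(hemoglobin, 11.0, 17.0, 1.0))  # [1, 1, 3, 2, 2, 1]
```

The resulting counts are exactly the bar heights you'd draw in a chart like Figure 3-1b.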
Picking Samples from Populations
The idea of sampling from a population is one of the most fundamental concepts in statistics — indeed, in all of science. For example, you can’t test how a chemotherapy drug will work in all people with lung cancer; you can study only a limited sample of lung cancer patients who are available to you and draw conclusions from that sample — conclusions that you hope will be valid for all lung cancer patients.
In the following sections, I explain how samples are only imperfect reflections of the populations they’re drawn from, and I describe the basics of probability distributions.
Recognizing that sampling isn’t perfect
Population: All individuals having a precisely defined set of characteristics (for example: human, male, age 18–65, with Stage 3 lung cancer)
Sample: A subset of a defined population, selected for experimental study
Any sample, no matter how carefully it is selected, is only an imperfect reflection of the population, due to the unavoidable occurrence of random sampling fluctuations. Figure 3-2, which compares the distribution of IQ scores in the U.S. population with the distribution in a random sample of 100 subjects, exhibits this characteristic. (IQ scores are standardized so that the average for the whole population is 100, with a standard deviation of 15.)
Illustration by Wiley, Composition Services Graphics
Figure 3-2: Distribution of IQ scores in a) the population, and b) a random sample of 100 subjects from that population.
The sample is distributed more or less like the population, but clearly it’s only an approximation to the true distribution. The mean and standard deviation (I define those terms precisely in Chapter 8) of the sample are close to, but not exactly equal to, the mean and standard deviation of the population, and the histogram doesn’t have a perfect bell shape. These characteristics are always true of any random sample.
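You can see this imperfection for yourself with a short simulation that draws 100 "subjects" from a normally distributed population with mean 100 and standard deviation 15 (the seed value is arbitrary, chosen only so the run is reproducible):

```python
# A small simulation of random sampling fluctuation, assuming IQ scores
# are normally distributed with mean 100 and standard deviation 15.
import random
import statistics

random.seed(42)  # arbitrary seed, for reproducibility only
sample = [random.gauss(100, 15) for _ in range(100)]

print(round(statistics.mean(sample), 1))   # close to, but not exactly, 100
print(round(statistics.stdev(sample), 1))  # close to, but not exactly, 15
```

Each time you change the seed you get a slightly different sample mean and standard deviation, which is precisely the sampling fluctuation this section describes.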
Digging into probability distributions
Samples differ from populations because of random fluctuations. Statisticians understand quantitatively how random fluctuations behave by developing mathematical equations, called probability distribution functions, that describe how likely it is that random fluctuations will exceed any given magnitude. A probability distribution can be represented in several ways:
As a mathematical equation that gives the chance that a fluctuation will be of a certain magnitude. Using calculus, this function can be integrated — turned into another related function that tells the probability that a fluctuation will be at least as large as a certain magnitude.
As a graph of the distribution, which looks and works much like a histogram of observed data.
As a table of values telling how likely it is that random fluctuations will exceed a certain magnitude.
Over the years, hundreds of different probability distributions have been described, but most practical statistical work utilizes only a few of them. You encounter fewer than a dozen probability distributions in this book. In the following sections, I break down two types of distributions: those that describe fluctuations in your data and those that you encounter when performing statistical tests.
Distributions that describe your data
Some distributions describe the random fluctuations you see in your data:
Normal: The familiar, bell-shaped, normal distribution describes (at least approximately) an enormous number of variables you encounter.
Log-normal: The skewed, log-normal distribution describes many laboratory results (enzymes and antibody titers, for example), lengths of hospital stays, and related things like costs, utilization of tests, drugs, and so forth.
Binomial: The binomial distribution describes proportions, such as the fraction of subjects responding to treatment.
Poisson: The Poisson distribution describes the number of occurrences of sporadic random events, such as clicks in a gamma radiation counter or deaths during some period of time.
Chapter 25 describes these and other distribution functions in more detail, and you encounter them throughout this book.
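To get a feel for these four distributions, you can draw random values from each one using only Python's standard library (the parameters here are made up for illustration; the little Poisson sampler is a classic textbook sketch, included because the standard library doesn't provide one):

```python
# Drawing illustrative random values from the four distributions named
# above; all parameters are made up, not from any real study.
import math
import random

random.seed(1)  # arbitrary seed, for reproducibility only

normal_draw = random.gauss(100, 15)              # an IQ-like measurement
lognormal_draw = random.lognormvariate(1, 0.5)   # a skewed lab-type value

# Binomial: number of responders among 100 subjects, each with p = 0.8
binomial_draw = sum(random.random() < 0.8 for _ in range(100))

# Poisson: count of sporadic events (e.g., counter clicks) with mean lam
def poisson_draw(lam):
    k, product = 0, random.random()
    while product > math.exp(-lam):
        product *= random.random()
        k += 1
    return k

print(normal_draw, lognormal_draw, binomial_draw, poisson_draw(3))
```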
Distributions that come up during statistical testing
Some frequency distributions don’t describe fluctuations in observed data, but rather describe fluctuations in numbers that you calculate as part of a statistical test (described in the later section Homing In on Hypothesis Testing). These distributions include the Student t, chi-square, and Fisher F distributions (see Chapter 25), which are used to obtain the p values (see the later section Getting the language down for a definition of p values) that result from the tests.
Introducing Statistical Inference
Statistical inference is the drawing (that is, inferring) of conclusions about a population based on what you see in a sample from that population. Because statisticians understand how random fluctuations behave, statistical inference theory addresses how to extract what’s real in your data, despite the unavoidable random noise that’s always present due to sampling fluctuations or measurement errors. This very broad area of statistical theory is usually subdivided into two topics: statistical estimation theory and statistical decision theory.
Statistical estimation theory
Statistical estimation theory focuses on the accuracy and precision of things that you estimate, measure, count, or calculate. It gives you ways to indicate how precise your measurements are and to calculate the range that’s likely to include the true value. The following sections provide the fundamentals of this theory.
Accuracy and precision
Accuracy refers to how close your measurement tends to come to the true value, without being systematically biased in one direction or another.
Precision refers to how close a bunch of replicate measurements come to each other — that is, how reproducible they are.
Figure 3-3 shows four shooting targets with a bunch of bullet holes from repeated rifle shots. These targets illustrate the distinction between accuracy and precision — two terms that describe different kinds of errors that can occur when sampling or measuring something (or, in this case, when shooting at a target).
Illustration by Wiley, Composition Services Graphics
Figure 3-3: The difference between accuracy and precision.
You see the following in Figure 3-3:
The upper-left target is what most people would hope to achieve — the shots all cluster together (good precision), and they center on the bull’s-eye (good accuracy).
The upper-right target shows that the shots are all very consistent with each other (good precision), so we know that the shooter was very steady (with no large random perturbations from one shot to the next), and any other random effects must have also been quite small. But the shots were all consistently high and to the right (poor accuracy). Perhaps the gun sight was misaligned or the shooter didn’t know how to use it properly. A systematic error occurred somewhere in the aiming and shooting process.
The lower-left target indicates that the shooter wasn’t very consistent from one shot to another (he had poor precision). Perhaps he was unsteady in holding the rifle; perhaps he breathed differently for each shot; perhaps the bullets were not all properly shaped, and had different aerodynamics; or any number of other random differences may have had an effect from one shot to the next. About the only good thing you can say about this shooter is that at least he tended to be more or less centered around the bull’s-eye — the shots don’t show any tendency to be consistently high or low, or consistently to the left or right of center. There’s no evidence of systematic error (or inaccuracy) in his shooting.
The lower-right target shows the worst kind of shooting — the shots are not closely clustered (poor precision) and they seem to show a tendency to be high and to the right (poor accuracy). Both random and systematic errors are prominent in this shooter’s shooting.
Sampling distributions and standard errors
The standard error (SE) of a statistic (such as a mean or a proportion) is the standard deviation you’d expect to see in that statistic if you could repeat the entire experiment many times. Fortunately, you don’t have to repeat the entire experiment a large number of times to calculate the SE. You can usually estimate the SE using data from a single experiment. In Chapter 9, I describe how to calculate the standard errors for means, proportions, event rates, regression coefficients, and other quantities you measure, count, or calculate.
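For the most common case, the SE of a mean, the single-experiment estimate is simply the sample standard deviation divided by the square root of the sample size. Here's a sketch with made-up measurements:

```python
# Estimating the standard error (SE) of a mean from a single sample:
# SE = sample standard deviation / sqrt(sample size).
import math
import statistics

values = [98, 105, 92, 110, 101, 95, 103, 99]  # made-up measurements
se_mean = statistics.stdev(values) / math.sqrt(len(values))
print(round(se_mean, 2))  # 2.02
```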
Confidence intervals
Confidence intervals provide another way to indicate the precision of an estimate or measurement. A confidence interval (CI) around an estimated value is the range within which you have a certain degree of confidence, called the confidence level (CL), that the true value lies. If calculated properly, your quoted confidence interval should encompass the true value at least as often as the quoted confidence level.
Suppose you treat 100 randomly selected migraine headache sufferers with a new drug, and you find that 80 of them respond to the treatment (according to the response criteria you have established). Your observed response rate is 80 percent, but how precise is this observed rate? You can calculate that the 95 percent confidence interval for this 80 percent response rate goes from 70.8 percent to 87.3 percent. Those two numbers are called the lower and upper 95 percent confidence limits around the observed response rate. If you claim that the true response rate (in the population of migraine sufferers that you drew your sample from) lies between 70.8 percent and 87.3 percent, there’s a 95 percent chance that that claim is correct.
How did I get those confidence limits? In Chapter 10, I describe how to calculate confidence intervals around means, proportions, event rates, regression coefficients, and other quantities you measure, count, or calculate.
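To give you the flavor of such a calculation, here's a sketch of the widely used Wilson score interval for a proportion. It's only one of several methods; the limits it gives for 80 responders out of 100 (about 71.1 to 86.7 percent) differ slightly from the exact limits quoted above, which come from a different calculation:

```python
# A sketch of a 95% confidence interval for an observed proportion,
# using the Wilson score method (one of several accepted methods).
import math

def wilson_ci(successes, n, z=1.96):
    """Wilson score interval; z = 1.96 corresponds to 95% confidence."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

low, high = wilson_ci(80, 100)
print(round(100 * low, 1), round(100 * high, 1))  # 71.1 86.7
```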
Statistical decision theory
Statistical decision theory is perhaps the largest branch of statistics. It encompasses all the famous (and many not-so-famous) significance tests — Student t tests (see Chapter 12), chi-square tests (see Chapter 13), analysis of variance (ANOVA; see Chapter 12), Pearson correlation tests (see Chapter 17), Wilcoxon and Mann-Whitney tests (see Chapter 12), and on and on. All these tests help you decide whether an apparent effect in your data is real, such as the following kinds of effects:
The average value of something may be different in one group compared to another. For example, males may have higher hemoglobin values, on average, than females; the effect of gender on hemoglobin can be quantified by the difference in mean hemoglobin between males and females. Or subjects treated with a drug may have a higher recovery rate than subjects given a placebo; the effect size could be expressed as the difference in recovery rate (drug minus placebo) or by the ratio of the odds of recovery for the drug relative to the placebo (the odds ratio).
The average value of something may be different from zero (or from some other specified value). For example, the average change in body weight over 12 weeks in a group of subjects undergoing physical therapy may be different from zero.
Two numerical variables may be associated (also called correlated). For example, if obesity is associated with hypertension, then body mass index may be correlated with systolic blood pressure. This effect is often quantified by the Pearson correlation coefficient.
Homing In on Hypothesis Testing
The theory of statistical hypothesis testing was developed in the early 20th century and has been the mainstay of practical statistics ever since. It was designed to apply the scientific method to situations involving data with random fluctuations (and almost all real-world data has random fluctuations). In the following sections, I list a few terms commonly used in hypothesis testing; explain the steps, results, and possible errors of testing; and describe the relationships between power, sample size, and effect size in testing.
Getting the language down
Here are some of the most common terms used in hypothesis testing:
Null hypothesis (abbreviated H0): The assertion that any apparent effect you see in your data does not reflect any real effect in the population, but is merely the result of random fluctuations.
Alternate hypothesis (abbreviated H1 or HAlt): The assertion that there really is some real effect in your data, over and above whatever is attributable to random fluctuations.
Significance test: A calculation designed to determine whether H0 can reasonably explain what you see in your data.
Significance: The conclusion that random fluctuations alone can’t account for the size of the effect you observe in your data, so H0 must be false, and you accept HAlt.
Statistic: A number that you obtain or calculate from your data.
Test statistic: A number, calculated from your data, usually for the purpose of testing H0. It’s often — but not always — calculated as the ratio of a number that measures the size of the effect (the signal) divided by a number that measures the size of the random fluctuations (the noise).
p value: The probability that random fluctuations alone in the absence of any real effect (in the population) can produce an observed effect at least as large as what you observe in your sample. The p value is the probability of random fluctuations making the test statistic at least as large as what you calculate from your data (or, more precisely, at least as far away from H0 in the direction of HAlt).
Type I error: Getting a significant result when, in fact, no effect is present.
Alpha: The probability of making a Type I error.
Type II error: Failing to get a significant result when, in fact, some effect really is present.
Beta: The probability of making a Type II error.
Power: The probability of getting a significant result when some effect is really present.
Testing for significance
1. Boil your raw data down into a single number, called a test statistic.
Each test has its own formula, but in general, the test statistic represents the magnitude of the effect you’re looking for relative to the magnitude of the random noise in your data. For example, the test statistic for the unpaired Student t test for comparing means between two groups is calculated as a fraction: the difference between the two group means divided by the standard error of that difference.
The numerator is a measure of the effect you’re looking for — the difference between the two groups. And the denominator is a measure of the random noise in your data — the spread of values within each group. The larger the observed effect is, relative to the amount of random scatter in your data, the larger the Student t statistic will be.
2. Determine how likely (or unlikely) it is for random fluctuations to produce a test statistic as large as the one you actually got from your data.
The mathematicians have done the hard work; they’ve developed formulas (really complicated ones) that describe how much the test statistic bounces around if only random fluctuations are present (that is, if H0 is true).
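Step 1 can be sketched for the unpaired Student t test, using made-up data and the classic equal-variance (pooled) formula:

```python
# The unpaired Student t statistic as a signal-to-noise ratio: the
# difference between group means divided by the standard error of that
# difference (made-up data; equal-variance pooled formula).
import math
import statistics

group_a = [12.1, 13.4, 11.8, 14.0, 12.9]
group_b = [10.2, 11.1, 10.8, 11.9, 10.5]

n_a, n_b = len(group_a), len(group_b)
signal = statistics.mean(group_a) - statistics.mean(group_b)

# Pooled variance combines the spread within each group (the "noise"):
pooled_var = ((n_a - 1) * statistics.variance(group_a) +
              (n_b - 1) * statistics.variance(group_b)) / (n_a + n_b - 2)
noise = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))

t = signal / noise
print(round(t, 2))  # 3.88
```

A large t like this one means the between-group difference is big relative to the within-group scatter; Step 2 then converts it into a p value.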
Understanding the meaning of “p value” as the result of a test
The end result of a statistical significance test is a p value, which represents the probability that random fluctuations alone could have generated results that differed from the null hypothesis (H0), in the direction of the alternate hypothesis (HAlt), by at least as much as what you observed in your data.
If this probability is too small, then H0 can no longer explain your results, and you’re justified in rejecting it and accepting HAlt, which says that some real effect is present. You can say that the effect seen in your data is statistically significant.
Examining Type I and Type II errors
The outcome of a statistical test is a decision to either accept or reject H0 in favor of HAlt. Because H0 pertains to the population, it’s either true or false for the population you’re sampling from. You may never know what that truth is, but an objective truth is out there nonetheless.
The truth can be one of two things, and your conclusion is one of two things, so four different situations are possible; these are often portrayed in a fourfold table, as shown in Figure 3-4 (Chapter 14 has details on these tables).
Illustration by Wiley, Composition Services Graphics
Figure 3-4: Right and wrong conclusions from a statistical hypothesis test.
You can get a nonsignificant result when there is truly no effect present. This is correct — you don’t want to claim that a drug works if it really doesn’t. (See the upper-left corner of the outlined box in Figure 3-4.)
You can get a significant result when there truly is some effect present. This is correct — you do want to claim that a drug works when it really does. (See the lower-right corner of the outlined box in Figure 3-4.)
You can get a significant result when there’s truly no effect present. This is a Type I error — you’ve been tricked by random fluctuations that made the drug look effective. (See the lower-left corner of the outlined box in Figure 3-4.) Your company will invest millions of dollars into the further development of a drug that will eventually be shown to be worthless. Statisticians use the Greek letter alpha (α) to represent the probability of making a Type I error.
You can get a nonsignificant result when there truly is an effect present. This is a Type II error (see the upper-right corner of the outlined box in Figure 3-4) — you’ve failed to see that the drug really works, perhaps because the effect was obscured by the random noise in the data. Further development will be halted, and the miracle drug of the century will be consigned to the scrap heap, along with the Nobel prize you’ll never get. Statisticians use the Greek letter beta (β) to represent the probability of making a Type II error.
Why not use a small alpha level (like p < 0.000001) for your significance testing? Because then you’ll almost never get significance, even if an effect really is present. Researchers don’t like to go through life never making any discoveries. If a drug really is effective, you want to get a significant result when you test it. You need to strike a balance between Type I and Type II errors — between the alpha and beta error rates. If you make alpha too small, beta will become too large, and vice versa. Is there any way to keep both types of errors small? There is, and that’s what I describe next.
Grasping the power of a test
The power of any statistical test depends on several factors:
The alpha level you’ve established for the test — that is, the chance you’re willing to accept of making a Type I error
The actual magnitude of the effect in the population, relative to the amount of noise in the data
The size of your sample
Power, sample size, effect size relative to noise, and alpha level can’t all be varied independently; they’re interrelated — connected and constrained by a mathematical relationship involving the four quantities.
This relationship is often very complicated, and sometimes it can’t be written down explicitly as a formula, but it does exist. For any particular type of test, you can (at least in theory) determine any one of the four quantities if you know the other three. So there are four different ways to do power calculations, with each way calculating one of the four quantities from arbitrarily specified values of the other three. (I have more to say about this in Chapter 5, where I describe practical issues that arise during the design of research studies.) In the following sections, I describe the relationships between power, sample size, and effect size, and I briefly note how you can perform power calculations.
Power, sample size, and effect size relationships
Power versus sample size, for various effect sizes: For all statistical tests, power always increases as the sample size increases, if other things (such as alpha level and effect size) are held constant. This relationship is illustrated in Figure 3-5. “Eff” is the effect size — the between-group difference divided by the within-group standard deviation.
Very small samples very seldom produce significant results unless the effect size is very large. Conversely, extremely large samples (many thousands of subjects) are almost always significant unless the effect size is near zero. In epidemiological studies, which often involve hundreds of thousands of subjects, statistical tests tend to produce extremely small (and therefore extremely significant) p values, even when the effect size is so small that it’s of no practical importance.
Illustration by Wiley, Composition Services Graphics
Figure 3-5: The power of a statistical test increases as the sample size and the effect size increase.
Power versus effect size, for various sample sizes: For all statistical tests, power always increases as the effect size increases, if other things (such as alpha level and sample size) are held constant. This relationship is illustrated in Figure 3-6. “N” is the number of subjects in each group.
For very large effect sizes, the power approaches 100 percent. For very small effect sizes, you might think the power of the test would approach zero, but you can see from Figure 3-6 that it doesn’t go down all the way to zero; it actually approaches the alpha level of the test. (Keep in mind that the alpha level of the test is the probability of the test producing a significant result when no effect is truly present.)
Illustration by Wiley, Composition Services Graphics
Figure 3-6: The power of a statistical test increases as the effect size increases.
Sample size versus effect size, for various values of power: For all statistical tests, sample size and effect size are inversely related, if other things (such as alpha level and power) are held constant. Small effects can be detected only with large samples; large effects can often be detected with small samples. This relationship is illustrated in Figure 3-7.
Illustration by Wiley, Composition Services Graphics
Figure 3-7: Smaller effects need larger samples.
This inverse relationship between sample size and effect size takes on a very simple mathematical form (at least to a good approximation): The required sample size is inversely proportional to the square of the effect size that can be detected. Or, equivalently, the detectable effect size is inversely proportional to the square root of the sample size. So, quadrupling your sample size allows you to detect effect sizes only one-half as large.
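This square-root relationship is behind a well-known rule of thumb (sometimes called Lehr's equation; it isn't derived in this chapter): for 80 percent power at a two-sided alpha of 0.05, you need roughly 16 divided by the squared effect size subjects per group.

```python
# A rule-of-thumb sample-size sketch (Lehr's equation): n per group of
# about 16 / (effect size)^2 gives roughly 80% power at two-sided
# alpha = 0.05 for comparing two means. An approximation, not an exact
# power calculation.
import math

def n_per_group(effect_size):
    """Approximate subjects per group for 80% power, alpha = 0.05."""
    return math.ceil(16 / effect_size**2)

print(n_per_group(1.0))   # 16
print(n_per_group(0.5))   # 64 (half the effect needs four times the sample)
```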
How to do power calculations
Computer software: The larger statistics packages (such as SPSS, SAS, and R) provide a wide range of power calculations — see Chapter 4 for more about these packages. There are also programs specially designed for this purpose (nQuery, StatExact, Power and Precision, PS-Power & Sample Size, and Gpower, for instance).
Web pages: Many of the more common power calculations can be performed online using web-based calculators. A large collection of these can be found at StatPages.info.
Hand-held devices: Apps for the more common power calculations are available for most tablets and smartphones.
Printed charts and tables: You can find charts and tables in textbooks (including this one; see Chapter 12 and this book's Cheat Sheet at www.dummies.com/cheatsheet/biostatistics). These are ideal for quick and dirty calculations.
Rules of thumb: Some approximate sample-size calculations are simple enough to do on a scrap of paper or even in your head! You find some of these in Chapter 26 and on the Cheat Sheet at www.dummies.com/cheatsheet/biostatistics.
Going Outside the Norm with Nonparametric Statistics
All statistical tests are derived on the basis of some assumptions about your data, and most of the classical significance tests (such as Student t tests, analysis of variance, and regression tests) assume that your data is distributed according to some classical frequency distribution (most commonly the normal distribution; see Chapter 25). Because the classic distribution functions are all written as mathematical expressions involving parameters (like means and standard deviations), they’re called parametric distribution functions, and tests that assume your data conforms to a parametric distribution function are called parametric tests. Because the normal distribution is the most common statistical distribution, the term parametric test is most often used to mean a test that assumes normally distributed data.
But sometimes your data isn’t parametric. For example, you may not want to assume that your data is normally distributed because it may be very noticeably skewed, as shown in Figure 3-8a.
Sometimes, you may be able to perform some kind of transformation of your data to make it more normally distributed. For example, many variables that have a skewed distribution can be turned into normally distributed numbers by taking logarithms, as shown in Figure 3-8b. If, by trial and error, you can find some kind of transformation that normalizes your data, you can run the classical tests on the transformed data. (See Chapter 8.)
Illustration by Wiley, Composition Services Graphics
Figure 3-8: Skewed data (a) can sometimes be turned into normally distributed data (b) by taking logarithms.
But sometimes your data is stubbornly abnormal, and you can’t use the parametric tests. Fortunately, statisticians have developed special tests that don’t assume normally distributed data; these are (not surprisingly) called nonparametric tests. Most of the common classic parametric tests have nonparametric counterparts. As you may expect, the most widely known and commonly used nonparametric tests are those that correspond to the most widely known and commonly used classical tests. Some of these are shown in Table 3-2.
Table 3-2 Nonparametric Counterparts of Classic Tests
Classic Parametric Test | Nonparametric Equivalent
One-group or paired Student t test (see Chapter 12) | Sign test; Wilcoxon signed-ranks test
Two-group Student t test (see Chapter 12) | Wilcoxon sum-of-ranks test; Mann-Whitney U test
One-way ANOVA (see Chapter 12) | Kruskal-Wallis test
Pearson Correlation test (see Chapter 17) | Spearman Rank Correlation test
Most nonparametric tests involve first sorting your data values, from lowest to highest, and recording the rank of each measurement (the lowest value has a rank of 1, the next highest value a rank of 2, and so on). All subsequent calculations are done with these ranks rather than with the actual data values.
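That rank transformation is easy to sketch in Python; this version assigns tied values their average ("mid") rank, which is the usual convention when ties occur:

```python
# The rank transformation behind most nonparametric tests: replace each
# value with its rank, giving tied values their average (mid) rank.

def ranks(values):
    ordered = sorted(values)
    rank_of = {}
    for v in set(values):
        first = ordered.index(v) + 1   # rank of first occurrence (1-based)
        count = ordered.count(v)
        rank_of[v] = first + (count - 1) / 2  # midrank for ties
    return [rank_of[v] for v in values]

print(ranks([7, 3, 9, 3, 5]))  # [4.0, 1.5, 5.0, 1.5, 3.0]
```

Note that the two tied 3s share the average of ranks 1 and 2; all subsequent test calculations would use these ranks instead of the raw values.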
Although nonparametric tests don’t assume normality, they do make certain assumptions about your data. For example, many nonparametric tests assume that you don’t have any tied values in your data set (in other words, no two subjects have exactly the same values). Most nonparametric tests incorporate adjustments for the presence of ties, but this weakens the test and makes the results nonexact.