5

Measuring Errors of Measurement

Henry E. Kyburg, Jr.

University of Rochester

What is now considered classical measurement theory—which has nothing to do with the theory of making measurements—is concerned with specifying the homomorphisms of some “qualitative (or empirical) structure into a numerical one” (Narens 1985, p. 5). This view of classical measurement theory is referred to as the representational theory, because we are concerned with how to characterize the ways in which a given empirical structure can be represented in a numerical structure. This approach received a nearly definitive embodiment in Krantz, Luce, Suppes, and Tversky (1971). Narens (1985) provided further elegant mathematical elaborations and developments. However, one form of measurement has been conspicuously ignored in this modern view: the measurement of error. Narens’ book does not contain the word “error” in its index. Krantz et al. (1971) provided two entries in the index. One concerns approximation and measurement by inequality relations and gives us no quantitative handle on error, and the other contains the following cryptic remark: “Today (1971), however, few error theories exist; what we know about them is described in Chapters 15–17” (Krantz et al., 1971).

A treatment of error was promised in chapters 15–17 of the second volume. The second volume has now appeared (Krantz et al., 1989). Consistently, it treats error as something that other people are subject to, and explores the question from the point of view of an experimenter seeking to characterize objectively the error-behavior of subjects. Berka (1983, pp. 196–198) is not much more helpful; he no more than pays his respects to “systematic” and “accidental” errors.

The opposite extreme is represented by Barford (1985): He is concerned with teaching students that an experimentalist must learn to deal with error, and he provides a convenient guide to the mathematics of doing this. Mann (1949) does this same thing at a more sophisticated level, as does E. Bright Wilson (1952). These approaches provide us with the machinery for dealing with errors of measurement, but no philosophical analysis of the relation between measurement and error.

Norman Campbell’s (1920) classic early work on measurement has very little to say about the representation of error, or about dealing with it as a perennial problem of measurement. Brian Ellis (1968) takes an unusually deep and philosophical approach to measurement, but the word “error” doesn’t even appear in his index.

Fred Roberts (1979) puts the matter this way: “In a situation of error or noise, some statistical discussion of goodness of fit of a measurement representation is called for. The literature of measurement theory has not been very helpful on the development of such statistical tests.” (1979, p. 104)

It is my contention that a representational view of measurement renders a quantitative treatment of errors of measurement impossible. In order to argue for this, I must first say what I take a quantitative treatment of errors to be. Clearly it is not a treatment according to which we can apply a veracity-meter to the result of a given measurement and conclude that that measurement is in error by (say) + .13 millimeters. (Berka, 1983, p. 197, almost says this: “[W]e can determine its [the error’s] approximate value!”) In the first place, the theory must be statistical: What we learn about the errors of measurement associated with a certain measurement procedure is that they follow a certain statistical distribution. Such learning is fraught with problems; we will deal with some of them later, but, for the moment, let us leave these problems to one side.

Let us suppose we have a specific statistical theory of errors of measurement associated with a certain procedure of measurement. This is, after all, roughly what those scientists and engineers for whom measurement is part of their daily round tend to suppose. What do we say about the particular measurement of length? Not that it is in error by +.13 mm, for then we would just subtract .13 from the result of measurement to get the true value. What we might say is that 0.13 represents the standard deviation of the distribution of errors generated by the method of measurement employed.

However, this is clearly something that is not amenable to reconstruction according to the representational view. According to that view, we must have a collection of errors and a procedure for mapping them into the reals. The object of the theory, on this view, is to characterize all the ways of providing a homomorphism from the set of errors into the reals. Yet the best we can have is a statistical characterization of a distribution of errors. The only sense in which we can think of ourselves as measuring errors of measurement is collectively—by determining, in given units, the statistical distribution of errors produced by a certain procedure of measurement.

The point here is that we do not and cannot measure a single “error.” There are exceptions to this—for example, when we are calibrating a measuring instrument; that will be discussed in due course, but it is not the common case of dealing with error. Furthermore, even in this case, we must already know something about the distribution of errors produced by comparisons with the standard.

I conclude that measuring error is not directly analogous to measuring length. There is no way in which we can say, “Here is a set of errors, and a collection of relations among them (some are larger than others); how best shall we represent this collection of objects and relations in the real numbers?” The representational theory simply does not apply without profound changes. And yet we know perfectly well that we make errors of measurement, and in fact we have a good sense of when we make large errors as opposed to small errors. How can this be?

WHAT ERRORS?

How do we know that any of our measurements are in error? In the first place, we don’t. There is no reason—no a priori, irrefutable reason—to suppose that any of our measurements are in error. Suppose we measure the table five times and get five different results? Well no two of those measurements were made at the same time; so what is to say that the table hasn’t changed length over time in such a way as to make all those measurements 100% accurate? What is to say? It is perfectly clear, once we ask the question. It is our vague and general (and perfectly well justified, within its limits) theory of physical objects, or our precise and technical theory of physical objects, if we have one. We know, and physics supports us in this, that objects like the table do not go around changing their lengths without reason. Without reason? Well except in response to such things as changes in temperature, humidity, physical stresses, and so on.

The most direct source of this knowledge that measurement admits of error is that we have all measured the same thing a number of times and obtained different measurements. Consider the following example: We measure the length of the table ten times, and we get seven or eight different results. “All within experimental error,” shall we say? Not so fast—we do not have a theory of error, much less any way of determining what is and what is not “within experimental error.”

This suggests a problem. If we construe our physics in a technical way, then the laws at issue—laws relating stress and strain, laws relating length to temperature or humidity—must be construed as “experimental laws.” (If we were to construe them as deductive consequences of theoretical laws—a most implausible proposal—we would face the same difficulties as we face in the case of experimental laws.) These experimental laws are obtained from measurement. To obtain the coefficient of thermal expansion (or contraction) for a kind of stuff, we perform laboratory experiments that involve careful measurement. If it were not for measurement—and our understanding of errors of measurement—we would not have the experimental laws that inform us that sometimes our measurements are in error.

I think it is no help to retreat to informal physics. It is quite true that we can come to know that most things expand when heated, or even that they don’t change in length unless something happens to them (like being heated), but the procedure of measuring errors of measurement will be essentially the same: From vague and informal physical laws, we obtain vague and informal statistical theories of error. In general, large errors are rarer than small errors; there is no a priori limit on the size of an error; positive and negative errors both occur.

Many writers—Balzer, chapter 6, in the present volume; Berka (1983)—refer to “approximation,” “fuzziness,” and “inaccuracy” to paste over the difference between the crisp ideals of theory and the recalcitrant world of reality. Thank heaven the engineers who design our bridges are not content with such an informal treatment of error.

So it is our knowledge of physics, our knowledge of the way the world goes, that tells us that the table does not change its length, and, in consequence, that tells us that we have made some error or other when we measure the table a number of times and get different results. It is only through knowledge of the physical world, either in the form of common-sense knowledge or in the form of knowledge of the laws of physics that we know that there are any errors at all in our measurements.

However, “our knowledge of physics” is itself a rather vague notion. Exactly what it is about our knowledge of physics that tells us that our measurements are subject to error depends both on what we are trying to measure and what we are assuming belongs to our body of physical knowledge. We all know that the length of the table is essentially constant—that is, that its thermal expansions and contractions are orders of magnitude less than the variations in our measurements of the table. (We leave to one side expansions and contractions due to changes in pressure, magnetic fields, etc.)

It is only when we already have some physical knowledge that we can even begin to think of making measurements. So let us look at the foundations of elementary measurement and see what is required in the way of background knowledge before we can get the measurement business off the ground.

DERIVING ERROR DISTRIBUTIONS

To fix our ideas, let us look at the elementary measurement of length. We can see that some kinds of things change length easily, and other kinds of things do not. Worms, for example, change length easily and frequently. Ax handles do not. Trees change length (height), but they do so very slowly. This is where we start from. So let us look at things like ax handles. Some are longer than others (and stay that way!). Of some pairs of ax handles, one cannot say that one is longer than the other. Note that this is not the same as saying that they are the same length.

In fact, there is a tradeoff: If we are very cautious about judging that A is longer than B, there will be a large class of cases in which we suspend judgment about the relative lengths of things, but that class will not at all be an equivalence class. Just because I can’t distinguish between A and B, and can’t distinguish between B and C (in this cautious sense) doesn’t mean that I can’t distinguish between A and C. On the other hand, if I am less cautious about saying that A is longer than B, then there will be fewer things about whose relative lengths I suspend judgment, and we may expect that more of the pairs among the things indistinguishable in length from B will in turn be indistinguishable.
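A trivial illustration, with an invented discrimination threshold and invented lengths:

```python
# Observational indistinguishability with a cautious threshold is not
# transitive, so it cannot by itself serve as an equivalence relation.
THRESHOLD = 1.0   # we only judge "longer than" when the difference exceeds this

def longer(a, b):
    return a - b > THRESHOLD

def indistinguishable(a, b):
    return not longer(a, b) and not longer(b, a)

A, B, C = 10.0, 10.6, 11.2   # true lengths, unknown to the observer
print(indistinguishable(A, B))   # True
print(indistinguishable(B, C))   # True
print(indistinguishable(A, C))   # False: the relation fails to be transitive
```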

What is true of the relation “longer than”? We have just seen that its negation, “not longer than” as we actually judge it, is not transitive. In order to assert that “A is not longer than B” and “B is not longer than C” imply “A is not longer than C,” we must be infinitely cautious about asserting such statements as “A is not longer than B.” Whether the relation of not being longer than is transitive in itself is another question.

We do not get measurement until we have equivalence classes corresponding to being (truly) the same length as, and these require the transitivity of not being longer than. The problems are the following two: How do we characterize such equivalence classes, and how do we come to have grounds for assigning objects to them? On the other hand, if we never have such grounds, how do we use these equivalence classes?

In the first place, it seems quite clear that we cannot characterize such equivalence classes in observational terms. As I point out (1984), the lengths that strictly satisfy the traditional formal axioms—especially the existential axioms such as the Archimedean axiom and the divisibility axiom—cannot be taken as empirical objects (or equivalently the functions that correspond to ideal measurement cannot be empirical functions). They are ideal objects. The theory of ideal length entails that every rigid body has a true length at a given time and further that that true length can be expressed in terms of whatever we have selected as the standard unit length. The selection of this standard, given the theory, gives all the equivalence classes we need.

It is not germane to our discussion of error to characterize the theory in general (although in fact it can have any one of a number of conventional formulations), nor to discuss the grounds for accepting such a theory, except to point out that it is only by admitting—”embracing”—the existence of error that we can possibly accept any such theory. Our problem is this: Given such a theory, and given the existence of error necessary to allow us to accept that theory, how can we go about measuring that error so that we can use the theory?

To begin with, let us consider two principles that I have defended elsewhere (1984). The first principle I call the minimization principle; the second I call the distribution principle. They can be illustrated in the very case I have just mentioned. Suppose we compare the lengths of a lot of objects. The transitivity principles only hold for rigid bodies; therefore, we had best include “x is a Rigid Body” among our predicates. So let us suppose that we have accumulated a number of statements of the following forms: “A is a rigid body;” “A is longer than B;” “A is not a rigid body;” and “A is not longer than B.” Suppose, as seems natural, that this collection of statements is inconsistent with the following generalizations: “If x, y, and z are rigid bodies, then, if x is longer than y, and y is longer than z, x is longer than z;” and “If x, y, and z are rigid bodies, then, if x is not longer than y, and y is not longer than z, x is not longer than z.” Both of course are required to generate equivalence classes of lengths. Further, it is also quite clear that we want to have the following as an a priori principle: “If x and y are rigid bodies, and x is longer than y, then y is not longer than x.”

It follows that some of these statements, however firmly supported by observation, are false—How many? Which ones? There is no reason (unless we are in the unlikely situation of having both “x is longer than y,” and “x is not longer than y” supported by observation) that all the statements cannot be false. But of course to reject them all, or to regard them all as unsupported, is gratuitous skepticism of the worst and most self-defeating sort.

“Which ones?” seems a question impossible to answer except arbitrarily. Fortunately we do not have to answer that question in order to use the conflicts as evidence for the frequency of error of judgments of the sorts considered. For that, we need only answer: “How many?”

“How many?” To avoid gratuitous skepticism, it seems quite clear that we should attribute no more error to our observations than we are required to attribute to them. So we should regard as erroneous the minimum number of statements that will render our body of observations consistent with the generalizations we have taken as a priori truths. (Note that this is different from minimizing the long run error rates; because we may plausibly regard the errors of observation of different sorts as stochastically independent, it is quite possible, as has been suggested, that the estimated long run error rates may differ from those that I have called minimal.)

This is the minimization principle: Do not regard any more statements as false than you need to in order to make your observations consistent with your body of knowledge.

But this principle doesn’t give you error rates for particular kinds of observations. To get them, we need a supplementary principle: the distribution principle. What are the kinds of statements involved in our example? “x is longer than y,” “x is not longer than y;” “x is a rigid body;” “x is not a rigid body.” The distribution principle tells us that the errors among these kinds of observation statements among those we have accepted as observational should be distributed as evenly as possible, given the satisfaction of the minimization principle.

This principle too, can be questioned. Perhaps negative judgments are intrinsically more (or less—I honestly do not know) reliable than positive judgments. But pending an argument to this effect, we had best stick to the even distribution.

This does not mean that each kind of observation will exhibit the same frequency of errors. We can have no grounds, in the situation described, to regard any of the statements of the form “x is not a rigid body” as false.

But some statements of the forms “x is a rigid body;” “x is longer than y,” and “x is not longer than y,” must be wrong. Given the minimal number that must be wrong, there may well be various ways of distributing error among statements of these three kinds. Our second principle tells us to perform this distribution in the most democratic way possible. Note that this still gives us no way to settle on any particular statement as erroneous.
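A toy computation may make the two principles concrete. The report format, the data, and the brute-force search below are invented for illustration; the point is only that the minimum number of rejections, and an even way of distributing them over kinds of report, can be computed without ever settling on any particular report as the false one.

```python
from itertools import combinations

# Invented observation reports: three bodies reported rigid, and a cycle of
# "longer than" reports, which is inconsistent with transitivity plus asymmetry.
reports = [
    ("rigid", "A"), ("rigid", "B"), ("rigid", "C"),
    ("longer", "A", "B"), ("longer", "B", "C"), ("longer", "C", "A"),
]

def closure(pairs, rigid):
    # transitive closure, applied only where all three bodies are reported rigid
    pairs = set(pairs)
    changed = True
    while changed:
        changed = False
        for (x, y) in list(pairs):
            for (y2, z) in list(pairs):
                if y == y2 and {x, y, z} <= rigid and (x, z) not in pairs:
                    pairs.add((x, z))
                    changed = True
    return pairs

def consistent(kept):
    rigid = {r[1] for r in kept if r[0] == "rigid"}
    longer = closure({(r[1], r[2]) for r in kept if r[0] == "longer"}, rigid)
    not_longer = closure({(r[1], r[2]) for r in kept if r[0] == "not_longer"}, rigid)
    # asymmetry: rigid x longer than rigid y excludes y longer than x
    if any((y, x) in longer for (x, y) in longer if x in rigid and y in rigid):
        return False
    # "longer than" and "not longer than" cannot both hold of the same pair
    return not (longer & not_longer)

def minimal_rejection_sets(reports):
    for k in range(len(reports) + 1):          # the minimization principle
        hits = [drop for drop in combinations(range(len(reports)), k)
                if consistent([r for i, r in enumerate(reports) if i not in drop])]
        if hits:
            return k, hits
    return len(reports), []

def spread(drop):
    # the distribution principle: prefer rejections spread evenly over kinds
    kinds = [reports[i][0] for i in drop]
    return max(kinds.count(kind) for kind in set(kinds)) if kinds else 0

k, candidates = minimal_rejection_sets(reports)
best = min(candidates, key=spread)
print("minimum number of reports to reject:", k)
print("one maximally even rejection set:", [reports[i] for i in best])
```

For the toy data, a single rejection suffices to restore consistency, and nothing in the computation tells us which particular report was the erroneous one.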

Now we can use this data—that so and so many of each kind of statement are erroneous—as data for a statistical inference. Given that the necessary conditions of randomness are met, we can infer with high probability that the error rate of observation statements of the various sorts is close to the error frequency we have derived from our sample of observations using the two principles of minimization and distribution. Needless to say, justifying this inference is another story entirely. We just assume that we may identify the minimal and most evenly distributed rejection rate as an observed error frequency, and use this observed error frequency as a basis from which to infer a long run error rate. From a finite body of (inconsistent) observation statements, we may, thus, infer (at a given level of confidence) long run error rates.
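In outline, and with invented counts, that last step looks like this (the normal approximation to the binomial is an assumption of the sketch, not part of the argument):

```python
from statistics import NormalDist

rejected, total = 6, 200                 # minimally rejected reports / reports made
p_hat = rejected / total                 # observed error frequency
z = NormalDist().inv_cdf(0.975)          # 95% level of confidence
half_width = z * (p_hat * (1 - p_hat) / total) ** 0.5
print(f"long-run error rate roughly {p_hat:.3f} +/- {half_width:.3f}")
```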

Our two principles lead to error rates for complex judgments that do not always depend directly on the error rates of their components. Statements like “A is a rigid body, and B is a rigid body, and A > B, and B > A” are subject to rejection rates of 100%.

MEASURING

Of course these principles do not apply directly to quantitative measurement. If we reject the minimal number of our measurements of the table we must reject in order to ensure that the table has a real-valued length, we must reject all but one or all but a small number. But this would only be the case if we construed our measurements as, in a sense, claiming “objective validity”, and the true length of the table is not something we can claim to observe. Nevertheless we can apply quantitative analogs of the principles of minimum rejection and distribution.

First we must construct a meterstick. Consider Robinson Crusoe. He has all the physical knowledge needed to start measuring lengths, but he does not have a meterstick. However, Robinson can easily enough construct a stick that can be construed as the collinear juxtaposition of a large number of rigid bodies of the same length. Start with a handy rigid body. Mark what appears to be the midpoint. Use the principle (it follows from the generalizations already cited) that, if x, y, and z are rigid bodies, and x is the same length as y, and y is the same length as z, then x is the same length as z. We can find (or construct) an object the same length (so far as we can tell) as one half of our original rigid body and see how it fits on the other half. We can adjust the midpoint until the left half, right half, and test object are all in the same indifference class—that is, each bears the relation not longer than to the other two, so far as observation is concerned.

It is a fact about the world—about rigid objects—that we can do this. We can adjust three objects (two of them being parts of a single rigid body) so that no two of the three objects stand, so far as we can tell by direct observation, in the relation longer than. Now we repeat the process—that is, we can find a rigid body that is congruent to each of the four segments of the original body obtained by dividing each of the first two divisions in half. Again this reflects a fact about the world. The process can be continued until we have 1024 segments making up our original bistick (binary meterstick).

Note that we are not suggesting that each of these segments is really the same length. They are only indistinguishable from each other, and the indistinguishability is that transmitted through an auxiliary rigid body. We will see later how to refine the accuracy of our bistick.

If we can guess the midpoint of a rigid body, we can reasonably suppose that we can roughly judge the tenths of the rigid body. Thus, we can measure things with our bistick to within a ten-thousandth part of the unit length.
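The arithmetic behind that claim, as a small check (the ten halvings and the judgment of tenths are the assumptions just stated):

```python
# Ten successive halvings of the original stick give 2**10 segments; judging
# tenths of a segment by eye then gives a resolution of roughly one
# ten-thousandth of the unit length.
segments = 2 ** 10                 # 1024 segments in the bistick
resolution = 1 / (segments * 10)   # about 9.77e-05 of the unit length
print(segments, resolution)
```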

Let us now measure a rigid body with the bistick. For simplicity, let us take the original bistick as the standard unit length. If we do this a number of times, we get a number of different results: .5421, .5407, .5416, and so on. We cannot construe these measurements as purporting to be real congruences; the rigid body cannot be congruent to different fractions of the bistick. So what is the relation between these congruences and the true length of the body?

We have true lengths. These are provided by our abstract theory. We can define true lengths perfectly well. The true length of Robinson Crusoe’s table is k/m bisticks if and only if the difference in length between the collinear juxtaposition of k bisticks and the collinear juxtaposition of m congruent tables can be made arbitrarily small by increasing m. What the measuring process does is to associate an abstract length with an object. The error associated with a measurement is reflected by the difference between the result of that measurement and the true value of the quantity being measured.
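In symbols (the function k(m), picking out the best-matching number of bisticks for m juxtaposed copies of the table, is my notation and not part of the text):

$$ \mathrm{TL}(\text{table}) \;=\; \lim_{m \to \infty} \frac{k(m)}{m} \quad \text{bisticks}, $$

where k(m) is the number of bisticks whose collinear juxtaposition differs least in length from the collinear juxtaposition of m congruent copies of the table.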

Just as the minimization principle, applied to categorical judgments, told us not to assume that any more are in error than we need to assume to be in error, so an analogous principle, applied to quantitative judgments, will tell us not to assume that any measurements are more in error than we need to assume they are. How shall we quantify this error? There are relatively compelling reasons for measuring it as the square of the difference between the observed measurement and the true value. The compelling reasons have more to do with analytic tractability than anything philosophically profound.

So what do we do in the case of measurements of a single object? We minimize the mean squared error. What this amounts to is that we take, as our estimate of the true value of the length of the object, the estimate of the mean of the distribution of measurements. The estimate of the variance of the measurements is exactly the estimate of the variance of the errors of measurement. Given the mean and variance of the distribution of errors of measurement, we can use our observations to make probable assertions concerning the true value: If the probability of making an error of less than size e is 0.99, and we observe a length of L, then (assuming appropriate conditions of randomness) the probability is 0.99 that the true value lies in the interval L ± e. Of course, to implement these observations, we must have come to terms with statistical inference in general— but that is nothing special or peculiar to measurement. Jaech (1985) provides some practical guidance along these lines.
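As a sketch of this procedure, using the three bistick readings from the previous section (the normal quantile below merely stands in for whatever error distribution has actually been inferred; with only three readings a t quantile would strictly be more appropriate):

```python
import statistics
from statistics import NormalDist

readings = [0.5421, 0.5407, 0.5416]      # repeated measurements, in bisticks
m = statistics.mean(readings)            # least-squares estimate of the true length
s = statistics.stdev(readings)           # estimate of the error standard deviation
e = NormalDist().inv_cdf(0.995) * s      # half-width of a 99% interval
print(f"estimated length {m:.4f} +/- {e:.4f} bisticks")
```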

But we are not done yet. We have talked of the measurements of a single object. We have a distribution of errors of measurement for measurements of that object—we may suppose that we have inferred it to be roughly normal, with a mean of 0 and a variance roughly equal to the variance of the measurements we have made. How about other objects?

Well, let us measure another object. We get another distribution of measurements. The mean may well be different, but the variance is quite likely to be about the same. Indeed, in accordance with the statistical principle of assuming no more populations than necessary, we may lump the second central moments of both sets of observations together to form an estimate of the variance of the measuring process. It could, of course, be the case that the numbers render it practically certain that the two populations of measurements had different variances. In that case, the data could not be combined to obtain an estimate of the general variance. This again is a matter of statistical inference and not a matter that is peculiar to measurement.
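A sketch of the pooling step, with invented readings for a second object; the crude variance-ratio comparison and the cutoff (roughly the upper 2.5% point of the F distribution with 2 and 2 degrees of freedom) stand in for whatever test of homogeneity one prefers:

```python
import statistics

a = [0.5421, 0.5407, 0.5416]       # repeated measurements of the first object
b = [1.2010, 1.2034, 1.2019]       # repeated measurements of a second object

va, vb = statistics.variance(a), statistics.variance(b)
ratio = max(va, vb) / min(va, vb)  # crude comparison of the two variances
if ratio < 39.0:                   # ~F(0.975; 2, 2): no reason to separate them
    pooled = ((len(a) - 1) * va + (len(b) - 1) * vb) / (len(a) + len(b) - 2)
    print(f"pooled variance of the measuring process: {pooled:.3e}")
else:
    print("the variances differ too much; keep the error populations separate")
```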

It is part of our theory of measurement (a rather fundamental part of our theory of the measurement of length) that the true length of the collinear juxtaposition of x and y is the sum of the true length of x and the true length of y. How does this bear on our knowledge of the distribution of errors of measurement of length? It imposes a constraint on the true lengths, and that constraint in turn may require the attribution of more error to our measurements than would otherwise be required.

For example, suppose that measurements of x yield 3, 4, and 5 feet; measurements of y yield 7, 8, and 9 feet; and measurements of z yield 13, 14, and 15 feet. Minimum error would suggest that we have made six errors of one foot—the distribution of errors would have a mean close to 0 and a variance close to ¾. But now let us also suppose that z is the collinear juxtaposition of x and y.

If x*, y*, and z* are the true values of the lengths of x, y, and z, and z is the collinear juxtaposition of x and y, then we have x* + y* = z*, and the total squared error of measurement is as follows:

E = (3 − x*)² + (4 − x*)² + (5 − x*)² + (7 − y*)² + (8 − y*)² + (9 − y*)² + (13 − (x* + y*))² + (14 − (x* + y*))² + (15 − (x* + y*))²

Elementary calculus shows that the least-squares estimates for x*, y*, and z* are now 4⅔, 8⅔, and 13⅓; the mean error within each subsample is now ±⅔ rather than 0, and the total squared error at the minimum is 10, so the sample variance has increased from ¾ (that is, 6/8) to 1¼ (that is, 10/8).
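The figures can be checked mechanically. The normal equations below come from setting the partial derivatives of E with respect to x* and y* to zero (with z* = x* + y*); the use of numpy is incidental:

```python
import numpy as np

# Observations from the example above.
x_obs = np.array([3.0, 4.0, 5.0])
y_obs = np.array([7.0, 8.0, 9.0])
z_obs = np.array([13.0, 14.0, 15.0])

# Setting dE/dx* = 0 and dE/dy* = 0 with z* = x* + y* gives the linear system
#   6 x* + 3 y* = sum(x_obs) + sum(z_obs)
#   3 x* + 6 y* = sum(y_obs) + sum(z_obs)
A = np.array([[6.0, 3.0], [3.0, 6.0]])
b = np.array([x_obs.sum() + z_obs.sum(), y_obs.sum() + z_obs.sum()])
x_star, y_star = np.linalg.solve(A, b)
z_star = x_star + y_star

residuals = np.concatenate([x_obs - x_star, y_obs - y_star, z_obs - z_star])
E = float((residuals ** 2).sum())
print(x_star, y_star, z_star)        # 4.666..., 8.666..., 13.333...
print(E, E / (len(residuals) - 1))   # 10.0 and 1.25
```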

Abstractly the statistical problem is the following: Construe our measurements as measurements of body₁, body₂, …, bodyₙ. Consider the following constraints imposed by the axioms of our language: The transitivity of “longer than,” the transitivity of “not longer than,” the irreflexivity of “longer than” (but the last does not bear on our assessments of error directly). We want to estimate the mean of each subpopulation. We want to estimate the variance of this population (or these subpopulations). We find (at least within reasonably wide limits) that there is no reason to regard the variance as differing among the populations. We should also bear in mind the informational advantage in using a larger sample, which we can do if we lump each of the classes of measurement together.

Suppose we have this statistical problem solved. What have we then? We have precisely what we express only vaguely and approximately by saying that the distribution of errors is, within ordinary limits, normally distributed with variance s² and mean m. Note that m is rather unlikely to be exactly zero—we can define the error in such a way that its mean value is zero, if we are concerned only with independent measurements of one particular object. But if there are relations that have to be respected among objects (such as additivity of length), we can no longer guarantee that the minimum error principle will lead us to a mean error of zero.

The variance also must be taken to be larger, if we take the objects we are measuring to satisfy such conditions as additivity. These “analytic” constraints on length lead to the attribution of greater errors to the measurement of length than we would have without those constraints.

GENERALIZING

One of the factors that affects our measurements of length is temperature. If we take account of temperature, we find that we can improve the accuracy of our measurements of length. What we need to do is to take account of the changes of length of our standard bistick and of the object being measured as the temperature changes. One way to do this is to take a look at the thermometer on the wall and observe the length of its mercury column.

This suggests that we can’t measure length accurately without measuring temperature, and simultaneously that we can’t measure temperature without measuring length. (Of course we can measure temperature by means of a bimetallic strip connected to a spring and a dial, but then we are measuring around the circumference of a circle, or we can measure by means of a digital readout thermometer, but this involves even more theory than the law of thermal expansion.) However, when we read the thermometer, are we really measuring length? I do not mean to raise labyrinthine questions of intentionality. Whatever my intentions, whatever is going on in my mind, I am comparing one object (the mercury column) with another (its holder) that is divided into collinear and contiguous parts. If such comparisons are infected with some metaphysical disease, measurements of lengths and of temperatures will both be affected.

Clearly such relations do not undermine the possibility of taking account of temperature in making accurate measurements of length. Equally clearly, however, both the theory of length (its additivity) and the theory of thermal expansion (to a first approximation, its linearity) must be called on in determining the errors of measurement of both temperature and length. The problem of sorting out the errors in several kinds of measurements is discussed to some extent in Theory and Measurement (Kyburg, 1984). It suffices here to observe that, for example, the direct measurement of a temperature is taken to be more subject to error than the measurement of length. (In fact it is not clear what counts as error. There has to be some indication that the law of linear thermal expansion is reasonably close to being true before we can even construct a thermometer as a way of measuring temperature.) However, as soon as we have the law of linear thermal expansion (that is, as soon as we have accepted it as part of our working practical/theoretical machinery), we must allow temperature to play a role in determining the distribution of errors of measurement of length.

This is not hard, because that role will in general be to reduce errors: If we take account of the temperature of our bistick and reduce its results to standard temperature (and similarly for the objects we are measuring), we find the results to be more closely clustered than they were otherwise. That is, we have succeeded in reducing the variance of the error distribution of length measurements. But just as in the case of adopting the principle that length is additive, the addition of a new constraint requires that we take account of more sources of error. If we measure two temperatures and two lengths, and take for granted a coefficient of thermal expansion, and the results do not fit the law of thermal expansion, error must be imputed both to the temperature measurements and to the length measurements.

If this sounds strange—to say that we are increasing error by adding a constraint (in the form of the law of thermal expansion) and decreasing error (by taking account of temperature in our measurements of length)—perhaps a more detailed treatment of the example will be helpful.

Instruments: a mercury thermometer and our bistick.
Objects: an assortment of rigid bodies composed of the same substance (different from that of the bistick) and having appropriate shapes; some are collinear juxtapositions of others.

Procedure 1: Ignore temperature. Measure an object a number of times with the bistick. Statistically infer a distribution of measurements with mean approximately m₁, and variance approximately s₁²; assume that we need not bother with higher moments.

Conclusion 1: To minimize the sum of the squares of the errors of observation already made, take the true length of this object to be m₁, and the distribution of error to be given by the same distribution, with the exception that it is displaced by an amount m₁, so that its mean is 0. We could of course find ways of dividing these observations into groups: for example, those made on Mondays, those made on Tuesdays, and so on. However, we will (presumably) find no statistical justification for the hypothesis that these groups of observations come from different populations. (We can no doubt find “unnatural” groups of observations that would suggest that we are sampling from distinct populations; knowing what is unnatural and how to disregard it is a general problem of statistical inference and presents nothing special in the way of a problem for measurement.)

Procedure 2: Still ignoring temperature, we measure a lot of objects a lot of times. We assume that we can faultlessly reidentify objects. Thus, our measurements fall into clearcut groups, each identified with an object. Our measurements now have a general mean gm₂ and a very large general variance gs₂².

Conclusion 2: But now we have excellent statistical justification for taking our measurements to fall into different populations according to the object being measured. (In general, this is the case, although some pairs of objects may be so nearly the same length that the hypothesis that their measurements constitute a single population cannot be rejected by the data.) If we sort the data this way and then perform our statistical inferences in accord with the principle of minimizing past observational error, then we find that the general population of errors of measurement of each object has a distribution very much like the distribution of errors of measurement we uncovered in procedure 1: mean 0, variance about s₁². In fact, if we think of error, what we have is a set of distributions, all characterized by the same variance. We have no reason to regard any subsets of our general sample as coming from different populations so far as our inference regarding the variance (or higher order moments, for that matter) is concerned. (Of course we might have such reasons; if our original set of objects included very big ones and very tiny ones, and it were much more difficult to measure these accurately, we would have reasons, internal to our collected statistics, for dividing our sample of measurements to take account of different sample variances characterizing these cases.) This variance, s₂², is, we may suppose, about the same size as s₁² but, in virtue of the last observation, more accurately known.

Procedure 3: Now let us assume that some of the objects in our set are (faultlessly identifiable as) collinear juxtapositions of others, and let us impose the constraint that the length of the juxtaposition of two objects is the sum of the lengths of the two objects. As in Procedure 2, we make a lot of measurements of a lot of things—let us assume in fact that we make the same number of measurements of the same things. The observational basis of our conclusion will be the same as that of Conclusion 2.

Conclusion 3: From our assumptions, it follows now that the sum of the true value corresponding to the set of measurements of object x and the true value corresponding to the set of measurements of object y is the true value corresponding to the set of measurements of the collinear juxtaposition of x and y. This is to say that the means of the populations of the three kinds of measurement must bear the corresponding relation, if the mean errors are zero. As before, we minimize the sum of the squared errors of our actual (past) observations and, subject to this constraint, distribute our errors evenly. Since we have imposed this constraint, at least some mean error estimates will not be 0. If the mean estimate is not 0, the variance will be larger than it was when the mean estimate was 0. However, we may still be able to use the whole population of performed measurements to estimate the variance of the error distribution, and overall the mean error will still be very close to zero. Thus we get a distribution of error that is roughly centered on 0 and has a variance s₃², a little bit larger than s₂². Taking account of a new constraint has required us to admit a wider range of error.

Procedure 4: Let us now suppose that we have a thermometer and that we know that the law of linear thermal expansion is true. To simplify matters, let us suppose that we also know the coefficient of thermal expansion of the substance of which all our objects (except for the bistick and the thermometer) are made. We also suppose ourselves to know the coefficient of thermal expansion of our bistick. Again let us make the same measurements, but now let us include with each measurement a temperature measurement. Assume that temperature measurement is without error.

Conclusion 4: Take t0 as the reference temperature. Now the true value of the length of an object is to be construed as its length at t0, and the error of measurement is as follows:

TL · (1 + k · (t − t₀)) − ML,

where k is the coefficient of thermal expansion, t is the temperature, TL is the true length, and ML is the measured length corrected for the expansion of the bistick (that is, M₀ · (1 + k′ · (t − t₀)), where M₀ is the raw reading and k′ is the coefficient of thermal expansion of our bistick). Now we choose the true lengths of the objects measured so as to minimize the squared error of our observations, subject to two constraints: the additivity constraint and the thermal expansion constraint. However, in view of the fact that some of the variance s₃² of the third procedure was due to variations in temperature, we may suppose that the variance s₄² is significantly smaller than s₃². The corrected readings are more closely packed about the mean values.
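As a sketch of the correction at work (the coefficients, temperatures, reading, and the assumed true length below are all invented, and the linear form is simply the one just stated):

```python
# Convert a raw bistick reading taken at temperature t into standard units,
# and compute the error imputed relative to an assumed true length at t0.
k = 1.2e-5        # thermal expansion coefficient of the measured substance (invented)
k_prime = 0.9e-5  # thermal expansion coefficient of the bistick (invented)
t0 = 20.0         # reference temperature
t = 27.0          # temperature at which the reading was taken
M0 = 0.5421       # raw reading, in (expanded) bistick units
TL = 0.5415       # assumed true length at t0, for illustration only

ML = M0 * (1 + k_prime * (t - t0))      # reading expressed in standard units
error = TL * (1 + k * (t - t0)) - ML    # the discrepancy charged to error
print(f"corrected reading {ML:.6f}, imputed error {error:.6f}")
```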

Procedure 5: Finally let us take account of the errors of measurement of temperature, just to show that it can be done. The circumstances are the same as in Procedure 4, except that we no longer assume that the measurement of temperature is error-free. The other data are the same as in Procedure 4.

Conclusion 5: Now our observations are construed as being of the following form:

TL · (1 + k · (Tₜ − t₀ + tₑ)) = M₀ · (1 + k′ · (Tₜ − t₀ + tₑ)) + lₑ,

where lₑ is the error in the length measurement, tₑ the error in the temperature measurement (so that Tₜ + tₑ is the true temperature), Tₜ the temperature reading, and M₀ the observed measurement—the reading. Following the earlier principles, we choose TL and tₑ so as to minimize the squared error of past observations and to distribute the error as evenly as possible between the two categories.
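A sketch of the joint adjustment, for two invented readings of a single object at two temperatures; the weights s_len and s_temp stand in for prior error distributions of the kind discussed in the next paragraph, and a general-purpose minimizer is used only for convenience:

```python
from scipy.optimize import minimize

k, k_prime, t0 = 1.2e-4, 0.9e-5, 20.0          # invented expansion coefficients
readings = [(0.5421, 27.0), (0.5404, 22.0)]    # (raw length reading, temperature reading)
s_len, s_temp = 1e-3, 0.5                      # assumed error standard deviations

def length_error(TL, M0, t, te):
    # le from:  TL*(1 + k*(t - t0 + te)) = M0*(1 + k_prime*(t - t0 + te)) + le
    return TL * (1 + k * (t - t0 + te)) - M0 * (1 + k_prime * (t - t0 + te))

def objective(params):
    # weighted squared errors: the discrepancy is shared between length and
    # temperature errors rather than charged entirely to one of them
    TL, te1, te2 = params
    tes = (te1, te2)
    total = sum((length_error(TL, M0, t, te) / s_len) ** 2
                for (M0, t), te in zip(readings, tes))
    return total + sum((te / s_temp) ** 2 for te in tes)

result = minimize(objective, x0=[0.54, 0.0, 0.0])
TL, te1, te2 = result.x
print(f"adjusted true length {TL:.5f}; temperature errors {te1:+.3f}, {te2:+.3f}")
```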

This last exercise suggests a further one that might be undertaken in the interest of realism: suppose that we also have a class L of prior length measurements, from which we derive a prior distribution of errors of measurement of length. Let S be the set of observations that, in the absence of the law of thermal expansion as applied to our test object, gives us the distribution of errors of measurements of temperature. Thus, L and S give prior distributions of the respective errors, and when we combine observations of length and of temperature through the law of thermal expansion, the sizes of the samples on which these error distributions are based are relevant. It is the whole set of observations whose observational error must be minimized, and not just the joint ones we happen to make at the moment.

Of course we may no longer have individual records of these prior measurement observations. However, we can still compute them approximately, working backwards from interval estimates at given levels of confidence to approximate numbers and distributions. With a lot of history, we simply let the established error distributions dominate others: What we learn about new properties of matter will not (any longer) be taken to affect the distribution of errors of measurement with a meterstick.

The generalized lesson is that it is only accepted theory that gives us errors of measurement, where, in “accepted theory,” I include everything from the additivity of length to relativity and biochemistry and even quantum mechanics. It is, therefore, only relative to accepted theory that we can derive a quantitative statistical theory of errors of measurement. And of course what theories are accepted depends on the results of quantitative tests—that is, on measurements—that is, in turn on the theory of measurement. Although there is no vicious circularity here, there is certainly something to be unpacked, analysed, and explained: the relationship between theories of error and the acceptability of theories. However, that does not concern the “measurement of errors of measurement;” it represents yet another “new” problem of induction.

We should also note that, despite the fact that, in some sense, the theory of errors of measurement of a certain sort—measurements of length performed by method M, say—depends on a variety of other theories, any well established method of measurement will be accompanied by a theory of error that is largely independent of recent results.

This is true for the three following reasons: First, current tests of theories involve more than a single quantity, and, as we saw in connection with the law of thermal expansion, the errors of measurement revealed by tests may be distributed among a number of different kinds of measurements. We may know more, on the basis of historical statistics, about the distribution of some of these errors than about the distribution of others. Thus, there may be much more room to impute error to the results of some measurements than to the results of others.

Second, if the test of a current theory yielded results that would require an unusual (particularly a strongly biased) theory of errors of measurement, we would first note that the sample of errors obtained from those tests should be construed as being drawn from a different population than that from which the classical sample was drawn. This is based on internal statistical tests. We would then have to choose between our theory of measurement by method μ, which implies that μ yields a population of errors that is homogeneous with respect to partitions defined by the particular theory T under investigation, and the theory T itself. It is not surprising that we usually hang onto μ and reject T—although the logical warrant for doing so has yet to be spelled out.

Third, by far the greatest mass of data relates to older and simpler theories than those currently being tested. It thus statistically swamps the results of current tests. This is so much the case that we base our estimates of error on classically established error distributions. Often we can obtain the parameters of the population of errors produced by a given method, or even a given instrument, by testing the method or instrument against certain standard quantities, and do so on the basis of relatively small samples.

CONCLUSIONS

The measurement of error is not like other measurement. Ordinarily, in measurement, we associate a particular number (or magnitude) with a particular object—for example, a “reading” of a measuring instrument is associated with the object measured. There is nothing corresponding to this in the measurement of error. The result of “measuring error” is a statistical distribution of errors to be associated with a method (or kind of instrument, or particular instrument) μ of making measurements. We do not generally observe or measure particular errors.

We might (absurdly but consistently) claim that all of our measurements were 100% accurate. That is, we might deny the existence of error. To maintain this, we would have to repudiate almost every bit of quantitative knowledge we take ourselves to have: not merely ordinary physics and engineering, but even such seemingly a priori principles as the transitivity of “longer than.” (Perhaps the simplest way to do this is to assume that the world is in a causal temporal flux, and that there is no measuring of time. All of our physical laws would then have to be regarded as statistical laws relating varying quantities.) It should be noted that this procedure would render all representational theories of measurement mere superstition. We would have no reason to suppose any stable structure behind our statistics for the theories to “represent.”

It is certainly more natural—and in accord with our qualitative knowledge of the world—to construe our measurements as measurements of an underlying reality. To do this, we must suppose the following two things: (a) that our measurements are prone to error, and (b) that the underlying reality has the structure we attribute to it. In order to use measurement, we must have some quantitative theory of errors of measurement. However, at the same time, the results of measurement surely are relevant to the structure we impute to the underlying reality.

Furthermore, it is not merely the internal structure of particular quantities (such as length) that is relevant to the derivation of the distribution of errors of measurement, but the structure of multiple quantities that are related by law—for example temperature and length as related by the law of thermal expansion, or pressure, volume, and temperature as related by the ideal gas law. As those who pursue the purely mathematical problems of representational measurement theory have realized—although mainly in connection with psychological magnitudes— conjoint measurement and scientific theory are intimately tied together. See, for example Narens (1985), Berka (1983), Krantz et al. (1971).

The statistical theories of error we emerge with depend not only on the structure we attribute to the magnitudes being measured but on the general structure of the world as captured by our assumed laws and theories connecting these magnitudes to other magnitudes.

In the actual process of evaluating the distributions of errors of measurement, this is easily seen. (See Jaech, 1985, for example, or Wilson, 1952.) What has been lacking is any work that brings the problem of measurement error to bear on the foundations of measurement. It is only the existence of error that allows us to believe in measurement, and it is only the knowledge of the distribution of error that allows us to use measurements in investigating and controlling the world.

REFERENCES

Barford, N. C. (1985). Experimental measurements: Precision, error, and truth (2nd ed.). New York: Wiley.

Berka, K. (1983). Measurement. Dordrecht: Reidel.

Campbell, N. R. (1952). What is science? New York: Dover. Original work published 1921.

Campbell, N. R. (1957). Foundations of physics. New York: Dover. Original work published 1920.

Ellis, B. (1968). Basic concepts of measurement. Cambridge: Cambridge University Press.

Jaech, J. L. (1985). Statistical analysis of measurement errors. New York: Wiley.

Krantz, D. H., Luce, R. D., Suppes, P., & Tversky, A. (1971). Foundations of measurement. New York: Academic Press.

Krantz, D. H., Luce, R. D., Suppes, P., & Tversky, A. (1989). Foundations of measurement (Vol. 2). New York: Academic Press.

Kyburg, H. E., Jr. (1984). Theory and measurement. Cambridge: Cambridge University Press.

Mann, H. B. (1949). Analysis and design of experiments. New York: Dover.

Narens, L. (1985). Abstract measurement theory. Cambridge: MIT Press.

Roberts, F. S. (1979). Measurement theory. Reading, MA: Addison-Wesley.

Wilson, E. B. (1952). An introduction to scientific research. New York: McGraw-Hill.
