3.8 The first digit problem; invariant priors

3.8.1 A prior in search of an explanation

The problem we are going to consider in this section is not really one of statistical inference as such. What is introduced here is another argument that can sometimes be taken into account in deriving a prior distribution – that of invariance. To introduce the notion, we consider a population which appears to be invariant in a particular sense.

3.8.2 The problem

The problem we are going to consider in this section has a long history going back to Newcomb (1881). Recent references include Knuth (1969, Section 4.2.4B), Raimi (1976) and Turner (1987).

Newcomb’s basic observation, in the days when large tables of logarithms were in frequent use, was that the early pages of such tables tended to look dirtier and more worn than the later ones. This appears to suggest that numbers whose logarithms we need to find are more likely to have 1 as their first digit than 9. If you then look up a few tables of physical constants, you can get some idea as to whether this is borne out. For example, Whitaker’s Almanack (1988, p. 202) quotes the areas of 40 European countries (in square kilometres) as

28 778; 453; 83 849; 30 513; 110 912; 9251; 127 869; 43 069; 1399; 337 032; 547 026; 108 178; 248 577; 6; 131 944; 93 030; 103 000; 70 283; 301 225; 157; 2586; 316; 1; 40 844; 324 219; 312 677; 92 082; 237 500; 61; 504 782; 449 964; 41 293; 23 623; 130 439; 20 768; 78 772; 14 121; 5 571 000; 0.44; 255 804.

The first significant digits of these are distributed as follows:

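| Digit | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| Frequency | 10 | 7 | 6 | 6 | 3 | 2 | 2 | 1 | 3 |
| Percentage | 25 | 17.5 | 15 | 15 | 7.5 | 5 | 5 | 2.5 | 7.5 |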
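The tally of first significant digits can be reproduced directly from the 40 areas listed above. The following is a minimal Python sketch; the names `AREAS_KM2`, `first_significant_digit` and `digit_counts` are illustrative choices and do not come from the text.

```python
from collections import Counter

# Areas (square kilometres) of the 40 European countries quoted in the text.
AREAS_KM2 = [
    28778, 453, 83849, 30513, 110912, 9251, 127869, 43069, 1399, 337032,
    547026, 108178, 248577, 6, 131944, 93030, 103000, 70283, 301225, 157,
    2586, 316, 1, 40844, 324219, 312677, 92082, 237500, 61, 504782,
    449964, 41293, 23623, 130439, 20768, 78772, 14121, 5571000, 0.44, 255804,
]


def first_significant_digit(x: float) -> int:
    """Return the leading non-zero digit of a non-zero number."""
    # Scientific notation puts the first significant digit first,
    # so 0.44 is formatted as '4.400...e-01' and yields 4.
    return int(f"{abs(x):.15e}"[0])


def digit_counts(values):
    """Count how many values have each first significant digit 1..9."""
    tally = Counter(first_significant_digit(v) for v in values)
    return [tally.get(d, 0) for d in range(1, 10)]


counts = digit_counts(AREAS_KM2)
for d, n in zip(range(1, 10), counts):
    print(f"digit {d}: {n:2d}  ({100 * n / len(AREAS_KM2):.1f}%)")
```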

We will see that there are grounds for thinking that the distribution should be approximately as follows:

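| Digit | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| Percentage | 30.1 | 17.6 | 12.5 | 9.7 | 7.9 | 6.7 | 5.8 | 5.1 | 4.6 |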

3.8.3 A solution

An argument for this distribution runs as follows. The quantities we measure are generally measured on an arbitrary scale, and we would expect that if we measured them on another scale (in the example above, we might measure areas in square miles instead of square kilometres), then the population of values (or at least of their first significant figures) would look much the same, although individual values would of course change. This implies that if θ is a randomly chosen constant, then for any fixed c the transformation

$$\theta \to \psi = c\theta$$

should leave the probability distribution of values of constants alone. This means that if the functional form of the density of values of θ is

$$p(\theta) = f(\theta)$$

then the corresponding density of values of $\psi = c\theta$ will be

$$p(\psi) = f(\psi).$$

Using the usual change-of-variable rule, we know that $p(\psi) = p(\theta)\,|\mathrm{d}\theta/\mathrm{d}\psi|$, so that we are entitled to deduce that

$$f(c\theta) = f(\psi) = f(\theta)\,c^{-1}.$$

But if $f$ is any function such that $f(\theta) = c\,f(c\theta)$ for all c and θ, then we may take $c = 1/\theta$ to see that $f(\theta) = (1/\theta)\,f(1)$, so that $f(\theta) \propto 1/\theta$. It seems, therefore, that the distribution of constants that are likely to arise in a scientific context should, at least approximately, satisfy

$$p(\theta) \propto 1/\theta.$$
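The scale-invariance argument can be checked numerically. The sketch below is illustrative only: it assumes a range of exactly six decades, uses a deterministic grid of values whose logarithms are evenly spaced (a discrete stand-in for a density proportional to 1/θ), and takes the approximate conversion factor from square kilometres to square miles as the constant c. None of these choices comes from the text.

```python
import math


def first_digit(x: float) -> int:
    """Leading non-zero digit of a positive number."""
    return int(f"{x:.15e}"[0])


def digit_frequencies(values):
    """Relative frequency of each first digit 1..9."""
    counts = [0] * 9
    for v in values:
        counts[first_digit(v) - 1] += 1
    return [n / len(values) for n in counts]


# Assumption: log10(theta) evenly spaced over six decades, so that theta
# behaves like a sample from a density proportional to 1/theta on (1, 10^6).
N = 60_000
thetas = [10 ** (6 * (i + 0.5) / N) for i in range(N)]

# Assumption: c is the approximate km^2 -> square miles conversion factor.
KM2_TO_SQ_MILES = 0.386102

before = digit_frequencies(thetas)
after = digit_frequencies([KM2_TO_SQ_MILES * t for t in thetas])

for d in range(1, 10):
    print(f"digit {d}: before {before[d - 1]:.4f}  after {after[d - 1]:.4f}")
```

Under these assumptions the two columns should agree closely, since multiplying by c merely shifts the logarithms by a constant and leaves their fractional parts evenly spread.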

Naturally, the reservations expressed in Section 2.5 on locally uniform priors about the use of improper priors as representing genuine prior beliefs over a whole infinite range still apply. But it is possible to regard the prior $p(\theta) \propto 1/\theta$ for such constants as valid over any interval (a, b) where $0 < a < b < \infty$ which is not too large. So consider those constants between

$$10^k = a \quad\text{and}\quad 10^{k+1} = b.$$

Because

$$\int_a^b \frac{\mathrm{d}\theta}{\theta} = \log_e(b/a) = \log_e 10$$

the prior density for constants θ between a and b is

$$\frac{1}{\theta \log_e 10}$$

and so the probability that such a constant has first digit d, that is, that it lies between da and (d+1)a, is

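$$\int_{da}^{(d+1)a} \frac{\mathrm{d}\theta}{\theta \log_e 10} = \frac{\log_e\{(d+1)/d\}}{\log_e 10} = \frac{\log_e(1 + d^{-1})}{\log_e 10}.$$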

Since this is true for all values of k, and any constant lies between $10^k$ and $10^{k+1}$ for some k, it seems reasonable to conclude that the probability that a physical constant has first digit d is approximately

$$\frac{\log_e(1 + d^{-1})}{\log_e 10} = \log_{10}(1 + d^{-1})$$

which is the density tabulated earlier. This is sometimes known as Benford’s Law because of the work of Benford (1938) on this problem.
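The first-digit probabilities can be evaluated directly, and the integral over a single decade can be checked numerically. This is a minimal Python sketch; the function names and the midpoint-rule check are illustrative assumptions rather than anything given in the text.

```python
import math


def benford_probability(d: int) -> float:
    """Probability log10(1 + 1/d) that the first digit is d (d = 1..9)."""
    return math.log10(1 + 1 / d)


def decade_integral(d: int, k: int, steps: int = 10_000) -> float:
    """Midpoint-rule integral of 1/(theta * ln 10) from d*10^k to (d+1)*10^k."""
    a = 10.0 ** k
    lo, hi = d * a, (d + 1) * a
    h = (hi - lo) / steps
    return sum(h / ((lo + (i + 0.5) * h) * math.log(10)) for i in range(steps))


probs = [benford_probability(d) for d in range(1, 10)]
for d, p in zip(range(1, 10), probs):
    print(f"digit {d}: {100 * p:.1f}%")

# The same value should come out whichever decade k is used.
print(decade_integral(3, 2), decade_integral(3, -4), probs[2])
```

The nine probabilities sum to one because the logarithms telescope to $\log_{10} 10$.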

This sub-section was headed ‘A solution’ rather than ‘The solution’ because a number of other reasons for this density have been adduced. Nevertheless, it is quite an interesting solution. It also leads us into the whole notion of invariant priors.

It has been noted that falsified data is rarely adjusted so as to comply with Benford’s Law and this has been proposed as a method of detecting such data in, for example, clinical trials (see Weir and Murray, 2011). Recently Rauch et al. (2011) pointed out that, among the deficit data reported to Eurostat, the Greek data relevant to the euro deficit criteria showed the greatest deviation from Benford’s Law, and that this fact should have given rise to suspicion.
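One simple way to quantify deviation from Benford’s Law is a Pearson chi-squared statistic comparing observed first-digit counts with the expected counts. The sketch below is a hypothetical illustration only; it is not claimed to be the procedure used by Weir and Murray (2011) or Rauch et al. (2011). The observed counts are those tallied from the 40 European areas listed earlier, and the function names are assumptions.

```python
import math


def benford_expected(n: int):
    """Expected first-digit counts for n values under Benford's Law."""
    return [n * math.log10(1 + 1 / d) for d in range(1, 10)]


def chi_squared_statistic(observed):
    """Pearson statistic: sum over digits of (observed - expected)^2 / expected."""
    expected = benford_expected(sum(observed))
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# First-digit counts (digits 1..9) for the 40 areas quoted in the text.
observed_areas = [10, 7, 6, 6, 3, 2, 2, 1, 3]
statistic = chi_squared_statistic(observed_areas)
print(f"chi-squared statistic on 8 degrees of freedom: {statistic:.2f}")
```

For reference, the conventional 5% point of chi-squared on 8 degrees of freedom is about 15.5. With only 40 observations several expected counts are below 5, so the chi-squared approximation is rough and the statistic is best read as a descriptive measure of discrepancy.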

Benford’s Law can be related to another empirical law, Zipf’s Law. As originally proposed by Zipf (1935), this states that the relative frequency of the kth most common word in a list of n words is approximately proportional to 1/k, so that the most frequent word will occur approximately twice as often as the second most frequent word, three times as often as the third most frequent word, and so on. The relationship between this law and Benford’s Law is explored in Pietronero et al. (2001).

3.8.4 Haar priors

It is sometimes the case that your prior beliefs about a parameter θ are in some sense symmetric. Now when a mathematician hears of symmetry, he or she tends immediately to think of groups, and the notions discussed above generalize very easily to general symmetry groups. If the parameter values θ can be thought of as members of an abstract group Θ, then the fact that your prior beliefs about θ are not altered when the values of θ are all multiplied by the same value c can be expressed by saying that the transformation

$$\theta \to \psi = c\theta$$

should leave the probability distribution of values of the parameter alone. A density which is unaltered by this operation for arbitrary values of c is known as a Haar measure or, in this context, as a Haar prior or an invariant prior. Such priors are, in general, unique (at least up to multiplicative constants about which there is an arbitrariness if the priors are improper). This is just the condition used earlier to deduce Benford’s Law, except that $c\theta$ is now to be interpreted in terms of the multiplicative operation of the symmetry group, which will not, in general, be ordinary multiplication.

This gives another argument for a uniform prior for the mean θ of a normal distribution $N(\theta, \phi)$ of known variance φ, since it might well seem that adding the same constant to all possible values of the mean would leave your prior beliefs unaltered – there seems to be a symmetry under additive operations. If this is so, then the transformation

$$\theta \to \psi = c + \theta$$

should leave the functional form of the prior density for θ unchanged, and it is easy to see that this is the case if and only if $p(\theta)$ is constant. A similar argument about the multiplicative group might be used about an unknown variance φ when the mean is known to produce the usual reference prior $p(\phi) \propto 1/\phi$. A good discussion of this approach and some references can be found in Berger (1985, Section 3.3.2).
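The two invariance arguments can be written out in a few lines. The following is a sketch that follows the same steps as the derivation of Benford’s Law; it assumes that the density is written as $p(\theta) = f(\theta)$, that the variance is denoted by $\phi$ and that $c > 0$ in the multiplicative case.

```latex
% Additive group: \psi = c + \theta, so d\psi/d\theta = 1.
\begin{align*}
  f(c + \theta) &= f(\psi) = f(\theta)\left|\frac{d\theta}{d\psi}\right| = f(\theta)
      && \text{for all } c \text{ and } \theta, \\
  f(c) &= f(0)
      && \text{on setting } \theta = 0, \\
  p(\theta) &\propto 1
      && \text{(a uniform prior).}
\end{align*}

% Multiplicative group acting on a variance: \psi = c\phi, so d\psi/d\phi = c.
\begin{align*}
  f(c\phi) &= f(\psi) = f(\phi)\left|\frac{d\phi}{d\psi}\right| = f(\phi)\,c^{-1}
      && \text{for all } c > 0 \text{ and } \phi > 0, \\
  f(\phi) &= f(1)/\phi
      && \text{on setting } c = 1/\phi, \\
  p(\phi) &\propto 1/\phi.
\end{align*}
```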
