8.4 The Stein estimator

This section deals with an aspect of classical statistics that is related to the preceding discussion, but understanding it is by no means necessary for learning Bayesian statistics as such. The Bayesian analysis of the hierarchical normal model is continued in Section 8.5.

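One of the most puzzling and provocative results in classical statistics in the past half century was Stein’s startling discovery (see Stein, 1956, and James and Stein, 1961) that the ‘obvious’ estimator $\hat{\boldsymbol{\theta}}^{0} = \mathbf{X}$ of the multivariate normal mean is inadmissible if $r \geq 3$. In fact if $c$ is any constant with

\[ 0 < c < 2(r-2) \]

then

\[ \boldsymbol{\mu} + \left(1 - \frac{c}{S_1}\right)(\mathbf{X} - \boldsymbol{\mu}), \qquad S_1 = \sum_{i=1}^{r} (X_i - \mu_i)^2, \]

dominates $\hat{\boldsymbol{\theta}}^{0}$. The best value of $c$ is $r-2$, leading to the James–Stein estimator

\[ \hat{\boldsymbol{\theta}}^{JS} = \boldsymbol{\mu} + \left(1 - \frac{r-2}{S_1}\right)(\mathbf{X} - \boldsymbol{\mu}). \]

Because it may be considered as a weighted mean of $\boldsymbol{\mu}$ and $\mathbf{X}$, it is often called a shrinkage estimator which ‘shrinks’ the ordinary estimator $\hat{\boldsymbol{\theta}}^{0}$ towards $\boldsymbol{\mu}$, despite the fact that if $S_1 < r-2$ it ‘shrinks’ past $\boldsymbol{\mu}$. Note, incidentally, that points which are initially far from $\boldsymbol{\mu}$ are little affected by this shrinkage. Of course, this ties in with the results of Section 8.3 because the James–Stein estimator $\hat{\boldsymbol{\theta}}^{JS}$ has turned out to be just the same as the empirical Bayes estimator $\hat{\boldsymbol{\theta}}^{EB}$.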
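The shrinkage rule is easy to try out numerically. The following is a minimal sketch, assuming unit sampling variances and taking $S_1$ to be the squared distance of the observations from the chosen origin; the function name `james_stein` and its arguments are illustrative choices of ours, not notation from the text.

```python
def james_stein(x, mu, c=None):
    """Shrink the observations x towards the origin mu.

    Sketch only: assumes each X_i has unit variance. With c left as
    None the James-Stein choice c = r - 2 is used.
    """
    r = len(x)
    if len(mu) != r:
        raise ValueError("x and mu must have the same length")
    if c is None:
        c = r - 2
    # Squared distance of the observations from the origin mu.
    s1 = sum((xi - mi) ** 2 for xi, mi in zip(x, mu))
    if s1 == 0:
        # x coincides with mu, so there is nothing to shrink
        # (a convention adopted here; the formula itself is undefined).
        return list(x)
    factor = 1 - c / s1
    return [mi + factor * (xi - mi) for xi, mi in zip(x, mu)]


# A point far from the origin (S1 = 25) is barely moved ...
far = james_stein([3.0, 4.0, 0.0], [0.0, 0.0, 0.0])
# ... while a point with S1 < r - 2 is 'shrunk' past the origin.
near = james_stein([0.5, 0.5, 0.5], [0.0, 0.0, 0.0])
print(far, near)
```

In the first call the shrinkage factor is $1 - 1/25 = 0.96$, so the point hardly moves; in the second $S_1 = 0.75 < r - 2 = 1$, the factor is negative, and every component changes sign.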

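In fact, it can be shown that the risk of $\hat{\boldsymbol{\theta}}^{JS}$ (with squared-error loss averaged over the $r$ components, so that the obvious estimator has risk 1) is

\[ R(\boldsymbol{\theta}, \hat{\boldsymbol{\theta}}^{JS}) = 1 - \frac{r-2}{r}\, E\left(\frac{r-2}{S_1} \,\Big|\, \boldsymbol{\theta}\right). \]

The expectation on the right-hand side depends on $\boldsymbol{\theta}$ and $\boldsymbol{\mu}$, but as $S_1 \geq 0$ it must be non-negative, so that $\hat{\boldsymbol{\theta}}^{JS}$ dominates $\hat{\boldsymbol{\theta}}^{0}$, that is,

\[ R(\boldsymbol{\theta}, \hat{\boldsymbol{\theta}}^{JS}) < R(\boldsymbol{\theta}, \hat{\boldsymbol{\theta}}^{0}) = 1 \]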

for all $\boldsymbol{\theta}$. It turns out that $S_1$ has a distribution which depends solely on $r$ and the quantity

\[ \gamma = \sum_{i=1}^{r} (\theta_i - \mu_i)^2, \]

a fact which can be proved by considering an orthogonal transformation of the variates $X_i - \mu_i$ to variates $W_i$ such that

\[ W_1 = \frac{1}{\sqrt{\gamma}} \sum_{i=1}^{r} (\theta_i - \mu_i)(X_i - \mu_i). \]

Evidently if $\gamma = 0$ then $S_1 \sim \chi^2_r$, and in general we say that $S_1$ has a non-central chi-squared distribution on $r$ degrees of freedom with non-centrality parameter $\gamma$. We denote this by

\[ S_1 \sim \chi'^{2}_{r}(\gamma). \]
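It may help to spell out why an orthogonal transformation of this kind does the job. The following is a sketch under the assumption of independent observations with unit variance; the shorthand $Z_i = X_i - \mu_i$ and $\delta_i = \theta_i - \mu_i$ is introduced here only for convenience. The $Z_i$ are independent $\mathrm{N}(\delta_i, 1)$ variates, and if $\mathbf{W} = A\mathbf{Z}$ where $A$ is an orthogonal matrix whose first row is $\boldsymbol{\delta}/\sqrt{\gamma}$ (with $\gamma = \sum \delta_i^2 > 0$), then the $W_i$ are again independent with unit variance and

\[
E\,W_1 = \frac{\boldsymbol{\delta}\cdot\boldsymbol{\delta}}{\sqrt{\gamma}} = \sqrt{\gamma},
\qquad
E\,W_j = 0 \quad (j = 2, \dots, r),
\qquad
S_1 = \sum_{i=1}^{r} Z_i^2 = \sum_{i=1}^{r} W_i^2 .
\]

The means of $W_2, \dots, W_r$ vanish because the remaining rows of $A$ are orthogonal to $\boldsymbol{\delta}$, and the sum of squares is preserved because $A$ is orthogonal. The distribution of $S_1$ therefore involves $\boldsymbol{\theta}$ and $\boldsymbol{\mu}$ only through $\gamma$.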

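It is fairly obvious that as $\gamma \to \infty$ typical values of $S_1$ will tend to infinity and we will get

\[ R(\boldsymbol{\theta}, \hat{\boldsymbol{\theta}}^{JS}) \to 1, \]

whereas when $\gamma = 0$ the variate $S_1$ has a central $\chi^2$ distribution on $r$ degrees of freedom (no parameters are estimated), so

\[ E\left(\frac{1}{S_1} \,\Big|\, \boldsymbol{\theta}\right) = \frac{1}{r-2} \]

and hence

\[ R(\boldsymbol{\mu}, \hat{\boldsymbol{\theta}}^{JS}) = 1 - \frac{r-2}{r} = \frac{2}{r}, \]

which, particularly for large values of $r$, is notably less than the risk of the obvious estimator.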
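These two limiting cases can be checked by simulation. The sketch below is a rough Monte Carlo check rather than a proof; it assumes unit sampling variances and a squared-error loss averaged over the $r$ components, and the names `js_loss` and `estimate_risks`, the seed, and the number of trials are all illustrative choices.

```python
import random


def js_loss(theta, mu, rng):
    """Draw X ~ N(theta, I) once and return the average squared-error
    loss of the obvious estimator X and of the James-Stein estimator."""
    r = len(theta)
    x = [rng.gauss(t, 1.0) for t in theta]
    s1 = sum((xi - mi) ** 2 for xi, mi in zip(x, mu))
    factor = 1 - (r - 2) / s1
    js = [mi + factor * (xi - mi) for xi, mi in zip(x, mu)]
    loss_obvious = sum((xi - t) ** 2 for xi, t in zip(x, theta)) / r
    loss_js = sum((ji - t) ** 2 for ji, t in zip(js, theta)) / r
    return loss_obvious, loss_js


def estimate_risks(theta, mu, n_trials=20000, seed=0):
    """Average the two losses over repeated sampling for fixed theta."""
    rng = random.Random(seed)
    total_obvious = 0.0
    total_js = 0.0
    for _ in range(n_trials):
        l0, ljs = js_loss(theta, mu, rng)
        total_obvious += l0
        total_js += ljs
    return total_obvious / n_trials, total_js / n_trials


r = 10
mu = [0.0] * r
# theta equal to mu, so gamma = 0: theory gives risks 1 and 2/r.
risk0_near, risk_js_near = estimate_risks([0.0] * r, mu)
# theta far from mu, so gamma = 1000: both risks should be close to 1.
risk0_far, risk_js_far = estimate_risks([10.0] * r, mu)
print(risk0_near, risk_js_near, risk0_far, risk_js_far)
```

With $r = 10$ the estimated James–Stein risk at $\boldsymbol{\theta} = \boldsymbol{\mu}$ should lie near $2/r = 0.2$, while for the distant $\boldsymbol{\theta}$ it should be only marginally below 1, up to Monte Carlo error.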

In the particular case where the arbitrary origin is taken at 0 the James–Stein estimator takes the form

\[ \left(1 - \frac{r-2}{\sum_{i=1}^{r} X_i^2}\right)\mathbf{X} \]

but it is important to note that this is only a special case.

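Variants of the James–Stein estimator have been derived. For example, if $c$ is any constant with

\[ 0 < c < 2(r-3) \]

then

\[ \bar{X}\mathbf{1} + \left(1 - \frac{c}{S}\right)(\mathbf{X} - \bar{X}\mathbf{1}), \qquad S = \sum_{i=1}^{r} (X_i - \bar{X})^2, \]

where $\bar{X}$ is the mean of the $X_i$ and $\mathbf{1}$ is a vector of ones, dominates $\hat{\boldsymbol{\theta}}^{0}$, this time provided $r \geq 4$ (loss of one dimension as a result of estimating a mean is something we are used to in statistics). The best value of $c$ in this case is $r-3$, leading to the Efron–Morris estimator

\[ \hat{\boldsymbol{\theta}}^{EM} = \bar{X}\mathbf{1} + \left(1 - \frac{r-3}{S}\right)(\mathbf{X} - \bar{X}\mathbf{1}). \]

In this case the ‘shrinkage’ is towards the overall mean.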
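A corresponding sketch for shrinkage towards the overall mean follows, again assuming unit sampling variances. The function name `efron_morris` and the decision to raise an error when $r < 4$ are our own illustrative choices.

```python
def efron_morris(x):
    """Shrink the observations x towards their overall mean.

    Sketch only: assumes each X_i has unit variance and uses the
    constant c = r - 3, which requires at least four observations.
    """
    r = len(x)
    if r < 4:
        raise ValueError("shrinkage towards the mean needs r >= 4")
    xbar = sum(x) / r
    # Sum of squared deviations of the observations about their mean.
    s = sum((xi - xbar) ** 2 for xi in x)
    if s == 0:
        # All observations are equal, so there is nothing to shrink
        # (a convention adopted here; the formula itself is undefined).
        return [xbar] * r
    factor = 1 - (r - 3) / s
    return [xbar + factor * (xi - xbar) for xi in x]


# Here xbar = 3 and S = 14, so deviations are multiplied by 13/14.
estimate = efron_morris([1.0, 2.0, 3.0, 6.0])
print(estimate)
```

Because every deviation from $\bar{X}$ is multiplied by the same factor, the mean of the shrunken values is still $\bar{X}$; only their spread about the mean is reduced.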

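In the case of the Efron–Morris estimator, it can be shown (see Lehmann, 1983, Section 4.6) that the risk of $\hat{\boldsymbol{\theta}}^{EM}$ is

\[ R(\boldsymbol{\theta}, \hat{\boldsymbol{\theta}}^{EM}) = 1 - \frac{r-3}{r}\, E\left(\frac{r-3}{S} \,\Big|\, \boldsymbol{\theta}\right). \]

When all the $\theta_i$ are equal, $S$ has a central $\chi^2$ distribution on $r-1$ degrees of freedom, so that

\[ E\left(\frac{1}{S} \,\Big|\, \boldsymbol{\theta}\right) = \frac{1}{r-3} \]

and hence

\[ R(\boldsymbol{\theta}, \hat{\boldsymbol{\theta}}^{EM}) = 1 - \frac{r-3}{r} = \frac{3}{r}, \]

which, particularly for large values of $r$, is again notably less than the risk of the obvious estimator.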
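Both limiting risks rest on the mean of the reciprocal of a central chi-squared variate. As a brief check (using $\nu$ as a generic symbol for the degrees of freedom, with $\nu > 2$), if $S \sim \chi^2_\nu$ then

\[
E\left(\frac{1}{S}\right)
= \int_0^\infty \frac{1}{s}\,\frac{s^{\nu/2-1} e^{-s/2}}{2^{\nu/2}\,\Gamma(\nu/2)}\,ds
= \frac{2^{\nu/2-1}\,\Gamma(\nu/2-1)}{2^{\nu/2}\,\Gamma(\nu/2)}
= \frac{1}{2}\cdot\frac{1}{\nu/2-1}
= \frac{1}{\nu-2}.
\]

Taking $\nu = r$ gives $1/(r-2)$ for the James–Stein case and $\nu = r-1$ gives $1/(r-3)$ for the Efron–Morris case.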

When we consider using such estimates in practice we encounter the ‘speed of light’ rhetorical question,

Do you mean that if I want to estimate tea consumption in Taiwan, I will do better to estimate simultaneously the speed of light and the weight of hogs in Montana?

The question then arises as to why this happens. Stein’s own explanation was that the sample distance squared of $\mathbf{X}$ from $\boldsymbol{\mu}$, that is $\sum (X_i - \mu_i)^2$, overestimates the squared distance of $\boldsymbol{\theta}$ from $\boldsymbol{\mu}$ and hence that the estimator $\mathbf{X}$ could be improved by bringing it nearer $\boldsymbol{\mu}$ (whatever $\boldsymbol{\mu}$ is). Following an idea due to Lawrence Brown, the effect was illustrated as shown in Figure 8.1 in a paper by Berger (1980, Figure 2, p. 736).
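The size of the overestimate in Stein’s explanation can be made explicit. Assuming, as throughout this section, independent observations with unit variance, and writing $\gamma = \sum (\theta_i - \mu_i)^2$ for the squared distance of $\boldsymbol{\theta}$ from $\boldsymbol{\mu}$, we have

\[
E\left(\sum_{i=1}^{r} (X_i - \mu_i)^2 \,\Big|\, \boldsymbol{\theta}\right)
= \sum_{i=1}^{r} \left\{ (\theta_i - \mu_i)^2 + 1 \right\}
= \gamma + r,
\]

so that on average the sample squared distance exceeds the true squared distance by $r$, whatever the origin $\boldsymbol{\mu}$.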

Figure 8.1 Shrinkage estimators.


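The four points marked in the figure represent a spherical distribution centred at $\boldsymbol{\theta}$.

Consider the effect of shrinking these points as shown. The two points lying on the line through $\boldsymbol{\theta}$ and $\boldsymbol{\mu}$ move, on average, slightly further away from $\boldsymbol{\theta}$, but the other two points move slightly closer (while distant points hardly move at all). In three dimensions, there are a further two points (not on the line between $\boldsymbol{\theta}$ and $\boldsymbol{\mu}$) that are shrunk closer to $\boldsymbol{\theta}$.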

Another explanation that has been offered is that $\hat{\boldsymbol{\theta}}^{JS}$ can be viewed as a ‘pre-test’ estimator: if one performs a preliminary test of the hypothesis that $\boldsymbol{\theta} = \boldsymbol{\mu}$ and then uses $\boldsymbol{\mu}$ or $\mathbf{X}$ depending on the outcome of the test, then the resulting estimator is a weighted average of $\boldsymbol{\mu}$ and $\mathbf{X}$ of which $\hat{\boldsymbol{\theta}}^{JS}$ is a smoothed version, although why this particular smoothing is to be used is not obvious from this chain of reasoning (cf. Lehmann, 1983, Section 4.5).

8.4.1 Evaluation of the risk of the James–Stein estimator

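We can prove that the James–Stein estimator has the risk quoted earlier, namely

\[ R(\boldsymbol{\theta}, \hat{\boldsymbol{\theta}}^{JS}) = 1 - \frac{r-2}{r}\, E\left(\frac{r-2}{S_1} \,\Big|\, \boldsymbol{\theta}\right). \]

[An alternative approach can be found in Lehmann (1983, Sections 4.5 and 4.6).] We proceed by writing

\[
\begin{aligned}
\gamma &= \sum_{i=1}^{r} (\theta_i - \mu_i)^2, \\
g(\gamma) &= R(\boldsymbol{\theta}, \hat{\boldsymbol{\theta}}^{JS}), \\
h(\gamma) &= 1 - \frac{r-2}{r}\, E\left(\frac{r-2}{S_1} \,\Big|\, \boldsymbol{\theta}\right),
\end{aligned}
\]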

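where the expectations are over repeated sampling for fixed $\boldsymbol{\theta}$. The function $g$ depends on $\gamma$ alone by spherical symmetry about $\boldsymbol{\mu}$. Similarly, the function $h$ depends on $\gamma$ alone since $S_1 \sim \chi'^{2}_{r}(\gamma)$. We note that because the unconditional distribution of $S_1$ is $(1+\phi)\chi^2_r$ (with $\phi$ the prior variance of the $\theta_i$), we have

\[ E\,h(\gamma) = 1 - \frac{r-2}{r(1+\phi)}, \]

the expectation being taken over values of $\boldsymbol{\theta}$ or over values of $\gamma$, that is,

\[ E\,h(\gamma) = E\,R(\boldsymbol{\theta}, \hat{\boldsymbol{\theta}}^{JS}) = E\,g(\gamma) \]

using the result at the very end of Section 8.2 and bearing in mind that $\hat{\boldsymbol{\theta}}^{JS} = \hat{\boldsymbol{\theta}}^{EB}$. Now writing $k = g - h$ we have

\[ E\,k(\gamma) = 0 \]

and hence

\[ \int_0^\infty k(\gamma)\, \gamma^{r/2-1} \exp\left(-\frac{\gamma}{2\phi}\right) d\gamma = 0 \]

for all $\phi$, which can happen only if $k$ vanishes identically by the uniqueness of Laplace transforms.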
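The final step can be spelled out as follows. This is a sketch which assumes that, under the prior, the $\theta_i$ are independent with $\theta_i - \mu_i \sim \mathrm{N}(0, \phi)$, where $\phi$ denotes the prior variance and $\gamma = \sum (\theta_i - \mu_i)^2$. Then $\gamma/\phi \sim \chi^2_r$, so that $\gamma$ has density

\[
p(\gamma) = \frac{\gamma^{r/2-1} \exp(-\gamma/2\phi)}{(2\phi)^{r/2}\,\Gamma(r/2)} \qquad (\gamma > 0),
\]

and a function $k$ whose prior expectation is zero for every $\phi$ satisfies, on writing $s = 1/2\phi$,

\[
\int_0^\infty \left\{ k(\gamma)\,\gamma^{r/2-1} \right\} e^{-s\gamma}\, d\gamma = 0 \qquad \text{for all } s > 0 .
\]

The left-hand side is the Laplace transform of $k(\gamma)\,\gamma^{r/2-1}$, and a function whose Laplace transform vanishes for all $s > 0$ must itself vanish (almost everywhere), so that $k = 0$ and the two expressions for the risk agree.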
