8.3 The baseball example
Efron and Morris’s example on baseball statistics was outlined in Section 8.1. As their primary data, they take the number of hits $S_i$ or, equivalently, the batting averages $Y_i = S_i/n$ of $r = 18$ major league players as they were recorded after $n = 45$ times at bat in the 1970 season. These were, in fact, all the players who happened to have batted exactly 45 times on the day the data were tabulated. If $X_i$ and $\theta_i$ are as in Section 8.1, so that approximately
$$X_i \sim \mathrm{N}(\theta_i, 1),$$
then we have a case of the hierarchical normal model. With the actual data, we have
and so with
the empirical Bayes estimator for the $\theta_i$ takes the form
so giving estimates
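The calculation described in this paragraph can be sketched in Python. This is an illustration under stated assumptions rather than a reproduction of the authors' computation. It assumes that the transformation of Section 8.1 is $X_i = \sqrt{n}\,\sin^{-1}(2Y_i - 1)$, that the sampling variance is exactly 1, and that the shrinkage weight is estimated by $(r-3)/S$ with $S = \sum (X_i - \bar{X})^2$. The function names are hypothetical, and the hit counts in the usage lines are made up for illustration; they are not the data of Table 8.1.

```python
import math


def to_x(y, n):
    """Variance-stabilising transform (assumed form): X = sqrt(n) * asin(2y - 1)."""
    return math.sqrt(n) * math.asin(2.0 * y - 1.0)


def to_prob(x, n):
    """Inverse of to_x: map a transformed value back to a batting average."""
    return 0.5 * (1.0 + math.sin(x / math.sqrt(n)))


def empirical_bayes(hits, n):
    """Shrink each transformed batting average towards the overall mean.

    Assumes X_i ~ N(theta_i, 1) and estimates the shrinkage weight by
    lam = (r - 3) / S, where S = sum((X_i - Xbar)^2), capped at 1.
    Returns (lam, estimated batting averages).
    """
    r = len(hits)
    if r < 4:
        raise ValueError("need at least 4 players for the (r - 3) / S weight")
    xs = [to_x(s / n, n) for s in hits]
    xbar = sum(xs) / r
    big_s = sum((x - xbar) ** 2 for x in xs)
    # With no spread at all, shrink completely to the common mean.
    lam = min(1.0, (r - 3) / big_s) if big_s > 0 else 1.0
    theta_hat = [lam * xbar + (1.0 - lam) * x for x in xs]
    return lam, [to_prob(t, n) for t in theta_hat]


# Usage with made-up hit counts (NOT the Table 8.1 data), n = 45 at-bats each.
example_hits = [20, 15, 12, 10, 8, 5]
lam, estimates = empirical_bayes(example_hits, 45)
print("shrinkage weight:", round(lam, 4))
print("estimates:", [round(e, 3) for e in estimates])
```

The estimates keep the players in the same order as their raw averages but pull them towards the common mean; the larger the weight `lam`, the stronger the pull.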
We can test how well an estimator performs by comparing its estimates with the batting averages actually observed over the remainder of the season. We suppose that over that period the $i$th player had $T_i$ hits and was at bat $m_i$ times, so that his batting average for the remainder of the season was $p_i = T_i/m_i$. If we write
we could consider a mean square error
or more directly
In either case, it turns out that the empirical Bayes estimator appears to be about three and a half times better than the ‘obvious’ (maximum likelihood) estimator, which ignores the hierarchical model and simply estimates each $\theta_i$ by the corresponding $X_i$. The original data and the resulting estimates are tabulated in Table 8.1.
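A comparison of this kind can be sketched as a ratio of total squared errors. The sketch below is illustrative only: the function names are hypothetical, the optional `transform` argument stands in for whichever scale the errors are measured on, and the numbers in the usage lines are invented, so the ratio they produce says nothing about the baseball data.

```python
def total_squared_error(estimates, targets, transform=lambda v: v):
    """Sum of squared differences between estimates and targets.

    `transform` is applied to both before differencing, so the same
    function serves for the raw scale (default) or a transformed scale.
    """
    return sum(
        (transform(e) - transform(t)) ** 2 for e, t in zip(estimates, targets)
    )


def relative_efficiency(obvious, shrunk, targets, transform=lambda v: v):
    """Ratio of total squared errors; values above 1 favour `shrunk`."""
    err_obvious = total_squared_error(obvious, targets, transform)
    err_shrunk = total_squared_error(shrunk, targets, transform)
    if err_shrunk == 0:
        raise ValueError("shrunk estimates match the targets exactly")
    return err_obvious / err_shrunk


# Invented numbers for three players (NOT taken from Table 8.1).
obvious = [0.40, 0.30, 0.20]   # early-season averages used directly
shrunk = [0.32, 0.30, 0.28]    # the same averages pulled towards their mean
later = [0.33, 0.29, 0.27]     # averages over the rest of the season
print(relative_efficiency(obvious, shrunk, later))
```

Dividing the two totals by the number of players would turn them into mean square errors without changing the ratio.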
So there is evidence that, in at least one practical case, use of the hierarchical model and a corresponding empirical Bayes estimator is genuinely worthwhile.