6.1 Theory of the correlation coefficient
6.1.1 Definitions
The standard measure of association between two random variables, which was first mentioned in Section 1.5 on ‘Means and Variances’, is the correlation coefficient ρ.
It is used to measure the strength of linear association between two variables, most commonly in the case where it might be expected that both have, at least approximately, a normal distribution. It is most important in cases where it is not thought that either variable is dependent on the other. One example of its use would be an investigation of the relationship between the height and the weight of individuals in a population, and another would be in finding how closely related barometric gradients and wind velocities were. You should, however, be warned that it is very easy to conclude that measurements are closely related because they have a high correlation, when, in fact, the relationship is due to their having a common time trend or a common cause and there is no close relationship between the two (see the relationship between the growth of money supply and Scottish dysentery as pointed out in a letter to The Times dated 6 April 1977). You should also be aware that two closely related variables can have a low correlation if the relationship between them is highly non-linear.
We suppose, then, that we have a set of n ordered pairs of observations, the pairs being independent of one another but members of the same pair being, in general, not independent. We shall denote these observations (xi, yi) and, as usual, we shall write x̄ = Σxi/n and ȳ = Σyi/n. Further, suppose that these pairs have a bivariate normal distribution with

  E xi = λ,  E yi = μ,  V xi = φ,  V yi = ψ,  C(xi, yi) = ρ√(φψ),

and we shall use the notation

  Sxx = Σ(xi − x̄)²,  Syy = Σ(yi − ȳ)²

(Sxx and Syy have previously been denoted Sx and Sy), and

  Sxy = Σ(xi − x̄)(yi − ȳ).

It is also useful to define the sample correlation coefficient r by

  r = Sxy / √(Sxx Syy),

so that −1 ≤ r ≤ 1.
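The definitions above translate directly into code. A minimal sketch in Python (the height/weight numbers are invented purely for illustration):

```python
import math

def sample_correlation(xs, ys):
    """Return the sample correlation r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    xbar = sum(xs) / n                      # x-bar
    ybar = sum(ys) / n                      # y-bar
    Sxx = sum((x - xbar) ** 2 for x in xs)
    Syy = sum((y - ybar) ** 2 for y in ys)
    Sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return Sxy / math.sqrt(Sxx * Syy)

# Illustrative (invented) height/weight data:
heights = [160, 165, 170, 175, 180, 185]
weights = [55, 61, 63, 70, 72, 80]
r = sample_correlation(heights, weights)
assert -1 <= r <= 1
```

Note that r is invariant under separate linear rescalings of the two variables, which is why it can be compared across quite different pairs of measurements.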
We shall show that, with standard reference priors for λ, μ, φ and ψ, a reasonable approximation to the posterior density of ρ is given by

  p(ρ | X, Y) ∝ p(ρ) (1 − ρ²)^{(n−1)/2} (1 − ρr)^{−n+3/2},

where p(ρ) is its prior density. Making the substitutions

  ζ = tanh⁻¹ ρ,  z = tanh⁻¹ r,

we will go on to show that, after another approximation,

  ζ | X, Y ∼ N(z, 1/n).
These results will be derived after quite a complicated series of substitutions [due to Fisher (1915, 1921)]. Readers who are prepared to take these results for granted can omit the rest of this section.
6.1.2 Approximate posterior distribution of the correlation coefficient
As before, we shall have use for the formulae

  Σ(xi − λ)² = Sxx + n(x̄ − λ)²,  Σ(yi − μ)² = Syy + n(ȳ − μ)²,

and also for a similar one not used before, namely

  Σ(xi − λ)(yi − μ) = Sxy + n(x̄ − λ)(ȳ − μ).
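These identities are easy to verify numerically. A quick sketch, with arbitrary invented values standing in for λ and μ:

```python
# Check Σ(xi − λ)(yi − μ) = Sxy + n(x̄ − λ)(ȳ − μ) on arbitrary data.
xs = [1.2, 3.4, 2.2, 5.0, 4.1]
ys = [0.7, 2.9, 1.8, 4.1, 3.3]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
Sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
lam, mu = 0.5, -1.3   # arbitrary values playing the role of λ and μ
lhs = sum((x - lam) * (y - mu) for x, y in zip(xs, ys))
rhs = Sxy + n * (xbar - lam) * (ybar - mu)
assert abs(lhs - rhs) < 1e-9
```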
Now the (joint) density function of a single pair (x, y) of observations from a bivariate normal distribution is

  p(x, y | λ, μ, φ, ψ, ρ) = {2π√(φψ(1 − ρ²))}⁻¹ exp(−Q/2),

where

  Q = (1 − ρ²)⁻¹ { (x − λ)²/φ − 2ρ(x − λ)(y − μ)/√(φψ) + (y − μ)²/ψ },

and hence the joint density of the vector (X, Y) = ((x1, y1), (x2, y2), … , (xn, yn)) is

  p(X, Y | λ, μ, φ, ψ, ρ) ∝ {φψ(1 − ρ²)}^{−n/2} exp(−ΣQi/2),

where, using the formulae above,

  ΣQi = (1 − ρ²)⁻¹ { [Sxx + n(x̄ − λ)²]/φ − 2ρ[Sxy + n(x̄ − λ)(ȳ − μ)]/√(φψ) + [Syy + n(ȳ − μ)²]/ψ }.
It follows that the vector (x̄, ȳ, Sxx, Syy, Sxy) is sufficient for (λ, μ, φ, ψ, ρ). For the moment, we shall use independent priors of a simple form. For λ, μ, φ and ψ, we shall take the standard reference priors, and for the moment we shall use a perfectly general prior for ρ, so that

  p(λ, μ, φ, ψ, ρ) ∝ p(ρ)/φψ,

and hence

  p(λ, μ, φ, ψ, ρ | X, Y)
    ∝ p(ρ) (φψ)^{−(n+1)/2} (1 − ρ²)^{−(n−1)/2} exp[ −{Sxx/φ − 2ρSxy/√(φψ) + Syy/ψ} / 2(1 − ρ²) ]
    × {2π√(φψ(1 − ρ²))/n}⁻¹ exp[ −n{(x̄ − λ)²/φ − 2ρ(x̄ − λ)(ȳ − μ)/√(φψ) + (ȳ − μ)²/ψ} / 2(1 − ρ²) ].

The last factor is evidently the (joint) density of λ and μ considered as bivariate normal with means x̄ and ȳ, variances φ/n and ψ/n and correlation ρ. Consequently it integrates to unity, and so, as the first factor does not depend on λ or μ,

  p(φ, ψ, ρ | X, Y) ∝ p(ρ) (φψ)^{−(n+1)/2} (1 − ρ²)^{−(n−1)/2} exp[ −{Sxx/φ − 2ρSxy/√(φψ) + Syy/ψ} / 2(1 − ρ²) ].
To integrate φ and ψ out, it is convenient to define

  ω = √(φSyy / ψSxx),  ξ = √(φψ),

so that φ = ωξ√(Sxx/Syy) and ψ = (ξ/ω)√(Syy/Sxx). The Jacobian is

  ∂(φ, ψ)/∂(ω, ξ) = | ∂φ/∂ω  ∂φ/∂ξ ; ∂ψ/∂ω  ∂ψ/∂ξ | = 2ξ/ω,

and hence

  p(ω, ξ, ρ | X, Y) ∝ p(ρ) (1 − ρ²)^{−(n−1)/2} ω⁻¹ ξ^{−n} exp(−k/ξ),

where

  k = √(Sxx Syy) (ω + ω⁻¹ − 2ρr) / 2(1 − ρ²).

The substitution t = k/ξ (so that dξ = −k dt/t²) reduces the integral over ξ to a standard gamma function integral,

  ∫₀^∞ ξ^{−n} exp(−k/ξ) dξ = k^{−(n−1)} ∫₀^∞ t^{n−2} e^{−t} dt = Γ(n − 1) k^{−(n−1)},

and hence we can deduce that

  p(ω, ρ | X, Y) ∝ p(ρ) (1 − ρ²)^{(n−1)/2} ω⁻¹ (ω + ω⁻¹ − 2ρr)^{−(n−1)}.

Finally, integrating over ω,

  p(ρ | X, Y) ∝ p(ρ) (1 − ρ²)^{(n−1)/2} ∫₀^∞ (ω + ω⁻¹ − 2ρr)^{−(n−1)} ω⁻¹ dω.
By substituting 1/ω for ω it is easily checked that the integral from 0 to 1 is equal to that from 1 to ∞, so that, as constant multiples are irrelevant, the lower limit of the integral can be taken to be 1 rather than 0.
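The symmetry claim can be checked numerically. The following quick sketch uses invented values for ρr and n and crude Riemann sums; it is an illustration, not part of the derivation:

```python
def f(w, a, n):
    """Integrand (ω + 1/ω − 2a)^-(n-1) / ω, with a playing the role of ρr."""
    return (w + 1 / w - 2 * a) ** (-(n - 1)) / w

# The substitution ω → 1/ω maps (0, 1] onto [1, ∞) and leaves f(ω) dω invariant,
# so the two halves of the integral are equal.
a, n = 0.5, 10          # invented value of ρr and sample size
h = 1e-4
lower = sum(f(i * h, a, n) for i in range(1, 10000)) * h        # ≈ ∫ over (0, 1)
upper = sum(f(1 + i * h, a, n) for i in range(0, 200000)) * h   # ≈ ∫ over (1, 21)
assert abs(lower - upper) / upper < 0.01
```

(The upper integral is truncated at ω = 21; the integrand decays like ω^{-n} there, so the neglected tail is negligible.)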
By substituting ω = eᵗ, so that ω + ω⁻¹ = 2 cosh t and ω⁻¹ dω = dt, the integral can be put in the alternative form

  p(ρ | X, Y) ∝ p(ρ) (1 − ρ²)^{(n−1)/2} ∫₀^∞ (cosh t − ρr)^{−(n−1)} dt.
The exact distribution corresponding to this expression has been tabulated in David (1954), but for most purposes it suffices to use an approximation. The usual way to proceed is by yet a further substitution in terms of a new variable u, but this is rather messy and gives more than is necessary for a first-order approximation. Instead, note that for small t

  cosh t − ρr ≅ 1 + t²/2 − ρr ≅ (1 − ρr) exp{ t² / 2(1 − ρr) },

while the contribution to the integral from values where t is large will, at least for large n, be negligible. Using this approximation,

  p(ρ | X, Y) ∝ p(ρ) (1 − ρ²)^{(n−1)/2} (1 − ρr)^{−(n−1)} ∫₀^∞ exp{ −(n − 1)t² / 2(1 − ρr) } dt.

On substituting

  s = t √( (n − 1)/(1 − ρr) ),

the integral is seen to be proportional to

  (1 − ρr)^{1/2} ∫₀^∞ exp(−s²/2) ds.

Since the integral in this last expression does not depend on ρ, we can conclude that

  p(ρ | X, Y) ∝ p(ρ) (1 − ρ²)^{(n−1)/2} (1 − ρr)^{−n+3/2}.
Although evaluation of the constant of proportionality would still require the use of numerical methods, it is much simpler to calculate the distribution of ρ using this expression than to have to evaluate an integral for every value of ρ. In fact, the approximation is quite good [some numerical comparisons can be found in Box and Tiao (1992, Section 8.4.8)].
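This point is easy to illustrate numerically. The sketch below (standard-library Python, uniform prior, invented values of n and r; the quadrature is my own straightforward coding, not anything from the text) evaluates both the one-dimensional exact integral and the closed-form approximation on a grid of ρ values and locates their modes:

```python
import math

def exact_kernel(rho, r, n, tmax=10.0, steps=2000):
    """(1 - rho^2)^((n-1)/2) * integral of (cosh t - rho*r)^-(n-1), trapezoidal rule."""
    h = tmax / steps
    total = 0.5 * (1.0 - rho * r) ** (-(n - 1))           # endpoint t = 0
    total += sum((math.cosh(i * h) - rho * r) ** (-(n - 1)) for i in range(1, steps))
    total += 0.5 * (math.cosh(tmax) - rho * r) ** (-(n - 1))
    return (1.0 - rho * rho) ** ((n - 1) / 2) * total * h

def approx_kernel(rho, r, n):
    """(1 - rho^2)^((n-1)/2) * (1 - rho*r)^(-n + 3/2)."""
    return (1.0 - rho * rho) ** ((n - 1) / 2) * (1.0 - rho * r) ** (-n + 1.5)

n, r = 30, 0.6                        # invented sample size and sample correlation
grid = [i / 200 for i in range(-199, 200)]
for kernel in (exact_kernel, approx_kernel):
    dens = [kernel(rho, r, n) for rho in grid]
    mode = grid[max(range(len(dens)), key=dens.__getitem__)]
    assert abs(mode - r) < 0.05       # both densities peak close to r
```

With the approximate kernel only one power function need be evaluated per grid point, which is the computational saving referred to above.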
6.1.3 The hyperbolic tangent substitution
Although the exact mode does not usually occur at ρ = r, it is easily seen that for plausible choices of the prior p(ρ), the approximate density derived earlier is greatest when ρ is near r. However, except when r = 0, this distribution is asymmetrical. Its asymmetry can be reduced by writing

  ρ = tanh ζ,  r = tanh z,

so that ζ = tanh⁻¹ ρ and z = tanh⁻¹ r, and

  1 − ρ² = sech² ζ,  1 − ρr = cosh(ζ − z) / (cosh ζ cosh z).

It follows that

  p(ζ | X, Y) ∝ p(ζ) cosh^{−1/2} ζ cosh^{−(n−3/2)}(ζ − z),

where p(ζ) denotes the prior density of ζ. If n is large, since the factor cosh^{−1/2} ζ does not depend on n, it may be regarded as approximately constant over the range over which cosh^{−(n−3/2)}(ζ − z) is appreciably different from zero, so that

  p(ζ | X, Y) ∝ p(ζ) cosh^{−(n−3/2)}(ζ − z).
Finally put

  δ = ζ − z,

and note that if ζ is close to z then cosh δ ≅ 1 + δ²/2. Putting this into the expression for p(ζ | X, Y) and using the exponential limit (1 + x/m)^m → eˣ,

  p(ζ | X, Y) ∝ p(ζ) (1 + δ²/2)^{−(n−3/2)} ≅ p(ζ) exp{−(n − 3/2)δ²/2} ≅ p(ζ) exp{−nδ²/2},

so that, provided the prior does not vary appreciably over an interval of length of order 1/√n about z, approximately ζ | X, Y ∼ N(z, 1/n), or equivalently

  tanh⁻¹ ρ | X, Y ∼ N(tanh⁻¹ r, 1/n).
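This normal approximation makes interval estimates for ρ straightforward: work on the ζ scale and transform the endpoints back through tanh. A short sketch (the 95% normal point 1.96 and the values r = 0.6, n = 30 are my own illustrative choices):

```python
import math

def rho_interval(r, n, q=1.96):
    """Approximate 95% posterior interval for rho via zeta ~ N(artanh r, 1/n)."""
    z = math.atanh(r)                  # z = tanh^-1 r
    half = q / math.sqrt(n)            # q posterior standard deviations of zeta
    return math.tanh(z - half), math.tanh(z + half)

lo, hi = rho_interval(0.6, 30)         # invented r and n
assert lo < 0.6 < hi
```

Because tanh is monotonic, the interval is equal-tailed for ρ; it is not quite the highest-density interval, since the transformation back to ρ re-introduces a little asymmetry.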
A slightly better approximation to the mean and variance can be found by using approximations based on the likelihood as in Section 3.10. If we take a uniform prior for ρ, or at least assume that the prior does not vary appreciably over the range of values of interest, then p(ζ) ∝ sech² ζ and we get

  L(ζ) = constant − (5/2) log cosh ζ − (n − 3/2) log cosh(ζ − z),
  L′(ζ) = −(5/2) tanh ζ − (n − 3/2) tanh(ζ − z).

We can now approximate ρ, that is tanh ζ, by r and tanh(ζ − z) by ζ − z (we could retain tanh ζ and so get a better approximation, but it is not worth it). We can also approximate n − 3/2 by n, so getting the root of the likelihood equation as

  ζ̂ = z − 5r/2n.

Further,

  L″(ζ) = −(5/2) sech² ζ − (n − 3/2) sech²(ζ − z),

so that again approximating ρ by r, we have at ζ = ζ̂

  −L″(ζ̂) ≅ (5/2)(1 − r²) + n − 3/2.

It follows that the distribution of ζ is given slightly more accurately by

  ζ | X, Y ∼ N( z − 5r/2n, {n − 3/2 + (5/2)(1 − r²)}⁻¹ ).

This approximation differs a little from that usually given by classical statisticians, who usually quote the variance as (n − 3)⁻¹, but the difference is not of great importance.
6.1.4 Reference prior
Clearly, the results will be simplest if the prior used has the form

  p(ρ) ∝ (1 − ρ²)^{c}

for some c. The simplest choice is to take c = 0, that is, a prior uniform over −1 ≤ ρ ≤ 1, and it seems quite a reasonable choice. It is possible to use the multi-parameter version of Jeffreys’ rule to find a prior for (φ, ψ, ρ), though it is not wholly simple. The easiest way is to write κ = ρ√(φψ) for the covariance and to work in terms of the inverse of the variance–covariance matrix, that is, in terms of the elements of

  ( φ κ ; κ ψ )⁻¹ = Δ⁻¹ ( ψ −κ ; −κ φ ).

It turns out that the rule gives p(φ, ψ, κ) ∝ Δ^{−3/2}, where Δ is the determinant φψ − κ², and that the Jacobian determinant

  ∂(φ, ψ, κ)/∂(φ, ψ, ρ) = √(φψ),

so that p(φ, ψ, ρ) ∝ Δ^{−3/2} √(φψ). Finally, transforming to the parameters that are really of interest, it transpires that

  p(φ, ψ, ρ) ∝ (1 − ρ²)^{−3/2} / φψ,

which corresponds to the choice c = −3/2 and the standard reference priors for φ and ψ.
6.1.5 Incorporation of prior information
It is not difficult to adapt the above analysis to the case where prior information from the conjugate family [i.e. inverse chi-squared for φ and ψ and of the form (1 − ρ²)^{c} for ρ] is available. In practice, this information will usually be available in the form of previous measurements of a similar type, and in this case it is best dealt with by transforming all the information about ρ into statements about ζ = tanh⁻¹ ρ, so that the theory we have built up for the normal distribution can be used.