6.4 Conjugate prior for the bivariate regression model

6.4.1 The problem of updating a regression line

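In Section 6.3, we saw that with the regression line in the standard form $y_i = \alpha + \beta(x_i - \bar{x})$ the joint posterior is

$$p(\alpha, \beta, \phi \mid x, y) \propto \phi^{-(n+2)/2} \exp\left[-\tfrac{1}{2}\{S_{ee} + n(\alpha - a)^2 + S_{xx}(\beta - b)^2\}/\phi\right]$$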

For reasons that will soon emerge, we denote the quantities derived from the data with a prime as $n'$, $\bar{x}'$, $\bar{y}'$, $a'$, $b'$, $S'_{xx}$, $S'_{ee}$, etc. In the example on rainfall, we found that $n' = 10$ and

Unnumbered Display Equation
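
The primed quantities are ordinary least-squares summaries of the first batch of data. As a minimal sketch, assuming the usual definitions $S_{xx} = \sum (x_i - \bar{x})^2$, $S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})$, $b = S_{xy}/S_{xx}$ and $S_{ee} = S_{yy} - b^2 S_{xx}$, they could be computed as follows. The function name `summarize`, the dictionary keys and the toy data are illustrative assumptions and are not the rainfall figures.

```python
def summarize(x, y):
    """Return the summaries n, xbar, a (= ybar), b, Sxx and See for paired data."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    xbar = sum(x) / n
    ybar = sum(y) / n
    Sxx = sum((xi - xbar) ** 2 for xi in x)
    Syy = sum((yi - ybar) ** 2 for yi in y)
    Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    if Sxx == 0:
        raise ValueError("all x values are equal, so the slope is undefined")
    b = Sxy / Sxx                # least-squares slope
    See = Syy - b * b * Sxx      # residual sum of squares about the fitted line
    return {"n": n, "xbar": xbar, "a": ybar, "b": b, "Sxx": Sxx, "See": See}


# Hypothetical toy data lying exactly on the line y = 1 + 2x.
first_batch = summarize([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
```

Because the toy points lie exactly on a line, the residual sum of squares comes out as zero, which is a convenient sanity check on the definitions.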

Now suppose that we collect further data, thus

Unnumbered Display Equation

If this had been all the data available, we would have constructed a regression line based on data for $n'' = 6$ years, with

Unnumbered Display Equation

If, however, we had all 16 years' data, then the regression line would have been based on data for $n = 16$ years, resulting in

Unnumbered Display Equation

6.4.2 Formulae for recursive construction of a regression line

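By the sufficiency principle it must be possible to find $\bar{x}$, $\bar{y}$, $b$, etc., from $\bar{x}'$, $\bar{y}'$, $b'$, etc., and $\bar{x}''$, $\bar{y}''$, $b''$, etc. It is in fact not too difficult to show that $n$, $\bar{x}$ and $a = \bar{y}$ are given by

$$n = n' + n'', \qquad \bar{x} = (n'\bar{x}' + n''\bar{x}'')/n, \qquad \bar{y} = (n'\bar{y}' + n''\bar{y}'')/n$$

and that if we define

$$n^h = (1/n' + 1/n'')^{-1}, \qquad S^c_{xx} = n^h(\bar{x}' - \bar{x}'')^2, \qquad S^c_{xy} = n^h(\bar{x}' - \bar{x}'')(\bar{y}' - \bar{y}''), \qquad b^c = S^c_{xy}/S^c_{xx}$$

then $S_{xx}$, $b$ and $S_{ee}$ are given by

$$\begin{aligned}
S_{xx} &= S'_{xx} + S''_{xx} + S^c_{xx} \\
b &= (b' S'_{xx} + b'' S''_{xx} + b^c S^c_{xx})/S_{xx} \\
S_{ee} &= S'_{ee} + S''_{ee} + \{(b' - b'')^2 S'_{xx} S''_{xx} + (b'' - b^c)^2 S''_{xx} S^c_{xx} + (b^c - b')^2 S^c_{xx} S'_{xx}\}/S_{xx}
\end{aligned}$$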
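
The recursive construction can be sketched in code and checked against a fit to the pooled data. The following is a minimal illustration, not a definitive implementation: the names `summarize` and `combine`, the dictionary keys and the toy data are assumptions introduced here. To avoid dividing by $\bar{x}' - \bar{x}''$ when the two batch means coincide, the sketch never forms $b^c$ explicitly; it uses $b^c S^c_{xx} = S^c_{xy}$ and $(b - b^c)^2 S^c_{xx} = n^h\{b(\bar{x}' - \bar{x}'') - (\bar{y}' - \bar{y}'')\}^2$, which are algebraically the same terms.

```python
def summarize(x, y):
    """Least-squares summaries of one batch of paired data."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    Sxx = sum((xi - xbar) ** 2 for xi in x)
    Syy = sum((yi - ybar) ** 2 for yi in y)
    Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = Sxy / Sxx
    return {"n": n, "xbar": xbar, "ybar": ybar, "b": b,
            "Sxx": Sxx, "See": Syy - b * b * Sxx}


def combine(p, q):
    """Merge the summaries p (primed) and q (double-primed) of two batches."""
    n = p["n"] + q["n"]
    xbar = (p["n"] * p["xbar"] + q["n"] * q["xbar"]) / n
    ybar = (p["n"] * p["ybar"] + q["n"] * q["ybar"]) / n
    nh = 1.0 / (1.0 / p["n"] + 1.0 / q["n"])     # n^h
    dx = p["xbar"] - q["xbar"]
    dy = p["ybar"] - q["ybar"]
    Scxx = nh * dx * dx                          # S^c_xx
    Scxy = nh * dx * dy                          # S^c_xy = b^c * S^c_xx
    Sxx = p["Sxx"] + q["Sxx"] + Scxx
    b = (p["b"] * p["Sxx"] + q["b"] * q["Sxx"] + Scxy) / Sxx
    # The three pairwise terms of the S_ee formula, with b^c eliminated.
    cross = ((p["b"] - q["b"]) ** 2 * p["Sxx"] * q["Sxx"]
             + nh * (q["b"] * dx - dy) ** 2 * q["Sxx"]
             + nh * (p["b"] * dx - dy) ** 2 * p["Sxx"])
    See = p["See"] + q["See"] + cross / Sxx
    return {"n": n, "xbar": xbar, "ybar": ybar, "b": b, "Sxx": Sxx, "See": See}


# Hypothetical toy batches, not the rainfall data.
x1, y1 = [0.0, 1.0, 2.0], [1.0, 3.0, 5.0]
x2, y2 = [3.0, 5.0], [4.0, 4.0]
pooled = combine(summarize(x1, y1), summarize(x2, y2))
direct = summarize(x1 + x2, y1 + y2)
```

Here `pooled` and `direct` should agree up to rounding error, which is the content of the sufficiency argument: the raw observations of each batch are not needed once its summaries are known.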

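Of these formulae, the only one that is at all difficult to deduce is the last, which is established thus

$$\begin{aligned}
S_{ee} &= S_{yy} - b^2 S_{xx} \\
&= S'_{yy} + S''_{yy} + S^c_{yy} - b^2 S_{xx} \\
&= S'_{ee} + S''_{ee} + b'^2 S'_{xx} + b''^2 S''_{xx} + (b^c)^2 S^c_{xx} - b^2 S_{xx}
\end{aligned}$$

(writing $S^c_{yy} = n^h(\bar{y}' - \bar{y}'')^2$, it is easily checked that there is no term $S^c_{ee}$, since $S^c_{yy} = (b^c)^2 S^c_{xx}$). However,

$$\begin{aligned}
S_{xx}\{b'^2 S'_{xx} + b''^2 S''_{xx} + (b^c)^2 S^c_{xx} - b^2 S_{xx}\}
&= (S'_{xx} + S''_{xx} + S^c_{xx})\{b'^2 S'_{xx} + b''^2 S''_{xx} + (b^c)^2 S^c_{xx}\} - (b' S'_{xx} + b'' S''_{xx} + b^c S^c_{xx})^2 \\
&= (b' - b'')^2 S'_{xx} S''_{xx} + (b'' - b^c)^2 S''_{xx} S^c_{xx} + (b^c - b')^2 S^c_{xx} S'_{xx}
\end{aligned}$$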

giving the result.
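
The final algebraic step is an identity in three weights and three slopes: $(\sum_i S_i)(\sum_i b_i^2 S_i) - (\sum_i b_i S_i)^2 = \sum_{i<j} (b_i - b_j)^2 S_i S_j$. It can be spot-checked numerically. The sketch below is only a sanity check on randomly drawn values, not a proof, and the function names are assumptions made for illustration.

```python
import random


def weighted_form(S, b):
    """(sum S_i)(sum b_i^2 S_i) - (sum b_i S_i)^2 for weights S and slopes b."""
    total = sum(S)
    second_moment = sum(bi * bi * Si for bi, Si in zip(b, S))
    first_moment = sum(bi * Si for bi, Si in zip(b, S))
    return total * second_moment - first_moment ** 2


def pairwise_form(S, b):
    """Sum over pairs i < j of (b_i - b_j)^2 S_i S_j."""
    k = len(S)
    return sum((b[i] - b[j]) ** 2 * S[i] * S[j]
               for i in range(k) for j in range(i + 1, k))


random.seed(0)
worst_gap = 0.0
for _ in range(100):
    S = [random.uniform(0.1, 10.0) for _ in range(3)]   # stand-ins for S'_xx, S''_xx, S^c_xx
    b = [random.uniform(-2.0, 2.0) for _ in range(3)]   # stand-ins for b', b'', b^c
    gap = abs(weighted_form(S, b) - pairwise_form(S, b))
    worst_gap = max(worst_gap, gap / max(1.0, pairwise_form(S, b)))
```

After the loop, `worst_gap` holds the largest relative discrepancy found, which should be at the level of floating-point rounding.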

With the data in the example, it turns out that $n = n' + n'' = 16$ and $\bar{x}$ (a weighted mean of $\bar{x}'$ and $\bar{x}''$) is 60.3, and similarly for $\bar{y}$. Moreover

Unnumbered Display Equation

so that

Unnumbered Display Equation

in accordance with the results quoted earlier obtained by considering all 16 years together.

6.4.3 Finding an appropriate prior

In the aforementioned analysis, our prior knowledge could be summarized by saying that if the regression line is put in the form

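$$y_i = \alpha + \beta(x_i - \bar{x}')$$

then

$$p(\alpha, \beta, \phi) \propto \phi^{-(n'+2)/2} \exp\left[-\tfrac{1}{2}\{S'_{ee} + n'(\alpha - a')^2 + S'_{xx}(\beta - b')^2\}/\phi\right]$$

We then had observations (denoted $x''$ and $y''$) that resulted in a posterior which is such that if the regression line is put in the form

$$y_i = \alpha + \beta(x_i - \bar{x})$$

then

$$p(\alpha, \beta, \phi \mid x'', y'') \propto \phi^{-(n+2)/2} \exp\left[-\tfrac{1}{2}\{S_{ee} + n(\alpha - a)^2 + S_{xx}(\beta - b)^2\}/\phi\right]$$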
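
Prior and posterior here share one functional form, differing only in the values of $n$, $a$, $b$, $S_{xx}$ and $S_{ee}$ (and in the point about which $x$ is centred). A minimal sketch of the logarithm of that kernel, up to an additive constant, is given below. The function name and the numerical values are hypothetical and chosen only for illustration.

```python
import math


def log_kernel(alpha, beta, phi, n, a, b, Sxx, See):
    """Log of phi^{-(n+2)/2} exp[-{See + n(alpha-a)^2 + Sxx(beta-b)^2}/(2 phi)]."""
    if phi <= 0:
        return -math.inf          # the variance phi must be positive
    quad = See + n * (alpha - a) ** 2 + Sxx * (beta - b) ** 2
    return -0.5 * (n + 2) * math.log(phi) - 0.5 * quad / phi


# Hypothetical summaries; the same function serves for prior or posterior.
params = {"n": 5, "a": 3.4, "b": 0.5, "Sxx": 14.8, "See": 5.3}

# For this kernel the joint maximum is at alpha = a, beta = b,
# phi = See / (n + 2), obtained by setting the phi-derivative to zero.
phi_mode = params["See"] / (params["n"] + 2)
at_mode = log_kernel(params["a"], params["b"], phi_mode, **params)
```

Passing the primed summaries gives the prior kernel and passing the combined summaries gives the posterior kernel, which is what makes the family conjugate.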

This, of course, gives a way of incorporating prior information into a regression model provided that it can be put into the form which occurs above. It is, however, often quite difficult to specify prior knowledge about a regression line unless, as in the case above, it is explicitly the result of previous data. Appropriate questions to ask to fix which prior of the class to use are as follows:

1. What number of observations is my present knowledge worth? Write the answer as $n'$.
2. What single point is the regression line most likely to go through? Write the answer as $(\bar{x}', \bar{y}')$.
3. What is the best guess as to the slope of the regression line? Write the answer as $b'$.
4. What is the best guess as to the variance of the observation $y_i$ about the regression line? Write the answer as $s'^2$ and find $S'_{ee}$ as $(n' - 2)s'^2$.
5. Finally make $S'_{xx}$ such that the estimated variances for the slope $\beta$ and the intercept $\alpha$ are in the ratio $n'$ to $S'_{xx}$.
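
The five questions above can be turned into prior summaries mechanically. The sketch below assumes that question 4 uses $S'_{ee} = (n' - 2)s'^2$ and that question 5 is answered by stating the ratio of the variance of the slope to the variance of the intercept, so that $S'_{xx} = n'/\text{ratio}$. The function name, argument names and example numbers are hypothetical.

```python
def elicit_prior(n_prime, point, slope, residual_variance, slope_to_intercept_var_ratio):
    """Build the prior summaries n', xbar', a', b', S'_xx, S'_ee from the five answers."""
    if n_prime <= 2:
        raise ValueError("n' must exceed 2 for S'_ee = (n' - 2) s'^2 to be positive")
    if residual_variance <= 0 or slope_to_intercept_var_ratio <= 0:
        raise ValueError("variances and their ratio must be positive")
    xbar_prime, ybar_prime = point                     # question 2
    See_prime = (n_prime - 2) * residual_variance      # question 4
    # Question 5: var(beta) / var(alpha) = n' / S'_xx, so solve for S'_xx.
    Sxx_prime = n_prime / slope_to_intercept_var_ratio
    return {"n": n_prime, "xbar": xbar_prime, "a": ybar_prime,
            "b": slope, "Sxx": Sxx_prime, "See": See_prime}


# Hypothetical answers: knowledge worth 10 observations, line through (50, 40),
# slope -0.2, residual variance 200, slope variance 1/1000 of intercept variance.
prior = elicit_prior(10, (50.0, 40.0), -0.2, 200.0, 0.001)
```

With these illustrative answers the sketch gives $S'_{ee} = 1600$ and $S'_{xx} = 10000$ (up to rounding); how convincing such answers are in practice is a separate matter.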

As noted above, it is difficult to believe that this process can be carried out in a very convincing manner, although the first three steps do not present as much difficulty as the last two. However, the case where information is received and then more information of the same type is used to update the regression line can be useful.

It is of course possible (and indeed simpler) to do similar things with the correlation coefficient.
