6.4 Conjugate prior for the bivariate regression model
6.4.1 The problem of updating a regression line
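In Section 6.3, we saw that with the regression line in the standard form $y = \alpha + \beta(x - \bar{x})$ the joint posterior is
$$p(\alpha, \beta, \phi \mid \boldsymbol{x}, \boldsymbol{y}) \propto \phi^{-(n+2)/2} \exp\left[-\tfrac{1}{2}\{S_{ee} + n(\alpha - a)^2 + S_{xx}(\beta - b)^2\}/\phi\right].$$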
For reasons that will soon emerge, we denote the quantities derived from the data with a prime as $n'$, $\bar{x}'$, $a'$, $b'$, $S'_{xx}$, $S'_{ee}$, etc. In the example on rainfall, we found that and
Now suppose that we collect further data, thus
If this had been all the data available, we would have constructed a regression line based on data for years, with
If, however, we had all 16 years' data, then the regression line would have been based on data for $n = 16$ years resulting in
6.4.2 Formulae for recursive construction of a regression line
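By the sufficiency principle it must be possible to find $\bar{x}$, $a$, $b$, etc., from $\bar{x}'$, $a'$, $b'$, etc., and $\bar{x}''$, $a''$, $b''$, etc. It is in fact not too difficult to show that $n$, $\bar{x}$ and $a$ are given by
$$n = n' + n'', \qquad \bar{x} = (n'\bar{x}' + n''\bar{x}'')/n, \qquad a = (n'a' + n''a'')/n$$
and that if we define
$$n^h = \left(1/n' + 1/n''\right)^{-1}, \qquad S^c_{xx} = n^h(\bar{x}' - \bar{x}'')^2, \qquad S^c_{xy} = n^h(\bar{x}' - \bar{x}'')(a' - a''), \qquad b^c = S^c_{xy}/S^c_{xx}$$
then $S_{xx}$, $b$ and $S_{ee}$ are given by
$$S_{xx} = S'_{xx} + S''_{xx} + S^c_{xx}$$
$$b = (b'S'_{xx} + b''S''_{xx} + b^cS^c_{xx})/S_{xx}$$
$$S_{ee} = S'_{ee} + S''_{ee} + \left[(b' - b'')^2 S'_{xx}S''_{xx} + (b'' - b^c)^2 S''_{xx}S^c_{xx} + (b^c - b')^2 S^c_{xx}S'_{xx}\right]/S_{xx}.$$
Of these formulae, the only one that is at all difficult to deduce is the last, which is established thus
$$S_{ee} = S_{yy} - b^2S_{xx} = S'_{yy} + S''_{yy} + S^c_{yy} - b^2S_{xx} = S'_{ee} + S''_{ee} + b'^2S'_{xx} + b''^2S''_{xx} + (b^c)^2S^c_{xx} - b^2S_{xx}$$
where $S^c_{yy} = n^h(a' - a'')^2 = (b^c)^2S^c_{xx}$ (it is easily checked that there is no term $S^c_{ee}$). However,
$$S_{xx}\left(b'^2S'_{xx} + b''^2S''_{xx} + (b^c)^2S^c_{xx}\right) - \left(b'S'_{xx} + b''S''_{xx} + b^cS^c_{xx}\right)^2 = (b' - b'')^2 S'_{xx}S''_{xx} + (b'' - b^c)^2 S''_{xx}S^c_{xx} + (b^c - b')^2 S^c_{xx}S'_{xx}$$
giving the result.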
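The recursive construction can be checked numerically. The following is a minimal Python sketch, not taken from the text: it assumes the formulae of this subsection take the form stated in the code comments, and the names `RegSummary`, `summarize` and `combine`, as well as the small data sets, are purely illustrative. It summarizes two batches separately, combines the summaries, and compares the outcome with a fit to the pooled data.

```python
from dataclasses import dataclass


@dataclass
class RegSummary:
    """Summary statistics for the line y = a + b*(x - xbar)."""
    n: int
    xbar: float
    a: float    # mean of y, the intercept in the centred form
    b: float    # least-squares slope
    sxx: float  # sum of squares of x about xbar
    see: float  # residual sum of squares


def summarize(xs, ys):
    """Compute the summary directly from raw data."""
    n = len(xs)
    xbar = sum(xs) / n
    a = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - a) for x, y in zip(xs, ys))
    syy = sum((y - a) ** 2 for y in ys)
    return RegSummary(n, xbar, a, sxy / sxx, sxx, syy - sxy ** 2 / sxx)


def combine(p, q):
    """Combine two summaries (p plays the role of ', q of '')."""
    n = p.n + q.n
    xbar = (p.n * p.xbar + q.n * q.xbar) / n
    a = (p.n * p.a + q.n * q.a) / n
    dx = p.xbar - q.xbar
    if dx == 0.0:
        # b^c = S^c_xy / S^c_xx is undefined when the two x-means coincide;
        # this sketch does not treat that degenerate case.
        raise ValueError("the two batches have equal x-means")
    nh = 1.0 / (1.0 / p.n + 1.0 / q.n)       # n^h
    sxx_c = nh * dx ** 2                      # S^c_xx
    b_c = (p.a - q.a) / dx                    # b^c = S^c_xy / S^c_xx
    sxx = p.sxx + q.sxx + sxx_c
    b = (p.b * p.sxx + q.b * q.sxx + b_c * sxx_c) / sxx
    cross = ((p.b - q.b) ** 2 * p.sxx * q.sxx
             + (q.b - b_c) ** 2 * q.sxx * sxx_c
             + (b_c - p.b) ** 2 * sxx_c * p.sxx)
    see = p.see + q.see + cross / sxx
    return RegSummary(n, xbar, a, b, sxx, see)


# Illustrative (made-up) data, not the rainfall example.
x1, y1 = [1.0, 2.0, 4.0, 7.0], [2.0, 2.5, 5.1, 8.0]
x2, y2 = [3.0, 5.0, 6.0], [3.9, 5.5, 7.2]

merged = combine(summarize(x1, y1), summarize(x2, y2))
pooled = summarize(x1 + x2, y1 + y2)
print(merged)
print(pooled)
```

The two printed summaries should agree up to rounding error, which is what the sufficiency argument leads one to expect.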
With the data in the example, it turns out that $n = 16$ and $\bar{x}$ (a weighted mean of $\bar{x}'$ and $\bar{x}''$) is 60.3, and similarly for $a$. Moreover
so that
in accordance with the results quoted earlier obtained by considering all 16 years together.
6.4.3 Finding an appropriate prior
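In the analysis above, our prior knowledge could be summarized by saying that if the regression line is put in the form
$$y = \alpha + \beta(x - \bar{x}')$$
then
$$p(\alpha, \beta, \phi) \propto \phi^{-(n'+2)/2} \exp\left[-\tfrac{1}{2}\{S'_{ee} + n'(\alpha - a')^2 + S'_{xx}(\beta - b')^2\}/\phi\right].$$
We then had observations (denoted $\boldsymbol{x}''$ and $\boldsymbol{y}''$) that resulted in a posterior which is such that if the regression line is put in the form
$$y = \alpha + \beta(x - \bar{x})$$
then
$$p(\alpha, \beta, \phi \mid \boldsymbol{x}'', \boldsymbol{y}'') \propto \phi^{-(n+2)/2} \exp\left[-\tfrac{1}{2}\{S_{ee} + n(\alpha - a)^2 + S_{xx}(\beta - b)^2\}/\phi\right].$$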
This, of course, gives a way of incorporating prior information into a regression model provided that it can be put into the form which occurs above. It is, however, often quite difficult to specify prior knowledge about a regression line unless, as in the case above, it is explicitly the result of previous data. Appropriate questions to ask to fix which prior of the class to use are as follows:
As noted above, it is difficult to believe that this process can be carried out in a very convincing manner, although the first three steps do not present as much difficulty as the last two. However, the case where information is received and then more information of the same type is used to update the regression line can be useful.
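The case of a prior that summarizes earlier data can be illustrated with a short numerical check of conjugacy. The sketch below is an assumption-laden illustration rather than the author's method: it takes the prior and posterior densities to have kernel $\phi^{-(n+2)/2}\exp[-\tfrac{1}{2}\{S_{ee} + n(\alpha - a)^2 + S_{xx}(\beta - b)^2\}/\phi]$, with $\alpha$ the height of the line at the relevant mean of $x$, and the function names and data are hypothetical. It verifies that the log prior kernel plus the log-likelihood of the new data equals the log posterior kernel at several parameter values.

```python
import math


def summarize(xs, ys):
    """Return (n, xbar, a, b, Sxx, See) for the line y = a + b*(x - xbar)."""
    n = len(xs)
    xbar, a = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - a) for x, y in zip(xs, ys))
    syy = sum((y - a) ** 2 for y in ys)
    return n, xbar, a, sxy / sxx, sxx, syy - sxy ** 2 / sxx


def log_kernel(line, phi, summary):
    """Log of the assumed kernel, evaluated for a given line and variance.

    `line` is (height at x = 0, slope). alpha is the height of that line at
    the summary's own xbar, so prior and posterior may be centred differently.
    """
    c, beta = line
    n, xbar, a, b, sxx, see = summary
    alpha = c + beta * xbar
    q = see + n * (alpha - a) ** 2 + sxx * (beta - b) ** 2
    return -0.5 * (n + 2) * math.log(phi) - 0.5 * q / phi


def log_likelihood(line, phi, xs, ys):
    """Normal log-likelihood of the new data, omitting the 2*pi constant."""
    c, beta = line
    rss = sum((y - c - beta * x) ** 2 for x, y in zip(xs, ys))
    return -0.5 * len(xs) * math.log(phi) - 0.5 * rss / phi


# Illustrative (made-up) data: the first batch defines the prior.
x1, y1 = [1.0, 2.0, 4.0, 7.0], [2.0, 2.5, 5.1, 8.0]
x2, y2 = [3.0, 5.0, 6.0], [3.9, 5.5, 7.2]

prior = summarize(x1, y1)
posterior = summarize(x1 + x2, y1 + y2)

# Arbitrary (line, phi) values at which to compare the two sides.
trials = [((0.0, 1.0), 1.0), ((1.5, 0.8), 0.3), ((-2.0, 1.4), 5.0)]
gaps = [log_kernel(line, phi, prior) + log_likelihood(line, phi, x2, y2)
        - log_kernel(line, phi, posterior)
        for line, phi in trials]
print(gaps)
```

Each entry of `gaps` should be zero up to rounding error, reflecting that a prior of this form combined with further data of the same type gives a posterior of the same form.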
It is of course possible (and indeed simpler) to do similar things with the correlation coefficient.