For the modeling and evaluation step, we will focus on three tasks. The first is to produce a univariate forecast model applied to just the surface temperature. The second is to develop a regression model of the surface temperature based on itself and carbon emissions. Finally, we will try to discover whether emissions Granger-cause the surface temperature anomalies.
With this task, the objective is to produce a univariate forecast for the surface temperature, choosing between a Holt linear trend model and an ARIMA model. As discussed previously, the temperature anomalies begin to increase around 1970, so I recommend examining the series from that point to the present. The following code creates the subset and plots the series:
> T2 = window(T, start=1970)
> plot(T2)
Our train set will run through 2007, leaving eight years of data (2008 through 2015) to evaluate the candidates. Once again, the window() function allows us to create the split in a simple fashion, as follows:
> train = window(T2, end=2007)
> test = window(T2, start=2008)
To build our smoothing model, we will use the holt() function found in the forecast package. We will build two models, one with and one without a damped trend. In this function, we need to specify the time series, the number of forecast periods as h = ..., and the method used to select the initial state values, either "optimal" or "simple"; if we want a damped trend, we add damped=TRUE. Optimal finds the initial values along with the smoothing parameters, while simple uses the first few observations. The forecast package also offers the ets() function, which will find all the optimal parameters for you. However, in our case, let's stick with holt() so that we can compare methods. We start with the Holt model without a damped trend, as follows:
> fit.holt = holt(train, h=8, initial="optimal")
> summary(fit.holt)
Forecast method: Holt's method

Model Information:
ETS(A,A,N)

Call:
 holt(x = train, h = 8, initial = "optimal")

  Smoothing parameters:
    alpha = 0.0271
    beta  = 0.0271

  Initial states:
    l = -0.1464
    b = 0.0109

  sigma:  0.0958

      AIC      AICc       BIC
-32.03529 -30.82317 -25.48495

Error measures:
                 MAPE
Training set 87.56256

Forecasts:
     Point Forecast
2008      0.5701693
2009      0.5951016
2010      0.6200340
2011      0.6449664
2012      0.6698988
2013      0.6948311
2014      0.7197635
2015      0.7446959
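Before interpreting the output, it may help to see what Holt's method is actually computing. The additive-trend recursions take only a few lines of base R; the following is a minimal sketch with hand-picked smoothing parameters and toy data, not the forecast package's optimized estimation:

```r
# Minimal sketch of Holt's additive-trend recursions (toy data; alpha and
# beta fixed by hand here, whereas holt() estimates them)
holt_sketch <- function(x, alpha = 0.3, beta = 0.1, h = 8) {
  l <- x[1]                 # initial level ("simple" style initialization)
  b <- x[2] - x[1]          # initial trend
  for (t in 2:length(x)) {
    l_new <- alpha * x[t] + (1 - alpha) * (l + b)   # level update
    b     <- beta * (l_new - l) + (1 - beta) * b    # trend update
    l     <- l_new
  }
  l + b * (1:h)             # h-step-ahead point forecasts
}
holt_sketch(c(1, 2, 3, 4, 5), h = 3)  # a perfect trend extrapolates to 6 7 8
```

Note how the forecasts fan out along a straight line from the last level and trend; the damped variant multiplies the trend contribution by powers of phi so that the line flattens.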
The summary() call produces quite a bit of output, and for brevity, I've eliminated all the error measures other than MAPE and deleted the 80 and 95 percent prediction intervals. You can see the intervals, along with the smoothing parameters and initial states, in your own console. We can also plot the forecast and see how well it did out-of-sample:
> plot(forecast(fit.holt))
> lines(test, type="o")
Looking at the plot, it seems that this forecast overshot the mark a little bit. Let's have a go at including the damped trend, as follows:
> fit.holtd = holt(train, h=8, initial="optimal", damped=TRUE)
> summary(fit.holtd)
Forecast method: Damped Holt's method

Model Information:
ETS(A,Ad,N)

Call:
 holt(x = train, h = 8, damped = TRUE, initial = "optimal")

  Smoothing parameters:
    alpha = 1e-04
    beta  = 1e-04
    phi   = 0.98

  Initial states:
    l = -0.2277
    b = 0.0266

  sigma:  0.0986

      AIC      AICc       BIC
-27.86479 -25.98979 -19.67686

Error measures:
                 MAPE
Training set 120.6198

Forecasts:
     Point Forecast
2008      0.4812987
2009      0.4931311
2010      0.5047266
2011      0.5160901
2012      0.5272261
2013      0.5381393
2014      0.5488340
2015      0.5593147
Notice that the output now includes the phi parameter for the trend dampening. Additionally, you can see that the point forecasts are lower under the damped method, but the training-set MAPE is higher. Let's examine again how it performs out-of-sample, as follows:
> plot(forecast(fit.holtd), "Holt Damped")
> lines(test, type="o")
The following is the output of the preceding command:
Looking at the plot, you can see that the damped method performed better on the test set. Finally, for the ARIMA models, you can again use auto.arima() from the forecast package. There are many options that you can specify in the function, or you can simply pass in your time series data and it will find the best ARIMA fit:
> fit.arima = auto.arima(train)
> summary(fit.arima)
Series: train
ARIMA(2,1,0)

Coefficients:
          ar1      ar2
      -0.5004  -0.2947
s.e.   0.1570   0.1556

sigma^2 estimated as 0.01301:  log likelihood=27.65
AIC=-49.3   AICc=-48.58   BIC=-44.47

Training set error measures:
                 MAPE
Training set 115.9148
The output shows that the model selected is an AR-2, I-1 model, or ARIMA(2,1,0). The AR coefficients are produced, and again I've abbreviated the output to include only the MAPE error measure, which is slightly better than that of the damped-trend Holt model. We can examine the test data in the same fashion; just remember to include the number of forecast periods, as follows:
> plot(forecast(fit.arima, h=8))
> lines(test, type="o")
Interestingly, the forecast shows a relatively flat trend. To compute MAPE on the test set, run the following code:
> mape1 = sum(abs((test - fit.holtd$mean)/test))/8
> mape1
[1] 0.1218026
> mape2 = sum(abs((test - forecast(fit.arima)$mean)/test))/8
> mape2
[1] 0.1312118
The forecast error is indeed slightly less for the Holt damped trend model than for ARIMA(2,1,0). Notice that the code to pull in the forecast values is slightly different for models produced with auto.arima().
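Since we computed MAPE by hand twice above, it may be worth wrapping the calculation in a small helper. The mape() name below is illustrative, my own, not a function from the forecast package (which reports MAPE as a percentage via accuracy()):

```r
# A small reusable helper (illustrative name, not from the forecast package):
# mean absolute percentage error of predictions against actuals
mape <- function(actual, pred) mean(abs((actual - pred) / actual))
mape(c(100, 200), c(110, 190))  # (0.10 + 0.05) / 2 = 0.075
```

With this in hand, mape1 above is simply mape(test, fit.holtd$mean).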
With the statistical and visual evidence, it seems that the best choice for a univariate forecast model is Holt's method with a damped trend. The final thing that we can do is examine a plot with all three forecasts side by side. To help with the visualization, we will start the actual data from 1990 onwards. The actual data will form the basis of the plot, and the forecasts will be added as lines with different line types (lty) and different plot symbols (pch). On the base plot, notice that the y axis limits (ylim) have to be set, otherwise the Holt forecast will be off the chart:
> T3 = window(T2, start=1990)
> plot(T3, ylim=c(0.1, 0.8))
> lines(forecast(fit.holt)$mean, type="o", pch=2, lty="dotted")
> lines(forecast(fit.holtd)$mean, type="o", pch=5, lty="twodash")
> lines(forecast(fit.arima, h=8)$mean, type="o", pch=7, lty="dashed")
> legend("topleft", lty=c("solid","dotted","twodash","dashed"),
+        pch=c(NA,2,5,7), legend=c("Data","Holt","HoltDamped","ARIMA"))
The output of the preceding code snippet is as follows:
With this, we completed the building of a univariate forecast model for the surface temperature anomalies and now we will move on to the next task.
In this second task of the modeling effort, we will apply the techniques to the climate change data. We will seek to predict the surface temperature anomalies using lags of itself and lags of emissions.
For starters, we will build a linear model without the lags in order to examine the serial correlation of the residuals, using the lm() function. The other thing to do is to create an appropriate timeframe to examine. Recall that we saw the CO2 emissions begin their gradual increase around the end of World War II. Therefore, let's start the data in 1945, once again using the window() function and applying it to the climate data:
> y = window(climate[, 1], start=1945)
> x = window(climate[, 2], start=1945)
With this done, we can build the linear model and examine it:
> fit.lm = lm(y ~ x)
> summary(fit.lm)

Call:
lm(formula = y ~ x)

Residuals:
     Min       1Q   Median       3Q      Max
-0.35257 -0.08782  0.00224  0.09732  0.27931

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.786e-01  3.890e-02  -7.161 9.02e-10 ***
x            8.082e-05  7.374e-06  10.960  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1357 on 65 degrees of freedom
Multiple R-squared:  0.6489,    Adjusted R-squared:  0.6435
F-statistic: 120.1 on 1 and 65 DF,  p-value: < 2.2e-16
You can see that the F-statistic for the overall model is highly statistically significant (< 2.2e-16) and that the x variable (CO2) is also highly significant. Adjusted R-squared is 0.6435. Our work here is not done, however, as we still need to check the assumption of no correlation in the residuals. Two plots can provide the necessary insight: the first is plot.ts(), which provides a time series plot of the residuals, and the second is the autocorrelation plot that we used previously:
> plot.ts(fit.lm$residuals)
The output of the preceding command is as follows:
Note that there is a possible cyclical pattern in the residuals over time. To show conclusively that the model violates the assumption of no serial correlation, let's have a look at the following autocorrelation plot:
> acf(fit.lm$residuals)
You can clearly see that the first eight lags have significant spikes (correlation), so we must reject the assumption that the residuals have no serial correlation. This is a classic example of the problem of looking solely at the linear relationship of two time series without considering their lagged values.
Another option is the Durbin-Watson test, which tests the null hypothesis that the residuals have zero autocorrelation. The test is available in the lmtest package, which is automatically loaded with the forecast package. You can either specify your own linear model in the function or pass the object that contains the model; in our case, fit.lm. As you can see below, the result leads us to reject the null hypothesis and conclude that the true autocorrelation is greater than 0:
> dwtest(fit.lm)

        Durbin-Watson test

data:  fit.lm
DW = 0.8198, p-value = 1.73e-08
alternative hypothesis: true autocorrelation is greater than 0
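The DW statistic itself is simple to compute by hand, which helps demystify the output; dwtest() additionally supplies the p-value. A base-R sketch, where dw_stat is an illustrative helper of my own:

```r
# The Durbin-Watson statistic from residuals e: the sum of squared successive
# differences divided by the sum of squares (dw_stat is an illustrative helper)
dw_stat <- function(e) sum(diff(e)^2) / sum(e^2)

# Independent residuals give DW near 2; positive autocorrelation pushes it toward 0
dw_stat(c(1, 1, 1, -1, -1, -1))  # small: neighboring residuals move together
```

Applied to fit.lm$residuals, this helper reproduces the DW = 0.8198 reported above.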
Having done this, where do we start in constructing a meaningful lag structure for the model? Our best bet is probably to look at the cross-correlation structure again, which is as follows:
> ccf(x,y)
The following is the output of the preceding command:
We have significant correlations of the lags of x through lag 15. Applying some judgment here (along with much trial and error on my part), let's start by looking at six lags of x and lag-1 and lag-4 of y in the regression model. A convenient way to do this is to use the dynamic linear regression package, dynlm. The only function available in the package is dynlm(); however, it offers quite a bit of flexibility in building models. The syntax of dynlm() follows the same procedure as lm(), but allows the inclusion of lag terms, seasonal terms, trends, and even harmonic patterns. To incorporate the lag terms we want in our model, we specify the lags using L() in the formula. For instance, if we wanted to regress the temperature on emissions and lag-1 emissions, the syntax would be y ~ x + L(x, 1). Note that in L(), the variable and its lag(s) are all that you need to specify. Here is how to build and examine the model with the first six lags of x and the first and fourth lags of y:
> fit.dyn = dynlm(y ~ x + L(x, 1:6) + L(y, c(1, 4)))
> summary(fit.dyn)

Time series regression with "ts" data:
Start = 1951, End = 2011

Call:
dynlm(formula = y ~ x + L(x, 1:6) + L(y, c(1, 4)))

Residuals:
      Min        1Q    Median        3Q       Max
-0.241333 -0.049877 -0.000018  0.065519  0.155488

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)     -5.604e-02  5.417e-02  -1.034 0.305785
x               -4.944e-05  1.235e-04  -0.400 0.690621
L(x, 1:6)1      -2.556e-05  2.056e-04  -0.124 0.901545
L(x, 1:6)2       3.110e-04  2.112e-04   1.472 0.147074
L(x, 1:6)3      -2.798e-04  2.296e-04  -1.218 0.228667
L(x, 1:6)4       2.086e-04  2.447e-04   0.852 0.398061
L(x, 1:6)5      -5.687e-04  2.393e-04  -2.377 0.021262 *
L(x, 1:6)6       4.319e-04  1.382e-04   3.125 0.002928 **
L(y, c(1, 4))1   3.859e-01  1.079e-01   3.578 0.000769 ***
L(y, c(1, 4))4   3.957e-01  1.085e-01   3.649 0.000619 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1009 on 51 degrees of freedom
Multiple R-squared:  0.8377,    Adjusted R-squared:  0.8091
F-statistic: 29.26 on 9 and 51 DF,  p-value: < 2.2e-16
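Under the hood, L() is just aligning lagged copies of a series against the original. The following base-R sketch shows the idea on toy data; lag_vec is a hypothetical helper, not dynlm's actual implementation:

```r
# Hypothetical lag helper: shift a series back k steps, padding the front
# with NA -- conceptually what L(x, k) supplies to the regression
lag_vec <- function(v, k) c(rep(NA, k), head(v, -k))

y_toy <- c(10, 20, 30, 40, 50)
lag_vec(y_toy, 1)  # NA 10 20 30 40
```

The leading NAs are why the regression output above starts in 1951 rather than 1945: six lags of x consume the first six observations.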
From the summary, we have a highly significant p-value for the overall model and an Adjusted R-squared of 0.8091. Looking at the p-values for the coefficients, we have values less than 0.05 for both lags of y (temperature) and for lags five and six of x (emissions). We can adjust the model by dropping the insignificant x lags and then test the assumption of no serial correlation, as follows:
> fit.dyn2 = dynlm(y ~ L(x, c(5, 6)) + L(y, c(1, 4)))
> summary(fit.dyn2)

Time series regression with "ts" data:
Start = 1951, End = 2011

Call:
dynlm(formula = y ~ L(x, c(5, 6)) + L(y, c(1, 4)))

Residuals:
      Min        1Q    Median        3Q       Max
-0.220798 -0.054835  0.005527  0.079318  0.172035

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)    -0.0367531  0.0480225  -0.765 0.447288
L(x, c(5, 6))5 -0.0002963  0.0001217  -2.434 0.018157 *
L(x, c(5, 6))6  0.0003224  0.0001222   2.638 0.010765 *
L(y, c(1, 4))1  0.4238987  0.1043383   4.063 0.000153 ***
L(y, c(1, 4))4  0.3735944  0.1047414   3.567 0.000749 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1006 on 56 degrees of freedom
Multiple R-squared:  0.823,    Adjusted R-squared:  0.8104
F-statistic: 65.1 on 4 and 56 DF,  p-value: < 2.2e-16
Our overall model is again significant, and Adjusted R-squared has improved slightly now that we have dropped the irrelevant terms. The lagged coefficients are all positive with the exception of lag-5 of x. Examining the residuals, we start with the time series plot:
> plot(fit.dyn2$residuals)
And then the ACF plot of the residuals:
> acf(fit.dyn2$residuals)
The output of the preceding command is as follows:
The first plot does not show any obvious pattern in the residuals. The acf plot does show slightly significant correlation at a couple of lags, but it is so minor that we can choose to ignore it. Now, if you exclude lag-4 of y, the acf plot will show a rather significant spike at lag-4; I'll let you try this out for yourself. Let's wrap this up with the Durbin-Watson test, which does not lead to a rejection of the null hypothesis:
> dwtest(fit.dyn2)

        Durbin-Watson test

data:  fit.dyn2
DW = 1.9525, p-value = 0.3018
alternative hypothesis: true autocorrelation is greater than 0
With this model, we can say that the linear model to predict the surface temperature anomalies is the following:
y = -0.037 - 0.000296*lag5(x) + 0.000322*lag6(x) + 0.424*lag1(y) + 0.374*lag4(y)
Interestingly, lag-6 of x has a slight positive effect on y, but it is nearly cancelled out by lag-5 of x. Let's plot the actual versus predicted values to get a visual sense of how well the model performs:
> plot(y, ylab="Surface Temperature")
> lines(fitted(fit.dyn2), pch=2, lty="dashed")
> legend("topleft", lty=c("solid","dashed"), legend=c("Actual","Predicted"))
The output of the preceding command is as follows:
I would like to point out one thing here. If we had started in 1970 as we did with the univariate forecast and incorporated a linear trend in the model, the emissions' lags would not have been significant. So, by starting earlier, we did capture significant lags of the emissions. But what does it all mean? For our purpose in trying to find a link between human CO2 emissions and global warming, it doesn't seem to amount to much of anything. Now, let's turn our attention to trying to prove the statistical causality between the two.
For this chapter, this is where I think the rubber meets the road and we will separate causality from mere correlation. Well, statistically speaking anyway. This is not the first time that this technique has been applied to the problem. Triacca (2005) found no evidence to suggest that atmospheric CO2 Granger-caused the surface temperature anomalies. On the other hand, Kodra (2010) concluded that there is a causal relationship but put forth the caveat that their data was not stationary even after a second-order differencing. While this effort will not settle the debate, it will hopefully inspire you to apply the methodology in your personal endeavors. The topic at hand certainly provides an effective training ground to demonstrate Granger causality.
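Before turning to the vars package, it is worth seeing that Granger causality boils down to an F-test comparing a restricted model (lags of y only) against an unrestricted one (lags of y and x): if the x lags significantly improve the fit, x is said to Granger-cause y. Here is a base-R sketch on synthetic data in which x genuinely drives y one period later; all of the data and variable names here are illustrative:

```r
# Synthetic example: x drives y with a one-period lag
set.seed(42)
n <- 200
x <- rnorm(n)
y <- numeric(n)
for (t in 2:n) y[t] <- 0.5 * y[t - 1] + 0.4 * x[t - 1] + rnorm(1, sd = 0.1)

# Align the series with one lag of each
yt <- y[2:n]
y1 <- y[1:(n - 1)]   # lag-1 of y
x1 <- x[1:(n - 1)]   # lag-1 of x

restricted   <- lm(yt ~ y1)        # lags of y only
unrestricted <- lm(yt ~ y1 + x1)   # add lags of x
anova(restricted, unrestricted)    # F-test; a small p-value says x Granger-causes y
```

Reversing the roles (regressing x on its own lags, with and without lags of y) gives the test in the other direction; the vars package automates exactly this pair of comparisons for a fitted VAR.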
The plan here is to go with the data starting in 1945, as we did with the bivariate regression. To explore the issues that Kodra (2010) had, we will need to see if and how we can make the data stationary. To do this, the forecast package provides the ndiffs() function, which reports the minimum number of differences needed to make the data stationary. In the function, you can specify which of the three available tests you would like to use: Kwiatkowski-Phillips-Schmidt-Shin (KPSS), Augmented Dickey-Fuller (ADF), or Phillips-Perron (PP). I will use KPSS in the following code, which has the null hypothesis that the data is stationary. If the null hypothesis is rejected, the function returns the number of differences needed to achieve stationarity. Note that adf and pp have the null hypothesis that the data is not stationary.
> ndiffs(x, test="kpss")
[1] 1
> ndiffs(y, test="kpss")
[1] 1
In both cases, first-order differencing will achieve stationarity, allowing us to perform the Granger causality analysis with confidence. To get started, we will put both time series into one dataset and then create the first-order differenced series, as follows:
> Granger = cbind(y, x)
> dGranger = diff(Granger)
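If you want to confirm what diff() is producing, here is a toy illustration with made-up numbers:

```r
# Toy check (made-up numbers): diff() returns successive changes, which is
# exactly the first-order differencing that ndiffs() recommended
rw <- cumsum(c(0, 1, 2, 3))  # a trending series: 0 1 3 6
diff(rw)                     # its increments: 1 2 3
```

Note that each difference consumes one observation, so dGranger is one year shorter than Granger.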
It is now a matter of determining the optimal lag structure based on the information criteria, using vector autoregression. This is done with the VARselect() function in the vars package. You only need to specify the data and the maximum number of lags via lag.max in the function. Let's use a maximum of 10 lags:
> lag=VARselect(dGranger, lag.max=10)
The information criteria can be called using lag$selection. Four different criteria are provided: AIC, the Hannan-Quinn criterion (HQ), the Schwarz-Bayesian criterion (SC), and the final prediction error (FPE). Note that AIC and SC are covered in Chapter 2, Linear Regression – The Blocking and Tackling of Machine Learning, so I will not go over the criterion formulas or their differences here. If you want to see the actual results, you can use lag$criteria:
> lag$selection
AIC(n)  HQ(n)  SC(n) FPE(n)
     5      1      1      5
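As a refresher on what these criteria trade off, AIC can be reproduced by hand for any fitted model as minus twice the log-likelihood plus twice the number of estimated parameters. A quick base-R check on the built-in cars dataset:

```r
# AIC by hand for a simple lm on the built-in cars data:
# -2 * log-likelihood + 2 * number of estimated parameters
fit <- lm(dist ~ speed, data = cars)
k   <- length(coef(fit)) + 1              # coefficients plus the error variance
aic_by_hand <- -2 * as.numeric(logLik(fit)) + 2 * k
all.equal(aic_by_hand, AIC(fit))          # TRUE
```

The other criteria differ only in how heavily they penalize extra parameters, which is why SC (the stingiest) picks a shorter lag than AIC here.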
The selection output shows that AIC and FPE chose lag 5 as the optimal structure for a VAR model. We can forgo lag-1, as it doesn't seem to make sense in the world of climate change, while a lag of 5 years does. Therefore, we will fit a model with a lag of 5 using the VAR() function and examine the results:
> lag5 = VAR(dGranger, p=5)
> summary(lag5)

VAR Estimation Results:
=========================
Endogenous variables: y, x
Deterministic variables: const
Sample size: 61
Log Likelihood: -310.683
Roots of the characteristic polynomial:
0.8497 0.8183 0.8183 0.8108 0.8108 0.7677 0.7499 0.7499 0.7076 0.7076
Call:
VAR(y = dGranger, p = 5)

Estimation results for equation y:
==================================
y = y.l1 + x.l1 + y.l2 + x.l2 + y.l3 + x.l3 + y.l4 + x.l4 + y.l5 + x.l5 + const

        Estimate Std. Error t value Pr(>|t|)
y.l1  -4.992e-01  1.272e-01  -3.925 0.000266 ***
x.l1  -1.268e-04  1.245e-04  -1.019 0.313027
y.l2  -5.057e-01  1.409e-01  -3.589 0.000754 ***
x.l2   2.570e-04  1.367e-04   1.879 0.066018 .
y.l3  -4.174e-01  1.455e-01  -2.868 0.006030 **
x.l3  -7.257e-05  1.448e-04  -0.501 0.618358
y.l4   3.467e-02  1.417e-01   0.245 0.807735
x.l4   1.511e-04  1.489e-04   1.014 0.315328
y.l5  -2.015e-01  1.285e-01  -1.568 0.123245
x.l5  -4.041e-04  1.383e-04  -2.922 0.005208 **
const  4.762e-02  2.768e-02   1.720 0.091542 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1022 on 50 degrees of freedom
Multiple R-Squared: 0.4481,    Adjusted R-squared: 0.3378
F-statistic:  4.06 on 10 and 50 DF,  p-value: 0.00041

Estimation results for equation x:
==================================
x = y.l1 + x.l1 + y.l2 + x.l2 + y.l3 + x.l3 + y.l4 + x.l4 + y.l5 + x.l5 + const

         Estimate Std. Error t value Pr(>|t|)
y.l1    -73.67538  141.97873  -0.519  0.60611
x.l1      0.35225    0.13896   2.535  0.01442 *
y.l2   -221.78216  157.29843  -1.410  0.16475
x.l2     -0.06238    0.15267  -0.409  0.68457
y.l3   -121.46591  162.47887  -0.748  0.45822
x.l3      0.24408    0.16161   1.510  0.13725
y.l4   -251.22176  158.22613  -1.588  0.11865
x.l4     -0.21250    0.16627  -1.278  0.20714
y.l5   -170.93505  143.49020  -1.191  0.23917
x.l5      0.05856    0.15438   0.379  0.70604
const    87.31476   30.89869   2.826  0.00676 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 114.1 on 50 degrees of freedom
Multiple R-Squared: 0.2281,    Adjusted R-squared: 0.07366
F-statistic: 1.477 on 10 and 50 DF,  p-value: 0.1759
The results shown are for both models. You can see that the overall model to predict y is significant, with lags 2 and 5 of x having p-values less than 0.1. The model to predict x is not significant. As we did in the previous section, we should check for serial correlation. Here, the vars package provides the serial.test() function for multivariate autocorrelation. It offers several different tests, but let's focus on the Portmanteau test, which is the default. The null hypothesis is that the autocorrelations are zero and the alternative is that they are not:
> serial.test(lag5, type="PT.asymptotic")

        Portmanteau Test (asymptotic)

data:  Residuals of VAR object lag5
Chi-squared = 36.4377, df = 44, p-value = 0.7839
With a p-value of 0.7839, we do not have evidence to reject the null and can say that the residuals are not autocorrelated.
To run the Granger causality tests in R, you can use either the grangertest() function in the lmtest package or the causality() function in the vars package. I'll demonstrate the technique using causality(). It is very easy, as you just need to create two objects, one for x causing y and one for y causing x, utilizing the lag5 object created previously:
> x2y = causality(lag5, cause="x")
> y2x = causality(lag5, cause="y")
It is now just a simple matter to call the Granger test results:
> x2y$Granger

        Granger causality H0: x do not Granger-cause y

data:  VAR object lag5
F-Test = 2.0883, df1 = 5, df2 = 100, p-value = 0.07304

> y2x$Granger

        Granger causality H0: y do not Granger-cause x

data:  VAR object lag5
F-Test = 0.731, df1 = 5, df2 = 100, p-value = 0.6019
The p-value for x Granger-causing y is 0.07304, and for y causing x it is 0.6019. So what does all this mean? The first thing that we can say is that y does not Granger-cause x. As for x causing y, we cannot reject the null at the 0.05 significance level and therefore cannot formally conclude that x Granger-causes y. However, is this the relevant conclusion here? Remember that a p-value evaluates how likely the observed effect is if the null hypothesis is true. Also remember that the test was never designed to be a binary yea or nay. If this were a controlled experiment, such as a phase-3 clinical trial, then we likely wouldn't hesitate to say that we had insufficient evidence to reject the null. As this study is based on observational data, I believe we can say that it is highly probable that CO2 emissions Granger-cause the surface temperature anomalies. However, there is a lot of room for criticism of this conclusion. I mentioned upfront the controversy around the quality of the data. The thing that still concerns me is what year to start the analysis from. I chose 1945 because it looked about right; you could say that I applied proc eyeball, in SAS terminology. The year chosen has a dramatic impact on the analysis, changing the lag structure and potentially leading to insignificant p-values. The other thing that I want to point out is the lag of five years and its coefficient, which is negative.
Now, the Granger causality test is not designed to use the coefficients from a vector autoregression in a forecast, so it is not safe to say that an increase in the CO2 emissions would lead to lower temperatures five years later. We are merely looking for a causal relationship, which seems to be based on a five-year lag. Assume that the real-world relationship was 20 or 30 years, then this technique would be irrelevant to the problem given the timeframe in question. It would also be interesting to include a third variable such as a measure of annual solar radiation, but this was beyond the scope of this chapter.
The last thing to show here is how to use the vector autoregression to produce a forecast. With the predict() function, we can produce point estimates and confidence intervals for a timeframe that we specify:
> predict(lag5, n.ahead=10, ci=0.95)
$y
              fcst      lower      upper        CI
 [1,]  0.081703785 -0.1186856 0.28209313 0.2003893
 [2,]  0.017119634 -0.2076806 0.24191986 0.2248002
 [3,]  0.096799604 -0.1424826 0.33608184 0.2392822
 [4,] -0.149113923 -0.3888954 0.09066751 0.2397814
 [5,] -0.011618877 -0.2611685 0.23793073 0.2495496
 [6,]  0.054878791 -0.2137249 0.32348244 0.2686036
 [7,]  0.021115281 -0.2479521 0.29018265 0.2690674
 [8,] -0.035593173 -0.3056111 0.23442472 0.2700179
 [9,]  0.045634059 -0.2251345 0.31640259 0.2707685
[10,]  0.003465538 -0.2687530 0.27568406 0.2722185

$x
          fcst      lower    upper       CI
 [1,] 136.7695  -86.94633 360.4853 223.7158
 [2,] 251.0853   13.06491 489.1057 238.0204
 [3,] 112.0863 -130.39147 354.5642 242.4778
 [4,] 108.7158 -140.90916 358.3408 249.6250
 [5,] 158.7534  -93.41522 410.9221 252.1686
 [6,] 122.3686 -131.04145 375.7787 253.4101
 [7,] 127.3391 -126.76240 381.4405 254.1015
 [8,] 155.3462  -99.09757 409.7899 254.4437
 [9,] 156.2730  -98.52536 411.0713 254.7983
[10,] 137.2128 -118.36867 392.7943 255.5815
Finally, a plot is available to view the forecasts:
> plot(forecast(lag5))
When all is said and done, it clearly seems that more work needs to be done, but I believe that Granger causality has pointed us in the right direction. If nothing else, I hope it has stimulated your thinking on how to apply the technique to your own real-world problems, or maybe even to examine the climate change data in more detail. There should be a high bar when it comes to demonstrating causality, and Granger causality is a great tool for assisting in this endeavor.