3. A Primer on Financial Time Series Analysis
Abstract: This chapter serves two purposes: First, it gives a very brief refresher on the basic concepts in probability and statistics, and introduces the bivariate linear regression model. Second, it gives an introduction to time series analysis with a focus on the models most relevant for financial risk management. An important goal of the second part of the chapter is to ensure that the reader avoids some common pitfalls encountered by risk managers working with time series data such as prices and returns. These pitfalls can be summarized as: (1) Spurious detection of mean-reversion, (2) spurious significance in regressions, and (3) spurious detection of causality.
Keywords: Probability, distributions, moments, regression, ARMA, maximum likelihood, VAR.

1. Chapter Overview

This chapter serves two purposes: First, it gives a very brief refresher on the basic concepts in probability and statistics, and introduces the bivariate linear regression model. Second, it gives an introduction to time series analysis with a focus on the models most relevant for financial risk management. The chapter can be skipped by readers who have recently taken a course in time series analysis or in financial econometrics.
The material in the chapter is organized in the following four sections:
1. Probability Distributions and Moments
2. The Linear Model
3. Univariate Time Series Models
4. Multivariate Time Series Models
The chapter thus tries to cover a broad range of material that really would take several books to do justice. The section “Further Resources” at the end of the chapter therefore suggests books that can be consulted for readers who need to build a stronger foundation in statistics and econometrics and also for readers who are curious to tackle more advanced topics in time series analysis.
An important goal of the financial time series analysis part of the chapter is to ensure that the reader avoids some common pitfalls encountered by risk managers working with time series data such as prices and returns. These pitfalls can be summarized as
• Spurious detection of mean-reversion; that is, erroneously finding that a variable is mean-reverting when it is truly a random walk
• Spurious regression; that is, erroneously finding that a variable x is significant in a regression of y on x
• Spurious detection of causality; that is, erroneously finding that the current value of x causes (helps determine) future values of y when in reality it cannot
Before proceeding to these important topics in financial time series analysis we first provide a quick refresher on basic probability and statistics.

2. Probability Distributions and Moments

The probability distribution of a discrete random variable, x, describes the probability of each possible outcome of x. Even though an asset price in reality can only take on discrete values (for example, $14.55) and not a continuum of values (for example, $14.55555...), we usually use continuous densities rather than discrete distributions to describe the probability of various outcomes. Continuous probability densities are more analytically tractable and they approximate well the discrete probability distributions relevant for risk management.

2.1. Univariate Probability Distributions

Let the function F(x) denote the cumulative probability distribution function of the random variable x so that the probability of x being less than the value a is given by
$$\Pr(x < a) = F(a)$$
Let f(x) be the probability density of x and assume that x is defined from −∞ to +∞. The probability of obtaining a value of x less than a can be had from the density via the integral
$$F(a) = \int_{-\infty}^{a} f(x)\,dx$$
so that $f(x) = F'(x)$. We also have that
$$\int_{-\infty}^{+\infty} f(x)\,dx = F(+\infty) = 1$$
Because the density is continuous the probability of obtaining any particular value a is zero. The probability of obtaining a value in the interval between a and b (with a < b) is
$$\Pr(a < x < b) = \int_{a}^{b} f(x)\,dx = F(b) - F(a)$$
The expected value, or mean, of x captures the average outcome of a draw from the distribution and is defined as the probability-weighted average of x
$$E[x] = \int_{-\infty}^{+\infty} x\, f(x)\,dx \equiv \mu$$
The basic rules of integration and the property that $\int_{-\infty}^{+\infty} f(x)\,dx = 1$ provide useful results for manipulating expectations, for example
$$E[a + bx] = a + bE[x]$$
where a and b are constants.
Variance is a measure of the expected variation of a variable around its mean. It is defined by
$$Var(x) = E\left[(x - E[x])^2\right] = \int_{-\infty}^{+\infty} (x - \mu)^2 f(x)\,dx \equiv \sigma^2$$
Note that
$$Var(x) = E[x^2] - E[x]^2$$
which follows from expanding the square and using $E[E[x]] = E[x]$. From this we also have that
$$Var(a + bx) = b^2\, Var(x)$$
The standard deviation is defined as the square root of the variance. In risk management, volatility is often used as a generic term for either variance or standard deviation.
Note that if we define a variable $y = a + bx$ and if the mean of x is zero and the variance of x is one, then
$$E[y] = a, \qquad Var(y) = b^2$$
This is useful for creating variables with a desired mean and variance.
Mean and variance are the first two moments of the distribution. The third and fourth standardized moments, known as skewness and kurtosis, are defined by
$$\zeta_1 = \frac{E\left[(x - \mu)^3\right]}{\sigma^3}, \qquad \zeta_2 = \frac{E\left[(x - \mu)^4\right]}{\sigma^4}$$
Note that by subtracting μ before taking powers, and by dividing by $\sigma^3$ in the skewness and by $\sigma^4$ in the kurtosis, we ensure that
$$\zeta_1(a + bx) = \zeta_1(x), \qquad \zeta_2(a + bx) = \zeta_2(x), \quad \text{for } b > 0$$
and we therefore say that skewness and kurtosis are location and scale invariant.
As an example consider the normal distribution with parameters μ and $\sigma^2$. It is defined by
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$
The normal distribution has the first four moments
$$E[x] = \mu, \qquad Var(x) = \sigma^2, \qquad \zeta_1 = 0, \qquad \zeta_2 = 3$$

2.2. Bivariate Distributions

When considering two random variables x and y we can define the bivariate density f(x, y) so that
$$\Pr(a < x < b,\; c < y < d) = \int_{a}^{b}\int_{c}^{d} f(x, y)\,dy\,dx$$
Covariance is the most common measure of linear dependence between two variables. It is defined by
$$Cov(x, y) = E\left[(x - E[x])(y - E[y])\right] \equiv \sigma_{xy}$$
From the properties of integration we have the following convenient result:
$$Cov(a + bx,\; c + dy) = bd\, Cov(x, y)$$
so that the covariance depends on the magnitude of x and y but not on their means. Note also from the definition of covariance that
$$Cov(x, x) = Var(x)$$
From the covariance and variance definitions we can define correlation by
$$Corr(x, y) = \frac{Cov(x, y)}{\sqrt{Var(x)\,Var(y)}} = \frac{\sigma_{xy}}{\sigma_x\sigma_y} \equiv \rho_{xy}$$
Notice that the correlation between x and y does not depend on the magnitude of x and y. We have
$$Corr(a + bx,\; c + dy) = Corr(x, y), \quad \text{for } b, d > 0$$
A perfect positive linear relationship between x and y would exist if $y = a + bx$ with $b > 0$, in which case
$$Corr(x, y) = +1$$
A perfect negative linear relationship between x and y exists if $y = a + bx$ with $b < 0$, in which case
$$Corr(x, y) = -1$$
This suggests that correlation is bounded between −1 and +1, which is indeed the case. This fact is convenient when interpreting a given correlation value.

2.3. Conditional Distributions

Risk managers often want to describe a variable y using information on another variable x. From the joint distribution of x and y we can denote the conditional distribution of y given x, f(y|x). It must be the case that
$$f(x, y) = f(y|x)\, f(x)$$
which indirectly defines the conditional distribution as
$$f(y|x) = \frac{f(x, y)}{f(x)}$$
This definition can be used to define the conditional mean and variance
$$E[y|x] = \int y\, f(y|x)\,dy, \qquad Var(y|x) = \int \left(y - E[y|x]\right)^2 f(y|x)\,dy$$
Note that these conditional moments are functions of x but not of y.
If x and y are independent then $f(x, y) = f(x)f(y)$ and so $f(y|x) = f(y)$, and we have that the conditional moments
$$E[y|x] = E[y], \qquad Var(y|x) = Var(y)$$
equal the corresponding unconditional moments.

2.4. Sample Moments

We now introduce the standard methods for estimating the moments introduced earlier.
Consider a sample of T observations of the variable x, namely $\{x_1, x_2, \ldots, x_T\}$. We can estimate the mean using the sample average
$$\hat{\mu} = \frac{1}{T}\sum_{t=1}^{T} x_t$$
and we can estimate the variance using the sample average of squared deviations from the average
$$\hat{\sigma}^2 = \frac{1}{T}\sum_{t=1}^{T}\left(x_t - \hat{\mu}\right)^2$$
Sometimes the sample variance is computed with $T - 1$ instead of T in the denominator, but unless T is very small the difference can be ignored.
Similarly, skewness and kurtosis can be estimated by
$$\hat{\zeta}_1 = \frac{1}{T}\sum_{t=1}^{T}\frac{\left(x_t - \hat{\mu}\right)^3}{\hat{\sigma}^3}, \qquad \hat{\zeta}_2 = \frac{1}{T}\sum_{t=1}^{T}\frac{\left(x_t - \hat{\mu}\right)^4}{\hat{\sigma}^4}$$
The sample covariance between two random variables, x and y, can be estimated via
$$\hat{\sigma}_{xy} = \frac{1}{T}\sum_{t=1}^{T}\left(x_t - \hat{\mu}_x\right)\left(y_t - \hat{\mu}_y\right)$$
and the sample correlation between the two variables is calculated as
$$\hat{\rho}_{xy} = \frac{\hat{\sigma}_{xy}}{\hat{\sigma}_x\hat{\sigma}_y}$$
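A minimal sketch, not from the book, of the sample moment estimators above using NumPy; the series x and y are simulated placeholders that you would replace with your own data.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
T = 1000
x = rng.normal(loc=0.05, scale=1.2, size=T)   # placeholder data
y = 0.5 * x + rng.normal(size=T)              # a second, related series

mu_hat = x.mean()                                       # sample mean
sig2_hat = ((x - mu_hat) ** 2).mean()                   # sample variance (divides by T)
sig_hat = np.sqrt(sig2_hat)                             # sample standard deviation
skew_hat = ((x - mu_hat) ** 3).mean() / sig_hat ** 3    # sample skewness
kurt_hat = ((x - mu_hat) ** 4).mean() / sig_hat ** 4    # sample kurtosis

cov_xy = ((x - x.mean()) * (y - y.mean())).mean()       # sample covariance
corr_xy = cov_xy / (x.std() * y.std())                  # sample correlation

print(mu_hat, sig2_hat, skew_hat, kurt_hat, corr_xy)
```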

3. The Linear Model

Risk managers often rely on linear models of the type
$$y = a + bx + \varepsilon$$
where $E[\varepsilon] = 0$ and where x and ε are assumed to be independent or sometimes just uncorrelated. If we know the value of x then we can use the linear model to predict y via the conditional expectation of y given x
$$E[y|x] = a + bx$$
In the linear model the unconditional expectations of x and y are linked via
$$E[y] = a + bE[x]$$
so that
$$a = E[y] - bE[x]$$
We also have that
$$Cov(x, y) = Cov(x,\, a + bx + \varepsilon) = b\, Var(x)$$
so that
$$b = \frac{Cov(x, y)}{Var(x)} = \frac{\sigma_{xy}}{\sigma_x^2}$$
In the linear model the variances of x and y are linked via
$$Var(y) = b^2\, Var(x) + Var(\varepsilon) = b^2\sigma_x^2 + \sigma_\varepsilon^2$$
Consider observation t in the linear model
$$y_t = a + bx_t + \varepsilon_t$$
If we have a sample of T observations then we can estimate
$$\hat{b} = \frac{\hat{\sigma}_{xy}}{\hat{\sigma}_x^2} = \frac{\sum_{t=1}^{T}\left(x_t - \hat{\mu}_x\right)\left(y_t - \hat{\mu}_y\right)}{\sum_{t=1}^{T}\left(x_t - \hat{\mu}_x\right)^2}$$
and
$$\hat{a} = \hat{\mu}_y - \hat{b}\hat{\mu}_x$$
In the more general linear model with J different x-variables we have
$$y_t = b_0 + b_1 x_{1,t} + b_2 x_{2,t} + \cdots + b_J x_{J,t} + \varepsilon_t, \qquad \text{or} \qquad y = Xb + \varepsilon$$
where X is the T × (J + 1) matrix of regressors including a column of ones for the constant term. Minimizing the sum of squared errors, $\sum_{t=1}^{T}\varepsilon_t^2$, provides the ordinary least squares (OLS) estimate of b:
$$\hat{b} = \left(X'X\right)^{-1}X'y$$
The solution to this optimization problem is a linear function of y and X, which makes OLS estimation very easy to perform; it is built into most common quantitative software packages such as Excel, where the OLS estimation function is called LINEST.
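A small sketch of the OLS estimate $\hat{b} = (X'X)^{-1}X'y$ implemented with NumPy rather than Excel's LINEST. The regressors, true coefficients, and sample size are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
T, J = 500, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, J))])  # constant plus J regressors
true_b = np.array([1.0, 0.5, -0.3, 2.0])
y = X @ true_b + rng.normal(size=T)

# Solve the least-squares problem; lstsq is numerically safer than an explicit inverse.
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ b_hat
print("estimated coefficients:", b_hat)
print("sum of squared errors:", residuals @ residuals)
```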

3.1. The Importance of Data Plots

While the linear model is useful in many cases, an apparent linear relationship between two variables can be deceiving. Consider the four (artificial) data sets in Table 3.1, which are known as Anscombe's quartet, named after their creator. Each of the four data sets has 11 observations.
Consider now the moments of the data included at the bottom of Table 3.1. While the observations in the four data sets are clearly different from each other, the mean and variance of the x and y variables are exactly the same across the four data sets. Furthermore, the correlation between x and y is also the same across the four pairs of variables. Finally, the last two rows of Table 3.1 show that when regressing y on x using the linear model
$$y = a + bx + \varepsilon$$
we get the parameter estimates $\hat{a} = 3.00$ and $\hat{b} = 0.50$ in all four cases. These data have clearly been reverse engineered by Anscombe to produce such striking results.
Table 3.1 Anscombe's quartet
Notes: The table contains the four bivariate data sets in Anscombe's quartet. Below each of the eight variables we report the mean and the variance. We also report the correlation between x and y in each of the four data sets. The parameter a denotes the constant and b denotes the slope from the regression of y on x.
| | I: x | I: y | II: x | II: y | III: x | III: y | IV: x | IV: y |
|---|---|---|---|---|---|---|---|---|
| | 10 | 8.04 | 10 | 9.14 | 10 | 7.46 | 8 | 6.58 |
| | 8 | 6.95 | 8 | 8.14 | 8 | 6.77 | 8 | 5.76 |
| | 13 | 7.58 | 13 | 8.74 | 13 | 12.74 | 8 | 7.71 |
| | 9 | 8.81 | 9 | 8.77 | 9 | 7.11 | 8 | 8.84 |
| | 11 | 8.33 | 11 | 9.26 | 11 | 7.81 | 8 | 8.47 |
| | 14 | 9.96 | 14 | 8.10 | 14 | 8.84 | 8 | 7.04 |
| | 6 | 7.24 | 6 | 6.13 | 6 | 6.08 | 8 | 5.25 |
| | 4 | 4.26 | 4 | 3.10 | 4 | 5.39 | 19 | 12.50 |
| | 12 | 10.84 | 12 | 9.13 | 12 | 8.15 | 8 | 5.56 |
| | 7 | 4.82 | 7 | 7.26 | 7 | 6.42 | 8 | 7.91 |
| | 5 | 5.68 | 5 | 4.74 | 5 | 5.73 | 8 | 6.89 |
| Moments: Mean | 9.0 | 7.5 | 9.0 | 7.5 | 9.0 | 7.5 | 9.0 | 7.5 |
| Moments: Variance | 11.0 | 4.1 | 11.0 | 4.1 | 11.0 | 4.1 | 11.0 | 4.1 |
| Moments: Correlation | 0.82 | | 0.82 | | 0.82 | | 0.82 | |
| Regression: a | 3.00 | | 3.00 | | 3.00 | | 3.00 | |
| Regression: b | 0.50 | | 0.50 | | 0.50 | | 0.50 | |
Figure 3.1 scatter plots y against x in the four data sets with the regression line included in each case. Figure 3.1 is clearly much more revealing than the moments and the regression results.
Figure 3.1 (image not shown): Scatter plots of Anscombe's four data sets with regression lines. Notes: For each of the four data sets in Anscombe's quartet we scatter plot the variables and also report the regression line from fitting y on x.
We conclude that moments and regressions can be useful for summarizing variables and relationships between them but whenever possible it is crucial to complement the analysis with figures. When plotting your data you may discover:
• A genuine linear relationship as in the top-left panel of Figure 3.1
• A genuine nonlinear relationship as in the top-right panel
• A biased estimate of the slope driven by an outlier observation as in the bottom-left panel
• A trivial relationship, which appears as a linear relationship again due to an outlier as in the bottom-right panel of Figure 3.1
Remember: Always plot your variables before beginning a statistical analysis of them.
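A sketch of the "always plot your data" advice: identical regression output, very different pictures. The numbers are Anscombe's quartet from Table 3.1; NumPy and matplotlib are assumed to be available.

```python
import numpy as np
import matplotlib.pyplot as plt

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

fig, axes = plt.subplots(2, 2, figsize=(8, 6))
for ax, (x, y) in zip(axes.ravel(), [(x123, y1), (x123, y2), (x123, y3), (x4, y4)]):
    x, y = np.asarray(x, float), np.asarray(y, float)
    b, a = np.polyfit(x, y, 1)                 # slope and intercept from regressing y on x
    ax.scatter(x, y)
    grid = np.linspace(x.min(), x.max(), 2)
    ax.plot(grid, a + b * grid)                # fitted regression line
    ax.set_title(f"a = {a:.2f}, b = {b:.2f}, corr = {np.corrcoef(x, y)[0, 1]:.2f}")
plt.tight_layout()
plt.show()
```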

4. Univariate Time Series Models

Univariate time series analysis studies the behavior of a single random variable observed over time. Risk managers care about how prices and risk factors move over time, which makes time series models a natural tool for risk management. Forecasting the future values of a variable using past and current observations on the same variable is a key topic in univariate time series analysis.

4.1. Autocorrelation

Correlation measures the linear dependence between two variables and autocorrelation measures the linear dependence between the current value of a time series variable and the past value of the same variable. Autocorrelation is a crucial tool for detecting linear dynamics in time series analysis.
The autocorrelation for lag τ is defined as
$$\rho_\tau = Corr\left(R_t, R_{t-\tau}\right) = \frac{Cov\left(R_t, R_{t-\tau}\right)}{Var\left(R_t\right)}$$
so that it captures the linear relationship between today's value and the value τ days ago.
Consider a data set on an asset return, $\{R_1, R_2, \ldots, R_T\}$. The sample autocorrelation at lag τ measures the linear dependence between today's return, $R_t$, and the return τ days ago, $R_{t-\tau}$. Using the autocorrelation definition, we can write the sample autocorrelation as
$$\hat{\rho}_\tau = \frac{\sum_{t=\tau+1}^{T}\left(R_t - \hat{\mu}\right)\left(R_{t-\tau} - \hat{\mu}\right)}{\sum_{t=1}^{T}\left(R_t - \hat{\mu}\right)^2}$$
In order to detect dynamics in a time series, it is very useful to first plot the autocorrelation function (ACF), which plots $\hat{\rho}_\tau$ on the vertical axis against τ on the horizontal axis.
The statistical significance of a set of autocorrelations can be formally tested using the Ljung-Box statistic. It tests the null hypothesis that the autocorrelations for lags 1 through m are all jointly zero via
$$LB(m) = T(T+2)\sum_{\tau=1}^{m}\frac{\hat{\rho}_\tau^2}{T-\tau} \sim \chi^2_m$$
where $\chi^2_m$ denotes the chi-squared distribution with m degrees of freedom.
The critical value of $\chi^2_m$ corresponding to a significance level p can be found, for example, by using the CHIINV function in Excel. If p = 0.05 and m = 20, then the formula CHIINV(0.05,20) in Excel returns the value 31.41. If the test statistic LB(20) computed using the first 20 autocorrelations is larger than 31.41 then we reject the hypothesis that the first 20 autocorrelations are zero at the 5% significance level.
Clearly, the maximum number of lags, m, must be chosen in order to implement the test. Often the application at hand will give some guidance. For example if we are looking to detect intramonth dynamics in a daily return, we use m = 21 corresponding to 21 trading days in a month. When no such guidance is available, setting m = ln(T) has been found to work well in simulation studies.
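A minimal sketch of the sample ACF and the Ljung-Box statistic LB(m) described above, using scipy.stats for the chi-squared critical value instead of Excel's CHIINV. The return series is a simulated white-noise placeholder.

```python
import numpy as np
from scipy import stats

def sample_acf(r, max_lag):
    """Sample autocorrelations for lags 1, ..., max_lag."""
    r = np.asarray(r, float)
    d = r - r.mean()
    denom = (d ** 2).sum()
    return np.array([(d[tau:] * d[:-tau]).sum() / denom for tau in range(1, max_lag + 1)])

def ljung_box(r, m):
    """Ljung-Box statistic LB(m) and its 5% chi-squared critical value."""
    T = len(r)
    rho = sample_acf(r, m)
    lb = T * (T + 2) * np.sum(rho ** 2 / (T - np.arange(1, m + 1)))
    crit = stats.chi2.ppf(0.95, df=m)
    return lb, crit

rng = np.random.default_rng(seed=2)
returns = rng.normal(size=1000)                # white noise: LB should not reject
lb, crit = ljung_box(returns, m=20)
print(f"LB(20) = {lb:.2f}, 5% critical value = {crit:.2f}, reject = {lb > crit}")
```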

4.2. Autoregressive (AR) Models

Once a pattern has been found in the autocorrelations, we want to build forecasting models that can match the pattern in the autocorrelation function.
The simplest and most used model for this purpose is the autoregressive model of order 1, AR(1), which is defined as
$$R_t = \phi_0 + \phi_1 R_{t-1} + \varepsilon_t$$
where $E[\varepsilon_t] = 0$, $Var(\varepsilon_t) = \sigma_\varepsilon^2$, and where we assume that $\varepsilon_t$ and $R_{t-\tau}$ are independent for all τ > 0. Under these assumptions the conditional mean forecast for one period ahead is
$$E_t\left[R_{t+1}\right] = \phi_0 + \phi_1 R_t$$
By writing the AR(1) model for $R_{t+\tau}$ and repeatedly substituting past values we get
$$R_{t+\tau} = \phi_0\left(1 + \phi_1 + \phi_1^2 + \cdots + \phi_1^{\tau-1}\right) + \phi_1^\tau R_t + \sum_{i=0}^{\tau-1}\phi_1^i\varepsilon_{t+\tau-i}$$
The multistep forecast in the AR(1) model is therefore
$$E_t\left[R_{t+\tau}\right] = \phi_0\left(1 + \phi_1 + \phi_1^2 + \cdots + \phi_1^{\tau-1}\right) + \phi_1^\tau R_t$$
If $|\phi_1| < 1$ then the (unconditional) mean of the model can be denoted by
$$E\left[R_t\right] = E\left[R_{t-1}\right] \equiv \mu$$
which in the AR(1) model implies
$$\mu = \phi_0 + \phi_1\mu \quad \Longleftrightarrow \quad \mu = \frac{\phi_0}{1 - \phi_1}$$
The unconditional variance is similarly
$$Var\left(R_t\right) = \phi_1^2\, Var\left(R_{t-1}\right) + \sigma_\varepsilon^2 \quad \Longleftrightarrow \quad Var\left(R_t\right) = \frac{\sigma_\varepsilon^2}{1 - \phi_1^2}$$
because $Var(R_t) = Var(R_{t-1})$ when $|\phi_1| < 1$.
Just as time series data can be characterized by the ACF, so can linear time series models. To derive the ACF for the AR(1) model assume without loss of generality that μ = 0. Then
$$\rho_\tau = \frac{E\left[R_t R_{t-\tau}\right]}{Var\left(R_t\right)} = \frac{\phi_1^\tau\, E\left[R_{t-\tau}^2\right]}{Var\left(R_t\right)} = \phi_1^\tau$$
This provides the ACF of the AR(1) model. Notice the similarity between the ACF and the multistep forecast earlier.
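A sketch, not from the book, that simulates an AR(1), compares its sample ACF with the theoretical $\phi_1^\tau$, and computes the multistep forecast from the last observation. The parameter values and seed are illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
phi0, phi1, sigma, T = 0.1, 0.9, 1.0, 5000

R = np.zeros(T)
R[0] = phi0 / (1 - phi1)                       # start at the unconditional mean
for t in range(1, T):
    R[t] = phi0 + phi1 * R[t - 1] + sigma * rng.normal()

def sample_acf(r, max_lag):
    d = r - r.mean()
    denom = (d ** 2).sum()
    return np.array([(d[tau:] * d[:-tau]).sum() / denom for tau in range(1, max_lag + 1)])

lags = np.arange(1, 11)
print("sample ACF:     ", np.round(sample_acf(R, 10), 3))
print("theoretical ACF:", np.round(phi1 ** lags, 3))

# Multistep forecast: E_t[R_{t+tau}] = phi0*(1 + phi1 + ... + phi1^{tau-1}) + phi1^tau * R_t
forecasts = phi0 * (1 - phi1 ** lags) / (1 - phi1) + phi1 ** lags * R[-1]
print("forecasts:", np.round(forecasts, 3))
```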
The lag order τ appears in the exponent of $\phi_1$ and we therefore say that the ACF of an AR(1) model decays exponentially to zero as τ increases. The case when $\phi_1$ is close to 1 but not quite 1 is important in financial economics. We refer to this as a highly persistent series.
Figure 3.2 shows examples of the ACF in AR(1) models with four different (positive) values of $\phi_1$. When $\phi_1 < 1$ the ACF decays to zero exponentially. Clearly the decay is much slower when $\phi_1$ is close to (but below) 1 than when it is 0.5 or 0.1. When $\phi_1 = 1$ the ACF is flat at 1. This is the case of a random walk, which we will study further later.
Figure 3.2 (image not shown): Autocorrelation functions for AR(1) models with positive $\phi_1$. Notes: We plot the autocorrelation function for four AR(1) processes with different values of the autoregressive parameter $\phi_1$. When $\phi_1 < 1$ the ACF decays to 0 at an exponential rate.
Figure 3.3 shows the ACF of an AR(1) when $\phi_1$ is negative. Notice the drastically different ACF pattern compared with Figure 3.2. When $\phi_1 < 0$ the ACF oscillates around zero, but it still decays to zero as the lag order increases. The ACFs in Figure 3.2 are much more common in financial risk management than are the ACFs in Figure 3.3.
Figure 3.3 (image not shown): Autocorrelation function for an AR(1) model with negative $\phi_1$. Notes: We plot the autocorrelation function for an AR(1) model with a negative value of $\phi_1$ against the lag order.
The simplest extension to the AR(1) model is the AR(2) defined as
$$R_t = \phi_0 + \phi_1 R_{t-1} + \phi_2 R_{t-2} + \varepsilon_t$$
The autocorrelation function of the AR(2) is
$$\rho_\tau = \phi_1\rho_{\tau-1} + \phi_2\rho_{\tau-2}$$
for example
$$\rho_2 = \phi_1\rho_1 + \phi_2\rho_0$$
so that
$$\rho_2 = \phi_1\rho_1 + \phi_2$$
In order to derive the first-order autocorrelation note first that the ACF is symmetric around τ = 0, meaning that
$$\rho_{-\tau} = \rho_\tau$$
We therefore get that
$$\rho_1 = \phi_1\rho_0 + \phi_2\rho_{-1} = \phi_1 + \phi_2\rho_1$$
implies
$$\rho_1\left(1 - \phi_2\right) = \phi_1$$
so that
$$\rho_1 = \frac{\phi_1}{1 - \phi_2}$$
The general AR(p) model is defined by
$$R_t = \phi_0 + \phi_1 R_{t-1} + \phi_2 R_{t-2} + \cdots + \phi_p R_{t-p} + \varepsilon_t$$
The one-step-ahead forecast in the AR(p) model is simply
$$E_t\left[R_{t+1}\right] = \phi_0 + \phi_1 R_t + \phi_2 R_{t-1} + \cdots + \phi_p R_{t-p+1}$$
The τ-day-ahead forecast can be built using
$$E_t\left[R_{t+\tau}\right] = \phi_0 + \sum_{i=1}^{p}\phi_i\, E_t\left[R_{t+\tau-i}\right]$$
which is sometimes called the chain rule of forecasting. Note that when $\tau - i \le 0$ then
$$E_t\left[R_{t+\tau-i}\right] = R_{t+\tau-i}$$
because $R_{t+\tau-i}$ is already known at the time the forecast is made when $\tau \le i$.
The partial autocorrelation function (PACF) gives the marginal contribution of an additional lagged term in AR models of increasing order. First estimate a series of AR models of increasing order:
$$R_t = \phi_{0,1} + \phi_{1,1} R_{t-1} + \varepsilon_t$$
$$R_t = \phi_{0,2} + \phi_{1,2} R_{t-1} + \phi_{2,2} R_{t-2} + \varepsilon_t$$
$$R_t = \phi_{0,3} + \phi_{1,3} R_{t-1} + \phi_{2,3} R_{t-2} + \phi_{3,3} R_{t-3} + \varepsilon_t$$
$$\vdots$$
The PACF is now defined as the collection of the largest-order coefficients
$$\left\{\phi_{1,1},\; \phi_{2,2},\; \phi_{3,3},\; \ldots\right\}$$
which can be plotted against the lag order just as we did for the ACF.
The optimal lag order p in the AR(p) can be chosen as the largest p such that $\phi_{p,p}$ is significant in the PACF. For example, an AR(3) will have a significant $\phi_{3,3}$ but it will have a $\phi_{4,4}$ close to zero.
Note that in AR models the ACF decays to zero gradually (exponentially) whereas the PACF drops to zero abruptly after lag p. This is why the PACF is useful for selecting the order of an AR model.
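A sketch of the PACF as defined above: fit AR(1), AR(2), ..., AR(max_lag) by OLS and collect the largest-order coefficient $\phi_{p,p}$ from each fit. The data are simulated from an AR(2) for illustration, so the PACF should be large at lags 1 and 2 and near zero afterward.

```python
import numpy as np

rng = np.random.default_rng(seed=4)
T = 3000
R = np.zeros(T)
for t in range(2, T):
    R[t] = 0.5 * R[t - 1] + 0.3 * R[t - 2] + rng.normal()   # AR(2) placeholder data

def pacf_via_ols(r, max_lag):
    pacf = []
    for p in range(1, max_lag + 1):
        # Regress r_t on a constant and r_{t-1}, ..., r_{t-p}
        y = r[p:]
        X = np.column_stack([np.ones(len(y))] + [r[p - i:-i] for i in range(1, p + 1)])
        coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
        pacf.append(coeffs[-1])               # phi_{p,p}: the largest-order coefficient
    return np.array(pacf)

print(np.round(pacf_via_ols(R, 6), 3))
```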
The AR(p) models can be easily estimated using simple OLS regression on observations p + 1 through T. A useful diagnostic test of the model is to plot the ACF of the residuals from the model and perform a Ljung-Box test on the residuals using m − p degrees of freedom.

4.3. Moving Average (MA) Models

In AR models the ACF dies off exponentially; however, certain dynamic features such as bid-ask bounces or measurement errors die off abruptly and require a different type of model. Consider the MA(1) model in which
$$R_t = \theta_0 + \varepsilon_t + \theta_1\varepsilon_{t-1}$$
where $\varepsilon_t$ and $\varepsilon_{t-1}$ are independent of each other and where $E[\varepsilon_t] = 0$ and $Var(\varepsilon_t) = \sigma_\varepsilon^2$. Note that
$$E\left[R_t\right] = \theta_0$$
and
$$Var\left(R_t\right) = \sigma_\varepsilon^2 + \theta_1^2\sigma_\varepsilon^2 = \left(1 + \theta_1^2\right)\sigma_\varepsilon^2$$
In order to derive the ACF of the MA(1) assume without loss of generality that $\theta_0 = 0$. We then have
$$Cov\left(R_t, R_{t-1}\right) = E\left[\left(\varepsilon_t + \theta_1\varepsilon_{t-1}\right)\left(\varepsilon_{t-1} + \theta_1\varepsilon_{t-2}\right)\right] = \theta_1\sigma_\varepsilon^2$$
Using the variance expression from before, we get the ACF
$$\rho_1 = \frac{\theta_1\sigma_\varepsilon^2}{\left(1 + \theta_1^2\right)\sigma_\varepsilon^2} = \frac{\theta_1}{1 + \theta_1^2}, \qquad \rho_\tau = 0 \text{ for } \tau > 1$$
Note that the autocorrelations for the MA(1) are zero for all lags larger than 1.
Unlike AR models, the MA(1) model must be estimated by numerical optimization of the likelihood function. We proceed as follows. First, set the unobserved $\varepsilon_0 = 0$, which is its expected value. Second, set parameter starting values (initial guesses) for $\theta_0$, $\theta_1$, and $\sigma_\varepsilon^2$: we can use the average of $R_t$ for $\theta_0$, use 0 for $\theta_1$, and use the sample variance of $R_t$ for $\sigma_\varepsilon^2$. Now we can compute the time series of residuals via
$$\varepsilon_t = R_t - \theta_0 - \theta_1\varepsilon_{t-1}, \qquad t = 1, 2, \ldots, T$$
We are now ready to estimate the parameters by maximizing the likelihood function, which we must first define. Let us first assume that $\varepsilon_t$ is normally distributed; then
$$f\left(\varepsilon_t\right) = \frac{1}{\sqrt{2\pi\sigma_\varepsilon^2}}\exp\left(-\frac{\varepsilon_t^2}{2\sigma_\varepsilon^2}\right)$$
To construct the likelihood function note that as the $\varepsilon_t$s are independent over time we have
$$f\left(\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_T\right) = \prod_{t=1}^{T} f\left(\varepsilon_t\right)$$
and we can therefore write the joint distribution of the sample as
$$L\left(\theta_0, \theta_1, \sigma_\varepsilon^2\right) = \prod_{t=1}^{T}\frac{1}{\sqrt{2\pi\sigma_\varepsilon^2}}\exp\left(-\frac{\varepsilon_t^2}{2\sigma_\varepsilon^2}\right)$$
The maximum likelihood estimation method chooses the parameters to maximize the probability of the estimated model (in this case the MA(1)) having generated the observed data set (in this case the set of $R_t$s).
In the MA(1) model we must perform an iterative search (using, for example, Solver in Excel) over the parameters $\{\theta_0, \theta_1, \sigma_\varepsilon^2\}$:
$$\max_{\theta_0,\,\theta_1,\,\sigma_\varepsilon^2}\; \ln L = \sum_{t=1}^{T}\ln f\left(\varepsilon_t\right) = -\frac{1}{2}\sum_{t=1}^{T}\left[\ln\left(2\pi\sigma_\varepsilon^2\right) + \frac{\varepsilon_t^2}{\sigma_\varepsilon^2}\right]$$
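A sketch of the MA(1) maximum likelihood estimation just described, using scipy.optimize as a stand-in for Excel's Solver. Starting values follow the text (sample mean for $\theta_0$, 0 for $\theta_1$, sample variance for $\sigma_\varepsilon^2$, and $\varepsilon_0 = 0$); the simulated data and true parameter values are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(seed=5)
T, true_theta0, true_theta1, true_sigma = 2000, 0.1, 0.4, 1.0
eps = true_sigma * rng.normal(size=T + 1)
R = true_theta0 + eps[1:] + true_theta1 * eps[:-1]      # simulated MA(1) data

def negative_log_likelihood(params, r):
    theta0, theta1, sigma2 = params
    if sigma2 <= 0:
        return np.inf                                    # keep the variance positive
    eps_t = 0.0                                          # eps_0 = 0, its expected value
    nll = 0.0
    for r_t in r:
        eps_t = r_t - theta0 - theta1 * eps_t            # residual recursion
        nll += 0.5 * (np.log(2 * np.pi * sigma2) + eps_t ** 2 / sigma2)
    return nll

start = np.array([R.mean(), 0.0, R.var()])               # starting values from the text
result = minimize(negative_log_likelihood, start, args=(R,), method="Nelder-Mead")
print("theta0, theta1, sigma^2 estimates:", np.round(result.x, 3))
```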
Once the parameters have been estimated we can use the model for forecasting. In the MA(1) model the conditional mean forecast is
$$E_t\left[R_{t+1}\right] = \theta_0 + \theta_1\varepsilon_t$$
The general MA(q) model is defined by
$$R_t = \theta_0 + \varepsilon_t + \theta_1\varepsilon_{t-1} + \theta_2\varepsilon_{t-2} + \cdots + \theta_q\varepsilon_{t-q}$$
It has an ACF that is nonzero for the first q lags and then zero for lags larger than q.
Note that MA models are easily identified using the ACF. If the ACF of a data series dies off to zero abruptly after the first four (nonzero) lags then an MA(4) is likely to provide a good fit of the data.

4.4. Combining AR and MA into ARMA Models

Parameter parsimony is key in forecasting, and combining AR and MA models into ARMA models often enables us to model dynamics with fewer parameters.
Consider the ARMA(1,1) model, which includes one lag of $R_t$ and one lag of $\varepsilon_t$:
$$R_t = \phi_0 + \phi_1 R_{t-1} + \varepsilon_t + \theta_1\varepsilon_{t-1}$$
As in the AR(1), the mean of the ARMA(1,1) time series is given from
$$E\left[R_t\right] = \phi_0 + \phi_1 E\left[R_{t-1}\right]$$
which implies that
$$E\left[R_t\right] = \frac{\phi_0}{1 - \phi_1} \equiv \mu$$
when $|\phi_1| < 1$. In this case $R_t$ will tend to fluctuate around the mean, μ, over time. We say that $R_t$ is mean-reverting in this case.
Using the fact that $Cov\left(R_{t-1}, \varepsilon_{t-1}\right) = \sigma_\varepsilon^2$ we can get the variance from
$$Var\left(R_t\right) = \phi_1^2\, Var\left(R_{t-1}\right) + \left(1 + \theta_1^2\right)\sigma_\varepsilon^2 + 2\phi_1\theta_1\sigma_\varepsilon^2$$
which implies that
$$Var\left(R_t\right) = \frac{\left(1 + \theta_1^2 + 2\phi_1\theta_1\right)\sigma_\varepsilon^2}{1 - \phi_1^2}$$
The first-order autocorrelation is given from
$$Cov\left(R_t, R_{t-1}\right) = E\left[R_t R_{t-1}\right] = \phi_1\, Var\left(R_{t-1}\right) + \theta_1\sigma_\varepsilon^2$$
in which we assume again that μ = 0. This implies that
$$\rho_1 = \phi_1 + \frac{\theta_1\sigma_\varepsilon^2}{Var\left(R_t\right)}$$
so that
$$\rho_1 = \phi_1 + \frac{\theta_1\left(1 - \phi_1^2\right)}{1 + \theta_1^2 + 2\phi_1\theta_1}$$
For higher-order autocorrelations the MA term has no effect and we get the same structure as in the AR(1) model
$$\rho_\tau = \phi_1\rho_{\tau-1}, \qquad \text{for } \tau \ge 2$$
The general ARMA(p, q) model is
$$R_t = \phi_0 + \sum_{i=1}^{p}\phi_i R_{t-i} + \varepsilon_t + \sum_{j=1}^{q}\theta_j\varepsilon_{t-j}$$
Because of the MA term, ARMA models, just as MA models, must be estimated using maximum likelihood estimation (MLE). Diagnostics on the residuals can be done via Ljung-Box tests with degrees of freedom equal to m − p − q.

4.5. Random Walks, Unit Roots, and ARIMA Models

The random walk model is a key benchmark in financial forecasting. It is often used to model speculative prices in logs. Let $S_t$ be the closing price of an asset and let $s_t = \ln\left(S_t\right)$, so that log returns are immediately defined by $R_t = s_t - s_{t-1}$.
The random walk (or martingale) model for log prices is now defined by
$$s_t = s_{t-1} + \varepsilon_t$$
By iteratively substituting in lagged log prices we can write
$$s_t = s_{t-\tau} + \sum_{i=0}^{\tau-1}\varepsilon_{t-i}$$
Because past residuals (or shocks) $\varepsilon_{t-i}$ matter equally and fully for $s_t$ regardless of τ, we say that past shocks have permanent effects in the random walk model.
In the random walk model, the conditional mean and variance forecasts for the log price are
$$E_t\left[s_{t+\tau}\right] = s_t, \qquad Var_t\left(s_{t+\tau}\right) = \tau\sigma_\varepsilon^2$$
Note that the forecast of $s_{t+\tau}$ at any horizon τ is just today's value, $s_t$. We therefore sometimes say that the random walk model implies that the series is not predictable. Note also that the conditional variance of the future value is a linear function of the forecast horizon, τ.
Equity returns typically have a small positive mean corresponding to a small drift in the log price. This motivates the random walk with drift model
$$s_t = \mu + s_{t-1} + \varepsilon_t$$
Substituting in lagged prices back to time 0, we have
$$s_t = \mu t + s_0 + \sum_{i=1}^{t}\varepsilon_i$$
Notice that in this model the constant drift μ in returns corresponds to a coefficient on time, t, in the log price model. We call this a deterministic time trend and we refer to the sum of the εs as a stochastic trend.
A time series, $s_t$, follows an ARIMA(p, 1, q) model if the first differences, $s_t - s_{t-1}$, follow a mean-reverting ARMA(p, q) model. In this case we say that $s_t$ has a unit root. The random walk model has a unit root as well because in that model
$$s_t - s_{t-1} = \varepsilon_t$$
which is a trivial ARMA(0,0) model.
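A sketch illustrating the random walk with drift: simulate log prices $s_t = \mu + s_{t-1} + \varepsilon_t$ many times and check that the variance of $s_{t+\tau}$ across simulations grows linearly in the horizon τ, as the formulas above imply. The drift, volatility, and horizon values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=6)
mu, sigma, horizon, n_paths = 0.0005, 0.01, 250, 20000

shocks = mu + sigma * rng.normal(size=(n_paths, horizon))
paths = shocks.cumsum(axis=1)                  # s_{t+tau} - s_t for tau = 1, ..., horizon

for tau in (1, 10, 50, 250):
    simulated_var = paths[:, tau - 1].var()
    print(f"tau={tau:4d}  simulated var={simulated_var:.6f}  "
          f"theory tau*sigma^2={tau * sigma ** 2:.6f}")
```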

4.6. Pitfall 1: Spurious Mean-Reversion

Consider the AR(1) model again, now written for the log price:
$$s_t = \phi_0 + \phi_1 s_{t-1} + \varepsilon_t$$
Note that when $\phi_1 = 1$ the AR(1) model has a unit root and becomes the random walk model. The OLS estimator contains an important small-sample bias in dynamic models. For example, in an AR(1) model where the true $\phi_1$ coefficient is close or equal to 1, the finite-sample OLS estimate of $\phi_1$ will be biased downward. This is known as the Hurwicz bias or the Dickey-Fuller bias, and it is important to keep in mind.
If $\phi_1$ is estimated in a small sample of asset prices to be 0.85, then this estimate suggests that the underlying asset price is predictable and that market timing is thus feasible. However, the true value may in fact be 1, which means that the price is a random walk and so unpredictable.
The aim of technical trading analysis is to find dynamic patterns in asset prices. Econometricians are very skeptical about this type of analysis exactly because it attempts to find dynamic patterns in prices and not returns. Asset prices are likely to have a $\phi_1$ very close to 1, which in turn is likely to be estimated to be somewhat lower than 1, which in turn suggests predictability. Asset returns instead have a $\phi_1$ close to zero, and the estimate of an AR(1) on returns does not suffer from this bias. Looking for dynamic patterns in asset returns is therefore much less likely to produce false evidence of predictability than is looking for dynamic patterns in asset prices. Risk managers ought to err on the side of prudence and thus consider dynamic models of asset returns and not asset prices.
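A Monte Carlo sketch of the downward (Hurwicz/Dickey-Fuller) bias discussed above: the true process is a random walk ($\phi_1 = 1$), yet OLS estimates of $\phi_1$ in short samples come out well below 1. The sample size and number of replications are illustrative; this mirrors empirical exercise 4 at the end of the chapter.

```python
import numpy as np

rng = np.random.default_rng(seed=7)
T, n_sims = 100, 2000                           # short samples, many replications
estimates = np.zeros(n_sims)

for i in range(n_sims):
    s = np.cumsum(rng.normal(size=T))           # random walk, so phi_1 = 1
    y, x = s[1:], s[:-1]
    X = np.column_stack([np.ones(T - 1), x])    # AR(1) regression with a constant
    (a_hat, phi_hat), *_ = np.linalg.lstsq(X, y, rcond=None)
    estimates[i] = phi_hat

print("true phi_1 = 1.0, average OLS estimate =", round(estimates.mean(), 3))
```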

4.7. Testing for Unit Roots

Asset prices often have a $\phi_1$ very close to 1. But we are very interested in knowing whether $\phi_1$ is just below 1 or exactly equal to 1 because the two values have very different implications for longer-term forecasting, as indicated by Figure 3.2. $\phi_1 < 1$ implies that the asset price is predictable so that market timing is possible, whereas $\phi_1 = 1$ implies it is not. Consider again the AR(1) model, with and without a constant term:
$$s_t = \phi_0 + \phi_1 s_{t-1} + \varepsilon_t, \qquad \text{or} \qquad s_t = \phi_1 s_{t-1} + \varepsilon_t$$
Unit root tests (also known as Dickey-Fuller tests) have been developed to assess the null hypothesis
$$H_0:\ \phi_1 = 1$$
against the alternative hypothesis that
$$H_A:\ \phi_1 < 1$$
This looks like a standard t-test in a regression, but it is crucial to note that when the null hypothesis $\phi_1 = 1$ is true, so that $s_t$ has a unit root, the test statistic does not have the usual normal distribution even when T is large. If you estimate $\phi_1$ using OLS and test that $\phi_1 = 1$ using the usual t-test with critical values from the normal distribution, then you are likely to reject the null hypothesis much more often than you should. This means that you are likely to spuriously find evidence of mean-reversion, that is, predictability.

5. Multivariate Time Series Models

Multivariate time series analysis is relevant for risk management because we often consider risk models with multiple related risk factors or models with many assets. This section will briefly introduce the following important topics: time series regressions, spurious relationships, cointegration, cross correlations, vector autoregressions, and spurious causality.

5.1. Time Series Regression

The relationship between two (or more) time series can be assessed by applying the usual regression analysis. But in time series analysis the regression errors must be scrutinized carefully.
Consider a simple bivariate regression of two highly persistent series, for example, the spot and the futures price of an asset
$$s_{1,t} = a + b\, s_{2,t} + e_t$$
The first step in diagnosing such a time series regression model is to plot the ACF of the regression errors, $e_t$.
If the ACF of the errors dies off only very slowly (and the Hurwicz bias will make the ACF look like it dies off to zero faster than it really does), then it is good practice to first-difference each series and run the regression
$$\left(s_{1,t} - s_{1,t-1}\right) = a + b\left(s_{2,t} - s_{2,t-1}\right) + e_t$$
Now the ACF of the residuals of the new regression can be checked for dynamics. AR, MA, or ARMA models can be used to model any dynamics in $e_t$. After modeling and estimating the parameters in the residual time series, $e_t$, the entire regression model including a and b can be reestimated using MLE.

5.2. Pitfall 2: Spurious Regression

Checking the ACF of the error term in time series regressions is particularly important due to the so-called spurious regression phenomenon: two completely unrelated time series, each with a unit root, are likely to appear related in a regression with a significant b coefficient.
Specifically, let $s_{1,t}$ and $s_{2,t}$ be two independent random walks
$$s_{1,t} = s_{1,t-1} + \varepsilon_{1,t}, \qquad s_{2,t} = s_{2,t-1} + \varepsilon_{2,t}$$
where $\varepsilon_{1,t}$ and $\varepsilon_{2,t}$ are independent of each other and independent over time. Clearly the true value of b is zero in the time series regression
$$s_{1,t} = a + b\, s_{2,t} + e_t$$
However, in practice, standard t-tests using the estimated b coefficient will tend to conclude that b is nonzero when in truth it is zero. This problem is known as spurious regression.
Fortunately, as noted earlier, the ACF comes to the rescue for detecting spurious regression. If the relationship between $s_{1,t}$ and $s_{2,t}$ is spurious then the error term, $e_t$, will have a highly persistent ACF, and the regression in first differences
$$\left(s_{1,t} - s_{1,t-1}\right) = a + b\left(s_{2,t} - s_{2,t-1}\right) + e_t$$
will not show a significant estimate of b. Note that Pitfall 1, earlier, was related to modeling univariate asset price time series in levels rather than in first differences. Pitfall 2 is in the same vein: time series regression on highly persistent asset prices is likely to lead to false evidence of a relationship, that is, a spurious relationship. Regression on returns is much more likely to lead to sensible conclusions about dependence across assets.
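A Monte Carlo sketch of the spurious regression problem: two independent random walks regressed on each other in levels produce "significant" t-statistics far too often, while the same regression in first differences does not. Sample size and number of simulations are illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=8)

def t_stat_of_slope(y, x):
    """OLS of y on a constant and x; return the t-statistic of the slope."""
    X = np.column_stack([np.ones(len(x)), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    s2 = resid @ resid / (len(y) - 2)
    cov_b = s2 * np.linalg.inv(X.T @ X)
    return b[1] / np.sqrt(cov_b[1, 1])

T, n_sims, rejections_levels, rejections_diffs = 200, 1000, 0, 0
for _ in range(n_sims):
    s1 = np.cumsum(rng.normal(size=T))          # independent random walks
    s2 = np.cumsum(rng.normal(size=T))
    rejections_levels += abs(t_stat_of_slope(s1, s2)) > 1.96
    rejections_diffs += abs(t_stat_of_slope(np.diff(s1), np.diff(s2))) > 1.96

print("rejection rate in levels:     ", rejections_levels / n_sims)   # far above 0.05
print("rejection rate in differences:", rejections_diffs / n_sims)    # close to 0.05
```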

5.3. Cointegration

Relationships between variables with unit roots are of course not always spurious. A variable with a unit root, for example a random walk, is also called integrated, and if two variables that are both integrated have a linear combination with no unit root then we say they are cointegrated.
Examples of cointegrated variables could be long-run consumption and production in an economy, or the spot and the futures price of an asset that are related via a no-arbitrage condition. Similarly, consider the pairs trading strategy that consists of finding two stocks whose prices tend to move together. If prices diverge then we buy the temporarily cheap stock and short sell the temporarily expensive stock and wait for the typical relationship between the prices to return. Such a strategy hinges on the stock prices being cointegrated.
Consider a simple bivariate model where
$$s_{1,t} = s_{1,t-1} + \varepsilon_{1,t}$$
$$s_{2,t} = b\, s_{1,t} + \varepsilon_{2,t}$$
Note that $s_{1,t}$ has a unit root and that the levels of $s_{1,t}$ and $s_{2,t}$ are related via b. Assume that $\varepsilon_{1,t}$ and $\varepsilon_{2,t}$ are independent of each other and independent over time. In this case $s_{1,t}$ and $s_{2,t}$ are cointegrated: each has a unit root, but the linear combination $s_{2,t} - b\,s_{1,t} = \varepsilon_{2,t}$ does not.
The cointegration model can be used to preserve the relationship between the variables in long-term forecasts
$$E_t\left[s_{2,t+\tau}\right] = b\, E_t\left[s_{1,t+\tau}\right] = b\, s_{1,t}$$
The concept of cointegration was developed by Robert Engle and Clive Granger, who together received the Nobel Prize in Economics in 2003 for this and many other contributions to financial time series analysis.

5.4. Cross-Correlations

Consider again two financial time series, $R_{1,t}$ and $R_{2,t}$. They can be dependent in three possible ways: $R_{1,t}$ can lead $R_{2,t}$ (e.g., $R_{1,t-1}$ affects $R_{2,t}$), $R_{1,t}$ can lag $R_{2,t}$ (e.g., $R_{2,t-1}$ affects $R_{1,t}$), and they can be contemporaneously related (e.g., $R_{1,t}$ and $R_{2,t}$ are correlated). We need a tool to detect all these possible dynamic relationships.
The sample cross-correlation matrices are the multivariate analogues of the ACF and provide the tool we need. For a bivariate time series, the cross-covariance matrix for lag τ is
$$\Gamma_\tau = E\left[\begin{pmatrix} R_{1,t} - \mu_1 \\ R_{2,t} - \mu_2 \end{pmatrix}\begin{pmatrix} R_{1,t-\tau} - \mu_1 & R_{2,t-\tau} - \mu_2 \end{pmatrix}\right]$$
Note that the two diagonal terms are the autocovariance functions of $R_{1,t}$ and $R_{2,t}$, respectively.
In the general case of a k-dimensional time series, we have
$$\Gamma_\tau = E\left[\left(R_t - \mu\right)\left(R_{t-\tau} - \mu\right)'\right]$$
where $R_t$ is now a k by 1 vector of variables and μ is the k by 1 vector of their means.
Detecting lead and lag effects is important, for example, when relating an illiquid stock to a liquid market factor. The illiquidity of the stock implies price observations that are often stale, which in turn will produce a spuriously low contemporaneous correlation with the liquid market factor. The stale equity price will instead be correlated with the lagged market factor, and this lagged relationship can be used to compute a liquidity-corrected measure of the dependence between the stock and the market.
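A sketch of the sample cross-correlation matrices for a bivariate series: the (i, j) element at lag τ is the correlation between series i at time t and series j at time t − τ. The data are simulated with a one-day lead-lag relationship between a "market" factor and a "stock" to mimic the stale-price example.

```python
import numpy as np

rng = np.random.default_rng(seed=9)
T = 5000
market = rng.normal(size=T)
stock = 0.6 * np.roll(market, 1) + rng.normal(size=T)   # stock reacts with a one-day lag
stock[0] = rng.normal()                                  # discard the wrap-around value
R = np.column_stack([stock, market])

def cross_correlation(R, tau):
    """Sample cross-correlation matrix at lag tau for a (T x k) data matrix."""
    demeaned = R - R.mean(axis=0)
    cov = demeaned[tau:].T @ demeaned[:len(R) - tau] / (len(R) - tau)
    sd = R.std(axis=0)
    return cov / np.outer(sd, sd)

for tau in range(3):
    print(f"lag {tau}:\n", np.round(cross_correlation(R, tau), 2))
```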

5.5. Vector Autoregressions (VAR)

The vector autoregression model (VAR), which is not to be confused with Value-at-Risk (VaR), is arguably the simplest and most often used multivariate time series model for forecasting. Consider a first-order VAR, call it VAR(1),
$$R_t = \Phi_0 + \Phi_1 R_{t-1} + \varepsilon_t$$
where $R_t$ is again a k by 1 vector of variables, $\Phi_0$ is a k by 1 vector of constants, and $\Phi_1$ is a k by k matrix of autoregressive coefficients.
The bivariate case is simply
$$R_{1,t} = \phi_{0,1} + \phi_{11} R_{1,t-1} + \phi_{12} R_{2,t-1} + \varepsilon_{1,t}$$
$$R_{2,t} = \phi_{0,2} + \phi_{21} R_{1,t-1} + \phi_{22} R_{2,t-1} + \varepsilon_{2,t}$$
Note that in the VAR, $R_{1,t}$ and $R_{2,t}$ are contemporaneously related via the covariance of the error terms $\varepsilon_{1,t}$ and $\varepsilon_{2,t}$. But just as in the AR model, the right-hand side of the VAR only depends on lagged variables, so that it is immediately useful in forecasting.
If the variables included on the right-hand side of each equation in the VAR are the same (as they are here) then the VAR is called unrestricted, and OLS can be used equation by equation to estimate the parameters.
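A sketch of estimating an unrestricted bivariate VAR(1) equation by equation with OLS, as described above. The data are simulated from a known VAR(1); the coefficient matrix and sample size are illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=10)
T = 3000
Phi1 = np.array([[0.5, 0.2],
                 [0.0, 0.3]])                  # true coefficient matrix
R = np.zeros((T, 2))
for t in range(1, T):
    R[t] = Phi1 @ R[t - 1] + rng.normal(size=2)

Y = R[1:]                                      # left-hand side: R_t
X = np.column_stack([np.ones(T - 1), R[:-1]])  # right-hand side: constant and R_{t-1}

# OLS equation by equation: each column of Y is regressed on the same X
coeffs, *_ = np.linalg.lstsq(X, Y, rcond=None)
print("estimated constants:", np.round(coeffs[0], 3))
print("estimated Phi1:\n", np.round(coeffs[1:].T, 3))
```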

5.6. Pitfall 3: Spurious Causality

We may sometimes be interested to see if the lagged value of $R_{1,t}$, namely $R_{1,t-1}$, is causal for the current value of $R_{2,t}$, in which case it can be used in forecasting. To this end a simple regression of the form
$$R_{2,t} = \phi_0 + \phi_1 R_{1,t-1} + \varepsilon_t$$
could be used. Note that it is the lagged value $R_{1,t-1}$ that appears on the right-hand side. Unfortunately, such a regression may easily lead to false conclusions if $R_{2,t}$ is persistent and so depends on its own past value, which is not included on the right-hand side of the regression.
In order to truly assess whether $R_{1,t}$ causes $R_{2,t}$ (or vice versa), we should ask the question: Is past $R_{1,t}$ useful for forecasting current $R_{2,t}$ once past $R_{2,t}$ has been accounted for? This question can be answered by running the bivariate VAR model from before:
$$R_{1,t} = \phi_{0,1} + \phi_{11} R_{1,t-1} + \phi_{12} R_{2,t-1} + \varepsilon_{1,t}$$
$$R_{2,t} = \phi_{0,2} + \phi_{21} R_{1,t-1} + \phi_{22} R_{2,t-1} + \varepsilon_{2,t}$$
Now we can define Granger causality (as opposed to spurious causality) as follows:
• $R_{1,t}$ is said to Granger cause $R_{2,t}$ if $\phi_{21} \ne 0$
• $R_{2,t}$ is said to Granger cause $R_{1,t}$ if $\phi_{12} \ne 0$
In some cases several lags of $R_{1,t}$ may be needed on the right-hand side of the equation for $R_{2,t}$, and similarly we may need more lags of $R_{2,t}$ in the equation for $R_{1,t}$.
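A sketch of a simple Granger causality check in the bivariate VAR(1): test whether the cross coefficients $\phi_{21}$ ($R_1$ to $R_2$) and $\phi_{12}$ ($R_2$ to $R_1$) differ significantly from zero, using ordinary OLS t-statistics on stationary (return-like) data. The simulated processes are placeholders in which $R_1$ Granger causes $R_2$ by construction.

```python
import numpy as np

def ols_with_tstats(y, X):
    """OLS coefficients and their t-statistics."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    s2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    return b, b / se

rng = np.random.default_rng(seed=11)
T = 2000
R1 = rng.normal(size=T)
R2 = np.zeros(T)
for t in range(1, T):
    R2[t] = 0.3 * R2[t - 1] + 0.4 * R1[t - 1] + rng.normal()   # R1 Granger causes R2

X = np.column_stack([np.ones(T - 1), R1[:-1], R2[:-1]])
_, t_eq1 = ols_with_tstats(R1[1:], X)          # equation for R1
_, t_eq2 = ols_with_tstats(R2[1:], X)          # equation for R2
print("t-stat of R2_{t-1} in R1 equation (should be small):", round(t_eq1[2], 2))
print("t-stat of R1_{t-1} in R2 equation (should be large):", round(t_eq2[1], 2))
```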

6. Summary

The financial asset prices and portfolio values typically studied by risk managers can be viewed as examples of very persistent time series. An important goal of this chapter is therefore to ensure that the risk manager avoids some common pitfalls that arise because of the persistence in prices. The three most important issues are
• Spurious detection of mean-reversion; that is, erroneously finding that a variable is mean-reverting when it is truly a random walk
• Spurious regression; that is, erroneously finding that a variable x is significant when regressing y on x
• Spurious detection of causality; that is, erroneously finding that the current value of x causes (helps determine) future values of y when in reality it cannot
Several more advanced topics have been left out of the chapter, including long memory models and models of seasonality. Long memory models give more flexibility in modeling the autocorrelation function (ACF) than do the traditional ARMA and ARIMA models studied in this chapter. In particular, long memory models allow the ACF to go to zero more slowly than the AR(1) model, whose ACF decays to zero at an exponential rate as we saw earlier. Seasonal models are useful, for example, for the analysis of agricultural commodity prices, where seasonal patterns in supply cause seasonal patterns in prices, in expected returns, and in volatility. These topics can be studied using the resources suggested next.

Further Resources

For a basic introduction to financial data analysis, see Koop (2006), and for an introduction to probability theory see Paolella (2006). Wooldridge (2002) and Stock and Watson (2010) provide a broad introduction to econometrics. Anscombe (1973) contains the data in Table 3.1 and Figure 3.1.
The univariate and multivariate time series material in this chapter is based on Chapters 2 and 8 in Tsay (2002), which should be consulted for various extensions including seasonality and long memory. See also Taylor (2005) for an excellent treatment of financial time series analysis focusing on volatility modeling.
Diebold (2004) gives a thorough introduction to forecasting in economics. Granger and Newbold (1986) is the classic text for the more advanced reader. Christoffersen and Diebold (1998) analyze long-horizon forecasting in cointegrated systems.
The classic references on the key time series topics in this chapter are Hurwicz (1950) on the bias in the AR(1) coefficient, Granger and Newbold (1974) on spurious regression in economics, Engle and Granger (1987) on cointegration, Granger (1969) on Granger causality, and Dickey and Fuller (1979) on unit root testing. Hamilton (1994) provides an authoritative treatment of economic time series analysis.
Tables with critical values for unit root tests can be found in MacKinnon, 1996 and MacKinnon, 2010. See also Chapter 14 in Davidson and MacKinnon (2004).
References
Anscombe, F.J., Graphs in statistical analysis, Am. Stat. 27 (1973) 17–21.
Christoffersen, P.; Diebold, F., Cointegration and long horizon forecasting, J. Bus. Econ. Stat. 16 (1998) 450–458.
Davidson, R.; MacKinnon, J.G., Econometric Theory and Methods. (2004) Oxford University Press, New York, NY.
Dickey, D.A.; Fuller, W.A., Distribution of the estimators for autoregressive time series with a unit root, J. Am. Stat. Assoc. 74 (1979) 427–431.
Diebold, F.X., Elements of Forecasting. third ed (2004) Thomson South-Western, Cincinnati, Ohio.
Engle, R.F.; Granger, C.W.J., Co-integration and error correction: Representation, estimation and testing, Econometrica 55 (1987) 251–276.
Granger, C.W.J., Investigating causal relations by econometric models and cross-spectral methods, Econometrica 37 (1969) 424–438.
Granger, C.W.J.; Newbold, P., Spurious regressions in econometrics, J. Econom. 2 (1974) 111–120.
Granger, C.W.J.; Newbold, P., Forecasting Economic Time Series. second ed (1986) Academic Press, Orlando, FL.
Hamilton, J.D., Time Series Analysis. (1994) Princeton University Press, Princeton, NJ.
Hurwicz, L., Least squares bias in time series, In: (Editor: Koopmans, T.C.) Statistical Inference in Dynamic Economic Models (1950) Wiley, New York, NY.
Koop, G., Analysis of Financial Data. (2006) Wiley, Chichester, West Sussex, England.
MacKinnon, J.G., Numerical distribution functions for unit root and cointegration tests, J. Appl. Econom. 11 (1996) 601–618.
MacKinnon, J.G., 2010. Critical Values for Cointegration Tests, Queen's Economics Department. Working Paper no 1227. http://ideas.repec.org/p/qed/wpaper/1227.html.
Paolella, M., Fundamental Probability. (2006) Wiley, Chichester, West Sussex, England.
Stock, J.; Watson, M., Introduction to Econometrics. second ed (2010) Pearson Addison Wesley.
Taylor, S.J., Asset Price Dynamics, Volatility and Prediction. (2005) Princeton University Press, Princeton, NJ.
Tsay, R., Analysis of Financial Time Series. (2002) Wiley Interscience, Hoboken, NJ.
Wooldridge, J., Introductory Econometrics: A Modern Approach. second ed (2002) South-Western College Publishing, Mason, Ohio.
Empirical Exercises

Open the Chapter3Data.xlsx file from the web site.
1. Using the data in the worksheet named Question 3.1 reproduce the moments and regression coefficients at the bottom of Table 3.1.
2. Reproduce Figure 3.1.
3. Reproduce Figure 3.2.
4. Using the data sets in the worksheet named Question 3.4, estimate an AR(1) model on each of the 100 columns of data. (Excel hint: Use the LINEST function.) Plot the histogram of the 100 $\hat{\phi}_1$ estimates you have obtained. The true value of $\phi_1$ is one in all the columns. What does the histogram tell you?
5. Using the data set in the worksheet named Question 3.4, estimate an MA(1) model using maximum likelihood. Use the starting values suggested in the text. Use Solver in Excel to maximize the likelihood function.
Answers to these exercises can be found on the companion site.
For more information see the companion site at http://www.elsevierdirect.com/companions/9780123744487