5. Volatility Modeling Using Intraday Data
This chapter explores the use of intraday prices for computing daily volatility and for forecasting future volatility. We first introduce the concept of realized variance (RV) and look at four stylized facts of RV. We then look at different ways to forecast realized variance, as well as different ways to estimate realized variance, and we briefly look at some of the challenges of working with large and messy intraday data sets. At the end of the chapter we consider range-based proxies of daily volatility and also volatility forecast evaluation using RV and range-based volatility. Range-based volatilities are much easier to construct than RVs but in highly liquid markets RV will be more precise.
Keywords: Realized volatility, range-based volatility, market microstructure noise, sampling frequency

1. Chapter Overview

The goal of this chapter is to harness the information in intraday prices for computing daily volatility. Consider first estimating the mean of returns using a long sample of daily observations:
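$$\hat{\mu} = \frac{1}{T}\sum_{t=1}^{T} R_t = \frac{1}{T}\sum_{t=1}^{T}\left(\ln(S_t) - \ln(S_{t-1})\right) = \frac{1}{T}\left(\ln(S_T) - \ln(S_0)\right)$$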
Note that when estimating the mean of returns only the first and the last observations matter: All the intermediate terms cancel out and their values are therefore completely inconsequential to the estimate of the mean. This result in turn implies that when estimating the mean, having a long time span of data is what matters: having daily versus weekly versus monthly data does not matter. The start and end points $S_0$ and $S_T$ will be the same irrespective of the sampling frequency of returns. This is frustrating when we want to get a precise estimate of the return mean. The only solution is to wait for time to pass.
Consider now instead estimating variance on a sample of daily returns. We have
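$$\hat{\sigma}^2 = \frac{1}{T}\sum_{t=1}^{T}\left(R_t - \hat{\mu}\right)^2 = \frac{1}{T}\sum_{t=1}^{T}\left(\ln(S_t) - \ln(S_{t-1}) - \hat{\mu}\right)^2$$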
Notice a crucial difference between the sample mean and sample variance estimators: The intermediate prices do not cancel out in the variance estimator. All the return observations now matter because they are squared before they are summed in the average.
Imagine now having price observations at the end of every hour instead of every day and imagine that the market for the asset at hand (for example an FX rate) is open 24 hours a day. Now we would have 24 ⋅ T observations to estimate $\sigma^2$ and we would get a much more precise estimate than when using just the T daily returns.
The dramatic implication for risk management of this high-frequency sampling idea is that just as we can use 21 daily prices to estimate a monthly volatility we can also use 24 hourly observations to estimate a daily volatility. If we have observations every minute then an even more precise estimate of daily volatility can be had and we can virtually treat daily volatility as an observed variable.
This chapter explores in detail the use of intraday prices for computing daily volatility and for forecasting future volatility. We first introduce the key concept of realized variance (RV) and look at four stylized facts of RV. We then look at ways to forecast RV and ways to estimate RV, and we briefly look at some of the challenges of working with large and messy intraday data sets. Toward the end of the chapter we look at range-based proxies of daily volatility and also at volatility forecast evaluation using RV and range-based volatility. Range-based volatilities are much easier to construct than RVs but in highly liquid markets RV will be more precise.

2. Realized Variance: Four Stylized Facts

Assume for simplicity that we are monitoring an asset that trades 24 hours per day and that is extremely liquid so that bid-ask spreads are virtually zero and new information is reflected in the price immediately. More realistic situations will be treated later. In an extremely liquid market with rapidly changing prices observed every second we can comfortably construct a time grid, for example, of 1-minute prices from which we can compute 1-minute log returns.
Let m be the number of observations per day on an asset. If we have 24 hour trading and 1-minute observations, then m = 24 ⋅ 60 = 1,440. Let the jth observation on day t + 1 be denoted $S_{t+j/m}$. Then the closing price on day t + 1 is $S_{t+m/m} = S_{t+1}$, and the jth 1-minute return is
$$R_{t+j/m} = \ln(S_{t+j/m}) - \ln(S_{t+(j-1)/m})$$
Having m observations available within a day, we can calculate an estimate of the daily variance from the intraday squared returns simply as
$$RV^m_{t+1} = \sum_{j=1}^{m} R^2_{t+j/m}$$
This is the definition of RV. Notice that unlike the previous chapters where we computed the sample variance from daily returns, we do not divide the sum of squared returns by m here. If we did we would get a 1-minute variance. Omitting the m gives us a total variance for the 24-hour period. Notice also that we do not subtract the mean of the 1-minute returns. The mean of 1-minute returns is so small that it will not materially impact the variance estimate.
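As a minimal sketch of this definition, the snippet below sums squared 1-minute log returns over one day. The function name `realized_variance`, the simulated price path, and the per-minute volatility of 0.0005 are illustrative assumptions, not values from the chapter.

```python
import numpy as np

def realized_variance(prices):
    """Sum of squared intraday log returns for one day.

    No division by m and no mean subtraction, matching the RV definition.
    `prices` holds the previous close followed by the m intraday prices.
    """
    log_prices = np.log(np.asarray(prices, dtype=float))
    returns = np.diff(log_prices)  # R_{t+j/m} = ln S_{t+j/m} - ln S_{t+(j-1)/m}
    return float(np.sum(returns ** 2))

# Hypothetical day: m = 1,440 one-minute prices simulated as a random walk.
rng = np.random.default_rng(0)
m = 1440
log_path = np.log(100.0) + np.cumsum(rng.normal(0.0, 0.0005, size=m))
prices = np.concatenate(([100.0], np.exp(log_path)))
rv = realized_variance(prices)
```

Here `rv` estimates the total 24-hour variance, which in this simulation should be close to 1,440 times the squared 1-minute volatility.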
The top panel of Figure 5.1 shows the time series of daily realized S&P 500 variance computed from intraday squared returns. The bottom panel shows the daily close-to-close squared returns of the S&P 500. Notice how much more jagged and noisy the squared returns in the bottom panel are compared with the realized variances in the top panel. Figure 5.1 illustrates the first stylized fact of RV: RVs are much more precise indicators of daily variance than are daily squared returns.
Figure 5.1
Realized variance (top) and squared returns (bottom) of the S&P 500. Notes: We use daily realized variance (top panel) and the daily close-to-close squared returns (bottom panel) as proxies for daily variance in the S&P 500 index.
The top panel of Figure 5.2 shows the autocorrelation function (ACF) of the S&P 500 RV series from Figure 5.1. The bottom panel shows the corresponding ACF computed from daily squared returns as in Chapter 4. Notice how much more striking the evidence of variance persistence is in the top panel. Figure 5.2 illustrates the second stylized fact of RV: RV is extremely persistent, which suggests that volatility may be forecastable at horizons beyond a few months as long as the information in intraday returns is used.
Figure 5.2
Autocorrelation of realized variance (top) and autocorrelation of squared returns (bottom) with Bartlett confidence intervals (dashed). Notes: We compute autocorrelations from the daily realized variance computed using the average RV method (top panel) and the daily close-to-close squared returns (bottom panel) from the S&P 500 index.
The top panel of Figure 5.3 shows a histogram of the RVs from Figure 5.1. The bottom panel of Figure 5.3 shows the histogram of the natural logarithm of RV. Figure 5.3 shows that the logarithm of RV is very close to normally distributed whereas the level of RV is strongly positively skewed with a long right tail.
Figure 5.3
Histogram of realized variance (top) and log realized variance (bottom). Notes: We plot histograms of the daily realized variance computed using the average RV method (top panel) and of its natural logarithm (bottom panel) from the S&P 500 index.
Given that RV is a sum of squared returns it is not surprising that RV is not close to normally distributed but it is interesting and useful that a simple logarithmic transformation results in a distribution that is somewhat close to normal. The approximate log normal property of RV is the third stylized fact. We can write
$$\ln\left(RV^m_{t+1}\right) \sim N\left(\mu_{RV}, \sigma^2_{RV}\right)$$
The fourth stylized fact of RV is that the daily return divided by the square root of RV is very close to following an i.i.d. (independently and identically distributed) standard normal distribution. We can write
$$R_{t+1} / \sqrt{RV^m_{t+1}} \overset{i.i.d.}{\sim} N(0,1)$$
Notice that because $RV^m_{t+1}$ can only be computed at the end of day t + 1 this result is not immediately useful for forecasting purposes.
The fourth stylized fact suggests that if a good forecast of $RV^m_{t+1}$, call it $RV^m_{t+1|t}$, can be made using information available at time t then a normal distribution assumption of $R_{t+1}/\sqrt{RV^m_{t+1|t}}$ will be a decent first modeling strategy. Approximately
$$R_{t+1} / \sqrt{RV^m_{t+1|t}} \overset{i.i.d.}{\sim} N(0,1)$$
where we have now standardized the return with the RV forecast, which by construction is known in advance. In this chapter we will rely on this assumption of normality for the returns standardized by the RV forecast. In the next chapter we will allow for more general distributions.
Constructing a good forecast for $RV^m_{t+1}$ is the topic to which we now turn. When doing so we will need to keep in mind the four stylized facts of RV:
• RV is a more precise indicator of daily variance than is the daily squared return.
• RV has large positive autocorrelations for many lags.
• The log of RV is approximately normally distributed.
• The daily return divided by the square root of RV is close to i.i.d. standard normal.

3. Forecasting Realized Variance

Realized variances are very persistent and so the main task at hand is to consider forecasting models that allow for current RV to matter for future RV.

3.1. Simple ARMA Models of Realized Variance

In Chapter 3 we introduced the AR(1) model as a simple way to allow for persistence in a time series. If we treat the estimated $RV^m_t$ as an observed time series, then we can assume the AR(1) forecasting model
$$RV^m_{t+1} = \phi_0 + \phi_1 RV^m_t + \varepsilon_{t+1}$$
where $\varepsilon_{t+1}$ is assumed to be uncorrelated over time and have zero mean. The parameters $\phi_0$ and $\phi_1$ can easily be estimated using OLS. The one-day-ahead forecast of RV is then constructed as
$$RV^m_{t+1|t} = \phi_0 + \phi_1 RV^m_t$$
We are just showing the AR(1) model as an example. AR(2) or higher ordered AR models could, of course, be used as well.
Given that we observed in Figure 5.3 that the log of RV is close to normally distributed we may be better off modeling the RV in logs rather than levels. We can therefore assume
$$\ln(RV^m_{t+1}) = \phi_0 + \phi_1 \ln(RV^m_t) + \varepsilon_{t+1}$$
The normal property of $\ln(RV^m_{t+1})$ will make the OLS estimates of $\phi_0$ and $\phi_1$ better behaved than those in the AR(1) model for $RV^m_t$ where the AR(1) errors, $\varepsilon_{t+1}$, are likely to have fat tails, which in turn yield noisy parameter estimates.
Because we have estimated it from intraday squared returns, the $RV^m_t$ series is not truly an observed time series but it can be viewed as the true RV observed with a measurement error. If the true RV is AR(1) but we observed true RV plus an i.i.d. measurement error then an ARMA(1,1) model is likely to provide a good fit to the observed RV. We can write
$$\ln(RV^m_{t+1}) = \phi_0 + \phi_1 \ln(RV^m_t) + \theta_1 \varepsilon_t + \varepsilon_{t+1}$$
which due to the MA term must be estimated using maximum likelihood techniques.
Notice that these simple models are specified in logarithms, while for risk management purposes we are ultimately interested in forecasting the level of variance. As the exponential function is not linear, we have in the log RV model that
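$$RV^m_{t+1|t} = E_t\left[\exp\left(\ln(RV^m_{t+1})\right)\right] \neq \exp\left(E_t\left[\ln(RV^m_{t+1})\right]\right)$$
and we therefore have to be careful when calculating the variance forecast.
From the assumption of normality of the error term we can use the result
$$\varepsilon_{t+1} \sim N(0, \sigma^2_\varepsilon) \;\Rightarrow\; E\left[\exp(\varepsilon_{t+1})\right] = \exp\left(\sigma^2_\varepsilon / 2\right)$$
In the AR(1) model the forecast for tomorrow is
$$RV^m_{t+1|t} = \exp\left(\phi_0 + \phi_1 \ln(RV^m_t)\right)\exp\left(\sigma^2_\varepsilon/2\right) = \left(RV^m_t\right)^{\phi_1}\exp\left(\phi_0 + \sigma^2_\varepsilon/2\right)$$
and for the ARMA(1,1) model we get
$$RV^m_{t+1|t} = \left(RV^m_t\right)^{\phi_1}\exp\left(\phi_0 + \theta_1\varepsilon_t + \sigma^2_\varepsilon/2\right)$$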
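The log AR(1) forecast with its lognormal correction can be sketched as follows. This is an illustration under assumptions: the helper names `fit_log_ar1` and `forecast_rv_level` are hypothetical, and the RV series is simulated from a log AR(1) with made-up parameters rather than taken from market data.

```python
import numpy as np

def fit_log_ar1(rv):
    """OLS of ln(RV_{t+1}) on a constant and ln(RV_t).

    Returns (phi0, phi1, sigma2_eps), where sigma2_eps is the residual variance.
    """
    y = np.log(np.asarray(rv, dtype=float))
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    beta, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    resid = y[1:] - X @ beta
    sigma2 = float(resid @ resid / (len(resid) - 2))
    return float(beta[0]), float(beta[1]), sigma2

def forecast_rv_level(rv_today, phi0, phi1, sigma2):
    """One-day-ahead forecast of the RV level, with the exp(sigma2/2) term."""
    return float(rv_today ** phi1 * np.exp(phi0 + 0.5 * sigma2))

# Simulated log RV series (assumed parameters: phi0 = -0.9, phi1 = 0.9, sd = 0.5).
rng = np.random.default_rng(1)
T = 2000
log_rv = np.empty(T)
log_rv[0] = -9.0
for t in range(1, T):
    log_rv[t] = -0.9 + 0.9 * log_rv[t - 1] + rng.normal(0.0, 0.5)
rv = np.exp(log_rv)

phi0, phi1, sigma2 = fit_log_ar1(rv)
naive = float(np.exp(phi0 + phi1 * np.log(rv[-1])))    # ignores the correction
corrected = forecast_rv_level(rv[-1], phi0, phi1, sigma2)
```

The `naive` forecast simply exponentiates the log forecast and is therefore always below `corrected`, which illustrates why the nonlinearity of the exponential function matters.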
More sophisticated models such as long-memory (or fractionally integrated) ARMA models can be used to model realized variance. These models may yield better longer horizon variance forecasts than the short-memory ARMA models considered here. As a simple but powerful way to allow for more persistence in the variance forecasting model we next consider the so-called heterogeneous AR models.

3.2. Heterogeneous Autoregressions (HAR)

The question arises whether we can parsimoniously (that is with only few parameters) and easily (that is using OLS) model the apparent long-memory features of realized volatility. The mixed-frequency or heterogeneous autoregression model (HAR) we now consider provides an affirmative answer to this question. Define the h-day RV from the 1-day RV as follows:
$$RV_{t-h+1,t} = \left[RV^m_{t-h+1} + RV^m_{t-h+2} + \cdots + RV^m_t\right]/h$$
where dividing by h makes $RV_{t-h+1,t}$ interpretable as the average total variance starting with day $t-h+1$ and through day t.
Given that economic activity is organized in days, weeks, and months, it is natural to consider forecasting tomorrow's RV using daily, weekly, and monthly RV defined by the simple moving averages
$$RV_{D,t} \equiv RV^m_t$$
$$RV_{W,t} \equiv RV_{t-4,t} = \left[RV^m_{t-4} + RV^m_{t-3} + RV^m_{t-2} + RV^m_{t-1} + RV^m_t\right]/5$$
$$RV_{M,t} \equiv RV_{t-20,t} = \left[RV^m_{t-20} + RV^m_{t-19} + \cdots + RV^m_t\right]/21$$
where we have assumed five trading days in a week and 21 trading days in a month. The simplest way to forecast RV with these variables is via the regression
$$RV^m_{t+1} = \phi_0 + \phi_D RV_{D,t} + \phi_W RV_{W,t} + \phi_M RV_{M,t} + \varepsilon_{t+1}$$
which defines the HAR model. Notice that HAR can be estimated by OLS because all variables are observed and because the model is linear in the parameters.
The HAR will be able to capture long-memory-like dynamics because 21 lags of daily RV matter in this model. The model is parsimonious because the 21 lags of daily RV do not have 21 different autoregressive coefficients: The coefficients are restricted to be $(\phi_D + \phi_W/5 + \phi_M/21)$ on today's RV, $(\phi_W/5 + \phi_M/21)$ on the past four days of RV, and $\phi_M/21$ on the RVs for days t − 20 through t − 5.
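A minimal sketch of the HAR regression in levels is given below. The function names (`har_regressors`, `fit_har`, `har_forecast`) are hypothetical, and the RV series is simulated purely so the example runs; with real data one would pass the daily RV series instead.

```python
import numpy as np

def har_regressors(rv):
    """Daily, weekly (5-day) and monthly (21-day) average RV for days t >= 20."""
    rv = np.asarray(rv, dtype=float)
    c = np.concatenate(([0.0], np.cumsum(rv)))    # c[k] = rv[0] + ... + rv[k-1]
    t = np.arange(20, len(rv))
    daily = rv[t]
    weekly = (c[t + 1] - c[t - 4]) / 5.0          # mean of rv[t-4 .. t]
    monthly = (c[t + 1] - c[t - 20]) / 21.0       # mean of rv[t-20 .. t]
    return daily, weekly, monthly

def fit_har(rv):
    """OLS of RV_{t+1} on a constant and the daily, weekly, monthly RVs at t."""
    rv = np.asarray(rv, dtype=float)
    d, w, m = har_regressors(rv)
    X = np.column_stack([np.ones(len(d) - 1), d[:-1], w[:-1], m[:-1]])
    y = rv[21:]                                   # next-day RV for t = 20, 21, ...
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta                                   # [phi0, phi_D, phi_W, phi_M]

def har_forecast(rv, beta):
    """One-day-ahead forecast from the most recent regressors."""
    d, w, m = har_regressors(rv)
    return float(beta @ np.array([1.0, d[-1], w[-1], m[-1]]))

# Simulated, persistent RV series (assumed parameters, for illustration only).
rng = np.random.default_rng(3)
log_rv = np.empty(1000)
log_rv[0] = -9.0
for t in range(1, 1000):
    log_rv[t] = -0.45 + 0.95 * log_rv[t - 1] + rng.normal(0.0, 0.3)
rv = np.exp(log_rv)
beta = fit_har(rv)
forecast = har_forecast(rv, beta)
```

Only four parameters are estimated even though 21 lags of daily RV enter the forecast, which is the parsimony argument made above.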
Given the log normal property of RV we can also consider HAR models of the log transformation of RV:
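$$\ln(RV^m_{t+1}) = \phi_0 + \phi_D \ln(RV_{D,t}) + \phi_W \ln(RV_{W,t}) + \phi_M \ln(RV_{M,t}) + \varepsilon_{t+1}$$
The advantage of this log specification is again that the parameters will be estimated more precisely when using OLS. Remember though that forecasting involves undoing the log transformation so that
$$RV^m_{t+1|t} = \exp\left(\phi_0 + \phi_D \ln(RV_{D,t}) + \phi_W \ln(RV_{W,t}) + \phi_M \ln(RV_{M,t})\right)\exp\left(\sigma^2_\varepsilon/2\right) = \left(RV_{D,t}\right)^{\phi_D}\left(RV_{W,t}\right)^{\phi_W}\left(RV_{M,t}\right)^{\phi_M}\exp\left(\phi_0 + \sigma^2_\varepsilon/2\right)$$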
Note that the HAR idea generalizes to longer-horizon forecasting. If for example we want to forecast RV over the next K days then we can estimate the model
$$RV_{t+1,t+K} = \phi_0 + \phi_D RV_{D,t} + \phi_W RV_{W,t} + \phi_M RV_{M,t} + \varepsilon_{t+K}$$
where
$$RV_{t+1,t+K} = \left[RV^m_{t+1} + RV^m_{t+2} + \cdots + RV^m_{t+K}\right]/K$$
and where we still rely on daily, weekly, and monthly RVs on the right-hand side of the HAR model. Figure 5.4 shows the forecast of 1-day, 5-day, and 10-day volatility using three different log HAR models corresponding to each horizon of interest.
Figure 5.4
Forecast of daily (top), weekly (middle), and monthly (bottom) S&P 500 volatility using HAR model specified in logs. Notes: We use the HAR model estimated in logs to forecast the level of variance over the next day, week, and month in the S&P 500 index.
In Chapter 4 we saw the importance of including a leverage effect in the GARCH model capturing that volatility rises more on a large negative return than on a large positive return. The HAR model can capture this by simply including the return on the right-hand side. In the daily log HAR we can write
$$\ln(RV^m_{t+1}) = \phi_0 + \phi_D \ln(RV_{D,t}) + \phi_W \ln(RV_{W,t}) + \phi_M \ln(RV_{M,t}) + \phi_R R_t + \varepsilon_{t+1}$$
which can also easily be estimated using OLS. Notice that because the model is written in logs we do not have to worry about the variance forecast going negative;
$$RV^m_{t+1|t} = \left(RV_{D,t}\right)^{\phi_D}\left(RV_{W,t}\right)^{\phi_W}\left(RV_{M,t}\right)^{\phi_M}\exp\left(\phi_0 + \phi_R R_t + \sigma^2_\varepsilon/2\right)$$
will always be a positive number.
The stylized facts of RV suggested that we can assume that
$$R_{t+1} = \sqrt{RV^m_{t+1|t}}\; z_{t+1}, \quad \text{with } z_{t+1} \overset{i.i.d.}{\sim} N(0,1)$$
If we use this assumption then from Chapter 1 we can compute Value-at-Risk by
$$VaR^p_{t+1} = -\sqrt{RV^m_{t+1|t}}\;\Phi^{-1}_p$$
where $RV^m_{t+1|t}$ is provided by either the ARMA or HAR forecasting models earlier. Expected Shortfall is also easily computed via
$$ES^p_{t+1} = \sqrt{RV^m_{t+1|t}}\;\frac{\phi\left(\Phi^{-1}_p\right)}{p}$$
which follows from Chapter 2.
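Under the normality assumption for returns standardized by the RV forecast, VaR and ES follow directly from the forecast. The sketch below uses only the Python standard library; the function name `var_es_from_rv` and the forecast value 0.0001 (a 1% daily volatility) are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

def var_es_from_rv(rv_forecast, p=0.01):
    """VaR and ES at coverage p when R = sqrt(RV forecast) * z and z ~ N(0, 1)."""
    std_normal = NormalDist()
    q = std_normal.inv_cdf(p)          # Phi^{-1}_p, negative for small p
    vol = sqrt(rv_forecast)
    value_at_risk = -vol * q
    expected_shortfall = vol * std_normal.pdf(q) / p
    return value_at_risk, expected_shortfall

# Hypothetical RV forecast of 0.0001, i.e. a daily volatility of 1%.
var_1pct, es_1pct = var_es_from_rv(0.0001, p=0.01)
```

With these inputs the 1% VaR is roughly 2.33% and the ES roughly 2.67% of portfolio value, both expressed as positive loss numbers.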

3.3. Combining GARCH and RV

So far in Chapters 4 and 5 we have considered two seemingly very different approaches to volatility modeling: In Chapter 4 GARCH models were estimated on daily returns, and in Chapter 5 time-series models of daily RV have been constructed from intraday returns. We can instead try to incorporate the rich information in RV into a GARCH modeling framework. Consider the basic GARCH model from Chapter 4:
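$$R_{t+1} = \sigma_{t+1} z_{t+1}, \quad \text{where } \sigma^2_{t+1} = \omega + \alpha R_t^2 + \beta \sigma_t^2$$
Given the information on daily RV we could augment the GARCH model with RV as follows:
$$R_{t+1} = \sigma_{t+1} z_{t+1}, \quad \text{where } \sigma^2_{t+1} = \omega + \alpha R_t^2 + \beta \sigma_t^2 + \gamma RV^m_t$$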
This so-called GARCH-X model where RV is the explanatory variable can be estimated using the univariate MLE approach taken in Chapter 4.
A shortcoming of the GARCH-X approach is that a model for RV is not specified. This means that we cannot use the model to forecast volatility beyond one day ahead.
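The variance recursion of a GARCH-X type model can be sketched as below. This shows only the filtering step for given parameters, not the MLE estimation; the function name `garch_x_variance` and all parameter values are hypothetical.

```python
import numpy as np

def garch_x_variance(returns, rv, omega, alpha, beta, gamma, sigma2_init):
    """Run sigma2[t+1] = omega + alpha*R_t^2 + beta*sigma2[t] + gamma*RV_t.

    Returns an array with len(returns) + 1 entries; the last entry is the
    one-day-ahead variance forecast.
    """
    returns = np.asarray(returns, dtype=float)
    rv = np.asarray(rv, dtype=float)
    sigma2 = np.empty(len(returns) + 1)
    sigma2[0] = sigma2_init
    for t in range(len(returns)):
        sigma2[t + 1] = (omega + alpha * returns[t] ** 2
                         + beta * sigma2[t] + gamma * rv[t])
    return sigma2

# Two hypothetical days of returns and realized variances.
path = garch_x_variance(returns=[0.01, -0.02], rv=[1e-4, 2e-4],
                        omega=1e-6, alpha=0.05, beta=0.8, gamma=0.1,
                        sigma2_init=1e-4)
```

The recursion stops at one day ahead: a two-day forecast would need tomorrow's RV, which the model does not specify.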
The more general so-called Realized GARCH model is defined by
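$$R_{t+1} = \sigma_{t+1} z_{t+1}, \quad \text{where } \sigma^2_{t+1} = \omega + \alpha R_t^2 + \beta \sigma_t^2 + \gamma RV^m_t, \text{ and}$$
$$RV^m_t = \omega_{RV} + \beta_{RV}\,\sigma_t^2 + \varepsilon_t$$
where $\varepsilon_t$ is the innovation to RV. This model can be estimated by MLE when assuming that $R_t$ and $\varepsilon_t$ have a joint normal distribution. The Realized GARCH model can be augmented to include a leverage effect as well. In the Realized GARCH model the VaR and ES would simply be
$$VaR^p_{t+1} = -\sigma_{t+1}\Phi^{-1}_p$$
and
$$ES^p_{t+1} = \sigma_{t+1}\frac{\phi\left(\Phi^{-1}_p\right)}{p}$$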
as in the regular GARCH model.

4. Realized Variance Construction

So far we have assumed that a grid of highly liquid 1-minute prices is available so that the corresponding 1-minute log returns are informative about the true volatility of the asset price. However, once various forms of illiquidity in the asset price are considered it becomes clear that we need to be much more careful about constructing the RVs from the intraday returns. This section is devoted to the construction of unbiased daily RVs from intraday returns under realistic assumptions about market liquidity.

4.1. The All RV Estimator

Remember that in the ideal but unfortunately unrealistic case with ultra-high liquidity we have m = 24 ⋅ 60 observations available within a day, and we can calculate an estimate of the daily variance from the intraday squared returns simply as
$$RV^m_{t+1} = \sum_{j=1}^{m} R^2_{t+j/m}$$
This estimator is sometimes known as the All RV estimator because it uses all the prices on the 1-minute grid.
Figure 5.5 uses simulated data to illustrate one of the problems caused by illiquidity when estimating asset price volatility. We assume the fundamental (but unobserved) asset price, $S^{Fund}$, follows the simple random walk process with constant variance
$$\ln\left(S^{Fund}_{t+j/m}\right) = \ln\left(S^{Fund}_{t+(j-1)/m}\right) + e_{t+j/m}, \quad \text{with } e_{t+j/m} \overset{i.i.d.}{\sim} N\left(0, \sigma_e^2\right)$$
where $\sigma_e = 0.001$ in Figure 5.5. The observed price fluctuates randomly around the bid and ask quotes that are posted by the market maker. We observe
$$S^{Obs}_{t+j/m} = B_{t+j/m} I_{t+j/m} + A_{t+j/m}\left(1 - I_{t+j/m}\right)$$
where $B_{t+j/m}$ is the bid price, which we take to be the fundamental price rounded down to the nearest $1/10, and $A_{t+j/m}$ is the ask price, which is the fundamental price rounded up to the nearest $1/10. $I_{t+j/m}$ is an i.i.d. random variable, which takes the values 1 and 0 each with probability 1/2. $I_{t+j/m}$ is thus an indicator variable of whether the observed price is a bid or an ask price.
Figure 5.5
Fundamental price and quoted price with bid-ask bounces. Notes: We simulate a random walk for the fundamental log asset price (black) and add random noise from bid-ask bounces to get the observed price (red).
The challenge is that we observe $S^{Obs}_{t+j/m}$ but want to estimate $\sigma^2$, which is the variance of the unobserved $S^{Fund}_{t+j/m}$. Figure 5.5 shows that the observed intraday price can be very noisy compared with the smooth fundamental but unobserved price. The bid-ask spread adds a layer of noise on top of the fundamental price. If we compute $RV^m_{t+1}$ from the high-frequency $S^{Obs}_{t+j/m}$ then we will get an estimate of $\sigma^2$ that is higher than the true value because of the inclusion of the bid-ask volatility in the estimate.
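The bid-ask bounce experiment can be reproduced in a few lines. The sketch below follows the setup described for Figure 5.5 (random walk with a 1-minute standard deviation of 0.001, quotes on a $1/10 grid, bid or ask observed with probability 1/2); the starting price of 40 and the random seed are assumptions made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
m = 1440            # 1-minute observations in a 24-hour day
sigma_e = 0.001     # standard deviation of the fundamental 1-minute log return
s0 = 40.0           # hypothetical starting price

# Fundamental log price: random walk with constant variance.
shocks = rng.normal(0.0, sigma_e, size=m)
log_fund = np.log(s0) + np.concatenate(([0.0], np.cumsum(shocks)))
fund = np.exp(log_fund)

# Bid and ask: fundamental price rounded down and up to the nearest 1/10.
bid = np.floor(fund * 10.0) / 10.0
ask = np.ceil(fund * 10.0) / 10.0

# Indicator equal to 1 (bid observed) or 0 (ask observed), each with probability 1/2.
is_bid = rng.integers(0, 2, size=m + 1)
obs = np.where(is_bid == 1, bid, ask)

rv_fund = float(np.sum(np.diff(log_fund) ** 2))     # infeasible benchmark
rv_obs = float(np.sum(np.diff(np.log(obs)) ** 2))   # All RV on observed prices
```

In this setup `rv_obs` comes out well above `rv_fund`, because the bounce between bid and ask adds variation that has nothing to do with the fundamental price.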

4.2. The Sparse RV Estimator

The perhaps simplest way to address the problem shown in Figure 5.5 is to construct a grid of intraday prices and returns that are sampled less frequently than the 1-minute assumed earlier. Instead of a 1-minute grid we could use an s-minute grid (where s ≥ 1) so that our new RV estimator would be
$$RV^s_{t+1} = \sum_{j=1}^{m/s} R^2_{t+js/m}$$
which is sometimes denoted as the Sparse RV estimator as opposed to the previous All RV estimator.
Of course the important question is how to choose the parameter s? Should s be 5 minutes, 10 minutes, 30 minutes, or an even lower frequency? The larger the s the less likely we are to get a biased estimate of volatility, but the larger the s the fewer observations we are using and so the more noisy our estimate will be. We are faced with a typical variance-bias trade-off.
The choice of s clearly depends on the specific asset. For very liquid assets we should use an s close to 1 and for illiquid assets s should be much larger. If liquidity effects manifest themselves as a bias in the estimated RVs when using a high sampling frequency then that bias should disappear when the sampling frequency is lowered; that is, when s is increased.
The so-called volatility signature plots provide a convenient graphical tool for choosing s: First compute $RV^s_{t+1}$ for values of s going from 1 to 120 minutes. Second, scatter plot the average RV across days on the vertical axis against s on the horizontal axis. Third, look for the smallest s such that the average RV does not change much for values of s larger than this number.
In markets with wide bid–ask spreads the average RV in the volatility signature plot will be downward sloping for small s but for larger s the average RV will stabilize at the true long run volatility level. We want to choose the smallest s for which the average RV is stable. This will avoid bias and minimize variance.
In markets where trading is thin, new information is only slowly incorporated into the price, and intraday returns will have positive autocorrelation resulting in an upward sloping volatility signature plot. In this case, the rule of thumb for computing RV is again to choose the smallest s for which the average RV has stabilized.
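The numbers behind a volatility signature plot can be computed as in the sketch below; plotting the resulting curve against s is left to the reader. The helper names `sparse_rv` and `signature_curve` are hypothetical, and the data are simulated log prices with additive i.i.d. noise standing in for bid-ask effects.

```python
import numpy as np

def sparse_rv(log_prices, s, offset=0):
    """RV from every s-th point of a fine grid of log prices.

    Sampling starts at index `offset`; a final partial interval is dropped.
    """
    sampled = np.asarray(log_prices, dtype=float)[offset::s]
    return float(np.sum(np.diff(sampled) ** 2))

def signature_curve(daily_log_prices, s_values):
    """Average sparse RV across days for each sampling interval s."""
    return np.array([np.mean([sparse_rv(day, s) for day in daily_log_prices])
                     for s in s_values])

# Simulated days: random-walk log price plus i.i.d. noise (assumed magnitudes).
rng = np.random.default_rng(7)
m, n_days = 1440, 50
days = []
for _ in range(n_days):
    fundamental = np.cumsum(rng.normal(0.0, 0.0005, size=m + 1))
    days.append(fundamental + rng.normal(0.0, 0.0005, size=m + 1))

s_values = [1, 2, 5, 10, 15, 30, 60, 120]
curve = signature_curve(days, s_values)   # vertical axis of the signature plot
```

With noise of this kind the curve slopes downward for small s and flattens for larger s, which is the pattern described for markets with wide bid–ask spreads.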

4.3. The Average RV Estimator

Choosing a lower (sparse) frequency for the grid of intraday prices can solve the bias problem arising from illiquidity but it will also increase the noise of the RV estimator. When we are using sparse sampling we are essentially throwing away information, which seems wasteful. It turns out that there is an amazingly simple way to lower the noise of the Sparse RV estimator without increasing the bias.
Let us say that we have used the volatility signature plot to choose s = 15 in the Sparse RV so that we are using a 15-minute grid for prices and squared returns to compute RV. Note that if we have the original 1-minute grid (of less liquid prices) then we can actually compute 15 different (but overlapping) Sparse RV estimators. The first Sparse RV will use a 15-minute grid starting with the 15-minute return at midnight, call it $RV^{s,1}_{t+1}$; the second will also use a 15-minute grid but this one will be starting one minute past midnight, call it $RV^{s,2}_{t+1}$, and so on until the 15th Sparse RV, which uses a 15-minute grid starting at 14 minutes past midnight, call it $RV^{s,15}_{t+1}$. We are thus using the fine 1-minute grid to compute 15 Sparse RVs at the 15-minute frequency.
We have now used all the information on the 1-minute grid but we have used it to compute 15 different RV estimates, each based on 15-minute returns, and none of which are materially affected by illiquidity bias. By simply averaging the 15 sparse RVs we get the so-called Average RV estimator
$$RV^{Avr}_{t+1} = \frac{1}{s}\sum_{i=1}^{s} RV^{s,i}_{t+1}$$
In simulation studies and in practice this Average RV estimator has been found to perform very well. The RVs plotted in Figure 5.1 were computed using the Average RV estimator.
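A minimal sketch of the averaging idea follows. The function name `average_rv` is hypothetical, and the treatment of the day's edges is a simplification: subgrids that start later simply cover slightly less of the day, whereas a production implementation would need an explicit convention for the first and last few minutes.

```python
import numpy as np

def average_rv(log_prices, s):
    """Average of the s Sparse RVs computed on s-minute grids shifted by 1 minute."""
    log_prices = np.asarray(log_prices, dtype=float)
    sub_rvs = [np.sum(np.diff(log_prices[i::s]) ** 2) for i in range(s)]
    return float(np.mean(sub_rvs))

# Compare day-to-day noise of Sparse RV and Average RV on simulated random walks.
rng = np.random.default_rng(8)
m, s, n_days = 1440, 15, 500
sparse, averaged = [], []
for _ in range(n_days):
    lp = np.cumsum(rng.normal(0.0, 0.0005, size=m + 1))
    sparse.append(np.sum(np.diff(lp[::s]) ** 2))   # single 15-minute grid
    averaged.append(average_rv(lp, s))             # average over 15 shifted grids

noise_sparse = float(np.std(sparse))
noise_averaged = float(np.std(averaged))
```

Because the true daily variance is constant in this simulation, the standard deviation across days measures estimation noise only, and `noise_averaged` should come out below `noise_sparse`.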

4.4. RV Estimators with Autocovariance Adjustments

Instead of using sparse sampling to avoid RV bias we can try to model and then correct for the autocorrelations in intraday returns that are driving the volatility bias.
Assume that the fundamental log price is observed with an additive i.i.d. error term, u, caused by illiquidity so that
$$\ln\left(S^{Obs}_{t+j/m}\right) = \ln\left(S^{Fund}_{t+j/m}\right) + u_{t+j/m}$$
In this case the observed log return will equal the true fundamental returns plus an MA(1) error:
$$R^{Obs}_{t+j/m} = R^{Fund}_{t+j/m} + u_{t+j/m} - u_{t+(j-1)/m}$$
Due to the MA(1) measurement error our simple squared return All RV estimate will be biased.
The All RV in this case is defined by
$$RV^m_{t+1} = \sum_{j=1}^{m}\left(R^{Obs}_{t+j/m}\right)^2$$
Because the measurement error u has positive variance the $RV^m_{t+1}$ estimator will be biased upward in this case.
If we are fairly confident that the measurement error is of the MA(1) form then we know (see Chapter 3) that only the first-order autocorrelations are nonzero and we can therefore easily correct the RV estimator as follows:
$$RV^{AR(1)}_{t+1} = \sum_{j=1}^{m}\left(R^{Obs}_{t+j/m}\right)^2 + \sum_{j=2}^{m} R^{Obs}_{t+j/m} R^{Obs}_{t+(j-1)/m} + \sum_{j=1}^{m-1} R^{Obs}_{t+j/m} R^{Obs}_{t+(j+1)/m}$$
where we have added the cross products from the adjacent intraday returns. The negative autocorrelation arising from the bid–ask bounce in observed intraday returns will cause the last two terms in $RV^{AR(1)}_{t+1}$ to be negative and we will therefore get that
$$RV^{AR(1)}_{t+1} < RV^m_{t+1}$$
as desired.
Positive autocorrelation caused by slowly changing prices would be at least partly captured by the first-order autocorrelation as well. It would be positive in this case and we would have
$$RV^{AR(1)}_{t+1} > RV^m_{t+1}$$
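The first-order correction can be sketched as follows. The function name `rv_ar1_adjusted` is hypothetical and the noise magnitudes are assumptions. Note that the two cross-product sums contain the same terms, so the code adds the lag-one cross products twice; note also that an estimator of this form is not guaranteed to be positive on any single day.

```python
import numpy as np

def rv_ar1_adjusted(obs_returns):
    """All RV plus the cross products of adjacent intraday returns (both directions)."""
    r = np.asarray(obs_returns, dtype=float)
    all_rv = np.sum(r ** 2)
    cross = np.sum(r[1:] * r[:-1])      # sum over j of R_j * R_{j-1}
    return float(all_rv + 2.0 * cross)

# Simulated day: fundamental returns plus the difference of i.i.d. noise terms.
rng = np.random.default_rng(9)
m = 1440
fund_returns = rng.normal(0.0, 0.0005, size=m)
u = rng.normal(0.0, 0.0005, size=m + 1)            # measurement error in log prices
obs_returns = fund_returns + np.diff(u)            # MA(1) error in observed returns

all_rv = float(np.sum(obs_returns ** 2))
adj_rv = rv_ar1_adjusted(obs_returns)
true_var = m * 0.0005 ** 2                         # known only because we simulate
```

In this simulation `all_rv` is about three times `true_var`, while `adj_rv` lands much closer to it, at the cost of extra sampling noise from the cross-product terms.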
Much more general estimators have been developed to correct for more complex autocorrelation patterns in intraday returns. References to this work will be listed at the end of the chapter.

5. Data Issues

So far we have assumed the availability of a 1-minute grid of prices in a 24-hour market. But in reality several challenges arise. First, prices and quotes arrive randomly in time and not on a neat, evenly spaced grid. Second, markets are typically not open 24 hours per day. Third, intraday data sets are large and messy and often include price and quote errors that must be flagged and removed before estimating volatility. We deal with these three issues in turn as we continue.

5.1. Dealing with Irregularly Spaced Intraday Prices

The preceding discussion has assumed that a sample of regularly spaced 1-minute intraday prices is available. In practice, transaction prices or quotes arrive in random ticks over time and the evenly spaced price grid must be constructed from the raw ticks.
One of the following two methods is commonly used.
The first and simplest solution is to use the last tick prior to a grid point as the price observation for that grid point. This way the last observed tick price in an interval is effectively moved forward in time to the next grid point. Specifically, assume we have N observed tick prices during day t + 1 but that these are observed at irregular times $t(0), t(1), \ldots, t(N)$. Consider now the jth point on the evenly spaced grid of m points for day t + 1, which we have called $t + j/m$. Grid point $t + j/m$ will fall between two adjacent randomly spaced ticks, say the ith and the $(i+1)$th; that is, we have $t(i) < t + j/m < t(i+1)$ and in this case we choose the $t + j/m$ price to be
$$S_{t+j/m} = S_{t(i)}$$
The second and slightly less simple solution uses a linear interpolation between $S_{t(i)}$ and $S_{t(i+1)}$ so that we have
$$S_{t+j/m} = S_{t(i)} + \frac{(t + j/m) - t(i)}{t(i+1) - t(i)}\left[S_{t(i+1)} - S_{t(i)}\right]$$
While the linear interpolation method makes some intuitive sense it has poor limiting properties: The smoothing implicit in the linear interpolation makes the estimated RV go to zero in the limit. Therefore, using the most recent tick on each grid point has become standard practice.
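Both grid constructions can be sketched with NumPy. The function name `previous_tick_grid` and the toy tick data are hypothetical; times are measured in minutes since the open purely for illustration.

```python
import numpy as np

def previous_tick_grid(tick_times, tick_prices, grid_times):
    """Price at each grid point = last tick observed at or before that point."""
    tick_times = np.asarray(tick_times, dtype=float)
    tick_prices = np.asarray(tick_prices, dtype=float)
    grid_times = np.asarray(grid_times, dtype=float)
    # side="right" counts ticks with time <= grid time; minus 1 gives the last such tick.
    idx = np.searchsorted(tick_times, grid_times, side="right") - 1
    if np.any(idx < 0):
        raise ValueError("a grid point precedes the first tick")
    return tick_prices[idx]

# Hypothetical ticks (minutes since open) and a 1-minute grid.
tick_times = [0.0, 0.7, 1.9, 2.0, 4.5]
tick_prices = [10.0, 11.0, 12.0, 13.0, 14.0]
grid_times = [1.0, 2.0, 3.0, 4.0]

grid_prev = previous_tick_grid(tick_times, tick_prices, grid_times)
grid_interp = np.interp(grid_times, tick_times, tick_prices)   # linear interpolation
```

Here `grid_prev` is `[11, 13, 13, 13]`: the stale price 13 is carried forward until a new tick arrives, whereas `grid_interp` drifts smoothly toward the next tick and thereby uses information not yet available at the grid point.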

5.2. Choosing the Frequency of the Fine Grid of Prices

Notice that we still have to choose the frequency of the fine grid. We have used 1-minute as an example but this number is clearly also asset dependent. An asset with N = 2,000 new quotes on average per day should have a finer grid than an asset with N = 50 new quotes on average per day.
We ought to have at least one quote per interval on the fine grid. So we should definitely have that m < N. However, the distribution of quotes is typically very uneven throughout the day and so setting m close to N is likely to yield many intervals without new quotes. We can capture the distribution of quotes across time on each day by computing the standard deviation of $t(i+1) - t(i)$ across i on each day.
The total number of new quotes, N, will differ across days and so will the standard deviation of the quote time intervals. Looking at the descriptive statistics of N and the standard deviation of quote time intervals across days is likely to yield useful guidance on the choice of m.

5.3. Dealing with Data Gaps from Overnight Market Closures

In risk management we are typically interested in the volatility of 24-hour returns (the return from the close of day t to the close on day t + 1) even if the market is only open, say, 8 hours per day. In Chapter 4 we estimated GARCH models on daily returns from closing prices. The volatility forecasts from GARCH are therefore by construction 24-hour return volatilities.
If we care about 24-hour return volatility and we only have intraday returns from market open to market close, then the RV measure computed on intraday returns, call it $RV^{Open}_{t+1}$, must be adjusted for the return in the overnight gap from close on day t to open on day t + 1. There are three ways to make this adjustment.
First, we can simply scale up the market-open RV measure using the unconditional variance estimated from daily squared returns:
$$RV^{24H}_{t+1} = \left(\frac{\sum_{t=1}^{T} R_t^2}{\sum_{t=1}^{T} RV^{Open}_t}\right) RV^{Open}_{t+1}$$
Second, we can add to $RV^{Open}_{t+1}$ the squared return constructed from the close on day t to the open on day t + 1:
$$RV^{24H}_{t+1} = \left(\ln\left(S^{Open}_{t+1} / S^{Close}_t\right)\right)^2 + RV^{Open}_{t+1}$$
Notice that this sum puts equal weight on the two terms and thus a relatively high weight on the close-to-open gap for which little information is available. Note also that $S^{Close}_t$ is simply the daily price observation that we denoted $S_t$ in the previous chapters.
A third, but more cumbersome approach is to find optimal weights for the two terms. This can be done by minimizing the variance of the $RV^{24H}_{t+1}$ estimator subject to having a bias of zero.
When computing optimal weights typically a much larger weight is found for $RV^{Open}_{t+1}$ than for $\left(\ln\left(S^{Open}_{t+1} / S^{Close}_t\right)\right)^2$. This suggests that scaling up the $RV^{Open}_{t+1}$ may be the better of the two first approaches to correcting for the overnight gap.
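The first two adjustments can be sketched as below. The function names `rv_24h_scaled` and `rv_24h_added` and all input numbers are hypothetical; the third, optimal-weights approach is not implemented here.

```python
import numpy as np

def rv_24h_scaled(rv_open, daily_returns):
    """Scale open-to-close RV so its sample total matches that of squared daily returns."""
    rv_open = np.asarray(rv_open, dtype=float)
    daily_returns = np.asarray(daily_returns, dtype=float)
    scale = np.sum(daily_returns ** 2) / np.sum(rv_open)
    return scale * rv_open

def rv_24h_added(rv_open, open_prices, prev_close_prices):
    """Add the squared close-to-open log return to the open-to-close RV."""
    overnight = np.log(np.asarray(open_prices, dtype=float)
                       / np.asarray(prev_close_prices, dtype=float))
    return overnight ** 2 + np.asarray(rv_open, dtype=float)

# Two hypothetical days.
rv_open = [1e-4, 2e-4]
daily_returns = [0.02, -0.01]          # close-to-close returns
scaled = rv_24h_scaled(rv_open, daily_returns)
added = rv_24h_added(rv_open, open_prices=[101.0, 99.5],
                     prev_close_prices=[100.0, 100.0])
```

The scaling factor is estimated once over the whole sample, so it cannot react to a particular night's news; the additive version reacts fully but inherits the noise of a single squared return.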

5.4. Alternative RV Estimators Using Tick-by-Tick Data

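There is an alternative set of RV estimators that avoid the construction of a time grid altogether and instead work directly with the irregularly spaced tick-by-tick data. Let the ith tick return on day t + 1 be defined by
$$R_{t+\tau(i)} = \ln\left( S_{t+\tau(i)} \right) - \ln\left( S_{t+\tau(i-1)} \right)$$
where $\tau(i)$ denotes the time of the ith tick. Then the tick-based RV estimator is defined by
$$RV^{Tick}_{t+1} = \sum_{i=1}^{N(t+1)} R^2_{t+\tau(i)}$$
where $N(t+1)$ is the number of tick returns available on day t + 1.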
Notice that tick-time sampling avoids sampling the same observation multiple times, which could happen on a fixed grid if the grid is too fine compared with the number of available intraday prices.
The preceding simple tick-based RV estimator can be extended by allowing for autocorrelation in the tick-time returns.
The optimality of grid-based versus tick-based RV estimators depends on the structure of the market for the asset and on its liquidity. The majority of academic research relies on grid-based RV estimators.
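To make the contrast between tick-time and grid-based sampling concrete, the following sketch computes both on a handful of made-up ticks. The function names and numbers are hypothetical, and the grid version assumes previous-tick sampling (the last price observed at or before each grid point), which is one common convention rather than necessarily the one used elsewhere in this chapter.

```python
import math

def tick_rv(tick_prices):
    """Sum of squared log returns between consecutive ticks of one trading day."""
    logs = [math.log(p) for p in tick_prices]
    return sum((b - a) ** 2 for a, b in zip(logs, logs[1:]))

def grid_rv(times, prices, grid):
    """RV on a fixed grid, sampling the last price observed at or before each grid point."""
    sampled = []
    j = 0
    last = None
    for g in grid:
        while j < len(times) and times[j] <= g:
            last = prices[j]
            j += 1
        if last is not None:
            sampled.append(last)
    return tick_rv(sampled)

# Illustrative ticks: seconds since the open and traded prices (made-up numbers).
times = [0, 70, 400, 410, 900]
prices = [100.00, 100.10, 100.05, 100.20, 100.15]
grid = list(range(0, 901, 60))  # a 1-minute grid, much finer than the tick arrivals

rv_tick = tick_rv(prices)
rv_grid = grid_rv(times, prices, grid)
print(f"tick RV: {rv_tick:.3e}, grid RV: {rv_grid:.3e}")
```

On this toy sample most grid returns are exactly zero because the same price is sampled repeatedly, and the tick at 400 seconds is never sampled because a later tick arrives before the next grid point.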

5.5. Price and Quote Data Errors

The construction of the intraday price grid is perhaps the most challenging task when estimating and forecasting volatility using realized variance. The raw intraday price data contains observations randomly spaced in time and the sheer volume of data can be enormous when investigating many assets over long time periods.
The construction of the grid of prices is complicated by the presence of data errors. A data error is broadly defined as a quoted price that does not conform to the real situation of the market. Price data errors could take several forms:
• Decimal errors; for example, when a bid price changes from 1.598 to 1.603 but a 1.503 is reported instead of 1.603.
• Test quotes: These are quotes sent by a contributor at early mornings or at other inactive times to test the system. They can be difficult to catch since the prices may look plausible.
• Repeated ticks: These are sent automatically by contributors. If sent frequently, then they can obstruct the filtering of a few informative quotes sent by other contributors.
• Tick copying: Contributors automatically copy and resend quotes of other contributors to show a strong presence in the market. Sometimes random error is added so as to hide the copying aspect.
• Scaling problems: The scale of the price of an asset may differ by contributor and it may change over time without notice.
Given the size of intraday data sets it is impossible to manually check for errors. Automated filters must be developed to catch errors of the type just listed. The challenges of filtering intraday data have created a new business for data vendors. OlsenData.com and TickData.com are examples of data vendors that sell filtered as well as raw intraday data.
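As a rough illustration of what an automated filter might look like, the sketch below handles two of the error types listed above: repeated ticks from the same contributor and isolated decimal errors. It is a toy example, not the filter used by any data vendor. The function names, the window length, and the 2% threshold are all assumptions chosen for illustration.

```python
from statistics import median

def drop_repeated_ticks(ticks):
    """Remove a tick when the same contributor resends an unchanged price."""
    last_price = {}
    kept = []
    for contributor, price in ticks:
        if last_price.get(contributor) != price:
            kept.append((contributor, price))
        last_price[contributor] = price
    return kept

def flag_outliers(prices, window=5, max_rel_dev=0.02):
    """Flag prices more than max_rel_dev away from the median of nearby prices.

    Assumes at least two prices so that every price has a neighbour.
    """
    half = window // 2
    flags = []
    for i, p in enumerate(prices):
        lo, hi = max(0, i - half), min(len(prices), i + half + 1)
        neighbours = prices[lo:i] + prices[i + 1:hi]
        flags.append(abs(p / median(neighbours) - 1.0) > max_rel_dev)
    return flags

# The decimal error from the text: 1.503 is reported instead of 1.603.
bids = [1.597, 1.598, 1.598, 1.503, 1.604, 1.604, 1.605]
flags = flag_outliers(bids)
clean = [p for p, bad in zip(bids, flags) if not bad]

ticks = [("A", 1.598), ("B", 1.599), ("B", 1.599), ("A", 1.603), ("B", 1.599)]
kept = drop_repeated_ticks(ticks)
print(clean, kept)
```

Test quotes, tick copying with added noise, and scaling changes are harder to detect and generally need contributor-level and time-of-day information that this sketch does not use.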

6. Range-Based Volatility Modeling

The construction of daily realized volatilities relies on the availability of intraday prices on relatively liquid assets. For markets that are not liquid, or for assets where historical information on intraday prices is not available, the intraday range presents a convenient alternative.
The intraday price range is based on the intraday high and intraday low price. Casual browsing of the web (see for example finance.yahoo.com) reveals that these intraday high and low prices are easily available for many assets far back in time. Range-based variance proxies are therefore easily computed.

6.1. Range-Based Proxies for Volatility

Let us define the range of the log prices to be
$$D_t = \ln\left( S^{High}_t \right) - \ln\left( S^{Low}_t \right)$$
where $S^{High}_t$ and $S^{Low}_t$ are the highest and lowest prices observed during day t.
We can show that if the log return on the asset is normally distributed with zero mean and variance $\sigma^2$, then the expected value of the squared range is
$$E\left[ D_t^2 \right] = 4 \ln(2)\, \sigma^2$$
A natural range-based estimate of volatility is therefore
$$\sigma^2 = \frac{1}{4 \ln(2)} \frac{1}{T} \sum_{t=1}^{T} D_t^2$$
The range-based estimate of variance is simply a constant times the average squared range. The constant is $1 / (4 \ln(2)) \approx 0.361$.
The range-based estimate of unconditional variance suggests that a range proxy for the daily variance can be constructed as
$$RP_t = \frac{1}{4 \ln(2)} D_t^2 \approx 0.361\, D_t^2$$
The top panel of Figure 5.6 plots $RP_t$ for the S&P 500 data.
Figure 5.6 (image missing). Range-based variance proxy (top) and squared returns (bottom). Notes: We use the daily range proxy for variance computed from the intraday high and low prices (top panel) and the daily close-to-close squared returns (bottom panel).
Notice how much less noisy the range is than the daily squared returns that are shown in the bottom panel.
Figure 5.7 shows the autocorrelation of $RP_t$ in the top panel. The first-order autocorrelation in the range-based variance proxy is around 0.60 (top panel) whereas it is only half of that in the squared-return proxy (bottom panel). Furthermore, the range-based autocorrelations are much smoother and thus give a much more reliable picture of the persistence in variance than do the squared returns in the bottom panel.
Figure 5.7 (image missing). Autocorrelation of the range-based variance proxy (top) and autocorrelation of squared returns (bottom) with Bartlett standard errors (dashed). Notes: We compute autocorrelations from the daily range proxy for variance computed using the intraday high and low prices (top panel) and from the daily close-to-close squared returns (bottom panel) using the S&P 500 index.
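This range-based volatility proxy does not make use of the daily open and close prices, which are also easily available and which also contain information about the 24-hour volatility. Assuming again that the asset log returns are normally distributed with zero mean and variance $\sigma^2$, then a more accurate range-based proxy can be derived as
$$RP_t = \frac{1}{2} \left[ \ln\left( S^{High}_t / S^{Low}_t \right) \right]^2 - \left( 2 \ln(2) - 1 \right) \left[ \ln\left( S^{Close}_t / S^{Open}_t \right) \right]^2$$
In the more general case where the mean return is not assumed to be zero the following range-based volatility proxy is available:
$$RP_t = \ln\left( S^{High}_t / S^{Open}_t \right) \left[ \ln\left( S^{High}_t / S^{Open}_t \right) - \ln\left( S^{Close}_t / S^{Open}_t \right) \right] + \ln\left( S^{Low}_t / S^{Open}_t \right) \left[ \ln\left( S^{Low}_t / S^{Open}_t \right) - \ln\left( S^{Close}_t / S^{Open}_t \right) \right]$$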
All of these proxies are derived assuming that the true variance is constant, so that, for example, 30 days of high, low, open, and close information can be used to estimate the (constant) volatility for that period. We instead want to use the range-based proxies as input into a dynamic forecasting model for volatility in line with the GARCH models in Chapter 4 and the HAR models in this chapter.
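The three daily proxies discussed in this section can be computed from a single day's open, high, low, and close. The sketch below uses the standard textbook forms of the Parkinson (1980), Garman and Klass (1980), and Rogers and Satchell (1991) estimators. The function names and the example prices are hypothetical.

```python
import math

PARKINSON_CONST = 1.0 / (4.0 * math.log(2.0))  # roughly 0.361

def parkinson(high, low):
    """Range proxy using only the high and low: constant times the squared log range."""
    d = math.log(high) - math.log(low)
    return PARKINSON_CONST * d * d

def garman_klass(open_, high, low, close):
    """Zero-mean proxy that also uses the open and close prices."""
    hl = math.log(high / low)
    co = math.log(close / open_)
    return 0.5 * hl ** 2 - (2.0 * math.log(2.0) - 1.0) * co ** 2

def rogers_satchell(open_, high, low, close):
    """Proxy that does not assume a zero mean return."""
    h = math.log(high / open_)
    l = math.log(low / open_)
    c = math.log(close / open_)
    return h * (h - c) + l * (l - c)

# One illustrative day of prices (made-up numbers).
o, h, l, c = 100.0, 101.2, 99.1, 100.6
rp_parkinson = parkinson(h, l)
rp_gk = garman_klass(o, h, l, c)
rp_rs = rogers_satchell(o, h, l, c)
print(f"{rp_parkinson:.3e} {rp_gk:.3e} {rp_rs:.3e}")
```

Each function returns a daily variance proxy, so taking the square root gives a daily volatility in the same units as the log return.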

6.2. Forecasting Volatility Using the Range

Perhaps the simplest approach to using $RP_t$ in a forecasting model is to use it in place of RV in the earlier AR and HAR models. Although $RP_t$ may be more noisy than RV, the HAR approach should yield good forecasting results because the HAR model structure imposes a lot of smoothing.
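Several studies have found that the log range is close to normally distributed as follows:
$$\ln\left( RP_t \right) \sim N\left( \mu_{RP}, \sigma^2_{RP} \right)$$
Recall that RV in logs is also close to normally distributed, as we saw in Figure 5.3.
The strong persistence of the range as well as the log normal property suggest a log HAR model of the form
$$\ln\left( RP_{t+1} \right) = \phi_0 + \phi_D \ln\left( RP_{D,t} \right) + \phi_W \ln\left( RP_{W,t} \right) + \phi_M \ln\left( RP_{M,t} \right) + \varepsilon_{t+1}$$
where we have that
$$RP_{D,t} \equiv RP_t, \qquad RP_{W,t} \equiv \left[ RP_{t-4} + RP_{t-3} + RP_{t-2} + RP_{t-1} + RP_t \right] / 5, \qquad RP_{M,t} \equiv \left[ RP_{t-20} + RP_{t-19} + \cdots + RP_t \right] / 21$$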
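A log HAR regression of this kind can be estimated by ordinary least squares. The sketch below is a minimal illustration on simulated data, not an estimate from the chapter's data set. It assumes the weekly and monthly regressors are 5-day and 21-day trailing averages of the proxy and that logs are taken of those averages. The function names and the simulated AR(1) process are hypothetical.

```python
import numpy as np

def har_regressors(x):
    """Daily, weekly (5-day) and monthly (21-day) trailing means, aligned from day 20 on."""
    x = np.asarray(x, dtype=float)
    weekly = np.convolve(x, np.ones(5) / 5, mode="valid")     # entry k is the mean ending on day k + 4
    monthly = np.convolve(x, np.ones(21) / 21, mode="valid")  # entry k is the mean ending on day k + 20
    return x[20:], weekly[16:], monthly

def fit_log_har(rp):
    """OLS of ln(RP_{t+1}) on a constant and logs of the day-t daily, weekly, monthly terms."""
    daily, weekly, monthly = har_regressors(rp)
    X = np.column_stack([
        np.ones(len(daily) - 1),
        np.log(daily[:-1]),
        np.log(weekly[:-1]),
        np.log(monthly[:-1]),
    ])
    y = np.log(daily[1:])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1.0 - resid.var() / y.var()
    return coef, r2

# Simulated persistent log-variance proxy (illustrative only).
rng = np.random.default_rng(0)
log_rp = np.empty(1000)
log_rp[0] = -9.0
for t in range(1, 1000):
    log_rp[t] = -0.9 + 0.9 * log_rp[t - 1] + 0.3 * rng.standard_normal()
rp = np.exp(log_rp)

coef, r2 = fit_log_har(rp)
print("phi_0, phi_D, phi_W, phi_M:", np.round(coef, 3), "R2:", round(r2, 3))
```

The same function can be applied to a series of realized variances instead of range proxies, which makes it easy to compare the regression fit across the two measures.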
The range-based proxy can also be used as a regressor in GARCH-X models, for example
$$R_{t+1} = \sigma_{t+1} z_{t+1}, \quad \text{where} \quad \sigma^2_{t+1} = \omega + \alpha R_t^2 + \beta \sigma_t^2 + \gamma RP_t$$
A purely range-based model can be defined as
$$R_{t+1} = \sigma_{t+1} z_{t+1}, \quad \text{where} \quad \sigma^2_{t+1} = \omega + \beta \sigma_t^2 + \gamma RP_t$$
Finally, a Realized-GARCH style model (let us call it Range-GARCH) can be defined via
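$$R_{t+1} = \sigma_{t+1} z_{t+1}, \quad \text{where} \quad \sigma^2_{t+1} = \omega + \alpha R_t^2 + \beta \sigma_t^2 + \gamma RP_t, \quad \text{and} \quad RP_t = \omega_{RP} + \beta_{RP} \sigma_t^2 + \varepsilon_t$$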
The Range-GARCH model can be estimated using bivariate maximum likelihood techniques using historical data on return, $R_t$, and on the range proxy, $RP_t$.
ES and VaR can be constructed in the RP-based models in the same way as in the RV-based models by assuming that $z_{t+1}$ is i.i.d. normal where $R_{t+1} = \sigma_{t+1} z_{t+1}$ in the GARCH-style models or $R_{t+1} = \sqrt{RP_{t+1|t}}\, z_{t+1}$ in the HAR model, with $RP_{t+1|t}$ denoting the HAR forecast of $RP_{t+1}$.
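The following sketch shows how a GARCH-style variance recursion with the range proxy as an extra regressor could feed a normal VaR and ES calculation. It assumes the recursion adds a term gamma times the range proxy to a standard GARCH(1,1) update. The parameter values are illustrative placeholders rather than estimates, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def garch_x_variance(returns, rp, omega, alpha, beta, gamma, sigma2_init):
    """Run sigma2_{t+1} = omega + alpha*R_t^2 + beta*sigma2_t + gamma*RP_t forward."""
    sigma2 = [sigma2_init]
    for r, x in zip(returns, rp):
        sigma2.append(omega + alpha * r * r + beta * sigma2[-1] + gamma * x)
    return sigma2  # the last element is the forecast for the day after the sample

def normal_var_es(sigma2_next, p=0.01):
    """One-day VaR and ES (reported as positive numbers) under i.i.d. normal shocks."""
    sigma = math.sqrt(sigma2_next)
    z = NormalDist().inv_cdf(p)
    var = -sigma * z
    es = sigma * NormalDist().pdf(z) / p
    return var, es

# Illustrative inputs (assumptions, not estimates from the chapter).
returns = [0.004, -0.012, 0.007, -0.003]
rp = [0.00005, 0.00016, 0.00008, 0.00004]
path = garch_x_variance(returns, rp, omega=1e-6, alpha=0.05, beta=0.85,
                        gamma=0.08, sigma2_init=1e-4)
var_1pct, es_1pct = normal_var_es(path[-1], p=0.01)
print(f"sigma2 forecast: {path[-1]:.3e}, VaR: {var_1pct:.4f}, ES: {es_1pct:.4f}")
```

Setting alpha to zero gives a purely range-driven recursion, so the same function covers both variants.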

6.3. Range-Based versus Realized Variance

There is convincing empirical evidence that for very liquid securities the RV modeling approach is useful for risk management purposes. The intuition is that using the intraday returns gives a very reliable estimate of today's variance, which in turn helps forecast tomorrow's variance. In standard GARCH models on the other hand, today's variance is implicitly calculated using exponentially declining weights on many past daily squared returns, where the exact weighting scheme depends on the estimated parameters. Thus the GARCH estimate of today's variance is heavily model dependent, whereas the realized variance for today is calculated exclusively from today's squared intraday returns. When forecasting the future, knowing where you are today is key. Unfortunately in variance forecasting, knowing where you are today is not a trivial matter since variance is not directly observable.
While the realized variance approach has clear advantages it also has certain shortcomings. First of all it clearly requires high-quality intraday returns to be feasible. Second, it is very easy to calculate daily realized volatilities from 5-minute returns, but it is not at all a trivial matter to construct a 10-year data set of 5-minute returns.
Figure 5.5 illustrates that the observed intraday price can be quite noisy compared with the fundamental but unobserved price. Therefore, realized variance measures based on intraday returns can be noisy as well. This is especially true for securities with wide bid–ask spreads and infrequent trading. Notice on the other hand that the range-based variance measure discussed earlier is relatively immune to the market microstructure noise. The true maximum can easily be calculated as the observed maximum less one half of the bid–ask spread, and the true minimum as the observed minimum plus one half of the bid–ask spread. The range-based variance measure thus has clear advantages in less liquid markets.
In the absence of trading imperfections, however, range-based variance proxies can be shown to be only about as useful as 4-hour intraday returns. Furthermore, as we shall see in Chapter 7, the idea of realized variance extends directly to realized covariance and correlation, whereas the range-based covariance and correlation measures are less obvious.

7. GARCH Variance Forecast Evaluation Revisited

In the previous chapter we briefly introduced regressions using daily squared returns to evaluate the GARCH model forecasts. But we quickly argued that daily squared returns are too noisy to proxy for observed daily variance. In this chapter we have developed more informative proxies based on RV and RP and they should clearly be useful for variance forecast evaluation.
The realized variance measure can be used instead of the squared return for evaluating the forecasts from variance models. If only squared returns are available then we can run the regression
$$R^2_{t+1} = b_0 + b_1 \sigma^2_{t+1} + e_{t+1}$$
where $\sigma^2_{t+1}$ is the forecast from the GARCH model.
If we have RV-based estimates available then we would instead run the regression
$$RV^{Avr}_{t+1} = b_0 + b_1 \sigma^2_{t+1} + e_{t+1}$$
where we have used the Average RV estimator as an example.
The range-based proxy could of course also be used instead of the squared return for evaluating the forecasts from variance models. Thus we could run the regression
$$RP_{t+1} = b_0 + b_1 \sigma^2_{t+1} + e_{t+1}$$
where $RP_{t+1}$ can be constructed for example using
$$RP_{t+1} = \frac{1}{4 \ln(2)} D^2_{t+1}$$
Using $R^2_{t+1}$ on the left-hand side of these regressions is likely to yield the finding that the volatility forecast is poor. The fit of the regression will be low but notice that this does not necessarily mean that the volatility forecast is poor. It could also mean that the volatility proxy is poor. If regressions using $RV^{Avr}_{t+1}$ or $RP_{t+1}$ yield a much better fit than the regression using $R^2_{t+1}$ then the volatility forecast is much better than suggested by the noisy squared-return proxy.
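The point that a low regression fit can reflect a poor proxy rather than a poor forecast is easy to see in a small simulation. The sketch below is an illustration under strong assumptions: the variance forecast is taken to be exactly right, the squared return is the true variance times one squared standard normal shock, and a stand-in for RV is the true variance times the average of 78 squared standard normal shocks (roughly 5-minute returns over 6.5 hours). All names and numbers are hypothetical.

```python
import math
import random

def evaluation_regression(proxy, forecast):
    """OLS of a variance proxy on a constant and the variance forecast: (b0, b1, R2)."""
    n = len(proxy)
    mx = sum(forecast) / n
    my = sum(proxy) / n
    sxx = sum((x - mx) ** 2 for x in forecast)
    sxy = sum((x - mx) * (y - my) for x, y in zip(forecast, proxy))
    syy = sum((y - my) ** 2 for y in proxy)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    r2 = sxy * sxy / (sxx * syy)
    return b0, b1, r2

random.seed(1)
# A perfect forecast: the true daily variance, lognormally distributed around 1e-4.
forecast = [1e-4 * math.exp(random.gauss(0.0, 0.5)) for _ in range(2000)]
# Noisy proxy: daily squared return.
sq_ret = [s2 * random.gauss(0.0, 1.0) ** 2 for s2 in forecast]
# Less noisy proxy: stand-in for RV built from 78 intraday squared shocks.
rv = [s2 * sum(random.gauss(0.0, 1.0) ** 2 for _ in range(78)) / 78 for s2 in forecast]

b0_sq, b1_sq, r2_sq = evaluation_regression(sq_ret, forecast)
b0_rv, b1_rv, r2_rv = evaluation_regression(rv, forecast)
print(f"squared returns: b1={b1_sq:.2f}, R2={r2_sq:.2f}")
print(f"RV stand-in:     b1={b1_rv:.2f}, R2={r2_rv:.2f}")
```

Even though the forecast is the same in both regressions, the fit is far lower with squared returns on the left-hand side, which is exactly why the choice of proxy matters for forecast evaluation.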

8. Summary

Realized volatility and range-based volatility are likely to be much more informative about daily volatility than is the daily squared return. This fact has important implications for the evaluation of volatility forecasts but it has even more important implications for volatility forecast construction. If intraday information is available then it should be used to construct more accurate volatility forecasts than those that can be constructed from daily returns alone. This chapter has introduced a number of practical approaches to volatility forecasting using intraday information.

Further Resources

The classic references on realized volatility include Andersen et al. (2001, 2003) and Barndorff-Nielsen and Shephard (2002). See the survey in Andersen et al. (2010) for a thorough literature review.
The HAR model for RV was developed in Corsi (2009) and has been used in Andersen et al. (2007b) among others. Engle (2002) suggested RV in the GARCH-X model and the Realized GARCH model was developed in Hansen et al. (2011). See also the HEAVY model in Shephard and Sheppard (2010).
The crucial impact on RV of liquidity and market microstructure effects more generally has been investigated in Andersen et al. (2011), Bandi and Russell (2006) and Ait-Sahalia and Mancini (2008).
The choice of sampling frequency has been analyzed by Ait-Sahalia et al. (2005) and Bandi and Russell (2008). The volatility signature plot was suggested in Andersen et al. (1999). The Average RV estimator is discussed in Zhang et al. (2005). The RV estimates corrected for return autocorrelations were developed by Zhou (1996), Barndorff-Nielsen et al. (2008) and Hansen and Lunde (2006).
The use of RV in volatility forecast evaluation was pioneered by Andersen and Bollerslev (1998). See also Andersen et al. (2004, 2005) and Patton (2011).
The use of RV in risk management is discussed in Andersen et al. (2007a) and the use of RV in portfolio allocation is developed in Bandi et al. (2008) and Fleming et al. (2003).
For treating overnight gaps see Hansen and Lunde (2005) and for data issues in RV construction see Brownlees and Gallo (2006), Muller (2001) and Dacorogna et al. (2001).
Range-based estimates of variance are introduced in Parkinson (1980) and Garman and Klass (1980), and more recent contributions include Rogers and Satchell (1991) and Yang and Zhang (2000). Range-based models of dynamic variance are developed in Alizadeh et al. (2002), Brandt and Jones (2006) and Chou (2005) and they are surveyed in Chou et al. (2009). Brandt and Jones (2006) use the range rather than the squared return as the fundamental innovation in an EGARCH model and find that the range improves the model's variance forecasts significantly.
References
Ait-Sahalia, Y.; Mancini, L., Out of sample forecasts of quadratic variation, J. Econom. 147 (2008) 17–33.
Ait-Sahalia, Y.; Mykland, P.A.; Zhang, L., How often to sample a continuous-time process in the presence of market microstructure noise, Rev. Financ. Stud. 18 (2005) 351–416.
Andersen, T.G.; Bollerslev, T., Answering the skeptics: Yes, standard volatility models do provide accurate forecasts, Int. Econ. Rev. 39 (1998) 885–905.
Andersen, T.; Bollerslev, T.; Christoffersen, P.; Diebold, F.X., Practical volatility and correlation modeling for financial market risk management, In: (Editors: Carey, M.; Stulz, R.) The NBER Volume on Risks of Financial Institutions (2007) University of Chicago Press, Chicago, IL.
Andersen, T.G.; Bollerslev, T.; Diebold, F.X., Roughing it up: Including jump components in the measurement, modeling and forecasting of return volatility, Rev. Econ. Stat. 89 (2007) 701–720.
Andersen, T.G.; Bollerslev, T.; Diebold, F.X., Parametric and nonparametric measurements of volatility, In: (Editors: Ait-Sahalia, Y.; Hansen, L.P.) Handbook of Financial Econometrics (2010) North-Holland, Amsterdam, The Netherlands, pp. 67–138.
Andersen, T.G.; Bollerslev, T.; Diebold, F.X.; Labys, P., (Understanding, optimizing, using and forecasting) Realized volatility and correlation, published in revised form as “Great Realizations,” Risk, March 2000 (1999) 105–108.
Andersen, T.G.; Bollerslev, T.; Diebold, F.X.; Labys, P., The distribution of exchange rate volatility, J. Am. Stat. Assoc. 96 (2001) 42–55.
Andersen, T.G.; Bollerslev, T.; Diebold, F.X.; Labys, P., Modeling and forecasting realized volatility, Econometrica 71 (2003) 579–625.
Andersen, T.G.; Bollerslev, T.; Meddahi, N., Analytic evaluation of volatility forecasts, Int. Econ. Rev. 45 (2004) 1079–1110.
Andersen, T.G.; Bollerslev, T.; Meddahi, N., Correcting the errors: Volatility forecast evaluation using high-frequency data and realized volatilities, Econometrica 73 (2005) 279–296.
Andersen, T.; Bollerslev, T.; Meddahi, N., Realized volatility forecasting and market microstructure noise, J. Econom. 160 (2011) 220–234.
Alizadeh, S.; Brandt, M.; Diebold, F.X., Range-based estimation of stochastic volatility models, J. Finance 57 (2002) 1047–1091.
Bandi, F.; Russell, J., Separating microstructure noise from volatility, J. Financ. Econ. 79 (2006) 655–692.
Bandi, F.; Russell, J., Microstructure noise, realized volatility, and optimal sampling, Rev. Econ. Stud. 75 (2008) 339–369.
Bandi, F.; Russell, J.; Zhu, Y., Using high-frequency data in dynamic portfolio choice, Econom. Rev. 27 (2008) 163–198.
Barndorff-Nielsen, O.E.; Hansen, P.; Lunde, A.; Shephard, N., Designing realised kernels to measure the ex-post variation of equity prices in the presence of noise, Econometrica 76 (2008) 1481–1536.
Barndorff-Nielsen, O.E.; Shephard, N., Econometric analysis of realised volatility and its use in estimating stochastic volatility models, J. Royal Stat. Soc. B 64 (2002) 253–280.
Brandt, M.; Jones, C., Volatility forecasting with range-based EGARCH models, J. Bus. Econ. Stat. 24 (2006) 470–486.
Brownlees, C.; Gallo, G.M., Financial econometric analysis at ultra-high frequency: Data handling concerns, Comput. Stat. Data Anal. 51 (2006) 2232–2245.
Chou, R., Forecasting financial volatilities with extreme values: The conditional autoregressive range, J. Money Credit Bank. 37 (2005) 561–582.
Chou, R.; Chou, H.-C.; Liu, N., Range volatility models and their applications in finance, In: (Editors: Lee, C.-F.; Lee, J.) Handbook of Quantitative Finance and Risk Management (2009) Springer, New York, NY, pp. 1273–1281.
Corsi, F., A simple approximate long memory model of realized volatility, J. Financ. Econom. 7 (2009) 174–196.
Dacorogna, M.; Gencay, R.; Muller, U.; Olsen, R.; Pictet, O., An Introduction to High-Frequency Finance. (2001) Academic Press, San Diego, CA.
Engle, R., New frontiers in ARCH models, J. Appl. Econom. 17 (2002) 425–446.
Fleming, J.; Kirby, C.; Ostdiek, B., The economic value of volatility timing using ‘realized’ volatility, J. Financ. Econ. 67 (2003) 473–509.
Garman, M.; Klass, M., On the estimation of securities price volatilities from historical data, J. Bus. 53 (1980) 67–78.
Hansen, P.; Huang, Z.; Shek, H., Realized GARCH: A joint model for returns and realized measures of volatility, J. Appl. Econom. (2011) forthcoming.
Hansen, P.; Lunde, A., A realized variance for the whole day based on intermittent high-frequency data, J. Financ. Econom. 3 (2005) 525–554.
Hansen, P.; Lunde, A., Realized variance and market microstructure noise, J. Bus. Econ. Stat. 24 (2006) 127–161.
Koopman, S.J.; Jungbacker, B.; Hol, E., Forecasting daily variability of the S&P 100 stock index using historical, realized and implied volatility measures, J. Empir. Finance 12 (2005) 445–475.
Maheu, J.; McCurdy, T., Do high-frequency measures of volatility improve forecasts of return distributions? J. Econom. 160 (2011) 69–76.
Martens, M., Measuring and forecasting S&P 500 index futures volatility using high-frequency data, J. Futures Mark. 22 (2002) 497–518.
Muller, U., The Olsen filter for data in finance. (2001) Working paper, O&A Research Group; Available from: http://www.olsendata.com.
Parkinson, M., The extreme value method for estimating the variance of the rate of return, J. Bus. 53 (1980) 61–65.
Patton, A., Volatility forecast comparison using imperfect volatility proxies, J. Econom. 160 (2011) 246–256.
Pong, S.; Shackleton, M.B.; Taylor, S.J.; Xu, X., Forecasting currency volatility: A comparison of implied volatilities and AR(FI)MA models, J. Bank. Finance 28 (2004) 2541–2563.
Rogers, L.; Satchell, S., Estimating variance from high, low and closing prices, Ann. Appl. Probab. 1 (1991) 504–512.
Shephard, N.; Sheppard, K., Realizing the future: Forecasting with high-frequency-based volatility (HEAVY) models, J. Appl. Econom. 25 (2010) 197–231.
Thomakos, D.D.; Wang, T., Realized volatility in the futures market, J. Empir. Finance 10 (2003) 321–353.
Yang, D.; Zhang, Q., Drift-independent volatility estimation based on high, low, open, and close prices, J. Bus. 73 (2000) 477–491.
Zhang, L.; Mykland, P.A.; Ait-Sahalia, Y., A tale of two time scales: Determining integrated volatility with noisy high-frequency data, J. Am. Stat. Assoc. 100 (2005) 1394–1411.
Zhou, B., High-frequency data and volatility in foreign exchange rates, J. Bus. Econ. Stat. 14 (1996) 45–52.
Open the Chapter5 Data.xlsx file from the web site.
1. Run a regression of daily squared returns on the variance forecast from the GARCH model with a leverage term from Chapter 4. Include a constant term in the regression.
$$R^2_{t+1} = b_0 + b_1 \sigma^2_{t+1} + e_{t+1}$$
(Excel hint: Use the function LINEST.) What is the fit of the regression as measured by the $R^2$? Is the constant term significantly different from zero? Is the coefficient on the forecast significantly different from one?
2. Run a regression using RP instead of the squared returns as proxies for observed variance; that is, regress
$$RP_{t+1} = b_0 + b_1 \sigma^2_{t+1} + e_{t+1}$$
Is the constant term significantly different from zero? Is the coefficient on the forecast significantly different from one? What is the fit of the regression as measured by the $R^2$? Compare your answer with the $R^2$ from exercise 1.
3. Run a regression using RV instead of the squared returns as proxies for observed variance; that is, regress
$$RV_{t+1} = b_0 + b_1 \sigma^2_{t+1} + e_{t+1}$$
Is the constant term significantly different from zero? Is the coefficient on the forecast significantly different from one? What is the fit of the regression as measured by the $R^2$? Compare your answer with the $R^2$ from exercises 1 and 2.
4. Estimate a HAR model in logarithms on the RP data you constructed in exercise 2. Use the next day's RP on the left-hand side and use daily, weekly, and monthly regressors on the right-hand side. Compute the regression fit.
5. Estimate a HAR model in logarithms on the RV data. Use the next day's RV on the left-hand side and use daily, weekly, and monthly regressors on the right-hand side. Compare the regression fit from this equation with that from exercise 4.
The answers to these exercises can be found in the Chapter5 Results.xls file, which can be found on the companion site.
For more information see the companion site at http://www.elsevierdirect.com/companions/9780123744487