CHAPTER 4
Statistical Foundations

This chapter provides foundational material regarding statistical methods for the study of alternative investments in general and for the subsequent material in this book in particular. The use of statistics in performing hypothesis tests is addressed in detail in Chapter 8.

4.1 Return Distributions

Risky assets experience unexpected value changes and therefore unexpected returns. If investors are rational, then the more competitively traded an asset is, the more likely its unexpected price changes are to be random and unpredictable. Hence, asset prices and asset returns in competitively traded markets are typically modeled as random variables. Frequency and probability distributions therefore provide starting points for describing asset returns.

4.1.1 Ex Ante and Ex Post Return Distributions

Ex post returns are realized outcomes rather than anticipated outcomes. Future possible returns and their probabilities are referred to as expectational or ex ante returns. A crucial theme in understanding the analysis of alternative investments is to understand the differences and links between ex post and ex ante return data.

Often, predictions are formed partially or fully through analysis of ex post data. For example, the ex ante or future return distribution of a stock index such as the S&P 500 Index is often assumed to be well approximated by the ex post or historical return distribution. The direct use of past return behavior as a predictor of future potential return behavior requires two properties to be accurate. First, the return distribution must be stationary through time, meaning that the expected return and the dispersion of the underlying asset do not change. Second, the sample of past observations must be sufficiently large to be likely to form a reasonably accurate representation of the process. For example, equity returns were very high during the bull market decade of the 1990s and very low during the early years of the financial crisis (2007–08). Using either of these time periods in isolation would likely overstate or understate the realistic long-run equity market returns.

Taken together, the requirements for the past returns to be representative of the future returns raise a serious challenge. If the past observation period is long, the sample of historical returns will be large; however, it is likely that the oldest observations reflect different risks or other economic conditions than can be anticipated in the future. If the sample is limited to the most recent observations, the data may be more representative of future economic conditions, but the sample may be too small to draw accurate inferences from it.

For a traditional asset, such as the common stock of a large, publicly traded corporation, it may be somewhat plausible that the asset's past behavior is a reasonable indication of its future behavior. However, many alternative investments are especially problematic in this context. For example, historical data may not exist for venture capital investment in new firms or may be difficult to observe or to obtain in cases such as private equity, where most or all trades are not publicly observable. Especially in alternative investments such as hedge funds, return distributions are expected to change as the fund's investment strategies and use of leverage change through time. In these cases and many others, ex ante return distributions may need to be based on economic analysis and modeling rather than simply projected from ex post data.

Nevertheless, whether based on prior observations or on economic analysis, the return distribution is a central tool for understanding the characteristics of an investment. The normal distribution is the starting point for most statistical applications in investments.

4.1.2 The Normal Distribution

The normal distribution is the familiar bell-shaped distribution, also known as the Gaussian distribution. The normal distribution is symmetric, meaning that the left and right sides are mirror images of each other. Also, the normal distribution clusters or peaks near the center, with decreasing probabilities of extreme events.

Why is the normal distribution so central to statistical analysis in general and the analysis of investment returns in particular? One reason is empirical: The normal distribution tends to approximate many distributions observed in nature or generated as the result of human actions and interactions, including financial return distributions. Another reason is theoretical: The more a variable's change results from the summation of a large number of independent causes, the more that variable tends to behave like a normally distributed variable. Thus, the more competitively traded an asset's price is, the more we would expect that the price change over a small unit of time would be the result of hundreds or thousands of independent financial events and/or trading decisions. Therefore, the probability distribution of the resulting price change should resemble the normal distribution. The formal statistical explanation for the idea that a variable will tend toward a normal distribution as the number of independent influences becomes larger is known as the central limit theorem. Practically speaking, the normal distribution is relatively easy to use, which may explain some of its popularity.
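The central limit theorem can be illustrated with a short simulation. The following Python sketch (the shock sizes and counts are arbitrary, hypothetical choices) sums many small independent shocks into a single price change and checks that roughly 68% of the simulated changes fall within one standard deviation of the mean, as a normal distribution would imply:

```python
import random
import statistics

random.seed(42)

def price_change(n_events=1000):
    """One price change modeled as the sum of many small independent shocks."""
    return sum(random.uniform(-0.001, 0.001) for _ in range(n_events))

# Simulate many such price changes
changes = [price_change() for _ in range(5000)]

mean = statistics.fmean(changes)
std = statistics.stdev(changes)

# For a (near-)normal distribution, about 68% of outcomes lie within 1 std. dev.
share_within_1sd = sum(abs(c - mean) <= std for c in changes) / len(changes)
print(round(share_within_1sd, 2))   # close to 0.68
```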

4.1.3 Log Returns and the Lognormal Distribution

For simplicity, funds often report returns based on discrete compounding. However, log returns offer a distinct advantage, especially for modeling a return probability distribution. In a nutshell, the use of log returns allows for the modeling of different time intervals in a manner that is simple and internally consistent. Specifically, if daily log returns are normally distributed and independent through time, then the log returns of other time intervals, such as months and years, will also be normally distributed. The same cannot be said of simple returns. Let's take a closer look at why log returns have this property.

The normal distribution replicates when variables are added but not when they are multiplied. This means that if two variables, x and y, are normally distributed, then the sum of the two variables, x + y, will also be normally distributed. But because the normal distribution does not replicate multiplicatively, x × y would not be normally distributed. Aggregation of discretely compounded returns is multiplicative. Thus, if R1, R2, and R3 represent the returns for months 1, 2, and 3 using discrete compounding, then the product [(1 + R1)(1 + R2)(1 + R3)] − 1 represents the return for the calendar quarter that contains the three months. If the monthly returns are normally distributed, then the quarterly return is not normally distributed, and vice versa, since the normal distribution does not replicate multiplicatively. Therefore, modeling the distribution of discretely compounded returns as being normally distributed over a particular time interval (e.g., monthly) technically means that the model will not be valid for any other choice of time interval (e.g., daily, weekly, annually).

However, the use of log returns, discussed in Chapter 3, solves this problem. If Rm1, Rm2, and Rm3 are monthly log returns, then the quarterly log return is simply the sum of the three monthly log returns. The normal distribution replicates additively; thus, if the log returns over one time interval can be modeled as being normally distributed, then the log returns over all time intervals will be normally distributed as long as they are statistically independent through time.

Further, log returns have another highly desirable property. The highest possible simple (non-annualized) return is theoretically +∞, while the lowest possible simple return for a cash investment is a loss of −100%, which occurs if the investment becomes worthless. The normal distribution, however, spans from −∞ to +∞, meaning that simple returns, theoretically speaking, cannot truly be normally distributed; a simple return of −200% is not possible. Thus, the normal distribution may be a poor approximation of the actual probability distribution of simple returns. Log returns, in contrast, can, like the normal distribution itself, span from −∞ to +∞.

There are two equivalent approaches to model returns that address these problems: (1) use log returns and assume that they are normally distributed, or (2) add 1 to the simple returns and assume that it has a lognormal distribution. A variable has a lognormal distribution if the distribution of the logarithm of the variable is normally distributed. The two approaches are identical, since the lognormal distribution assumes that the logarithms of the specified variable (in this case, 1 + R) are normally distributed.

In summary, it is possible for returns to be normally distributed over a variety of time intervals if those returns are expressed as log returns (and are independent through time). If the log returns are normally distributed, then the simple returns (in the form 1 + R) are said to be lognormally distributed. However, if discretely compounded returns (R) are assumed to be normally distributed, they can only be normally distributed over one time interval, such as daily, since returns computed over other time intervals would not be normally distributed due to compounding.
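The aggregation property described above can be verified numerically. The sketch below (using arbitrary, hypothetical monthly returns) shows that the quarterly log return is the sum of the monthly log returns, while simple returns aggregate multiplicatively:

```python
import math

# Hypothetical monthly simple (discretely compounded) returns
simple = [0.02, -0.01, 0.03]

# Simple returns aggregate multiplicatively: (1 + R1)(1 + R2)(1 + R3) - 1
quarterly_simple = math.prod(1 + r for r in simple) - 1

# Log returns aggregate additively
log_returns = [math.log(1 + r) for r in simple]
quarterly_log = sum(log_returns)

# Both figures describe the same quarterly outcome
assert abs(math.exp(quarterly_log) - (1 + quarterly_simple)) < 1e-12
```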

4.2 Moments of the Distribution: Mean, Variance, Skewness, and Kurtosis

Random variables, such as an asset's return or the timing of uncertain cash flows, can be viewed as forming a probability distribution. Probability distributions have an infinite number of possible shapes, only some of which represent well-known shapes, such as a normal distribution.

The moments of a return distribution are measures that describe the shape of a distribution. As an analogy, in mathematics, researchers often use various parameters to describe the shape of a function, such as its intercept, its slope, and its curvature. Statisticians often use either the raw moments or the central moments of a distribution to describe its shape. Generally, the first four moments are referred to as mean, variance, skewness, and kurtosis. The formulas of these four moments are somewhat similar, differing primarily by the power to which the observations are raised: mean uses the first power, variance squares the terms, skewness cubes the terms, and kurtosis raises the terms to the fourth power.

4.2.1 The Formulas of the First Four Raw Moments

Statistical moments can be raw moments or central moments. Further, the moments are sometimes standardized or scaled to provide more intuitive measures, as will be discussed later. We begin with raw moments, discussing the raw moments of an investment's return, R. Raw moments have the simplest formulas, wherein each moment is simply the expected value of the variable raised to a particular power:

nth raw moment = E(R^n)   (4.1)

The most commonly used raw moment is the first raw moment, known as the mean, or expected value, which indicates the central tendency of the variable. With n = 1, Equation 4.1 becomes the formula for the expected value:

E(R) = μ   (4.2)

The expected value of a variable is the probability weighted average of its outcomes:

E(R) = Σi (probi × Ri)   (4.3)

where probi is the probability of Ri.

Equation 4.3 expresses the first raw moment in terms of probabilities and outcomes. Using historical data, for a sample distribution of n observations, the mean is typically equally weighted and is estimated by the following:

R̄ = (1/n) Σ(t=1 to n) Rt   (4.4)

Thus, Equation 4.4 is a formula for estimating Equation 4.2 using historical observations. The historical mean is often used as an estimate of the expected value when observations from the past are assumed to be representative of the future. Other raw moments can be generated by inserting a higher integer value for n in Equation 4.1. But the raw moments for n > 1 are less useful for our purposes than the highly related central moments.
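The probability-weighted mean and the equally weighted sample mean can be sketched in a few lines of Python; the outcomes, probabilities, and historical returns below are hypothetical:

```python
# Probability-weighted expected value (Equation 4.3)
outcomes = [-0.10, 0.05, 0.20]   # possible returns
probs = [0.25, 0.50, 0.25]       # their probabilities
expected = sum(p * r for p, r in zip(probs, outcomes))

# Equally weighted sample mean from historical observations (Equation 4.4)
history = [0.01, 0.03, -0.02, 0.04]
sample_mean = sum(history) / len(history)

print(round(expected, 4))     # 0.05
print(round(sample_mean, 4))  # 0.015
```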

4.2.2 The Formulas of Central Moments

Central moments differ from raw moments because they focus on deviations of the variable from its mean (whereas raw moments are measured relative to zero). Deviations are defined as the value of a variable minus its mean, or expected value. If an observation exceeds its expected value, the deviation is positive by the distance by which it exceeds the expected value. If the observation is less than its expected value, the deviation is a negative number. Each central moment applies the following equation to the deviations:

nth central moment = E[(R − μ)^n]   (4.5)

where μ = the expected value of R.

The term inside the parentheses is the deviation of R from its mean, or expected value. The first central moment is equal to zero by definition, because the expected value of the deviation from the mean is zero. When analysts discuss statistical moments, it is usually understood that the first moment is a raw moment, meaning the mean, or expected value. But the second through fourth moments are usually automatically expressed as central moments because in most applications the moments are more useful when expressed in terms of deviations.

The variance is the second central moment and is the expected value of the deviations squared, providing an indication of the dispersion of a variable around its mean:

V(R) = E[(R − μ)^2]   (4.6)

The variance is the probability weighted average of the deviations squared. By squaring the deviations, any negative signs are removed (i.e., any negative deviation squared is positive), so the variance [V(R)] becomes a measure of dispersion. In the case of probability weighted outcomes, this can be written as:

V(R) = Σi probi × (Ri − μ)^2   (4.7)

The variance shown in Equation 4.7 is often estimated with a sample of historical data. For a sample distribution, the variance with equally weighted observations is estimated as:

V̂(R) = [1/(n − 1)] Σ(t=1 to n) (Rt − R̄)^2   (4.8)

The mean in Equation 4.8, R̄, is usually estimated using the same sample. The use of n − 1 in the equation (rather than n) enables a more accurate measure of the variance when the estimate of the expected value of the variable has been computed from the same sample. The square root of the variance is an extremely popular and useful measure of dispersion known as the standard deviation:

σ(R) = √V(R)   (4.9)

In investment terminology, volatility is a popular term that is used synonymously with the standard deviation of returns. Other central moments can be generated by inserting a higher integer value for n in Equation 4.5. But the central moments for n = 3 (skewness) and n = 4 (kurtosis) are typically less intuitive and less well-known than their scaled versions. In other words, rather than using the third and fourth central moments, slightly modified formulas are used to generate scaled measures of skewness and kurtosis. These two scaled measures are detailed in the next two sections.
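A minimal sketch of the sample variance and standard deviation follows (the returns are hypothetical); note the n − 1 denominator, which matches the behavior of Python's statistics module:

```python
import statistics

returns = [0.01, 0.03, -0.02, 0.04, 0.00]
mean = statistics.fmean(returns)

# Sample variance with n - 1 in the denominator
variance = sum((r - mean) ** 2 for r in returns) / (len(returns) - 1)

# Standard deviation (volatility) is the square root of variance
std_dev = variance ** 0.5

# The hand-rolled figures match the library's sample statistics
assert abs(variance - statistics.variance(returns)) < 1e-12
assert abs(std_dev - statistics.stdev(returns)) < 1e-12
```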

4.2.3 Skewness

The third central moment is the expected value of a variable's cubed deviations:

Third central moment = E[(R − μ)^3]   (4.10)

A problem with the third central moment is that it is generally affected by the scale. Thus, a distribution's third central moment for a variable measured in daily returns differs dramatically if the daily returns are expressed as annualized returns. To provide this measure with a more intuitive scale, investment analysts typically use the standardized third moment (the relative skewness or simply the skewness). The skewness is equal to the third central moment divided by the standard deviation of the variable cubed and serves as a measure of asymmetry:

Skewness = E[(R − μ)^3]/σ^3   (4.11)

Skewness is dimensionless, since changes in the scale of the returns affect the numerator and denominator proportionately, leaving the fraction unchanged. By cubing the deviations, the sign of each deviation is retained because a negative value cubed remains negative. Further, cubing the deviations provides a measure of the direction in which the largest deviations occur, since the cubing causes large deviations to be much more influential than the smaller deviations. The result is that the measure of skewness in Equation 4.11 provides a numerical measure of the extent to which a distribution flares out in one direction or the other. A positive value indicates that the right tail is larger (the mass of the distribution is concentrated on the left side), and a negative value indicates that the left tail is larger (the mass of the distribution is concentrated on the right side). A skewness of zero can result from a symmetrical distribution, such as the normal distribution, or from any other distribution in which the tails otherwise balance out within the equation. The top illustration of Exhibit 4.1 depicts negatively skewed, symmetric, and positively skewed distributions.
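As a rough numerical illustration (hypothetical returns that include one large loss), the scaled skewness can be estimated as follows:

```python
import statistics

returns = [0.01, 0.02, 0.015, -0.08, 0.03]   # one large negative outlier
n = len(returns)
mean = statistics.fmean(returns)

# Equally weighted (population-style) central moments
sigma = (sum((r - mean) ** 2 for r in returns) / n) ** 0.5
third_moment = sum((r - mean) ** 3 for r in returns) / n

# Skewness: third central moment divided by the cubed standard deviation
skewness = third_moment / sigma ** 3

print(skewness < 0)   # True: the large loss produces a larger left tail
```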


Exhibit 4.1 Skewness and Kurtosis

4.2.4 Excess Kurtosis

The fourth central moment is the expected value of a variable's deviations raised to the fourth power:

Fourth central moment = E[(R − μ)^4]   (4.12)

As with the third central moment, a problem with the fourth central moment is that it is difficult to interpret its magnitude. To provide this measure with a more intuitive scale, investment analysts do two things. First, they divide the moment by the standard deviation of the variable raised to the fourth power (to make it dimensionless):

Kurtosis = E[(R − μ)^4]/σ^4   (4.13)

The resulting measure, known as kurtosis, is shown in Equation 4.13 and serves as an indicator of the peaks and tails of a distribution. In the case of a normally distributed variable, the estimated kurtosis has a value that approaches 3.0 (as the sample size is increased). The second adjustment that analysts often perform to create a more intuitive measure of kurtosis is to subtract 3.0 from the result to derive a measure, known as excess kurtosis. Excess kurtosis provides a more intuitive measure of kurtosis relative to the normal distribution because it has a value of zero in the case of the normal distribution:

Excess kurtosis = Kurtosis − 3   (4.14)

Since 3.0 is the kurtosis of a normally distributed variable, after subtracting 3.0 from the kurtosis, a positive excess kurtosis signals a level of kurtosis that is higher than observed in a normally distributed variable, an excess kurtosis of 0.0 indicates a level of kurtosis similar to that of a normally distributed variable, and a negative excess kurtosis signals a level of kurtosis that is lower than that observed in a normally distributed variable.

Kurtosis is typically viewed as capturing the fatness of the tails of a distribution, with high values of kurtosis (or positive values of excess kurtosis) indicating fatter tails (i.e., higher probabilities of extreme outcomes) than are found in the case of a normally distributed variable. Kurtosis can also be viewed as indicating the peakedness of a distribution, with a sharp, narrow peak in the center being associated with high values of kurtosis (or positive values of excess kurtosis).
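A small numerical sketch (hypothetical returns in which two extreme outcomes sit among many tiny ones) shows how fat tails translate into positive excess kurtosis:

```python
import statistics

returns = [0.0, 0.001, -0.001, 0.0, 0.12, -0.12, 0.0, 0.001, -0.001, 0.0]
n = len(returns)
mean = statistics.fmean(returns)

sigma = (sum((r - mean) ** 2 for r in returns) / n) ** 0.5
fourth_moment = sum((r - mean) ** 4 for r in returns) / n

kurtosis = fourth_moment / sigma ** 4   # scaled fourth moment
excess_kurtosis = kurtosis - 3.0        # relative to the normal distribution

print(excess_kurtosis > 0)   # True: fatter tails than the normal distribution
```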

In summary, the mean, variance, skewness, and kurtosis of a return distribution indicate the location and shape of a distribution, and are often a key part of measuring and communicating the risks and rewards of various investments. Familiarity with each can be a critical component of a high-level understanding of the analysis of alternative investments.

4.2.5 Platykurtosis, Mesokurtosis, and Leptokurtosis

The level of kurtosis is sufficiently important in analyzing alternative investment returns that the statistical descriptions of the degree of kurtosis and the related terminology have become industry standards. If a return distribution has no excess kurtosis, meaning it has the same kurtosis as the normal distribution, it is said to be mesokurtic, mesokurtotic, or normal tailed, and to exhibit mesokurtosis. The tails of the distribution and the peakedness of the distribution would have the same magnitude as the normal distribution.

The middle illustration in Exhibit 4.1 depicts that kurtosis can be viewed by the fatness of the tails of a distribution. If a return distribution has negative excess kurtosis, meaning less kurtosis than the normal distribution, it is said to be platykurtic, platykurtotic, or thin tailed, and to exhibit platykurtosis. If a return distribution has positive excess kurtosis, meaning it has more kurtosis than the normal distribution, it is said to be leptokurtic, leptokurtotic, or fat tailed, and to exhibit leptokurtosis.

The bottom illustration in Exhibit 4.1 depicts leptokurtic, mesokurtic, and platykurtic distributions. A leptokurtic distribution (positive excess kurtosis) with fat tails and a peaked center is illustrated on the left. A platykurtic distribution (negative excess kurtosis) with thin tails and a rounded center is illustrated on the right. In the middle is a normal mesokurtic distribution (no excess kurtosis). The key to recognizing excess kurtosis visually is comparing the thickness of the tails of both sides of the distribution relative to the tails of a normal distribution.

4.3 Covariance, Correlation, Beta, and Autocorrelation

An important aspect of a return is the way that it correlates with other returns. This is because correlation affects diversification, and diversification drives the risk of a portfolio of assets relative to the risks of the portfolio's constituent assets. This section begins with an examination of covariance, then details the correlation coefficient. Much as standard deviation provides a more easily interpreted alternative to variance, the correlation coefficient provides a scaled and intuitive alternative to covariance. Finally, the section discusses the concepts of beta and autocorrelation.

4.3.1 Covariance

The covariance of the return of two assets is a measure of the degree or tendency of two variables to move in relationship with each other. If two assets tend to move in the same direction, they are said to covary positively, and they will have a positive covariance. If the two assets tend to move in opposite directions, they are said to covary negatively, and they will have a negative covariance. Finally, if the two assets move independently of each other, their covariance will be zero. Thus, covariance is a statistical measure of the extent to which two variables move together. The formula for covariance is similar to that for variance, except that instead of squaring the deviations of one variable, such as the returns of fund i, the formula cross multiplies the contemporaneous deviations of two different variables, such as the returns of funds i and j:

σij = E[(Ri − μi)(Rj − μj)]   (4.15)

where Ri is the return of fund i, μi is the expected value or mean of Ri, Rj is the return of fund j, and μj is the expected value or mean of Rj.

The covariance is the expected value of the product of the deviations of the returns of the two funds. Covariance can be estimated from a sample using Equation 4.16:

σ̂ij = [1/(T − 1)] Σ(t=1 to T) (Rit − R̄i)(Rjt − R̄j)   (4.16)

where Rit is the return of fund i in time t, R̄i is the sample mean return of fund i, and analogously for fund j; T is the number of time periods observed.

The estimation of the covariance for a sample of returns from a market index fund and a real estate fund is shown in Exhibit 4.2. Column 8 multiplies the fund's deviation from its mean return by the index's deviation from its mean return. Each of the products of the deviations is then summed and divided by n − 1, where n is the number of observations. The result is the estimated covariance between the returns over the sample period, shown near the bottom right-hand corner of Exhibit 4.2.

Exhibit 4.2 Covariance, Correlation, and Beta

                  Market Index                   RE Fund
(1)      (2)       (3)         (4)      (5)       (6)         (7)      (8)
Month    Return    Deviation   Dev²     Return    Deviation   Dev²     Cross
1 −0.060 −0.062 0.004 −0.008 −0.018 0.000 0.001
2 −0.032 −0.034 0.001 −0.032 −0.042 0.002 0.001
3 −0.004 −0.006 0.000 0.065 0.055 0.003 0.000
. . . . . . . .
. . . . . . . .
. . . . . . . .
37 0.024 0.022 0.000 0.033 0.023 0.001 0.000
38 0.034 0.032 0.001 0.047 0.037 0.001 0.001
39 0.000 −0.001 0.000 −0.016 −0.026 0.001 0.000
40 0.030 0.028 0.001 0.057 0.047 0.002 0.001
Sum 0.075 0.000 0.146 0.402 0.000 0.468 0.215
Mean 0.002 0.000 0.010 0.000 0.012 0.005
Market index:  Variance 0.37%   Std. dev. 6.11%
RE fund:       Variance 1.20%   Std. dev. 10.95%
Cov. 0.006   Cor. 0.822   Beta 1.474
Autocorrelation: market 0.292, fund 0.142
Durbin-Watson: market 1.393, fund 1.697

Source: Bloomberg.

Because covariance is based on the products of individual deviations and not squared deviations, its value can be positive, negative, or zero. When the return deviations are in the same direction, meaning they have the same sign, the cross product is positive; when the return deviations are in opposite directions, meaning they have different signs, the cross product is negative. When the cross products are summed, the resulting sum generates an indication of the overall tendency of the returns to move either in tandem or in opposition. Note that the table method illustrated in Exhibit 4.2 simply provides a format for solving the formula, which can be easily solved by software. Covariance is used directly in numerous applications, such as in the classic portfolio theory work of Markowitz.
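The table method of Exhibit 4.2 amounts to the computation sketched below; the two return series here are short, hypothetical examples, not the exhibit's data:

```python
import statistics

# Hypothetical monthly returns for two funds
fund_i = [0.02, -0.01, 0.03, 0.00, 0.04]
fund_j = [0.01, -0.02, 0.04, 0.01, 0.03]

mean_i = statistics.fmean(fund_i)
mean_j = statistics.fmean(fund_j)
T = len(fund_i)

# Cross-multiply contemporaneous deviations, sum, and divide by T - 1
cov_ij = sum((ri - mean_i) * (rj - mean_j)
             for ri, rj in zip(fund_i, fund_j)) / (T - 1)

print(cov_ij > 0)   # True: these two funds tend to move together
```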

4.3.2 Correlation Coefficient

A statistic related to covariance is the correlation coefficient. The correlation coefficient (also called the Pearson correlation coefficient) measures the degree of association between two variables, but unlike the covariance, the correlation coefficient can be easily interpreted. The correlation coefficient takes the covariance and scales its value to be between +1 and −1 by dividing by the product of the standard deviations of the two variables. A correlation coefficient of −1 indicates that the two assets move in the exact opposite direction and in the same proportion, a result known as perfect linear negative correlation. A correlation coefficient of +1 indicates that the two assets move in the exact same direction and in the same proportion, a result known as perfect linear positive correlation. A correlation coefficient of zero indicates that there is no linear association between the returns of the two assets. Values between the two extremes of −1 and +1 indicate different degrees of association. Equation 4.17 provides the formula for the correlation coefficient based on the covariance and the standard deviations:

ρij = σij/(σi σj)   (4.17)

where ρij (rho) is the notation for the correlation coefficient between the returns of asset i and asset j; σij is the covariance between the returns of asset i and asset j; and σi and σj are the standard deviations of the returns of assets i and j, respectively.

Thus, ρij, the correlation coefficient, scales the covariance, σij, through division by the product of the standard deviations, σi σj. The correlation coefficient can therefore be solved by computing covariance and standard deviation as in Exhibit 4.2 and inserting the values into Equation 4.17. The result is shown in Exhibit 4.2.
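As a sketch (again with hypothetical return series), the correlation coefficient rescales the covariance by the product of the two standard deviations, guaranteeing a value between −1 and +1:

```python
import statistics

fund_i = [0.02, -0.01, 0.03, 0.00, 0.04]   # hypothetical returns
fund_j = [0.01, -0.02, 0.04, 0.01, 0.03]

mean_i = statistics.fmean(fund_i)
mean_j = statistics.fmean(fund_j)
T = len(fund_i)

cov_ij = sum((ri - mean_i) * (rj - mean_j)
             for ri, rj in zip(fund_i, fund_j)) / (T - 1)

# Scale covariance into [-1, +1] by the product of the standard deviations
rho_ij = cov_ij / (statistics.stdev(fund_i) * statistics.stdev(fund_j))

assert -1.0 <= rho_ij <= 1.0
print(round(rho_ij, 2))
```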

4.3.3 The Spearman Rank Correlation Coefficient

The Pearson correlation coefficient is not the only measure of correlation. There are some especially useful measures of correlation in alternative investments that are based on the ranked size of the variables rather than the absolute size of the variables. The returns within a sample for each asset are ranked from highest to lowest. The numerical ranks are then inserted into formulas that generate correlation coefficients that usually range between −1 and +1. The Spearman rank correlation coefficient is a popular example.

The Spearman rank correlation is a correlation designed to adjust for outliers by measuring the relationship between variable ranks rather than variable values. The Spearman rank correlation for returns is computed using the ranks of returns of two assets. For example, consider two assets with returns over a time period of three years, illustrated here:

Time Period Return of Asset #1 Return of Asset #2
1 61% 12%
2 −5% 6%
3 0% 4%

The first step is to replace the actual returns with the rank of each asset's return. The ranks are computed by first ranking the returns of each asset separately, from highest (rank = 1) to lowest (rank = 3), while keeping the returns arrayed according to their time periods:

Time Period Rank of Asset #1 Rank of Asset #2 Difference in Ranks (di)
1 1 1 0
2 3 2 1
3 2 3 −1

This table demonstrates the computation of di, the difference in the two ranks associated with time period i. The Spearman rank correlation, ρs, can be computed using those differences in ranks and the total number of time periods, n:

ρs = 1 − [6 Σi di^2]/[n(n^2 − 1)]   (4.18)

Using the data from the table, the numerator is 12, the denominator is 3 × 8 = 24, and ρs is 0.5. Rank correlation is sometimes preferred because of the way it handles the effects of outliers (extremely high or low data values). For example, the enormous return of asset 1 in the previous table is an outlier, which will have a disproportionate effect on a correlation statistic. Extremely high or very negative values of one or both of the variables in a particular sample can cause the computed Pearson correlation coefficient to be very near +1 or −1 based, arguably, on the undue influence of the extreme observation on the computation, since deviations are squared as part of the computation. Some alternative investments have returns that are more likely to contain extreme outliers. By using ranks, the effects of outliers are lessened, and in some cases it can be argued that the resulting measure of the correlation using a sample is a better indicator of the true correlation that exists within the population. Note that the Spearman rank correlation coefficient would be the same for any return that would generate the same rankings. Thus, any return in time period 1 for the first asset greater than 0% would still be ranked 1 and would generate the same ρs.
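The three-period example can be reproduced directly. In the sketch below, the `ranks` helper is a hypothetical utility written for this illustration (it assumes no tied returns):

```python
# Returns from the three-period example above
asset_1 = [0.61, -0.05, 0.00]
asset_2 = [0.12, 0.06, 0.04]

def ranks(values):
    """Rank from highest (1) to lowest (n), keeping the time ordering.
    Assumes no tied values."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

d = [a - b for a, b in zip(ranks(asset_1), ranks(asset_2))]
n = len(d)

# Spearman rank correlation: 1 - 6*sum(di^2) / [n(n^2 - 1)]
rho_s = 1 - (6 * sum(di ** 2 for di in d)) / (n * (n ** 2 - 1))
print(rho_s)   # 0.5
```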

4.3.4 The Correlation Coefficient and Diversification

The correlation coefficient is often used to demonstrate one of the most fundamental concepts of portfolio theory: the reduction in risk found by combining assets that are not perfectly positively correlated. Exhibit 4.3 illustrates the results of combining varying portions of assets A and B under three correlation conditions: perfect positive correlation, zero correlation, and perfect negative correlation.


Exhibit 4.3 Diversification between Two Assets

The highest possible correlation, and the least diversification potential, occurs when the assets' correlation coefficient is +1: perfect positive correlation. The straight line to the lower right between points A and B in Exhibit 4.3 plots the possible standard deviations and mean returns achievable by combining asset A and asset B under perfect positive correlation. The line is straight, meaning that the portfolio risk is a weighted average of the individual risks. This illustrates that there are no benefits to diversification when perfectly correlated assets are combined. The idea is that diversification occurs when the risks of unusual returns of assets tend to cancel each other out. This does not happen in the case of perfect positive correlation, because the assets always move in the same direction and in the same proportion.

The greatest risk reduction occurs when the assets' correlation coefficient is −1: perfect negative correlation. The two upper-left line segments connecting points A and B in Exhibit 4.3 plot the possible standard deviations and mean returns that would be achieved by combining asset A and asset B under perfect negative correlation. Notice that the line between A and B moves directly to the vertical axis, the point at which the standard deviation is zero. This illustrates ultimate diversification, in which two assets always move in opposite directions; therefore, combining them into a portfolio results in rapid risk reduction, or even total risk reduction. This zero-risk portfolio illustrates the concept of a perfect two-asset hedge and occurs when the weight of the investment in asset A is equal to the standard deviation of asset B divided by the sum of the standard deviations of A and B.
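Under perfect negative correlation, the zero-risk weights can be checked numerically; the volatilities below are hypothetical:

```python
# Hypothetical volatilities for two perfectly negatively correlated assets
sigma_a, sigma_b = 0.10, 0.30

# Zero-risk weight in asset A: sigma_b / (sigma_a + sigma_b)
w_a = sigma_b / (sigma_a + sigma_b)
w_b = 1 - w_a

# With correlation of -1, portfolio std. dev. is |w_a*sigma_a - w_b*sigma_b|
portfolio_std = abs(w_a * sigma_a - w_b * sigma_b)

print(round(w_a, 2))             # 0.75
print(round(portfolio_std, 10))  # 0.0
```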

The most realistic possibility is represented by the curve in the center of Exhibit 4.3. This is the more common scenario, in which the assets are neither perfectly positively nor perfectly negatively correlated; rather, they have some degree of dependent movement. The key point of this middle curve in Exhibit 4.3 is that when imperfectly correlated assets are combined into a portfolio, a portion of the portfolio's risk is diversified away. The risk that can be removed through diversification is called diversifiable, nonsystematic, unique, or idiosyncratic risk.

In alternative investments, the concept of correlation is central to the discussion of portfolio implications. Further, graphs with standard deviation on the horizontal axis and expected return on the vertical axis are used as a primary method of illustrating diversification benefits. Assets that generate diversification benefits are shown to shift the attainable combinations of risk and return toward the benefit of the investor, meaning less risk for the same amount of return. The key point is that imperfect correlation leads to diversification that bends portfolio risk to the left, representing the improved investment opportunities afforded by diversification.

In the case of asset returns, true future correlations can only be estimated. Past estimated correlation coefficients not only are subject to estimation error but also are typically estimates of a moving target, since true correlations should be expected to change through time, as fundamental economic relationships change. Further, correlation coefficients tend to increase (offer less diversification across investments and asset classes) in times of market stress, just when an investor needs diversification the most.

4.3.5 Beta

The beta of an asset is defined as the covariance between the asset's returns and the returns of an index, such as a market index, divided by the variance of the index's returns, or, equivalently, as the correlation coefficient between the two return series multiplied by the ratio of the asset's volatility to the index's volatility:

\[
\beta_i = \frac{\mathrm{Cov}(R_i, R_m)}{\mathrm{Var}(R_m)} = \rho_{i,m}\,\frac{\sigma_i}{\sigma_m} \tag{4.19}
\]

where βi is the beta of the returns of asset i (Ri) with respect to a market index of returns, Rm. The numerator of the middle expression in Equation 4.19 measures the amount of risk that an individual stock brings into an already diversified portfolio. The denominator represents the total risk of the market portfolio. Beta therefore measures added systematic risk as a proportion of the risk of the index.
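Equation 4.19 can be estimated directly from sample data. The sketch below uses hypothetical return series; the asset is constructed to move exactly twice as much as the index each period, so its estimated beta should be 2.

```python
def estimate_beta(asset_returns, index_returns):
    """Sample beta: Cov(R_i, R_m) / Var(R_m).
    The same (n - 1) denominator appears in both terms, so it cancels."""
    n = len(asset_returns)
    mu_a = sum(asset_returns) / n
    mu_m = sum(index_returns) / n
    cov = sum((a - mu_a) * (m - mu_m)
              for a, m in zip(asset_returns, index_returns)) / (n - 1)
    var_m = sum((m - mu_m) ** 2 for m in index_returns) / (n - 1)
    return cov / var_m

# Hypothetical data: the asset moves twice as much as the index each period.
index = [0.01, -0.02, 0.03, 0.00, 0.015]
asset = [2 * r for r in index]
print(estimate_beta(asset, index))  # 2.0 by construction
```

Regressing the index on itself returns a beta of 1.0, consistent with the statement below that the market portfolio has a beta of 1.0.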

In the context of the capital asset pricing model (CAPM) and other single-factor market models, Rm is the return of the market portfolio, and the beta indicates the responsiveness of asset i to fluctuations in the value of the market portfolio. In the context of a single-factor benchmark, Rm would be the return of the benchmark portfolio, and the beta would indicate the responsiveness of asset i to fluctuations in the benchmark. In a multifactor asset pricing model, the beta indicates the responsiveness of asset i to fluctuations in the given risk factor, as is discussed in Chapter 6.

Exhibit 4.2 illustrates the computation of beta using a market index's return as a proxy for the market portfolio. Beta is similar to a correlation coefficient, but it is not bounded by +1 on the upside and −1 on the downside.

There are several important features of beta. First, it can be easily interpreted. The beta of an asset may be viewed as the percentage return response that an asset will have on average to a one-percentage-point movement in the related risk factor, such as the overall market. For example, if the market were to suddenly rise by 1% in response to particular news, a fund with a market beta of 0.95 would be expected on average to rise 0.95%, and a fund with a beta of 2.0 would be expected to rise 2%. If the market falls 2%, then a fund with a beta of 1.5 would have an expected decline of 3%. But actual returns deviate from these expected returns due to any idiosyncratic risk. The risk-free asset has a beta of zero, and its return would therefore not be expected to change with movements in the overall market. The beta of the market portfolio is 1.0.

The second feature of beta is that it is the slope coefficient in a linear regression of the returns of an asset (as the Y, or dependent variable) against the returns of the related index or market portfolio (as the X, or independent variable). Thus, the computation of beta in Exhibit 4.2 using Equation 4.19 may be viewed as having identified the slope coefficient of the previously discussed linear regression. Chapter 9 discusses linear regression.

Third, because beta is a linear measure, the beta of a portfolio is a weighted average of the betas of the constituent assets. This is true even though the total risk of a portfolio is not the weighted average of the total risk of the constituent assets. This is because beta reflects the correlation between an asset's return and the return of the market (or a specified risk factor) and because the correlation to the market does not diversify away as assets are combined into a portfolio.

Similar to the correlation coefficient between the returns of two assets, the beta between an asset and an index is estimated rather than observed. An estimate of beta formed with historical returns may differ substantially from an asset's true future beta for a couple of reasons. First, historical measures such as beta are estimated with error. Second, the beta of most assets should be expected to change through time as market values change and as fundamental economic relationships change. In fact, beta estimations based on historical data are often quite unreliable, although the most reasonable estimates of beta that are available may be based at least in part on historical betas.

4.3.6 Autocorrelation

The autocorrelation of a time series of returns from an investment refers to the possible correlation of the returns with one another through time. For example, first-order autocorrelation refers to the correlation between the return in time period t and the return in the previous time period (t − 1). Positive first-order autocorrelation is when an above-average (below-average) return in time period t − 1 tends to be followed by an above-average (below-average) return in time period t. Conversely, negative first-order autocorrelation is when an above-average (below-average) return in time period t − 1 tends to be followed by a below-average (above-average) return in time period t. Zero autocorrelation indicates that the returns are linearly independent through time. Positive autocorrelation is seen in trending markets; negative autocorrelation is seen in markets with price reversal tendencies.

We start here by assuming the simplest scenario: The returns on an investment are statistically independent through time, which means there is no autocorrelation. Further, we assume that the return distribution is stationary (i.e., the probability distribution of the return at each point in time is identical). Under these strict assumptions, the distribution of log returns over longer periods of time will tend toward being a normal distribution, even if the very short-term log returns are not normally distributed.

How do we know that log returns will be roughly normally distributed over reasonably long periods of time if the returns have no autocorrelation and if very short-term returns have a stationary distribution? One explanation is that the log return on any asset over a long time period such as a month is the sum of the log returns of the sub-periods. Even if the returns over extremely small units of time are not normally distributed, the central limit theorem indicates that the returns formed over longer periods of time by summing the independent returns of the sub-periods will tend toward being normally distributed.

Why might we think that returns would be uncorrelated through time? If a security trades in a highly transparent, competitive market with low transaction costs, the actions of arbitrageurs and other participants tend to remove pronounced patterns in security returns, such as autocorrelation. If this were not true, then arbitrageurs could make unlimited profits by recognizing and exploiting the patterns at the expense of other traders.

However, markets for securities have transaction costs and other barriers to arbitrage, such as restrictions on short selling. Especially in the case of alternative investments, arbitrage activity may not be sufficient to prevent nontrivial price patterns such as autocorrelation. The extent to which returns reflect nonzero autocorrelation is important because autocorrelation can impact the shape of return distributions. The following material discusses the relationships between the degree of autocorrelation and the shapes of long-period returns relative to short-period returns.

Autocorrelation of returns can be used as a general term to describe possible relationships or as a term to describe a specific correlation measure. Equation 4.20 describes autocorrelation in the context of a return series with constant mean:

\[
\rho_{t,t-k} = \frac{\mathrm{Cov}(R_t, R_{t-k})}{\sigma_t\,\sigma_{t-k}} \tag{4.20}
\]

where Rt is the return of the asset at time t, with mean μ and standard deviation σt; Rt−k is the return of the asset at time t − k, with mean μ and standard deviation σt−k; and k is the number of time periods between the two returns.

Equation 4.20 is the same equation used to define the Pearson correlation coefficient in Equation 4.17 (with substitution of Equation 4.15 for the covariance), except that Equation 4.20 specifies that the two returns are from the same asset and are separated by k periods of time. Thus, autocorrelations, like correlation coefficients, range between −1 and +1, with +1 representing perfect correlation.

There are unlimited combinations of autocorrelations that could theoretically be nonzero in a time series; thus, in practice, it is usually necessary to specify the time lags separating the correlations between variables. One of the simplest and most popular specifications of the autocorrelation of a time series is first-order autocorrelation. The first-order autocorrelation coefficient is the case of k = 1 from Equation 4.20, which is shown in Equation 4.21:

\[
\rho_{t,t-1} = \frac{\mathrm{Cov}(R_t, R_{t-1})}{\sigma_t\,\sigma_{t-1}} \tag{4.21}
\]

Thus, first-order autocorrelation refers to the correlation between the return in time period t and the return in the immediately previous time period, t − 1. Note that in the case of first-order autocorrelation, the returns in time period t − 1 would also be correlated with the returns in time period t − 2; thus, the returns in time period t would also generally be correlated with the returns in time period t − 2, as well as those of earlier time periods. Because first-order autocorrelation is generally less than 1, the idea is that the autocorrelation between returns diminishes as the time distance between them increases.
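A sample estimate of the first-order autocorrelation of Equation 4.21 is simply the Pearson correlation between the return series and its one-period lag. The following sketch uses two hypothetical series: one that strictly reverses each period (negative autocorrelation) and one that trends steadily upward (positive autocorrelation).

```python
import math

def first_order_autocorr(returns):
    """Pearson correlation between R_t (t = 2..T) and R_{t-1}."""
    x = returns[1:]   # R_t
    y = returns[:-1]  # R_{t-1}
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

reversing = [0.01, -0.01, 0.01, -0.01, 0.01, -0.01]  # strict price reversals
trending = [0.01, 0.02, 0.03, 0.04]                  # steadily rising returns

print(first_order_autocorr(reversing))  # close to -1
print(first_order_autocorr(trending))   # close to +1
```

A series of independent returns would produce an estimate near zero, though sampling error means small nonzero estimates are expected in finite samples.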

While autocorrelation would be zero in a perfectly efficient market, substantial autocorrelation in returns can occur when there is a lack of competition, when there are substantial transaction costs or other barriers to trade, or when there are returns that are calculated based on nonmarket values, such as appraisals. Autocorrelation of reported returns due to the use of appraised valuations or valuations based on the discretion of fund managers raises important issues, especially in the analysis of alternative investments.

Autocorrelation in returns has implications for the relationship between the standard deviations of a return series computed over different time lengths. Specifically, if autocorrelation is positive (i.e., returns are trending), then the standard deviation of returns over T periods will be larger than the single-period standard deviation multiplied by the square root of T. If autocorrelation is zero, then the standard deviation of returns over T periods will be equal to the single-period standard deviation multiplied by the square root of T. Finally, if autocorrelation is negative (i.e., returns are mean-reverting), then the standard deviation of returns over T periods will be less than the single-period standard deviation multiplied by the square root of T.

An important task in the analysis of the returns of an investment is the search for autocorrelation. An informal approach to the analysis of the potential autocorrelation of a return series is through visual inspection of a scatter plot of Rt against Rt−1. Positive autocorrelation causes more observations in the northeast and southwest quadrants of the scatter plot, where Rt and Rt−1 share the same sign. Negative autocorrelation causes the southeast and northwest quadrants to have more observations, and zero autocorrelation causes balance among all four quadrants.

Another common approach when searching for autocorrelation is to estimate the first-order autocorrelation measure of Equation 4.20 directly, using sample data. Exhibit 4.2 shows the estimated autocorrelation coefficients for the two return series. For autocorrelations beyond first-order autocorrelation, an analyst can use a linear regression with Rt as the dependent variable and Rt−1, Rt−2, Rt−3, and so forth as independent variables.

4.3.7 The Durbin-Watson Test for Autocorrelation

A formal approach to detecting first-order autocorrelation in a time series is the Durbin-Watson test. Testing the hypothesis that a series has no autocorrelation involves calculating the Durbin-Watson statistic:

\[
DW = \frac{\sum_{t=2}^{T}\left(e_t - e_{t-1}\right)^2}{\sum_{t=1}^{T} e_t^2} \tag{4.22}
\]

where et is the value in time period t of the series being analyzed for autocorrelation.

In alternative investments, the series being analyzed (et) may be returns or a portion of returns, such as the estimated active return. A DW value of 2 indicates no significant autocorrelation (i.e., the test fails to reject the hypothesis of zero autocorrelation). If DW is statistically greater than 2, then the null hypothesis may be rejected in favor of negative autocorrelation; if DW is statistically less than 2, then the null hypothesis may be rejected in favor of positive autocorrelation. The magnitude of the difference from 2 required to reject zero autocorrelation is complex to determine, but a rule of thumb is that zero autocorrelation is rejected when DW is greater than 3 (indicating negative autocorrelation) or less than 1 (indicating positive autocorrelation). The DW statistics for the market index and the real estate fund are reported in the bottom left-hand corner of Exhibit 4.2. Note that the reported DW statistics for both of the return series fail to reject zero autocorrelation, even though the estimated autocorrelation coefficients appear quite positive.
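The Durbin-Watson statistic of Equation 4.22 translates directly into code. This minimal sketch uses two hypothetical series: a perfectly smooth (trending) one, where DW collapses toward 0, and a perfectly choppy (reversing) one, where DW approaches 4.

```python
def durbin_watson(e):
    """Durbin-Watson statistic of a series: values near 2 suggest no
    first-order autocorrelation, near 0 positive, near 4 negative."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(x ** 2 for x in e)
    return num / den

smooth = [0.01, 0.01, 0.01, 0.01]    # strongly trending: DW near 0
choppy = [0.01, -0.01, 0.01, -0.01]  # strongly reversing: DW near 4

print(durbin_watson(smooth))
print(durbin_watson(choppy))
```

With such short hypothetical series the statistic is only illustrative; formal inference requires the Durbin-Watson critical-value tables for the relevant sample size.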

4.4 Interpreting Standard Deviation and Variance

Perhaps the most important single risk measure in investments is the standard deviation of returns, or volatility. Unfortunately, the complexity of its formula and its computation can lead to a belief that standard deviation is not easily interpreted. But the standard deviation of returns is almost as easy to interpret as the mean (expected value) of the returns. The purpose of the next two sections is to demonstrate the ease with which the standard deviation of returns can be intuitively understood.

4.4.1 Standard Deviation and Typical Deviations

The standard deviation of an investment's returns can be very roughly approximated as the typical amount by which an investment's actual return deviates from its average. Standard deviation, or volatility, is such a central concept in investments that we present an example here to encourage an intuitive grasp.

Let's start with applying the concept of standard deviation to basketball scores. Observers of basketball might estimate that an average number of points for one team to score in one game might be 100 and that a typical amount by which the outcomes tend to differ from this expectation might be 15 points. In other words, among the higher-than-average scores, a typical score would be 115 points, while among the lower-than-average scores, a typical score would be 85 points. In this case, 15 points would be a rough estimate of the standard deviation of the basketball score for one team.

The idea is that standard deviation (volatility) is a measure of dispersion that can be roughly viewed on an intuitive basis. In statistics, the average distance between a variable and its mean is known as the mean absolute deviation, and it is usually not very different from the standard deviation. The exact relationship between the standard deviation and the mean absolute deviation depends on the underlying distribution. In the case of the normal probability distribution, the standard deviation is approximately 1.25 times the mean absolute deviation; the ratio tends to be larger for the high-kurtosis (fat-tailed) return distributions commonly observed in modern financial markets. However, in most cases of investment returns without extreme events, the concepts of standard deviation and mean absolute deviation are close enough that viewing them as being similar in magnitude facilitates a reasonably clear understanding.
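The 1.25 figure can be checked numerically: under the normal distribution the exact ratio of standard deviation to mean absolute deviation is √(π/2) ≈ 1.2533. The sketch below also compares the two measures on a small hypothetical return sample; the standard deviation is always at least as large as the mean absolute deviation.

```python
import math

def mean_abs_deviation(xs):
    """Average absolute distance between observations and their mean."""
    mu = sum(xs) / len(xs)
    return sum(abs(x - mu) for x in xs) / len(xs)

def population_std(xs):
    """Population standard deviation (divide by n, not n - 1)."""
    mu = sum(xs) / len(xs)
    return math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))

print(math.sqrt(math.pi / 2))  # ~1.2533, the exact normal-distribution ratio

returns = [0.02, -0.01, 0.03, 0.00, -0.02, 0.01]  # hypothetical sample
print(population_std(returns) / mean_abs_deviation(returns))
```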

Let's take a look at a portfolio that has an annual expected return of 5% and a standard deviation of 2%. We should be able to develop a quick and easy intuitive feel for the range of outcomes. In a year of average performance, this portfolio will earn 5%. However, among those years with below-average performance, a typically bad year would generate a 2% lower return, or about 3%. Sometimes the portfolio would do worse than a 3% return in a bad year and sometimes perhaps a little better. Of those years with above-average performance, a typically good year would generate a return of perhaps 7%.

If the standard deviation of the asset's return fell to 1%, then we would understand that the returns were clustered closer to 5%, with typically good years producing a return of about 6% and typically bad years producing a return of around 4%, each found by either adding or subtracting 1 standard deviation to or from the expected return. Of course, returns could be much higher or much lower, indicating highly unusual circumstances in which the outcomes are many standard deviations from the average.

Once we are familiar with the concept of standard deviation, we can use its mathematical properties to clarify the behavior of risk in a portfolio context and to sharpen our intuition. With a little practice, standard deviation becomes as easy to use as averages.

4.4.2 Standard Deviation of Normally Distributed Returns

If the return distribution were exactly normal, we could develop more precise indications of the range of values and their associated probabilities.

Exhibit 4.4 depicts the use of standard deviation to specify confidence intervals for normally distributed variables. The diagram at the top of Exhibit 4.4 illustrates the range of outcomes that could be expected within 1, 2, or 3 standard deviations from the mean of the distribution. The table at the bottom of Exhibit 4.4 indicates the probabilities that a normally distributed variable will lie inside a range of 1, 2, or 3 standard deviations (two tails) from the mean, or outside the range in a prespecified direction (single tail).

Exhibit 4.4 Confidence Intervals for the Normal Distribution Using Standard Deviation

Number of σ's    Two-Tail Inside Probability    One-Tail Outside Probability
      1                    68.27%                        15.87%
      2                    95.45%                         2.28%
      3                    99.73%                         0.13%

If returns were normally distributed, the standard deviation of the returns would help an investor know with precision what the probabilities of every outcome would be relative to its mean. Very roughly, two-thirds of the time the returns should lie within 1 standard deviation of the mean. The diagram illustrates this case of a 1-standard-deviation range using the lightest shading on each side of the mean, and illustrates larger ranges using darker shading. The diagram does not illustrate a particular value for the mean and standard deviation. The horizontal axis may be labeled to reflect the value of the mean and standard deviation. In the top panel of Exhibit 4.4, the value of the mean would lie on the horizontal axis at the point labeled 0σ. For all other points on the horizontal axis, the value is found by multiplying the standard deviation by the indicated number of standard deviations and adding the value to the mean. For example, with a mean return of 5% and a standard deviation of 2%, two-thirds of the outcomes (more exactly, 68.27%) would tend to lie between 3% and 7% (found as −2% and +2% from the mean of +5%). Also, roughly 95% of the time, the returns should lie within 2 standard deviations (between 1% and 9%). The one-tail probabilities would inform an investor that in this same example, there would be about a 16% probability that the return would be less than 3%, and a 2.28% probability that the return would be less than 1%. The normal distribution is symmetric, so there would be a 0.13% probability that the return would be more than 11%.

As discussed previously, actual return distributions are usually non-normally distributed. However, the large differences between the normal distribution and the actual distributions of returns typically observed in financial markets tend to occur farther out into the tails, such as 4 or more standard deviations. So for many actual return distributions, the probabilities just given would serve as reasonable approximations.

However, for huge return aberrations, such as a move of 10 standard deviations, the normal distribution provides an astoundingly underestimated indication of the actual probabilities of tail events. Extreme tail events, such as the U.S. equity market's decline on October 14–19, 1987, can be hundreds or even thousands of times more likely than indicated by probabilities from the normal distribution and historical standard deviations.

Standard deviation is analyzed so often in the context of the normal distribution that it is sometimes easy to forget that statements such as “Roughly 95% of the outcomes lie within 2 standard deviations of a mean” implicitly assume that the distribution is normally or near-normally distributed. Care should be taken to understand the assumed underlying probability distribution before associating outcomes with probabilities.

4.4.3 Properties of Variance

There are useful properties of variance in the analysis of alternative investments. Variance works well in many formulas regarding risk. This section demonstrates an important property of the variance of the returns of an asset through time. The variance of an investment's return over a time interval of T periods can be expressed as T times the variance measured over a single period under particular assumptions.

We begin with the well-known formula for the variance of the return of a portfolio (p) of n assets as a weighted average of the variances and covariances of the returns of the assets in the portfolio:

\[
\sigma_p^2 = \sum_{i=1}^{n}\sum_{j=1}^{n} w_i\,w_j\,\mathrm{Cov}(R_i, R_j) \tag{4.23}
\]

where wi and wj are the weights of assets i and j in the portfolio.

Note that the covariance of any variable with itself is equal to its variance. Thus, Equation 4.23 contains n variances (one for each of the n assets) and n² − n covariances (from the pairs of distinct assets). The additivity of the formula assists in financial modeling, such as Markowitz's pioneering work on risk and return, in which variance measured risk. In the case of uncorrelated returns between securities, this formula is simplified because all of the covariances between nonidentical assets are zero:

\[
\sigma_p^2 = \sum_{i=1}^{n} w_i^2\,\sigma_i^2 \tag{4.24}
\]

where Rp is the portfolio's return and σi is the standard deviation of the return of asset i.
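The double sum of Equation 4.23 can be sketched directly in Python. The weights and covariance matrix below are hypothetical; with zero off-diagonal covariances the result collapses to the simplified sum of Equation 4.24, and with all entries equal (perfect correlation, equal variances) portfolio variance equals the common variance.

```python
def portfolio_variance(weights, cov):
    """Equation 4.23 as a double sum: sum_i sum_j w_i * w_j * Cov(R_i, R_j)."""
    n = len(weights)
    return sum(weights[i] * weights[j] * cov[i][j]
               for i in range(n) for j in range(n))

w = [0.5, 0.5]
cov_uncorrelated = [[0.04, 0.00],
                    [0.00, 0.04]]  # variances on the diagonal, zero covariances

print(portfolio_variance(w, cov_uncorrelated))  # 0.25*0.04 + 0.25*0.04 = 0.02
```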

An important analogous concept involves the computation of the variance of a multiperiod return. The multiperiod continuously compounded rate of return of any asset is the sum of the continuously compounded returns corresponding to the sub-periods, as noted earlier in Chapter 3. For instance, the weekly rate of return expressed as a log return is the sum of the five daily log returns:

\[
R_w = \sum_{t=1}^{5} R_{d,t} \tag{4.25}
\]

where Rw represents the weekly log return and Rd,t represents the daily log return on day t.

If we assume that the returns are uncorrelated through time (i.e., there is no autocorrelation), all covariances vanish, and the variance of the weekly return is the sum of the variances of the daily returns:

\[
\mathrm{Var}(R_w) = \sum_{t=1}^{5} \mathrm{Var}(R_{d,t}) \tag{4.26}
\]

If we make the further assumption that the variances of the periodic returns of an asset are constant (i.e., homoskedastic), then the variance of the returns for a T-period time interval can be expressed as:

\[
\sigma_T^2 = T\,\sigma_1^2 \tag{4.27}
\]

where σ1 and σT denote the standard deviations of single-period and T-period log returns, respectively.
Since uncorrelated returns through time are consistent with market efficiency, this equation can be viewed as a starting point for understanding variance across different time horizons for asset returns that are reasonably independent through time. If returns are positively correlated in time (i.e., trending, or positively autocorrelated), then the variance will be larger than specified in Equation 4.27. If returns are negatively correlated in time (i.e., mean-reverting, or negatively autocorrelated), then the variance will be smaller than specified in Equation 4.27.

4.4.4 Properties of Standard Deviation

The standard deviation has several especially useful properties in the study of the returns of alternative investments. One important property involves perfectly correlated cross-sectional returns. The standard deviation of a portfolio of perfectly correlated assets is a weighted average of the standard deviations of the assets in the portfolio:

\[
\sigma_p = \sum_{i=1}^{n} w_i\,\sigma_i \tag{4.28}
\]

Another important property of the standard deviation involves a situation in which a return or any random variable can be expressed as a linear combination of another variable:

\[
Y_t = m\,X_t + b \tag{4.29}
\]

where Yt is a random variable, such as the return of asset Y in time t; Xt is another random variable, such as the return of asset X at time t; m is a fixed slope coefficient; and b is a constant intercept. The standard deviation of Y is found as the product of the standard deviation of X and the slope coefficient:

\[
\sigma_Y = |m|\,\sigma_X \tag{4.30}
\]

There are three especially useful applications of this property for investments. First, the returns of a levered position in an asset can typically be well approximated as a linear function of the returns of an unlevered position in the same asset. Therefore, the standard deviation of the levered position (σL) can be approximated as the product of the leverage (L) and the standard deviation of the unlevered asset (σu):

\[
\sigma_L = L\,\sigma_u \tag{4.31}
\]

For example, if a fund is levered 2:1 (i.e., the fund has $2 of assets for every $1 of equity investment), then its standard deviation of returns is generally twice the standard deviation of an unlevered fund with the same assets. The second useful application involves a portfolio that is a combination of proportion w in a risky asset (with return Rm) and proportion 1 − w in a risk-free asset (with return Rf). The portfolio's return (Rp) can be expressed as a linear function of the returns of the risky asset:

\[
R_p = w\,R_m + (1 - w)\,R_f \tag{4.32}
\]

Using the previous property of standard deviation, and noting that the standard deviation of Rf is zero (so its covariance with the risky asset's return is also zero), the standard deviation of the portfolio (σp) can be expressed as the product of the proportion invested in the risky asset, w, and the standard deviation of the risky asset, σm:

\[
\sigma_p = w\,\sigma_m \tag{4.33}
\]
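Both applications can be illustrated with a hypothetical return series: levering a position 2:1 (borrowing at the risk-free rate) doubles its return volatility, while blending the risky asset with the risk-free asset scales volatility by the risky weight w.

```python
import math

def sample_std(xs):
    """Sample standard deviation (n - 1 denominator)."""
    mu = sum(xs) / len(xs)
    return math.sqrt(sum((x - mu) ** 2 for x in xs) / (len(xs) - 1))

r_unlevered = [0.04, -0.02, 0.03, 0.01]  # hypothetical unlevered returns
r_f, L, w = 0.01, 2.0, 0.60

# 2:1 leverage: borrow (L - 1) at the risk-free rate and invest L in the asset.
r_levered = [L * r - (L - 1) * r_f for r in r_unlevered]
# 60/40 blend of the risky asset and the risk-free asset.
r_blend = [w * r + (1 - w) * r_f for r in r_unlevered]

print(sample_std(r_levered) / sample_std(r_unlevered))  # equals leverage L = 2
print(sample_std(r_blend) / sample_std(r_unlevered))    # equals risky weight w = 0.6
```

Note that subtracting the constant borrowing cost shifts the levered returns but leaves their dispersion unchanged, which is why only the scale factor matters.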

A third property of standard deviation involves the relationship between the standard deviations of single-period and multiple-period returns. Equation 4.27 in the previous section showed that the variance of a multiperiod log return is the number of periods multiplied by the single-period variance when the returns are homoskedastic and uncorrelated through time. Taking the square root of both sides of Equation 4.27 generates the relationship in terms of standard deviations:

\[
\sigma_T = \sqrt{T}\,\sigma_1 \tag{4.34}
\]

Equation 4.34 requires that the returns are independent through time and that the variances of the single-period returns are equal (i.e., homoskedastic). Note that the standard deviation of returns grows with the factor √T as the time interval increases. Thus, a two-period return has √2 times the standard deviation of a one-period return, and a four-period return has two times the standard deviation of a one-period return. A popular annualization method in alternative investments is to find the annual standard deviation by multiplying the standard deviation of monthly returns by √12.
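The √T scaling rule of Equation 4.34 is a one-liner in code; the 2% monthly volatility below is a hypothetical input.

```python
import math

def scale_std(single_period_std, T):
    """Scale a single-period volatility to a T-period horizon,
    assuming independent, homoskedastic returns (the sqrt(T) rule)."""
    return single_period_std * math.sqrt(T)

monthly_vol = 0.02  # hypothetical 2% monthly standard deviation
print(scale_std(monthly_vol, 12))  # annualized: about 0.0693, i.e., 2% x sqrt(12)
```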

Finally, it was previously noted that the standard deviation of a portfolio of perfectly correlated assets is the weighted average of the standard deviations of the constituent assets. Analogously, the standard deviation of a multiperiod return can be approximated as the sum of the standard deviations of the return of each sub-period if the returns are perfectly correlated. If we further assume that the standard deviation of each sub-period is equal (i.e., the standard deviation of the asset is constant through time), then:

\[
\sigma_T = T\,\sigma_1 \tag{4.35}
\]

Perfect positive correlation of returns through time does not make economic sense, so Equation 4.35 should be viewed as an upper bound. Let's compare the cases of independent and perfectly correlated returns through time. We see that the standard deviation of a multiperiod return varies from being proportional to √T in the uncorrelated (independent) case to being proportional to T in the perfectly correlated case. If returns are mean-reverting, meaning negatively correlated through time, the standard deviation of the multiperiod return can be even less than indicated in Equation 4.34. Thus, comparing the standard deviations of an asset using different time intervals for computing returns (e.g., daily returns versus annual returns) provides insight into the statistical correlation of the returns through time (i.e., their autocorrelation). In other words, whether a return series is trending, independent, or mean-reverting drives the relationship between the asset's relative volatility over different time intervals. For example, if an asset's return volatility over four-week intervals is more than twice as large as its weekly return volatility, it may be that the weekly returns are positively autocorrelated.
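The link between autocorrelation and how volatility scales with horizon can be demonstrated with two hypothetical return paths: one that strictly reverses each period (negative autocorrelation) and one that trends in runs (positive autocorrelation). The ratio of two-period volatility to √2 times one-period volatility falls below 1 for the mean-reverting path and rises above 1 for the trending path.

```python
import math

def sample_std(xs):
    """Sample standard deviation (n - 1 denominator)."""
    mu = sum(xs) / len(xs)
    return math.sqrt(sum((x - mu) ** 2 for x in xs) / (len(xs) - 1))

def two_period_returns(returns):
    """Non-overlapping two-period log returns: sums of consecutive pairs."""
    return [returns[i] + returns[i + 1] for i in range(0, len(returns) - 1, 2)]

reversing = [0.02, -0.02] * 6               # mean-reverting: pairs sum to zero
trending = [0.02, 0.02, -0.02, -0.02] * 3   # positively autocorrelated runs

ratio_reversing = (sample_std(two_period_returns(reversing))
                   / (math.sqrt(2) * sample_std(reversing)))
ratio_trending = (sample_std(two_period_returns(trending))
                  / (math.sqrt(2) * sample_std(trending)))

print(ratio_reversing)  # well below 1: negative autocorrelation
print(ratio_trending)   # above 1: positive autocorrelation
```

A ratio near 1 would be consistent with independent returns, for which the √T rule of Equation 4.34 holds.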

4.5 Testing for Normality

If a return distribution is normally distributed, then analysts can use well-developed statistical methods available for normally distributed variables and can be confident in the likelihood of extreme events. In practice, however, most return distributions are not normal. Some return distributions have substantial skews. Most return distributions have dramatically higher probabilities of extreme events than are experienced with the normal distribution (i.e., are leptokurtic).

4.5.1 Why Are Some Returns Markedly Non-Normal?

There are three main reasons for the non-normality often observed in alternative investment returns: autocorrelation, illiquidity, and nonlinearity. The first two can be related to each other.

1. AUTOCORRELATION: Price changes through time for many alternative investments will not be statistically independent, in terms of both their expected direction and their level of dispersion. Autocorrelation is a major source of that statistical dependence. Short-term returns, such as daily returns, are sometimes positively autocorrelated if the assets are not rapidly and competitively traded. Many alternative investments, such as private equity and private real estate, cannot be rapidly traded at low cost. Further, when reported returns can be influenced by an investment manager, it is possible that the manager smooths the returns to enhance performance measures. Thus, autocorrelation of observed returns can exist and is often found.

Positive autocorrelation causes longer-term returns to have disproportionately extreme values relative to short-term returns. The idea is that one extreme short-term return tends to be more likely to be followed by another extreme return in the same direction, to the extent that the return series has positive autocorrelation. The autocorrelated short-term returns can generate highly dispersed longer-term returns, such as the returns that appear to be generated in speculative bubbles on the upside and panics on the downside.

2. ILLIQUIDITY: Illiquidity of alternative investments refers to the idea that many alternative investments are thinly traded. For example, a typical real estate property or private equity deal might be traded only once every few years. Further, the trades might be based on the decisions of a very limited number of market participants. Observed market prices might therefore be heavily influenced by the liquidity needs of the market participants rather than driven toward an efficient price by the actions of numerous well-informed buyers and sellers. With a small number of potentially large factors affecting each trade, there is less reason to believe that the outcomes will be normally distributed and more reason to believe that extreme outcomes will be relatively common.

In illiquid markets, prices are often estimated by models and professional judgments rather than by competitive market prices. Evidence indicates that prices generated by models or professional judgments, such as those of appraisers, tend to be autocorrelated. The resulting returns are smoothed and tend to exhibit less volatility than would be indicated if true prices could be observed.
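The smoothing effect can be sketched numerically. The blending rule and the weight alpha below are illustrative assumptions, not a model from the text: each reported return is a blend of the current true return and the previously reported return, which simultaneously lowers measured volatility and induces positive autocorrelation.

```python
import numpy as np

rng = np.random.default_rng(7)
true_r = rng.normal(0.01, 0.04, 120)  # hypothetical "true" monthly returns

# Appraisal-style smoothing (illustrative): the reported return blends the
# current true return with the previously reported return.
alpha = 0.4
reported = np.zeros_like(true_r)
reported[0] = true_r[0]
for t in range(1, len(true_r)):
    reported[t] = alpha * true_r[t] + (1 - alpha) * reported[t - 1]

print(reported.std() < true_r.std())                   # smoothed series is less volatile
print(np.corrcoef(reported[:-1], reported[1:])[0, 1])  # positive first-order autocorrelation
```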

3. NONLINEARITY: A simple example of an asset with returns that are a nonlinear function of an underlying return factor is a short-term call option. As the underlying asset's price changes, the call option experiences a change in its sensitivity to future price changes in the underlying asset. Therefore, the dispersion in the call option's return distribution changes through time as the underlying asset's price changes, even if the volatility of the underlying asset remains constant. This is why a call option offers asymmetric price changes: A call option has virtually unlimited upside price change potential but is limited in downside price change potential to the option premium. The result is a highly nonsymmetric return distribution over long time intervals. A similar phenomenon occurs for highly active trading strategies (such as many hedge funds or managed futures accounts), which cause returns to experience different risk exposures through time, such as when a strategy varies its use of leverage.
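The asymmetry of the call option's payoff can be illustrated with a simple simulation. The stock price, strike, premium, and volatility below are hypothetical values chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: stock at 100, call struck at 100, premium of 2.5.
s0, strike, premium = 100.0, 100.0, 2.5
s_t = s0 * np.exp(rng.normal(0.0, 0.06, 100_000))  # simulated terminal prices

# Option P&L: virtually unlimited upside, downside capped at the premium paid.
pnl = np.maximum(s_t - strike, 0.0) - premium

skew = ((pnl - pnl.mean()) ** 3).mean() / pnl.std() ** 3
print(skew > 0)  # positive skew: the option's return distribution is right-skewed
```

Even though the simulated underlying returns are symmetric, the option's payoff distribution is markedly right-skewed, illustrating how nonlinearity alone generates non-normal returns.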

Thus, many alternative investments tend to have markedly non-normal log returns over medium- and long-term time intervals. The shape of an investment's return distribution is central to an understanding of its risk and return. The following sections detail the analysis of return distributions through their statistical moments, which help describe and analyze return distributions even if they are not normally shaped.

Typically, the true underlying probability distribution of an asset's return cannot be observed directly but must be inferred from a sample. A classic issue that arises is whether a particular sample from a return distribution tends to indicate that the underlying distribution is normal or non-normal. The process is always one of either rejecting that the underlying distribution is normal or failing to reject that it is normal at some level of statistical confidence.

There are numerous types of tests for normality. Some methods are informal, such as plotting the frequency distribution of the sample and eyeballing the shape of the distribution or performing some informal statistical analysis. However, the human mind can be inaccurate when guessing about statistical relationships. Therefore, formal statistical testing is usually appropriate. The most popular formal tests use the moments of the sample distribution.

4.5.2 Moments-Based Tests for Normality with Data Samples

The statistical moments reviewed earlier in the chapter and statistics related to those moments, such as skewness and kurtosis, provide useful measures from which to test a sample for normality. The normal distribution has a skewness equal to zero and an excess kurtosis equal to zero. Even if a sample is drawn from observations of a normally distributed variable, the sample would virtually never have a sample skewness of exactly zero or an excess kurtosis exactly equal to zero. By chance, the observations included in the sample would tend to skew in one direction or the other, and the tails would tend to be fatter or skinnier than in the truly normal underlying distribution. Thus, tests are necessary to examine the level of departure of the sample statistics from the parameters of the normal distribution. Normality tests attempt to ascertain the probability that the observed skewness and kurtosis would occur if the sample had been drawn from an underlying distribution that was normal.
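As a sketch, the sample skewness and excess kurtosis used in such tests can be computed directly from standardized observations. The function name and the simulated sample below are illustrative:

```python
import numpy as np

def sample_skew_exkurt(x):
    """Sample skewness and sample excess kurtosis from standardized observations."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()          # standardize the observations
    return (z ** 3).mean(), (z ** 4).mean() - 3.0

rng = np.random.default_rng(1)
normal_sample = rng.normal(size=5_000)
s, k = sample_skew_exkurt(normal_sample)
print(s, k)  # both near (but virtually never exactly) zero for a large normal sample
```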

4.5.3 The Jarque-Bera Test for Normality

Numerous formal tests for normality have been developed. One of the most popular and straightforward tests for normality is the Jarque-Bera test. The Jarque-Bera test involves a statistic that is a function of the skewness and excess kurtosis of the sample:

JB = (n/6)[S² + (K²/4)]   (4.36)

where JB is the Jarque-Bera test statistic, n is the number of observations, S is the skewness of the sample, and K is the excess kurtosis of the sample.

Both the sample skewness and the sample excess kurtosis are computed as detailed in the previous sections. The null hypothesis is that the underlying distribution is normal, in which case the population value of JB is zero (since the skewness and excess kurtosis of the normal distribution are both zero).

While the Jarque-Bera test statistic is relatively easy to compute given the skewness and excess kurtosis, its interpretation is a little more complicated. Notice that S and K in the formula for the Jarque-Bera test statistic are both squared, so the statistic is always nonnegative. If the test did not square S and K, a negative skewness could offset a positive excess kurtosis, wrongly suggesting normality. As a sample exhibits more of the tendencies of a normal distribution (less skewness and less excess kurtosis), the Jarque-Bera test statistic tends toward zero (holding n constant). The Jarque-Bera test for normality therefore asks whether the test statistic is large enough to reject the null hypothesis of normality. The test is more powerful when the number of observations is larger.

If the underlying distribution is normal, the value of JB generated from a sample will exceed zero with magnitudes and probabilities given by the chi-squared distribution (with two degrees of freedom). Also, if the underlying distribution is normal, the Jarque-Bera test statistic will tend to be small, since a sample drawn from a normal distribution will tend to have low skewness and low excess kurtosis. The higher the JB statistic, the less likely it is that the underlying distribution is normal. The probability that the Jarque-Bera test statistic will exceed particular values can be found from a chi-squared distribution table with the required two degrees of freedom; for small samples, more accurate critical values for the Jarque-Bera test can be formed through simulations.

Confidence interval 0.90 0.95 0.975 0.99 0.999
Critical value 4.61 5.99 7.38 9.21 13.82

The analyst should perform the Jarque-Bera test in these four steps:

  1. Select a confidence interval (e.g., 90%, 95%, 97.5%, 99%, or 99.9%).
  2. Locate the corresponding critical value (e.g., 5.99 for 95% confidence).
  3. Compute the JB statistic (using formula 4.36 and the sample skewness and excess kurtosis).
  4. Compare the JB statistic to the critical value.

If the JB statistic exceeds the critical value, then the null hypothesis of normality is rejected using the stated level of confidence. If the JB statistic is less than the critical value, then the null hypothesis is not rejected, and the underlying distribution is assumed to be normal. The interpretation of this type of hypothesis test and the level of statistical confidence is actually quite complex and is discussed in detail in Chapter 8.
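The four steps can be sketched as follows, using the chi-squared critical values from the table above. The function name and the simulated samples are illustrative; the fat-tailed Student's t sample stands in for a non-normal return series:

```python
import numpy as np

# Chi-squared (2 df) critical values, as in the table above.
CRITICAL = {0.90: 4.61, 0.95: 5.99, 0.975: 7.38, 0.99: 9.21, 0.999: 13.82}

def jarque_bera(x, confidence=0.95):
    """Steps 1-2: pick confidence and critical value; step 3: compute JB;
    step 4: compare JB to the critical value."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    z = (x - x.mean()) / x.std()
    s = (z ** 3).mean()              # sample skewness
    k = (z ** 4).mean() - 3.0        # sample excess kurtosis
    jb = (n / 6.0) * (s ** 2 + (k ** 2) / 4.0)
    return jb, jb > CRITICAL[confidence]   # (statistic, reject normality?)

rng = np.random.default_rng(3)
jb_norm, reject_norm = jarque_bera(rng.normal(size=500))
jb_fat, reject_fat = jarque_bera(rng.standard_t(df=3, size=500))
print(reject_norm, reject_fat)  # typically: normal sample not rejected, t(3) sample rejected
```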

4.5.4 An Example of the Jarque-Bera Test

Assume that the sample skewness and excess kurtosis are computed as −0.577 and −0.042, respectively. The sample size, n, is 40. The Jarque-Bera test statistic is therefore given by:

JB = (40/6)[(−0.577)² + ((−0.042)²/4)] ≈ 2.22   (4.37)

Using a statistical confidence of 95%, the critical value for the test is 5.99. Since the Jarque-Bera test statistic, approximately 2.22, is less than 5.99, we cannot reject the null hypothesis of normality.

4.6 Time-Series Return Volatility Models

The previous sections often focused on the use of the past or historical standard deviation to express or measure risk. In most cases, however, analysts are concerned more with forecasting future risk than with estimating past risk. This section briefly reviews an approach to estimating future volatility based on past data.

Time-series models are often used in finance to describe the process by which price levels move through time. Increasingly, however, analysts also study how price variation moves through time. Time-series models of how risk evolves through time are numerous and diverse. We will briefly summarize one of the most popular methods: GARCH (generalized autoregressive conditional heteroskedasticity), a time-series method that adjusts for varying volatility.

Let's examine generalized autoregressive conditional heteroskedasticity one word at a time. Heteroskedasticity means that the variance of a variable changes, whether with respect to time or to another variable (including the variable's own past values). Homoskedasticity means that the variance of a variable is constant. Clearly, equity markets and other markets go through periods of high volatility and low volatility, wherein each day's volatility is more likely to remain near recent levels than to immediately revert to historical norms. Thus, risky assets appear at least at times to exhibit heteroskedastic return variation. The GARCH method allows for heteroskedasticity and can be used when it is believed that risk is changing through time.

Autoregressive means that subsequent values of a variable are explained by past values of the same variable. In this case, autoregressive means that the next level of return variation is explained at least in part by past variation, in addition to being determined by randomness. Casual observation of equity markets and other financial markets appears to support the idea that one day's variation, or volatility, can at least partially determine the next day's variation.

The term conditional in GARCH refers to a particular lack of predictability of future variation. Some securities have return variation that is somewhat predictable. For example, a default-free zero-coupon bond (e.g., a Treasury bill) can be expected to decline in return variation and price variation as it approaches maturity and as its price approaches face value. Conditioned on the time to maturity, the variance of a Treasury bill is at least somewhat predictable. Hence, the Treasury bill might only be unpredictable on an unconditional basis. Other financial values, however, do not exhibit a pattern like the default-free zero-coupon bond. For example, there is no apparent pattern to the volatility of the price of a barrel of oil or the value of an equity index.

When a financial asset exhibits a clear pattern of return variation, such as in the example of a Treasury bill near maturity, its variation is said to be unconditionally heteroskedastic. Most financial market prices are conditionally heteroskedastic, meaning that they have different levels of return variation even when specified conditions are similar (e.g., when they are viewed at similar price levels).

An example of conditional heteroskedasticity is as follows. Perhaps a major equity index reaches a similar price level, such as 800, several times in the course of a decade. There is no reason to believe, however, that the index will experience similar levels of return variation each time it nears that 800 level. Sometimes the index might be quite volatile at the 800 level, and other times the index might be quite stable at the same level, as a result of, for example, different macroeconomic environments. Thus, the asset's return variation is heteroskedastic even when such conditions as price levels are held constant. Hence, the index, like most financial assets, is conditionally heteroskedastic because its return variation is heteroskedastic even under similar conditions (i.e., even when conditioned on another variable).

Finally, generalized refers to the model's ability to describe wide varieties of behavior, also known as robustness. A less robust time-series model of volatility is ARCH (autoregressive conditional heteroskedasticity), a special case of GARCH that allows future variances to rely only on past disturbances, whereas GARCH allows future variances to depend on past variances as well. Developed subsequently to ARCH, GARCH is now generally the more popular approach in most financial asset applications.

Now we can summarize all of the terms in GARCH together. In the context of financial returns, GARCH is a robust method that can model return variation through time in a way that allows that variation to change based on the variable's past history and even when some conditions, such as price level, have not changed.

It has parameters that the researcher can set to allow closer fitting of the model to various types of patterns. The GARCH model is usually specified with two parameters, written GARCH(p, q). The first parameter, p, defines the number of time periods for which past return variations are included in the modeling equation, and the second, q, defines the number of time periods for which autoregressive terms are included.
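As a sketch, a GARCH(1, 1) process can be simulated as follows. The parameter values (omega, alpha, beta) are illustrative assumptions, and note that texts differ on which of p and q labels the lagged-variance terms:

```python
import numpy as np

rng = np.random.default_rng(11)

# Illustrative GARCH(1,1) parameters: constant term omega, weight alpha on the
# past squared shock, and weight beta on the past variance.
omega, alpha, beta = 0.00001, 0.1, 0.85
n = 2_000
var = np.zeros(n)
r = np.zeros(n)
var[0] = omega / (1 - alpha - beta)  # start at the unconditional (long-run) variance

for t in range(1, n):
    # Next period's variance depends on the last squared shock and the last variance.
    var[t] = omega + alpha * r[t - 1] ** 2 + beta * var[t - 1]
    r[t] = np.sqrt(var[t]) * rng.normal()

# Volatility clustering: squared returns are positively autocorrelated.
sq = r[1:] ** 2
print(np.corrcoef(sq[:-1], sq[1:])[0, 1] > 0)
```

The simulated returns have near-zero autocorrelation in levels but positively autocorrelated squared returns, which is the volatility-clustering behavior that GARCH is designed to capture.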

Review Questions

  1. Describe the difference between an ex ante return and an ex post return in the case of a financial asset.

  2. Contrast the kurtosis and the excess kurtosis of the normal distribution.

  3. How would a large increase in the kurtosis of a return distribution affect its shape?

  4. Using statistical terminology, what does the volatility of a return mean?

  5. The covariance between the returns of two financial assets is equal to the product of the standard deviations of the returns of the two assets. What is the primary statistical terminology for this relationship?

  6. What is the formula for the beta of an asset using common statistical measures?

  7. What is the value of the beta of the following three investments: a fund that tracks the overall market index, a riskless asset, and a bet at a casino table?

  8. In the case of a financial asset with returns that have zero autocorrelation, what is the relationship between the variance of the asset's daily returns and the variance of the asset's monthly return?

  9. In the case of a financial asset with returns that have autocorrelation approaching +1, what is the relationship between the standard deviation of the asset's monthly returns and the standard deviation of the asset's annual return?

  10. What is the general statistical issue addressed when the GARCH method is used in a time-series analysis of returns?
