CHAPTER 8
Alpha, Beta, and Hypothesis Testing

Chapter 6 discussed a number of measures of the price risk of options using Greek letters, such as delta, theta, and gamma. Greek letters and other similar-sounding words, such as vega, are not limited to option analysis. This chapter begins with a detailed discussion of alpha and beta. Alpha and beta are central concepts within alternative investment analysis. Consider the following hypothetical example of a discussion of investment performance:

During an investment committee meeting, the chief investment officer (CIO) comments on the performance of a convertible arbitrage fund named MAK Fund: “MAK generated an alpha of 8% last year and 10% two years ago. I think we can expect an alpha of 4% next year.” A portfolio manager debates the point: “MAK Fund takes positions in convertible bonds with high credit risk. I think that MAK's alpha during the last two years was really beta.” The CIO replies: “But MAK is delta hedged. And even though the fund is long gamma, is there really any beta in being long gamma?”

8.1 Overview of Beta and Alpha

The preceding example illustrates how Greek letters are often used in investments to represent key concepts. This chapter focuses on alpha and beta, two critical concepts in the area of alternative investments. In a nutshell, alpha represents, or measures, superior return performance; and beta represents, or measures, systematic risk. A primary purpose of this chapter is to explore their meanings and nuances. The second purpose of this chapter is to discuss hypothesis testing, since alpha and beta are generally estimated rather than observed.

8.1.1 Beta

In the CAPM (capital asset pricing model), the concept of beta is precisely identified: Each asset has one beta, and the beta is specified as the covariance of the asset's return with the return of the market portfolio, divided by the variance of the returns of the market portfolio. This is also the definition of the slope coefficient in a simple linear regression of an asset's returns on the returns of the market portfolio. Intuitively, beta is the proportion by which an asset's excess return (its return minus the return of the riskless asset) moves in response to the market portfolio's excess return. If an asset has a beta of 0.95, its excess return can be expected, on average, to increase and decrease by a factor of 0.95 relative to the excess return of the market portfolio.
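
The two definitions are equivalent, which can be verified numerically. The following minimal Python sketch, using simulated and purely hypothetical return series, estimates beta both as covariance divided by variance and as the slope of a simple linear regression:

    import numpy as np

    # Simulated, hypothetical monthly excess returns for an asset and the
    # market (real analysis would subtract the riskless rate from raw returns).
    rng = np.random.default_rng(42)
    market_excess = rng.normal(0.005, 0.04, size=120)                # 10 years of months
    asset_excess = 0.95 * market_excess + rng.normal(0, 0.02, size=120)

    # Beta as covariance divided by variance (the CAPM definition) ...
    beta_cov = np.cov(asset_excess, market_excess)[0, 1] / np.var(market_excess, ddof=1)

    # ... which equals the slope of a simple linear regression of the asset's
    # excess returns on the market's excess returns.
    beta_ols, intercept = np.polyfit(market_excess, asset_excess, 1)

    print(f"beta (cov/var): {beta_cov:.3f}  beta (regression slope): {beta_ols:.3f}")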

But beta has a more general interpretation outside the CAPM, both within traditional investment analysis and especially within alternative investment analysis. Beta refers to a measure of risk, or the bearing of risk, wherein the underlying risk is systematic (shared by at least some other investments and usually unable to be diversified away or fully hedged without cost) and is potentially rewarded with expected return. Outside the CAPM, assets can have more than one beta, and a beta does not have to be a measure of the response of an asset to fluctuations in the entire market portfolio.

Chapter 6 detailed the idea of multiple betas in multifactor asset pricing models. For example, when a particular investment, such as private equity, locks the investor into the position for a considerable length of time, is this illiquidity a risk that is rewarded with extra expected return? If so, then a benchmark should reflect that risk and reward, and a beta measuring that illiquidity and a term reflecting its expected reward should be included as an additional factor in an ex ante asset pricing model.

In alternative investments, the term beta can be used to refer to any systematic risk for which an investor might be rewarded. The term can apply to a specific systematic risk, from a single-factor or a multifactor model, or to the combined effects of multiple systematic risks from multiple factors. Beta is commonly used in phrases such as “This strategy has no beta,” “Half of the manager's return was (from) beta,” and “That product is a pure beta bet.”

Bearing beta risk is generally viewed as a source of higher expected return. The attempt to earn consistently higher returns without taking additional systematic risk leads to the topic of the next section: alpha.

8.1.2 Alpha

Alpha refers to any excess or deficient investment return after the return has been adjusted for the time value of money (the risk-free rate) and for the effects of bearing systematic risk (beta). For an investment strategy, alpha refers to the extent to which the skill, information, and knowledge of an investment manager generate superior risk-adjusted returns (or inferior risk-adjusted returns in the case of negative alpha).

The measurement of alpha, and even the existence of alpha, is an important issue in investments in general and in alternative investments in particular. One person may believe that a high return was generated by skill (alpha), whereas another person may argue that the same return was a reward for taking high risks (beta) or a result of being lucky (idiosyncratic risk). Therefore, the concept of alpha and the estimation of alpha are inextricably linked to the view of how financial assets and financial markets function. Asset pricing models, discussed in detail in Chapters 6 and 7, are expressions of asset and market behavior. The demarcation between return from alpha, beta, and idiosyncratic risk depends on one's view of the return-generating process (or asset pricing model) as implicitly or explicitly expressed. If the return-generating process is misspecified and relevant beta risks are excluded from the analysis, then manager skill may be overstated, because the perceived alpha may include compensation for beta risks omitted from a benchmark or asset pricing model.

The concept of alpha originated with Jensen's work in the context of the CAPM. Jensen's analysis was a seminal empirical application of the single-factor market model. Jensen measured the net returns from mutual funds after accounting for the funds' returns based on the single-factor market model. He subtracted the single-factor market model's estimated return from the actual returns, and what was left over (either positive or negative) was labeled alpha. However, the term alpha is not limited to the context of the CAPM. Regressions based on single-factor or multifactor market models are commonly performed, with the estimated intercept referred to as alpha, reflecting the common notation for the intercept of a linear regression.

8.2 Ex Ante versus Ex Post Alpha

Although there is broad consensus in the alternative investment community that alpha refers to superior risk-adjusted performance, the term is applied, often interchangeably, to two very distinct concepts. Sometimes alpha is used to describe any high risk-adjusted returns, and sometimes it is used to describe superior returns generated through skill alone. This section distinguishes these two views of alpha using the terms ex ante alpha and ex post alpha. Considerable confusion regarding alpha originates from the failure to distinguish between these different uses of the term.

8.2.1 Ex Ante Alpha

Ex ante alpha is a term that is not commonly used in industry or academia; rather, it is used in this book to denote an issue of critical importance in understanding alpha. Ex ante alpha is the expected superior return if positive (or inferior return if negative) offered by an investment on a forward-looking basis after adjusting for the riskless rate and for the effects of systematic risks (beta) on expected returns. Ex ante alpha is generated by a deliberate over- or underallocation to mispriced assets based on investment management skill. Simply put, ex ante alpha indicates the extent to which an investment offers a consistent superior risk-adjusted investment return.

In the context of the single-factor market model, ex ante alpha may be viewed as the first term on the right-hand side of the following equation:

(8.1)   E(Rit) − Rf = αi + βi[E(Rmt) − Rf]

where αi is the ex ante alpha of asset i, E(Rit) is the expected return of asset i in period t, E(Rmt) is the expected return of the market portfolio, Rf is the riskless rate, and βi is the beta of asset i.

In a perfectly efficient market, αi (alpha) in this equation would be zero for all assets. The use of a single-factor market model in Equation 8.1 and throughout most of this chapter is for simplicity. A multifactor model would simply insert a set of beta terms and factor returns in Equation 8.1 in addition to or in place of the market beta and market factor.

Equation 8.1 is described as representing a single-factor market model rather than the CAPM because the CAPM implies that no competitively priced asset would offer a positive or negative ex ante alpha, since every asset would trade at a price such that its expected return would be commensurate with its risk. In practice, market participants often seek expected returns that exceed the expected return based on systematic risk, a goal that is illustrated in Equation 8.1 using the term αi.

In practice, ex ante alpha is typically a concept rather than an observable variable. This can be seen from Equation 8.1 in a number of ways. First, βi is a sensitivity that can only be estimated, and every estimate contains error. If the true value of βi is not known, then the true value of αi cannot be known. Second, all of the expected returns in Equation 8.1, other than the observable risk-free rate, are unobservable and must be estimated. Thus, ex ante alpha can only be estimated or predicted. A positive ex ante alpha is an expression of the belief that a particular investment will offer an expected return higher than investments of comparable risk in the next time period.

As an illustration, consider the manager of an equity market-neutral hedge fund who desires to maximize ex ante alpha while maintaining a beta close to zero. The manager's strategy creates a hedge against systematic risk factors while attempting to exploit abnormal performance of individual stocks within the same sector or industry. Once the ex ante alpha of each stock is estimated, the portfolio is built using an optimization process seeking to maximize the positive alpha of long positions and the negative alpha of short positions, while requiring the systematic risk exposures of the long portfolio to match the short portfolio. The intended result is a zero-beta, or market-neutral, portfolio with a high ex ante alpha.
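
As a deliberately simplified stand-in for the optimization process described above (not an actual optimizer), the following Python sketch starts from weights proportional to hypothetical alpha estimates and then projects out the estimated beta exposure so that the portfolio's net beta is zero; all inputs are invented for illustration:

    import numpy as np

    # Hypothetical estimated ex ante alphas and betas for six stocks in one sector.
    alpha_est = np.array([0.02, 0.01, -0.015, 0.005, -0.02, -0.01])
    beta_est = np.array([1.10, 0.90, 1.00, 1.20, 0.80, 1.05])

    # Start from weights proportional to estimated alpha (long positive-alpha
    # names, short negative-alpha names), then project out the beta exposure
    # so that the net beta of the portfolio is zero.
    w = alpha_est - ((alpha_est @ beta_est) / (beta_est @ beta_est)) * beta_est
    w = w / np.abs(w).sum()                    # scale to 100% gross exposure

    print("weights:", np.round(w, 3))
    print("portfolio beta:", round(float(w @ beta_est), 10))      # 0 by construction
    print("portfolio ex ante alpha:", round(float(w @ alpha_est), 4))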

8.2.2 Ex Post Alpha

The ex ante alpha discussed in the previous section is a common interpretation of the term alpha. This section provides details about another potential interpretation of the term: the ex post alpha. As in the case of ex ante alpha, ex post alpha is a term used primarily for the purposes of this book.

Ex post alpha is the return, observed or estimated in retrospect, of an investment above or below the risk-free rate and after adjusting for the effects of beta (systematic risks). Whereas ex ante alpha may be viewed as expected idiosyncratic return, ex post alpha is realized idiosyncratic return. Simply put, ex post alpha is the extent to which an asset outperformed or underperformed its benchmark in a specified time period. Ex post alpha can be the result of luck or skill. Unlike ex ante alpha, ex post alpha can usually be estimated with a reasonable degree of confidence.

Considerable and valid disagreement exists over describing the concept of ex post alpha as a type of alpha. The reason is that alpha is sometimes associated purely with skill, whereas ex post alpha can be generated by luck. Nevertheless, the use of the term to describe past superior performance is so common that it is labeled as such throughout this book. In the context of the single-factor market model, ex post alpha may be viewed as the last term on the right-hand side of the following equation:

(8.2)   Rit − Rf = βi(Rmt − Rf) + ϵit

where Rit is the realized return of asset i in period t, Rmt is the realized return of the market portfolio in period t, and ϵit is the idiosyncratic return of asset i in period t (the ex post alpha).
Note that Equation 8.2 refers to theoretical values rather than actual values estimated using a linear equation or other statistical technique. Some analysts would correctly refer to ϵit as the idiosyncratic return or the abnormal return and might object to having the return labeled as any type of alpha, because there might be no reason to think of the return as being generated by anything other than randomness or luck. Nevertheless, many other analysts use the term alpha synonymously with idiosyncratic return or abnormal return; therefore, the term ex post alpha is used here to distinguish the concept from the other interpretation of alpha (ex ante alpha).
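
In practice, ϵit is not observed directly; analysts typically estimate beta and the average idiosyncratic return by regressing the fund's excess returns on the market's excess returns, with the intercept serving as the estimated alpha. A minimal Python sketch with simulated data (all parameters hypothetical) follows:

    import numpy as np

    # Simulated monthly data: riskless rate, market returns, and fund returns
    # built with a true beta of 0.75 and a true alpha of 15 bp per month.
    rng = np.random.default_rng(7)
    r_f = 0.002                                    # 0.2% per month, assumed constant
    r_m = rng.normal(0.008, 0.04, size=60)
    r_fund = r_f + 0.0015 + 0.75 * (r_m - r_f) + rng.normal(0, 0.01, size=60)

    # Regress the fund's excess returns on the market's excess returns; the
    # slope estimates beta, and the intercept estimates the average
    # idiosyncratic return (commonly labeled alpha).
    beta_hat, alpha_hat = np.polyfit(r_m - r_f, r_fund - r_f, 1)
    print(f"estimated beta: {beta_hat:.2f}  estimated alpha: {alpha_hat:.4%} per month")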

Consider a hypothetical fund, Trim Fund, run by managers assumed to have no skill, that outperformed its benchmark by 125 basis points. In this example, Trim Fund must have been lucky, because the fund outperformed its benchmark despite the managers being unskilled. Alpha-based analysis typically involves two steps: (1) ascertaining abnormal return performance (ex post alpha) by controlling for systematic risk, and (2) judging the extent to which any superior performance was attributable to skill (i.e., was generated by ex ante alpha). The more problematic issue is often the second step of the analysis: differentiating between the potential sources of the ex post alpha, luck or skill.

A key difference between ex ante and ex post alpha is that ex ante alpha reflects skill, whereas ex post alpha can be a combination of both luck and skill. For example, a manager might have enough skill to select a portfolio that is 1% underpriced but that happens to experience some completely unexpected good news that results in the portfolio outperforming other assets of similar risk by 11%. The manager had an ex ante alpha of 1% (purely skill) and an ex post alpha of 11% (1% from skill plus 10% from luck).

When discussing alpha, many analysts do not explicitly differentiate between the ex ante and ex post views. If an analyst identifies an alpha of 5% because a fund's risk-adjusted returns were 5% higher the previous year than the risk-adjusted returns of other funds, then in this book's terminology, the alpha is an ex post alpha. However, if the analyst expects that a fund will have a 5% higher expected return than other funds of similar risk, then in this book's terminology, the analyst believes that the fund has an ex ante alpha of 5% and that the fund's superior return is probably attributable to the better skill of the manager in selecting superior investment opportunities.

8.3 Inferring Ex Ante Alpha from Ex Post Alpha

One of the most central functions of alternative investment analysis is the process of attempting to identify ex ante alpha. Ex ante alpha estimation would be simplified if the expected returns of all assets could be observed or accurately estimated. In practice, expectations of returns on risky assets vary from market participant to market participant. In fact, the existence of ex ante alpha comes from different investors having different expectations of risk-adjusted return.

A key method of identifying ex ante alpha for a particular investment fund is a thorough and rigorous analysis of the manager and the manager's processes and methods. Analysis of historical data should typically also play a role, though not too large a role, in identification of ex ante alpha. In this section, these empirical methods are discussed. Empirical methods estimate ex ante alpha through attempting to differentiate between the roles of luck and skill in generating past risk-adjusted returns. The objective of these empirical analyses is to understand how much, if any, of an investment's past returns are attributable to skill and might be predicted to recur.

8.3.1 Two Steps to Empirical Analysis of Ex Ante Alpha

Two critical steps are used to identify ex ante alpha from historical performance. First, an asset pricing model or benchmark must be used to divide the historical returns into the portions attributable to systematic risks (and the risk-free rate) and those attributable to idiosyncratic effects. Second, the remaining returns, meaning the idiosyncratic returns (i.e., ex post alpha), should be statistically analyzed to estimate the extent, if any, to which the superior returns may be attributable to skill rather than luck.

The first step, identifying ex post alpha, requires the specification of an ex post asset pricing model or benchmark and can be challenging. Ex post alpha estimation is the process of adjusting realized returns for risk and the time value of money. Ex post alpha is not perfectly and unanimously measured, because it relies on accurate specification of systematic risks and estimation of the effects of those systematic risks on ex post returns.

Given estimates of ex post alpha (idiosyncratic returns), the second step is the statistical analysis of the superior or inferior returns to differentiate between random luck and persistent skill. The second part of this chapter (starting with Section 8.7) discusses hypothesis testing and statistical inference. The idea is that, given a set of assumptions with regard to the statistical behavior of idiosyncratic returns, historical returns can be used to infer central tendencies. If historical risk-adjusted returns are very consistently positive or negative, the analyst can become increasingly certain that the underlying investment offered a positive or negative alpha.

8.3.2 Lessons about Alpha Estimation from a Fair Casino Game

To frame the discussion of the role of idiosyncratic risk and model misspecification in alpha estimation, we discuss a hypothetical scenario in which skill is clearly not a factor, such as in the casino game roulette. This simplified scenario enables a clearer illustration of the challenges raised by model misspecification. Model misspecification is any error in the identification of the variables in a model or any error in identification of the relationships between the variables. Model misspecification inserts errors in the interpretation and estimation of relationships.

For example, assume that there is a perfectly balanced roulette wheel in a casino with perfectly honest employees and guests. For simplicity, the payouts of all bets are assumed to be fair gambles rather than gambles offering the house an advantage. In other words, every possible gamble has an expected payout equal to the amount wagered, meaning an expected profit or loss of zero. Gamblers use a variety of strategies, and they wager different amounts of money.

Based on these assumptions, a model can be derived that states that the expected gain or loss to each gambler should be $0 and 0%. By assumption, any realized gambling returns that differ from zero will be based purely on luck. When the actual profits and losses to the gamblers at the end of a day are observed, some gamblers ended up winning large amounts of money, some gamblers lost a lot of money, and many gamblers won or lost smaller amounts.

Based on the assumption that the roulette wheel is perfectly balanced, all of the observed profits and losses are idiosyncratic (i.e., all ex post alphas were generated by luck, since all ex ante alphas were zero).

Let's assume that there is a researcher who believes that some gamblers have skill in predicting the outcomes of the roulette wheel. That researcher would hypothesize that some or all of the observed profits were due to that skill and should thus be viewed as ex ante alpha. The researcher decides to perform statistical tests to identify the skilled gamblers.

Even in this simplified example, it would be easy for the researcher to make incorrect inferences. For example, assume that thousands of gamblers were observed. The researcher might focus on the gambler who won the most money, conclude that the odds were extraordinarily low that a gambler could win so much money in one night, and therefore falsely conclude that the chosen gambler was skilled. Another researcher might expand the search to multiple nights and multiple casinos and find a gambler with even higher winnings. But in this example, no level of winnings can prove that skill was involved, because skill was eliminated by assumption.
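
A short simulation makes the point concrete. In the following Python sketch, 10,000 gamblers each place 100 fair $10 bets; skill is impossible by construction, yet several gamblers post results that would look extraordinary if examined in isolation (all figures are illustrative):

    import numpy as np

    # 10,000 gamblers each make 100 fair $10 even-money bets; by construction,
    # every profit or loss is pure luck (ex ante alpha is zero for all).
    rng = np.random.default_rng(0)
    profits = rng.choice([-10, 10], size=(10_000, 100)).sum(axis=1)

    # For any single, pre-selected gambler, winning $300 or more (three standard
    # deviations, since sigma = $10 * sqrt(100) = $100) has probability of
    # roughly 0.2%; across 10,000 gamblers, such "remarkable" results are routine.
    print("gamblers winning $300 or more:", int((profits >= 300).sum()))
    print("biggest winner's profit: $", int(profits.max()))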

Unfortunately, some financial analysts use the analogous approach to analyze investment opportunities. They examine the past returns from a large set of investment pools and conclude that the top-performing funds must have achieved that success through skill. This example highlights the challenges faced in investment analysis. Does ex ante alpha exist in a particular market? Do we have models that can accurately separate ex post alpha from systematic risk bearing? Finally, will our statistical tests enable us to differentiate between idiosyncratic outcomes (luck) and ex ante alpha (skill)?

8.4 Return Attribution, Alpha, and Beta

Return attribution (performance attribution) was introduced in Chapter 7. This section focuses on return attribution and distinguishing between the effects of systematic risk (beta), the effects of skill (ex ante alpha), and the effects of idiosyncratic risk (luck).

8.4.1 A Numerical Example of Alpha

For simplicity, consider an example that uses a single-factor market model and for which expected returns are known. Assume that Fund A trades unlisted securities that are not efficiently priced, has a beta of 0.75, and has an expected return of 9%. Additionally, assume that the expected return of the market is 10% and that the risk-free rate is 2%. During the next year, the market earns 18%, and Fund A earns 17%.

Given these assumptions, we can answer the following questions:

  • What was the fund's ex ante alpha?
  • What was the fund's ex post alpha?
  • What was the portion of the ex post alpha that was luck?
  • What was the portion of the ex post alpha that was skill?

First, the ex ante alpha is found as the intercept of the ex ante version of the single-factor market model, in this case a CAPM-style model. Inserting the market's expected return, the fund's beta, and the risk-free rate into Equation 8.1 generates the required return, E(RA*), for Fund A in an efficient market:

E(RA*) = Rf + βA[E(Rm) − Rf] = 2% + 0.75 × (10% − 2%) = 8%

The return of 8% is the expected return that investors would require on an asset with a beta of 0.75, which is also the expected return that Fund A would offer in an efficient market. The ex ante alpha of Fund A is any difference between the expected return of Fund A and its required return:

αA = E(RA) − E(RA*) = 9% − 8% = 1%

Thus, Fund A offers 1% more return than would be required based on its systematic risk (i.e., an ex ante alpha of 1%). Next, the ex post alpha is found from the ex post version of the single-factor market model. Inserting the two realized returns, the beta and the risk-free rate, into Equation 8.2 generates the following:

ϵAt = (RAt − Rf) − βA(Rmt − Rf) = (17% − 2%) − 0.75 × (18% − 2%) = 15% − 12% = 3%

The analysis indicates that even though Fund A underperformed the market portfolio prior to risk adjustment, it performed 3% better than assets of similar risk. Thus, in the terminology introduced earlier in the chapter, the ex post alpha (idiosyncratic return) was 3%.

Finally, since the analysis assumes that the fund offers an expected superior return, or ex ante alpha, of 1%, Fund A's ex post alpha of 3% could be said to have been one-third (i.e., 1% of the 3%) attributable to skill and two-thirds (i.e., 2% of the 3%) attributable to luck.
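
The full decomposition can be reproduced with a few lines of arithmetic, shown here in Python using the example's assumed values:

    # The numerical example of Section 8.4.1, reproduced step by step.
    r_f, beta_A = 0.02, 0.75
    e_r_m, e_r_a = 0.10, 0.09          # expected returns (assumed known)
    r_m, r_a = 0.18, 0.17              # realized returns for the year

    required = r_f + beta_A * (e_r_m - r_f)               # 8%: required return
    ex_ante_alpha = e_r_a - required                      # 1%: skill
    ex_post_alpha = (r_a - r_f) - beta_A * (r_m - r_f)    # 3%: idiosyncratic return
    luck = ex_post_alpha - ex_ante_alpha                  # 2%: luck

    print(round(required, 4), round(ex_ante_alpha, 4),
          round(ex_post_alpha, 4), round(luck, 4))        # 0.08 0.01 0.03 0.02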

In practice, true beta and expected returns are difficult to estimate. The beta is necessary to estimate either ex ante alpha or ex post alpha. The expected returns are necessary only to estimate ex ante alpha and to distinguish between luck and skill. It is common for a return attribution analysis to estimate ex post alpha but not consider ex ante alpha, and not estimate the distinction between luck and skill.

8.4.2 Three Types of Model Misspecification

The previous example assumed that the investment's systematic risks were fully and accurately captured in a single market beta and that the single-factor market model was accurate. Errors in estimating alpha can result from model misspecification, including misspecification of a benchmark. Three primary types of model misspecification can confound empirical return attribution analyses:

  1. Omitted (or misidentified) systematic return factors
  2. Misestimated betas
  3. Nonlinear risk-return relationships

In each case of misspecification, the component of the return attributable to systematic risk is not precisely identified. Because systematic risks have a positive expected return, omitting a significant risk factor or underestimating a beta tends to overstate the manager's skill by attributing beta return to alpha.

The bias caused by omitted systematic return factors in estimating alpha can be illustrated as follows. Assume that a fund's return is driven by four betas, or systematic factors. If an analyst ignores two of the factors (e.g., factor 3 and factor 4), then the estimate of the idiosyncratic return will, on average, contain the expectation of the two missing effects, both of which would have positive expected values. The performance attribution example throughout Chapter 7 illustrated this problem.

In the second case of model misspecification, misestimated betas, when the systematic risk, or beta, of a return series is over- or underestimated, the return attributable to the factors is also over- or underestimated. Underestimation of a beta is a similar but less extreme case of omitting a beta.

The final major problem with misspecification is when the functional relationship between a systematic risk factor and an asset's return is misspecified. For example, most asset pricing models assume a linear relationship between risk factors and an asset's returns. If the true relationship is nonlinear, such as in the case of options, then the linear specification of the relationship generally introduces error into the identification of the systematic risk component of the asset's return.

8.4.3 Beta Nonstationarity

Beta nonstationarity is one reason why return can be attributed to systematic risk with error. Beta nonstationarity is a general term that refers to the tendency of the systematic risk of a security, strategy, or fund to shift through time. For example, a return series containing leverage is generally expected to have a changing systematic risk through time if the leverage changes through time. An example is the stock of a corporation with a fixed dollar amount of debt. As the assets of the firm rise, the leverage of the equity falls (or if the assets fall, leverage of the equity rises), causing the beta of the equity to shift.

A type of beta nonstationarity that is sometimes observed in hedge funds is beta creep. Beta creep is when hedge fund strategies pick up more systematic market risk over time. When assets pour into hedge funds, it might be expected that the managers of the funds will allow more beta exposure in their portfolios in an attempt to maintain expected returns in an increasingly competitive and crowded financial market. Hence the creeping effect: over time, as more money flows to hedge fund managers, the amount of systematic risk in their portfolios creeps upward.

The betas of funds may also be nonstationary because of market conditions, such as market turmoil, rather than changes in the fund's underlying assets. In periods of economic stress, the systematic risks of funds have been observed to increase. Beta expansion is the perceived tendency of the systematic risk exposures of a fund or asset to increase due to changes in general economic conditions. Beta expansion is typically observed in down market cycles and is attributed to increased correlation between the hedge fund's returns and market returns.

Another example of beta nonstationarity is market timing: intentional shifting of an investment's systematic risk exposure by its manager. Consider the case of a skilled market timer. The fund manager takes on a positive beta exposure when his or her analysis indicates that the market is likely to rise and takes on a negative beta, or a short position, when he or she perceives that the market is likely to decline. This beta nonstationarity (or beta shifting) makes return attribution more problematic, since the level of beta between reporting periods would typically be very difficult to estimate accurately.

This market-timing example raises an interesting issue in the attribution of returns to alpha or beta. Assume for the sake of argument that a market-timing fund manager possesses superior skill in timing markets. The manager is successful at designing and implementing the strategy to generate superior returns but is unable to enhance returns through picking individual stocks. Would the fund's superior return better be described as alpha or beta?

At first glance, the answer may appear to be ex ante alpha, since the market-timing manager's return is superior. But in each sub-period, the manager earns a rate of return commensurate with the fund's systematic risk exposure; that is, whether the fund's risk exposure is positive or negative, its returns are commensurate with risk. Thus, in each sub-period, the portfolio earns the predicted return and exhibits an ex post alpha of zero. However, when viewed over the full time period, the fund earns a high ex post alpha, since the portfolio outperformed the market through superior market timing.

This example illustrates an important lesson: Evaluation of investment performance over a full market cycle can alleviate difficulties with shifting betas and misspecified models. A full market cycle is a period of time containing a large representation of market conditions, especially up (bull) markets and down (bear) markets. Although use of a full market cycle does not eliminate return attribution difficulties, it can mitigate the impact of modeling misspecifications and estimation errors.

8.4.4 Can Alpha and Beta Be Commingled?

The difficulty of identifying the return attributable to systematic risk is not limited to beta nonstationarity. Sometimes the line between alpha and beta can be blurred, even on a conceptual basis. Consider a specialized type of private equity transaction involving target firms in financial distress. An investment strategy directed at these opportunities requires sophisticated investors with keen negotiating skills and large amounts of available cash, since transactions must be made quickly. Very skilled investors can identify attractive opportunities, but the strategy requires exposures to systematic risks that cannot be hedged. One could argue that any superior return is ex ante alpha, since it takes superior skill to participate successfully in this market. However, one could also argue that the superior return is at least partially beta, since high returns are achieved only through bearing the systematic risk of the sector.

Should highly attractive returns that require skill as well as the bearing of systematic risk be attributed to alpha or beta? Perhaps there is no clear answer, just as there may be no way to attribute the superiority of a winning pair of competitive dancers to one partner or the other. In some cases, performance may be better viewed as indistinguishably related to both.

8.5 Ex Ante Alpha Estimation and Return Persistence

Numerous investment advertisements warn that “past performance is not indicative of future results.” That admonition would be true with regard to alpha if markets were perfectly efficient. But there is no doubt that inefficiencies exist and that abnormally good and bad performance has been predictable based on past data in many instances. However, there are also many instances in which investors have incorrectly used past performance to indicate future results.

Abnormal return persistence is the tendency of idiosyncratic performance in one time period to be correlated with idiosyncratic performance in a subsequent time period. This section focuses on return persistence in interpreting idiosyncratic return and identifying ex ante alpha.

8.5.1 Separating Luck and Skill with Return Persistence

Assume that a reasonably accurate performance attribution has distinguished returns due to systematic risks from those due to idiosyncratic risks. The next step is to attribute the idiosyncratic returns to their sources: luck, skill, or both. Proper attribution of the idiosyncratic returns (the ex post alpha) to luck or skill is typically a statistical challenge.

Attempting to identify ex ante alpha through an abnormal return persistence procedure can be summarized in the following three steps (sketched in code after the list):

  1. Estimate the average idiosyncratic returns (ex post alpha) for each asset in time period 1.
  2. Estimate the average idiosyncratic returns (ex post alpha) for each asset in time period 2.
  3. Statistically test whether the ex post alphas in time period 2 are correlated with the ex post alphas in time period 1.
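
A minimal Python sketch of the three steps, using simulated ex post alphas in which a modest persistent component is planted by construction (all names and parameters are hypothetical):

    import numpy as np

    # Simulated ex post alphas for 20 funds over two consecutive periods, with
    # a modest persistent (skill) component planted by construction.
    rng = np.random.default_rng(1)
    skill = rng.normal(0.0, 0.01, size=20)                # persistent component
    alpha_p1 = skill + rng.normal(0.0, 0.02, size=20)     # step 1: period-1 alphas
    alpha_p2 = skill + rng.normal(0.0, 0.02, size=20)     # step 2: period-2 alphas

    # Step 3: test whether period-2 alphas are correlated with period-1 alphas.
    corr = np.corrcoef(alpha_p1, alpha_p2)[0, 1]
    print(f"cross-period correlation of ex post alphas: {corr:.2f}")
    # A formal test would attach a p-value, e.g., scipy.stats.pearsonr(alpha_p1, alpha_p2).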

8.5.2 Interpreting Estimated Return Persistence

A statistically significant positive correlation between average idiosyncratic returns in consecutive periods implies positive return persistence. To the extent that the return model has been correctly specified, consistent and statistically significant positive correlation would lead to increased confidence that managerial skill has driven some or all of the investment results.

Note that this approach differs markedly from the more common approach of using a single time period to identify top returns and assuming that the top returns were driven by skill. However, just because an investment experiences positive return persistence in two consecutive periods does not prove that the returns are based on skill. All that a researcher can do is use careful statistical testing to develop increased confidence that persistence has been successfully identified.

The later part of this chapter discusses hypothesis testing with statistics and the care that should be used in constructing tests and interpreting their results.

8.6 Return Drivers

The term return driver represents the investments, the investment products, the investment strategies, or the underlying factors that generate the risk and return of a portfolio. A conceptually simplified way to manage a total portfolio is to divide its assets into two groups: beta drivers and alpha drivers. Briefly, in the context of a portfolio, an investment that moves in tandem with the overall market or a particular risk factor is a beta driver. An investment that seeks high returns independent of the market is an alpha driver.

For example, consider an investor who owns a portfolio consisting of one mutual fund indexed to the S&P 500 and one market-neutral fund with offsetting long and short exposures that attempts to earn superior rates without bearing systematic risk. The allocation to the S&P 500 Index fund is a beta driver, since the holding will generate systematic risk but will not offer ex ante alpha. That allocation is designed simply to harvest the average higher returns of bearing beta (systematic) risk. The allocation to the market-neutral fund is an alpha driver, since it is an attempt to earn superior rates of return through superior security selection rather than through systematic risk bearing.

Viewed from a portfolio management context, various investments and investment strategies can be viewed as alpha drivers, beta drivers, or mixtures of both. Alternative investing tends to focus more on alpha drivers, whereas traditional investing tends to focus more on beta drivers.

8.6.1 Beta Drivers

Beta drivers capture market risk premiums, and good or pure beta drivers do so in an efficient (i.e., precise and cost-effective) manner. Beta drivers capture risk premiums by bearing systematic risk.

Bearing beta risk as defined by the CAPM has been extremely lucrative over the long run. The long-term tendency of beta drivers to earn higher returns from equity investments than are earned on risk-free investments is attributed to the equity risk premium. The equity risk premium (ERP) is the expected return of the equity market in excess of the risk-free rate. This risk premium may be estimated from historical returns or implied by stock valuation models, such as through the relationship between stock prices and forecasts of earnings.

Especially in the United States, stocks have outperformed riskless assets tremendously, and these high historical returns form the equity risk premium puzzle. The equity risk premium puzzle is the enigma that equities have historically performed much better than can be explained purely by risk aversion, yet many investors continue to invest heavily in low-risk assets. Based on the data of the past 100 years or so, it seems that most investors are foolish not to place more of their money in equities rather than riskless assets. There is no consensus, however, on whether the superior equity returns of the past century that generated the high equity premium will persist in magnitude through the twenty-first century.

8.6.2 Passive Beta Drivers as Pure Plays on Beta

Passive investing, such as employing a buy-and-hold strategy to match a benchmark index, is a pure play on beta: simple, low cost, and with a linear risk exposure. A linear risk exposure means that when the returns to such a strategy are graphed against the returns of the market index or another appropriate standard, the result tends to be a straight line. Options and investment strategies with shifting betas have nonlinear risk exposures.

A passive beta driver strategy generates returns that follow the up-and-down movement of the market on a one-to-one basis. In this sense, pure beta drivers are linear in their performance compared to a financial index.

Some managers can deliver beta drivers for fees of as little as a few basis points per year, whereas others may charge more than half a percent per year and deliver performance before fees that is virtually identical to that of a pure beta driver. Asset gatherers are managers striving to deliver beta as cheaply and efficiently as possible, and they include the large-scale index trackers that produce passive products tied to well-recognized financial market benchmarks. These managers build value through scale and processing efficiency.

8.6.3 Alpha Drivers

Alpha drivers seek excess return or added value through generating returns that exceed the returns on investments of comparable risk. Many alternative assets fall squarely into the category of alpha drivers. They tend to seek sources of return less correlated with traditional asset classes, which reduces risk in the entire portfolio in the process. Alpha drivers are the focus of much alternative investing. Alternative investments are often touted as being able to generate greater combinations of return and risk by providing return streams that have relatively low correlation with traditional stock and bond markets but comparable average returns.

8.6.4 Product Innovators and Process Drivers

Historically, most investment pools were mixes of beta drivers and alpha drivers. In other words, the funds derived considerable return variation from bearing substantial systematic risk but implemented active investment strategies intended to generate alpha. In recent decades, the distinction between alpha drivers and beta drivers has increased. Thus, much of the asset management industry has moved into the tails of the alpha driver–beta driver spectrum. At one end of the spectrum are product innovators, which are alpha drivers that seek new investment strategies offering superior rates of risk-adjusted return. At the other end are passive indexation strategies, previously described as asset gatherers, which offer beta exposure as efficiently as possible without any pretense of alpha seeking.

Another development among beta drivers is the growth of process drivers. Process drivers are beta drivers that focus on providing beta that is fine-tuned or differentiated. As an example, these index trackers have introduced a large number and wide variety of exchange-traded funds (ETFs) that track specific sectors of the market rather than broadly diversifying across most or all sectors. For example, many new ETFs provide beta for a particular market-capitalization range, industry, asset class, or geographic market. These process drivers carve up systematic risk exposure into narrower risk factors as they identify investors desiring targeted risk exposures.

The increased difficulty for a fund manager to capture alpha or to compete with the extremely low-cost asset gatherers has put pressure on beta drivers with high fees. It has been argued that some managers following a pure beta driver strategy do not disclose the true nature of their strategy accurately, perhaps because it would be difficult to justify their high fees when their performance before fees is virtually indistinguishable from that of other beta drivers with fees near zero.

8.7 Using Statistical Methods to Locate Alpha

Suppose that a manager running a fund called the Trick Fund claims the ability to consistently outperform the S&P 500 Index using a secret strategy. It turns out that for each $100 of value in the fund, the manager initially holds $100 in a portfolio that mimics the S&P 500, and then on the first of each month, the fund manager writes a $0.50 call option on the S&P 500 that is far out-of-the-money and expires in a few days. If the fund manager has bad luck and the S&P 500 rises dramatically during the first week, so that the call option rises to, say, $2.50, and is about to be exercised, the fund manager purchases the call at a loss (covering the option position). The fund manager purchases the call option back using money obtained from writing large quantities of new out-of-the-money call options for the second week at combined prices of $2.50. If the second group of options rises in value to, say, $12.50, the fund manager repeats the process by selling even more call options for the third week to generate proceeds of $12.50, which are used to cover the second option positions. The strategy continues into the fourth week, such that if the third set of short options rises to, say, $62.50, a fourth set of out-of-the-money options is sold for $62.50. By the end of the fourth week, either the fourth set of options is worthless or the fund is ruined.

If at any point during the month one of the sets of options expires worthless, the fund manager ceases writing options for the rest of the month, and the fund is $0.50 (i.e., 50 basis points) ahead of its benchmark for the month. There is very little likelihood (perhaps once every 200 months) that all four sets of options would finish in-the-money and therefore that the option strategy would lose a large amount of money. In perhaps 199 of every 200 months, the fund outperforms the S&P 500 by 50 basis points (ignoring any transaction costs or fees). Since there is no open option position at the end of any month, the fund manager's strategy has been kept a secret; the manager shows the fund's positions and risks only at the end of months.

If we assume that the options market is efficient, this manager is not generating ex ante alpha; the manager is simply taking a gamble on a very large chance of making a small amount of money and a very small chance of losing a very large amount of money (relative to the benchmark). But the returns that this manager generates would typically be very hard to distinguish from those of a manager who truly generated a small but consistent return advantage. Could statistical analysis of the fund's returns help us figure out what the Trick Fund was doing and help us differentiate truly superior performance from luck?
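
Before turning to formal hypothesis testing, a simple simulation suggests why the Trick Fund's track record is so hard to distinguish from skill. The sketch below assumes the ruinous month costs 99.5% relative to the benchmark, a figure chosen so that the gamble is exactly fair; the text says only that the fund is ruined:

    import numpy as np

    # 10,000 simulated five-year (60-month) track records of the Trick Fund's
    # return relative to the S&P 500: +50 bp with probability 199/200, and a
    # ruinous month otherwise. A loss of 99.5% is assumed so that the gamble
    # is exactly fair: 0.005 * (199/200) = 0.995 * (1/200).
    rng = np.random.default_rng(3)
    lucky = rng.random((10_000, 60)) < 199 / 200
    rel_returns = np.where(lucky, 0.005, -0.995)

    # In most five-year samples, the catastrophic month never occurs, so the
    # track record looks like a consistent 50 bp monthly edge.
    print(f"histories with no blowup: {lucky.all(axis=1).mean():.1%}")   # ~(199/200)**60 = 74%
    print(f"mean relative return across all months: {rel_returns.mean():.5f}")  # ~0, a fair gamble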

8.7.1 Four Steps of Hypothesis Testing

Hypotheses are propositions that underlie the analysis of an issue. Two hypotheses regarding the Trick Fund example could be that the fund has a system that generates ex ante alpha or that it does not have such a system. Hypothesis testing is the process of developing and interpreting evidence to support and refute the various hypotheses. Hypothesis tests typically follow the same four steps, in which the analyst does the following:

  1. States a null hypothesis and an alternative hypothesis to be tested
  2. Designs a statistical test
  3. Uses sample data to perform the statistical test
  4. Rejects or fails to reject the null hypothesis based on results of the analysis

STATING THE HYPOTHESES: Hypothesis testing requires the analyst to state a null hypothesis and an alternative hypothesis. The null hypothesis is usually a statement that the analyst is attempting to reject, typically that a particular variable has no effect or that a parameter's true value is equal to zero. For example, common null hypotheses are that a fund's alpha is zero or that a fund's exposure to a particular risk factor, or beta, is zero.

The alternative hypothesis is the behavior that the analyst assumes would be true if the null hypothesis were rejected. The alternative and null hypotheses are often stated in such a way that they are mutually exclusive. That is, if one is true, the other must be false, and vice versa. For example, if the null hypothesis is that an alpha, beta, or other variable is zero, the alternative hypothesis is that the variable is not equal to zero.

DESIGNING A TEST OF THE HYPOTHESES: The test's plan describes how to use sample data to reject or to not reject the null hypothesis. This stage involves specifying the variables for a model, the relationships between the variables, and the statistical properties of the variables. Typically, the test involves a test statistic, which is a function of observed values of the random variables and typically has a known distribution under the null hypothesis. The test statistic is the variable that is analyzed to make an inference with regard to rejecting or failing to reject a null hypothesis. Given a test statistic and its sampling distribution, an analyst can assess the probability that the observed values of the random variables of interest could come from the assumed distribution and can determine if the null hypothesis should be rejected.

The plan should specify a significance level for the test before the test is run. Generally, the term significance level is used in hypothesis testing to denote a small number, such as 1%, 5%, or 10%, that reflects the probability the researcher is willing to tolerate of rejecting the null hypothesis when it is in fact true. The selection of a smaller probability for the significance level is intended to reduce the probability that an unusual statistical result will be mistakenly used to reject a true null hypothesis. For example, a hypothesis tested with a significance level of 1% has a 1% likelihood of rejecting a true null hypothesis.

Statistical analyses of parameter estimates often utilize confidence intervals. A confidence interval is a range of values within which a parameter estimate is expected to lie with a given probability. The confidence interval is typically based on a large probability such as 90%, 95%, or 99%. A 90% confidence interval defines the range within which a parameter estimate is anticipated to lie in 90% of the tests, given that the null hypothesis is true. An estimate falling outside the confidence interval is an indication that the null hypothesis may be false. For example, suppose that a 95% confidence interval for the estimated beta of an asset ranges from 0.8 to 1.2. If the null hypothesis is true, a statistical estimate of that beta has a 95% chance of falling within that range and a 5% chance of falling outside it.

RUNNING THE TEST TO ANALYZE SAMPLE DATA: Using sample data, the analyst performs computations called for in the plan. These computations allow calculation of the test statistic that is often standardized in the following form:

(8.3)   Test statistic = (Estimated value − Hypothesized value) / Standard error of the estimate

This standardization creates a test statistic that has zero mean and unit standard deviation under the null hypothesis. The assumptions of the model are used to derive a probability distribution for the test statistic. Using that distribution, a p-value is estimated based on the data. The p-value is a result generated by the statistical test that indicates the probability of obtaining a test statistic by chance that is equal to or more extreme than the one that was actually observed (under the condition that the null hypothesis is true). The p-value that the test generated is then compared to the level of significance that the researcher chose.
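
As an illustration of this step, the following Python sketch applies a one-sample t-test, which standardizes the sample mean in the form of Equation 8.3, to simulated idiosyncratic returns; the data and parameters are hypothetical:

    import numpy as np
    from scipy import stats

    # Simulated monthly idiosyncratic returns (ex post alphas) for one fund.
    rng = np.random.default_rng(5)
    idio = rng.normal(0.002, 0.01, size=36)      # true mean of 20 bp, plus noise

    # Null hypothesis: the mean idiosyncratic return is zero (no alpha).
    # ttest_1samp standardizes the sample mean in the form of Equation 8.3:
    # (estimated value - hypothesized value) / standard error.
    t_stat, p_value = stats.ttest_1samp(idio, popmean=0.0)
    print(f"t-statistic: {t_stat:.2f}  p-value: {p_value:.3f}")
    # Reject the null at the 5% significance level only if p_value < 0.05.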

REJECTING OR FAILING TO REJECT THE NULL HYPOTHESIS: The analyst rejects the null hypothesis when the p-value is less than the level of significance. A p-value of 2% obtained in a statistical test indicates that there is only a 2% chance that the estimated value would occur by chance (under the assumption that the null hypothesis is true). So a p-value of 2% in a test with a significance level of 5% would reject the null hypothesis in favor of the alternative hypothesis. However, that same p-value of 2% would fail to reject the null hypothesis if the significance level of the test had been set at 1%.

In the previous paragraph, a p-value of 2% was referred to as “fail[ing] to reject the null hypothesis” when the significance level was set at 1%. Why wouldn't the analyst simply conclude that the null hypothesis was accepted? If a test indicates that a variable has not been found to be statistically different from the predictions of the null hypothesis, it does not mean that the null hypothesis is true or even that it is true with some known probability. For example, the test may assume that returns are normally distributed and that the means are equal. If the test indicates inequality, it could mean simply that the returns were not normally distributed.

The results of statistical tests are misunderstood or misused in many investment applications. The famous twentieth-century philosopher Karl Popper helped formulate the modern scientific view that knowledge progresses by proving that propositions are false and that no important proposition can be proven to be true. Popper's philosophy should be used in conducting empirical analyses of alternative investments. Tests should be designed to disprove things that are thought possibly to be true, not to try to confirm those things that are hoped to be true. Unfortunately, the strong desire of investors to confirm their beliefs and to locate an investment that offers positive alpha can lead them to search for confirmation of their hopes and beliefs. Popper's philosophy encourages research that focuses on refuting one's beliefs and is viewed by some as the recipe for greater success in alternative investing.

8.7.2 Four Common Problems Using Inferential Statistics

Results of hypothesis testing are very often interpreted incorrectly. A discussion of four common problems with interpreting p-values follows.

First, outcomes with lower p-values are sometimes interpreted as having stronger relationships than those with higher p-values; for example, an outcome of p < 0.01 is interpreted as indicating a stronger relationship than an outcome of p < 0.05. But at best, a p-value indicates the strength of the evidence that a relationship exists; it is not a reliable indicator of the size or strength of the relationship itself.

A second major problem is failure to distinguish between statistical significance and economic significance. Economic significance describes the extent to which a variable in an economic model has a meaningful impact on another variable in a practical sense. One can be very statistically confident that one variable is related to another, but the size of the estimated parameter and the degree of dispersion in the explanatory variable may indicate that the parameter has only a minor economic effect in the model. Conversely, one might be less statistically confident that another variable has a true relationship, but given the absolute size of the estimated parameter and the dispersion in the related explanatory variable, we might determine that the relationship, if true, would have a very substantial impact on the model.

Third, the p-value is only as meaningful as the validity of the assumption regarding the distribution of the test statistic. Researchers should carefully examine the data for indications that the distributional assumptions are violated.

Finally, a major problem arises when the p-value from a test is interpreted as the unconditional probability that a statistically significant result is true. For example, assume that an analyst has a null hypothesis that hedge fund managers cannot earn superior returns using skill and an alternative hypothesis that hedge fund managers can earn superior returns using skill. Assume that the analyst has correctly applied a statistical procedure and finds that the hedge fund managers' mean performance is higher than the benchmark's mean performance with a p-value of 1%.

The incorrect statement that is often made regarding such a result is that the research indicates that there is a 99% probability that fund managers are able to outperform the benchmark using skill, or that there is a 99% probability that fund managers will earn higher expected returns than the benchmark. In fact, researchers have no reasonable basis for making this assertion. The relatively uncharted waters of alternative investments make these erroneous assertions even more problematic. Since the body of knowledge is less well-established, false beliefs based on erroneous statistical interpretations are less easily identified and corrected with alternative investment analytics. To explain this important concept carefully, the next section details two types of errors.

8.7.3 Type I and Type II Errors

Two types of errors can be made in traditional statistical tests: type I errors and type II errors. A type I error, also known as a false positive, is when an analyst makes the mistake of falsely rejecting a true null hypothesis. The term α is usually used to denote the probability of a type I error and should not be confused with investment alpha. The symbol α is the level of statistical significance of the test, and 1 − α is defined as the specificity of the test.

A type II error, also known as a false negative, is failing to reject the null hypothesis when it is false. The symbol β is usually used to denote the probability of a type II error and should not be confused with the use of that symbol to denote systematic risk. The statistical power of a test is equal to 1 − β. An analyst may lower the chances of both types of errors by increasing the sample size. Exhibit 8.1 shows a matrix that is often used to denote the four possible outcomes.

Exhibit 8.1 Errors in Hypothesis Testing

                                  Null Hypothesis True    Null Hypothesis False
Reject null hypothesis            Type I error            Correct
Fail to reject null hypothesis    Correct                 Type II error

When a statistical test is performed with a significance level of 5%, it can best be viewed as differentiating between the two outcomes in the left-hand column of the matrix. Given that the null hypothesis is true, there is a 5% probability that the null hypothesis will be mistakenly rejected (the upper left cell) and a 95% probability that the correct decision will be made and the null hypothesis will not be rejected (the lower left cell).

But the key is that the probability that the truth lies on the left-hand side of the matrix is not known. Accordingly, the unconditional probability of the error rate is not known. It cannot be claimed unconditionally that there is only a 5% chance of error in the test, because it is not certain that the null hypothesis is true. It can only be known that if the null hypothesis is true, one has only a 5% chance of error, if that is the significance level. The next section provides an example of this important point.

8.7.4 An Example of Erroneous Conclusions with Statistical Testing

Assume that all traders have equal skill but that one of every 10,000 traders cheats by using inside information. Thus, the probability of picking a trader at random who uses inside information is one in 10,000, or 0.01%. The null hypothesis is that a trader is honest and does not use inside information. A test has been developed that, when applied to an honest trader's transaction record, gives a correct answer that the person does not trade illegally 99% of the time and a false accusation 1% of the time. This test has a type I error rate of 1%, meaning the probability of falsely rejecting the null hypothesis by alleging that an honest trader is cheating is 1%. To simplify the problem, assume that when the test is given to a dishonest trader, the test always correctly identifies the trader as a cheater. In other words, there is no possibility of a type II error. What is the probability that a trader whose transaction record indicates cheating, according to the test, has actually cheated? The answer is not 99%; it is only 1%.

To understand this astounding result, note the assumption that only 0.01% of traders (10 traders out of 100,000) actually cheat. When the remaining 99,990 honest traders are tested, the test falsely indicates that roughly 1,000 of them have cheated, since the test has a 1% type I error rate. Since, on average, only 10 traders in a sample of 100,000 have actually cheated, whereas roughly 1,000 have been falsely accused, approximately 99% of the indications of cheating are false.
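
The same result follows directly from Bayes' rule. A minimal Python sketch, using only the probabilities assumed in the example above:

    # Bayes' rule applied to the insider-trading test described above
    p_cheat = 0.0001            # 1 in 10,000 traders cheats
    p_honest = 1 - p_cheat
    p_flag_given_cheat = 1.0    # assumed: the test never misses a cheater (no type II error)
    p_flag_given_honest = 0.01  # 1% type I error rate

    # Unconditional probability that the test flags a randomly chosen trader
    p_flag = p_flag_given_cheat * p_cheat + p_flag_given_honest * p_honest

    # Probability that a flagged trader actually cheated
    p_cheat_given_flag = (p_flag_given_cheat * p_cheat) / p_flag
    print(f"P(cheated | flagged) = {p_cheat_given_flag:.4f}")  # about 0.0099, roughly 1%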

In summary, many analysts interpret a significance level or confidence interval as indicating the probability that a test has reached a correct conclusion. For example, an analyst using a 5% significance level or 95% confidence interval might interpret the finding of a nonzero mean or a nonzero coefficient as being 95% indicative that the mean is not zero or the coefficient is not zero. But this would be an erroneous interpretation of the test results.

Using a 5% level of significance as an example, this is what is known: If the null hypothesis is true, then there is only a 5% chance that the null hypothesis will be incorrectly rejected.

8.8 Sampling and Testing Problems

This section discusses potential problems when the sample being analyzed is not representative of the population or is not correctly interpreted.

8.8.1 Unrepresentative Data Sets

The validity of a statistical analysis depends on the extent to which the sample or data set being analyzed is representative of the entire population with which the analyst is concerned. When a sample, subsample, or data set is a biased representation of the population, statistical tests may be unreliable. Bias arises when a sample is obtained or selected in a manner that systematically favors the inclusion of observations with particular characteristics that affect the statistical analysis.

For example, because hedge funds are privately placed investment pools, the total population or universe of hedge funds is unknown. Suppose that a researcher forms a sample of 100 funds for an in-depth analysis. If the 100 funds were selected at random, the sample would be an unbiased representation of the population. However, if the 100 funds were selected on the basis of size or years in existence, the sample would not be representative of the general hedge fund population. Statistical inferences about the entire population, with regard to such issues as return performance, should not be made from this biased sample, since return performance is probably related to size and longevity. If the sample tends to contain established and large funds, the sample is likely to contain an upward bias in long-term returns, since these large, established funds probably became large and established by generating higher long-term returns. This is an example of selection bias. Selection bias is a distortion of relevant sample characteristics away from the characteristics of the population, caused by the sampling method of selection or inclusion. If the selection bias originates from the decision of fund managers to report or not to report their returns, the bias is referred to as a self-selection bias.

A number of other related biases have been recognized in alternative investment analysis, especially with regard to the construction of databases of hedge fund returns. For example, survivorship bias is a common problem in investment databases in which the sample is limited to those observations that continue to exist through the end of the period of study. Funds that liquidated, failed, or closed, perhaps due to poor returns, would be omitted.
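
A short simulation illustrates the upward bias that survivorship can impart. In the following Python sketch, every parameter (the fund count, return distribution, and liquidation threshold) is an illustrative assumption; all funds share the same true expected return, yet the surviving funds report a higher average.

    import numpy as np

    rng = np.random.default_rng(seed=7)

    n_funds, n_years = 1_000, 10
    # All funds share the same true expected return: 5% with 20% volatility
    returns = rng.normal(loc=0.05, scale=0.20, size=(n_funds, n_years))

    # Assume a fund liquidates (and drops out of the database) if its
    # cumulative wealth ever falls below 75 cents on the dollar
    wealth_paths = np.cumprod(1 + returns, axis=1)
    survived = wealth_paths.min(axis=1) >= 0.75

    print(f"Mean annual return, all funds: {returns.mean():.4f}")
    print(f"Mean annual return, survivors: {returns[survived].mean():.4f}")
    # The survivor-only mean is typically biased upward relative to the population mean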

8.8.2 Data Mining versus Data Dredging

Data mining typically refers to the vigorous use of data to uncover valid relationships.1 The idea is that by using a variety of well-designed statistical tests and exploring a number of data sources, analysts may uncover previously missed relationships. Data dredging, or data snooping, refers to the overuse and misuse of statistical tests to identify historical patterns. The difference is that data dredging involves performing too many tests, especially regarding historical relationships for which there are not a priori reasons for believing that the relationships reflect true causality. The problem with data dredging is not so much the number of tests performed as the failure to take the number of tests performed into account when analyzing the results.

The primary point is this: Any empirical results should be analyzed not only in the context of other research and economic reasoning but also through an understanding of how many tests have been performed. Not only can this information be difficult to obtain or estimate, but it may also be intentionally masked by researchers attempting to bolster a particular view.
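
To see why the number of tests matters, consider the following Python sketch, in which every "strategy" is pure noise by construction. The smallest p-value among many tests looks impressive in isolation; a simple Bonferroni adjustment, one common way of accounting for the number of tests performed, removes the illusion.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(seed=1)

    n_strategies, n_obs = 200, 60
    # 200 "strategies" whose true mean return is exactly zero
    noise = rng.normal(loc=0.0, scale=0.02, size=(n_strategies, n_obs))

    p_values = np.array(
        [stats.ttest_1samp(series, popmean=0.0).pvalue for series in noise]
    )

    best = p_values.min()
    print(f"Smallest p-value among 200 tests: {best:.4f}")  # often below 0.01
    print(f"Bonferroni-adjusted p-value: {min(best * n_strategies, 1.0):.4f}")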

8.8.3 Backtesting and Backfilling

Backtesting is the use of historical data to test a strategy that was developed subsequent to the observation of the data. Backtesting can be a valid method of obtaining information on the historical risk and return of a strategy, which can be used as an indication of the strategy's potential going forward. However, backtesting combined with data dredging and numerous strategies can generate false indications of future returns. The reason is that the strategy identified as most successful in the past is likely to have had its performance driven by luck. One must be especially careful of allocating funds to investment managers who choose to report backtested results of their new model rather than the actual returns of the disappointing old model that traded client money in real time.

Backtesting is especially dangerous when the model involves overfitting. Overfitting is using too many parameters to fit a model very closely to data over some past time frame. Models that have been overfitted tend to have a smaller chance of fitting future data than models using fewer and more generalized parameters.
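
A minimal Python illustration of overfitting: a high-order polynomial fits the historical sample more closely than a straight line but typically predicts new data worse. All data here are simulated under an assumed linear-plus-noise relationship.

    import numpy as np

    rng = np.random.default_rng(seed=3)

    # The true relationship is linear plus noise
    x_train, x_test = rng.uniform(-1, 1, 40), rng.uniform(-1, 1, 40)
    y_train = 0.5 * x_train + rng.normal(0, 0.2, 40)
    y_test = 0.5 * x_test + rng.normal(0, 0.2, 40)

    for degree in (1, 15):
        coefs = np.polyfit(x_train, y_train, deg=degree)
        mse_in = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
        mse_out = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
        print(f"degree {degree:2d}: in-sample MSE {mse_in:.4f}, "
              f"out-of-sample MSE {mse_out:.4f}")
    # The degree-15 model fits the past better but typically predicts the future worse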

In alternative investments, backfilling typically refers to the insertion of an actual trading record of an investment into a database when that trading record predates the entry of the investment into the database. An example of backfilling would be the inclusion of a hedge fund into a database in 2015, along with the results of the fund since its inception in 2010.

Backfilling of actual results can be an appropriate technique, especially when done with full disclosure and when there is a reasonable basis to believe that the results will not create a substantial bias or deception. Thus, data sets of investment fund returns sometimes include past actual results of funds in the data set when the sample of funds being included is not being assembled on the basis of past investment results. The danger with backfilling is backfill bias. Backfill bias, or instant history bias, arises when the funds, returns, and strategies being added to a data set are not representative of the universe of fund managers, fund returns, and fund strategies. The additions typically generate an upward return bias, because the data set is likely to disproportionately add the returns of successful funds, which are more likely to survive and may be more likely to want to publicize their results.

Backfilling can also refer to the use of hypothetical data from backtesting. In investments in general, backfilling sometimes refers to the insertion of hypothetical trading results into a summary of an investment opportunity. Backfilling rarely carries this hypothetical meaning in alternative investments because alternative investments often focus on active trading strategies, for which hypothetical trading results would be highly discretionary and thus poorly suited to backfilling.

For example, an investment firm may have two funds with highly similar strategies, except that one fund uses two-to-one leverage and the other fund is unleveraged. Suppose that the unleveraged fund has been trading for 10 years and the leveraged fund has been trading for five years, and that, over the past five years, the leveraged fund has shown a very consistent relationship to the unleveraged fund. If clearly disclosed as being hypothetical, it may be reasonable to indicate the 10-year return that could have been expected if the leveraged fund had been in existence for 10 years, based on the observed relationship.
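
As an illustration of how such a disclosed hypothetical backfill might be computed, the Python sketch below assumes (purely for illustration) that the levered fund's return equals twice the unlevered fund's return minus a constant borrowing cost; the return figures themselves are hypothetical.

    import numpy as np

    # Illustrative assumption for two-to-one leverage financed at a constant rate:
    # r_levered = 2 * r_unlevered - borrow_rate
    borrow_rate = 0.03  # hypothetical annual borrowing cost

    # Ten years of observed unlevered annual returns (hypothetical values)
    r_unlevered = np.array([0.08, -0.02, 0.11, 0.05, 0.07,
                            0.09, 0.04, -0.01, 0.10, 0.06])

    r_levered_hypothetical = 2 * r_unlevered - borrow_rate
    print(np.round(r_levered_hypothetical, 4))
    # Any use of such figures should be clearly disclosed as hypothetical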

Backfilling can be deceptive even with innocent intentions. Often investors change or evolve their strategies as time passes, conditions change, and performance declines. Traders are especially likely to adapt their strategies in response to poor performance. An investor who backtests a revised trading strategy and backfills the hypothetical performance into the track record of the current and revised strategy is clearly providing a biased indication of forward-looking performance. The indication would be especially biased if the revision in the strategy were in response to data from the same time interval on which the backfilling was performed.

8.8.4 Cherry-Picking and Chumming

Cherry-picking is the practice of extracting or publicizing only those results that support a particular viewpoint. Consider an investment manager who oversees 10 funds. If the manager is not particularly skillful but takes large risks, half of the funds might be expected to outperform their benchmark in a given year, and half might be expected to underperform. After three or four years, there would probably be at least one fund with exceptionally high returns, while most of the poorly performing funds might be liquidated. Cherry-picking is the advertising and promotion of the results of the successful fund without adequately disclosing the number and magnitude of failed or poorly performing funds. If an investment firm has a large number of funds and is regularly opening new funds and closing old funds, it should be no surprise if many of the remaining funds are historical performance leaders.
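
The arithmetic behind this intuition can be confirmed in a few lines of Python. Assume, as above, 10 unskilled funds, each with an independent 50% chance of beating its benchmark in any given year:

    n_funds, p_beat = 10, 0.5

    for years in (3, 4):
        p_streak = p_beat ** years                   # one fund beats every year by luck
        p_at_least_one = 1 - (1 - p_streak) ** n_funds
        print(f"{years} years: P(at least one perfect streak) = {p_at_least_one:.2f}")
    # About 0.74 after 3 years and 0.48 after 4 years, with no skill involved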

Chumming is a fishing term used to describe scattering pieces of cheap fish into the water as bait to attract larger fish to catch. In investments, we apply this term to the practice of unscrupulous investment managers broadcasting a variety of predictions in the hope that some of them will turn out to be correct and thus be viewed as an indication of skill. For example, consider an unscrupulous Internet-based newsletter writer who sends 10 million emails, 5 million of which forecast that a particular stock will rise and 5 million of which forecast that it will fall. After observing whether the stock rises or falls, the writer sends follow-up emails to the 5 million recipients of the email with the predictions that were correct in retrospect. This second email notes the success of the previous prediction and makes another bold prediction. One version of that second email predicts that a second stock will rise and is sent to 2.5 million addresses, and an opposite prediction is sent to the remaining 2.5 million addresses. The process continues perhaps six or seven times until the writer has a list of 100,000 or so addresses that received six or seven correct predictions in a row. The people who received the string of correct predictions are encouraged to pay money for additional newsletter forecasts.

Would the recipient of six or seven correct predictions be persuaded that the results were generated by skill? Perhaps if the recipients understood that 9.9 million recipients received one or more bad predictions, it would be clear that the good predictions were based on luck. That is the key problem also observed in data dredging: Attractive results are usually not interpreted in the context of the total number of experiments being conducted.
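
The mailing-list arithmetic is simple halving, which a few lines of Python confirm:

    recipients = 10_000_000
    for round_number in range(1, 8):
        recipients //= 2  # only the half that received the correct prediction remains
        print(f"After prediction {round_number}: {recipients:,} recipients remain")
    # 156,250 remain after six rounds and 78,125 after seven:
    # the "100,000 or so" addresses described above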

8.9 Statistical Issues in Analyzing Alpha and Beta

Two of the most central tasks in alternative investments are estimating alpha and beta, in the sense that alpha and beta represent return and risk. This section applies the concepts of hypothesis testing and other statistical issues from previous sections of this chapter to the estimation of alpha and beta. The field of alternative investments emphasizes emerging asset groups, so its empirical analysis must be at the cutting edge of investment research. But with that pioneering task comes the need for exceptionally sound methods, as the body of knowledge is less established.

8.9.1 Non-Normality and the Cross-Sectional Search for Alpha

Cross-sectional searches for alpha are especially prone to error when performance is analyzed with methods that assume normally distributed returns.

Suppose that an analyst is studying the return performance of 40 hedge fund managers. Assuming that all 40 funds have highly similar systematic risk exposures, the analyst uses a one-tailed statistical test assuming normality to determine which, if any, funds had a mean return that was 1.96 standard deviations or more above the average return of the sample (a 97.5% one-tailed confidence level). If a fund's return exceeds this threshold, the analyst judges the fund to have generated superior returns.

A well-trained analyst would note that one out of 40 funds would typically exceed the 1.96 standard deviation threshold simply by randomness. But suppose that the analyst observes that eight of the 40 fund managers achieved statistically superior returns by this criterion. Should the analyst conclude that such a high number of funds with superior performance must be attributable to the superior skill of most or all of those eight managers?

The logic of this analysis is appealing. If the null hypothesis is true (that returns are normally distributed and that all managers possess equal skill), it would be expected on average that only one fund manager in 40 would achieve statistically significant superior returns using a 97.5% confidence interval. It would seem that eight managers in 40 having statistically significant superior performance would be indicative of a cluster of skill.

A potential explanation of the finding is simply that the returns are not normally distributed.2 Cross-sectional return differentials exist, but dispersion alone does not mean that skill is involved. In fact, the existence of any thickness or length to the tails of a frequency distribution of fund returns provides little or no evidence that the dispersion is caused by skill rather than luck.
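
A brief simulation can make the point concrete. The Python sketch below uses an illustrative, highly skewed (lottery-like) return distribution, such as might be produced by funds holding long-option positions; with no skill whatsoever in the data-generating process, far more than 2.5% of funds land beyond the 1.96 standard deviation threshold.

    import numpy as np

    rng = np.random.default_rng(seed=5)

    # Illustrative lottery-like fund returns: 80% of funds bleed a small amount,
    # 20% collect a large payoff. No fund has any skill.
    n_funds = 40_000
    fund_returns = rng.choice([-0.01, 0.04], size=n_funds, p=[0.8, 0.2])

    z = (fund_returns - fund_returns.mean()) / fund_returns.std()
    print(f"Share beyond +1.96 sd: {np.mean(z > 1.96):.3f}")  # about 0.20, not 0.025
    # Under normality only about 2.5% would be expected; skewness explains the rest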

8.9.2 Outliers and the Search for Alpha

Another area of concern is whether empirical findings are being driven by one or more outliers. An outlier is an observation that is markedly further from the mean than almost all other observations. Outliers tend to have large impacts on results, and an exceptionally unusual outlier may severely distort the measurement of the economic tendencies of the data in traditional tests, especially in the case of small samples. Many statistical methodologies use squared values; when an outlier value is squared, its impact on the analysis can be huge. However, outliers may also represent behavior that can reasonably be expected to recur, and therefore their inclusion in a sample may be useful in generating results that predict behavior well. Outliers often result from non-normally distributed variables, and they are often detected through visual inspection of plots or of listings of observations ranked by the size of the regression residuals.

Visually examining plots of variables used in a statistical test can provide insight regarding their distribution, as well as the extent to which outliers may be driving the results. If past results are attributable to an outlier, an analysis based on those results may provide a poor indication of the future unless it is clear that the outlier is as likely to occur in the future as it was likely to occur in the past.
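
The sensitivity of squared-error methods to a single outlier can be demonstrated in a few lines of Python (all return figures are illustrative):

    import numpy as np

    rng = np.random.default_rng(seed=9)

    # 59 ordinary monthly observations with a true slope (beta) of 1.0
    x = rng.normal(0.0, 0.04, 59)
    y = 1.0 * x + rng.normal(0.0, 0.01, 59)

    slope_clean = np.polyfit(x, y, deg=1)[0]

    # Append one outlier month: a large market drop paired with a large fund gain
    x_out = np.append(x, -0.20)
    y_out = np.append(y, 0.30)
    slope_with_outlier = np.polyfit(x_out, y_out, deg=1)[0]

    print(f"Estimated beta without the outlier: {slope_clean:.2f}")
    print(f"Estimated beta with one outlier: {slope_with_outlier:.2f}")
    # Because least squares squares the residuals, one extreme month can dominate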

8.9.3 Biased Testing and the Search for Alpha

Two issues of biased testing are: (1) Was the fund being analyzed selected at random, or was the fund identified prior to the sample period being analyzed? (2) Were the test procedures (such as the number of tests and the confidence levels) fully specified prior to the analysis of any results?

The first issue speaks to the tendency to observe a fund that has performed well and then to test whether the performance is statistically superior. Did the person performing the test identify this fund after noticing that it had performed well, or did a salesperson or financial publication bring the fund to the analyst's attention? In either case, the test would be tantamount to standing outside a casino, observing a person who has won a great deal of money, and then testing whether that person's winnings were statistically high.

The second issue speaks to the specification of the test and the importance of avoiding data dredging. Each statistical test typically involves numerous decisions, such as (1) the specification of the return model and benchmark or peer group, (2) the specification of the sample period, and (3) the specification of the significance level. It is vital that these decisions are made prior to the conduct of the test to avoid varying the specifications in search of a more favorable result.

8.9.4 Spurious Correlation, Causality, and Beta Estimation

Beta estimation is a crucial task in measuring systematic risk for use in risk adjustment of returns. As a measure of correlation rather than a measure of central tendency, beta is inherently more difficult to analyze and more subject to complexities. Further, estimates of betas and correlations based on historical data can be highly unreliable. This section overviews the major challenges of estimating beta.

Virtually all of the challenges discussed in the previous sections regarding alpha estimation apply to the estimation of beta: non-normality of the underlying data, outliers, and biased testing. The primary additional challenges with estimation of beta discussed in this section are (1) differentiating between spurious correlation and true correlation, and (2) differentiating between true correlation and causality.

The difference between spurious correlation and true correlation is that spurious correlation is idiosyncratic in nature, coincidental, and limited to a specific set of observations. Estimates of security betas, even using a single-factor market model, are remarkably unstable over different time periods. Thus, the beta of an individual stock, a sophisticated hedge fund strategy, or an alternative investment such as a commodity tends to vary enormously based on the time period being analyzed. The estimated beta of individual stocks is regarded as so erratic that published estimates of beta are automatically adjusted for their historical tendencies toward 1.0 when used to predict future betas. Thus, if XYZ Corporation's beta over the past 60 months is estimated to be 2.0, a forecast of its future beta is often adjusted toward 1.0 (to a value of perhaps a little over 1.5) to provide a more realistic prediction of future correlation. This does not mean that there is no true correlation between XYZ and the market; it means that the correlation is changing or is difficult to measure, so estimates of beta are erratic over different time periods. The estimated correlation is being driven both by true correlation and by spurious correlation.
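
A sketch of such a shrinkage adjustment appears below. The weights are illustrative; a well-known industry convention places roughly a two-thirds weight on the historical estimate, and vendors vary in the exact weights used.

    def adjusted_beta(historical_beta, weight_on_history=0.67):
        """Shrink a historical beta estimate toward 1.0 (weights are illustrative)."""
        return weight_on_history * historical_beta + (1 - weight_on_history) * 1.0

    print(adjusted_beta(2.0))        # about 1.67 with a two-thirds weight on history
    print(adjusted_beta(2.0, 0.55))  # about 1.55 with heavier shrinkage toward 1.0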

The difference between true correlation and causality is that causality exists when the value, or the change in value, of one variable determines, at least in part, the value of the other variable. Clearly, when the overall economy performs very well, it causes the net asset value of a long-only equity fund to rise. The net asset value of one long-only equity fund might be highly correlated with the net asset value of another long-only equity fund, but there is no reason to believe that one fund's net asset value causes the other fund's net asset value to rise; they rise together due to common underlying factors.

When economic reasoning indicates a causal relationship between two variables, an analyst or a researcher can be more confident that an observed correlation is true rather than spurious.

8.9.5 Fallacies of Alpha and Beta Estimation

Alpha estimation is central to detecting potentially enhanced returns, while beta estimation is central to measuring the nondiversifiable risks of investments. This section discusses three common misunderstandings about alpha estimation and two common misunderstandings regarding beta estimation. To the extent that analysts are ignoring these issues, their conclusions are likely to be unsupported.

THREE FALLACIES OF ALPHA ESTIMATION: Suppose that an analyst is studying a group of funds to identify possible investment opportunities that offer consistent superior risk-adjusted returns (ex ante alpha).

Fallacy 1. If all funds being analyzed can reasonably be assumed to have highly similar systematic risk exposures, then if the analyst identifies numerous funds with statistically better performance (e.g., 12 managers out of 100 in a test with a 5% level of significance), the analyst should infer that some of the superior performance is attributable to managerial skill.

This conclusion is inaccurate. The results can be explained, and probably are explained, by the distribution of the unexplained returns being non-normal. The managers could all be skilled, all be unskilled, or be any combination in between. In fact, even if every fund manager studied had superior skill and there was absolutely no luck involved, if the skill differentials were normally distributed, only 5% of the managers on average would have statistically higher-than-average returns within the sample. The lesson is this: Returns should be analyzed using a risk-adjusted standard, such as a benchmark or an asset pricing model of efficiently priced assets, rather than compared to each other, and the results should be visually examined.

Fallacy 2. If the analyst examines an investment and estimates ex post alpha as the intercept of a time-series regression of the investment's returns using a multifactor asset pricing model, then a statistically positive alpha indicates that the investment earned a higher-than-average risk-adjusted return.

This conclusion is inaccurate. The test is a joint hypothesis of the appropriateness of the particular model of returns and of whether a particular fund has ex ante alpha. The observed result can be explained by model misspecification. It is very possible that the omission of a systematic risk factor will cause the estimate of idiosyncratic performance, or alpha, to contain returns from bearing systematic risk. Thus, some of the funds being analyzed may have simply speculated on a risk that this model ignores and happened to benefit from that risk with higher returns.

The lesson is this: A hypothesis test is usually based on critical assumptions, so a test using a particular asset pricing model is only as reliable as the model itself.
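
The omitted-factor problem can be simulated directly. In the Python sketch below (the factor names and all parameters are illustrative), a fund with zero true alpha earns a premium from a priced factor that the regression omits, and the misspecified model reports that premium as alpha.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(seed=2)

    n_months = 120
    market = rng.normal(0.005, 0.04, n_months)  # market excess returns
    illiq = rng.normal(0.004, 0.02, n_months)   # a priced factor the model omits
    noise = rng.normal(0.0, 0.01, n_months)

    # True model: zero alpha, unit beta on the market, unit beta on illiquidity
    fund = 0.0 + 1.0 * market + 1.0 * illiq + noise

    # Misspecified regression: market factor only
    misspecified = sm.OLS(fund, sm.add_constant(market)).fit()
    print(f"Estimated 'alpha' (misspecified): {misspecified.params[0]:.4f}")

    # Well-specified regression: both factors included
    X = sm.add_constant(np.column_stack([market, illiq]))
    well_specified = sm.OLS(fund, X).fit()
    print(f"Estimated alpha (well specified): {well_specified.params[0]:.4f}")
    # The omitted factor's premium (about 0.004 per month) masquerades as alpha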

Fallacy 3. Assuming that the asset pricing model is well specified, meaning it correctly captures and models all important systematic risks, if a statistically significant positive alpha is estimated using a significance level of 1%, we can conclude that there is a 99% chance that the investment had a positive ex ante alpha, which denotes managerial skill.

This conclusion is inaccurate. As detailed in this chapter, the level of significance used in a hypothesis test is not the probability that the null hypothesis is false if a statistically significant result is found. The proper conclusion is that with a well-specified model, a fund that has zero ex ante alpha has only a 1% chance of being incorrectly estimated as having a nonzero ex ante alpha.

TWO FALLACIES OF BETA ESTIMATION: Beta estimation fallacies include the third fallacy of alpha estimation: that a statistically significant result with a significance level of 10% indicates that the null hypothesis has a 90% chance of being false. This section lists two additional common fallacies.

Fallacy 1. If an analyst performs a test of the relationship between a particular return series and a potential return factor, a consistent result that the coefficient is statistically equal to zero means that the investment's return was not related to that return factor, according to the observed data.

This conclusion is inaccurate. Traditional correlation measures indicate a linear response between the variables but may not capture some nonlinear relationships, such as U-shaped or V-shaped relationships. For example, the correlation between the returns of an at-the-money option straddle and the returns of the underlying asset may be zero, since the V-shaped relationship generates positive returns for large increases or decreases in the price of the underlying asset. The lesson is that alternative assets tend to contain nonlinear risk exposures and that complex statistical techniques suited to studying nonlinear relationships may need to be employed.
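
A minimal Python sketch of this point: a payoff that is a deterministic, V-shaped function of the underlying return nonetheless shows a correlation near zero.

    import numpy as np

    rng = np.random.default_rng(seed=4)

    underlying = rng.normal(0.0, 0.05, 100_000)  # symmetric underlying returns
    straddle_like = np.abs(underlying) - 0.04    # stylized V-shaped straddle payoff

    corr = np.corrcoef(underlying, straddle_like)[0, 1]
    print(f"Correlation: {corr:.4f}")  # approximately zero despite perfect dependence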

Fallacy 2. A statistically significant nonzero beta in a well-specified model indicates that the return factor causes at least part of the investment's return.

This conclusion is inaccurate. Correlation can be different from causation. The price levels of most goods measured over the past century tend to be highly correlated because of inflation in the currency used to measure the prices. Thus, the long-term price level of gold might be highly correlated with the price level of a haircut, but neither of the prices causes the other price. The lesson is that economic intuition should play a role alongside empirical techniques to avoid misinterpretation of spurious correlation and to lessen the possibility of data dredging.

To conclude this chapter, recall the Trick Fund example, which introduced Section 8.7. Can it be determined whether the Trick Fund offers ex ante alpha on the basis of empirical analysis alone? The answer is probably not. The reported returns of Bernard L. Madoff Investment Securities LLC appeared to offer definitive empirical proof of ex ante alpha; however, the reported investment performance turned out to have been fictitious and fraudulent. Generally, high-quality alternative investment analysis requires economic reasoning as well as statistical and quantitative analysis.

Review Questions

  1. Provide two common interpretations of the investment term alpha.

  2. Provide two common interpretations of the investment term beta.

  3. Does ex ante alpha lead to ex post alpha?

  4. What are the two steps to an analysis of ex ante alpha using historical data?

  5. List the three major types of model misspecification in the context of estimating systematic risk.

  6. What is the goal of an empirical investigation of abnormal return persistence?

  7. What is the term for investment products designed to deliver systematic risk exposure with an emphasis on doing so in a highly cost-effective manner?

  8. Does an analyst select a p-value or a significance level in preparation for a test?

  9. What is the relationship between selection bias and self-selection bias in hedge fund data sets?

  10. What are two methods of detecting outliers in a statistical analysis?

Notes
