Chapter 5
Simulation Modeling

Simulation is a widely used technique for portfolio risk assessment and management. Portfolio exposure to different factors is often evaluated over multiple scenarios, and portfolio risk measures such as value-at-risk are estimated. Generating meaningful scenarios is an art as much as a science, and presents a number of modeling and computational challenges.

This chapter reviews the main ideas behind Monte Carlo simulation and discusses important issues in its application to portfolio management, such as the number of scenarios to generate and the interpretation of output.

5.1 Monte Carlo Simulation: A Simple Example

As we explained in Chapter 2, the analysis of risk is based on modeling uncertainty, and uncertainty can be represented mathematically by probability distributions. These probability distributions are the building blocks for simulation models. Namely, simulation models take probability distribution assumptions on the uncertainties as inputs, and generate scenarios (often referred to as trials) that happen with probabilities described by the probability distributions. They then record what happens to variables of interest (called “output variables”) over these scenarios, and let us analyze the characteristics of the output probability distributions (see Exhibit 5.1). In the financial context, inputs may be interest rate levels, market returns, and so on, and the output variable can be a portfolio return or the return on a financial instrument.

[Flow diagram: a deterministic model and input probability distributions feed into Simulation (Scenario Generation), which produces an output probability distribution.]

Exhibit 5.1 A typical Monte Carlo simulation system.

Let us start with a simple example. Suppose you want to invest $1,000 in the U.S. stock market for one year. To do so, you decide that you want to invest in a stock index that represents the performance of the stock market. You invest in a mutual fund whose investment objective is to reproduce the return performance of the S&P 500. A mutual fund with such an objective is referred to as an index fund. We will denote the initial investment, or capital, invested in the index fund as $C_0$ (i.e., $C_0 = \$1{,}000$). How much money do you expect to have at the end of the year? Let us label the amount of capital at the end of the year by $C_1$.1 Note that $C_1$ will be a random variable because it will depend on how the market (i.e., the S&P 500) performs over the year. In fact, if we let $r_{0,1}$ denote the market return over the time period [0,1], then $C_1$ will equal

$$C_1 = C_0 + C_0 \cdot r_{0,1}$$

or, equivalently,

$$C_1 = C_0 \cdot (1 + r_{0,1})$$

The return $r_{t,t+1}$ over a time period [t, t + 1] can be computed as

$$r_{t,t+1} = \frac{S_{t+1} - S_t + D_{t,t+1}}{S_t}$$

where $S_t$ and $S_{t+1}$ are the values of the S&P 500 at times t and t + 1, respectively, and $D_{t,t+1}$ is the amount of dividends paid over that time period. In this example, we can think of $S_0$ and $S_1$ as the S&P 500 index levels at the beginning (t = 0) and at the end of the year (t = 1), respectively, and $D_{0,1}$ is 0.

To estimate the end-of-year capital, you can guess the return on the market, and compute the resulting value for $C_1$. However, this would give you only a point estimate of the possible values for your investment. A more sophisticated approach is to generate scenarios for the market return over the year, and compute $C_1$ in each of these scenarios. In other words, you can represent future market returns by a probability distribution,2 generate scenarios that are representative of this probability distribution, and then analyze the resulting distribution of your end-of-year capital. The resulting probability distribution of $C_1$ will be a set of scenarios itself. You can create a histogram of the outcomes, that is, collect the outcomes of the scenarios into nonoverlapping bins and draw bars above the bins with heights corresponding to the percentage of times outcomes in each bin were obtained in the simulation. This will allow you to visualize the approximate probability distribution of $C_1$, and analyze it with the statistical measures described in Chapter 2.6 (central tendency, skew, variability, and so on). The distribution for $C_1$ from the simulation will be only an approximation, because it will depend both on the number of scenarios and on the set of scenarios you generated for $r_{0,1}$. Intuitively, if you generate 1,000 scenarios that cover the possible values for $r_{0,1}$ well, you would expect to obtain a better representation of the distribution of $C_1$ than if you generated only two scenarios.
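For readers who would like to experiment, this simulation takes only a few lines of code in a language such as R, which we refer to again later in the chapter. The sketch below is purely illustrative: the number of scenarios, the variable names, and the normal return assumption (with the mean and standard deviation used in Section 5.1.2) are choices made for this example, not requirements.

  # Minimal sketch of the one-year investment simulation (illustrative values).
  set.seed(1)                                  # fix the seed so results can be reproduced
  n  <- 1000                                   # number of scenarios (trials)
  C0 <- 1000                                   # initial capital in dollars
  r  <- rnorm(n, mean = 0.0879, sd = 0.1465)   # scenarios for the market return
  C1 <- C0 * (1 + r)                           # end-of-year capital in each scenario
  hist(C1, breaks = 30)                        # histogram of the output distribution
  mean(C1); sd(C1)                             # summary statistics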

5.1.1 Selecting Probability Distributions for the Inputs

In the simple simulation example we just discussed, the estimate of the probability distribution of $C_1$ is affected by the assumptions made about the probability distribution of the input $r_{0,1}$. How should you select the probability distribution of the input $r_{0,1}$ to the simulation?

One possible starting point is to look at a historical distribution of past returns, and assume that the future will behave in the same way. When creating scenarios for future realizations, then, you can draw randomly from historical scenarios.

Another possibility is to assume a particular probability distribution for future returns, and use historical data to estimate the parameters of this distribution, that is, the parameters that determine the specific shape of the distribution, such as the expected value (μ) and standard deviation (σ) for a normal distribution (see Chapter 2.4); $\lambda_1$, $\lambda_2$, $\lambda_3$, and $\lambda_4$ for a generalized lambda distribution (see Chapter 3.2.3); or α and β for a beta distribution (see Chapter 3.1.9). For example, if you assume a normal distribution for returns, then you can use the historical variability of returns as a measure of the standard deviation σ of this normal distribution, and the historical average (mean) as the expected return μ of the normal distribution.

A third approach is not to start out with a particular distribution, but to use historical data to find a distribution for returns that provides the best fit to the data. As we mentioned in Chapter 2.11.4, the chi-square hypothesis test is one possible goodness-of-fit test. Other goodness-of-fit tests include the Kolmogorov-Smirnov (K-S) test, the Anderson-Darling (A-D) test, and root-mean-squared-error (RMSE).3 Many software packages, including R, have commands that can test the goodness of fit for different probability distributions.
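To illustrate the kind of workflow involved, the sketch below fits a normal distribution to a hypothetical vector of historical returns and runs a Kolmogorov-Smirnov test in R. The data vector hist_ret is a made-up placeholder, and the sketch glosses over details such as the bias introduced by testing against a distribution whose parameters were estimated from the same data.

  # Sketch: fit a normal distribution to (hypothetical) historical returns
  # and check the goodness of fit with a Kolmogorov-Smirnov test.
  library(MASS)                               # provides fitdistr()
  hist_ret <- rnorm(250, 0.0005, 0.01)        # placeholder "historical" daily returns
  fit <- fitdistr(hist_ret, "normal")         # maximum-likelihood estimates of mean and sd
  ks.test(hist_ret, "pnorm",
          mean = fit$estimate["mean"],
          sd   = fit$estimate["sd"])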

Yet a fourth way is to ignore the past and look forward, constructing a probability distribution based on your subjective guess about how the uncertain variable in your model will behave. For example, using the beta distribution from Exhibit 3.9(a) to model the future market return will express a more pessimistic view about the market than using the beta distribution in Exhibit 3.9(b) or a normal distribution, because most of the probability mass in the distribution in (a) is to the left, so low values for return will happen more often when scenarios are generated.

It is important to realize that none of these approaches will provide “the answer.” Simulation is a very useful tool for modeling uncertainty, but the outcome is only as good as the inputs provided to the model. The art of simulation modeling is in providing good inputs and interpreting the results carefully.

5.1.2 Interpreting Monte Carlo Simulation Output

For purposes of our example, let us assume that the return on the market over the next year will follow a normal distribution. (This is a widely used assumption in practice, despite the fact that few empirical studies find evidence to support it.) Suppose that the S&P 500 has historically returned 8.79% per annum on average, with a standard deviation of 14.65%. We will use these numbers as approximations for the average return and the standard deviation of the return on your investment in the stock market over the next year. Relying on historical data is flawed, but is a reasonable starting point.

Let us discuss the output one would obtain after generating 100 scenarios for the market return over the next year. (Note that to generate these scenarios, we simply need to draw 100 numbers from a normal distribution with mean 8.79% and standard deviation 14.65%.4) The input to the simulation would then be a sequence of 100 numbers such as

  1. 0.0245
  2. –0.1561
  3. 0.1063
  4. 0.1300
  5. –0.0801
  6. 0.2624
  7. 0.2621
  8. 0.0824
  9. 0.1358
  10. 0.1135
  11. 0.0605

The output graph would look like Exhibit 5.2. Summary statistics obtained based on the 100 values of the distribution are provided to the right of the graph.5


Exhibit 5.2 Histogram and summary statistics for the end-of-year distribution of 100 simulated values for $1,000 invested at the beginning of the year.

If historical trends hold, you would expect to have $1,087.90 on average at the end of the first year. The standard deviation of the end-of-year capital is $146.15; that is, roughly speaking, you would expect outcomes to be about $146.15 away from the mean value on average. With 5% probability, your end-of-year capital will be less than $837 (the 5th percentile of the distribution), and with 95% probability it will be less than $1,324 (the 95th percentile of the distribution). The skewness is close to 0 and the kurtosis is close to 3, which means that the simulated distribution is close to normal. (In fact, the output distribution is normal. This is because the input distribution we provided for the simulation of this simple relationship was normal, and the relationship between the market return and the end-of-year capital is a simple linear expression. However, the estimate from the simulation will never be perfectly accurate.)
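These summary statistics can be computed directly from the vector of simulated values. Continuing with the illustrative C1 vector from the earlier sketch (the skewness and kurtosis are computed by hand here rather than with an add-on package):

  # Summary statistics of the simulated end-of-year capital C1.
  mean(C1)                               # average end-of-year capital
  sd(C1)                                 # standard deviation
  quantile(C1, probs = c(0.05, 0.95))    # 5th and 95th percentiles
  z <- (C1 - mean(C1)) / sd(C1)          # standardized values
  mean(z^3)                              # skewness (close to 0 for a normal)
  mean(z^4)                              # kurtosis (close to 3 for a normal)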

Be careful with the interpretation of minima and maxima in a simulation. Theoretically, the minimum and maximum we could have obtained in this simulation are negative and positive infinity, because the probability distribution for the return (the normal distribution) has an infinite range. We did not obtain a particularly small minimum or a particularly large maximum because we only simulated 100 values. While not completely accurate mathematically, a useful rule of thumb is that an event in the tail of the distribution with probability of roughly less than 1/100 is unlikely to appear in a set of 100 simulated values. The minimum and the maximum are highly sensitive to the number of simulated values and to how well the simulated values represent the tails of the distribution. There are smart ways to simulate scenarios so that the tails are well represented, but the minimum and the maximum values obtained in a simulation should nevertheless be interpreted with care.

In Chapter 2.11, we explained the statistical concept of confidence interval (CI) estimates. The main idea was the following: in statistics, when we want to estimate a specific parameter of a distribution, such as the mean, we take a sample and observe what the value of the parameter is in the sample (in technical terms, we record the value of the sample statistic for the mean). Instead of reporting a single value for our estimate for the mean, however, we could report an interval whose length is related to the probability that the true distribution parameter indeed lies in that interval.

Simulation is very similar to statistical sampling in that we try to represent the uncertainty by generating scenarios, that is, “sampling” values for the output parameter of interest from an underlying probability distribution. When we estimate the average (or any other parameter of interest) of the sample of scenarios, we run into the same issue statisticians do—we need to worry about the accuracy of the estimate. To compute a 95% CI estimate for the average end-of-year capital, we use the 95% CI formula from Chapter 2.11.2, and substitute the values obtained from the simulation statistics: $\bar{X} = \$1{,}087.90$, $s = \$146.15$, and $n = 100$. The value for $t_{\alpha/2}$ for a 95% CI is the value of the 97.5th percentile of the standard t-distribution with 99 degrees of freedom, which is 1.98. The 95% CI is therefore

$$\bar{X} \pm t_{\alpha/2} \cdot \frac{s}{\sqrt{n}} = 1{,}087.90 \pm 1.98 \cdot \frac{146.15}{\sqrt{100}} \approx (\$1{,}058.90,\ \$1{,}116.90)$$

Therefore, if the 100 scenarios were generated independently, we can be 95% confident that the true average end-of-year capital will be between $1,058.90 and $1,116.90. It just happens that because of the simplicity of the example, we know exactly what the true mean is. It is $\$1{,}000 \cdot (1 + 0.0879) = \$1{,}087.90$, because 8.79% was assumed to be the true mean of the distribution of returns (see Chapter 2.8 for calculating means of functions of random variables), and it is indeed contained inside the 95% CI. In 5% of all possible collections of 100 scenarios, however, we will be unlucky and draw a very extreme sample of scenarios, and the true mean will not be contained in the confidence interval we calculate. Note that if we had calculated a 99% confidence interval instead, the true mean would fail to be contained in the confidence interval in only 1% of the cases. If we generated $4n$ (instead of $n$) scenarios, then the 95% confidence interval's length would be half of the current length. (This is because the square root of the number of scenarios appears in the denominator of the expression that determines the length of the confidence interval.) We revisit the issue of confidence interval estimation and the implications for accuracy later in this chapter when we talk about the number of scenarios needed in a simulation.
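The calculation is easy to reproduce. A sketch in R, using the sample statistics reported above (the variable names are illustrative):

  # 95% confidence interval for the mean end-of-year capital,
  # based on the sample statistics from the 100-scenario simulation.
  n      <- 100
  xbar   <- 1087.90                          # sample mean from the simulation
  s      <- 146.15                           # sample standard deviation
  t_crit <- qt(0.975, df = n - 1)            # 97.5th percentile of the t-distribution, about 1.98
  xbar + c(-1, 1) * t_crit * s / sqrt(n)     # approximately (1058.9, 1116.9)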

Drawing “independent” samples from distributions is not the most efficient way to simulate random numbers that provide a good representation of the underlying probability distribution. Most simulation engines nowadays use more sophisticated methodologies that estimate parameters of the distribution of output variables of interest much more accurately. The CI formula we used above is therefore a conservative bound, rather than an exact estimate, for the actual accuracy in estimating the mean. Still, it is a useful benchmark to have.

5.2 Why Use Simulation?

The example in the previous section illustrated a very basic Monte Carlo simulation system. We started out with a deterministic model that involved a relationship between an input variable (the market return $r_{0,1}$) and an output variable of interest (the capital at the end of one year, $C_1$). We modeled the input variable as a realization of a probability distribution (we assumed a normal distribution), generated scenarios for that input variable, and tracked what the value of the output variable was in every scenario by computing it through the formula that defines the relationship between $r_{0,1}$ and $C_1$. This is the general form of simulation models illustrated in Exhibit 5.1.

Despite its simplicity, this example allows us to point out one of the advantages of simulation modeling over pure mathematical modeling. Simulation enables you to evaluate (approximately) a function of a random variable. In this case, the function is very simple—your end-of-year capital, $C_1$, is dependent on the realization of the returns through the equation $C_1 = C_0 \cdot (1 + r_{0,1})$. If you are given a probability distribution for $r_{0,1}$, in some cases you can compute the probability distribution for $C_1$ in closed form. For example, if $r_{0,1}$ followed a normal distribution with mean $\mu$ and standard deviation $\sigma$, then $C_1$ would follow a normal distribution, too, with mean $C_0 \cdot (1 + \mu)$ and standard deviation $C_0 \cdot \sigma$.

However, if $r_{0,1}$ did not follow a normal distribution, or if the output variable $C_1$ were a more complex function of the input variable $r_{0,1}$, it would be difficult and in some cases impossible to derive the probability distribution of $C_1$ from the probability distribution of $r_{0,1}$ in closed form. Using simulation simplifies matters substantially.

There are three other important advantages of simulation that can only be appreciated in more complex situations. The first one is that simulation enables us to visualize a probability distribution resulting from compounding probability distributions for multiple input variables. The second is that it allows us to incorporate correlations between input variables. The third is that simulation is a low-cost tool for checking the effect of changing a strategy on an output variable of interest. Next, we extend the investment example to provide illustrations of such situations.

5.2.1 Multiple Input Variables and Compounding Distributions

Suppose now that you are planning for retirement and decide to invest in the stock market for the next 30 years (instead of only the next year). Suppose that your initial capital is still $1,000. You are interested in the total return (and, ultimately, in the capital, $C_{30}$) you will have after 30 years.

Let us assume that every year, your investment returns from investing in the S&P 500 will follow a normal distribution with the mean and standard deviation from the example in Section 5.1.2. The final capital you have will depend on the realizations of 30 random variables—one for each year you are invested in the market.6 We found through simulation in Section 5.1.2 that the probability distribution of the capital at the end of the first year will be normal. What do you think the probability distributions for the total return and the capital at the end of the 30th year will look like? Will they be normal?

An investment of $1 at time 0 will grow to $(1 + r_{0,1})(1 + r_{1,2})\cdots(1 + r_{t-1,t})$ dollars at the end of year t, and the total return $R_{0,t}$ from time 0 to time t equals

$$R_{0,t} = (1 + r_{0,1}) \cdot (1 + r_{1,2}) \cdots (1 + r_{t-1,t})$$

Interestingly, the probability distribution of $R_{0,30}$ is not normal, and neither is the distribution of the capital at the end of 30 years. (The distribution of the capital is basically a scaled version of the distribution of the total return, since it can be obtained as $C_{30} = C_0 \cdot R_{0,30}$, and the initial capital $C_0$ is a constant (nonrandom) number.) In general, here are some useful facts to keep in mind when dealing with multiple input probability distributions:

  • When a constant is added to a random variable, as in 1 added to the random variable $r_{0,1}$, the distribution of $1 + r_{0,1}$ has the same shape as the distribution of $r_{0,1}$; however, it is shifted to the right by 1.
  • As we saw in Chapter 2.8, when a random variable is added to another random variable (e.g., $r_{0,1} + r_{1,2}$), we cannot simply “add” the two probability distributions. In fact, even in cases when the two distributions have the same shape, the probability distribution of the sum of the random variables does not necessarily have the same shape. There are some exceptions—for instance, if we add two independent normal random variables, the probability distribution of the sum is normal. However, holding aside this case, this is not true in general.

In our example, we are multiplying random variables, $(1 + r_{0,1})$, $(1 + r_{1,2})$, and so on, in order to obtain the total return. Products of random variables are even more difficult to visualize than sums of random variables. Again, it virtually never happens that a product of several random variables, even if the random variables all follow the same probability distributions, results in a random variable with that same probability distribution. The lognormal distribution, which we introduced in Chapter 3.1.4, is a rare exception, and this is one of the reasons that the lognormal distribution is used very often in financial modeling.

Fortunately, simulation makes visualizing the probability distribution of the product easy. Exhibit 5.3 presents the output distribution of the capital at the end of 30 years. We can observe (both from the graph and from the statistics for skewness and kurtosis) that the distribution is very skewed, even though the distributions for individual returns in each of the 30 years were symmetric (normal).

[Histogram of the total amount of money in the account (x-axis: values in thousands; y-axis: values x 10^-5), with a table of summary statistics on the right.]

Exhibit 5.3 Output distribution for amount of capital after 30 years.
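A simulation along the lines of the one behind Exhibit 5.3 can be sketched in a few lines of R. The sketch below again uses illustrative choices (5,000 scenarios, the same normal return assumptions each year, and independence across years):

  # Sketch: capital after 30 years of investing in the index fund,
  # assuming independent, identically distributed normal annual returns.
  set.seed(1)
  n_scen  <- 5000                              # number of scenarios
  n_years <- 30
  C0      <- 1000
  r <- matrix(rnorm(n_scen * n_years, mean = 0.0879, sd = 0.1465),
              nrow = n_scen, ncol = n_years)   # one row per scenario, one column per year
  C30 <- C0 * apply(1 + r, 1, prod)            # compound the 30 annual returns in each scenario
  hist(C30, breaks = 50)                       # right-skewed, as in Exhibit 5.3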

5.2.2 Incorporating Correlations

Let us now complicate the situation more. Suppose that you have the opportunity to invest in stocks and Treasury bonds over the next 30 years. Suppose that today you allocate 50% of your capital to the stock market by investing in the index fund, and 50% in bonds. Furthermore, suppose over the 30 years you never rebalance your portfolio (i.e., you do not change the allocation between stocks and bonds). What will be the total amount in your portfolio after 30 years?

Historically, the stock market and the Treasury bond market returns have exhibited extremely low, but often statistically significant, negative correlation. This is because these two asset classes tend to move in opposite directions. When the stock market is performing poorly, investors tend to move their money to what they perceive to be safer investments, such as bonds; conversely, when the stock market is performing well, investors tend to reallocate their portfolios, increasing their allocation to the stock market and reducing their allocation to bonds.

Visualizing the impact of multiple input variables at the same time and incorporating correlations between these variables is very difficult to do in an analytical way. Simulation eliminates the need for complex mathematics but preserves the benefits of creating richer and more accurate models. Correlations can be incorporated both implicitly (by generating joint scenarios for realizations of input variables, for example, by sampling from observed past data) and explicitly (by specifying a correlation matrix as an input to the simulation). Here, we give an example in which the correlations are specified as an input.

Let us assume that the correlation between the stock market and the Treasury bond market returns will be about –0.2. Let us also assume for the purpose of this exercise that the annualized return on the Treasury bonds in your portfolio will be normally distributed with mean 4% and standard deviation 7%. Therefore, the returns on the stock market and the bond market follow a multivariate normal distribution with correlation coefficient $\rho = -0.2$.
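Generating the correlated return scenarios is straightforward once a covariance matrix is built from the assumed standard deviations and the correlation. A sketch in R (the mvrnorm function is in the MASS package; in the 30-year setting, one such draw would be made for each year):

  # Sketch: correlated scenarios for annual stock and Treasury bond returns.
  library(MASS)                                    # provides mvrnorm()
  mu    <- c(stock = 0.0879, bond = 0.04)          # assumed expected returns
  sigma <- c(stock = 0.1465, bond = 0.07)          # assumed standard deviations
  rho   <- -0.2                                    # assumed correlation
  Sigma <- diag(sigma) %*% matrix(c(1, rho, rho, 1), 2, 2) %*% diag(sigma)
  ret <- mvrnorm(n = 5000, mu = mu, Sigma = Sigma) # one row of (stock, bond) returns per scenario
  cor(ret)                                         # check: off-diagonal entry close to -0.2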

Exhibit 5.4 shows the output distribution for the total amount of money in your account after generating 5,000 scenarios for stock market (as measured by the S&P 500) returns and Treasury bond returns over 30 years. The shape of the distribution of the capital available after 30 years is similar to the shape of the distribution from Exhibit 5.3; however, the variability (in terms of standard deviation) is smaller.


Exhibit 5.4 Histogram and summary statistics of the capital after 30 years from investing in the S&P 500 and Treasury bonds, taking into account the correlation between the returns on stocks and bonds.

5.2.3 Evaluating Decisions

In the end, the goal of using simulation is to help us make decisions. Is a 50–50 portfolio allocation in stocks and bonds “better” than a 30–70 allocation? We refer to the former allocation as Strategy A, and to the latter as Strategy B. Let us evaluate the distribution of the capital at the end of 30 years for each allocation strategy, and use that knowledge to decide on the “better” allocation. Notice that it is unclear what “better” means in the context of uncertainty. We need to think about whether “better” for us means higher return on average, lower risk, acceptable trade-off between the two, and so on. Exhibit 5.5 contains the summary statistics of the simulated capital at the end of 30 years with each allocation over 5,000 scenarios.


Exhibit 5.5 Comparison of Strategy A (equal allocation to stocks and bonds, in dark gray) and Strategy B (allocation of 30% to stocks and 70% to bonds, in light gray).

We can observe that although Strategy A performs better than Strategy B as evaluated based on the mean capital at the end of 30 years ($7,905.30 for Strategy A versus $6,040.17 for Strategy B), Strategy A's standard deviation is higher ($5,341.57 versus $3,219.06). In terms of risk/return trade-off, as measured by the coefficient of variation,7 Strategy A's CV is $5{,}341.57/7{,}905.30 \approx 0.68$, whereas Strategy B's CV is $3{,}219.06/6{,}040.17 \approx 0.53$, which makes Strategy A appear riskier than Strategy B. This is apparent also from the overlay chart shown in Exhibit 5.5—much of the mass of the histogram for Strategy B is contained within the histogram for Strategy A, which means that Strategy B has less variability and results in less extreme outcomes than Strategy A.

The standard deviation may not be a good measure of risk when the underlying distributions are asymmetric. Strategy A's 5th percentile ($2,930.51), for example, is higher than Strategy B's 5th percentile ($2,809.56), meaning that if you are concerned with events that happen with 5% probability, Strategy A would be less risky. Strategy A also has a higher upside—its 95th percentile ($17,834.32) is higher than Strategy B's 95th percentile ($11,940.22).8 The fact that Strategy A has a high upside “penalizes” its standard deviation relative to the standard deviation of Strategy B because it results in more outcomes that are far away from the mean. A high standard deviation is not necessarily a bad thing if the largest deviations from the mean happen on the upside.

It should be clear from the discussion so far that the summary statistics do not tell the whole story. It is important to look at the entire distribution of outcomes. Suppose now that we would like to compare Strategy A to Strategy B on a scenario-by-scenario basis. In what percentage of scenarios does Strategy B perform better than Strategy A? One efficient way to answer this question is to create an additional variable, Difference (A-B), that keeps track of the difference between the capital at the end of 30 years from Strategy A and from Strategy B during the simulation. Exhibit 5.6 shows a histogram of Difference (A-B) and presents its summary statistics.9


Exhibit 5.6 Histogram and summary statistics for the difference between the capital at the end of 30 years with Strategy A and with Strategy B.
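When the two strategies are evaluated over the same set of scenarios, the scenario-by-scenario comparison requires only a few extra lines of code. The sketch below simulates both strategies over a common set of correlated return scenarios and then computes the Difference (A-B) variable; as everywhere in this chapter, the number of scenarios and the distributional assumptions are illustrative.

  # Sketch: evaluate Strategies A (50/50) and B (30/70) over the SAME scenarios,
  # then compare them scenario by scenario.
  library(MASS)
  set.seed(1)
  n_scen <- 5000; n_years <- 30; C0 <- 1000
  mu    <- c(0.0879, 0.04)                        # assumed stock and bond expected returns
  sigma <- c(0.1465, 0.07)                        # assumed standard deviations
  Sigma <- diag(sigma) %*% matrix(c(1, -0.2, -0.2, 1), 2, 2) %*% diag(sigma)
  capA <- capB <- numeric(n_scen)
  for (i in 1:n_scen) {
    ret <- mvrnorm(n_years, mu = mu, Sigma = Sigma)   # 30 years of (stock, bond) returns
    growth_stock <- prod(1 + ret[, 1])                # buy-and-hold growth factors
    growth_bond  <- prod(1 + ret[, 2])
    capA[i] <- C0 * (0.5 * growth_stock + 0.5 * growth_bond)
    capB[i] <- C0 * (0.3 * growth_stock + 0.7 * growth_bond)
  }
  diffAB <- capA - capB                           # the Difference (A-B) output variable
  hist(diffAB, breaks = 50)
  mean(diffAB < 0)                                # fraction of scenarios in which B beats A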

It is interesting to observe that even though Strategy A appeared riskier than Strategy B based on the summary statistics in Exhibit 5.5 (Strategy A's standard deviation was almost twice the standard deviation of Strategy B), Strategy A results in lower realized outcomes than Strategy B in only 10.2% of the 5,000 generated scenarios. (As the graph in Exhibit 5.6 illustrates, 10.2% of the 5,000 scenarios for Difference (A–B) have values less than zero.) This perspective on the risk of one strategy versus the risk of another is valuable because it can substantially impact the final decision on which strategy to choose. For example, the problematic scenarios can be specifically identified, and in some situations, managerial action can be taken to avoid them. A strategy that appears riskier may therefore be selected if it is desirable for other qualitative reasons.

When comparing two alternative decisions under uncertainty, it is technically correct (and fair) to evaluate them under the same set of scenarios. For example, when obtaining the summary statistics for Strategy A and Strategy B earlier, we should have used the same set of 5,000 scenarios for both. This would eliminate circumstances in which we happened to generate more favorable scenarios when evaluating one of the strategies than the other, which would lead us to conclude erroneously that the strategy evaluated over the more favorable set of scenarios is better.

In principle, if we generate a huge number of scenarios for the two strategies, even if the two sets of scenarios are not the same, the estimates will be quite accurate. However, generating a large number of scenarios is time consuming. Moreover, even a difference of a few digits after the decimal point may be significant in some financial applications.

One way to simulate the same set of scenarios multiple times is to specify the seed of the simulation. We explain what the seed is in Section 5.4; for now, it is only important to understand that it determines the first random number that gets generated in a simulation.

By default, most simulation packages change the seed every time they run a simulation. If we enter a particular number for the seed (depending on the software, different ranges for possible seed values are available), we will be fixing the first scenario that will be generated, which would enable the software to generate the same sequence of scenarios again the next time it runs the simulation.10
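In R, for example, this is done with the set.seed command; a quick check (a sketch, with an arbitrary seed value):

  # Fixing the seed makes the sequence of generated scenarios reproducible.
  set.seed(123); a <- rnorm(3)
  set.seed(123); b <- rnorm(3)
  identical(a, b)                # TRUE: the same "random" numbers are generated both times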

Alternatively, in a statistical modeling language like R, we can dedicate variables in our program that store the generated scenarios, and then evaluate all the different strategies (Strategy A and Strategy B in this example) over that same set of scenarios. Storing 5,000 scenarios is not a problem, but if there are multiple variables and the number of scenarios increases substantially, we could run into memory problems.

5.3 How Many Scenarios?

A simulation may not be able to capture all possible realizations of uncertainties in the model. For instance, think about the distribution of the end-of-year capital in Section 5.1.1. As we explained in Section 5.1.2, the possible number of values for the simulation output variable—the end-of-year capital—is technically infinite. Thus, we could never obtain the exact distribution of $C_1$ or the exact expected value of $C_1$ by simulation. We can, however, get close. The accuracy of the estimation will depend on the number of generated scenarios. As we discussed in Section 5.1.2, if the scenario generation is truly random, then the variability (the standard error) of the estimate of the true expected value will be $s/\sqrt{n}$, where s is the standard deviation of the simulated values for the output variable and n is the number of scenarios.

Hence, to double the accuracy of estimating the mean of the output distribution, we would need to quadruple (roughly) the number of scenarios. For instance, in the example in Section 5.1.2, we generated 100 scenarios, calculated that the average capital after one year is $1,087.90, and estimated the 95% CI for the average capital as ($1,058.90, $1,116.90). We concluded that we can be 95% confident that the true expected capital will be between $1,058.90 and $1,116.90; that is, that the true mean will not be further than $29 from the mean estimated from the simulation ($1,087.90). Now suppose that we had obtained the same numbers for sample mean ($1,087.90) and sample standard deviation ($146.15) but we had generated four times as many scenarios (400). The 95% CI would have been

$$1{,}087.90 \pm t_{0.975,\,399} \cdot \frac{146.15}{\sqrt{400}} \approx 1{,}087.90 \pm 14.37 = (\$1{,}073.53,\ \$1{,}102.27)$$

This11 means that we could be 95% confident that the true mean would not be more than $14.37 from the simulated mean of $1,087.90, which is about half of the amount by which we could be off ($29) when we generate 100 scenarios. Therefore, our accuracy has increased about twofold after quadrupling the number of generated scenarios.
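The scaling of the confidence interval width with the number of scenarios is easy to tabulate. A sketch in R, reusing the sample standard deviation from the example (and assuming, for simplicity, that it would not change as more scenarios are added):

  # Half-width of the 95% CI for the mean as a function of the number of scenarios.
  s <- 146.15                                      # sample standard deviation from the example
  for (n in c(100, 400, 1600)) {
    half_width <- qt(0.975, df = n - 1) * s / sqrt(n)
    cat(n, "scenarios: +/-", round(half_width, 2), "\n")   # roughly 29.0, 14.4, 7.2
  }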

Increasing the number of scenarios to improve accuracy can get expensive computationally, especially in more complicated multiperiod situations such as the simulation of a 30-year investment in Section 5.2.1. Fortunately, there are modern methods for generation of random numbers and scenarios that can help reduce the computational burden.

Although the average output from a simulation is important, it is often not the only quantity of interest, something that practitioners tend to forget when using simulation to value complex financial instruments. When evaluating the value-at-risk or the conditional value-at-risk of a portfolio, for example, a portfolio manager may be interested in the percentiles of the distribution of possible portfolio returns. Unfortunately, it is not as straightforward to determine the accuracy of estimates of percentiles and other sample statistics from a simulation. There are some useful results from probability theory that apply,12 and we can use bootstrapping, as described in Chapter 2.11.3. However, in general, the question of how many scenarios one should generate to get a good representation of the output distribution does not have an easy answer. This issue is complicated further by the fact that results from probability theory do not necessarily apply to many of the scenario-generating methods used in practice, which do not simulate “pure” random samples of observations, but instead use smarter simulation methods that reduce the number of scenarios needed to achieve good estimate accuracy.

5.4 Random Number Generation

Contrary to what many would imagine, coming up with truly random numbers is difficult and time consuming. Moreover, the ability to reproduce the random number sequence and to analyze the random number characteristics is actually a desirable property for random number generators. In particular, the ability to reproduce a sequence of random numbers allows for reducing the variance of estimates and for debugging computer code by rerunning experiments in the same conditions in which they were run in previous iterations of code development.

Most simulation software employs random number generation algorithms that produce streams of numbers that appear to be random, but in fact are a result of a clearly defined series of calculation steps in which the next “random number” $u_{k+1}$ in the sequence is a function of the previous “random number” $u_k$, that is, $u_{k+1} = f(u_k)$. As mentioned earlier, the sequence starts with a number called the seed, and if the same seed is used in several simulations, each simulation sequence will contain exactly the same numbers, which is helpful for running fair comparisons between different strategies evaluated under uncertainty. It is quite an amazing statistical fact that some of these recursion formulas (named pseudo-random number generators) define sequences of numbers that imitate random behavior well and appear to obey (roughly) some major laws of probability, such as the Central Limit Theorem (Chapter 2.11.1).

Generating random numbers from a wide variety of distributions reduces to generating random numbers from the continuous uniform distribution on the unit interval [0,1], that is, to generating random numbers on the interval [0,1] in such a way that each value between 0 and 1 is equally likely to occur.13 Many computer languages and software packages have a command for generating a random number between 0 and 1: =RAND() in Microsoft Excel, runif(n, min=0, max=1) in R, rand(1) in MATLAB and FORTRAN, and rand() in C++.
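One standard way to perform this mapping is the inverse transform method: a uniform random number on [0,1] is passed through the inverse cumulative distribution function (the quantile function) of the target distribution. A sketch in R, using the normal return distribution from earlier in the chapter:

  # Sketch of the inverse transform method: uniform draws mapped through
  # the quantile function of the target distribution.
  u <- runif(10000)                            # uniform random numbers on [0,1]
  x <- qnorm(u, mean = 0.0879, sd = 0.1465)    # normal returns obtained from the uniforms
  # x has (approximately) the same distribution as rnorm(10000, 0.0879, 0.1465)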

A truly random number generator may produce clustered observations, which necessitates generating many scenarios in order to obtain a good representation of the output distribution of interest. Quasi-random (also called low discrepancy) sequences as well as a variety of so-called variance reduction techniques are used to speed up execution and improve accuracy in simulations. Most simulation packages implement such techniques. The interested reader is referred to Chapters 4 and 14 in Pachamanova and Fabozzi (2010) for more information.
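To give a flavor of what such techniques do, the sketch below illustrates one of the simplest variance reduction ideas, antithetic variates: each uniform draw u is paired with 1 - u, so that unusually high and unusually low draws offset each other and the estimate of the mean is less noisy. This is only one example, and not necessarily the method any particular software package uses.

  # Sketch: antithetic variates for estimating the mean end-of-year capital.
  set.seed(1)
  C0 <- 1000
  u  <- runif(500)                                 # 500 uniform draws ...
  r1 <- qnorm(u,     mean = 0.0879, sd = 0.1465)   # ... give 500 returns
  r2 <- qnorm(1 - u, mean = 0.0879, sd = 0.1465)   # antithetic returns from the paired draws
  mean(C0 * (1 + c(r1, r2)))                       # estimate based on 1,000 (paired) scenarios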
