9 Simulation

In many practical situations, particularly in reinsurance, the model assumptions or the number of model ingredients and their interactions are too complex to allow for an explicit calculation (or direct approximation) of quantities of interest. Many such quantities can, however, be expressed as expectations of a random variable X, for which the distribution is specified through the model assumptions (albeit typically as a complicated interaction of other random variables, so that a direct calculation is impossible). For such situations, one can generate independent samples of that random variable and estimate the expectation by the arithmetic mean of those sample values. This is the core idea of Monte Carlo (MC) simulation. In view of its importance in reinsurance applications, we will discuss some of the main ideas in this chapter. For links to more detailed surveys on the topic see the Notes at the end of the chapter.

9.1 The Monte Carlo Method

Assume that for some random variable X we want to estimate $\mathbb{E}(X)$, and we know that this value is finite. If we are able to generate n independent sample values $X_i$, then the strong law of large numbers guarantees that, with probability 1,

$$\hat\mu_n := \frac{1}{n}\sum_{i=1}^{n} X_i \;\longrightarrow\; \mathbb{E}(X) \qquad (n \to \infty).$$

The estimator $\hat\mu_n$ is unbiased and strongly consistent. If also $\sigma^2 := \mathrm{Var}(X) < \infty$, then by the Central Limit Theorem, in distribution,

$$\sqrt{n}\;\frac{\hat\mu_n - \mathbb{E}(X)}{\sigma} \;\longrightarrow\; \mathcal{N}(0,1).$$

That is, for large n the sample mean has the approximate distribution

$$\hat\mu_n \;\approx\; \mathcal{N}\!\left(\mathbb{E}(X),\, \frac{\sigma^2}{n}\right),$$

respectively

$$\hat\mu_n - \mathbb{E}(X) \;\approx\; \mathcal{N}\!\left(0,\, \frac{\sigma^2}{n}\right).$$

This shows that the convergence rate of the Monte Carlo method is of the order $O(1/\sqrt{n})$. Clearly, the error bounds for the Monte Carlo estimator will by construction always be probabilistic (i.e., not certain). From Φ(1.96) = 0.975 for the standard normal c.d.f. one then obtains the approximate 95% confidence interval for $\mathbb{E}(X)$:

$$\left[\hat\mu_n - 1.96\,\frac{\sigma}{\sqrt{n}},\;\; \hat\mu_n + 1.96\,\frac{\sigma}{\sqrt{n}}\right], \tag{9.1.1}$$

and this confidence interval is the usual way Monte Carlo estimates are reported. In order to state this interval, one therefore needs to estimate the variance σ², and this is done by the (unbiased) sample variance of the generated replications:

$$\hat\sigma_n^2 = \frac{1}{n-1}\sum_{i=1}^{n}\big(X_i - \hat\mu_n\big)^2.$$
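
As a minimal illustration of these formulas (the lognormal distribution of X and all parameter values are assumptions for the example), the following sketch in Python computes the crude MC estimate together with its approximate 95% confidence interval:

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def crude_mc(sample_x, n=100_000):
    """Crude Monte Carlo estimate of E(X) with an approximate 95% confidence interval."""
    x = sample_x(n)                      # n independent replications X_1, ..., X_n
    mu_hat = x.mean()                    # sample mean
    sigma_hat = x.std(ddof=1)            # unbiased sample standard deviation
    half_width = 1.96 * sigma_hat / np.sqrt(n)
    return mu_hat, (mu_hat - half_width, mu_hat + half_width)

# example: X lognormal (an assumed toy claim-size distribution)
est, ci = crude_mc(lambda n: rng.lognormal(mean=0.0, sigma=1.0, size=n))
print(est, ci)   # true value is exp(0.5) ≈ 1.6487
```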

Note that adding one decimal place of precision requires 100 times as many replications, so a high accuracy of the estimate requires very large sample sizes, which can be very time-consuming. An alternative to reduce the size of the confidence interval is to reduce the variance of the estimator, which is sometimes possible with smart ideas and workarounds. We will discuss such possibilities in Section 9.2.

In many cases relevant for reinsurance purposes, one in fact wants to estimate the probability of a certain event A (e.g., the event that the aggregate loss exceeds some threshold). However, such a probability can also be expressed as the expectation of the indicator function of the event A, that is, $\mathbb{P}(A) = \mathbb{E}\big(\mathbb{1}_A\big)$. The corresponding Monte Carlo estimate for ℙ(A) is then simply the relative frequency of the occurrence of A among n independent experiments:

$$\widehat{\mathbb{P}(A)} = \frac{1}{n}\sum_{i=1}^{n} \mathbb{1}_{\{A \text{ occurs in experiment } i\}}.$$

Since the variance of a Bernoulli random variable with success probability $p$ is given by $p(1-p)$, the sample variance is estimated by $\widehat{\mathbb{P}(A)}\,\big(1-\widehat{\mathbb{P}(A)}\big)$ and we then obtain the approximate 95% confidence interval for ℙ(A) in the form

$$\left[\,\widehat{\mathbb{P}(A)} - 1.96\sqrt{\frac{\widehat{\mathbb{P}(A)}\,\big(1-\widehat{\mathbb{P}(A)}\big)}{n}}\,,\;\; \widehat{\mathbb{P}(A)} + 1.96\sqrt{\frac{\widehat{\mathbb{P}(A)}\,\big(1-\widehat{\mathbb{P}(A)}\big)}{n}}\,\right].$$
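
For instance, the probability that an aggregate claim amount exceeds a threshold can be estimated in this way; the following sketch (a compound Poisson sum with exponential claims and all parameter values being assumptions for illustration) reports the relative frequency together with the above confidence interval:

```python
import numpy as np

rng = np.random.default_rng(seed=2)

def exceedance_probability(lam_t, claim_sampler, u, n=200_000):
    """Estimate P(S > u) for S = X_1 + ... + X_N with N ~ Poisson(lam_t)."""
    hits = 0
    for _ in range(n):
        n_claims = rng.poisson(lam_t)
        s = claim_sampler(n_claims).sum()
        hits += (s > u)
    p_hat = hits / n
    half_width = 1.96 * np.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, (p_hat - half_width, p_hat + half_width)

# toy example: Poisson(10) claim counts, exponential claims with mean 1, threshold 25
print(exceedance_probability(10, lambda k: rng.exponential(1.0, size=k), u=25))
```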

For the implementation of the Monte Carlo method, one needs to produce random numbers through a deterministic algorithm (“pseudorandom numbers”) which imitates randomness to such a degree that the result is practically indistinguishable from truly random numbers, that is, the respective statistical tests for randomness are passed. Since $F_X(X)$ is uniform(0,1) distributed, the random number $F_X^{-1}(U)$ for a uniform(0,1) random number $U$ has distribution $F_X$, and so it typically suffices to focus on the generation of uniform pseudorandom numbers (even if in many circumstances there are more efficient algorithms than this inversion method available to generate a random sample with distribution $F_X$, particularly when the inverse function $F_X^{-1}$ is cumbersome to work with). Since for the generation of one sample value of X one may in fact need many pseudorandom numbers (either X can be generated more efficiently by combining several variables or, more typically, X itself depends on many further random variables which all need to be generated for one realization of X), it is also important that there are no significant deterministic patterns in a produced pseudorandom sequence if one looks at certain blocks of numbers. Developing pseudorandom number generators and studying their properties is a classical topic of mathematical research, and nowadays very efficient and fast pseudorandom number generators are available, so that we can generally take it for granted that the generation of “sufficiently” random and independent samples is possible (see Korn et al. [497] for an overview). The computational bottleneck in this context is hence typically not the generation of the pseudorandom sequence, but the evaluation of the functions that need to be applied to these numbers.
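
A minimal sketch of the inversion method (with a Pareto claim-size distribution chosen as an assumption for illustration, since its inverse c.d.f. is available in closed form):

```python
import numpy as np

rng = np.random.default_rng(seed=3)

def pareto_inverse_cdf(u, alpha, x_min):
    """Inverse of F(x) = 1 - (x_min / x)**alpha for x >= x_min."""
    return x_min * (1.0 - u) ** (-1.0 / alpha)

# inversion method: transform uniform(0,1) pseudorandom numbers
u = rng.uniform(size=100_000)
x = pareto_inverse_cdf(u, alpha=2.5, x_min=1.0)
print(x.mean())   # theoretical mean alpha * x_min / (alpha - 1) = 5/3
```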

Another interpretation of the MC method is as follows: if the distribution of X (e.g., as a function of many other, possibly dependent, random variables) is not available explicitly, one can simulate n realizations X1, …, Xn and build up an empirical c.d.f. $\hat F_n$ from these n points by assigning a weight of 1/n to each (cf. (4.2.7)). Using $\hat F_n$ as an approximation of the true $F_X$ in calculations (e.g., for quantiles, expectations of functions of X, etc.) exactly corresponds to the MC estimator of the respective quantity.
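
As a small sketch of this viewpoint (the construction of X and the quantile level are assumptions for illustration), quantiles and c.d.f. values of X can simply be read off the simulated sample:

```python
import numpy as np

rng = np.random.default_rng(seed=4)

# X defined through an (here: artificial) interaction of other random variables
x = rng.exponential(1.0, size=(100_000, 5)).max(axis=1)   # maximum of 5 exponential risks

# empirical c.d.f. evaluated at a point, and an empirical quantile
print(np.mean(x <= 3.0))          # \hat{F}_n(3.0)
print(np.quantile(x, 0.995))      # empirical 99.5% quantile of X
```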

9.2 Variance Reduction Techniques

If Var(X) (respectively its estimate) in the confidence interval (9.1.1) is large, it can take a large number of replications to arrive at a satisfactory confidence interval. If we can replace the original estimator by another one with the same expectation, but smaller variance, then we can achieve an efficiency gain.

In the following we briefly discuss three popular variants of variance reduction.

9.2.1 Conditional Monte Carlo

Let Y be another random variable such that $\mathbb{E}(X \mid Y)$ can be calculated explicitly. Then

$$Y_1 := \mathbb{E}(X \mid Y)$$

is an unbiased estimator of $\mathbb{E}(X)$, and from

$$\mathrm{Var}(X) = \mathbb{E}\big(\mathrm{Var}(X \mid Y)\big) + \mathrm{Var}\big(\mathbb{E}(X \mid Y)\big)$$

it follows that $\mathrm{Var}(Y_1) \le \mathrm{Var}(X)$. Consequently, this always leads to a reduction of variance, but it will not always be easy to find an appropriate Y to serve the purpose.

We illustrate this approach with an impressive example for the simulation of tail probabilities of compound sums of heavy‐tailed risks:
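
A prominent instance in this setting is the Asmussen–Kroese estimator mentioned in the Notes at the end of the chapter: for $S_k = X_1 + \dots + X_k$ with i.i.d. claims having a continuous c.d.f. F, exchangeability gives $\mathbb{P}(S_k > u) = k\,\mathbb{P}(S_k > u,\ X_k = \max_i X_i)$, and conditioning on $X_1,\dots,X_{k-1}$ yields the unbiased estimator $k\,\overline{F}\big(\max(M_{k-1},\, u - S_{k-1})\big)$ with $M_{k-1} = \max_{i \le k-1} X_i$. The following sketch (Pareto claims and all parameter values are assumptions for illustration) implements this:

```python
import numpy as np

rng = np.random.default_rng(seed=5)

def pareto_sf(x, alpha=2.0, x_min=1.0):
    """Survival function 1 - F(x) of a Pareto distribution."""
    return np.where(x <= x_min, 1.0, (x_min / x) ** alpha)

def asmussen_kroese(u, k=10, n=100_000, alpha=2.0, x_min=1.0):
    """Conditional MC (Asmussen-Kroese) estimate of P(X_1 + ... + X_k > u)."""
    x = x_min * rng.uniform(size=(n, k - 1)) ** (-1.0 / alpha)   # k-1 Pareto claims via inversion
    s = x.sum(axis=1)                                            # S_{k-1}
    m = x.max(axis=1)                                            # M_{k-1}
    z = k * pareto_sf(np.maximum(m, u - s), alpha, x_min)        # conditional estimator
    return z.mean(), 1.96 * z.std(ddof=1) / np.sqrt(n)

print(asmussen_kroese(u=100.0))   # estimate and CI half-width for P(S_10 > 100)
```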

9.2.2 Importance Sampling

One of the problems of crude MC estimation is that many generated sample points will not really be in the relevant region for the quantity to be approximated. For instance, in Example 9.1 for the estimation of the quantile, for very small or very large α, there will not be enough sample points close to the quantile (it is inefficient to determine the exact value of a replication if it is in any case far to one side of the quantile; only the fact that it lands on that side is relevant), and this adds to the variance of the estimator.

Assume again that $F_X$ allows for a density $f_X$. The idea of importance sampling is now to switch from $f_X$ to another density $\tilde f$ that concentrates more strongly on the region of interest (in Example 9.1 this would mean that $\tilde f$ has a lot of probability mass around the suspected value of Q(α)). Such a new density $\tilde f$ can be obtained from $f_X$ by shifting, rescaling, twisting etc. The quantity $L(x) := f_X(x)/\tilde f(x)$ is called the likelihood ratio function. We then have

$$\mathbb{E}(X) = \int x\, f_X(x)\,dx = \int x\, \frac{f_X(x)}{\tilde f(x)}\, \tilde f(x)\,dx = \widetilde{\mathbb{E}}\big(\widetilde X\, L(\widetilde X)\big),$$

where $\widetilde X$ is a random variable with density $\tilde f$ and $\widetilde{\mathbb{E}}$ denotes the expectation with respect to that density.

We instead simulate n independent replicates $\widetilde X_1,\dots,\widetilde X_n$ from the new random variable $\widetilde X$ (with density $\tilde f$) and use the importance sampling estimator

$$\hat\mu_n^{IS} = \frac{1}{n}\sum_{i=1}^{n} \widetilde X_i\, L\big(\widetilde X_i\big).$$

This represents a weighted MC estimator for $\mathbb{E}(X)$ with weights according to the likelihood ratio function, in order to “correct” for using the new density $\tilde f$. The variance of this estimator is

$$\mathrm{Var}\big(\hat\mu_n^{IS}\big) = \frac{1}{n}\left(\int x^2\, L(x)\, f_X(x)\,dx \;-\; \big(\mathbb{E}(X)\big)^2\right),$$

so that we achieve a variance reduction whenever $\int x^2 L(x) f_X(x)\,dx \le \int x^2 f_X(x)\,dx$, that is, whenever $\mathbb{E}\big(X^2 L(X)\big) \le \mathbb{E}\big(X^2\big)$. Correspondingly, importance sampling will be most efficient if, heuristically,

  • $\tilde f(x)$ is large whenever $x^2 f_X(x)$ is large,
  • $\tilde f(x)$ is small whenever $x^2 f_X(x)$ is small.

Furthermore, the likelihood ratio $L(x)$ should be easy to evaluate and $\widetilde X$ should be easy to simulate. The following example illustrates the efficiency gain.
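
As a minimal sketch of such an efficiency gain, consider exponential twisting, that is, taking $\tilde f(x) \propto e^{\theta x} f_X(x)$, so that $L(x) = e^{-\theta x + \kappa_X(\theta)}$ with the cumulant-generating function $\kappa_X(\theta) = \log \mathbb{E}(e^{\theta X})$. The concrete choices below (a standard normal X, for which the twisted density is again normal with mean θ, and the tilting θ = u) are assumptions for illustration only:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(seed=6)

def tail_prob_twisted(u, n=100_000):
    """Estimate P(X > u) for X ~ N(0,1) by exponential twisting with theta = u.
    The twisted density exp(theta*x)*phi(x)/M(theta) is the N(theta, 1) density."""
    theta = u
    x_tilde = rng.normal(loc=theta, scale=1.0, size=n)      # samples from the twisted density
    lr = np.exp(-theta * x_tilde + 0.5 * theta**2)          # likelihood ratio e^{-theta x + kappa(theta)}
    z = (x_tilde > u) * lr                                   # weighted indicator of the rare event
    return z.mean(), 1.96 * z.std(ddof=1) / np.sqrt(n)

u = 4.0
print(tail_prob_twisted(u))     # estimate and CI half-width
print(norm.sf(u))               # exact value P(X > 4) ≈ 3.17e-5, out of reach for crude MC
```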

Let us now turn to estimating the tail probability $\mathbb{P}(S(t) > u)$ of an aggregate sum $S(t) = \sum_{i=1}^{N(t)} X_i$ for large u and light-tailed claims $X_i$. Then one can apply the exponential twisting idea described in the above illustration to the entire random variable S(t) (which modifies the distribution of both N(t) and the $X_i$), to get the importance sampling estimate

$$\mathbb{1}_{\{S(t) > u\}}\; e^{-\theta\, S(t) + \kappa_{S(t)}(\theta)},$$

where $\kappa_{S(t)}(\theta) = \log \mathbb{E}\big(e^{\theta S(t)}\big)$ is the cumulant-generating function of S(t) and S(t) is now simulated under the exponentially twisted measure. We can now choose θ(u) (i.e., define the amount of tilting) in such a way that under the new measure we expect S(t) to be equal to the threshold value u, so that

$$\kappa_{S(t)}'\big(\theta(u)\big) = u.$$
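
A minimal sketch of this recipe under assumptions that keep everything explicit (a compound Poisson sum with exponential claims, for which $\kappa_{S(t)}(\theta) = \lambda t\,(M_X(\theta) - 1)$ and $\theta(u)$ solving $\kappa_{S(t)}'(\theta) = u$ is available in closed form; under the twisted measure, N(t) is Poisson with rate $\lambda t\, M_X(\theta)$ and the claims are again exponential, with rate $\beta - \theta$):

```python
import numpy as np

rng = np.random.default_rng(seed=7)

def twisted_compound_poisson_tail(u, lam_t=10.0, beta=1.0, n=50_000):
    """Importance sampling estimate of P(S(t) > u) for a compound Poisson sum
    with exponential(beta) claims, using exponential twisting with kappa'(theta) = u."""
    theta = beta - np.sqrt(lam_t * beta / u)                # solves kappa'_{S(t)}(theta) = u
    kappa = lam_t * (beta / (beta - theta) - 1.0)           # cumulant-generating function at theta
    lam_twisted = lam_t * beta / (beta - theta)             # twisted Poisson intensity
    z = np.empty(n)
    for i in range(n):
        n_claims = rng.poisson(lam_twisted)
        s = rng.exponential(1.0 / (beta - theta), size=n_claims).sum()   # twisted claims
        z[i] = (s > u) * np.exp(-theta * s + kappa)
    return z.mean(), 1.96 * z.std(ddof=1) / np.sqrt(n)

# toy numbers: E[S(t)] = 10, estimate the far tail probability P(S(t) > 40)
print(twisted_compound_poisson_tail(u=40.0))
```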

For heavy-tailed claim sizes, exponential twisting is not applicable because the moment-generating function $\mathbb{E}\big(e^{rX}\big)$ does not exist for any r > 0. For such cases it is popular to twist the hazard rate $h_X(x) = f_X(x)/\overline{F}_X(x)$ instead (further refinements are so-called asymmetric hazard rate twisting and delayed hazard rate twisting, see Juneja and Shahabuddin [472]).

9.2.3 Control Variates

Assume that a random variable Y is available for which $\mathbb{E}(Y)$ is known exactly and which is positively correlated with X. Then the deviation of the simulated values of Y from the exact value of $\mathbb{E}(Y)$ may be used to correct the estimate for X. The identity $\mathbb{E}(X) = \mathbb{E}(X - Y) + \mathbb{E}(Y)$ indeed suggests the control variate MC estimator

$$\hat\mu_n^{CV} = \frac{1}{n}\sum_{i=1}^{n}\big(X_i - Y_i\big) + \mathbb{E}(Y),$$

where (Xi, Yi) are independent copies of (X, Y). One immediately gets

$$\mathrm{Var}\big(\hat\mu_n^{CV}\big) = \frac{1}{n}\Big(\mathrm{Var}(X) - 2\,\mathrm{Cov}(X,Y) + \mathrm{Var}(Y)\Big),$$

so that using X − Y instead of X in the crude MC estimator will reduce the resulting variance whenever 2 Cov(X, Y) − Var(Y) ≥ 0. In particular, the closer Y is to X, the more variance of crude MC can be eliminated. If a suitable Y can be identified and the additional computation time for simulating Y is not excessive, this variance reduction technique can perform remarkably well.

Note that if Y is a control variate, so is aY for any constant a > 0, and this suggests choosing a in the most favorable way, namely such that

$$\mathrm{Var}(X - aY) = \mathrm{Var}(X) - 2a\,\mathrm{Cov}(X,Y) + a^2\,\mathrm{Var}(Y)$$

is minimized. This leads to the optimal constant

$$a^* = \frac{\mathrm{Cov}(X,Y)}{\mathrm{Var}(Y)},$$

and a resulting overall variance reduction of $\mathrm{Cov}^2(X, Y)/(n\,\mathrm{Var}(Y))$. The corresponding relative variance reduction then amounts to $\rho_{X,Y}^2$, where $\rho_{X,Y}$ is the correlation coefficient. In most cases, Cov(X, Y) and Var(Y) will not be known explicitly, so typically these values will be estimated by simulation as well (in a smaller pre-run simulation or in parallel).
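
A minimal sketch of this technique for a stop-loss type quantity (the compound Poisson model, the choice of the aggregate sum S as control variate with known mean, and all parameter values are assumptions for illustration; the optimal coefficient is here estimated from the same replications for simplicity):

```python
import numpy as np

rng = np.random.default_rng(seed=8)

lam_t, beta, ret, n = 10.0, 1.0, 15.0, 100_000

# simulate n replications of the aggregate sum S and the stop-loss payout (S - ret)_+
counts = rng.poisson(lam_t, size=n)
s = np.array([rng.exponential(1.0 / beta, size=k).sum() for k in counts])
x = np.maximum(s - ret, 0.0)        # quantity of interest: X = (S - ret)_+
y = s                               # control variate with known mean E(Y) = lam_t / beta
ey = lam_t / beta

a_star = np.cov(x, y)[0, 1] / np.var(y, ddof=1)     # estimated optimal coefficient a*
cv_estimate = np.mean(x - a_star * (y - ey))        # control variate estimator
print(np.mean(x), cv_estimate)                      # crude MC vs. control variate estimate
print(np.var(x, ddof=1), np.var(x - a_star * (y - ey), ddof=1))   # per-replication variances
```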

If a further control variate Z with known $\mathbb{E}(Z)$ is available, one can iteratively apply the same procedure again and use

$$\frac{1}{n}\sum_{i=1}^{n}\big(X_i - a^*\,Y_i - b^*\,Z_i\big) + a^*\,\mathbb{E}(Y) + b^*\,\mathbb{E}(Z), \qquad b^* = \frac{\mathrm{Cov}(X - a^*Y,\, Z)}{\mathrm{Var}(Z)},$$

which will lead to a further reduced variance if

$$\mathrm{Cov}\big(X - a^*\,Y,\; Z\big) \neq 0.$$

For further details on the above methods and on other variance reduction techniques such as antithetic variables and stratified sampling, see, for example, Asmussen and Glynn [59].

9.3 Quasi‐Monte Carlo Techniques

The simulation of one replicate of $S = \sum_{i=1}^{k} X_i$ in fact involves k random variables (or even more if $X_i$ is generated with the help of several pseudorandom numbers). If the number of summands is itself random (as for $S(t) = \sum_{i=1}^{N(t)} X_i$), then, for each replicate, N(t) needs to be simulated first and the outcome determines the resulting number of pseudorandom numbers needed for this replicate. If s such pseudorandom numbers are needed, one can interpret this as each replication needing an s-dimensional pseudorandom number, that is, (after suitable transformation) one uses a uniformly distributed point in the s-dimensional unit cube $[0, 1]^s$. One advantage of MC methods (and partly the reason why they are so popular) is that the error bound does not depend on s. However, the error bound is (by construction) only probabilistic and the convergence rate of $O(1/\sqrt{n})$ is not overly fast.

To improve the performance of the MC method, an alternative to reducing the variance is to improve the convergence speed of the method. This can be achieved by replacing the pseudorandom numbers with deterministic point sequences in $[0, 1]^s$ which imitate the properties of the uniform distribution in this unit cube well. The construction of such point sequences with good distribution properties is a classical topic in mathematics, and nowadays there is a plethora of available algorithms to quickly generate such quasi-Monte Carlo (QMC) sequences $(x_j)_{j=1,\dots,n}$ in $[0, 1]^s$; see Dick and Pillichshammer [288] for a recent survey.

One way to measure the distribution properties of such a deterministic sequence is the star discrepancy

$$D_n^*\big(x_1,\dots,x_n\big) \;=\; \sup_{y \in [0,1]^s} \left|\, \frac{1}{n}\sum_{j=1}^{n} \mathbb{1}_{[0,y)}(x_j) \;-\; \prod_{i=1}^{s} y_i \,\right|,$$

where $[0, y) = [0, y_1) \times \dots \times [0, y_s)$. That is, one considers the worst-case deviation of the empirical fraction of points in $[0,y)$ from the theoretical fraction $\prod_{i=1}^{s} y_i$ (the volume of $[0,y)$) over all intervals $[0, y) \subseteq [0, 1]^s$. Correspondingly, $(x_j)_{j=1,2,\dots}$ is called uniformly distributed if $D_n^*(x_1,\dots,x_n) \to 0$ as $n \to \infty$. If such a sequence is now used for approximating $\int_{[0,1]^s} f(u)\,du$ by $\frac{1}{n}\sum_{j=1}^{n} f(x_j)$ (which in our context represents the expected value of a random variable), then an upper bound for the approximation error is given by the Koksma–Hlawka inequality

$$\left|\, \frac{1}{n}\sum_{j=1}^{n} f(x_j) \;-\; \int_{[0,1]^s} f(u)\,du \,\right| \;\le\; V(f)\; D_n^*\big(x_1,\dots,x_n\big),$$

where V(f) is the variation of the function f (in the sense of Hardy and Krause). Hence, one can split the error into a term that only depends on the integrand and a second term that only depends on the quality of the used point sequence. The star discrepancy of the best known sequences has an asymptotic order of $O\big((\log n)^s / n\big)$, so that for not too large values of s the convergence rate of QMC integration can considerably outperform MC. The actual performance can then even be better than this upper bound, but it also depends on the constant involved in the O term for the concrete QMC sequence. Typically, it can be a significant advantage to use QMC sequences for up to 20 or 30 dimensions.
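
A minimal sketch comparing plain MC integration with a QMC rule (the hand-coded Halton sequence below is just one simple example of a low-discrepancy sequence, and the integrand and dimension are assumptions for illustration):

```python
import numpy as np

def halton(n, s):
    """First n points of the s-dimensional Halton sequence (bases = first s primes)."""
    primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29][:s]
    points = np.empty((n, s))
    for dim, base in enumerate(primes):
        for j in range(1, n + 1):              # radical inverse of j in the given base
            f, x, k = 1.0, 0.0, j
            while k > 0:
                f /= base
                x += f * (k % base)
                k //= base
            points[j - 1, dim] = x
    return points

def integrand(u):
    """Example integrand on [0,1]^s with known integral 1: product of (pi/2) sin(pi u_i)."""
    return np.prod(0.5 * np.pi * np.sin(np.pi * u), axis=1)

n, s = 4096, 5
rng = np.random.default_rng(seed=9)
print(abs(integrand(rng.uniform(size=(n, s))).mean() - 1.0))   # MC error
print(abs(integrand(halton(n, s)).mean() - 1.0))               # QMC error (typically smaller)
```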

Since the first dimensions of a QMC sequence typically have better distribution properties (the quality of the projections tends to deteriorate with increasing dimension), one strategy to increase the efficiency of QMC algorithms is to use the low dimensions of the sequence for the generation of realizations of those random variables that are particularly important for the outcome (like N(t) in the simulation of $S(t) = \sum_{i=1}^{N(t)} X_i$). One way to formalize this is by means of a refined version of the Koksma–Hlawka inequality:

$$\left|\, \frac{1}{n}\sum_{j=1}^{n} f(x_j) \;-\; \int_{[0,1]^s} f(u)\,du \,\right| \;\le\; \sum_{l=0}^{s-1} \sum_{F_l} V^{(s-l)}\big(f; F_l\big)\; D_n^*\big(x_1^{F_l},\dots,x_n^{F_l}\big),$$

where the inner sum ranges over the (s − l)-dimensional faces $F_l$ of the unit cube, $x_j^{F_l}$ are the projections of the points onto those faces (with the remaining coordinates set to 1), and $V^{(s-l)}(f; F_l)$ are the respective lower-dimensional variations of the function f. Hence, one can try to keep the overall error bound low by assigning the “good” dimensions (with lower discrepancy) of the sequence to those variables that have a higher (low-dimensional) variation and hence contribute most to the overall variation of f. In particular, it is often observed that only some dimensions of the function f are really crucial for the total variability (leading to the notion of the effective dimension of the integrand, cf. Wang and Fang [774]).
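
As a sketch of this strategy (the model, the truncation of the claim number, and the use of scipy's stats.qmc module for a scrambled Sobol sequence are assumptions for illustration), one can drive the important variable N(t) by the first coordinate of each QMC point and the individual claims by the remaining coordinates:

```python
import numpy as np
from scipy.stats import qmc, poisson

lam_t, beta, retention = 10.0, 1.0, 15.0   # compound Poisson with exponential(beta) claims
s = 32                                     # dimension: 1 for N(t), s-1 for the (truncated) claims
n = 2**14

points = qmc.Sobol(d=s, scramble=True, seed=10).random(n)   # scrambled Sobol points in [0,1]^s

# first (best) coordinate -> claim number via the Poisson quantile function
u0 = np.clip(points[:, 0], 1e-12, 1 - 1e-12)        # guard against exact 0 or 1
counts = poisson.ppf(u0, mu=lam_t).astype(int)
counts = np.minimum(counts, s - 1)                  # truncation so that s-1 coordinates suffice

# remaining coordinates -> exponential claims via inversion
claims = -np.log(1.0 - points[:, 1:]) / beta
mask = np.arange(s - 1) < counts[:, None]           # use only the first N(t) claims per point
agg = (claims * mask).sum(axis=1)

print(np.maximum(agg - retention, 0.0).mean())      # QMC estimate of E[(S(t) - retention)_+]
```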

9.4 Notes and Bibliography

Excellent surveys on stochastic simulation techniques are Asmussen and Glynn [59], Korn et al. [497], and Glasserman [390]. See also Asmussen and Albrecher [57, Ch. XV]. The first logarithmically efficient rare event simulation algorithm can be found in Asmussen and Binswanger [58], and the Asmussen–Kroese estimator was proposed in [64]. For the study of rare event simulation algorithms for stop-loss premiums, see also Hartinger and Kortschak [422] and Asmussen and Kortschak [62, 63] for further performance improvements. An important contribution for rare event sampling is Blanchet and Glynn [141].

Classical texts on the theory of QMC methods are Niederreiter [590] and Drmota and Tichy [309]. For more recent accounts see, for example, Dick and Pillichshammer [288] and Lemieux [538]. Since, by its deterministic nature, the QMC method does not provide a confidence interval for the obtained estimate, it is common to sequentially use randomized QMC sequences (e.g., choosing different starting values) and then provide a statistical error estimate, as for MC (e.g., see L’Ecuyer and Lemieux [553]). QMC techniques have been used in various application areas in insurance (for classical risk models see Albrecher and Kainhofer [24] and Preischl et al. [629], and for an application to CAT bond pricing see Albrecher et al. [22]). One further advantage of QMC methods is that, due to the deterministic construction, an entire simulation run can easily be replicated, whereas this can be more tricky for an MC implementation.

In recent years there has also been a lot of research activity on simulation techniques for dependent risks (see Mai and Scherer [558] for a survey). QMC techniques for dependent scenarios are now being studied in more detail (see Cambou et al. [181] and Preischl [628] for some first contributions).

Whenever a model admits an explicit expression (or an explicit approximation) in terms of the parameters, model sensitivities and respective tuning can be carried out very efficiently, and such an approach is then preferable to simulation, since the latter only gives one number and changing parameters entails an entirely new simulation exercise. However, when it comes to aggregating the entire portfolio (which, for instance, is needed for the determination of the solvency capital of a company), the number and type of involved risks will be too complex to allow for explicit expressions, and simulation is the essential tool to assess the resulting profit and loss distribution, and particularly its tail. For sensitivity analysis in connection with simulation techniques in semi-explicit situations see, for example, Glasserman [390]. It is also conceptually easy to add particular additional scenarios in a simulation approach (see Mack [555] for early suggestions in that direction). The regulator in fact often asks to store all the simulated values for potential control purposes, which in view of the enormous amount of data can be a challenge. In this connection, Arbenz and Guevara [45] recently proposed a data compression technique that allows such empirical c.d.f.s to be stored much more efficiently while keeping the implemented risk measures reproducible.
