9 Simulation

In many practical situations, particularly in reinsurance, the model assumptions or the number of model ingredients and their interactions are too complex to allow for an explicit calculation (or direct approximation) of quantities of interest. Many such quantities can, however, be expressed as expectations of a random variable X, for which the distribution is specified through the model assumptions (albeit typically as a complicated interaction of other random variables, so that a direct calculation is impossible). For such situations, one can generate independent samples of that random variable and estimate the expectation by the arithmetic mean of those sample values. This is the core idea of Monte Carlo (MC) simulation. In view of its importance in reinsurance applications, we will discuss some of the main ideas in this chapter. For links to more detailed surveys on the topic see the Notes at the end of the chapter.

9.1 The Monte Carlo Method

Assume that for some random variable X we want to estimate $\mathbb{E}(X)$, and we know that this value is finite. If we are able to generate n independent sample values $X_i$, then the strong law of large numbers guarantees that, with probability 1,

$$\hat\mu_n := \frac{1}{n}\sum_{i=1}^{n} X_i \;\longrightarrow\; \mathbb{E}(X) \qquad (n \to \infty).$$

The estimator $\hat\mu_n$ is unbiased and strongly consistent. If also $\sigma^2 := \mathrm{Var}(X) < \infty$, then by the Central Limit Theorem, in distribution,

$$\sqrt{n}\;\frac{\hat\mu_n - \mathbb{E}(X)}{\sigma} \;\longrightarrow\; \mathcal{N}(0,1).$$

That is, for large n the sample mean has the approximate distribution

$$\hat\mu_n \;\approx\; \mathcal{N}\!\left(\mathbb{E}(X),\, \frac{\sigma^2}{n}\right),$$

respectively

$$\hat\mu_n - \mathbb{E}(X) \;\approx\; \mathcal{N}\!\left(0,\, \frac{\sigma^2}{n}\right).$$

This shows that the convergence rate of the Monte Carlo method is of the order $O(1/\sqrt{n})$. Clearly, the error bounds for the Monte Carlo estimator will by construction always be probabilistic (i.e., not certain). From Φ(1.96) = 0.975 for the standard normal c.d.f. one then obtains the approximate 95% confidence interval for $\mathbb{E}(X)$:

$$\left[\hat\mu_n - 1.96\,\frac{\sigma}{\sqrt{n}},\;\; \hat\mu_n + 1.96\,\frac{\sigma}{\sqrt{n}}\right], \tag{9.1.1}$$

and this confidence interval is the usual way Monte Carlo estimates are reported. In order to state this interval, one therefore needs to estimate the variance σ², and this is done by the (unbiased) sample variance of the generated replications:

$$\hat\sigma_n^2 = \frac{1}{n-1}\sum_{i=1}^{n}\big(X_i - \hat\mu_n\big)^2.$$
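
As a minimal illustration of these formulas (the lognormal distribution of X and all parameter values are assumptions for the example), the following sketch in Python computes the crude MC estimate together with its approximate 95% confidence interval:

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def crude_mc(sample_x, n=100_000):
    """Crude Monte Carlo estimate of E(X) with an approximate 95% confidence interval."""
    x = sample_x(n)                      # n independent replications X_1, ..., X_n
    mu_hat = x.mean()                    # sample mean
    sigma_hat = x.std(ddof=1)            # unbiased sample standard deviation
    half_width = 1.96 * sigma_hat / np.sqrt(n)
    return mu_hat, (mu_hat - half_width, mu_hat + half_width)

# example: X lognormal (an assumed toy claim-size distribution)
est, ci = crude_mc(lambda n: rng.lognormal(mean=0.0, sigma=1.0, size=n))
print(est, ci)   # true value is exp(0.5) ≈ 1.6487
```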

Note that adding one decimal place of precision requires 100 times as many replications, so a high accuracy of the estimate requires very large sample sizes, which can be very time-consuming. An alternative to reduce the size of the confidence interval is to reduce the variance of the estimator, which is sometimes possible with smart ideas and workarounds. We will discuss such possibilities in Section 9.2.

In many cases relevant for reinsurance purposes, one in fact wants to estimate the probability of a certain event A (e.g., the event that the aggregate loss exceeds some threshold). However, such a probability can also be expressed as the expectation of the indicator function of the event A, that is, $\mathbb{P}(A) = \mathbb{E}\big(\mathbb{1}_A\big)$. The corresponding Monte Carlo estimate for ℙ(A) is then simply the relative frequency of the occurrence of A among n independent experiments:

$$\widehat{\mathbb{P}(A)} = \frac{1}{n}\sum_{i=1}^{n} \mathbb{1}_{\{A \text{ occurs in experiment } i\}}.$$

Since the variance of a Bernoulli random variable with success probability $p$ is given by $p(1-p)$, the sample variance is estimated by $\widehat{\mathbb{P}(A)}\,\big(1-\widehat{\mathbb{P}(A)}\big)$ and we then obtain the approximate 95% confidence interval for ℙ(A) in the form

$$\left[\,\widehat{\mathbb{P}(A)} - 1.96\sqrt{\frac{\widehat{\mathbb{P}(A)}\,\big(1-\widehat{\mathbb{P}(A)}\big)}{n}}\,,\;\; \widehat{\mathbb{P}(A)} + 1.96\sqrt{\frac{\widehat{\mathbb{P}(A)}\,\big(1-\widehat{\mathbb{P}(A)}\big)}{n}}\,\right].$$
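
For instance, the probability that an aggregate claim amount exceeds a threshold can be estimated in this way; the following sketch (a compound Poisson sum with exponential claims and all parameter values being assumptions for illustration) reports the relative frequency together with the above confidence interval:

```python
import numpy as np

rng = np.random.default_rng(seed=2)

def exceedance_probability(lam_t, claim_sampler, u, n=200_000):
    """Estimate P(S > u) for S = X_1 + ... + X_N with N ~ Poisson(lam_t)."""
    hits = 0
    for _ in range(n):
        n_claims = rng.poisson(lam_t)
        s = claim_sampler(n_claims).sum()
        hits += (s > u)
    p_hat = hits / n
    half_width = 1.96 * np.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, (p_hat - half_width, p_hat + half_width)

# toy example: Poisson(10) claim counts, exponential claims with mean 1, threshold 25
print(exceedance_probability(10, lambda k: rng.exponential(1.0, size=k), u=25))
```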

For the implementation of the Monte Carlo method, one needs to produce random numbers through a deterministic algorithm (“pseudorandom numbers”) which imitates randomness to such a degree that the result is practically indistinguishable from truly random numbers, that is, the respective statistical tests for randomness are passed. Since $F_X(X)$ is uniform(0,1) distributed, the random number $F_X^{-1}(U)$ for a uniform(0,1) random number $U$ has distribution $F_X$, and so it typically suffices to focus on the generation of uniform pseudorandom numbers (even if in many circumstances there are more efficient algorithms than this inversion method available to generate a random sample with distribution $F_X$, particularly when the inverse function $F_X^{-1}$ is cumbersome to work with). Since for the generation of one sample value of X one may in fact need many pseudorandom numbers (either X can be generated more efficiently by combining several variables or, more typically, X itself depends on many further random variables which all need to be generated for one realization of X), it is also important that there are no significant deterministic patterns in a produced pseudorandom sequence if one looks at certain blocks of numbers. Developing pseudorandom number generators and studying their properties is a classical topic of mathematical research, and nowadays very efficient and fast pseudorandom number generators are available, so that we can generally take it for granted that the generation of “sufficiently” random and independent samples is possible (see Korn et al. [497] for an overview). The computational bottleneck in this context is hence typically not the generation of the pseudorandom sequence, but the evaluation of the functions that need to be applied to these numbers.
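
A minimal sketch of the inversion method (with a Pareto claim-size distribution chosen as an assumption for illustration, since its inverse c.d.f. is available in closed form):

```python
import numpy as np

rng = np.random.default_rng(seed=3)

def pareto_inverse_cdf(u, alpha, x_min):
    """Inverse of F(x) = 1 - (x_min / x)**alpha for x >= x_min."""
    return x_min * (1.0 - u) ** (-1.0 / alpha)

# inversion method: transform uniform(0,1) pseudorandom numbers
u = rng.uniform(size=100_000)
x = pareto_inverse_cdf(u, alpha=2.5, x_min=1.0)
print(x.mean())   # theoretical mean alpha * x_min / (alpha - 1) = 5/3
```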

Another interpretation of the MC method is as follows: if the distribution of X (e.g., as a function of many other, possibly dependent, random variables) is not available explicitly, one can simulate n realizations X1, …, Xn and build up an empirical c.d.f. $\hat F_n$ from these n points by assigning a weight of 1/n to each (cf. (4.2.7)). Using $\hat F_n$ as an approximation of the true $F_X$ in calculations (e.g., for quantiles, expectations of functions of X, etc.) exactly corresponds to the MC estimator of the respective quantity.
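
As a small sketch of this viewpoint (the construction of X and the quantile level are assumptions for illustration), quantiles and c.d.f. values of X can simply be read off the simulated sample:

```python
import numpy as np

rng = np.random.default_rng(seed=4)

# X defined through an (here: artificial) interaction of other random variables
x = rng.exponential(1.0, size=(100_000, 5)).max(axis=1)   # maximum of 5 exponential risks

# empirical c.d.f. evaluated at a point, and an empirical quantile
print(np.mean(x <= 3.0))          # \hat{F}_n(3.0)
print(np.quantile(x, 0.995))      # empirical 99.5% quantile of X
```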

9.2 Variance Reduction Techniques

If Var(X) (respectively its estimate) in the confidence interval (9.1.1) is large, it can take a large number of replications to arrive at a satisfactory confidence interval. If we can replace the original estimator by another one with the same expectation, but smaller variance, then we can achieve an efficiency gain.

In the following we briefly discuss three popular variants of variance reduction.

9.2.1 Conditional Monte Carlo

Let Y be another random variable such that $\mathbb{E}(X \mid Y)$ can be calculated explicitly. Then

$$Y_1 := \mathbb{E}(X \mid Y)$$

is an unbiased estimator of $\mathbb{E}(X)$, and from

$$\mathrm{Var}(X) = \mathbb{E}\big(\mathrm{Var}(X \mid Y)\big) + \mathrm{Var}\big(\mathbb{E}(X \mid Y)\big)$$

it follows that $\mathrm{Var}(Y_1) \le \mathrm{Var}(X)$. Consequently, this always leads to a reduction of variance, but it will not always be easy to find an appropriate Y to serve the purpose.

We illustrate this approach with an impressive example for the simulation of tail probabilities of compound sums of heavy‐tailed risks:
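
A prominent instance in this setting is the Asmussen–Kroese estimator mentioned in the Notes at the end of the chapter: for $S_k = X_1 + \dots + X_k$ with i.i.d. claims having a continuous c.d.f. F, exchangeability gives $\mathbb{P}(S_k > u) = k\,\mathbb{P}(S_k > u,\ X_k = \max_i X_i)$, and conditioning on $X_1,\dots,X_{k-1}$ yields the unbiased estimator $k\,\overline{F}\big(\max(M_{k-1},\, u - S_{k-1})\big)$ with $M_{k-1} = \max_{i \le k-1} X_i$. The following sketch (Pareto claims and all parameter values are assumptions for illustration) implements this:

```python
import numpy as np

rng = np.random.default_rng(seed=5)

def pareto_sf(x, alpha=2.0, x_min=1.0):
    """Survival function 1 - F(x) of a Pareto distribution."""
    return np.where(x <= x_min, 1.0, (x_min / x) ** alpha)

def asmussen_kroese(u, k=10, n=100_000, alpha=2.0, x_min=1.0):
    """Conditional MC (Asmussen-Kroese) estimate of P(X_1 + ... + X_k > u)."""
    x = x_min * rng.uniform(size=(n, k - 1)) ** (-1.0 / alpha)   # k-1 Pareto claims via inversion
    s = x.sum(axis=1)                                            # S_{k-1}
    m = x.max(axis=1)                                            # M_{k-1}
    z = k * pareto_sf(np.maximum(m, u - s), alpha, x_min)        # conditional estimator
    return z.mean(), 1.96 * z.std(ddof=1) / np.sqrt(n)

print(asmussen_kroese(u=100.0))   # estimate and CI half-width for P(S_10 > 100)
```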

9.2.2 Importance Sampling

One of the problems of crude MC estimation is that many generated sample points will not really be in the relevant region for the quantity to be approximated. For instance, in Example 9.1 for the estimation of the quantile, for very small or very large α, there will not be enough sample points close to the quantile (it is inefficient to determine the exact value of a replication if it is in any case far to one side of the quantile; only the fact that it lands on that side is relevant), and this adds to the variance of the estimator.

Assume again that $F_X$ allows for a density $f_X$. The idea of importance sampling is now to switch from $f_X$ to another density $\tilde f$ that concentrates more strongly on the region of interest (in Example 9.1 this would mean that $\tilde f$ has a lot of probability mass around the suspected value of Q(α)). Such a new density $\tilde f$ can be obtained from $f_X$ by shifting, rescaling, twisting etc. The quantity $L(x) := f_X(x)/\tilde f(x)$ is called the likelihood ratio function. We then have

$$\mathbb{E}(X) = \int x\, f_X(x)\,dx = \int x\, \frac{f_X(x)}{\tilde f(x)}\, \tilde f(x)\,dx = \widetilde{\mathbb{E}}\big(\widetilde X\, L(\widetilde X)\big),$$

where $\widetilde X$ is a random variable with density $\tilde f$ and $\widetilde{\mathbb{E}}$ denotes the expectation with respect to that density.

We instead simulate n independent replicates $\widetilde X_1,\dots,\widetilde X_n$ from the new random variable $\widetilde X$ (with density $\tilde f$) and use the importance sampling estimator

$$\hat\mu_n^{IS} = \frac{1}{n}\sum_{i=1}^{n} \widetilde X_i\, L\big(\widetilde X_i\big).$$

This represents a weighted MC estimator for $\mathbb{E}(X)$ with weights according to the likelihood ratio function, in order to “correct” for using the new density $\tilde f$. The variance of this estimator is

$$\mathrm{Var}\big(\hat\mu_n^{IS}\big) = \frac{1}{n}\left(\int x^2\, L(x)\, f_X(x)\,dx \;-\; \big(\mathbb{E}(X)\big)^2\right),$$

so that we achieve a variance reduction whenever $\int x^2 L(x) f_X(x)\,dx \le \int x^2 f_X(x)\,dx$, that is, whenever $\mathbb{E}\big(X^2 L(X)\big) \le \mathbb{E}\big(X^2\big)$. Correspondingly, importance sampling will be most efficient if, heuristically,

  • $\tilde f(x)$ is large whenever $x^2 f_X(x)$ is large,
  • $\tilde f(x)$ is small whenever $x^2 f_X(x)$ is small.

Furthermore, the likelihood ratio $L(x)$ should be easy to evaluate and $\widetilde X$ should be easy to simulate. The following example illustrates the efficiency gain.
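
As a minimal sketch of such an efficiency gain, consider exponential twisting, that is, taking $\tilde f(x) \propto e^{\theta x} f_X(x)$, so that $L(x) = e^{-\theta x + \kappa_X(\theta)}$ with the cumulant-generating function $\kappa_X(\theta) = \log \mathbb{E}(e^{\theta X})$. The concrete choices below (a standard normal X, for which the twisted density is again normal with mean θ, and the tilting θ = u) are assumptions for illustration only:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(seed=6)

def tail_prob_twisted(u, n=100_000):
    """Estimate P(X > u) for X ~ N(0,1) by exponential twisting with theta = u.
    The twisted density exp(theta*x)*phi(x)/M(theta) is the N(theta, 1) density."""
    theta = u
    x_tilde = rng.normal(loc=theta, scale=1.0, size=n)      # samples from the twisted density
    lr = np.exp(-theta * x_tilde + 0.5 * theta**2)          # likelihood ratio e^{-theta x + kappa(theta)}
    z = (x_tilde > u) * lr                                   # weighted indicator of the rare event
    return z.mean(), 1.96 * z.std(ddof=1) / np.sqrt(n)

u = 4.0
print(tail_prob_twisted(u))     # estimate and CI half-width
print(norm.sf(u))               # exact value P(X > 4) ≈ 3.17e-5, out of reach for crude MC
```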

Let us now turn to estimating the tail probability $\mathbb{P}(S(t) > u)$ of an aggregate sum $S(t) = \sum_{i=1}^{N(t)} X_i$ for large u and light-tailed claims $X_i$. Then one can apply the exponential twisting idea described in the above illustration to the entire random variable S(t) (which modifies the distribution of both N(t) and the $X_i$), to get the importance sampling estimate

$$\mathbb{1}_{\{S(t) > u\}}\; e^{-\theta\, S(t) + \kappa_{S(t)}(\theta)},$$

where $\kappa_{S(t)}(\theta) = \log \mathbb{E}\big(e^{\theta S(t)}\big)$ is the cumulant-generating function of S(t) and S(t) is now simulated under the exponentially twisted measure. We can now choose θ(u) (i.e., define the amount of tilting) in such a way that under the new measure we expect S(t) to be equal to the threshold value u, so that

$$\kappa_{S(t)}'\big(\theta(u)\big) = u.$$
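
A minimal sketch of this recipe under assumptions that keep everything explicit (a compound Poisson sum with exponential claims, for which $\kappa_{S(t)}(\theta) = \lambda t\,(M_X(\theta) - 1)$ and $\theta(u)$ solving $\kappa_{S(t)}'(\theta) = u$ is available in closed form; under the twisted measure, N(t) is Poisson with rate $\lambda t\, M_X(\theta)$ and the claims are again exponential, with rate $\beta - \theta$):

```python
import numpy as np

rng = np.random.default_rng(seed=7)

def twisted_compound_poisson_tail(u, lam_t=10.0, beta=1.0, n=50_000):
    """Importance sampling estimate of P(S(t) > u) for a compound Poisson sum
    with exponential(beta) claims, using exponential twisting with kappa'(theta) = u."""
    theta = beta - np.sqrt(lam_t * beta / u)                # solves kappa'_{S(t)}(theta) = u
    kappa = lam_t * (beta / (beta - theta) - 1.0)           # cumulant-generating function at theta
    lam_twisted = lam_t * beta / (beta - theta)             # twisted Poisson intensity
    z = np.empty(n)
    for i in range(n):
        n_claims = rng.poisson(lam_twisted)
        s = rng.exponential(1.0 / (beta - theta), size=n_claims).sum()   # twisted claims
        z[i] = (s > u) * np.exp(-theta * s + kappa)
    return z.mean(), 1.96 * z.std(ddof=1) / np.sqrt(n)

# toy numbers: E[S(t)] = 10, estimate the far tail probability P(S(t) > 40)
print(twisted_compound_poisson_tail(u=40.0))
```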

For heavy-tailed claim sizes, exponential twisting is not applicable because the moment-generating function $\mathbb{E}\big(e^{rX}\big)$ does not exist for any r > 0. For such cases it is popular to twist the hazard rate $h_X(x) = f_X(x)/\overline{F}_X(x)$ instead (further refinements are so-called asymmetric hazard rate twisting and delayed hazard rate twisting, see Juneja and Shahabuddin [472]).

9.2.3 Control Variates

Assume that a random variable Y is available for which $\mathbb{E}(Y)$ is known exactly and which is positively correlated with X. Then the deviation of the simulated values of Y from the exact value of $\mathbb{E}(Y)$ may be used to correct the estimate for X. The identity $\mathbb{E}(X) = \mathbb{E}(X - Y) + \mathbb{E}(Y)$ indeed suggests the control variate MC estimator

$$\hat\mu_n^{CV} = \frac{1}{n}\sum_{i=1}^{n}\big(X_i - Y_i\big) + \mathbb{E}(Y),$$

where (Xi, Yi) are independent copies of (X, Y). One immediately gets

$$\mathrm{Var}\big(\hat\mu_n^{CV}\big) = \frac{1}{n}\Big(\mathrm{Var}(X) - 2\,\mathrm{Cov}(X,Y) + \mathrm{Var}(Y)\Big),$$

so that using X − Y instead of X in the crude MC estimator will reduce the resulting variance whenever 2 Cov(X, Y) − Var(Y) ≥ 0. In particular, the closer Y is to X, the more variance of crude MC can be eliminated. If a suitable Y can be identified and the additional computation time for simulating Y is not excessive, this variance reduction technique can perform remarkably well.

Note that if Y is a control variate, so is aY for any constant a > 0, and this suggests choosing a in the most favorable way, namely such that

$$\mathrm{Var}(X - aY) = \mathrm{Var}(X) - 2a\,\mathrm{Cov}(X,Y) + a^2\,\mathrm{Var}(Y)$$

is minimized. This leads to the optimal constant

$$a^* = \frac{\mathrm{Cov}(X,Y)}{\mathrm{Var}(Y)},$$

and a resulting overall variance reduction of $\mathrm{Cov}^2(X, Y)/(n\,\mathrm{Var}(Y))$. The corresponding relative variance reduction then amounts to $\rho_{X,Y}^2$, where $\rho_{X,Y}$ is the correlation coefficient. In most cases, Cov(X, Y) and Var(Y) will not be known explicitly, so typically these values will be estimated by simulation as well (in a smaller pre-run simulation or in parallel).
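
A minimal sketch of this technique for a stop-loss type quantity (the compound Poisson model, the choice of the aggregate sum S as control variate with known mean, and all parameter values are assumptions for illustration; the optimal coefficient is here estimated from the same replications for simplicity):

```python
import numpy as np

rng = np.random.default_rng(seed=8)

lam_t, beta, ret, n = 10.0, 1.0, 15.0, 100_000

# simulate n replications of the aggregate sum S and the stop-loss payout (S - ret)_+
counts = rng.poisson(lam_t, size=n)
s = np.array([rng.exponential(1.0 / beta, size=k).sum() for k in counts])
x = np.maximum(s - ret, 0.0)        # quantity of interest: X = (S - ret)_+
y = s                               # control variate with known mean E(Y) = lam_t / beta
ey = lam_t / beta

a_star = np.cov(x, y)[0, 1] / np.var(y, ddof=1)     # estimated optimal coefficient a*
cv_estimate = np.mean(x - a_star * (y - ey))        # control variate estimator
print(np.mean(x), cv_estimate)                      # crude MC vs. control variate estimate
print(np.var(x, ddof=1), np.var(x - a_star * (y - ey), ddof=1))   # per-replication variances
```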

If a further control variate Z with known $\mathbb{E}(Z)$ is available, one can iteratively apply the same procedure again and use

$$\frac{1}{n}\sum_{i=1}^{n}\big(X_i - a^*\,Y_i - b^*\,Z_i\big) + a^*\,\mathbb{E}(Y) + b^*\,\mathbb{E}(Z), \qquad b^* = \frac{\mathrm{Cov}(X - a^*Y,\, Z)}{\mathrm{Var}(Z)},$$

which will lead to a further reduced variance if

$$\mathrm{Cov}\big(X - a^*\,Y,\; Z\big) \neq 0.$$

For further details on the above methods and on other variance reduction techniques such as antithetic variables and stratified sampling, see, for example, Asmussen and Glynn [59].

9.3 Quasi‐Monte Carlo Techniques

The simulation of one replicate of $S = \sum_{i=1}^{k} X_i$ in fact involves k random variables (or even more if $X_i$ is generated with the help of several pseudorandom numbers). If the number of summands is itself random (as for $S(t) = \sum_{i=1}^{N(t)} X_i$), then, for each replicate, N(t) needs to be simulated first and the outcome determines the resulting number of pseudorandom numbers needed for this replicate. If s such pseudorandom numbers are needed, one can interpret this as each replication needing an s-dimensional pseudorandom number, that is, (after suitable transformation) one uses a uniformly distributed point in the s-dimensional unit cube $[0, 1]^s$. One advantage of MC methods (and partly the reason why they are so popular) is that the error bound does not depend on s. However, the error bound is (by construction) only probabilistic and the convergence rate of $O(1/\sqrt{n})$ is not overly fast.

To improve the performance of the MC method, an alternative to reducing the variance is to improve the convergence speed of the method. This can be achieved by replacing the pseudorandom numbers with deterministic point sequences in $[0, 1]^s$ which imitate the properties of the uniform distribution in this unit cube well. The construction of such point sequences with good distribution properties is a classical topic in mathematics, and nowadays there is a plethora of available algorithms to quickly generate such quasi-Monte Carlo (QMC) sequences $(x_j)_{j=1,\dots,n}$ in $[0, 1]^s$; see Dick and Pillichshammer [288] for a recent survey.

One way to measure the distribution properties of such a deterministic sequence is the star discrepancy

$$D_n^*\big(x_1,\dots,x_n\big) \;=\; \sup_{y \in [0,1]^s} \left|\, \frac{1}{n}\sum_{j=1}^{n} \mathbb{1}_{[0,y)}(x_j) \;-\; \prod_{i=1}^{s} y_i \,\right|,$$

where $[0, y) = [0, y_1) \times \dots \times [0, y_s)$. That is, one considers the worst-case deviation of the empirical fraction of points in $[0,y)$ from the theoretical fraction $\prod_{i=1}^{s} y_i$ (the volume of $[0,y)$) over all intervals $[0, y) \subseteq [0, 1]^s$. Correspondingly, $(x_j)_{j=1,2,\dots}$ is called uniformly distributed if $D_n^*(x_1,\dots,x_n) \to 0$ as $n \to \infty$. If such a sequence is now used for approximating $\int_{[0,1]^s} f(u)\,du$ by $\frac{1}{n}\sum_{j=1}^{n} f(x_j)$ (which in our context represents the expected value of a random variable), then an upper bound for the approximation error is given by the Koksma–Hlawka inequality

$$\left|\, \frac{1}{n}\sum_{j=1}^{n} f(x_j) \;-\; \int_{[0,1]^s} f(u)\,du \,\right| \;\le\; V(f)\; D_n^*\big(x_1,\dots,x_n\big),$$

where V(f) is the variation of the function f (in the sense of Hardy and Krause). Hence, one can split the error into a term that only depends on the integrand and a second term that only depends on the quality of the used point sequence. The star discrepancy of the best known sequences has an asymptotic order of $O\big((\log n)^s / n\big)$, so that for not too large values of s the convergence rate of QMC integration can considerably outperform MC. The actual performance can then even be better than this upper bound, but it also depends on the constant involved in the O term for the concrete QMC sequence. Typically, it can be a significant advantage to use QMC sequences for up to 20 or 30 dimensions.
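
A minimal sketch comparing plain MC integration with a QMC rule (the hand-coded Halton sequence below is just one simple example of a low-discrepancy sequence, and the integrand and dimension are assumptions for illustration):

```python
import numpy as np

def halton(n, s):
    """First n points of the s-dimensional Halton sequence (bases = first s primes)."""
    primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29][:s]
    points = np.empty((n, s))
    for dim, base in enumerate(primes):
        for j in range(1, n + 1):              # radical inverse of j in the given base
            f, x, k = 1.0, 0.0, j
            while k > 0:
                f /= base
                x += f * (k % base)
                k //= base
            points[j - 1, dim] = x
    return points

def integrand(u):
    """Example integrand on [0,1]^s with known integral 1: product of (pi/2) sin(pi u_i)."""
    return np.prod(0.5 * np.pi * np.sin(np.pi * u), axis=1)

n, s = 4096, 5
rng = np.random.default_rng(seed=9)
print(abs(integrand(rng.uniform(size=(n, s))).mean() - 1.0))   # MC error
print(abs(integrand(halton(n, s)).mean() - 1.0))               # QMC error (typically smaller)
```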

Since the first dimensions of a QMC sequence typically have better distribution properties (the quality of the projections tends to deteriorate with increasing dimension), one strategy to increase the efficiency of QMC algorithms is to use the low dimensions of the sequence for the generation of realizations of those random variables that are particularly important for the outcome (like N(t) in the simulation of $S(t) = \sum_{i=1}^{N(t)} X_i$). One way to formalize this is by means of a refined version of the Koksma–Hlawka inequality:

$$\left|\, \frac{1}{n}\sum_{j=1}^{n} f(x_j) \;-\; \int_{[0,1]^s} f(u)\,du \,\right| \;\le\; \sum_{l=0}^{s-1} \sum_{F_l} V^{(s-l)}\big(f; F_l\big)\; D_n^*\big(x_1^{F_l},\dots,x_n^{F_l}\big),$$

where the inner sum ranges over the (s − l)-dimensional faces $F_l$ of the unit cube, $x_j^{F_l}$ are the projections of the points onto those faces (with the remaining coordinates set to 1), and $V^{(s-l)}(f; F_l)$ are the respective lower-dimensional variations of the function f. Hence, one can try to keep the overall error bound low by assigning the “good” dimensions (with lower discrepancy) of the sequence to those variables that have a higher (low-dimensional) variation and hence contribute most to the overall variation of f. In particular, it is often observed that only some dimensions of the function f are really crucial for the total variability (leading to the notion of the effective dimension of the integrand, cf. Wang and Fang [774]).
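
As a sketch of this strategy (the model, the truncation of the claim number, and the use of scipy's stats.qmc module for a scrambled Sobol sequence are assumptions for illustration), one can drive the important variable N(t) by the first coordinate of each QMC point and the individual claims by the remaining coordinates:

```python
import numpy as np
from scipy.stats import qmc, poisson

lam_t, beta, retention = 10.0, 1.0, 15.0   # compound Poisson with exponential(beta) claims
s = 32                                     # dimension: 1 for N(t), s-1 for the (truncated) claims
n = 2**14

points = qmc.Sobol(d=s, scramble=True, seed=10).random(n)   # scrambled Sobol points in [0,1]^s

# first (best) coordinate -> claim number via the Poisson quantile function
u0 = np.clip(points[:, 0], 1e-12, 1 - 1e-12)        # guard against exact 0 or 1
counts = poisson.ppf(u0, mu=lam_t).astype(int)
counts = np.minimum(counts, s - 1)                  # truncation so that s-1 coordinates suffice

# remaining coordinates -> exponential claims via inversion
claims = -np.log(1.0 - points[:, 1:]) / beta
mask = np.arange(s - 1) < counts[:, None]           # use only the first N(t) claims per point
agg = (claims * mask).sum(axis=1)

print(np.maximum(agg - retention, 0.0).mean())      # QMC estimate of E[(S(t) - retention)_+]
```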

9.4 Notes and Bibliography

Excellent surveys on stochastic simulation techniques are Asmussen and Glynn [59], Korn et al. [497], and Glasserman [390]. See also Asmussen and Albrecher [57, Ch. XV]. The first logarithmically efficient rare event simulation algorithm can be found in Asmussen and Binswanger [58], and the Asmussen–Kroese estimator was proposed in [64]. For the study of rare event simulation algorithms for stop-loss premiums, see also Hartinger and Kortschak [422] and Asmussen and Kortschak [62, 63] for further performance improvements. An important contribution for rare event sampling is Blanchet and Glynn [141].

Classical texts on the theory of QMC methods are Niederreiter [590] and Drmota and Tichy [309]. For more recent accounts see, for example, Dick and Pillichshammer [288] and Lemieux [538]. Since, by its deterministic nature, the QMC method does not provide a confidence interval for the obtained estimate, it is common to sequentially use randomized QMC sequences (e.g., choosing different starting values) and then provide a statistical error estimate, as for MC (e.g., see L’Ecuyer and Lemieux [553]). QMC techniques have been used in various application areas in insurance (for classical risk models see Albrecher and Kainhofer [24] and Preischl et al. [629], and for an application to CAT bond pricing see Albrecher et al. [22]). One further advantage of QMC methods is that, due to the deterministic construction, an entire simulation run can easily be replicated, whereas this can be more tricky for an MC implementation.

In recent years there has also been a lot of research activity on simulation techniques for dependent risks (see Mai and Scherer [558] for a survey). QMC techniques for dependent scenarios are now being studied in more detail (see Cambou et al. [181] and Preischl [628] for some first contributions).

Whenever a model admits an explicit expression (or an explicit approximation) in terms of the parameters, model sensitivities and respective tuning can be carried out very efficiently, and such an approach is then preferable to simulation, since the latter only gives one number and changing parameters entails an entirely new simulation exercise. However, when it comes to aggregating the entire portfolio (which, for instance, is needed for the determination of the solvency capital of a company), the number and type of involved risks will be too complex to allow for explicit expressions, and simulation is the essential tool to assess the resulting profit and loss distribution, and particularly its tail. For sensitivity analysis in connection with simulation techniques in semi-explicit situations see, for example, Glasserman [390]. It is also conceptually easy to add particular additional scenarios in a simulation approach (see Mack [555] for early suggestions in that direction). The regulator in fact often asks to store all the simulated values for potential control purposes, which in view of the enormous amount of data can be a challenge. In this connection, Arbenz and Guevara [45] recently proposed a data compression technique that allows such empirical c.d.f.s to be stored much more efficiently while keeping the implemented risk measures reproducible.
