4
Statistics for Claim Sizes

If there is any suspicion of heavy‐tailed distributions, then it is advisable that the actuary should make a number of different data plots. Modelling of large claims is quite an uncertain undertaking, and hence the more graphs considered the better in order to make a balanced conclusion.

As a baseline distribution one might depart from the exponential distribution and inspect for HTE tails. If the right tail of the distribution is obviously heavier than any exponential distribution, then Weibull, log‐normal or Pareto quantile plots offer potential improvements. Such a first step can be performed using different kinds of quantile plots (exponential, log‐normal, Weibull or Pareto) and their derivative plots.

After this large claims modelling using extreme value methodology comes into play. Here the maximum likelihood methodology applied to the peaks over threshold (POT) approach plays the central role. We also emphasize methods based on quantile plotting in order to allow for graphical validation of the models and results. We first discuss the classical case of independence and identically distributed data, followed by regression settings, censored and multivariate data. In reinsurance, the development of large claims can take several years. When evaluating a portfolio, not all the claims are fully developed and the indexed payments at the last available development year are an underestimation of the real final indexed payment. When historical incurred information per claim is available, this should assist in the estimation of the tails of the payment distribution.

It remains desirable to construct a distribution with an appropriate tail fit but which at the same time has enough parameters to fit also in the medium range. An early reference here is Albrecht [32], who pointed out that claim size data are often well described by a Pareto distribution for large claims, while the log‐normal distribution provides a good fit for medium‐sized claims. For a general review on the construction of mixture models with tail components, see Scarott and MacDonald [667]. Here we discuss the method of splicing different distributions in more detail, and in particular we propose combining a mixed Erlang distribution with a tail fit.

All of this material will be illustrated using the data sets introduced in Chapter 1. While the automobile liability data and the Dutch fire insurance data will be used throughout, we end the chapter by analysing the Austrian storm risk, European flood risk data, the Groningen earthquake data, and the Danish fire insurance case in order to illustrate statistical methods for tail estimation.

For a more general survey and statistical methods of extreme value theory see Embrechts et al. [329], Reiss et al. [645], Coles [221], and Beirlant et al. [100]. These references also contain more technical details that are omitted here.

4.1 Heavy or Light Tails: QQ‐ and Derivative Plots

As discussed in Section 3.4 the mean excess function offers a first tool to discriminate between HTE and LTE tails. In practice, based on a sample X1, X2, …, Xn, the mean excess function images can be naively estimated when replacing the expectation by its empirical counterpart:

images

where for any set A, 1A(Xi) equals 1 if XiA, and 0 otherwise. The value t is often taken equal to one of the data points, say the (k + 1)‐largest observation Xnk, n for some k = 1, 2, …, n − 1. We then obtain

(4.1.1) images

The mean excess values ek, n can be plotted as a function of the threshold xnk, n or as a function of the inverse rank k.

There is an interesting link between the values ek, n and exponential QQ‐plots. For an exponential distribution the quantile values images stand in linear relationship to the corresponding quantiles of the standard exponential distribution images:

images

Hence, when estimating images by the empirical quantiles Xj, n, we have that the exponential QQ‐plot, defined by

images

should exhibit a linear pattern which passes through the origin for the exponential model to be a plausible model. An estimator of the slope can then also be used as an estimator of 1/λ.

Now ek, n can be viewed as an estimate of the slope 1/λk of the exponential QQ‐plot to the right of an anchor point images, and hence (xnk, n, ek, n) or (k, ek, n) for k = 1, …, n, can be interpreted as a derivative plot of the exponential QQ‐plot. When fitting a regression line which passes through the anchor point using least squares regression minimizing

images

with respect to 1/λk, one indeed obtains

images

so that images using the approximation images, which is sharp even for small k.

Also, when the data come from a distribution with a tail heavier than exponential, the exponential QQ‐plot will ultimately be convex and ultimately upcross the fitted regression line for every k, so that the slopes ek, n will increase always with increasing Xnk, n (or decreasing k), while for a tail lighter than exponential, the QQ‐plot will ultimately be concave, ultimately appearing under the fitted regression line for every k, and the slopes will decrease with increasing Xnk, n (or decreasing k).

When modelling reinsurance claim data we expect convex exponential QQ‐plots linked with increasing mean excess plots (xnk, n, ek, n). A popular second step is to inspect log‐normal or Pareto QQ‐plots. Note that the mean excess plots of a Pareto‐type distribution ultimately will be linear increasing with slope 1/(α − 1), as follows from (3.4.12). Again, log‐normal, respectively Pareto, tail fits appear appropriate when the right upper end of the corresponding QQ‐plot is linear from some point on. It is advisable to accompany the QQ‐plot with the corresponding derivative plots.

  • Since log X is exponentially distributed with λ = α when X is strict Pareto(α) distributed, the Pareto QQ‐plot is defined as
    images
    with derivative plot
    images
    whereHk, n is the estimator of 1/α introduced by Hill [442]. Indeed, if the data come from a Pareto distribution, then the Pareto QQ‐plot is linear and the derivative plot is horizontal at the level 1/α.
  • The normal QQ‐plot based on the logarithms of the data provides the log‐normal QQ‐plot
    images
    where Φ−1 denotes the standard normal quantile function. The derivative plot is then given by
    images
    with
    images
    since, with φ denoting the standard normal density,
    images
  • The quantile function of the Weibull distribution is given by
    images
    so that for this model images. Again taking images (i = 1, …, n) and estimating Q(i/(n + 1)) by Xi, n leads to the definition of the Weibull QQ‐plot
    images
    The derivative plot is then given by
    images
    with
    images

Insurance claim data often exhibit different statistical behavior over various subsets of the outcome set which can be observed in mean excess plots, starting with components in the center of the data followed by a Pareto tail. Sometimes such Pareto tails then turn out to be upper‐truncated, as defined in Section 3.3.1.1.

Image described by caption.
Image described by caption.

Figure 4.1 Dutch fire insurance data: exponential QQ‐plot and mean excess plot (xnk, n, ek, n) (top); Pareto QQ‐plot and Hill plot images (second line); log‐normal QQ‐plot and derivative plot images (third line); Weibull QQ‐plot and derivative plot images (bottom). For each QQ‐plot the regression line through Xn−99, n, …, Xn, n is plotted.

Image described by caption.
Image described by caption.

Figure 4.2 MTPL data for Company A, ultimate data values at evaluation: exponential QQ‐plot and mean excess plot (xnk, n, ek, n) (top); Pareto QQ‐plot and Hill plot images (second line); log‐normal QQ‐plot and derivative plot images (third line); Weibull QQ‐plot and derivative plot images (bottom).

Image described by caption.
Image described by caption.

Figure 4.3 MTPL data for Company B, ultimate data values at evaluation: exponential QQ‐plot and mean excess plot (xnk, n, ek, n) (top); Pareto QQ‐plot and Hill plot images (second line); log‐normal QQ‐plot and derivative plot images (third line); Weibull QQ‐plot and derivative plot images (bottom).

4.2 Large Claims Modelling through Extreme Value Analysis

4.2.1 EVA for Pareto‐type Tails

In order to model large claims, Pareto tail modelling is probably the most common approach. Here we use the subset of models with tails heavier than exponential for which the EVI γ is positive, as discussed in Section 3.3.2.2, which in fact equals the set of Pareto‐type models that can be defined through tail functions 1 − F, quantile functions Q, or tail quantile functions U(x) = Q(1 − 1/x). Indeed

where γ = 1/α > 0 and U is a slowly varying function. Also (3.2.7) is equivalent to

for every u > 0. In this section we discuss the estimation of γ = 1/α, large quantiles Q(1 − p) = U(1/p), and small tail probabilities images. We assume in this and the next subsection that the data are independent and identically distributed (i.i.d.). Moreover mathematical approximations of variances (AVar), bias (ABias), mean squared error (AMSE), and distributions of estimators using k largest observations of the n data will hold when images and k/n → 0.

4.2.1.1 Estimating a Positive EVI

The most popular estimator for γ is given by the Hill estimator Hk, n defined in (4.1.2), see [442]. In the preceeding section this estimator was retrieved through regression on the Pareto QQ‐plot. Here, we also show how the maximum likelihood method based on the so‐called POT approach leads to the same estimation method in the Pareto‐type case.

  • In Section 4.1 the Hill estimator was motivated as an estimator of the slope of a linear Pareto QQ‐plot to the right of an anchor point images. In fact, this interpretation can be carried over to the general case of Pareto‐type distributions since then ultimately for images the Pareto QQ‐plot is still linear with slope γ for a small enough k or, equivalently, for a large enough Xnk, n. Indeed, under (4.2.3),
    images
    It can now be shown that for every slowly varying function
    images
    as images. Hence, whereas Pareto QQ‐plots are hardly ever completely linear, they are ultimately linear at some set of largest values. The speed at which the linearity sets in depends on the underlying slowly varying function. Like many publications, following Hall [419], we assume here that for some C, β > 0, and D a real constant. This can, however, be generalized to
    images
    as images with b essentially a power function or, more correctly, a regularly varying function with index − β, and images. Under (4.2.5) from which Hk, n follows taking x = (n + 1)/j (j = 1, …, k), estimating U((n + 1)/j) by Xnj+1, n (j = 1, …, k + 1), and taking the average of both sides of (4.2.6) over j = 1, …, k after deleting the last term on the right‐hand side. Omitting this final term (or, equivalently, assuming a strict Pareto distribution with constant slowly varying function) causes a bias which will be more important with smaller β. Adverse situations for the Hill estimator are logarithmic slowly varying functions U, as in case of the log‐gamma distribution. Such cases exhibit β = 0.
  • Alternatively, the Hill estimator is also a maximum likelihood estimator based on (4.2.4). Indeed, extreme value methodology proposes fitting the limiting Pareto distribution with distribution function 1 − x−1/γ to the POT values Y = X/t over a high threshold t conditionally on X > t. Note that the use of the mathematical limit in (4.2.4) to fit the exceedance data introduces an approximation error that leads to estimation bias. Let Nt denote the number of exceedances over t. Then the log‐likelihood equals
    images
    with
    images
    leading to the maximum likelihood estimator
    images
    Choosing an upper order statistic Xnk, n for the threshold t (so that Nt = k) we obtain Hk, n.
  • From Section 4.1 it also follows that the Hill statistic can be interpreted as an estimator of the mean excess function of the log‐transformed data, that is, images, with the threshold value t substituted by Xnk, n. As in (3.4.10) we here find
    images
    Estimating F(u) using the empirical distribution function
    (4.2.7)images
    with value images over the interval [Xnj, n, Xnj+1, n) we are led to the estimator Using summation by parts one observes that this final expression equals the Hill estimator: with
    images
    (with images).

To deduce approximate expressions for the variance and bias of the Hill estimator it is helpful to consider the preceding interpretation in terms of the scaled log‐spacings Zj. Thanks to the Rényi representation j(Enj+1, n − Enj, n) = d Ej (j = 1, …, n) concerning order statistics E1, n ≤ E2, n ≤ … ≤ En, n from a random sample E1, E2, …, En of n independent standard exponential random variables, we have in case of a strict Pareto distribution (i.e., with U constant), that

This representation is based on the memoryless property of the exponential distribution and the fact that nE1, n is standard exponentially distributed. From (4.2.9) and (4.2.10) we expect that, as images,

images

Concerning the bias due to the approximation error, we confine ourselves to the model (4.2.5). Then the theoretical analogue of the Hill estimator is given by

images

with images. Hence, the approximate mean squared error is given by

images

while in order to construct confidence bounds we have that, as images with k/n → 0,

Image described by caption.

Figure 4.4 Pareto QQ‐plot and Hill plot (k, Hk, n): Dutch fire insurance data with 95% confidence intervals for each k (top); MTPL data for Company A, ultimate values (middle); MTPL data for Company B, ultimate values (bottom).

4.2.1.2 Estimating Large Quantiles and Small Tail Probabilities

One of the most important applications of EVA is the estimation of extreme quantiles qp = Q(1 − p) with p small, also termed Value‐at‐Risk (VaR) in risk applications. Alternatively, the return period for a high claim amount x given by images is another measure describing extreme risks.

The estimation of a high quantile under Pareto‐type modelling can be performed by extrapolating along a fitted regression line on the Pareto QQ‐plot through the point images with slope Hk, n. Following (4.2.6) with x = 1/p, estimating U((n + 1)/(k + 1)) by Xnk, n and γ by Hk, n, and omitting the second term on the right‐hand side, that is, using

images

we arrive at the estimator

which was first proposed by Weissman [777]. The estimator images can also be retrieved from (4.2.3), leading to the approximation U(vx)/U(x) ≈ vγ for large values of x. Setting vx = 1/p, x = (n + 1)/(k + 1) so that v = (k + 1)/((n + 1)p), and estimating U((n + 1)/(k + 1)) by Xnk, n and γ by Hk, n, we obtain images again.

Estimation of return periods can be obtained using the inverse relationship on the Pareto QQ‐plot:

The expression for images can also be deduced from (4.2.4), leading to the approximation images for large values of t. Setting tu = x, t = Xnk, n so that u = x/Xnk, n, and estimating images by (k + 1)/(n + 1) we obtain images.

Approximate confidence bounds for such parameters have been derived based on asymptotic distributions of the estimators. In the case of the tail probability estimator we find with images

images

when images, k/n → 0, npx/k → τ ∈ [0, 1) and images, while with qp = Q(1 − p),

images

when images, k/n → 0, np/k → τ ∈ [0, 1) and images.

4.2.1.3 Bias Reduction

When constructing confidence intervals for risk measures such as px and qp, again the approximation of the underlying conditional distribution by the simple Pareto distribution entails a bias for all the existing estimators, next to the bias induced by estimating γ. One approach to reduce the bias is to construct estimators based on regression models of the values Zj. Indeed, under (4.2.5), using the approximation images and with the mean value theorem on xβ at the points j/(n + 1) and (j + 1)/(n + 1), the theoretical analogue of a Zj random variable can be approximated by

(4.2.14) images

An alternative representation, using 1 + u ≈ eu for small values of u, is then

images

The more accurate approximation

images

where Ej denotes a sequence of independent standard exponentially distributed random variables, was derived in an asymptotic sense in Beirlant et al. [97]. For each k, model (4.2.15) can be considered as a non‐linear regression model in which one can estimate the intercept γ, the slope bk, n, and the power β with the covariates j/(k + 1). One can estimate these parameters jointly, or by using an external estimate for β, or using external estimation for β and images on the regression model

Gomes et al. found that external estimation for B and β should be based on k1 extreme order statistics where k = o(k1) and images. Such an estimator for β was presented, for example, in Fraga Alves et al. [35]. Given an estimator images for β, an estimator for B was given in Gomes and Martins [399]:

images

When the three parameters are jointly estimated for each k, the asymptotic variance turns out to be γ2((1 + β)/β)4, which is to be compared with the asymptotic variance γ2 for the Hill estimator. Performing linear regression on images importing an external estimator for β, the asymptotic variance drops down to γ2((1 + β)/β)2. The original variance γ2 is retained when using the external estimators for B and β in (4.2.16).

Bias reduction of the extreme quantile estimator images should not be based solely on replacing Hk, n by a bias‐reduced estimator for γ. Here we use the fact that Xnk, n = dU(1/Uk+1, n), where Uk+1, n denotes the (k + 1)th smallest order statistic from a uniform (0,1) sample of size n. Then we obtain from (4.2.3) and (4.2.5) with x = 1/p, approximating Uk+1, n by its expected value (k + 1)/(n + 1), and using 1 + u ≈ eu for u small, that

images

so that a bias‐reduced version of images is given by

images

where images, and images are bias‐reduced estimators based on the regression model (4.2.15).

Bias‐reduced estimators can also be obtained by improving on the approximation (4.2.4) of the POT distribution images by the simple Pareto distribution, using an extension of the Pareto distribution as introduced in Beirlant et al. [102]. Indeed, when U satisfies (4.2.5), then

The distribution of the POT’s X/t (X > t) can then be approximated using the expansion (1 + u)b ≈ 1 + bu for u small:

images

This leads to the extended Pareto distribution (EPD) with distribution function

(images) with δ = δt = DCβ/γtβ/γ and τ = −β/γ. Note that for an EPD random variable Y with τ = −1 and images, it follows that Y − 1 is GPD distributed with parameters γ and σ.

Using the density of the EPD gγ, δ, τ(y) = γ−1y−1/γ−1{1 + δ(1 − yτ)}−1/γ−1[1 + δ{1 − (1 + τ)yτ}], maximum likelihood estimators are then derived through maximization of

images

with respect to γ, δ using an external estimator of τ through estimates of β and γ, where the values images denote the POT values over the threshold t.

Bias‐reduced estimation of return periods is then obtained using Xnk, n again as a threshold t:

images

4.2.1.4 Estimating the Scale Parameter

Finally, note that the scale parameter C in (4.2.5), or A = C1/γ in (4.2.17), can be estimated with

(4.2.20) images

which follows, for instance, from (4.2.17) replacing x by Xnk, n and estimating images by the empirical probability (k + 1)/(n + 1).

The estimator Ĉk,n can also be retrieved using least squares regression on the k top points of the log–log plot

images

minimizing

(4.2.21) images

Substituting Hk, n for γ and taking the derivative with respect to log C indeed gives

images

In Beirlant et al. [104] it is shown that Âk,n is asymptotically normally distributed with asymptotic variance images and asymptotic bias images.

A bias‐reduced estimator of the scale parameter A is then given by

images

where images is a bias‐reduced estimator of γ, and images estimators of bk, n and β.

Image described by caption.
Image described by caption.

Figure 4.5 Hill estimates and bias‐reduced versions, regression approach as a function of k (left) and log k (right), and EPD approach (middle) as a function of k: Dutch fire insurance data (top); MTPL data for Company A, ultimate values (middle); MTPL data for Company B, ultimate values (bottom).

Image described by caption.

Figure 4.6 Quantile estimates images and images (left) and log‐return periods images and images (right), as a function of k: Dutch fire insurance data (x = 200 million, top); MTPL data for Company A, ultimate values (x = 4 million, middle); MTPL data for Company B, ultimate values (x = 6 million, bottom).

Scale estimating  k,n and  BR k,n as a function of k: Dutch fire insurance data (top); MTPL data for Company A, ultimate values (middle); MTPL data for Company B, ultimate values (bottom).

Figure 4.7 Scale estimates Âk,n and images as a function of k: Dutch fire insurance data (top); MTPL data for Company A, ultimate values (middle); MTPL data for Company B, ultimate values (bottom).

4.2.2 General Tail Modelling using EVA

In order to allow tail modelling with log‐normal or Weibull tails, one has to incorporate the case where the EVI γ can be 0, next to positive values. Estimation of γ, extreme quantiles and return periods under the max‐domains of attraction conditions Cγ in (3.2.5) or (3.2.6), with as few restrictions on the value of γ as possible, is the next step in tail modelling. Again we have two possible approaches: using quantile plotting or using a likelihood approach on POT values.

  • Here, several existing estimators start from the following condition, which follows from (3.2.5): for all u ≥ 1 as images
    (4.2.22)images
    From this it follows with images, that as images, k/n → 0

    Hence estimating EHk, n by Hk, n, and images by Xnk, n, we find that for any estimator images of γ

    leads to an estimator for images.

    Since a regularly varies with index images, it also follows from (4.2.23) that U((n + 1)/(k + 1))EHk, n = ((n + 1)/(k + 1))γ((n + 1)/(k + 1)) for some slowly varying function . Hence the approach using linear regression and extrapolation on linear tail patterns on a QQ‐plot can be generalized to the case of a real‐valued EVI using the generalized QQ‐plot

    images

    which ultimately for smaller values of k will be linear with slope γ, whatever the sign or values of γ. Hence if a generalized QQ‐plot is ultimately horizontal, then tail modelling using a distribution in the Gumbel domain of attraction is appropriate. An ultimately decreasing generalized QQ‐plot indicates a negative EVI, which can occur, for instance, for truncated heavy‐tailed distributions.

    A generalized Hill estimator of γ estimating the slopes at the last k points on the generalized QQ‐plot is then given by

    images

    where UHj, n = Xnj, nHj, n.

    Another generalization of the Hill estimator to real‐valued EVI was given in Dekkers et al. [268], termed the moment estimator:

    images

    where

    images
  • Condition (3.2.6) for a distribution to belong to a domain of attraction of an extreme value distribution means that the generalized Pareto law is the limit distribution of the distribution of POT values X − t given X > t when t → x+: setting h(t) = σt Hence, we are led to modelling the tail function images of POT values Y = X − t with X > t using the GPD with survival function images. Denoting the number of exceedances over t again by Nt, the log‐likelihood is given by
    images
    Using a reparametrization (γ, σ) → (γ, τ) with τ = γ/σ, leads to the likelihood equations
    images
    Replacing t by an intermediate order statistic Xnk, n again gives
    images
    In order to assess the goodness‐of‐fit of the GPD when modelling the POT values Y = X − t for a given threshold t, one can use the transformation
    images
    so that R is standard exponentially distributed if the POT values do follow a GPD, and the fit can be validated inspecting the overall linearity of the exponential QQ‐plot
    images
    where images (i = 1, …, Nt) denote the ordered values of
    images

For all these estimators images is asymptotically normal under some regularity conditions on the underlying distributions when images and k/n → 0, with mean 0 if k is not too large (or, equivalently, if the threshold t = xnk, n is not too small), and asymptotic variances (or covariance matrix for images) given by

images

Estimators for small tail probabilities or return periods can easily be constructed from the POT approach. In fact, the approximation of P(X − t > y|X > t) by (1 + (γ/σ)y)−1/γ for y > 0, setting t + y = x, leads to

images

Inversion leads to an extreme quantile estimator

images

The estimators for high quantiles based on the approach used in the construction of the moment estimator images are defined by

images

with â defined in (4.2.24). Note that images can be seen as an alternative estimator for σ when comparing the expressions of images and images. This then in turn leads to a moment tail probability estimator

images

The asymptotic distributions of these tail estimators have been derived in the literature. For instance, for images we have under some regularity conditions that with an = (k + 1)/(p(n + 1)) as npn → c ≥ 0, images one has

  • for γ > 0
    images
  • for γ < 0
    images

For further details see de Haan and Ferreira [258].

Image described by caption.
Image described by caption.

Figure 4.8 Generalized QQ‐plot (middle) and estimators of EVI images, images, images as a function of k (left) and log k (right): Dutch fire insurance data (first and second line); MTPL data for Company A, ultimate values (third and fourth line); MTPL data for Company B, ultimate values (bottom two lines).

Image described by caption.

Figure 4.9 Large quantile estimators images, images, images (left) and log‐return period estimators images, images, images (right) as a function of k: Dutch fire insurance data (x = 200 million, top); MTPL data for Company A, ultimate values (x = 4 million, middle); MTPL data for Company B, ultimate values (x = 6 million, bottom).

4.2.3 EVA under Upper‐truncation

Practical problems can arise when using the strict Pareto distribution and its generalization to the Pareto‐type model because some probability mass can still be assigned to loss amounts that are unreasonably large or even impossible. With respect to tail fitting of an insurance claim data, upper‐truncation is of interest and can be due to the existence of a maximum possible loss. Such truncation effects are sometimes visible in data, for instance when an overall linear Pareto QQ‐plot shows non‐linear deviations at only a few top data. Let W be an underlying non‐truncated distribution with distribution function FW, quantile function QW and tail quantile function UW. Upper‐truncation of the distribution of W at some value T was defined in the preceding chapter through the conditioning operation W|W < T. Let FT and UT be the distribution and tail quantile function of this truncated distribution. In practice one does not always know if the data X1, …, Xn come from a truncated or non‐truncated distribution, and hence the behavior of estimators should be evaluated under both cases, and a statistical test for upper‐truncation is useful. This section is taken from Beirlant et al. [95].

Upper‐truncation of the distribution of W at some truncation point T yields

images

The corresponding quantile function QT is then given by

images

while the tail function UT satisfies

(4.2.26) images

or

with images the odds of the truncated probability mass under the untruncated distribution W. Note also that for a fixed T, upper‐truncation models are known to exhibit an EVI γ = −1. This follows from verifying (3.2.5) for UT as given in (4.2.27). For instance when UW(x) = xγ, we find images as images. This final expression satisfies (3.2.5) with γ = −1.

4.2.3.1 EVA for Upper‐truncated Pareto‐type Distributions

We restrict attention to tail estimation for upper‐truncated Pareto‐type distributions:

images

where F is a slowly varying function at infinity or, with W/t denoting the peaks over a threshold t when W > t,

images

Upper‐truncation of a Pareto‐type distribution at a high value T then necessarily requires images and

images

One can now consider two cases as images:

  • Rough upper‐truncation with the threshold t when T/t → β > 1 and This corresponds to situations where the deviation from the Pareto behavior due to upper‐truncation at a high value will be visible in the data from the threshold t onwards, and an adaptation of the above Pareto tail extrapolation methods appears appropriate.
  • Light (or no) upper‐truncation with the threshold t: when images
    (4.2.29)images
    and hardly any truncation is visible in the data from the threshold t onwards, and the Pareto‐type model without truncation and the corresponding extreme value methods for Pareto‐type tails appear appropriate when restricted to excesses over t.

Under rough upper‐truncation we have

images

with density

images

Estimating T by Xn, n and taking t = Xnk, n so that images, we obtain the following log‐likelihood:

images

Now

images

which leads to the defining equation for the likelihood estimator images:

images

This estimator was first proposed in Aban et al. [6]. Beirlant et al. [95] showed that with κ = β1/γ − 1

images

From (4.2.27) it is clear that the estimation of DT is an intermediate step in important estimation problems following the estimation of γ, namely of extreme quantiles and of the endpoint T. When UW satisfies (4.2.3) it follows from (4.2.27) that as images and T/t → β

so that

Motivated by (4.2.31) and estimating QT(1 − (k + 1)/(n + 1))/QT(1 − 1/(n + 1)) by Rk, n, one arrives at

(4.2.32) images

as an estimator for DT. In practice one makes use of the admissible estimator

images

to make it useful for truncated and non‐truncated Pareto‐type distributions.

For DT > 0, in order to construct estimators of T and extreme quantiles qp = QT(1 − p), as in (4.2.31) we find that

Then taking logarithms on both sides of (4.2.33) and estimating QT(1 − (k + 1)/(n + 1)) by Xnk, n we find an estimator images of qp:

which equals the Weissman estimator images when images. An estimator images of T follows from letting p → 0 in the above expressions for images:

(4.2.35) images

Here we take the maximum of log Xn,n and the value following from (4.2.34) with p → 0 in order for this endpoint estimator to be admissible. It has been shown that images is superefficient under rough upper‐truncation, which means that the asymptotic variance is o(1/k) and the asymptotic bias is also smaller than, for instance, that of the moment quantile estimator images.

However, images is not a consistent estimator for qp under light upper‐truncation and when npn → 0. In that case one should use

The estimation of tail probabilities px = P(X > x) can be based directly on (4.2.28) using Rk, n as an estimator for 1/β:

(4.2.37) images

Of course, in order to decide between (4.2.34) and (4.2.36) one should use a statistical test for deciding between rough and light upper‐truncation.

4.2.3.2 Testing for Upper‐truncated Pareto‐type Tails

Aban et al. [6] proposed a test for images versus images under the strict Pareto model, rejecting H0 at level q ∈ (0, 1) when

for some 1 < k < n with A the scale parameter in the Pareto model. In (4.2.38), γ is estimated by Hk, n, the maximum likelihood estimator under H0, while A is estimated using Âk,n from (4.2.19). Note that the rejection rule (4.2.38) can be rewritten as

(4.2.39) images

and the P‐value is given by images.

Considering the testing problem

images

under the upper‐truncated Pareto‐type model, Beirlant et al. [95] propose to reject images when an appropriate estimator of (n + 1)DT/(k + 1) is significantly different from 0. Here we construct such an estimator generalizing images with an average of ratios (Xnk, n/Xnj+1, n)1/γ, j = 1, …, k, which then possesses an asymptotic normal distribution under the null hypothesis. Observe that with (4.2.30) under images as images

images

Estimating Ēk,n by

images

leads now to

(4.2.40) images

as an estimator of (n + 1)DT/(k + 1), with images an appropriate estimator of γ. Under images, the Hill estimator Hk, n is an appropriate estimator of γ. Moreover, it can be shown that under some regularity assumptions on the underlying Pareto‐type distribution, we have under images for images and k/n → 0, that images is asymptotically normal with mean 0 and variance 1/12. It is then also shown under rough upper‐truncation as images, k/n → 0 that Lk, n(Hk, n) tends to a negative constant so that an asymptotic test based on Lk, n(Hk, n) rejects images on level q when

(4.2.41) images

with images. The P‐value is then given by images.

Image described by caption.

Figure 4.10 MTPL data for Company A, ultimate estimates: images and Hk, n as a function of k (top left), P‐value of TB, k as a function of k (top right), images and images as a function of k (middle left), estimates of endpoint images (middle right), images as a function of k (bottom left), truncated Pareto fit to Pareto QQ‐plot based on top k = 250 observations (bottom right).

4.3 Global Fits: Splicing, Upper‐truncation and Interval Censoring

Given an appropriate tail fit, the ultimate goal consists of fitting a distribution with a global satisfactory fit. Rather than trying to splice specific parametric models such as log‐normal or Weibull models for the modal part of the distribution, one can rely on fitting a mixed Erlang (ME) distribution, as discussed in Verbelen et al. [758]. We also consider this set‐up in the presence of truncation and censoring.

4.3.1 Tail‐mixed Erlang Splicing

The Erlang distribution has a gamma density

images

where r is a positive integer shape parameter. Following Lee and Lin [534] we consider mixtures of M Erlang distributions with common scale parameter 1/λ having density

images

and tail function

images

where the positive integers r = (r1, …, rM) with r1 < r2 < … < rM are the shape parameters of the Erlang distributions, and α = (α1, …, αM) with αj > 0 and images are the weights in the mixture. Tijms [745] showed that the class of mixtures of Erlang distributions with a common scale 1/λ is dense in the space of positive continuous distributions on images. Moreover this class is also closed under mixtures, convolution and compounding. Hence aggregate risks calculations are simple, and XL premiums and risk measures based on quantiles can also be evaluated in a rather straightforward way.For instance, a composite ME generalized Pareto distribution can be built using (3.5.14), that is, a two‐component spliced distribution with density

images

If a continuity requirement at t were imposed, this would lead to

images

The survival function of this spliced distribution is given by

images

Alternatively, one can take images, where k* is an appropriate number of top order statistics corresponding to an extreme value threshold images.

Fitting ME distributions through direct likelihood maximization is difficult. A first algorithm was proposed by Tijms [745], but it turns out to be slow and can lead to overfitting. Lee and Lin [534] use the expectation‐maximization (EM) algorithm proposed by Dempster et al. [271] to fit the ME distribution. Model selection criteria, such as the Akaike information criterion (AIC) and Bayesian information criterion (BIC) information criteria, are then used to avoid overfitting. Verbelen et al. [758] extend this approach to censored and/or truncated data. The need for the EM algorithm follows from the data incompleteness due to mixing and censoring.

The EM algorithm is used to compute the maximum likelihood estimator (MLE) for incomplete data where direct maximization is impossible. It consists of two steps that are put in an iteration until convergence:

  • E‐step: Compute the conditional expectation of the log‐likelihood given the observed data and previous parameter estimates.
  • M‐step: Determine a subsequent set of parameter estimates in the parameter range through maximization of the conditional expectation computed in the E‐step.

Rather than proposing a data‐driven estimator of the splicing point t, we use an expert opinion on the splicing point t based on EVA as outlined above. Then, π can be estimated by the fraction of the data not larger than t. Similarly, T is deduced from the EVA. The extreme value index γ is estimated in the algorithm, starting from the value obtained from the EVA at the threshold t. The final estimates for γ always turned out to be close to the EVA estimates. Next, the ME parameters (α, λ) are estimated using the EM algorithm as developed in Verbelen et al. [758]. The number of ME components M is estimated using a backward stepwise search, starting from a certain upper value, whereby the smallest shape is deleted if this decreases an information criterion such as AIC or BIC. Moreover, for each value of M, the shapes r are adjusted based on maximizing the likelihood starting from r = (s, 2 s, …, …, M × s), where s is a chosen spread factor.

Of course tail splicing of an ME can also be performed using a simple Pareto fit, or an EPD fit, whether or not adapted for truncation. For instance, splicing an ME with an upper‐truncated Pareto approximation leads to

Image described by caption.

Figure 4.11 MTPL data for Company A ultimates: fit of spliced model with a mixed Erlang and a Pareto component with thresholds t indicated on mean excess plot (top left); empirical and model survival function (top right); PP plot of empirical survival function against splicing model RTF (bottom left); idem with images transformation (bottom right).

4.3.2 Tail‐mixed Erlang Splicing under Censoring and Upper‐truncation

In reinsurance data left truncation appears at some point, denoted here by tl, which can be a deductible or a percentage of the retention u from an XL contract. Claims leading to a cumulative payment below tl at a given stage during development are then left truncated. Such a claim constitutes an IBNR claim. As discussed above, an upper‐truncation mechanism at some point T can appear.

We denote the ME density and distribution function by fME and FME, and similarly fEV and FEV for the EVA distribution. We then define, omitting the model parameters from the notation for the moment,

images
images

with 0 ≤ tl < t < T where T can be equal to images. The densities f1 and f2 are then valid densities on the intervals [tl, t] and [t, T], respectively. For the first density, this means that it is lower truncated at tl and upper truncated at t, and the second density is lower truncated at t and upper truncated at T. The corresponding distribution functions are

images
images

We consider the splicing density and distribution function

images

and

Next to truncation, censoring mechanisms occur in reinsurance1 :

  • right censoring occurs for instance when a claim has not been settled at the evaluation date (RBNS claims). See Chapter 1 for the case of motor liability data. The final claim amount xi will be larger than the lower censoring value li
  • left censoring occurs when only an upper bound ui to the claim xi is given
  • interval censoring means that the final claim value xi is only known to be inside an interval [li, ui] ⊂ [tl, T].

In the splicing context with an EVA component from a threshold t on, we have the following five classes of observations:

  1. Uncensored observations with tl ≤ li = ui = xi ≤ t < T.
  2. Uncensored observations with tl < t < li = ui = xi ≤ T.
  3. Interval censored observations with tl ≤ li < ui ≤ t < T.
  4. Interval censored observations with tl < t ≤ li < ui ≤ T.
  5. Interval censored observations with tl ≤ li < t < ui ≤ T.

These classes are shown in Figure 4.12. In the conditioning argument in the E‐step of the algorithm, the fifth case is split into xi ≤ t and xi > t, as indicated in Figure 4.12.

Illustration displaying 3 vertical dashed lines for t l, t, and T with 2 solid squares as observed data point in columns i and ii, and 4 shaded circles as unobserved data point in columns iii, iv, and v.

Figure 4.12 The different classes of censored observations.

For the Erlang mixture, the number M and the integer shapes r are fixed when estimating Θ1 = (α, λ). Also, Θ2 denotes the extreme value parameter γ (together with σ when using the GPD tail fit). The idea behind the EM algorithm in this context is to consider the censored sample images in contrast to the complete data images which is not fully observed. Given a complete version of the data, we can construct a complete likelihood function as

images

where images is the indicator function for the event {Xi ≤ t}. The corresponding complete data log‐likelihood function is

images

As we do not fully observe the complete version images of the data sample, it is not possible to optimize the complete data log‐likelihood directly. The intuitive idea for obtaining parameter estimates in the case of incomplete data is to compute the expectation of the complete data log‐likelihood and then use this expected log‐likelihood function to estimate the parameters. However, taking the expectation of the complete data log‐likelihood requires the knowledge of the parameter vector, and so the algorithm has to run iteratively. Starting from an initial guess for the parameter vector, the EM algorithm iterates between two steps. In the hth iteration of the E‐step the expected value of the complete data log‐likelihood is computed with respect to the unknown data images given the observed data images and using the current estimate of the parameter vector Θ(h−1) as true values,

images

In the M‐step, one maximizes the expected value of the complete data log‐likelihood obtained in the E‐step with respect to the parameter vector:

images

Both steps are iterated until convergence.

In the E‐step we distinguish the five cases of data points again to determine the contribution of a data point to this expectation:

  1. images
  2. images
  3. images
  4. images
  5. images

Note that the event {tl ≤ li = ui ≤ t < T} indicates that we know tl, li = ui, t and T, and that the ordering tl ≤ li = ui ≤ t < T holds. Similar reasonings hold for the other conditional arguments in the expectations. Then, using the law of total probability, the final case can be rewritten as

images

where {tl ≤ li < Xi ≤ t < ui ≤ T} denotes that tl, li, t, ui and T are known, that the ordering tl ≤ li < t < ui ≤ T holds, and that {Xi ≤ t}. Using (4.3.43) we find that the probability in the first term is then given by

images

and similarly for the second term. The M‐step with maximization with respect to π, Θ1 and Θ2, and the choice of the initial values, is discussed in detail in Reynkens et al. [647].

EVA is not available in the literature for interval censored data. The role of the empirical survival and quantile functions in the construction of a tail analysis for complete data (i.e., setting Xnj+1, n as an estimator of Q(1 − j/(n + 1)), j = 1, …, n) is taken over by the Turnbull [747] estimator images. The Turnbull estimator is an extension to interval censoring of the Kaplan–Meier estimator or product‐limit estimator [482], that is, when images.

  • The Kaplan–Meier estimator images of 1 − F is defined as follows: letting 0 = τ0 < τ1 < τ2 < … < τN (with N < n) denote the observed possible censored data, Nj the number of observations Xi ≥ τj, and dj the number of values li equal to τj, then
    images
    This expression is motivated from the fact that
    images
  • Turnbull’s algorithm is then constructed as follows: Let 0 = τ0 < τ1 < … < τm denote here the grid of all points li, ui, i = 1, 2, …, n. Define δij as the indicator whether the observation in the interval (li, ui] could be equal to τj, j = 1, …, m. δij equals 1 if (τj−1, τj] ⊂ (li, ui] and 0 otherwise. Initial values are assigned to 1 − F(τj) by distributing the mass 1/n for the ith individual equally to each possible τj ∈ (li, ui]. The algorithm is given as:
    1. Compute the probability pj that an observation equals τj by pj = F(τj) − F(τj−1), j = 1, …, m.
    2. Estimate the number of observations at τj by
      images
    3. Compute the estimated number of data with li ≥ τj by images.
    4. Update the product‐limit estimator using the values of dj and Nj found in the two preceding steps. Stop the iterative process if the new and old estimate of 1 − F for all τj do not differ too much.

In case of interval censored data we can then estimate the mean excess function e (see (3.4.10)) substituting 1 − F by the Turnbull estimator images:

As discussed in Section 4.2.1, the mean excess function based on the log‐data leads to an estimator of a positive extreme value index γ. As in (4.2.8), using the Turnbull estimator rather than the classical empirical distribution we obtain an estimator of γ > 0 in the case of incomplete data:

We then compute these statistics at the positions images, k = 1, …, n − 1. Such plots will assist in choosing an appropriate threshold t and estimates of the extreme value index γ to validate the tail component in the splicing.

Image described by caption.

Figure 4.13 Dutch fire claim data: fit of spliced model mixed Erlang and Pareto with threshold t indicated on the mean excess plot (top left); empirical and model survival function (top right); PP plot of empirical survival function against splicing model RTF (bottom left); idem with images transformation (bottom right).

Image described by caption.

Figure 4.14 Dutch fire claim data: fit of spliced model with a mixed Erlang and two Pareto components with thresholds t1 and t2 as indicated on the mean excess plot (top left); empirical and model survival function (top right); PP plot of empirical survival function against splicing model RTF (bottom left); idem with images transformation (bottom right).

MPTL data for Company A: proportion vs. DY with 15 open circle plots (top) and boxplots of Ri,d for every development year d and factor fd used in the interval censoring approach (bottom).

Figure 4.15 MTPL data for Company A: percentage of closed claims with incurred value being a correct upper bound for final payment as a function of the number of development years (DY) (top); boxplots of Ri, d for every development year d and factor fd used in the interval censoring approach (bottom).

Image described by caption.

Figure 4.16 MTPL data for Company A: fit of spliced mixed Erlang and Pareto models with interval censoring based on upper bounds Ii, d, i = 1, …, 596, d = 5, …, 16, for non‐closed claims: mean excess plot based on (4.3.44) (top left); Hill plot based on (4.3.45) (top right); empirical and model survival function (middle left); PP plot of empirical survival function against splicing model RTF (middle right); idem with images transformation (bottom).

Image described by caption.

Figure 4.17 MTPL data for Company A (1995–2009): fit of spliced mixed Erlang and Pareto models with interval censoring based on upper bounds Ĩi,d, i = 1, …, 849, d = 1, …, 15 for non‐closed claims: mean excess plot based on (4.3.44) (top left); Hill plot based on (4.3.45) (top right); empirical and model survival function (middle); PP plot of images survival function, empirical against splicing mode (bottom left); size of confidence intervals using interval censoring with upper bounds Ii, d and Ĩi,d (bottom right).

Image described by caption.

Figure 4.18 MTPL data for Company B: mean excess plot based on (4.3.44) (top left) and Hill plot based on (4.3.45) (top right) for interval censored data based on accidents from 1990 to 2005; fit of spliced model mixed Erlang and Pareto models: empirical and model survival function (middle left); PP plot of empirical survival function against splicing model RTF (middle right) left); idem with images transformation (bottom).

If the upper bounds are put to images, that is, if one uses the right censoring framework, then, under the random right censoring assumption of independence between the real cumulative payment at closure of the claim and the censoring variable C which is observed in case the claim is right censored, estimators of γ > 0 have been proposed in Beirlant et al. [96, 101], Einmahl et al. [315], and Worms and Worms [796]. Using the likelihood approach, Beirlant et al. [101] proposed the estimator

(4.3.46) images

with images (i = 1, …, n) and images the proportion of non‐censored data in the top kZ‐data. Einmahl et al. [315] derived asymptotic results, while Beirlant et al. [96] proposed a bias‐reduced version. Worms and Worms [796] derived a tail index estimator which is derived through the estimation of the mean excess function of the log‐data, comparable with the estimator derived in (4.3.45):

where the Kaplan–Meier estimator can be written as

images

with Δi, n equal to 1 if the ith smallest observation Zi, n is non‐censored, and 0 otherwise.

Image described by caption.

Figure 4.19 Hill plots adapted for interval censoring images with upper bounds Ii, d and Ĩi,d, and Hill estimates based on random right censoring images without upper bounds. The vertical line indicates the splicing threshold t used above.

4.4 Incorporating Covariate Information

In certain instances, the assumption of i.i.d. random variables, underlying the extreme value methods discussed above, may be violated. When analysing claim data from different companies, the tail fits may differ. Also, loss distributions may change over calendar years or along the number of development years. Sometimes considering covariates may remedy the situation. Let the covariate information, whether using continuous or indicator variables, be contained in a covariate vector x = (x1, …, xp). The extension of the POT approach based on (4.2.25) has been popular in literature, starting with the seminal paper by Davison and Smith [253]. However, there are also some methods available that focus on response random variables that exhibit Pareto‐type tails. Here we denote the response variables Zi (rather than Xi as in the preceding sections) with the corresponding exceedances or POTs Y = Z/t or Y = Z − t when Z > t.

4.4.1 Pareto‐type Modelling

When modelling time dependence or incorporating any other covariate information in an independent data setting with Pareto‐type distributed responses Zi, the exceedances are defined through Yi = Zi/t for some appropriate threshold t. Note that in many circumstances the threshold should then also be modelled along x = xi, i = 1, …, n. As before, we assume that as images

(4.4.48) images

where Ai, γi > 0. Regression can be modelled through the scale parameter A and/or the extreme value index γ.

Changes in γ can be modelled in a parametric way using likelihood techniques. Suppose, for instance, that regression modelling of γ > 0 using an exponential link function appears appropriate in a given case study:

images

The log‐likelihood function is then given by

images

leading to the likelihood equations

images

Beirlant and Goegebeur [99] propose to inspect the goodness‐of‐fit of such a regression model under constant scale parameter A on the basis of a Pareto QQ‐plot using

images

which are indeed approximately Pareto distributed with tail index 1, when the regression model is appropriate.

The case where γ does not depend on i, while A does depend on i, was formalized in Einmahl et al. [316] assuming that there exists a tail function 1 − F and a continuous, positive function A defined on [0, 1] such that

uniformly for all images and all i = 1, …, n with images. A is then called the skedasis function, which characterizes the trend in the extremes through the changes in the scale parameter A. Under (4.4.49), Einmahl et al. [316] showed that the Hill estimator Hk, n is still a consistent estimator for γ. Assuming equidistant covariates xi = i/n, i = 1, …, n, as in (4.2.19),

images

where images denotes the number of Z values larger than the threshold Znk, n with covariate value x in a neighbourhood of xi. The contribution of the observations to images is governed by a symmetric density kernel function K on [−1, 1] and Kh(x) = K(x/h)/h, so that K gives more weight to the observations with covariates closer to xi. We hence obtain

images

Finally, estimators of small tail probabilities and large quantiles follow directly from (4.4.49), (4.2.12) and (4.2.13):

images

4.4.2 Generalized Pareto Modelling

Let Y1, …, Yn be independent GPD random variables and let xi denote the covariate information vector, that is,

images

where γ(x), σ(x), μ(x) denote admissible functions of x, whether of parametric nature using three vectors of regression coefficients βj (j = 1, 2, 3) of length p with images, images and images, or of non‐parametric nature. Again this model is used as an approximation of the conditional distribution of excesses Y (x) = Z − μ(x) over a high threshold μ(x) given that there is an exceedance. The choice of an appropriate threshold μ(x) is of course even more difficult than in the non‐regression setting since the threshold can depend on the covariates in order to take the relative extremity of the observations into account.

  • When parametric functions images, images and images have been chosen, the estimators of βj (j = 1, 2, 3) can be obtained by maximizing the log‐likelihood function
    images
    where Nμ denotes the number of excesses over the threshold function μ(x).
  • Alternatively, non‐parametric regression techniques are available to estimate the parameter functions γ(x), σ(x). Consider independent random variables Z1, …, Zn and associated covariate information x1, …, xn. Suppose we focus on estimating the tail of the distribution of Z at x*. Fix a high local threshold μ(x*) and compute the exceedances Yi = Zj − μ(x*), provided Zj > μ(x*), images. Here j is the index of the ith exceedance in the original sample, and images denotes the number of exceedances over the threshold μx*. Then re‐index the covariates in an appropriate way such that xi denotes the covariate associated with exceedance Yi. Using local polynomial maximum likelihood estimation, one approximates γ(x) and σ(x) by polynomials, centered at x*. Let h denote a bandwidth parameter and consider a univariate covariate x. Assuming γ, respectively σ, being m1, respectively m2, times differentiable one has for |xi − x*| ≤ h,
    images

    where

    images

    The coefficients of these approximations can be estimated by local maximum likelihood fits of the GPD, with the contribution of each observation to the log‐likelihood being governed by a kernel function K. The local polynomial maximum likelihood estimator (β1, β2) = (images) is then the maximizer of the kernel weighted log‐likelihood function

    images

    with respect to images, where g(y;μ, σ) = (1/σ)(1 + (γ/σ)y)−1−1/γ is the density of the generalized Pareto distribution.

    A more recent approach is using penalized log‐likelihood optimization based on spline functions. Let the covariates x be one‐dimensional within an interval [a, b]. The goal is to fit reasonably smooth functions hγ and hσ with γ(x) = hγ(x) and σ(x) = hσ(x) to the observations (Yi, xi), i = 1, …, Nμ. The penalized log‐likelihood is then given by

    images

    The introduction of the penalty terms is a standard technique to avoid over‐fitting when one is interested in fitting smooth functions (see Hastie and Tibshirani [428] or Green and Silverman [408]). Next • stands for γ or σ. Intuitively the penalty functions images measure the roughness of twice‐differentiable curves and the smoothing parameters λ are chosen to regulate the smoothness of the estimates ĥ : larger values of these parameters lead to smoother fitted curves.

    Let a = s0 < s1 < … < sm < sm+1 = b denote the ordered and distinct values among images. A function h defined on [a, b] is a cubic spline with the above knots if the following conditions are satisfied:

    • on each interval [si, si+1], h is a cubic polynomial
    • at each knot si, h and its first and second derivatives are continuous.

    A cubic spline is a natural cubic spline if in addition to the two latter conditions it satisfies the natural boundary condition that the second and third derivatives of h at a and b are zero. It follows from Green and Silverman [408] that for a natural cubic spline h with knots s1, …, sm one has

    images

    where h = (h(s1), …, h(sm)), and K is a symmetric m × m matrix of rank m − 2 only depending on the knots s1, …, sm. Hence

    images

In order to assess the validity of a chosen regression model one can generalize the exponential QQ‐plot of generalized residuals defined before in the non‐regression case:

images

with

images
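
If the fitted model is adequate, these generalized residuals should behave like a standard exponential sample, so the QQ‐plot should be close to the first diagonal. A minimal sketch, assuming the standard GPD residual transform Ri = γ̂(xi)^{−1} log(1 + γ̂(xi)Yi/σ̂(xi)):

```python
import numpy as np

def generalized_residuals(y, gamma_hat, sigma_hat):
    """Generalized residuals R_i = log(1 + gamma_i * Y_i / sigma_i) / gamma_i,
    with gamma_hat and sigma_hat evaluated at the covariate of each exceedance;
    standard exponential under a correctly specified model."""
    return np.log1p(gamma_hat * y / sigma_hat) / gamma_hat

def exponential_qq_coordinates(residuals):
    """Coordinates (-log(1 - i/(N+1)), R_(i)) of the exponential QQ-plot."""
    r = np.sort(residuals)
    n = len(r)
    theoretical = -np.log(1.0 - np.arange(1, n + 1) / (n + 1.0))
    return theoretical, r
```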

Finally, given regression estimators for (γ(x), σ(x)) using an appropriate threshold function μ(x), extreme quantile estimators are given by

images

where images can, for instance, be taken to be equal to the Nadaraya–Watson estimator

images
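
A possible numerical translation of this quantile estimator, assuming the usual POT form q̂(x) = μ(x) + (σ̂(x)/γ̂(x))((p̂x/p)^{γ̂(x)} − 1), where the conditional exceedance probability p̂x is obtained from the Nadaraya–Watson smoother; the kernel is again an arbitrary illustrative choice:

```python
import numpy as np

def nw_exceedance_prob(x_star, x_all, z_all, mu_x_star, h):
    """Nadaraya-Watson estimate of P(Z > mu(x) | x = x_star),
    with a biweight kernel (illustrative choice)."""
    u = (x_all - x_star) / h
    w = np.where(np.abs(u) <= 1, (15.0 / 16.0) * (1.0 - u**2) ** 2, 0.0)
    return np.sum(w * (z_all > mu_x_star)) / np.sum(w)

def conditional_extreme_quantile(p, mu_x_star, gamma_hat, sigma_hat, p_exc):
    """POT-type quantile at level 1 - p given the covariate value:
    mu(x) + (sigma(x)/gamma(x)) * ((p_exc/p)**gamma(x) - 1)."""
    return mu_x_star + sigma_hat / gamma_hat * ((p_exc / p) ** gamma_hat - 1.0)
```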

For more details on non‐parametric extreme value regression methods and their applications, refer to Davison and Ramesh [252], Hall and Tajvidi [420], Chavez‐Demoulin and Davison [200], Daouia et al. [248, 249], Gardes and Girard [368, 369], Gardes and Stupfler [370], Goegebeur, Guillou and Osmann [393], Stupfler [713], and Chavez‐Demoulin et al. [201].


Figure 4.20 Austrian storm claim data: plot of images for Upper Austria and Vienna area (top); images (middle left) and residual QQ‐plot (middle right) for Upper Austria; images (bottom left) with and without outlier and residual QQ‐plot (bottom right) for Vienna.

4.4.3 Regression Extremes with Censored Data

In Section 4.3 we discussed the problem of estimating the distribution of the final payments based on censored data, using the Kaplan–Meier estimator of the distribution of the payment data. Here we propose regression modelling of the final payments given the development time at the closure of a claim. Note, however, that the final payments and the development periods are both right censored, the two variables being censored (or not) at the same time. We again use the notation Zi (i = 1, …, n) from that section for the observed cumulative payment at the end of the study, and similarly nDYe, i for the observed number of development years at the end of 2010. Again Δi, n denotes the non‐censoring indicator corresponding to the ith smallest observed payment Zi, n. Akritas and Van Keilegom [11] proposed the following non‐parametric estimator of the conditional distribution of X given a specific value of nDY, assuming that X and the censoring variable C (see Section 4.3) are conditionally independent given nDY:

images

with weights

images

Denoting by Wi, n the weight W corresponding to the ith smallest Z value Zi, n, we then arrive at the following Hill‐type estimator of the conditional extreme value index given nDY = d, generalizing the unconditional Worms and Worms estimator images defined in (4.3.47):

images
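
A sketch of the computations involved, assuming a Beran‐type (kernel‐weighted Kaplan–Meier) form for the Akritas–Van Keilegom estimator and a Kaplan–Meier‐weighted average of log‐excesses for the Hill‐type estimator. The book's exact weight definitions are in the displayed formulas above, so the kernel and normalization below should be read as illustrative assumptions:

```python
import numpy as np

def beran_conditional_km(d, ndy, z, delta, h):
    """Beran-type conditional Kaplan-Meier fit of Z given nDY = d.
    ndy: observed development years, z: observed payments,
    delta: non-censoring indicators, h: bandwidth.
    Returns the payments sorted increasingly and the estimated
    conditional survival function evaluated at each of them."""
    order = np.argsort(z)
    z, delta, ndy = z[order], delta[order], ndy[order]
    u = (ndy - d) / h
    w = np.where(np.abs(u) <= 1, (15.0 / 16.0) * (1.0 - u**2) ** 2, 0.0)
    w = w / w.sum()
    surv, s = np.ones(len(z)), 1.0
    for i in range(len(z)):
        at_risk = w[i:].sum()
        if delta[i] == 1 and at_risk > 0:
            s *= 1.0 - w[i] / at_risk
        surv[i] = s
    return z, surv

def conditional_hill(z_sorted, surv, k):
    """Kaplan-Meier-weighted average of log-excesses over z_{n-k}, in the
    spirit of the Worms-Worms estimator; the jumps of the conditional
    Kaplan-Meier distribution serve as weights."""
    n = len(z_sorted)
    t = z_sorted[n - k - 1]
    jumps = np.concatenate(([1.0], surv[:-1])) - surv
    top = slice(n - k, n)
    return np.sum(jumps[top] * np.log(z_sorted[top] / t)) / surv[n - k - 1]
```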

Pareto QQ‐plots adapted for censoring per chosen d value can then be defined as

images

Figure 4.21 MTPL data for Company A: time plots of cumulative payments Zi as a function of nDYe (top); Pareto QQ‐plots (middle) and Hill estimates (bottom) adapted for right censoring at development years nDY = 3, 8, 13.

In order to derive a full model for the complete payments X as a function of nDY = d, a local version of the splicing algorithm from Section 4.3.2 can be developed, considering random right censoring on X and using the kernel weights Wi = Wi(d; h) introduced above. The EM algorithm can then be applied with a kernel‐weighted log‐likelihood, comparable to the approach from Section 4.4.1. For instance, given the complete version of the data, the complete likelihood function is given by

images

The corresponding complete data weighted log‐likelihood function then equals

images

Table 4.1 MTPL data for Company A: parameter estimates of the fitted splicing model at development years nDY = 3, 8, 13.

          nDY = 3           nDY = 8           nDY = 13
π         0.859             0.846             0.826
t         390 000           390 000           390 000
M         2                 2                 2
α         (0.235, 0.765)    (0.213, 0.787)    (0.173, 0.827)
r         (1, 5)            (1, 4)            (1, 4)
1/λ       39 693            50 954            51 966
γ         0.441             0.453             0.468

Figure 4.22 MTPL data for Company A: fit of splicing model at development years nDY = 3, 8, 13; PP plot of empirical survival function against splicing model RTF at nDY = 8 (top); idem with −log transformation (bottom).

4.5 Multivariate Analysis of Claim Distributions

Joint or multivariate estimation of claim distributions, for example originating from different lines of business that are possibly dependent, requires estimation of each marginal component separately as well as of the dependence structure. The joint analysis of loss and allocated loss adjustment expenses (ALAE) forms another example in insurance. An early analysis of such a case is provided in Frees and Valdez [359]. A detailed EVA of such a data set using the concept of extremal dependence is found in Chapter 9 in Beirlant et al. [100].

We first model the multivariate tails using data that are large in at least one component, and then perform a splicing exercise combining a tail fit with a modal fit. In a multivariate setting this programme is of course much more complex than in the univariate case. For the tail section we refer to the multivariate POT modelling using the multivariate generalized Pareto distribution, as introduced in Section 3.6. The joint modelling of “small” losses will be based on a multivariate generalization of the mixed Erlang distribution introduced by Lee and Lin [533]. Research in this area has started only recently, and here we only examine an ad hoc model for the Danish fire insurance data.

4.5.1 The Multivariate POT Approach

From (3.6.25) and (3.6.26) one observes the importance of estimating the stable tail dependence function (STDF) l defined in Chapter 3. The estimation of the tail dependence can be performed non‐parametrically or using parametric models. We refer to Kiriliouk et al. [488] for fitting parametric multivariate generalized Pareto models using censored likelihood methods.

A non‐parametric estimator of the STDF is given by

(4.5.50) images

with

images

where Ri, j denotes the rank of Xi, j among X1, j, …, Xn, j:

images

The estimator images is a direct empirical version of definition (3.6.22) of l with u = n/k.
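
As an illustration, a rank‐based sketch of such an empirical STDF estimator. The exact finite‐sample corrections (n + 1/2 versus n + 1, as in the variant (4.5.51) below) differ between versions, so the indicator used here is one common convention:

```python
import numpy as np
from scipy.stats import rankdata

def empirical_stdf(data, k, x):
    """Rank-based empirical stable tail dependence function at
    x = (x_1, ..., x_d); data is an (n, d) sample and k the number of
    extreme order statistics used."""
    n, d = data.shape
    ranks = np.column_stack([rankdata(data[:, j]) for j in range(d)])
    # observation i contributes if at least one component is among its
    # k * x_j upper order statistics (one common indicator convention)
    exceed = ranks > n + 0.5 - k * np.asarray(x)
    return np.sum(np.any(exceed, axis=1)) / k
```

In the bivariate case, empirical_stdf(data, k, (1.0, 1.0)) then acts as an estimator of the extremal coefficient θ.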

A slightly different version is given by

(4.5.51) images

where images denotes the ith smallest observation of component j. Bias‐reduced versions of these estimators were proposed in Fougères et al. [357] and Beirlant et al. [98]. In the bivariate case, images or images then act as estimators of the extremal coefficient θ.

An estimator of the extremal dependence coefficient χ can be constructed from an estimator of χ(u) as u → 1, based on the following estimator of C(u, u):

(4.5.52) images

where images (j = 1, 2; i = 1, …, n) with images denoting the empirical distribution function of the jth marginal and Xi, j the ith observation of the jth component. Hence

images

As an application, note that an estimator of the parameter τ in the logistic dependence model can be obtained from χ(u) → 2 − 2^τ as u → 1, from which

images
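
A small sketch of these two steps, estimating C(u, u) from the empirical copula of the pseudo‐observations Ri, j/(n + 1) and converting χ̂(u), for u close to 1, into an estimate of the logistic parameter τ:

```python
import numpy as np
from scipy.stats import rankdata

def chi_u(data, u):
    """chi(u) = 2 - log C(u, u) / log u, with C estimated by the empirical
    copula of the pseudo-observations R_{i,j} / (n + 1)."""
    n = data.shape[0]
    pseudo = np.column_stack([rankdata(data[:, j]) for j in range(2)]) / (n + 1.0)
    c_uu = np.mean(np.all(pseudo <= u, axis=1))
    return 2.0 - np.log(c_uu) / np.log(u)

def tau_logistic(chi_hat):
    """Logistic parameter from chi(u) -> 2 - 2**tau as u -> 1."""
    return np.log(2.0 - chi_hat) / np.log(2.0)
```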

Setting

images

the Hill estimator based on images (i = 1, …, n) leads to an estimator images of the coefficient of tail dependence η. Of course bias reduction techniques can be applied here too.
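
A sketch of this construction, assuming the common choice Ti = min_j n/(n + 1 − Ri, j) for the structure variables to which the Hill estimator is applied; the book's exact definition is in the displayed formula above:

```python
import numpy as np
from scipy.stats import rankdata

def eta_hill(data, k):
    """Hill estimator of the coefficient of tail dependence eta, applied to
    the structure variables T_i = min_j n / (n + 1 - R_{i,j})."""
    n, d = data.shape
    ranks = np.column_stack([rankdata(data[:, j]) for j in range(d)])
    t = np.min(n / (n + 1.0 - ranks), axis=1)
    t_sorted = np.sort(t)
    return np.mean(np.log(t_sorted[n - k:] / t_sorted[n - k - 1]))
```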

4.5.2 Multivariate Mixtures of Erlangs

Lee and Lin [533] defined a d‐variate Erlang mixture where each mixture component is the joint distribution of d independent Erlang distributions with a common scale parameter 1/λ > 0. The dependence structure is then captured by the combination of the positive integer shape parameters of the Erlangs in each dimension. We denote the positive integer shape parameters of the jointly independent Erlang distributions in a mixture component by the vector r = (r1, …, rd), and the set of all shape vectors with non‐zero weight by ℛ. The density of a d‐variate Erlang mixture evaluated at x > 0 can then be written as

(4.5.53) images

Lee and Lin [533] showed that, given any density f(x), the d‐variate Erlang mixture

images

with mixing weights

images

satisfies images. The weights αr of the components in the mixture are defined by integrating the density over the corresponding d‐dimensional rectangle of the grid formed by the shape parameters multiplied by the common scale. As the value of λ increases, this grid becomes finer and the sequence of Erlang mixtures converges to the underlying distribution function.
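
For concreteness, a direct evaluation of the density (4.5.53), with the rate λ (the reciprocal of the common scale) and hypothetical weights and shape vectors chosen purely for illustration:

```python
import numpy as np
from math import factorial

def mme_density(x, alphas, shapes, lam):
    """Density (4.5.53) of a d-variate Erlang mixture with common rate
    lam = 1/scale: each component is a product of d independent Erlang
    densities with integer shapes r = (r_1, ..., r_d)."""
    total = 0.0
    for alpha, r in zip(alphas, shapes):
        term = alpha
        for xj, rj in zip(x, r):
            term *= lam**rj * xj ** (rj - 1) * np.exp(-lam * xj) / factorial(rj - 1)
        total += term
    return total

# hypothetical bivariate example with two mixture components
value = mme_density([1.0, 2.0], alphas=[0.6, 0.4],
                    shapes=[(1, 2), (3, 1)], lam=0.5)
```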

Verbelen et al. [757] provided a flexible fitting procedure for multivariate mixed Erlangs (MMEs), which iteratively uses the EM algorithm, by introducing a computationally efficient initialization and adjustment strategy for the shape parameter vectors. Randomly censored and fixed truncated data can also be dealt with.


Figure 4.23 Danish fire insurance data, building and contents: Hill and bias‐reduced Hill plots, building (top left) and contents (top right), plot of images against k (middle left) and Ĉ(u, u) against u ∈ (0, 1) (middle right), plot of images (bottom left) and cumulative distribution function of the fitted bivariate splicing model (bottom right).

The bivariate distribution function of the fitted bivariate GPD is given by

images

In order to guarantee that the marginal distributions have support on images one has to impose the constraints images and images, which then lead to the parameter values (γ1 = 0.57, σ1 = 3.57) and (γ2 = 0.65, σ2 = 6.05).

4.6 Estimation of Other Tail Characteristics

In Section 4.2.1.2 we used EVA to discuss the estimation of an extreme quantile or a VaR

images

in detail. Another popular tail characteristic is the conditional tail expectation CTE1−p(X), defined by

images

when images, where e denotes the mean excess function defined in Section 3.4. If X is a continuous random variable, the CTE equals the Tail‐VaR and the expected shortfall (ES) (cf. Section 7.2.2, where the role of these quantities in determining the solvency capital is discussed).
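
For later reference, when the tail above a threshold t is fitted by a GPD(γ, σ) with γ < 1, the mean excess function is linear in the threshold, and the CTE follows in closed form (a standard POT identity):

```latex
% Mean excess function of a GPD(gamma, sigma) fit above the threshold t,
% for gamma < 1:
\[
  e(u) \;=\; \frac{\sigma + \gamma\,(u - t)}{1 - \gamma}, \qquad u \ge t,
\]
% so that
\[
  \mathrm{CTE}_{1-p}(X)
  \;=\; \mathrm{VaR}_{1-p}(X) + e\!\left(\mathrm{VaR}_{1-p}(X)\right)
  \;=\; \mathrm{VaR}_{1-p}(X)
        + \frac{\sigma + \gamma\left(\mathrm{VaR}_{1-p}(X) - t\right)}{1 - \gamma}.
\]
```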

For an unlimited XL treaty with retention u, recall from Chapter 2 that the expected reinsured amount images of a single claim X is given by

images

which is also referred to as the pure premium for R (see Chapter 7 for details). One immediately observes

images

With a finite layer size v in the XL treaty, the pure premium becomes

images
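
Numerically, both premiums reduce to integrating the fitted survival function, as in the following sketch (the Pareto survival function at the end is a hypothetical choice for illustration):

```python
import numpy as np
from scipy.integrate import quad

def pure_premium(survival, u, upper=np.inf):
    """Pi(u) = E[(X - u)_+] = integral of the survival function over (u, upper)."""
    value, _ = quad(survival, u, upper)
    return value

# hypothetical Pareto survival function, for illustration only
surv = lambda z: (1.0 + z) ** (-2.0)
retention, layer = 5.0, 10.0
premium_layer = pure_premium(surv, retention) - pure_premium(surv, retention + layer)
```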

Hence the estimation of VaR1−p(X) at small and intermediate values of p, and of Π(u) at high and intermediate values of u, is an important building block in measuring and managing risk.

When estimating VaR1−p(X) for a two‐component spliced distribution, we have from (4.3.43)

(4.6.54) VaR1−p(X) = Q1((1 − p)/π) for 1 − π < p ≤ 1, and VaR1−p(X) = Q2(1 − p/(1 − π)) for 0 < p ≤ 1 − π,

where Q1 denotes the quantile function of the ME component and Q2 of the tail component. Q1 can be obtained numerically. When the tail component is given by a simple Pareto distribution we have

images

and hence, with t = Xn−k, n and 1 − π = (k + 1)/(n + 1), (4.6.54) yields images from (4.2.12) when 0 < p ≤ 1 − π. Using an upper‐truncated Pareto or a generalized Pareto tail fit, one can use images or images, respectively, for 0 < p ≤ 1 − π.
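
A compact sketch of this spliced quantile, using the convention that π is the weight of the ME body and that the tail quantile in the simple Pareto case takes the Weissman form t((1 − π)/p)^γ; the body quantile function q_body is a hypothetical numerical inverse of the fitted ME distribution:

```python
def spliced_var(p, pi, t, gamma, q_body):
    """VaR_{1-p} of a two-component splice: body weight pi, splicing point t,
    simple Pareto tail with EVI gamma; q_body is a (hypothetical) numerical
    quantile function of the fitted ME body."""
    if p <= 1.0 - pi:
        # Weissman-type tail quantile: t * ((1 - pi) / p)**gamma
        return t * ((1.0 - pi) / p) ** gamma
    return q_body((1.0 - p) / pi)
```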

When estimating Π(u) we again distinguish two cases: u ≤ t = xn−k, n and u > t = xn−k, n; in the latter case the EVA tail modelling can be used.

When u > t, then from (4.3.43)

images

where Π2(u) is given by the following expressions for the different possible EVA tail fits with EVI estimate smaller than 1:

  • Truncated Pareto fit:
    images
  • EPD fit: using the notation from (4.2.18)
    images
  • Generalized Pareto fit (a numerical sketch for this case follows the list):
    images
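
For the generalized Pareto case above, the integral can be evaluated in closed form: with excesses over t following a GPD(γ, σ) and γ < 1, one has Π(u) = P(X > t) (σ/(1 − γ))(1 + γ(u − t)/σ)^{1−1/γ} for u ≥ t. This standard expression may differ in parameterization from the displayed formula, so the following sketch should be checked against it:

```python
def xl_premium_gpd(u, t, gamma, sigma, p_t):
    """Pi(u) for u >= t under a GPD(gamma, sigma) fit to the excesses over t,
    valid for gamma < 1; p_t estimates P(X > t), e.g. (k + 1) / (n + 1)."""
    assert gamma < 1.0 and u >= t
    return p_t * sigma / (1.0 - gamma) * (1.0 + gamma * (u - t) / sigma) ** (1.0 - 1.0 / gamma)
```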

When u < t, we have from (4.3.43) that

images

Note that Π(u) = 0 for u ≥ T and Π(u) = Π(tl) + (tl − u) for u ≤ tl. For the mixed Erlang distribution we get

images

with

images

and, assuming that rn = n, n = 1, …, M,

images

Figure 4.24 MTPL data for Company A: XL pure premium Π(u) based on ME‐Pa fit taking interval censoring into account (solid line), compared with the result when the upper bounds are ignored (right censoring, dashed line) and when the premium is based on the ultimates (dotted line).

Of course the estimation of Π(u) can be extended to a regression context. For instance, when u = u(x) is larger than a threshold function μ(x) of a one‐dimensional covariate x, and using the GPD modelling approach, one obtains for images

images

The result of this procedure based on the GPD regression fit for the storm claim data of Upper Austria, with GPD(0.445, e−7.2+0.046w, 0), is shown in Figure 4.25.


Figure 4.25 Austrian storm claim data: XL pure premium Π(exp(0.001w)) for Upper Austria based on a GPD regression fit with the wind index W as covariate.

4.7 Further Case Studies

We end this chapter by analysing the case studies on flood risk and earthquake risk which were introduced in Chapter 1.

  • Flood risk. Here we model the aggregate annual loss data introduced in Section 1.3.4 (given as a percentage of the building value) for Germany and the UK. All presented derivative plots for Germany in Figure 4.27, based on the Pareto, log‐normal and Weibull QQ‐plots, are ultimately decreasing, while for the UK in Figure 4.26 the decrease in the Weibull derivative plot is small, and this plot is closest to being constant when images. The systematic decrease of the different estimators of γ with increasing threshold, together with the P‐values of the TB test for upper‐truncation, does indicate some evidence for a truncated Pareto tail. Indeed, for both countries the truncated Pareto model fits well. The estimates images of the right truncation point T are situated around 0.25 for the UK and 0.35 for Germany. However, for the UK data a Weibull fit provides a valid alternative.
  • Earthquake risk. We consider recent magnitude data of the 200 largest earthquakes in the Groningen area (the Netherlands), which are caused by gas extraction. In Figure 4.28 (top left) we present the exponential QQ‐plot. A linear pattern is visible for a large section of the magnitude data, while some concave curvature appears at the largest values. According to the Gutenberg–Richter (1956) law, the magnitudes of independent earthquakes are drawn from a doubly truncated exponential distribution
    images
    Kijko and Singh [487] provide a review of the vast literature on estimating the maximum possible magnitude TM. The energy E released by an earthquake, expressed in megajoules, relates to its magnitude M by
    images
    When transforming the magnitude data back to the energy scale, the Gutenberg–Richter model predicts a truncated Pareto tail (a short derivation is sketched after this list). In Figure 4.28, plotting the Hill estimates we observe a systematic decrease with decreasing k, while the moment and ML‐GPD estimators tend to −1 near k = 1. The estimates of images stay rather stable at a level images. The P‐values of the TB test for upper‐truncation are borderline significant at significance level 0.05 for k ∈ (30, 70). The amount of truncation is estimated at around images. The goodness of fit of the truncated Pareto model is illustrated on the Pareto QQ‐plot of the energy data, where the truncated Pareto model is fitted based on the top 50 values. The maximum magnitude images is then estimated at 3.75 for the Groningen area.
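
The truncated Pareto claim can be seen as follows, assuming the classical energy–magnitude relation log10 E = a + bM (with b = 1.5 in the usual calibration):

```latex
% If M is doubly truncated exponential with rate lambda and E = 10^{a + bM},
% then for admissible energy levels z:
\[
  P(E > z) \;=\; P\!\left(M > \frac{\log_{10} z - a}{b}\right)
  \;\propto\; \exp\!\left(-\lambda\,\frac{\log_{10} z - a}{b}\right)
  \;=\; c\, z^{-\lambda/(b \ln 10)},
\]
% i.e. E has a Pareto tail with index alpha = lambda / (b ln 10),
% truncated at T = 10^{a + b T_M}.
```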

Figure 4.26 UK flood loss data: mean excess plot (xn−k, n, ek, n) (top left); Hill plot images (top right); log‐normal derivative plot images (second line left); Weibull derivative plot images (second line right); γ estimates (third line left); P‐values of TB test (third line right); endpoint estimates images (bottom left); Pareto QQ‐plot with truncated Pareto fit (full line) and Pareto fit (dashed line) (bottom right).


Figure 4.27 Flood loss data Germany: mean excess plot (xn−k, n, ek, n) (top left); Hill plot images (top right); log‐normal derivative plot images (second line left); Weibull derivative plot images (second line right); γ estimates (third line left); P‐values of TB test (third line right); endpoint estimates images (bottom left); Pareto QQ‐plot with truncated Pareto fit (full line) and Pareto fit (dashed line) (bottom right).


Figure 4.28 Earthquake magnitude data from the Groningen area: exponential QQ‐plot based on magnitudes (top left); estimates of γ (top right); TB P‐value plot (middle left); images estimates (middle right); Pareto QQ‐plot of energy values with truncated Pareto fit (bottom left); estimates of maximum magnitude images (bottom right).

4.8 Notes and Bibliography

In the case of the Gumbel domain of attraction with γ = 0, EVA based on fitting a generalized Pareto distribution to POT values is known to exhibit slow convergence rates in many cases. To remedy this, more specific models have been proposed, for example in El Methni et al. [320] and De Valk and Cai [262].

In recent decades a number of papers have appeared on robust estimation methods. Robust methods can improve the quality of an extreme value analysis by providing information on influential observations, deviating substructures and possible mis‐specification of the model, while guaranteeing good statistical properties over a whole set of underlying distributions around the assumed one. On the other hand, an EVA is performed precisely to consider and emphasize the role of extremes. Hence in a risk management context it can hardly be the purpose to delete the most extreme observations when they were correctly reported. Robust and non‐robust estimators then yield different scenarios for risk assessment, which should be compared. An interesting discussion on this can be found in Dell’Aquila and Embrechts [270].

EVA is an active field of research. A notable recent contribution is Naveau et al. [587], which provides an alternative to splicing methods for producing full models in a hydrological context. In De Valk [261] and Guillou et al. [415] some further new modelling approaches in the multivariate case are introduced.

A Bayesian approach to estimating the total cost of claims in XL reinsurance is covered in Hesselager [441]. For the estimation of the Pareto index in XL treaties, see Reiss et al. [644]. Leadbetter [529] studied the connection between tail inference and high‐level exceedance modelling, which is relevant for the XL case. Examples of early statistical analyses of large fire losses are Ramachandran [639], Ramlau‐Hansen [640], and Corradin et al. [227]. Resnick [646] also studied the Danish fire insurance data set. Data for coverages of homes are analysed in Grace et al. [404]. For glass losses, see Ramlau‐Hansen [640]. Property reinsurance for the USA is covered in Gogol [394], for example.
