Check-Pointing and Calibration

In the previous chapter, we learned to differentiate simulations against model parameters in constant time, and produced microbuckets (sensitivities to local volatilities) in Dupire's model with remarkable speed. In this chapter, we present the key check-pointing algorithm, apply it to differentiation through calibrations to obtain market risks out of sensitivities to model parameters, and implement superbucket risk reports (sensitivities to implied volatilities) in Dupire's model.


Reducing RAM consumption with check-pointing

We pointed out in Chapter 11 that, due to RAM consumption, AAD cannot efficiently differentiate calculations taking more than 0.01 seconds on a core. Longer calculations, which include almost all cases of practical relevance, must be divided into pieces that each take less than 0.01 seconds on a core. Each piece is differentiated separately, RAM is wiped in between, and sensitivities are aggregated in the end.

How exactly this is achieved depends on the instrumented algorithm. This is particularly simple in the case of path-wise simulations, where sensitivities are computed path by path and averaged in the end. But this solution is specific to simulations. This chapter discusses a more general solution called check-pointing. Check-pointing applies to many problems of practical relevance, in finance and elsewhere. Huge successfully applied it to the differentiation of multidimensional FDM in [90]. Huge and Savine applied it to efficiently differentiate the LSM algorithm and produce a complete xVA risk in [31]. We explain check-pointing in general mathematical and programmatic terms and apply it to the differentiation of calibration.

Formally, we differentiate a two-step algorithm, that is, a scalar function $F: \mathbb{R}^n \to \mathbb{R}$ that can be written as:

$$F(X) = H[G(X)]$$

where $G: \mathbb{R}^n \to \mathbb{R}^m$ is a vector-valued function and $H: \mathbb{R}^m \to \mathbb{R}$ is a scalar function, assumed differentiable in constant time. We denote $Y = G(X)$ and $z = H(Y) = F(X)$.

In the context of a calibrated financial simulation model, $H$ is the value of some transaction in some model with $m$ parameters $Y$. We learned to differentiate $H$ in constant time in the previous chapter. $G$ is an explicit¹ calibration that produces the $m$ model parameters $Y$ out of $n$ market variables $X$, with $Y = G(X)$. $F$ computes the value $z = F(X)$ of the transaction from the market variables $X$. The differentials of $F$ are the hedge coefficients, also called market risks, that risk reports are designed to produce.

Importantly, there are many good reasons to split differentiation this way, besides minimizing the size of the tape. In the context of financial simulations, the intermediate differentials, that is, the sensitivities of transactions to model parameters and the sensitivities of model parameters to market variables, are aggregated into market risks, but they also constitute useful information for research and risk management in their own right. Besides, differentiating a calibration is fine when it is explicit, but differentiating a numerical calibration may result in unstable, inaccurate sensitivities. We present in Section 13.3 a specific algorithm for the differentiation of numerical calibrations, which is only applicable if we split differentiation between calibration and valuation.

Another example is how we differentiated the simulation algorithm in the previous chapter. We split the simulation algorithm $F$ into an initialization phase $G$, which pre-calculates $m$ deterministic results $Y$ from the model parameters $X$, and a simulation phase $H$, which generates and evaluates paths and produces a value $z$ out of $Y$ and $X$.² We implemented $H$ in a loop over paths, and differentiated each of its iterations separately, before propagating the resulting differentials over $G$. This is a direct application of check-pointing, although we didn't call it by its name at the time. If we hadn't split the differentiation of $F$ into the differentiation of $G$ and the differentiation of $H$, we could not have implemented path-wise differentiation that efficiently. More importantly, we could not have conducted the differentiation of $H$ in parallel. In order to differentiate a parallel calculation in parallel, we must first extract the parallel piece and differentiate it separately from the rest, applying check-pointing to connect the resulting differentials.

As a final example, we could split the processing images of every path into the generation images of the images-dimensional scenario images and the evaluation images of the payoff. We differentiated images altogether in the previous chapter, but for a product with a vast number of cash-flows valued over a high-dimensional path, like an xVA, we would separate the differentiation of the two to limit RAM consumption.

In all these cases, $H$ can be differentiated in constant time with AAD because it is a scalar function. In theory, $F$ is also differentiable in constant time because it is also a scalar function. But we discussed some of the many reasons why it may be desirable to split its differentiation. Hence, the exercise is to split the differentiation of $F$ into a differentiation of $G$ and a differentiation of $H$ while preserving the constant time property. The problem is that $G$ is not a scalar function, hence it cannot be differentiated in constant time with straightforward AAD. To achieve this, we need additional logic, and it is this additional logic that is called check-pointing.

Formally, from the chain rule:

$$\frac{\partial F}{\partial X} = \frac{\partial H}{\partial Y} \frac{\partial G}{\partial X}$$

and our assumption is that we have a constant time computation for $\partial H / \partial Y$. But AAD cannot compute the Jacobian $\partial G / \partial X$ in constant time. We have seen in Chapter 11 that Jacobians take linear time in the number of results $m$. With bumping, it takes linear time in $n$. In any case, it cannot be computed in constant time. Furthermore, $\partial F / \partial X$ is the product of the $1 \times m$ vector $\partial H / \partial Y$ by the $m \times n$ matrix $\partial G / \partial X$, linear in $nm$.

Check-pointing applies adjoint calculus to compute $\partial F / \partial X$ in constant time, in a sequence of steps where the adjoints of $H$ and $G$ are propagated separately, without ever computing a Jacobian or performing a matrix product.

If, hypothetically, we did differentiate $F$ altogether with a single application of AAD, what would the tape look like?

Illustration of a tape, where part of the tape that belongs to H is self-contained for adjoint propagation, because H only depends on Y, and not on G.

It must be this way, because $G$ is computed before $H$, and $H$ only depends on the results of $G$.

The part of the tape that belongs to $H$ is self-contained for adjoint propagation, because $H$ only depends on $Y$, and not on the details of its internal calculations within $G$.³ So the arguments to all calculations within $H$ must be located on $H$'s part of the tape, including $Y$; they cannot belong to $G$'s part of the tape.

The section of the tape that belongs to $G$ (inclusive of inputs $X$ and outputs $Y$) is also self-contained. Evidently, the calculations within $G$ cannot depend on the calculations in $H$, which is evaluated later. And we have seen that the calculations within $H$ cannot directly reference those of $G$, except through its outputs $Y$.

Hence, the tape for $F$ is separable for the purpose of adjoint propagation: it can be split into two self-contained tapes, with a common section $Y$ as the output to $G$'s tape and the input to $H$'s tape.

Illustration of a tape for F which is separable for the purpose of adjoint propagation, split into two self-contained tapes, with a common section Y as the output to G's tape and the input to H's tape.

It should be clear that an overall back-propagation through the entire tape of $F$ is equivalent to, and produces the same results as, two successive back-propagations, first through the tape of $H$, and then through the tape of $G$. Note that the order is reversed from evaluation, where $G$ is evaluated first and $H$ is evaluated last.

It is this separation that allows the multiple benefits of separate differentiations, including a smaller RAM consumption. Only one of the two tapes of $G$ and $H$ is active at a time in memory, and the differentials of $F$ are accumulated through adjoint propagation alone, hence, in constant time. The check-pointing algorithm is articulated below:

  1. Starting with a clean tape, compute and store $Y = G(X)$ without AAD instrumentation. The only purpose here is to compute $Y$. Put $Y$ on tape.
  2. Compute the final result $z = H(Y)$ with an instrumented evaluation of $H$. This builds the tape for $H$.
  3. Back-propagate from $z$ to $Y$, producing the adjoints of $Y$: $\partial H / \partial Y$. Store this result.
  4. Wipe $H$'s tape. It is no longer needed.
  5. Put $X$ on tape.
  6. Recompute $Y = G(X)$ as in step 1, this time with AAD instrumentation. This builds the tape for $G$.
  7. Seed that tape with the adjoints of $Y$, that is the $\partial H / \partial Y$, known from step 3. This is the defining step in the check-pointing algorithm. Instead of seeding the tape with 1 for the end result (and 0 everywhere else), seed it with the known adjoints for all the components of the vector $Y$.
  8. Conduct back-propagation through the tape of $G$, from the known adjoints of $Y$ to those, unknown, of $X$.
  9. The adjoints of $X$ are the final, desired result: $\partial F / \partial X$.
  10. Wipe the tape.

The following figure shows the state of the tape after each step:

Illustration of a tape for H displaying the result of back-propagating from z to Y, producing the adjoints of Y.

It should be clear that this algorithm guarantees all of the following:

  • Constant time computation since only adjoint propagations are involved. Note that the Jacobian of $G$ is never computed. We don't know it at the end of the computation, and we don't need it to produce the end result. Also note that successive functions are propagated in the reverse order to their evaluation. Check-pointing is a sort of “macro-level” AAD where the nodes are not mathematical operations but steps in an algorithm.
  • Correct adjoint accumulation since these computations produce the exact same results as a full adjoint propagation throughout the entire tape for $F$. It is actually the same propagations that are executed, but through one piece of tape at a time.
  • Reduced RAM consumption since only one tape, for $G$ or for $H$, lives in memory at a time.

In code, check-pointing goes as follows:
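The ten steps can be sketched end to end on a toy example. The sketch below does not use the AAD library of this book: `Node`, `Tape`, `toyTape`, `Num`, and `checkPointedGradient` are hypothetical names introduced for illustration only, and the two functions $G(x_0, x_1) = (x_0 x_1, x_0 + x_1)$ and $H(y_0, y_1) = y_0 y_1 + y_0$ are arbitrary choices. Comments refer to the step numbers of the algorithm.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical toy tape, NOT the book's AAD library: each node records up to
// two parent indices, the local partial derivatives, and its own adjoint.
struct Node { int p1, p2; double d1, d2, adj; };

struct Tape {
    std::vector<Node> nodes;
    int record(int p1, double d1, int p2, double d2) {
        nodes.push_back(Node{p1, p2, d1, d2, 0.0});
        return static_cast<int>(nodes.size()) - 1;
    }
    void clear() { nodes.clear(); }
    // Reverse sweep over the whole tape; adjoints must be seeded beforehand.
    void propagate() {
        for (int i = static_cast<int>(nodes.size()) - 1; i >= 0; --i) {
            const Node n = nodes[i];
            if (n.p1 >= 0) nodes[n.p1].adj += n.d1 * n.adj;
            if (n.p2 >= 0) nodes[n.p2].adj += n.d2 * n.adj;
        }
    }
};

Tape toyTape;  // single global tape, for brevity only

struct Num {
    double v;
    int idx;
    explicit Num(double x) : v(x), idx(toyTape.record(-1, 0.0, -1, 0.0)) {}  // leaf
    Num(double x, int i) : v(x), idx(i) {}
    double& adjoint() const { return toyTape.nodes[idx].adj; }
};
Num operator+(const Num& a, const Num& b) {
    return Num(a.v + b.v, toyTape.record(a.idx, 1.0, b.idx, 1.0));
}
Num operator*(const Num& a, const Num& b) {
    return Num(a.v * b.v, toyTape.record(a.idx, b.v, b.idx, a.v));
}

// Toy functions, templated so they run with double or with Num.
template <class T> std::vector<T> G(const std::vector<T>& x) {
    return {x[0] * x[1], x[0] + x[1]};
}
template <class T> T H(const std::vector<T>& y) { return y[0] * y[1] + y[0]; }

std::vector<double> checkPointedGradient(const std::vector<double>& X) {
    toyTape.clear();
    const std::vector<double> Yd = G(X);           // 1. Y = G(X), no instrumentation
    std::vector<Num> Y;
    for (double y : Yd) Y.emplace_back(y);         //    put Y on tape
    Num z = H(Y);                                  // 2. instrumented H builds H's tape
    z.adjoint() = 1.0;                             // 3. back-propagate from z to Y
    toyTape.propagate();
    std::vector<double> dHdY;
    for (const Num& y : Y) dHdY.push_back(y.adjoint());
    toyTape.clear();                               // 4. wipe H's tape
    std::vector<Num> Xn;
    for (double x : X) Xn.emplace_back(x);         // 5. put X on tape
    const std::vector<Num> Yn = G(Xn);             // 6. instrumented G builds G's tape
    for (std::size_t i = 0; i < Yn.size(); ++i)    // 7. seed with the known adjoints of Y
        Yn[i].adjoint() = dHdY[i];
    toyTape.propagate();                           // 8. back-propagate through G's tape
    std::vector<double> dFdX;
    for (const Num& x : Xn) dFdX.push_back(x.adjoint());  // 9. adjoints of X
    toyTape.clear();                               // 10. wipe the tape
    return dFdX;
}
```

Only one of the two tapes exists at any time, and no Jacobian of `G` is ever formed: the two reverse sweeps are the only differentiation work performed.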


We could write a generic higher-order function to encapsulate check-pointing logic, but we refrain from doing so. Check-pointing is best left to client code, which can implement it in the manner best suited to each situation. The AAD library provides all the basic constructs to implement check-pointing easily and in a flexible manner.

Note that the algorithm also works in the more general context where:

$$F(X) = H[X, G(X)]$$

because, then:

$$\frac{\partial F}{\partial X} = \frac{\partial H}{\partial X} + \frac{\partial H}{\partial Y} \frac{\partial G}{\partial X}$$

The left-hand term $\partial H / \partial X$ is computed in constant time by differentiation of $H$ (we can do that: it has been our working hypothesis all along). The right-hand term $(\partial H / \partial Y)(\partial G / \partial X)$ is computed by check-pointing.

Alternatively, we may redefine $G$ to return the $n$ coordinates of $X$ in addition to its result $Y$, and we are back to the initial case where $F(X) = H[G(X)]$.

This concludes our general discussion of check-pointing. Check-pointing applies in a vast number of contexts, to the point that every nontrivial AAD instrumentation involves some form of check-pointing, including the instrumentation of our simulation library in the previous chapter, as pointed out earlier. In the next section, we apply check-pointing to calibration and the production of market risks. In the meantime, we quickly discuss its application to black box code.

Check-pointing black box code

AAD instrumentation cannot be partial. The entire calculation code must be instrumented, and all the active code must be templated. A partial instrumentation would break the chain rule and prevent adjoints from propagating correctly through non-instrumented active calculations, resulting in wrong differentials. It follows that all the source code implementing a calculation must be available, and modifiable, so it may be instantiated with the Number type.

In a real-world production environment, this is not always the case. It often happens that part of the calculation code is a black box. We can call this code to conduct some intermediate calculations, but we cannot easily see or modify the source code. The routine may be part of third-party software with signatures in headers and binary libraries, but no source code. Or, the source code may be written in a different language. Or, the source is available but cannot be modified, for technical, policy, or legal reasons.

Or maybe we could instrument the code but we don't want to. An intensive calculation code with a low number of active inputs may be best differentiated either analytically or with finite differences. Or, as we will see in the case of a numerical calibration, some code must be differentiated in a specific manner, because a blind differentiation, either with finite differences or AAD, would result in wrong or unstable derivatives. This applies to many iterative algorithms, like eigenvalue decomposition, Cholesky decomposition, or SVD regression, as noted by Huge in [93].

In all these cases, we have an intermediate calculation that remains non-instrumented, and is differentiated in its own specific way, which may or may not be finite differences.⁴ Check-pointing allows us to connect this piece consistently within a larger differentiation, the rest of the calculation being differentiated in constant time with AAD.

For example, consider the differentiation of a calculation $F$ that is evaluated in three steps, $G$, $H$, and $K$:

$$F(X) = K\{H[G(X)]\}$$

where $H$ is the black box. It is not instrumented, and its Jacobian is computed by specific means, perhaps finite differences. The problem is to conduct the rest of the differentiation in constant time with AAD, and connect the Jacobian of the black box without breaking the derivatives chain. We have discussed a workaround in Chapter 8 in the context of manual adjoint code. In the context of automatic adjoint differentiation, we have two choices.

We could overload $H$ and make it a building block in the AAD library, like we did for the Gaussian functions in Chapter 10. This solution invades and grows the AAD library. It is recommended when $H$ is a low-level, general-purpose algorithm, called from many places in the software.

In most situations, however, $H$ would be a necessary intermediate calculation in a specific context, which doesn't justify an invasion of the AAD library. All we need is a workaround in the instrumentation of $F$, along the lines of Chapter 8, but with automatic adjoint propagation. We can implement such a workaround with check-pointing. Denote:

$$Y = G(X), \quad Z = H(Y), \quad z = K(Z) = F(X)$$

then, by the chain rule:

$$\frac{\partial F}{\partial X} = \frac{\partial K}{\partial Z} \frac{\partial H}{\partial Y} \frac{\partial G}{\partial X}$$

We start with a non-instrumented calculation of $Y = G(X)$, as is customary with check-pointing. Next, we compute the value $Z = H(Y)$ of the black box, as well as its Jacobian $\partial H / \partial Y$, computed, as discussed, by specific means.

Knowing $Z$, we compute the gradient $\partial K / \partial Z$ of $K$, a row vector whose dimension is that of $Z$, in constant time, with AAD instrumentation. We multiply it on the right by $\partial H / \partial Y$ to find:

$$\frac{\partial K}{\partial Z} \frac{\partial H}{\partial Y} = \frac{\partial z}{\partial Y}$$

This row vector, whose dimension is that of $Y$, is by definition the adjoint of $Y$ in the calculation of $F$. We can therefore apply the check-pointing algorithm. Execute an instrumented instance of:

$$Y = G(X)$$

which builds the tape of $G$, seed the adjoints of the results $Y$ with the known $\partial z / \partial Y$, and back-propagate to find the desired adjoints of $X$, that is $\partial F / \partial X$.

We successfully applied check-pointing to connect the specific differentiation of the black box with the rest of the differentiated calculation. The differentiation of the black box takes the time it takes, and a matrix-by-vector product is necessary for the connection, but the rest of the differentiation proceeds with AAD in constant time.
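A minimal sketch of this connection on toy functions follows. Everything in it is an assumption made for illustration: `stepG`, `blackBoxH`, `stepK`, `bumpJacobian`, and `gradientThroughBlackBox` are hypothetical names, the three functions are arbitrary, and the hand-written `adjointG` and `gradientK` merely stand in for what AAD would produce by back-propagation over a tape.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // jac[i][j] = d(output i) / d(input j)

// Toy first step, R^2 -> R^2, and its hand-written adjoint (stand-in for AAD):
// given the adjoints of the outputs, return the adjoints of the inputs.
Vec stepG(const Vec& x) { return {x[0] * x[1], x[0] + x[1]}; }
Vec adjointG(const Vec& x, const Vec& adjY) {
    return {adjY[0] * x[1] + adjY[1], adjY[0] * x[0] + adjY[1]};
}

// Toy black box, R^2 -> R^2: we pretend its source cannot be instrumented.
Vec blackBoxH(const Vec& y) { return {y[0] + 2.0 * y[1], y[0] * y[1]}; }

// Toy last step, R^2 -> R, and its gradient (stand-in for AAD).
double stepK(const Vec& z) { return z[0] * z[1]; }
Vec gradientK(const Vec& z) { return {z[1], z[0]}; }

// Jacobian of the black box by one-sided finite differences.
Mat bumpJacobian(const std::function<Vec(const Vec&)>& f, Vec y, double eps = 1e-6) {
    const Vec base = f(y);
    Mat jac(base.size(), Vec(y.size(), 0.0));
    for (std::size_t j = 0; j < y.size(); ++j) {
        const double saved = y[j];
        y[j] = saved + eps;
        const Vec bumped = f(y);
        y[j] = saved;
        for (std::size_t i = 0; i < base.size(); ++i)
            jac[i][j] = (bumped[i] - base[i]) / eps;
    }
    return jac;
}

Vec gradientThroughBlackBox(const Vec& x) {
    const Vec y = stepG(x);                      // non-instrumented first step
    const Vec z = blackBoxH(y);                  // value of the black box
    const Mat jac = bumpJacobian(blackBoxH, y);  // its Jacobian, by specific means
    const Vec adjZ = gradientK(z);               // gradient of the last step
    Vec adjY(y.size(), 0.0);                     // adjoint of Y: row vector times Jacobian
    for (std::size_t i = 0; i < adjZ.size(); ++i)
        for (std::size_t j = 0; j < adjY.size(); ++j)
            adjY[j] += adjZ[i] * jac[i][j];
    return adjointG(x, adjY);                    // seed the first step and back-propagate
}
```

The only matrix-by-vector product is the one that connects the black box; the first and last steps are handled by adjoint propagation alone.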


Dupire's formula

We now turn toward financial applications of the check-pointing algorithm, more precisely, the important matter of the production of market risks.

So far in this book, we implemented Dupire's model with a given local volatility surface $\sigma(S, t)$, represented in practice by a bilinearly interpolated matrix. Its differentiation produced the derivatives of some transaction's value in the model with respect to this local volatility matrix.

But this is not the application Dupire meant for his model. Traders are not interested in risks to a theoretical, abstract local volatility. Dupire's model is meant to be calibrated to the market prices of European calls and puts, or, equivalently, to market-implied Black and Scholes volatilities, so that its values are consistent with the market prices of European options. Its risk sensitivities are then derivatives to implied volatilities, which represent the market prices of concrete instruments that traders may buy and sell to hedge the sensitivities of their transactions.

Dupire's model is unique in that its calibration is explicit. The calibrated local volatility is expressed directly as a function of the market prices of European calls by Dupire's famous formula [12]:


where $C(K, T)$ is today's price of the European call of strike $K$ and maturity $T$, and subscripts denote partial derivatives.

Dupire's formula may be elegantly demonstrated in a couple of lines with Laurent Schwartz's generalized derivatives and Tanaka's formula (essentially an extension of Ito's lemma in the sense of distributions), following the footsteps of Savine, 2001 [44].

By application of Tanaka's formula to the function images under Dupire's dynamics images, we find:


where images is the Dirac mass. Taking (risk-neutral) expectations on both sides:


where images is the (risk-neutral) density of images in images, and since images, we have Dupire's result:


Similar formulas are found with this methodology in extensions of Dupire's model with rates, dividends, stochastic volatility, and jumps; see [44].

The Implied Volatility Surface (IVS)

Dupire's formula refers to today's prices of European calls of all strikes $K$ and maturities $T$, or, equivalently, the continuous surface of Black and Scholes's market-implied volatilities $\hat{\sigma}(K, T)$. In Chapter 4, we pointed out that this is also equivalent to marginal risk-neutral densities for all maturities, and called this continuous surface of market prices an Implied Volatility Surface or IVS.

The IVS must satisfy some fundamental properties to feed Dupire's formula: it must be continuous, differentiable in $T$, and twice differentiable in $K$. $C_{KK}$ must be strictly positive, meaning call prices must be strictly convex in strike. We also require that $C_T > 0$, meaning call prices are strictly increasing in maturity. $C_T < 0$ or $C_{KK} < 0$ would allow a static arbitrage (see for instance [46]) so any non-arbitrageable IVS guarantees that $C_T \geq 0$ and $C_{KK} \geq 0$. But this is not enough. We need strict positiveness as well as continuity and differentiability.

We pointed out in Chapter 4 that the market typically provides prices for a discrete number of European options, and that interpolating a complete, continuous, differentiable IVS out of these prices is not a trivial exercise. The implementation of the accepted solutions in the industry, including Gatheral's SVI ([40]) and Andreasen and Huge's LVI ([41]), is out of our scope here.

We circumvent this difficulty by defining an IVS from Merton's jump-diffusion model of 1976 [98]. Merton's model is an extension of Black and Scholes where the underlying asset price is not only subject to a diffusion, but also random discontinuities, or jumps, occurring at random times and driven by a Poisson process:


where images is a Poisson process with intensity images and the imagess are a collection IID random variables such that images. The Poisson process and the jumps are independent from each other and independent from the Brownian motion. Jumps are roughly Gaussian with mean images and variance images. images guarantees that images satisfies the martingale property so the model remains non-arbitrageable.

Merton demonstrated that the price of a European call in this model can be expressed explicitly as a weighted average of Black and Scholes prices:


where images is Black and Scholes's formula. The model is purposely written so the distribution of images, conditional to the number images of jumps, is log-normal with known mean and variance. The conditional expectation of the payoff is therefore given by Black and Scholes's formula, with a different forward and variance depending on the number of jumps. It follows that the price, the unconditional expectation, is the average of the conditional expectations, weighted by the distribution of the number of jumps. The distribution of a Poisson process is well known, and Merton's formula follows.

The terms of the infinite sum die out quickly because of the factorial, so it is safe, in practice, to limit the sum to its first 5–10 terms. The formula is implemented in analytics.h, along with Black and Scholes's formula.
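A minimal sketch of such a truncated sum is given below. It is not the code of analytics.h: the names `normalCdf`, `blackScholes`, and `mertonCall` are hypothetical, and the parameterization is an assumption, namely zero rates and dividends, log-jump sizes that are Gaussian with mean `meanJump` and standard deviation `stdJump`, and a drift compensated so that the spot is a martingale. Conditionally on $n$ jumps, the call is a Black and Scholes price with an adjusted forward and variance, weighted by the Poisson probability of $n$ jumps.

```cpp
#include <cassert>
#include <cmath>

// Standard normal cumulative distribution.
double normalCdf(double x) { return 0.5 * std::erfc(-x / std::sqrt(2.0)); }

// Black and Scholes call, zero rates and dividends.
double blackScholes(double spot, double strike, double vol, double mat) {
    const double stdev = vol * std::sqrt(mat);
    const double d1 = std::log(spot / strike) / stdev + 0.5 * stdev;
    return spot * normalCdf(d1) - strike * normalCdf(d1 - stdev);
}

// Merton call as a Poisson-weighted average of Black and Scholes prices,
// truncated after 'terms' terms (assumed parameterization, see lead-in).
double mertonCall(double spot, double strike, double vol, double mat,
                  double intensity, double meanJump, double stdJump,
                  int terms = 10) {
    const double varJump = stdJump * stdJump;
    const double mv2 = meanJump + 0.5 * varJump;            // log of the expected jump factor
    const double comp = intensity * (std::exp(mv2) - 1.0);  // martingale compensator
    const double lambdaT = intensity * mat;
    double prob = std::exp(-lambdaT);                       // probability of zero jumps
    double price = 0.0;
    for (int n = 0; n < terms; ++n) {
        const double fwd = spot * std::exp(n * mv2 - comp * mat);       // conditional forward
        const double volN = std::sqrt(vol * vol + n * varJump / mat);   // conditional volatility
        price += prob * blackScholes(fwd, strike, volN, mat);
        prob *= lambdaT / (n + 1);                          // Poisson recursion to n + 1 jumps
    }
    return price;
}
```

With zero intensity, only the first term survives and the function collapses to Black and Scholes, which gives a quick sanity check.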

We are using a continuous-time, arbitrage-free model to define the IVS; therefore, the properties necessary to feed Dupire's formula are guaranteed. In addition, Merton's model is known to produce realistic IVS with a shape similar to major equity derivatives markets.

We are using a fictitious Merton market in place of the “real” market so as to get around some technical difficulties unrelated to the purpose of this document. This is evidently for illustration purposes only and not for production.

We declare the IVS as a polymorphic class that provides a Black and Scholes market-implied volatility for all strikes and maturities. The implementation is simplified in that it ignores rates or dividends. The following code is found in ivs.h:


where the concrete IVS derives images to provide a volatility surface. The IVS also provides a method for the pricing of European calls:


where the function images is implemented in analytics.h, templated. By application of Dupire's formula, the IVS also provides the local volatility for a given spot and time:
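A minimal sketch of a local volatility obtained from call prices by finite differences follows. It is not the listing of ivs.h. The function name `dupireLocalVol`, the bump sizes, and the use of `std::function` are assumptions made for illustration. The sketch further assumes zero rates and a lognormal convention $dS/S = \sigma(S,t)\,dW$, in which case $\sigma^2(K,T) = 2 C_T / (K^2 C_{KK})$; if the model is written as $dS = \sigma(S,t)\,dW$ instead, the division by the strike must be dropped.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Standard normal cumulative distribution.
double normalCdf(double x) { return 0.5 * std::erfc(-x / std::sqrt(2.0)); }

// Black and Scholes call, zero rates and dividends.
double blackScholes(double spot, double strike, double vol, double mat) {
    const double stdev = vol * std::sqrt(mat);
    const double d1 = std::log(spot / strike) / stdev + 0.5 * stdev;
    return spot * normalCdf(d1) - strike * normalCdf(d1 - stdev);
}

// Local volatility at (strike, mat) from a call price surface.
// 'call' maps (strike, maturity) to today's call price.
double dupireLocalVol(const std::function<double(double, double)>& call,
                      double strike, double mat,
                      double dK = 1e-2, double dT = 1e-4) {
    // First derivative to maturity, central difference.
    const double cT = (call(strike, mat + dT) - call(strike, mat - dT)) / (2.0 * dT);
    // Second derivative to strike, central difference: the risk-neutral density.
    const double cKK = (call(strike + dK, mat) - 2.0 * call(strike, mat)
                        + call(strike - dK, mat)) / (dK * dK);
    // Lognormal convention (assumed): divide by the strike.
    return std::sqrt(2.0 * cT / cKK) / strike;
}
```

On a flat Black and Scholes market, the local volatility recovered this way should coincide with the constant implied volatility, which makes for a convenient test of the convention and the bump sizes.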


As discussed, we define a concrete IVS from Merton's model. All a concrete IVS must do is derive images:


where images is an implementation of Merton's formula, and images implements a numerical procedure to find an implied volatility from an option price. Both are implemented in analytics.h.
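The numerical procedure that recovers an implied volatility from a price can be as simple as a bisection, since the Black and Scholes price is increasing in volatility. The sketch below is an assumption made for illustration, not the routine of analytics.h; the name `impliedVolBisect`, the search bounds, and the tolerance are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Standard normal cumulative distribution.
double normalCdf(double x) { return 0.5 * std::erfc(-x / std::sqrt(2.0)); }

// Black and Scholes call, zero rates and dividends.
double blackScholes(double spot, double strike, double vol, double mat) {
    const double stdev = vol * std::sqrt(mat);
    const double d1 = std::log(spot / strike) / stdev + 0.5 * stdev;
    return spot * normalCdf(d1) - strike * normalCdf(d1 - stdev);
}

// Implied volatility by bisection on [lo, hi].
double impliedVolBisect(double price, double spot, double strike, double mat,
                        double lo = 1e-4, double hi = 5.0, double tol = 1e-10) {
    // The target price must be bracketed by the prices at the two bounds.
    assert(price > blackScholes(spot, strike, lo, mat));
    assert(price < blackScholes(spot, strike, hi, mat));
    while (hi - lo > tol) {
        const double mid = 0.5 * (lo + hi);
        if (blackScholes(spot, strike, mid, mat) < price) lo = mid;  // vol too low
        else hi = mid;                                               // vol too high
    }
    return 0.5 * (lo + hi);
}
```

Bisection is slow but robust; a production routine would typically switch to a faster root search once the root is bracketed.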

We implemented a generic framework for IVS. Although we only implemented one concrete IVS, and a particularly simple one that defines the market from Merton's model, we could implement any other concrete IVS, including:

  • Hagan's SABR [36] with parameters interpolated in maturity and underlying, as is market practice for interest rate options,
  • Heston's stochastic volatility model [42] with parameters interpolated in maturity, as is market practice for foreign exchange options,⁵
  • Gatheral's SVI implied volatility interpolation [40], as is market standard for equity derivatives, or
  • Andreasen and Huge's recent award-winning LVI [41] arbitrage-free interpolation.

Any concrete IVS implementation must only override the images method to provide a Black and Scholes implied volatility for any strike and maturity. The rest, in particular the computation of Dupire's local volatility, is on the base IVS.

Calibration of Dupire's model

It is easy to calibrate Dupire's model to an IVS; all it takes is an implementation of Dupire's formula. The formula guarantees that the resulting local volatility surface in Dupire's model matches the option prices in the IVS. We write a free calibration function in mcMdlDupire.h. It accepts a target IVS, a grid of spots and times, and returns a local volatility matrix, calibrated sequentially in time:


It is convenient to conduct the calibration sequentially in time, although our Dupire stores local volatility in spot major. For this reason, we calibrate a temporary volatility matrix in time major, and return its transpose (defined in matrix.h). We calibrate each time slice independently with the free function dupireCalibMaturity() defined in mcMdlDupire.h:


Finally, we have the following higher-level function in main.h for our application:


It takes around 50 milliseconds to calibrate a local volatility grid of 30 spots between 50 and 200 and 60 times between now and 5 years, to a Merton IVS with spot 100, volatility 15, jump intensity 5, mean −15 and standard deviation 10. With 150 spots and 260 times, it takes 400 ms. Calibration is embarrassingly parallel and trivially multi-threadable across maturities. This is left as an exercise.

We can easily test the quality of the calibration. Initialize Dupire's model with the result of the calibration. Price a set of European options of different strikes and maturities (developed as a single product with multiple payoffs on page 238) by simulation in this model, and compare with Merton's price as implemented in the images function in analytics.h in closed-form. In our tests, Dupire and Merton prices match within a couple of basis points over a wide range of strikes and maturities (with 500,000 paths, weekly time steps, where a parallel Sobol pricing of 20 European calls with maturities up to three years takes 400 milliseconds).

Risk views

The process calibration + simulation produces the value of a transaction out of market-implied volatilities. Its differentials are sensitivities to market-traded variables, more relevant for trading and hedging than sensitivities to model parameters:

Illustration for conducting adjoint propagation through the tape of G, from the known adjoints of Y to those, unknown, of X.

Model parameters are obtained from market variables with a prior calibration step. Model sensitivities are obtained with AAD as explained and developed in Chapter 12. We can therefore obtain the market sensitivities by check-pointing the model sensitivities into calibration.

We developed, in the previous chapter, functionality to obtain the microbucket images in constant time. We check-point this result into calibration to obtain images, what Dupire calls a superbucket.

We are missing one piece of functionality: our IVS $\hat{\sigma}(K, T)$ is defined in derived IVS classes, from a set of parameters whose nature depends on the concrete IVS. For instance, the Merton IVS is parameterized with a continuous volatility, jump intensity, and the mean and standard deviation of jumps. The desired derivatives are not to the parameters of the concrete IVS, but to a discrete set of Black and Scholes market-implied volatilities, irrespective of how these volatilities are produced or interpolated.

To achieve this result, we are going to use a neat technique that professional financial system developers typically apply in this situation: we are going to define a risk surface:


such that if we denote images the implied volatilities given by the concrete IVS, our calculations will not use these original implied volatilities, but implied volatilities shifted by the risk surface:


Further, we interpolate the risk surface images from a discrete set of knots:


that we call the risk view. All the knots are set to 0, so:


so the results of all calculations remain evidently unchanged by shifting implied volatilities by zero, but in terms of risk, we get:


The risk view does not affect the value, and its derivatives exactly correspond to derivatives to implied volatilities, irrespective of how these implied volatilities are computed.

We compute sensitivities to implied volatilities as sensitivities to the risk view:


Risk views apply to bumping as well as AAD and are extremely useful, in many contexts, to aggregate risks over selected market instruments.
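The mechanics can be illustrated with a one-dimensional risk view and bump risk. The sketch below is a simplified assumption, not the risk view of this chapter: `RiskView1D`, `shiftedCall`, and `bumpRisk` are hypothetical names, the knots span strikes only, the market is a flat 20% Black and Scholes volatility with spot 100, and the shift is interpolated with smoothstep.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Standard normal cumulative distribution.
double normalCdf(double x) { return 0.5 * std::erfc(-x / std::sqrt(2.0)); }

// Black and Scholes call, zero rates and dividends.
double blackScholes(double spot, double strike, double vol, double mat) {
    const double stdev = vol * std::sqrt(mat);
    const double d1 = std::log(spot / strike) / stdev + 0.5 * stdev;
    return spot * normalCdf(d1) - strike * normalCdf(d1 - stdev);
}

// Hypothetical one-dimensional risk view: knots in strike, all shifts at zero.
struct RiskView1D {
    std::vector<double> strikes;
    std::vector<double> shifts;
    explicit RiskView1D(std::vector<double> k)
        : strikes(std::move(k)), shifts(strikes.size(), 0.0) {}
    // Smoothstep interpolation of the shift, flat outside the knots.
    double shift(double k) const {
        if (k <= strikes.front()) return shifts.front();
        if (k >= strikes.back()) return shifts.back();
        const std::size_t hi = static_cast<std::size_t>(
            std::upper_bound(strikes.begin(), strikes.end(), k) - strikes.begin());
        const double t = (k - strikes[hi - 1]) / (strikes[hi] - strikes[hi - 1]);
        return shifts[hi - 1] + t * t * (3.0 - 2.0 * t) * (shifts[hi] - shifts[hi - 1]);
    }
};

// Value computed with the implied volatility shifted by the risk view.
double shiftedCall(const RiskView1D& rv, double strike, double mat) {
    const double marketVol = 0.20;  // assumed flat market
    return blackScholes(100.0, strike, marketVol + rv.shift(strike), mat);
}

// Bump each knot in turn: one sensitivity per knot of the risk view.
std::vector<double> bumpRisk(RiskView1D rv, double strike, double mat, double bump = 1e-5) {
    const double base = shiftedCall(rv, strike, mat);
    std::vector<double> risk(rv.shifts.size());
    for (std::size_t i = 0; i < rv.shifts.size(); ++i) {
        rv.shifts[i] = bump;
        risk[i] = (shiftedCall(rv, strike, mat) - base) / bump;
        rv.shifts[i] = 0.0;  // restore the knot
    }
    return risk;
}
```

For a call struck halfway between two knots, the vega is split evenly between these two knots and every other knot shows zero risk, whatever the model that produced the implied volatilities.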

In the context of Dupire's model, we apply a risk view over an IVS fed to Dupire's formula. Dupire's formula depends on the first- and second-order derivatives of call prices, so the risk view must be differentiable. (Bi-)linear interpolation is not an option. We must implement a smooth interpolation. A vast number of smooth interpolations exist in the literature, but what we need is a localized one; otherwise the resulting risk spills over the volatility surface. For these reasons, we implement a well-known, simple, localized and efficient smooth interpolation algorithm called smoothstep, presented in many places, including Wikipedia's “Smoothstep” article. Like linear interpolation, smoothstep interpolation finds $i$ such that $x_i \leq x < x_{i+1}$, and, unlike linear interpolation, which returns:

$$y_i + t \, (y_{i+1} - y_i)$$

where $t = \frac{x - x_i}{x_{i+1} - x_i}$, smoothstep returns:

$$y_i + t^2 (3 - 2t) \, (y_{i+1} - y_i)$$
Practically, we upgrade the images function of Chapter 6 to implement either linear or smoothstep interpolation. We also produce a two-dimensional variant:
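A minimal one-dimensional sketch of an interpolation with both modes is shown below. It is an assumption made for illustration, not the upgraded function of Chapter 6: the name `interp1d`, its signature, and the flat extrapolation outside the knots are hypothetical choices.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Interpolate ys over sorted knots xs at point x, flat outside the knots.
// smooth == false: linear interpolation; smooth == true: smoothstep.
double interp1d(const std::vector<double>& xs, const std::vector<double>& ys,
                double x, bool smooth) {
    assert(xs.size() == ys.size() && !xs.empty());
    if (x <= xs.front()) return ys.front();
    if (x >= xs.back()) return ys.back();
    // First knot strictly above x, so that xs[hi - 1] <= x < xs[hi].
    const std::size_t hi = static_cast<std::size_t>(
        std::upper_bound(xs.begin(), xs.end(), x) - xs.begin());
    const std::size_t lo = hi - 1;
    double t = (x - xs[lo]) / (xs[hi] - xs[lo]);
    if (smooth) t = t * t * (3.0 - 2.0 * t);  // zero slope at both ends of the interval
    return ys[lo] + t * (ys[hi] - ys[lo]);
}
```

Smoothstep still hits the knot values exactly and only uses the two neighboring knots, so it stays localized, but its derivative vanishes at the knots, which removes the kinks of linear interpolation.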


Armed with smooth interpolation, we can define the RiskView object in ivs.h. Note that (contrary to the IVS) the risk view is templated, since we will be computing derivatives to its knots, and we want to do that with AAD:


This code should be self-explanatory; the risk view is nothing more than a two-dimensional interpolation object with convenient accessors and iterators and knots set to zero. The method images modifies one knot by a small amount images, from zero to images, so we can apply bump risk.

The next step is to effectively incorporate the risk view in the calculation. We extend the methods images and images on the base IVS so they may be called with a risk view.


The modification is minor. A shift, interpolated from the risk view, is added to the implied volatility for the computation of call prices, hence local volatilities. Since the risk view is set to zero, this doesn't modify results, but it does produce risk with respect to the risk view's knots.

Finally, we apply the same minor modification to the calibration functions in McMdlDupire.h so they accept an optional risk view:



We now have all the pieces to compute superbuckets by check-pointing. We build a higher-level function in main.h that executes the steps of our check-pointing algorithm:


The first check-pointing step is to compute local volatilities by calibration (after initialization of the tape) and store the calibrated model in memory.


Next, we compute the microbucket:


where images is essentially a wrapper around images from the previous chapter, with an interface specific to Dupire that returns a delta and a microbucket matrix:


The next part is the crucial one for the check-pointing algorithm: we clean the tape, build the risk view (which puts it on tape), and conduct calibration again, this time in instrumented mode:


We seed the calibration tape with the microbucket obtained earlier:


and propagate back to the risk view:


This completes the computation. We pick the desired derivatives as the adjoints of the knots in the risk view, clean the tape, and return the results:


We illustrated two powerful and general techniques for the production of financial risk sensitivities: the risk view, which allows risks to be aggregated over instruments selected by the user, irrespective of calibration or the definition of the market; and check-pointing, which separates the differentiation of simulations from the differentiation of calibration, so that the differentiation of simulations may be performed with path-wise AAD, with limited RAM consumption, in parallel and in constant time.

Finally, note that we left delta unchanged through calibration. We are returning the Dupire delta, the sensitivity to the spot with local volatility unchanged. With the superbucket information, we could easily adjust delta for any smile dynamics assumption (sticky, sliding, …). There is no strong consensus within the trading and research community as to what the “correct” delta is, but most convincing research points to the Dupire delta as the correct delta within a wide range of models; see, for instance, [99].

Finite difference superbucket risk

As a reference, and in order to test the results of the check-pointing algorithm, we implement a superbucket bump risk, where we differentiate, with finite differences, the whole process calibration + simulation in a trivial manner in main.h:



We start with the superbucket of a European call of maturity 3 years, strike 120, over a risk view with 14 knot strikes, every 10 points between 50 and 180, and 5 maturities every year between 1y and 5y.

We define the European option market as a Merton market with volatility 15, jump intensity 5, mean jump −15 and jump standard deviation 10. We calibrate a local volatility matrix with 150 spots between 50 and 200, and 60 times between now and 5 years.6 We simulate with 300,000 Sobol points in parallel over 312 (biweekly) time steps.

The Merton price is 4.25. Dupire's price is off two basis points at 4.23. The corresponding Black and Scholes implied volatility is 15.35. The Black and Scholes vega is 59. We should expect a superbucket with 59 on the strike 120, maturity 3 years, and zero everywhere else. The superbucket is obtained in around two seconds. Almost all of it is simulation. Calibration and check-pointing are virtually free.
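The Black and Scholes figures quoted above can be checked with a few lines of code. The sketch below assumes a spot of 100 and zero rates and dividends, neither of which is stated in this section, and the function names are hypothetical. Vega is expressed per unit of volatility.

```cpp
#include <cmath>

// Standard normal density
double normalDensity(const double x)
{
    const double pi = 3.14159265358979323846;
    return std::exp(-0.5 * x * x) / std::sqrt(2.0 * pi);
}

// Standard normal cumulative distribution, from the complementary error function
double normalCdf(const double x)
{
    return 0.5 * std::erfc(-x / std::sqrt(2.0));
}

// Black and Scholes call price, ASSUMING zero rates and dividends
double blackScholesCall(
    const double spot, const double strike, const double vol, const double mat)
{
    const double stdev = vol * std::sqrt(mat);
    const double d1 = std::log(spot / strike) / stdev + 0.5 * stdev;
    const double d2 = d1 - stdev;
    return spot * normalCdf(d1) - strike * normalCdf(d2);
}

// Black and Scholes vega, derivative of the price to vol, same assumptions
double blackScholesVega(
    const double spot, const double strike, const double vol, const double mat)
{
    const double stdev = vol * std::sqrt(mat);
    const double d1 = std::log(spot / strike) / stdev + 0.5 * stdev;
    return spot * std::sqrt(mat) * normalDensity(d1);
}
```

Under these assumptions, `blackScholesVega(100.0, 120.0, 0.1535, 3.0)` evaluates to roughly 59, and `blackScholesCall` with the same arguments lands within a couple of cents of the Dupire price of 4.23.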

With the improvements of Chapter 15, the superbucket is produced in one second.

The resulting superbucket is displayed on the chart below.

Chart displaying the resulting superbucket for the 3 years 120 call.

This is a good-quality superbucket, especially given it was obtained with simulations in just over a second. Superbuckets traditionally obtained with FDM are of similar quality and also take around a second to compute. Once again, we notice that AAD and parallelism bring FDM performance to Monte-Carlo simulations.

With the same settings (10 points spacing on the risk view), we compute the superbucket for a 2 years 105 call (first chart) and a 2.5 years 85 call (second chart). The results are displayed below. The calculation is proportionally faster for lower maturities, linearly in the total number of time steps. We see that vega is correctly interpolated over the risk view (with limited spilling that eventually disappears as we increase the number of paths and time steps).

Chart displaying the superbucket for a 2 years 105 call, with the same settings (10 points spacing on the risk view).
Chart displaying the superbucket for a 2.5 years 85 call, with the same settings (10 points spacing on the risk view).

Finally, with a maturity of 3 years, strike 120, and a (biweekly monitored) barrier of 150 (with a barrier smoothing of 1), we obtain the following. The calculation time is virtually unchanged, the barrier monitoring cost being essentially negligible.

Chart displaying the superbucket for a 3 years 120 call with a (biweekly monitored) barrier of 150 (with a barrier smoothing of 1).

This barrier superbucket has the typical, expected shape for an up-and-out call: positive vega concentrated at maturity on the strike; negative vega below the barrier, also concentrated at maturity (with some spilling over the preceding maturity on the risk view from the interpolation); and positive vega on the barrier, which partly unwinds the negative vega (perhaps counterintuitive, but a systematic observation nonetheless).

Comparing with a bump risk for performance and correctness, we find that finite differences produce a very similar risk report in around 45 seconds. For a 3y maturity, it could be reduced to 30 seconds by only bumping active volatilities. We are computing “only” 42 risk sensitivities (14 strikes and 3 maturities up to 3y on the risk view), so AAD acceleration is less impressive here: times 30, probably down to times 20 with a smarter implementation of the bump risk.

However, in this particular case, AAD risk is also much more stable. The results of the bumped superbucket depend on the size of the bumps and the spacing of the local volatility and the risk view, in an unstable, explosive manner. Bumping frequently produces results in the thousands in random cells where vega is expected in the tens. AAD superbuckets are resilient and stable, because derivatives are computed analytically, without ever actually changing the initial conditions.

Finally, the quality of the superbucket depends on how sparse the risk view is, and rapidly deteriorates when more strikes and maturities are added to it. This is a problem with superbuckets known in the industry: obtaining decent superbuckets over a thinly spaced risk view forces us to increase the number of time steps at the expense of speed. It helps to implement simulation schemes more sophisticated than Euler's.


Iterative calibration

We mentioned that explicit, analytic calibration was an exceptional feature of Dupire's model. Other models are typically calibrated numerically. In general terms, calibration proceeds as follows. To calibrate the images parameters images of a model (say the images local volatilities in a simulation model a la Dupire) to a market (say an IVS with a risk view images), pick images instruments (say European calls of different strikes and maturities) and find the model parameters so that the model price of these instruments matches their market price.

Denote images the market price of the instrument images and images its model price with parameters images. The functions images and images are always explicit and generally analytic or quasi-analytic, depending on the nature of the market and model.

We call calibration error for instrument images the quantity:


where images is the weight of instrument images in the calibration. Since images and images are explicit, it follows that images is also explicit.

Calibration consists in finding the images that minimizes the norm of images. The optimal images is a function of the market images:


The minimization is conducted numerically, with an iterative procedure. Numerical Recipes [20] provides the code and explanations for many common optimization routines. The one most commonly used in the financial industry is the Levenberg-Marquardt algorithm, covered in chapter 15 of Numerical Recipes.

It follows that images is a function images of images, just like in the explicit case, but here this function is defined implicitly as the result of an iterative procedure.

Differentiation through calibration

As before, we compute a price in the calibrated model:


and assume that we effectively differentiated the valuation step so we know:


Our goal, as before, is to compute the market risks:


The difference is that images is now an implicit function that involves a numerical fit, and it is not advisable to differentiate it directly, with AAD or otherwise.

First, numerical minimization is likely to take several seconds, which would not be RAM or cache efficient with AAD. Second, numerical minimization always involves control flow. AAD cannot differentiate control flow, and bumping through control flow is unstable. Finally, in case images, the result is a best fit, not a perfect fit, and to blindly differentiate it could produce unstable sensitivities.

It follows that we compute the Jacobian of images without actually differentiating images, with a variant of the ancient Implicit Function Theorem, as described next. It also follows that check-pointing is no longer an option. We must compute the Jacobian of imagesimages and calculate a matrix product to find images. The matrix product must be calculated efficiently; we refer to our Chapter 1.

Note that although we cannot implement check-pointing in this case, we still want to separate the differentiation of images from the differentiation of images, because then we can compute images efficiently with parallel, path-wise AAD.

The Implicit Function Theorem

We want to compute the Jacobian images without actually differentiating images. Remember that:


and that:


is an explicit function that may be safely differentiated with AAD or otherwise.

To find images as a function of the differentials of images, we demonstrate a variant of the Implicit Function Theorem, or IFT.

The result of the calibration images that realizes the minimum satisfies:


It follows that:


Differentiating with respect to images, we get:


Assuming that the fit is “decent,” the errors at the optimum are negligible compared to the derivatives so we can drop the left term and get:


For clarity, we denote images the derivative of images with respect to its first variable images, and images its derivative with respect to its second variable images. Then, dropping the approximation in the equality:


And it follows that:


The Jacobian of the calibration, that is, the matrix of the derivatives of the calibrated model parameters to the market parameters, is the product of the pseudo-inverse of the differentials images of the calibration errors to the model parameters, by the differentials images of the errors to the market parameters.

The differentials images and images are computed by explicit differentiation, and the Jacobian of images is computed with linear algebra without ever actually differentiating through the minimization. Finally, we have an expression for the market risks:


We call this formula the IFT. We note that this is the same formula as the well-known normal equation for a multidimensional linear regression:


The market risks correspond to the regression of the model risks of the transaction onto the market risks of the calibration instruments. When images is square of full rank (the calibration is a unique perfect fit), that expression simplifies into:


In addition, in the case of a perfect fit, calibration weights are superfluous, hence:


and finally:


This formula further illustrates that we are projecting the model risks of the transaction onto the market risks of the calibration instruments. Note that in addition to dealing with weights, the complete IFT formula simply replaces the inverse in the simplified formula by a pseudo-inverse. It is best to use the complete IFT formula in all cases, the pseudo-inverse stabilizing results when images approaches singularity.
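For reference, the derivation above can be written out in one consistent notation. The symbols below are assumptions made for this summary: $X$ denotes the market variables, $Y$ the model parameters, $G$ the calibration, $h$ the valuation with $F(X) = h[G(X)]$, $P_i(Y)$ and $M_i(X)$ the model and market prices of calibration instrument $i$, $w_i$ its weight, and $e_X$, $e_Y$ the matrices of derivatives of the error vector $e$ with respect to $X$ and $Y$. The sign convention of the error is also an assumption.

```latex
\begin{aligned}
e_i(X, Y) &= w_i \left[ P_i(Y) - M_i(X) \right],
\qquad G(X) = \arg\min_Y \left\| e(X, Y) \right\|^2 \\[4pt]
% first-order condition at the optimum Y = G(X)
e_Y^T \, e &= 0 \quad \text{at } Y = G(X) \\[4pt]
% differentiate with respect to X, chain rule through Y = G(X)
\frac{\partial e_Y^T}{\partial X} \, e
  + e_Y^T \left( e_X + e_Y \frac{\partial G}{\partial X} \right) &= 0 \\[4pt]
% decent fit: drop the term proportional to the errors
e_Y^T e_X + e_Y^T e_Y \frac{\partial G}{\partial X} &\approx 0
\quad \Longrightarrow \quad
\frac{\partial G}{\partial X} = - \left( e_Y^T e_Y \right)^{-1} e_Y^T e_X \\[4pt]
% market risks
\frac{\partial F}{\partial X}
  &= - \frac{\partial h}{\partial Y} \left( e_Y^T e_Y \right)^{-1} e_Y^T e_X \\[4pt]
% unique perfect fit: e_Y square of full rank
\frac{\partial F}{\partial X} &= - \frac{\partial h}{\partial Y} \, e_Y^{-1} e_X \\[4pt]
% perfect fit without weights: e_Y = P_Y, e_X = -M_X
\frac{\partial F}{\partial X} &= \frac{\partial h}{\partial Y} \, P_Y^{-1} M_X
\end{aligned}
```

The matrix $\left( e_Y^T e_Y \right)^{-1} e_Y^T$ is the pseudo-inverse of $e_Y$, the same operator that appears in the normal equation $\beta = \left( A^T A \right)^{-1} A^T b$ of a linear regression of $b$ on the columns of $A$.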

We refer to [94] for an independent discussion of differentiation through calibration and application of the IFT.

We have derived two very different means of computing market risks out of model risks in this chapter: with the extremely efficient check-pointing technique when model parameters are explicitly derived from market parameters, and with the IFT formula when model parameters are calibrated to the market prices of a chosen set of instruments. These two methodologies combine to provide the means to separate the differentiation of the valuation from the propagation of risks to market parameters. This separation is crucial, because it allows us to differentiate valuation efficiently with parallel AAD in isolation, and then propagate the resulting derivative sensitivities to market variables with check-pointing when possible, or IFT otherwise.

Model hierarchies and risk propagation

We mentioned in Chapter 4 that linear markets, IVS, and dynamic models really form a hierarchy of models, where the parameters of a parent model are derived, or calibrated, from the parameters of its child models.

For instance, a one-factor Libor Market Model (LMM, [29]) is parameterized by a set of initial forward rates images and a volatility matrix for these forward rates images.

Neglecting Libor basis in the interest of simplicity, the initial rates images are derived from the continuous discount curve images on the linear rate market (LRM) by the explicit formula:


where images is the duration of the forward rates, typically 3 or 6 months. The LRM constructs the discount curve by interpolation, which is a form of calibration, to a discrete set of par swap rates and other market-traded instruments. The reality of modern LRMs is much more complicated due to basis curves and collateral discounting, see [39], but they are still constructed by fitting parameters to a set of market instruments, hence, by calibration.

The LMM volatility surface images is calibrated to an IVS images, which, in this case, represents swaptions of strike images, maturity images into a swap of lifetime images. This IVS itself typically calibrates to a discrete set of swaption-implied volatilities, where the corresponding forward swap rates are computed from the LRM's discount curve through the formula:


After the model is fully specified and calibrated, it may be applied to compute the value images of a (perhaps exotic) transaction through (parallel) simulations.

Illustration of the entire process for the valuation of a transaction, including calibrations.

The figure above illustrates the entire process for the valuation of a transaction, including calibrations. This figure is reminiscent of the DAGs of Chapter 9. The nodes here do not represent elementary mathematical operations, but successive transformations of parameters culminating into a valuation on the top node. Derivative sensitivities are propagated through this DAG, from model risks to market risks, in the exact same way that adjoints are propagated through a calculation DAG: in the reverse order from evaluation. AAD over (parallel) simulations results in derivatives to model parameters. From there, sensitivities to parameters propagate toward the leaves, applying IFT for traversing calibrations and check-pointing for traversing derivations. The risk propagation results in the desired market risks: derivatives to swap rates and swaption-implied volatilities.

Illustration of risk propagation in the reverse order from evaluation: sensitivities to model parameters, obtained with AAD over (parallel) simulations, propagate toward the leaves, applying IFT for traversing calibrations and check-pointing for traversing derivations.

Although we are not developing the code for the general risk propagation process (this is thousands of lines of code and the potential topic of a dedicated publication), we point out that this algorithm is at the core of a well-designed derivatives risk management system.


