2

Solving Inverse Problems by Approximate Bayesian Computation

 

Manuel Chiachío-Ruano,* Juan Chiachío-Ruano and María L. Jalón

University of Granada, Spain.

* Corresponding author: [email protected]

This chapter presents the theoretical basis of Approximate Bayesian Computation (ABC), an efficient computational tool to solve inverse problems without the need to formulate or evaluate the likelihood function. With ABC, the posterior PDF can be computed in those cases where the likelihood function is intractable, impossible to formulate, or computationally demanding. Several ABC pseudo-codes are included in this chapter and an example of application is provided. Finally, the ABC-SubSim algorithm, which was initially proposed by Chiachío et al. [SIAM Journal on Scientific Computing, Vol. 36, No. 3, pp. A1339–A1358], is explained within the context of an example of application.

 

2.1 Introduction to the ABC method

As explained in Chapter 1, the Bayesian inverse problem aims at updating the a priori information about a set of parameters θ ∈ Θ ⊂ ℝ^d for a parameterised model class M_j, based on the available information about the system response contained in the data D ∈ D ⊂ ℝ^ℓ. This is achieved by Bayes’ theorem, which yields the posterior PDF p(θ|D, M_j) of the model specified by θ in the model class M_j. However, in engineering practice, there are situations where the Bayesian inverse problem involves a likelihood function that is not completely known or that is computationally unaffordable, perhaps because it requires the evaluation of an intractable multi-dimensional integral [127]. Approximate Bayesian Computation (ABC) was conceived with the aim of evaluating the posterior PDF in those cases where the likelihood function is intractable [182]. In the Bayesian literature, such a method is also referred to as a likelihood-free computation algorithm, since it circumvents the explicit evaluation of the likelihood function by using a stochastic simulation approach. In this section, the ABC method is briefly described.

Let x ∈ D ⊂ ℝ^ℓ denote a simulated outcome from p(x|θ, M_j), the stochastic forward model for model class M_j parameterised by θ, formerly explained in Chapter 1, Equation (1.19). ABC aims at evaluating the posterior p(θ|D, M_j) ∝ p(D|θ, M_j) p(θ|M_j) by applying Bayes’ theorem to the pair (θ, x) ∈ Θ × D ⊂ ℝ^{d+ℓ}:

p(θ, x | D) ∝ p(D | x, θ) p(x | θ) p(θ)   (2.1)

In the last equation, the conditioning on model class M_j has been omitted for simplicity, given that the ABC theory is valid irrespective of the chosen model class. The basic form of the ABC algorithm to generate samples from the posterior given by Equation (2.1) is a rejection algorithm that consists of jointly generating θ′ ~ p(θ) and x′ ~ p(x|θ′), and accepting them conditional on the equality x′ = D being fulfilled. This is due to the fact that the PDF p(D|x, θ) in Equation (2.1) gives higher density values for the posterior in those regions where x is close to D. Of course, obtaining the sample x′ = D is unlikely in most cases, and it is only feasible if D consists of a finite set of values rather than a continuous region in ℝ^ℓ. To address this difficulty, two main approximations have been conceived in ABC theory [128]: a) replace the equality x = D by the approximation x ≈ D and introduce a tolerance parameter ϵ that accounts for the quality of such approximation through a suitable metric ρ; and b) introduce a low-dimensional vector of summary statistics η(·) that allows a weak comparison of the closeness between x and D. Through this approach, the posterior p(θ, x|D) in Equation (2.1) is approximated by p_ϵ(θ, x|D), which assigns higher probability density to those values of (θ, x) ∈ Θ × D that satisfy the condition ρ(η(x), η(D)) ≤ ϵ.
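To make these two ingredients concrete, the short sketch below (our illustration, not part of the ABC formulation itself) takes the sample mean as the summary statistic η(·) and the absolute difference as the metric ρ; the function names are ours.

```python
import numpy as np

def eta(x):
    """Summary statistic eta(.): here simply the sample mean (a hypothetical choice)."""
    return np.mean(x)

def rho(eta_x, eta_D):
    """Metric rho on the summary statistics: absolute difference."""
    return abs(eta_x - eta_D)

def in_B_eps(x, D, eps):
    """Membership test for the region B_eps(D) introduced in Equation (2.2) below."""
    return rho(eta(x), eta(D)) <= eps
```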

The standard version of the ABC algorithm defines an approximate likelihood function given by p_ϵ(D|θ, x) ≜ P(x ∈ B_ϵ(D)|θ, x) [46], where B_ϵ(D) is a region of the data space D defined as

B_ϵ(D) = {x ∈ D : ρ(η(x), η(D)) ≤ ϵ}   (2.2)

In the expression of the approximate likelihood function, and also in what follows, P(·) is adopted to denote probability whereas p(·) denotes a PDF. Thus, from Bayes’ theorem, the approximate posterior p_ϵ(θ, x|D) can be obtained as

p_ϵ(θ, x | D) ∝ P(x ∈ B_ϵ(D) | x, θ) p(x | θ) p(θ)   (2.3)

The approximate likelihood can be formulated as P(x ∈ B_ϵ(D)|x, θ) = I_{B_ϵ(D)}(x), with I_{B_ϵ(D)}(x) being an indicator function for the set B_ϵ(D) that assigns the value 1 when ρ(η(x), η(D)) ≤ ϵ, and 0 otherwise. It follows that the approximate posterior p_ϵ(θ, x|D) can be readily computed as

p_ϵ(θ, x | D) ∝ p(x | θ) p(θ) I_{B_ϵ(D)}(x)   (2.4)

Since the ultimate interest of the Bayesian inverse problem is typically the posterior of the model parameters, p_ϵ(θ|D), the latter can be obtained by marginalising the approximate posterior PDF in Equation (2.4):

p_ϵ(θ | D) ∝ p(θ) ∫_D p(x | θ) I_{B_ϵ(D)}(x) dx = P(x ∈ B_ϵ(D) | θ) p(θ)   (2.5)

Note that this integration need not be done explicitly, since samples from this marginal PDF are obtained by taking the θ-component of the samples from the joint PDF in Equation (2.4) [151]. A pseudo-code implementation of the ABC algorithm is given below as Algorithm 3.
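The logic of Algorithm 3 can also be conveyed by a minimal Python sketch; the prior sampler, forward model, summary statistic and metric are placeholders to be supplied by the user, and the function name abc_rejection is ours.

```python
import numpy as np

def abc_rejection(K, sample_prior, forward_model, eta, rho, D, eps):
    """ABC-rejection sketch (in the spirit of Algorithm 3).

    K             : number of prior draws
    sample_prior  : callable returning one draw theta' ~ p(theta)
    forward_model : callable returning one outcome x' ~ p(x | theta')
    eta, rho      : summary statistic and metric
    D, eps        : data and tolerance
    Returns the accepted draws, approximately distributed as p_eps(theta | D).
    """
    accepted = []
    for _ in range(K):
        theta = sample_prior()            # theta' ~ p(theta)
        x = forward_model(theta)          # x' ~ p(x | theta')
        if rho(eta(x), eta(D)) <= eps:    # keep only draws whose outcome is close to D
            accepted.append(theta)
    return np.array(accepted)
```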

Example 4 Let us consider a 2 [m] length column with a 0.4 [m] square cross section, which is loaded with F = 1 [kN] at the top, as illustrated in Figure 2.1. Let us also consider that, for some reason, the column is made of a degrading material, so that its Young’s modulus decreases at an unknown constant rate ξ from an initial value E_0 = 40 [MPa], such that

E_n = e^{−ξ} E_{n−1} + ν_n   (2.6)

where E_n is the Young’s modulus at time instant n ∈ ℕ, expressed in weeks, and ν_n is an unknown model error term, which is assumed to be distributed as a zero-mean Gaussian with uncertain standard deviation, that is, ν_n ~ N(0, σ). Next, a sensor is assumed to be placed at the top of the column to register deflections, and the following measurement equation can be considered:

δ_n = f(E_n) + w_n   (2.7)

where f : ℝ≥0 → ℝ≥0 is a mathematical function that provides the deflection of the column as a function of E_n. Assuming linear elasticity theory, this function can be expressed as f(E_n) = FL³/(3E_nI), where I is the moment of inertia of the cross section. In Equation (2.7), the term w_n is the measurement error, which is assumed to be negligible as compared to the model error term, so that it is subsumed into the error term ν_n. In this example, the degradation rate and the standard deviation of the error term are selected as the uncertain model parameters, so that θ = {θ_1, θ_2} = {ξ, σ}, whose prior information can be represented by the uniform PDFs p(θ_1) = U[0.0001, 0.02] and p(θ_2) = U[0.01, 2], respectively. The data in this example are given as a recorded time-history of deflections over a period of T = 200 weeks, that is, D = {δ_{n,meas}}_{n=0}^{200}. These data are synthetically generated from Equations (2.6) and (2.7) considering θ_true = (0.005, 0.1), and are shown in Figure 2.2, panels (a) and (b). The ABC-rejection algorithm is adopted with K = 20,000 samples to obtain the approximate posterior of θ based on the referred data. The results are shown in Figure 2.2c.
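Under the stated assumptions, the stochastic forward model of Equations (2.6) and (2.7) can be simulated as in the sketch below; the choice of working units (E in MPa, F in MN) and the function name simulate_column are ours.

```python
import numpy as np

# Column properties from Example 4 (E in MPa = MN/m^2, F in MN, lengths in m)
L_COL, F_LOAD, E0 = 2.0, 1e-3, 40.0
I_SEC = 0.4**4 / 12.0        # moment of inertia of the 0.4 [m] square section [m^4]
T = 200                      # observation period [weeks]

def simulate_column(theta, rng):
    """One outcome x ~ p(x | theta) from Eqs. (2.6)-(2.7), with theta = (xi, sigma)."""
    xi, sigma = theta
    E = np.empty(T + 1)
    E[0] = E0
    for n in range(1, T + 1):
        E[n] = np.exp(-xi) * E[n - 1] + rng.normal(0.0, sigma)   # Eq. (2.6)
    delta = F_LOAD * L_COL**3 / (3.0 * E * I_SEC)                # Eq. (2.7), w_n subsumed in v_n
    return E, delta

# Synthetic data generated with theta_true = (0.005, 0.1)
rng = np.random.default_rng(0)
E_true, delta_meas = simulate_column((0.005, 0.1), rng)
```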

Figure 2.1: Scheme of structure used for Example 4.

Figure 2.2: Output of the ABC-rejection algorithm in application to Example 4. In panels (a) and (b), the history plots of measured deflections and stiffness values based on θ_true are shown. Circles represent values in the θ-space.

In Figure 2.2, the grey circles represent samples from the prior, whilst the darker-coloured circles represent samples of θ distributed according to the approximate posterior PDF p_ϵ(θ|D) (recall Eq. (2.5)), where ϵ = 1100, using an ℓ_1 distance as metric, that is, ρ = ∑_{n=0}^{200} |E_{n,true} − E_n|. A total of 5,000 samples are accepted as approximate posterior samples, which populate a region close to the θ_true values. Observe that, on average, the approximate posterior gives us a region of plausible values of θ around θ_true, which in view of Figure 2.2 is relatively large. Hence, better results would be expected if smaller ϵ values were considered, as will be shown further below.

Note that the quality of the posterior approximation in ABC depends on a suitable selection of the metric ρ, the tolerance parameter ϵ and, of special importance, the summary statistic η(·) [65]. The choice of the tolerance parameter ϵ is basically a matter of the amount of computational effort that the modeller is willing to expend. Indeed, for sufficiently small ϵ, η(x) → η(D), and thus all accepted samples simulated according to Equation (2.5) come from the closest approximation to the posterior density p(θ|D), where exactness is achieved when η(·) is a sufficient statistic. This desirable fact comes at the expense of a usually prohibitive computational effort to get η(x) = η(D) under the stochastic forward model p(x|θ, M_j). On the contrary, as ϵ → ∞, all accepted simulations x come from the prior density. Hence, the choice of the tolerance parameter ϵ reflects a trade-off between computational cost and accuracy of the ABC posterior approximation.

For that reason, several computational improvements have been proposed in the literature addressing this trade-off. Among them, combining ABC with MCMC methods (commonly called the ABC-MCMC method) has been demonstrated to be efficient [128] in those cases where the probability content of the posterior is concentrated over a small region in relation to a diffuse prior. In ABC-MCMC, a proposal PDF q(θ′|θ^(ζ)) is used to sample a new parameter θ′ based on a previously accepted one, θ^(ζ), targeting the stationary distribution p_ϵ(θ|D). A pseudo-code is provided below as Algorithm 4.
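A minimal sketch of the ABC-MCMC scheme in the spirit of Algorithm 4 is given below, assuming a symmetric Gaussian random-walk proposal so that the q-ratio in the acceptance probability cancels; all names and interfaces are ours.

```python
import numpy as np

def abc_mcmc(n_steps, theta0, log_prior, forward_model, eta, rho, D, eps,
             sigma_q, rng):
    """ABC-MCMC sketch with a Gaussian random-walk proposal.

    theta0 must be a state already compatible with the tolerance eps.
    Since the Gaussian proposal is symmetric, the q-ratio of Eq. (2.8) cancels.
    """
    theta = np.asarray(theta0, dtype=float)
    chain = [theta.copy()]
    for _ in range(n_steps):
        theta_prop = theta + sigma_q * rng.standard_normal(theta.shape)
        x_prop = forward_model(theta_prop)        # x' ~ p(x | theta')
        log_a = log_prior(theta_prop) - log_prior(theta)
        # Accept only if x' falls inside B_eps(D) AND the Metropolis test passes
        if rho(eta(x_prop), eta(D)) <= eps and np.log(rng.random()) < min(0.0, log_a):
            theta = theta_prop
        chain.append(theta.copy())                # otherwise the chain repeats theta
    return np.array(chain)
```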

It should be noted that when the likelihood function is approximated as p_ϵ(D|x, θ) = I_{B_ϵ(D)}(x), as in our case, the acceptance probability α (Step 3 in Algorithm 4) can be rewritten as the product of a MCMC acceptance ratio and the indicator function, as follows

α = min{1, [p(θ′) q(θ^(n−1) | θ′)] / [p(θ^(n−1)) q(θ′ | θ^(n−1))]} I_{B_ϵ(D)}(x′)   (2.8)

Equation (2.8) clearly shows that the dependence upon ϵ through the indicator function may lead to an inefficient algorithm when a good approximation of the true posterior is sought. In fact, given that α can only be non-zero if I_{B_ϵ(D)}(x′) = 1 (i.e. ρ(η(x′), η(D)) ≤ ϵ), the Markov chain may persist in distributional tails for long periods of time if ϵ is sufficiently small, due to the acceptance probability being zero in Step 3 of Algorithm 4. Despite some attempts to effectively overcome this limitation [28, 169], the desirable limit ϵ → 0 in ABC again implies heavy computation.

For that reason, a branch of computational techniques has emerged in the literature to obtain high accuracy (ϵ → 0) with a feasible computational cost by combining sequential sampling algorithms adapted for ABC [55]. These techniques share a common principle: achieving computational efficiency by learning about intermediate target distributions determined by a decreasing sequence of tolerance levels ϵ_1 > ϵ_2 > … > ϵ_m = ϵ, where the last is the desired tolerance ϵ. The reader is referred to [46] for an overview of the main contributions to the literature on this topic. Among the available methods, the so-called ABC-SubSim algorithm published by Chiachío et al. in [46] has demonstrated efficiency as compared with other ABC algorithms in the literature [188]. The methodological basis of the ABC-SubSim algorithm is provided below.

 

2.2 Basis of ABC using Subset Simulation

2.2.1 Introduction to Subset Simulation

Subset Simulation (SS) [14] is an efficient method for the simulation of rare events in high-dimensional spaces, which makes it applicable to a broad range of areas in science and engineering [17, 47]. With Subset Simulation, conditional samples corresponding to specified levels of a performance function g : Z ⊂ ℝ^d → ℝ can be efficiently generated in a progressive manner, converting a problem involving rare-event simulation into a sequence of problems involving higher-probability events [14]. This method was originally proposed to compute the small failure probabilities encountered in reliability analysis of engineering systems (e.g. [16, 47]). For illustration purposes, the Subset Simulation method is presented in this section through its primary aim of estimating small failure probabilities. In the next section, it will be specialised for ABC simulation.

Let us consider that F is a failure region in the z-space, corresponding to the exceedance of the performance function g above some specified threshold value b:

F = {z ∈ Z : g(z) > b}   (2.9)

Let us now assume that F can be defined as the intersection of m regions, F = ∩_{j=1}^{m} F_j, arranged as a nested sequence F_1 ⊃ F_2 ⊃ … ⊃ F_{m−1} ⊃ F_m = F, where F_j = {z ∈ Z : g(z) > b_j}, with b_{j+1} > b_j, and such that p(z_j|F_j) ∝ p(z) I_{F_j}(z), j = 1,…, m. The term p(z) denotes the probability model for z. When the event F_j holds, then {F_{j−1},…, F_1} also hold; hence P(F_j|F_{j−1},…, F_1) = P(F_j|F_{j−1}), so that

P(F) = P(∩_{j=1}^{m} F_j) = P(F_1) ∏_{j=2}^{m} P(F_j | F_{j−1})   (2.10)

where, for simpler notation, we use P(F) ≜ P(z ∈ F) and P(F_j|F_{j−1}) ≜ P(z ∈ F_j | z ∈ F_{j−1}), with P(z ∈ F_j | z ∈ F_{j−1}) being the conditional failure probability at the (j−1)th conditional level. Notice that although the probability P(F) can be relatively small, by choosing the intermediate regions appropriately, the conditional probabilities involved in Equation (2.10) can be made large, thus avoiding the simulation of rare events. For instance, a failure probability as small as P(F) = 10⁻⁴ can be obtained as the product of four conditional probabilities of magnitude 0.1, each of which can be estimated by simulation at moderate cost.

In the last equation, apart from P(F_1) (which can be readily estimated by the standard Monte Carlo (MC) method), the remaining factors cannot be efficiently estimated by MC because of the conditional sampling involved. However, MCMC methods can be used for sampling from the PDF p(z_{j−1}|F_{j−1}) when j ≥ 2, although at the expense of generating K dependent samples, giving

P(F_j | F_{j−1}) ≈ P̄_j = (1/K) ∑_{k=1}^{K} I_{F_j}(z_{j−1}^{(k)}),   z_{j−1}^{(k)} ~ p(z_{j−1} | F_{j−1})   (2.11)

where I_{F_j}(z_{j−1}^{(k)}) is the indicator function for the region F_j, j = 1,…, m, which assigns the value 1 when g(z_{j−1}^{(k)}) > b_j, and 0 otherwise. Observe that some of the Markov chain samples generated at the (j−1)th level lie in F_j, so that they are distributed as p(·|F_j). Hence, these samples provide “seeds” for generating more samples according to p(z_j|F_j) by using MCMC sampling. Because of the way the seeds are chosen, Subset Simulation exhibits the benefits of perfect sampling [207, 151], which is an important feature to avoid wasting samples during a burn-in period, in contrast to other MCMC algorithms such as the MH algorithm.

As explained below, F_j is actually chosen adaptively based on the samples {z_{j−1}^{(k)}, k = 1,…, K} from p(z|F_{j−1}), in such a way that there are exactly KP_0 of these samples in F_j (so P̄_j = P_0 in Equation (2.11)). Next, MCMC is applied to obtain (1/P_0 − 1) samples from p(z_j|F_j) starting at each seed, so that a total of K samples are obtained in F_j. Repeating this process, one can compute the conditional probabilities of the higher conditional levels until the final region of interest F_m = F has been reached. Note that the intermediate threshold value b_j defining F_j is obtained adaptively as the [(1 − P_0)K]th value of the sample values g(z_{j−1}^{(k)}), k = 1,…, K, arranged in increasing order, in such a way that the sample estimate of P(F_j|F_{j−1}) in Equation (2.11) is equal to P_0 (a code sketch of this adaptive selection is given after Figure 2.3). See Figure 2.3 for an illustrative example of the Subset Simulation method.

In Figure 2.3, panels (a) and (b) represent, respectively, the initial set of two-dimensional samples simulated according to a specified model, and the selection of seeds whereby F_1 is defined. In panel (c), new samples simulated according to p(z|F_1) are represented along with the seeds defining the subsequent intermediate failure region, which is represented through samples in panel (d). Panel (e) represents the final step, where the whole set of samples arranged through subsets is shown along with the sequence of intermediate threshold levels, until the final one (solid line) is reached.

Figure 2.3: Conceptual example of Subset Simulation.
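The adaptive selection of the intermediate threshold b_j and of the MCMC seeds described above can be sketched as follows; placing b_j midway between the two straddling order statistics is a common convention, and the function name is ours.

```python
import numpy as np

def choose_level(z, g_vals, P0):
    """Adaptive choice of the intermediate threshold b_j and the MCMC seeds.

    z      : (K, d) array of samples distributed as p(z | F_{j-1})
    g_vals : (K,) array of the corresponding performance values g(z)
    P0     : conditional probability per level; K * P0 assumed to be an integer
    """
    K = len(g_vals)
    n_seeds = int(round(K * P0))
    order = np.argsort(g_vals)[::-1]           # indices sorted by decreasing g
    g_sorted = g_vals[order]
    # Threshold placed between the (K*P0)th and (K*P0 + 1)th largest values,
    # so that exactly K*P0 samples satisfy g(z) > b_j
    b_j = 0.5 * (g_sorted[n_seeds - 1] + g_sorted[n_seeds])
    seeds = z[order[:n_seeds]]                 # these samples lie in F_j: chain seeds
    return b_j, seeds
```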

It should be noted that the choice of an adequate P_0 has a significant impact on the efficiency of Subset Simulation. Indeed, a small value of the conditional probability (P_0 → 0) implies that the distance between consecutive intermediate levels b_j − b_{j−1} becomes too large, which leads back to a rare-event simulation problem. If, on the contrary, the intermediate threshold values are chosen too close together (P_0 → 1), the algorithm requires more simulation levels m (and hence, a larger computational effort) to progress towards F. In the literature on Subset Simulation, several works have reported on scaling strategies for P_0. The value P_0 = 0.1 was recommended in the original presentation of Subset Simulation [14] and, more recently, in [207], the range 0.1 ≤ P_0 ≤ 0.3 was proposed as a near-optimal range of values after a rigorous sensitivity study of Subset Simulation.

As stated before, MCMC methods like Metropolis-Hastings [131] are adequate to draw samples from the conditional PDF p(z_j|F_j). In the original version of Subset Simulation [14], the Modified Metropolis Algorithm (MMA) was proposed, and it was shown to work well even in very high dimensions (e.g. 10³–10⁴). In MMA, a univariate proposal PDF is chosen for each component of the parameter vector, and each candidate component is accepted or rejected separately, instead of drawing a full parameter vector from a multi-dimensional PDF as in the original algorithm. Later, in [16], grouping of the parameters was considered when constructing a proposal PDF, to allow for the case where small groups of components in the parameter vector are highly correlated. An appropriate choice of the proposal PDF for ABC-SubSim is introduced in the next section. Further insights into Subset Simulation using MMA as a conditional sampler are provided by Zuev et al. [207], where an exhaustive sensitivity study of MMA is presented along with enhancements of the Subset Simulation method.
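A sketch of one MMA transition is shown below, assuming independent components of p(z) with known marginals, in line with the component-wise scheme just described; names and interfaces are ours.

```python
import numpy as np

def mma_step(z, log_p_marg, in_Fj, sigma_q, rng):
    """One transition of the Modified Metropolis Algorithm (a sketch).

    z          : current state, assumed to lie in F_j
    log_p_marg : log_p_marg(k, v), log of the k-th marginal PDF of p(z) at v
                 (independent components assumed)
    in_Fj      : callable returning True if a state lies in F_j
    sigma_q    : per-component standard deviations of the univariate proposals
    """
    z = np.asarray(z, dtype=float)
    candidate = z.copy()
    for k in range(candidate.size):
        v = z[k] + sigma_q[k] * rng.standard_normal()      # univariate candidate
        # Component-wise Metropolis test against the k-th marginal of p(z)
        if np.log(rng.random()) < log_p_marg(k, v) - log_p_marg(k, z[k]):
            candidate[k] = v
    # The full candidate is accepted only if it remains inside F_j
    return candidate if in_Fj(candidate) else z
```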

2.2.2 Subset Simulation for ABC

In this section, the Subset Simulation method is described as a specialised, efficient sampler for ABC. To this end, let us define the joint state-parameter vector z = (x, θ) ∈ Z ⊂ ℝ^{ℓ+d}, where x is a simulated outcome from the model class parameterised by θ, so that p(z) = p(x|θ)p(θ). Let also F_j in Section 2.2.1 be replaced by a nested sequence of regions Z_j, j = 1,…, m, in Z defined by

Z_j = {(θ, x) : ρ(η(x), η(D)) ≤ ϵ_j}   (2.12)

arranged so that Z_1 ⊃ Z_2 ⊃ … ⊃ Z_j ⊃ Z_{j+1} ⊃ … ⊃ Z_m, with ρ a metric on the set {η(x) : x ∈ D}. In ABC based on Subset Simulation, the sequence of tolerances ϵ_1,…, ϵ_j,…, ϵ_m, with ϵ_{j+1} < ϵ_j, is chosen adaptively, and the number of levels m is chosen so that ϵ_m ≤ ϵ, a specified tolerance.

Analogous to Equation (2.4), the sequence of intermediate posteriors p(θ, x|Z_j), j = 1,…, m, can be obtained by Bayes’ theorem as

p(θ, x | Z_j) = P(Z_j | θ, x) p(x|θ) p(θ) / P(Z_j) ∝ I_{Z_j}(θ, x) p(x|θ) p(θ)   (2.13)

where I_{Z_j}(θ, x) is the indicator function for the set Z_j. Observe that when ϵ → 0, Z_m represents a small closed region in Z, and hence the probability P(Z_m) is very small under the model p(θ, x) = p(x|θ)p(θ). In this situation, using MCMC sampling directly is not efficient, for the reason described above. This is the point at which the efficiency of Subset Simulation is exploited for ABC, given that such a small probability P(Z_m) is converted into a sequence of larger conditional probabilities, as stated in Equation (2.10).

 

2.3 The ABC-SubSim algorithm

The ABC-SubSim algorithm was conceived by combining the ABC principles with the technique of Subset Simulation for efficient rare-event simulation, first developed by Au and Beck [14]. Algorithm 5 provides a pseudo-code implementation of ABC-SubSim that is intended to be sufficient for most applications. The algorithm is implemented such that a maximum allowable number of simulation levels (m) is considered, in case the specified ϵ is too small. Recent improvements for adaptively selecting ϵ have appeared in the literature [186, 187].

One of the key aspects of ABC-SubSim is the selection of the ϵ_j values. Indeed, these values are adaptively chosen as in Subset Simulation [14], so that the sample estimate P̄_j of P(Z_j|Z_{j−1}) satisfies P̄_j = P_0 [46]. In this way, the intermediate tolerance value ϵ_j can be simply obtained as the 100P_0 percentile of the set of distances ρ(η(x_{j−1}^{(k)}), η(D)), k = 1,…, K, arranged in increasing order. Moreover, observe in Algorithm 5 that P_0 is chosen such that KP_0 and 1/P_0 are integers, so the size of the subset of samples generated in Z_{j−1} that lie in Z_j is known in advance and equal to KP_0, which provides a significant advantage for the implementation. These KP_0 samples in Z_j are used as seeds for KP_0 Markov chains of length 1/P_0, where the new (1/P_0 − 1) samples of each chain in Z_j can be readily generated by MMA [14]. Hence, the total number of samples of (θ, x) lying in Z_j is K, of which KP_0 were generated at the (j−1)th level. Because of the way the seeds are chosen, ABC-SubSim exhibits the benefits of perfect sampling [207, 151], which is an important feature to avoid wasting samples during a burn-in period, in contrast to ABC-MCMC. A compact sketch of the resulting loop is given below.
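Bringing these pieces together, a compact sketch of the ABC-SubSim loop might read as follows; for brevity, the component-wise MMA move of Algorithm 5 is replaced by a joint symmetric Gaussian random walk, and all names are ours.

```python
import numpy as np

def abc_subsim(K, P0, m_max, eps_target, sample_prior, log_prior,
               forward_model, eta, rho, D, sigma_q, rng):
    """Compact ABC-SubSim sketch with a symmetric Gaussian proposal."""
    n_seeds, chain_len = int(K * P0), int(1 / P0)     # K*P0 and 1/P0 integers
    # Level 0: plain Monte Carlo sampling from the prior
    thetas = np.array([sample_prior() for _ in range(K)])
    dists = np.array([rho(eta(forward_model(t)), eta(D)) for t in thetas])
    for j in range(1, m_max + 1):
        order = np.argsort(dists)
        eps_j = dists[order[n_seeds - 1]]             # 100*P0 percentile of the distances
        seeds, seed_d = thetas[order[:n_seeds]], dists[order[:n_seeds]]
        new_t, new_d = [], []
        for t, d in zip(seeds, seed_d):               # one Markov chain per seed
            t = t.copy()
            for _ in range(chain_len - 1):            # (1/P0 - 1) new states per chain
                t_prop = t + sigma_q * rng.standard_normal(t.shape)
                d_prop = rho(eta(forward_model(t_prop)), eta(D))
                log_a = log_prior(t_prop) - log_prior(t)
                if d_prop <= eps_j and np.log(rng.random()) < min(0.0, log_a):
                    t, d = t_prop, d_prop             # move; otherwise repeat the state
                new_t.append(t.copy())
                new_d.append(d)
        thetas = np.vstack([seeds, np.array(new_t)])  # K samples lying in Z_j
        dists = np.concatenate([seed_d, np.array(new_d)])
        if eps_j <= eps_target:                       # reached the desired tolerance
            break
    return thetas, eps_j
```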

Finally, it is important to remark that the main control parameters to be chosen in ABC-SubSim are P_0 and σ_q^{(j)}, the standard deviation of the Gaussian proposal PDF in MMA at the jth level. For P_0, the recommendation is to choose it within the interval [0.1, 0.3], and more specifically P_0 = 0.2 [46]. In regards to the optimal variance of the local proposal PDF for the MMA sampler, the rule is to adaptively choose the standard deviation σ_q^{(j)} of the jth intermediate level so that the monitored acceptance rate lies within the interval [0.2, 0.4], based on an initial chain sample of small length (e.g. 10 states), as sketched below. The reader is referred to [186] for a detailed study of adaptive scaling of the ABC-SubSim algorithm, and also to [207], where optimal scaling is addressed for the general Subset Simulation method.
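That adaptive tuning rule can be sketched as follows; the acceptance band [0.2, 0.4] and the pilot-chain length are from the text, while the 20% up/down rescaling factors are an illustrative assumption of ours.

```python
def tune_sigma_q(sigma_q, pilot_acc_rate):
    """Rescale the MMA proposal spread from a short pilot chain (a sketch).

    Aims to keep the monitored acceptance rate inside the recommended band
    [0.2, 0.4]; the 0.8 / 1.2 scaling factors are illustrative, not prescribed.
    """
    if pilot_acc_rate < 0.2:      # chain too sticky: make smaller moves
        return 0.8 * sigma_q
    if pilot_acc_rate > 0.4:      # moves too timid: enlarge the proposal
        return 1.2 * sigma_q
    return sigma_q
```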

Example 5 Let us consider again the column of Example 4. In this case, we want to obtain samples from the posterior θ ~ p_ϵ(θ|D), where ϵ is much smaller than the value used in Example 4. The ABC-SubSim algorithm is used with K = 5,000, P_0 = 0.2, and m = 4 simulation levels, so that the final subset produces K = 5,000 samples distributed according to p_{ϵ_m}(θ|D). The results are shown in Figure 2.4. In Figure 2.4c, the intermediate posterior samples are superimposed in increasing grey tones to reveal the uncertainty reduction, and the final subset is depicted using darker circles. ABC-SubSim uses 17,000 model evaluations to generate 5,000 samples representing a final posterior that is better aligned with the values of θ_true. Moreover, to demonstrate the efficacy of the algorithm, the model is evaluated by picking a sample θ̂_ABC-SS from p_ϵ(θ|D). Recall that in both cases the final posterior has the same number of samples, namely 5,000. The results for stiffness and deflections are presented superimposed in Figures 2.4a and 2.4b for better interpretation. Note that the results are very satisfactory, in the sense that ABC-SubSim can reconstruct the true structural behaviour with high precision at a computational cost similar to that of the ABC-rejection algorithm. In fact, the computational burdens of the two algorithms are almost equivalent (ABC-SubSim uses 17,000 samples to produce a final posterior of 5,000 samples, whilst ABC-rejection needs 20,000), while the accuracy obtained by ABC-SubSim is, by far, superior to that of ABC-rejection, which corroborates the findings originally reported in [46].

Figure 2.4: Output of the ABC-SubSim algorithm in application to Example 5. In panels (a) and (b), the history plots of synthetic deflections and stiffness values based on θ_true are shown, together with the simulated plots of stiffness and deflections using the model. Grey circles represent values in the θ-space.

 

2.4 Summary

This chapter has shown that ABC methods efficiently provide a solution to inverse problems where the likelihood function is unknown or computationally intractable. The resulting posterior PDF is obtained for a given tolerance value, which provides a measure of how close the simulations obtained using the posterior samples are to the data. The original ABC (rejection) algorithm is simple to implement but computationally inefficient when small tolerance values are adopted. Since the publication of the first ABC methodology, new ABC variants have been proposed to tackle the computational cost of adopting small tolerances, with ABC-SubSim being a highly efficient one, based on the study published in [46] and further corroborated in works like [186, 188]. The ABC-SubSim algorithm has been explained in this chapter within the context of a structural engineering example based on a cantilever column under a constant lateral load, whose material is subject to a degradation process. The algorithm has been able to infer the damage model parameters with high fidelity at a reasonable computational cost, irrespective of the large prior uncertainty considered. Moreover, the ABC-SubSim algorithm has been compared with the original ABC-rejection algorithm under a similar computational cost, showing that ABC-SubSim can obtain a much finer posterior inference. Further research should focus on investigating ways to adaptively scale the ABC-SubSim meta-parameters, which have a strong influence on the posterior quality and the computational cost. Another fruitful research direction would be to explore the use of ABC-SubSim as a surrogate model for the evaluation of complex degradation engineering processes where multi-dimensional model parameters are involved.
