CHAPTER 8

Monte Carlo Simulation

8.1 Introductory Remarks

Very often a reliability goal function cannot be expressed in a convenient analytical form and makes even calculation of reliability decrements practically impossible. For instance, such situations arise when system units are mutually dependent or their reliability simultaneously depends on some common environmental factors (temperature, mechanical impacts, etc.). In these cases, the Monte Carlo simulation is usually used for reliability indices calculation. However, the problem arises: how can one use the Monte Carlo simulation for optimization?

Roughly speaking, the idea is in observing the process of the spare unit expenditure (replacement of failed units) until specified restrictions allow one to do so. This may be a simulation process or an observation of the real deployment of the system. After the stopping moment, we start another realization of simulation process or observation of the real data. When the appropriate statistical data are collected, the process of finding the optimal solution starts.

Avoiding a formal description of the algorithm, let us demonstrate it on numerical examples which will make the idea of the method and its specific technique clearer.

8.2 Formulation of Optimal Redundancy Problems in Statistical Terms

Standard methods do not give a solution if the goal function is the mean time to failure:

(8.1) c8-math-0001

or if units are dependent, for instance, via a vector of some external factors g (temperature, humidity, vibration, etc.):

(8.2) c8-math-0002

where F(g) is the distribution function of some external parameter g and G is its domain.

8.3 Algorithm for Trajectory Generation

For solution of the formulated problems, we need to have data obtained from real experiment (or system deployment) or from Monte Carlo simulation of the system model. Though this procedure is routine, we will briefly describe it for the presentation closeness. The procedure is as follows.

Consider a series system of n units. (For simplicity of explanation of the algorithm, we will assume that the units are independent. However, everything described below can be easily extended to the general case: it will effect only a mechanism of random sequence generation.)

Let us consider the process of spare unit expenditure as the process of changing the system states and the total cost of spare units at sequential replacement moments. After failure each unit is immediately replaced with a spare one. Let c8-math-5001 be the moment of the kth replacement during the jth Monte Carlo experiment. The number of spare units of type i spent at moment c8-math-5002 is denoted c8-math-5003.

An initial state at c8-math-5004 is:

c8-math-5005

The total cost of spare units at the initial moment is C0 = 0. (Sometimes it might be reasonable to consider the initial cost of the system with no spare units as C0, that is, c8-math-5006 where ki is the number of units of type i within equipment before reliability improvement.)

Consider the step-by-step procedure of generating trajectories ψ(s), s = 1, 2, …. We begin with ψ(1) but the corresponding superscript, (1), will be omitted for the sake of convenience.

Step 1. Generate random time to failure (TTF) for each unit, and define bi1 = ξi, that is bi1 is the moment of the earliest failure (and instantaneous replacement) of the ith unit. The current moment (for every unit i) at the beginning of any trajectory ψ(s) is bi0 = 0.
Step 2. Determine the moment of the occurrence of the first event (first replacement) within the first realization ψ(1) as t1 = min1≤inbi1.
Step 3. Assign to the corresponding unit (for which the moment of failure is the earliest one) a specific number i = i1.
Step 4. Put into the spare units counter a new value c8-math-5008.
Step 5. Rename remaining xi0 as follows: xi0 = xi1 for all i ≠ i1.
Step 6. Calculate a new value of the system cost c8-math-5009.
Step 7. Generate the next random TTF for unit i1, c8-math-5010.
Step 8. Calculate the next event occurring due to unit i1: c8-math-5011.
Step 9. Rename the remaining values bi1 = bi2 for all i ≠ i1.

Thus completes the first cycle. GOTO step 2, that is, find t2 = min1≤inbi2, and so on, until stopping the first realization.

The type of problem to be solved determines the stopping rule of each realization.

Stopping rule for the inverse problem of optimal redundancy: The process is stopped at the moment tN when the total cost of spare units exceeds the permitted C0.

Stopping rule for the direct problem of optimal redundancy: The simulation process for each realization stops at the moment tM−1 < t*≤tM where t* equals the required operational time t0 (if the reliability index is the probability of failure free operation) or t* is the required system's mean time to failure (MTTF).

After the termination of generating the first trajectory, (1), we start to generate (2) by the same rules. The number of needed realizations, N, is determined by the required accuracy of statistical estimates.

Thus, each trajectory j represents a set of the following data:

c8-math-5013

where c8-math-5014 is the set of spare units at moment c8-math-5015, that is, c8-math-5016.

After the description of the Monte Carlo simulating process, let us consider the optimization problems themselves. We can make an important remark: previously all problems were formulated in probabilistic terms, but dealing with statistical (empirical) functions has to be specific. The following problems are reformulated in an appropriate way.

8.4 Description of the Idea of the Solution

Assume that we need to supply some system with spare units for a specified period of time. We have no prior knowledge of units' reliability but we have an opportunity to observe a real process (or simulation) of failure occurrence.

Consider the direct problem of optimal redundancy. What shall we do in this case? We observe the process of spare unit expenditure during time t*. This process can be described as a random path—call it trajectory—in a discrete n-dimensional space, X. An illustration of such a process in a two-dimensional case is presented in Figure 8.1.

FIGURE 8.1 Example of a two-dimensional trajectory.

c8-fig-0001

Let us observe N such trajectories, j = 1, 2, … , N, in an n-dimensional space where n is the number of unit types. Each realization is stopped when the total cost of spare units exceeds the permitted amount, that is, each trajectory reaches or even penetrates a hyperplane:

(8.3) c8-math-0003

determined by the restriction on the total system cost (an example for the two-dimensional case is given in Fig. 8.2).

FIGURE 8.2 Example of two-dimensional trajectory inside hyperplane C(x1, x2) = C0.

c8-fig-0002

After this, in the same n-dimensional space, we construct the hypercube xr, r = 1, 2, … , such that each of its vertices is lying under the hyperplane. In Figure 8.2 there are two such hypercubes, though there could be many of them: actually, in this case all pairs (x1, x2) that belong to hyper plane C(x1, x2) = C0 could be vertices of such a hypercube.

Denote the maximum time that trajectory (j) is spending within the hypercube χr by c8-math-5017. Introduce an indicator:

(8.4) c8-math-0004

Among all hypercubes above we choose the hypercube χr that maximizes the frequency of failure-free operation during required interval t0 under the cost restrictions:

(8.5) c8-math-0005

where C0 is the admissible redundant group cost.

Maximization of the system average time to failure is reached by the hypercube c8-math-5018 that corresponds to the solution of the following problem of the conditional optimization:

(8.6) c8-math-0006

Now consider the direct problem of optimal redundancy. In this case, the required time of the system failure-free operation equals t0. We observe the process of spare units expenditure N times until the system failure-free time exceeds t0 and record trajectories (j), j = 1, 2, … , N, in an n-dimensional space. Afterwards we construct such hypercubes χr, r = 1, 2, … , in the same n-dimensional space that includes (covers) R0 × 100% of all trajectories where R0 is the specified level of the reliability index. Among all hypercubes described above, we choose the one that is characterized by the minimum total cost. In other words, the hypercube c8-math-5019 must satisfy the solution of the following problem:

(8.7) c8-math-0007

Now let the specified requirement be given for the system average time to failure, T*. The hypercube c8-math-5020 presenting the solution must be chosen corresponding to the solution of the problem:

(8.8) c8-math-0008

Of course, one should take into account that the operation with the frequency differs from the operation with the probability. The proposed solution is asymptotically accurate. Thus, all these arguments are satisfactory only for a large enough sample size.

8.5 Inverse Optimization Problem

8.5.1 System Successful Operation versus System Cost

We need to find xopt that satisfies Equation (8.5). The algorithm for solution is as follows.

Step 1. Choose a hypercube χ1 whose diagonal vertex is lying on or under the hyperplane in Equation (8.3).
Step 2. Take the first realization, ψ(1), obtained with the help of the Stopping Rule 1. Find moment c8-math-5021 when this trajectory “punctures” the hypercube χ1. This corresponds to the moment c8-math-5022 where

c8-math-5023

and where χi1 is a component of χ1.
Step 3. Assign to c8-math-5024 a value 1 or 0 by using the indicator function in Equation (8.4).
Step 4. Add c8-math-5025 to the value in the counter (initial value equals 0) of successful trajectories.
Repeat the procedure from step 2 for the next realization, ψ(2).
After the analyzing of all trajectories, we calculate the frequency of successful trajectories, c8-math-5026 for the chosen hypercube χ1. Then we find such a hypercube χK that is characterized by the maximum value of c8-math-5027 For this purpose, we can use a random search, steepest descent, or other numerical optimization procedure in the discrete space of the trajectories of spare unit expenditure.

Example 8.1
Consider a series system of n = 3 units. For the sake of simplicity of illustrative calculations and possibility to compare an obtained solution with analytical solution, assume that the system units are independent and ci = c for all i, 1 ≤ i ≤ 3. Let the unit time to failure (TTF) be distributed exponentially with parameters λ1 = 1, λ2 = 0.5, and λ3 = 0.25, respectively. The specified time of failure-free operation is t0 = 1. Admissible total cost of a spare unit is equal to 4.
In the left three columns of Table 8.1, there are random exponentially distributed times to failure, ξi, for all three units. In the next three columns there are corresponding sequences of replacement times θi(k): θi(k) = ξi1 + ξi2 + … +ξik. In other words, θi(k) is a random survival time of the ith redundant group consisting of one main and k-1 spare units.

TABLE 8.1 Random TTF and Replacement Time for 10 Monte Carlo Realizations

c8-tbl-0001_1.jpgc8-tbl-0001_2.jpg
We will use the same random numbers for all of the examples below. This leads to a dependence of the results obtained for different problems, but our main goal is to illustrate the algorithm of the solution with the use of a numerical example, rather than to execute an accurate statistical experiment.


Solution to Example 8.1
(1) Consider Realization 1 from Table 8.1. First, take values ξi of the first row of the left block of columns: ξ11 = 0.07, ξ21 = 0.75, and ξ31 = 0.24. Denote them θ1(1), θ2(1), and θ3(1), respectively, and set them into the first row of the right block of columns (“Replacement time”). Find the minimum value: min {θ1(1), θ2(1), θ3(1)} = θ1(1) = 0.07.
(2) Next take the value ξ12 = 0.53 in the column “TTF; Unit 1.” Form a new value: θ1(2) = θ1(1) + ξ12 = 0.07 + 0.53 = 0.6. Rename θ2(1) = θ2(2) and θ3(1) = θ3(2). Set this value into the second place in the column “Replacement time; Unit 1.”
(3) Find the minimum value: min {θ1(2), θ2(2) , θ3(2)} = θ3(2) = 0.24. Repeat step 2 until the total cost of each system equals 7. (For the case ci = c, it means that all 7 units are spent.) As the result, we spent three units of type 1, no units of type 2, and two units of type 3. In this particular case, the system TTF does not reach the specified time t0 = 1.
(4) Repeat steps 1 to 3 with the remaining realizations from Table 8.1 and complete Table 8.2.
(5) Notice that units marked with “*” in Table 8.2 are auxiliary, that is, in each particular case they are not necessary because t0 has been reached before all permitted resources were spent.
(6) In Table 8.3, list all vectors: c8-math-5028, k = 1, 2, … , 10, that are obtained from Table 8.2 after exclusion of the marked units (see Table 8.3).
Realization #1 is not taken into account because its TTF < 1.
(7) Order each component of these vectors separately (see Table 8.4). In other words, Table 8.4 shows the frequency with which a corresponding number of spare units of each type has been met during 10 realizations of the Monte Carlo simulation. We see that the use of the vector (3, 3, 2) of spare units for this realization will lead to 1 failure in 10 experiments. However, the total system cost equals 8 units. So, the next step is to find the best way of reducing the total system cost.
(8) Put the number of realizations for which TTF has not reached t0 = 1 into the failure counter. In our case there is only one such realization with TTF ≤ 1.
(9) Exclude from Table 8.2 all vectors which correspond to the realizations mentioned in step 7.
(10) Find which unit in Table 8.4 has the smallest number of the use of largest values of c8-math-5029. In our example, three units of type 1 were used in three realizations, three units of type 1 were used once, and two units of type 3 were used four times. (We exclude from consideration Realization 1 since it did not deliver TTF ≥ 1.) In this case we exclude one unit of type 2 because in this case we gain one unit of cost and “loss” only one realization.
(11) Add number of units excluded at step 9 into the counter of system failures.
(12) Check if the system cost equal to or less than C0 = 7. If “No,” correct Table 8.3 by exclusion of the vector 1 and continue the procedure from step 6. If “Yes,” stop the procedure.
(13) Calculate the ratio of realization without failure (the total number of realization minus the number of failures from the counter) to the total number of performed realizations.
In the example considered the final solution is (3, 2, 2).

TABLE 8.2 Initial Experiment with Exclusion of “Extra Units” (Marked with “*”)

Unit 1Unit 2Unit 3
Realization 1
0.070.750.24
0.63.43
0.66
1.19
Realization 2
0.420.134.92
0.581.28
1.03
1.31*
Realization 3
0.623.473.22
4.286.2*7.14*
9.39*
Realization 4
1.455.850.51
2.58*9.15
3.85*
4.29*
Realization 5
0.320.220.54
1.080.372.06
1.87
Realization 6
0.112.035.54
1.132.52*
2.01*
2.41*
Realization 7
1.220.112.69
3.09*1.022.79*
3.13*
Realization 8
0.271.4922.49
0.712.02*
1.45
2.2*
Realization 9
0.461.557.9
1.526.35*
3.42*
3.59*
Realization 10
0.831.080.58
1.232.84*4.33
2.23*

TABLE 8.3 Realization of Units Spent (Corrected for t0 ≥ 1)

c8-tbl-0003.jpg

TABLE 8.4 Ordered Numbers of the Use of Units of Different Types

x1x2x3
111
211
211
211
211
2(1)2
322
322
322
(4)3(2)

Note: Numbers in parentheses correspond to the first realization, which was not taken into account.


As a direct calculation using tables of Poisson distribution shows, this vector delivers the probability 0.804. Of course, such a coincidence with the observed frequency equal to 0.8 in a particular statistical experiment is not a proof of the method. However, multiple results obtained by the proposed method for other examples show a proper closeness to the exact solution, even for a relatively small sample size.

The asymptotical convergence of the solution to the optimal one was proved by Gordienko and Ushakov (1978).

8.5.2 System Average Time to Failure versus System Cost

We need to find c8-math-5030 that satisfies the solution of Equation (8.6). In this case, the algorithm almost completely coincides with the one described above. The only difference is in the absence of step 3. At step 4 we directly place c8-math-5031 in a counter of the survival time. After analyzing all of the trajectories, the estimate of the MTTF for the hypercube χ1 is calculated as

(8.9) c8-math-0009

After this, we perform the analogous calculations for other hypercubes, finding those which are characterized by the maximum estimate of MTTF. The search for the maximum can be performed in the same way as was done previously.


Example 8.2
We will consider the same data as in the example above. The system is again allowed to have at most 7 units in total.
Repeat steps from 1 to 4 of the process described in Section 8.3. In other words, we assume that Table 8.2 is constructed. For the solution of this problem, we will use all the data from Table 8.2. The continuation of the algorithm for this case is as follows.
Consider the vectors of Table 8.2. In this case, the components marked with “*” are included. Those vectors are obtained in the imitation process until 7 units of price are spent. Now extract corresponding values from the right side of Table 8.1 (see Table 8.5). In this table we can see how long each unit was operating.
On the basis of Table 8.5, we compose Table 8.6. In each position of this table we have the total sum of the time spent during all realizations. First of all, for independent and identical units, these values depend on the number of realizations where this unit was observed. (In the general case, where units are different and could be dependent, the number of such realizations might not be a dominant parameter.) These values from the bottom show how much we will lose by excluding a unit.
It is clear that the loss will be less if we leave x1 = 3, x2 = 2, and x3 = 2. By eliminating them, we can decrease the total system cost by up to 7 units of price.
The time to failure for the system in each realization is calculated as the minimal value among those, which are restricted by vector (x1 = 3, x2 = 2, x3 = 2), that is, for the kth realization,

c8-math-5032

These values can be found in the right-hand column of Table 8.6. The results are shown in Table 8.7. These values allow calculating the mean time to failure of the investigated system.

TABLE 8.5 Random Time to Failure for Each Realization until Expenditure of Seven Units

Unit 1Unit 2Unit 3
Realization 1
0.070.750.24
0.533.19
0.06
0.53
Realization 2
0.420.134.92
0.161.15
0.45
0.28
Realization 3
0.623.473.22
3.662.723.92
5.11
3.9
Realization 4
1.455.850.51
1.138.64
1.27
0.45
Realization 5
0.320.220.54
0.750.151.53
1.49
Realization 6
0.112.035.54
1.030.48
0.88
0.39
Realization 7
1.220.112.69
1.870.910.1
2.11
Realization 8
0.271.4922.49
0.440.53
0.74
0.76
Realization 9
0.461.557.9
1.064.8
1.9
0.17
Realization 10
0.831.080.58
0.41.763.76
1

TABLE 8.6 Sum of the Times Spent by Units on the Specified Positions

Unit 1Unit 2Unit 3
5.7716.6848.63
11.0312.521.14
11.413.6*3.9*
2.58*

* Units that are eliminated (x1 = 4, x2 = 3, and x3 = 3).

TABLE 8.7 Time to Failure for 10 Realizations Picked up for Vector (3, 2, 2)

Realization numberTTF
10.66
21.03
36.2
43.85
50.37
62.01
71.02
81.45
93.42
102.23

8.6 Direct Optimization Problem

8.6.1 System Cost versus Successful Operation

We need to find c8-math-5033 which satisfies the solution of Equation (8.5). The algorithm of solution in this case is as follows.

Step 1. Construct a realization of the first trajectory of the spare unit expenditure until c8-math-5034 exceeds the specified value of operational time t0. Memorize the number of spare units spent, c8-math-5035, i = 1, 2, …, n. Continue this procedure until all of N required trajectories are constructed.
Step 2. Construct a hypercube χ1 whose edges χi1 are found as χi1 = max1≤jNxij that is, χi1 is the maximum number of spare units of type i observed during all N realizations. (It means that for this particular sample of realizations, all of them will lay within a hypercube that is stocked with enough spare units so that we would not observe any system failure.)
Step 3. Calculate the system cost for the hypercube χ(1) for which all trajectories have the survival time no less than t0:

c8-math-5037

Step 4. Calculate for each i:

c8-math-5038

where c8-math-5039 shows how many numbers equal to χi1 exist for a unit of type i and c8-math-5040 is the value of the system cost decrease if we reject to use max1≤jNχij and will use the next value in the descending order.
Step 5. Find the type of units that correspond to the maximum value of c8-math-5042 and name it as i1, that is, this number corresponds to the following condition:

c8-math-5043

Step 6. Exclude c8-math-5044 units of type i1 and form a new value:

c8-math-5045

Step 7. Rename remaining numbers:

c8-math-5046

Step 8. Calculate the system successful operation index after the exclusion of νi1 units of type i1:

c8-math-5047

Step 9. Calculate the system spare units' cost after the exclusion of νi1 units of type i1:

c8-math-5048

After these steps we have a new hypercube χ2:

c8-math-5049

Repeat the procedure from step 5 until the system spare units' cost is equal to or smaller than the given restriction.

Example 8.3
Let us take c8-math-5050. In the previous example we found that the vector of spare units (4, 3, 2) satisfies 100% of successful realizations. So, if we take a vector (3, 3, 2) it will satisfy the condition c8-math-5051 Now we need to find the lower 90% confidence limit for the frequency 0.9 obtained in 10 experiments. This limit can be found with the use of the Clopper-Pearson method.
In this particular case it is easier to make direct calculations. If we choose the estimate of the searched probability equal to 0.9 then the probability that we will observe no fewer than 8 successes is equal to:

c8-math-5052

So, in the process of decreasing the number of used spare units, we must stop after the first exclusion, that is, the solution in this case is (3, 3, 2).
Thus, the solutions of direct and inverse problems of optimal redundancy are different, though they should coincide. The variation lies in the difference of approaches: having the restriction on the system cost, we maximize the possible observed frequency; in the latter case we consider minimization of the system cost under the condition that the level of probability is guaranteed. This difference will be smaller if the number of realizations is larger.
We could solve the problem above with an iteration procedure using the solution of the direct problem of optimal redundancy. The use of the “fork method” is convenient in this case. We find the solution, c8-math-5053 for some cost restriction, say, c8-math-5054, and calculate the value c8-math-5055. If R1 < R* we chose c8-math-5056 and continue the procedure; if R1 > R* we chose c8-math-5057 and also continue the procedure. For the next steps, we can use a simple linear approximation:

c8-math-5058

where subscript k stands for the current step, subscript k − 1 for the previous step, and subscript k + 1 for the next step.


8.6.2 System Cost versus Average Time to Failure

We need to find c8-math-5059, which satisfies the solution of Equation (8.6). We could not find a convenient procedure for solving this particular problem. One might consider using an interactive procedure using the sequential solution of the second direct problem considered above. For instance, we can fix some restriction on the system cost, say, c8-math-5060, and find the corresponding optimal solution for c8-math-5061. If this value is smaller than the required c8-math-5062, it means that the system cost must be increased, say, up to some c8-math-5063. If c8-math-5064, one must choose c8-math-5065. This procedure continues until a satisfactory solution is obtained. At an intermediate step L for choosing c8-math-5066, one can use the linear extrapolation method. For example, assume that in first situation described above, the value c8-math-5067 is still less than c8-math-5068. Then the value of c8-math-5069 can be chosen from the following equation:

c8-math-5070

Obviously, one can also use the procedure similar to that in a solution of the direct problem. However, one should somehow find an initial hypercube and construct all trajectories within it. (There is no stopping rule in this case.) Then one should construct a system of embedded hypercubes and again use the steepest descent.

While solving this problem one must remember that the condition c8-math-5071 can be considered only in a probabilistic sense.

Chronological Bibliography

Ushakov, I. A., and Yasenovets, A. V. 1977. “Statistical methods of solving problems of optimal standby.” Soviet Journal of Computer and System Sciences, no. 6.

Ushakov, I. A., and Gordienko, E. I. 1978. “On statistical simulation approach to solution of some optimization problems.” Elektronische Informationsverarbeitung und Kybernetik, no. 3.

Ushakov, I. A., and Gordienko, E. I. 1978. “Solution of some optimization problems by means of statistical simulation.” Electronosche Infdormationsverarbeitung und Kybernetik, no. 11.

Mohan, C., and Shanker, K. 1988. “Reliability optimization of complex systems using random search technique.” Microelectronics and Reliability, no. 28.

Boland, P. J., El-Neweihi, E., and Proschan, F. 1992. “Stochastic order for redundancy allocations in series and parallel systems.” Advances in Applied Probability, no. 1.

Zhao, R., and Liu, B. 2003. “Stochastic programming models for general redundancy-optimization problems.” IEEE Transaction on Reliability, no. 52.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset