10. Risk

As demonstrated in Chapter 9, every project always has several design options that offer different combinations of time and cost. Some of these options will likely be more aggressive or riskier than other options. In essence, each project design option is a point in a three-dimensional space whose axes are time, cost, and risk. Decision makers should be able to take the risk into account when choosing a project design option—in fact, they must be able to do so. When you design a project, you must be able to quantify the risk of the options.

Most people recognize the risk axis but tend to ignore it since they cannot measure or quantify it. This invariably leads to poor results caused by applying a two-dimensional model (time and cost) to a three-dimensional problem (time, cost, and risk). This chapter explores how to measure risk objectively and easily using a few modeling techniques. You will see how risk interacts with time and cost, how to reduce the risk of the project, and how to find the optimal design point for the project.

Choosing Options

The ultimate objective of risk modeling is to weigh project design options in light of risk as well as time and cost so as to evaluate the feasibility of these options. In general, risk is the best criterion for choosing between options.

For example, consider two options for the same project: The first option calls for 12 months and 6 developers; the second calls for 18 months and 4 developers. If this is all you know about the two options, most people would choose the first, since both options end up costing the same (6 man-years) and the first delivers much faster (provided you have the cash flow to afford it). Now suppose you know the first option has only a 15% chance of success while the second has a 70% chance. Which option would you choose? As an even more extreme example, suppose the second option calls for 24 months and 6 developers with the same 70% chance of success. Although that option now costs twice as much and takes twice as long, most people will intuitively choose it. This simple demonstration shows that people often choose an option based on risk, rather than on time and cost.

Prospect Theory

In 1979, the psychologists Daniel Kahneman and Amos Tversky developed prospect theory,a one of the most important concepts in behavioral psychology on decision making. Kahneman and Tversky discovered that people make decisions based on the risk involved, as opposed to the expected gain. Given an identical, measurable loss or gain, most people suffer disproportionately more for the loss than they would enjoy the equivalent gain. As a result, people seek to reduce risk rather than maximize gains, even when it would logically be better to take the risk. This observation went against the conventional wisdom that people act rationally to maximize their gains based on expected value. Prospect theory underscores the importance of adding risk to time and cost in the decision-making process. In 2002, Daniel Kahneman won the Nobel Memorial Prize in Economics for his work developing prospect theory.

a. Daniel Kahneman and Amos Tversky, “Prospect Theory: An Analysis of Decision under Risk,” Econometrica, 47, no. 2 (March 1979): 263–292.

Time–Risk Curve

Just as the project has a time–cost curve, it also has a time–risk curve. The ideal curve is shown in Figure 10-1 by the dashed line.

Figure 10-1 Ideal time–risk curves

As you compress the project, the shorter project design solutions carry with them an increased level of risk, and the rate of increase is likely nonlinear. This is why the dashed line in Figure 10-1 curves up toward the vertical risk axis and relaxes downward with time. However, this intuitive dashed line is wrong. In reality, a time–risk curve is a logistic function of some kind, the solid line in Figure 10-1.

The logistic function is a superior model because it more closely captures the general behavior of risk in complex systems. For example, if I were to plot the risk of burning dinner tonight as a result of compressing the normal preparation time, the risk curve would look like the solid line in Figure 10-1. Each compression technique (such as setting the oven temperature too high, placing the tray too close to the heating element, choosing easier-to-cook but more flammable food, or not preheating the oven) increases the risk of burning dinner. As the solid line shows, at some point the cumulative compression all but maximizes the risk, and the curve flattens out because dinner is now certain to burn. Similarly, if I decide not to even enter the kitchen, the risk drops precipitously. If the risk were dictated by the dashed line, I would always retain some chance of not burning dinner, since I could always keep increasing the risk by compressing further.

Note that the logistic function has a tipping point where the risk drastically increases (the analog to the decision to enter the kitchen). The dashed line, by contrast, keeps increasing gradually and does not have a noticeable tipping point.

Actual Time–Risk Curve

It turns out that even the logistic function in Figure 10-1 is still an idealized time–risk curve. The actual time–risk curve is more like that shown in Figure 10-2. The reason for the shape of this curve is best explained by overlaying it with the project’s direct cost curve. Since the project behavior is three-dimensional, Figure 10-2 relies on a secondary y-axis for the risk.

Figure 10-2 Actual time–cost–risk curve

The vertical dashed line in Figure 10-2 indicates the duration of the normal solution as well as the minimum direct cost solution for the project. Note that the normal solution usually trades some amount of float to reduce staffing. The reduction in float manifests in an elevated level of risk.

To the left of the normal solution are the shorter, compressed solutions. The compressed solutions are also riskier, so the risk curve increases to the left of the normal solution. The risk rises and then levels off (as with the ideal logistic function). However, unlike the ideal behavior, the actual risk curve reaches its maximum before the point of minimum duration and even drops a bit, giving it a concave shape. While such behavior is counterintuitive, it occurs because, in general, shorter projects are somewhat safer, a phenomenon I call the da Vinci effect. When investigating the tensile strength of wires, Leonardo da Vinci found that shorter wires are stronger than longer wires because the probability of a defect is proportional to the length of the wire.1 By analogy, the same is true for projects. To illustrate the point, consider two possible ways of delivering a 10-man-year project: 1 person for 10 years or 3650 people for 1 day. Assuming both are viable projects (the people are available, you have the time, and so on), the 1-day project is much safer than the 10-year project. The likelihood of something bad happening in a single day is open for debate, but over 10 years it is a near certainty. I provide a more quantified explanation for this behavior later in this chapter.

1. William B. Parsons, Engineers and Engineering in the Renaissance (Cambridge, MA: MIT Press, 1939); Jay R. Lund and Joseph P. Byrne, Leonardo da Vinci’s Tensile Strength Tests: Implications for the Discovery of Engineering Mechanics (Department of Civil and Environmental Engineering, University of California, Davis, July 2000).

To the right of the normal solution, the risk goes down, at least initially. For example, giving an extra week to a one-year project will reduce the risk of not meeting that commitment. However, if you keep giving the project more time, at some point Parkinson’s law will take effect and drastically increase the risk. So, to the right of the normal solution, the risk curve goes down, becomes minimized at some value greater than zero, and then starts climbing again, giving it a convex shape.

Risk Modeling

This chapter presents my techniques for modeling and quantifying risk. These models complement each other in how they measure the risk. You often need more than one model to help you choose between options—no model is ever perfect. However, each of the risk models should yield comparable results.

Risk values are always relative. For example, jumping off a fast-moving train is risky. However, if that train is about to go over a cliff, jumping is the most sensible thing to do. Risk has no absolute value, so you can evaluate it only in comparison with other alternatives. You should therefore talk about a “riskier” project as opposed to a “risky” project. Similarly, nothing is really safe. The only safe way of doing any project is not doing it. You should therefore talk about a “safer” project rather than a “safe” project.

Normalizing Risk

The whole point of evaluating risk is to be able to compare options and projects, which requires comparing numbers. The first decision I made when creating the models was to normalize risk to the numerical range of 0 to 1.

A risk value of 0 does not mean that the project is risk-free. A risk value of 0 means that you have minimized the risk of the project. Similarly, a risk value of 1 does not mean that the project is guaranteed to fail, but simply that you have maximized the risk of the project.

The risk value also does not indicate a probability of success. With probability, a value of 1 means a certainty, and a value of 0 means an impossibility. A project with a risk value of 1 can still deliver, and a project with a risk value of 0 can still fail.

Risk and Floats

The floats of the various activities in the network provide an objective way of measuring the risk of the project, and the previous chapters have referred to floats when discussing risk. Two different project design options will differ in their floats and, therefore, may drastically differ in their risk as well. As an example, consider the two project design options shown in Figure 10-3.

Figure 10-3 Two project options

Both of these options are valid project design options for building the same system. The only information available in Figure 10-3 is the color-coded floats of the two networks. Now, ask yourself: With which project would you rather be involved? Everyone to whom I have shown these two charts preferred the greener option on the right-hand side of Figure 10-3. What is interesting is that no one has ever asked what the difference in duration and cost between these two options was. Even when I volunteered that the greener option was both 30% longer and more expensive, that information did not affect the preference. No one chose the low-float, high-stress, and high-risk project shown on the left in Figure 10-3.

Design Risk

Your project faces multiple types of risk. There is staffing risk (Will the project actually get the level of staffing it requires?). There is duration risk (Will the project be allowed the duration it requires?). There is technological risk (Will the technology be able to deliver?). There are human factors (Is the team technically competent, and can they work together?). There is always execution risk (Can the project manager correctly execute the project plan?).

These types of risk are independent of the kind of risk you assess using floats. Any project design solution always assumes that the organization or the team will have what it takes to deliver on the planned schedule and cost and that the project will receive the required time and resources. The remaining type of risk pertains to how well the project will handle the unforeseen. I call this kind of risk design risk.

Design risk assesses the project’s sensitivity to schedule slips of activities and to your ability to meet your commitments. Design risk therefore quantifies the fragility of the project or the degree to which the project resembles a house of cards. Using floats to measure risk is actually quantifying that design risk.

Risk and Direct Cost

The project risk measurements usually correlate with the direct cost and duration of the various solutions. In most projects, the indirect cost is independent of the project risk: The indirect cost keeps mounting with the duration of the project even if the risk is very low. Therefore, this chapter refers only to direct cost.

Criticality Risk

The criticality risk model attempts to quantify the intuitive impression of risk when you evaluate the options of Figure 10-3. For this risk model you classify activities in the project into four risk categories, from most to least risk:

  • Critical activities. The critical activities are obviously the riskiest activities because any delay with a critical activity always causes schedule and cost overruns.

  • High-risk activities. Low-float, near-critical activities are also risky because any delay in them is likely to cause schedule and cost overruns.

  • Medium-risk activities. Activities with a medium level of float have a medium level of risk and can sustain some delays.

  • Low-risk activities. Activities with high floats are the least risky and can sustain even large delays without derailing the project.

You should exclude activities of zero duration (such as milestones and dummies) from this analysis because they add nothing to the risk of the project. Moreover, unlike real activities, they are simply artifacts of the project network.

Chapter 8 showed how to use color coding to classify activities based on their float. You can use the same technique for evaluating the sensitivity or fragility of activities by color coding the four risk categories. With the color coding in place, assign a weight to the criticality of each activity. The weight acts as a risk factor. You are, of course, at liberty to choose any weights that signify the difference in risk. One possible allocation of weights is shown in Table 10-1.

Table 10-1 Criticality risk weights

Activity Color          Weight
Black (critical)          4
Red (high risk)           3
Yellow (medium risk)      2
Green (low risk)          1

The criticality risk formula is:

Risk = (WC × NC + WR × NR + WY × NY + WG × NG) / (WC × N)

where:

  • WC is the weight of the black, critical activities.

  • WR is the weight of red, low-float activities.

  • WY is the weight of yellow, medium-float activities.

  • WG is the weight of green, high-float activities.

  • NC is the number of the black, critical activities.

  • NR is the number of red, low-float activities.

  • NY is the number of yellow, medium-float activities.

  • NG is the number of green, high-float activities.

  • N is the number of activities in the project (N = NC + NR + NY + NG).

Substituting the weights from Table 10-1, the criticality risk formula is:

Risk = (4 × NC + 3 × NR + 2 × NY + 1 × NG) / (4 × N)

Applying the criticality risk formula to the network in Figure 10-4 yields:

Figure 10-4 Sample network for risk calculation

Risk = (4 × 6 + 3 × 4 + 2 × 2 + 1 × 4) / (4 × 16) = 0.69
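As a quick sanity check, the criticality risk formula is easy to compute in code. The following is a minimal sketch (the function name and parameters are mine, for illustration, not from the text):

```python
def criticality_risk(n_critical, n_red, n_yellow, n_green,
                     weights=(4, 3, 2, 1)):
    """Criticality risk: a weighted average of the activity risk
    categories, normalized by the critical (highest) weight."""
    wc, wr, wy, wg = weights
    n = n_critical + n_red + n_yellow + n_green
    return (wc * n_critical + wr * n_red +
            wy * n_yellow + wg * n_green) / (wc * n)

# The network in Figure 10-4: 6 critical, 4 red, 2 yellow, 4 green
print(criticality_risk(6, 4, 2, 4))    # 0.6875, i.e. the 0.69 above

# The extremes: an all-critical network and an all-green network
print(criticality_risk(16, 0, 0, 0))   # 1.0
print(criticality_risk(0, 0, 0, 16))   # 0.25
```

The last two calls reproduce the maximum and minimum values derived in the next subsection.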

Criticality Risk Values

The maximum value of the criticality risk is 1.0; it occurs when all activities in the network are critical. In such a network, NR, NY, and NG are zero, and NC equals N:

Risk = (WC × N + WR × 0 + WY × 0 + WG × 0) / (WC × N) = WC / WC = 1.0

The minimum value of the criticality risk is WG over WC; it occurs when all activities in the network are green. In such a network, NC, NR, and NY are zero, and NG equals N:

Risk = (WC × 0 + WR × 0 + WY × 0 + WG × N) / (WC × N) = WG / WC

Using the weights from Table 10-1, the minimum value of risk is 0.25. The criticality risk, therefore, can never be zero: A weighted average such as this will always have a minimum value greater than zero as long as the weights themselves are greater than zero. This is not necessarily a bad thing, as the project risk should never be zero. The formula simply implies that the lowest range of risk values is unattainable, which is reasonable, since anything worth doing requires taking some risk.

Choosing Weights

As long as you can rationalize your choice of weights, the criticality risk model will likely work. For example, the set of weights [21, 22, 23, 24] is a poor choice because 21 is only 12.5% smaller than 24; thus, this set does not emphasize the difference in risk between the green and the critical activities. Furthermore, the minimum risk using these weights (WG/WC) is 0.88, which is obviously too high. I find the weight set [1, 2, 3, 4] to be as good as any other sensible choice.

Customizing Criticality Risk

The criticality risk model often requires some customization and judgment calls. First, as mentioned in Chapter 8, the ranges of the various colors (the criteria for red, yellow, and green activities) must be appropriate for the duration of your project. Second, you should consider defining very-low-float or near-critical activities (such as those with 1 day of float) as critical because these basically have the same risk as critical activities. Third, even if some activities’ floats are not near-critical, you should examine the chain on which the activities reside and adjust it accordingly. For example, if you have a year-long chain of many activities and the chain has only 10 days of float, you should classify each activity on the chain as a critical activity for risk calculation. A slip with one activity up that chain will consume all float, turning all downstream activities into critical activities.

Fibonacci Risk

The Fibonacci series is a sequence of numbers in which every item in the series equals the sum of the previous two, with the exception that the first two values are defined as 1.

Fib(n) = Fib(n−1) + Fib(n−2), with Fib(2) = Fib(1) = 1

This recursive definition yields the series of 1, 1, 2, 3, 5, 8, 13, ….

The ratio between two (sufficiently large) consecutive Fibonacci numbers approaches an irrational number known as phi (the Greek letter φ), whose value is 1.618..., so the series can be approximated as:

Fib(i) ≈ φ × Fib(i−1)

Since ancient times, φ has been known as the golden ratio. It is observed throughout nature and human enterprises alike. Two famous (and quite disparate) examples based on the golden ratio are the way the invertebrate nautilus’s shell spirals and the way markets retrace their former price levels.
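The convergence of the ratio of consecutive Fibonacci numbers to φ is easy to verify numerically; here is a small sketch (the helper function is mine, for illustration):

```python
def fib(n):
    """Return the n-th Fibonacci number (1-indexed, Fib(1) = Fib(2) = 1)."""
    a, b = 1, 1
    for _ in range(n - 2):
        a, b = b, a + b
    return b

# The ratio of consecutive Fibonacci numbers converges to phi (~1.618)
print(round(fib(20) / fib(19), 6))   # 1.618034
```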

Notice that the weights in Table 10-1 are similar to the beginning values of the Fibonacci series. As an alternative to Table 10-1, you can choose any four consecutive members from the Fibonacci series (such as [89, 144, 233, 377]) as weights. Regardless of your choice, when you use them to evaluate the network in Figure 10-4, the risk will always be 0.64 because the weights maintain the ratio of φ. If WG is the weight of the green activities, the other weights are:

WY = φ × WG,  WR = φ² × WG,  WC = φ³ × WG

and the criticality risk formula can be written as:

Risk = (φ³ × WG × NC + φ² × WG × NR + φ × WG × NY + WG × NG) / (φ³ × WG × N)

Since WG appears in all elements of the numerator and the denominator, the equation can be simplified:

Risk = (φ³ × NC + φ² × NR + φ × NY + NG) / (φ³ × N)

Approximating the value of φ, the formula is reduced to:

Risk = (4.24 × NC + 2.62 × NR + 1.62 × NY + NG) / (4.24 × N)

I call this risk model the Fibonacci risk model.
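Under the assumption of golden-ratio weights, the Fibonacci risk formula can be sketched as a short function (the function name is mine, for illustration):

```python
PHI = (1 + 5 ** 0.5) / 2   # the golden ratio, ~1.618

def fibonacci_risk(n_critical, n_red, n_yellow, n_green):
    """Criticality risk with weights in golden-ratio proportion."""
    n = n_critical + n_red + n_yellow + n_green
    return (PHI ** 3 * n_critical + PHI ** 2 * n_red +
            PHI * n_yellow + n_green) / (PHI ** 3 * n)

# The network in Figure 10-4 yields the same 0.64 that any four
# consecutive Fibonacci numbers would produce as weights
print(round(fibonacci_risk(6, 4, 2, 4), 2))   # 0.64
```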

Fibonacci Risk Values

The maximum value that the Fibonacci risk formula can reach is 1.0 in an all-critical network. The minimum value that it can reach is 0.24 (1/4.24), slightly less than the minimum criticality risk model value of 0.25 (when using the set [1, 2, 3, 4] for weights). This supports the notion that risk has a natural lower limit of about 0.25.

Activity Risk

The criticality risk model uses broad risk categories. For example, if you define float greater than 25 days as green, then two activities, one with 30 days of float and the other with 60 days of float, will be placed in the same green bin and will have the same risk value. To better account for the risk contribution of each individual activity, I created the activity risk model. This model is far more granular than the criticality risk model.

The activity risk formula is:

Risk = 1 − (F1 + F2 + … + FN) / (M × N)

where:

  • Fi is the float of activity i.

  • N is the number of activities in the project.

  • M is the maximum float of any activity in the project or Max(F1, F2, …, FN).

As with the criticality risk, you should exclude activities of zero duration (milestones and dummies) from this analysis.

Applying the activity risk formula to the network in Figure 10-4 yields:

Risk = 1 − (30 + 30 + 30 + 30 + 10 + 10 + 5 + 5 + 5 + 5) / (30 × 16) = 0.67
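The activity risk formula is just as simple to compute; here is a minimal sketch (the function name is mine, for illustration):

```python
def activity_risk(floats):
    """1 minus the average float divided by the maximum float."""
    m = max(floats)
    return 1 - sum(floats) / (m * len(floats))

# Figure 10-4: 6 critical activities (zero float) plus the ten floats above
floats = [0] * 6 + [30, 30, 30, 30, 10, 10, 5, 5, 5, 5]
print(round(activity_risk(floats), 2))   # 0.67
```

Note that the six critical activities contribute zero float to the sum but still count toward N.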

Activity Risk Values

The activity risk model is undefined when all activities are critical. However, at the limit, given a large network (large N) that includes only one noncritical activity with float M, the model approaches 1.0:

Risk ≈ 1 − F1 / (M × N) = 1 − M / (M × N) = 1 − 1/N ≈ 1 − 0 = 1.0

The minimum value of the activity risk is 0 when all activities in the network have the same level of float, M:

Risk = 1 − (N × M) / (M × N) = 1 − 1 = 0

While activity risk can in theory reach zero, in practice it is unlikely that you will encounter such a project because all projects always have some non-zero amount of risk.

Calculation Pitfall

The activity risk model works well only when the floats of the project are more or less uniformly spread between the smallest float and the largest float in the network. An outlier float value that is significantly higher than all other floats will skew the calculation, producing an incorrectly high risk value. For example, consider a one-year project that has a single week-long activity that can take place anywhere between the beginning and the end of the project. Such an activity will have almost a year's worth of float, as illustrated in the network in Figure 10-5.

Figure 10-5 Network with outlier high float activity

Figure 10-5 shows the critical path (bold black) and many activities with some color-coded level of float (Fi) below. The activity shown above the critical path itself is short but has an enormous amount of float M.

Since M is much larger than any other Fi, the activity risk formula yields a number approaching 1:

Risk = 1 − (F1 + F2 + … + FN) / (M × N) ≈ 1 − (Fi × N) / (M × N) = 1 − Fi/M ≈ 1 − 0 = 1.0
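A small numeric sketch of this skew, using hypothetical float values:

```python
def activity_risk(floats):
    """1 minus the average float divided by the maximum float."""
    m = max(floats)
    return 1 - sum(floats) / (m * len(floats))

uniform = [10, 12, 14, 16, 18, 20]        # evenly spread floats
with_outlier = uniform + [250]            # one huge outlier float
print(round(activity_risk(uniform), 2))       # 0.25
print(round(activity_risk(with_outlier), 2))  # 0.81 - skewed high
```

Adding the single outlier float inflates the measured risk from 0.25 to 0.81 even though nothing about the other activities has changed.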

The next chapter demonstrates this situation and provides an easy and effective way of detecting and adjusting the float outliers.

The activity risk model also produces an incorrectly low risk value when the project does not have many activities and the floats of the noncritical activities are all similar or even identical. However, except for these rare, somewhat contrived examples, the activity risk model measures the risk correctly.

Criticality Versus Activity Risk

For decent-size real-life projects, the criticality and activity risk models yield very similar results. Each model has pros and cons. In general, criticality risk reflects human intuition better, while activity risk is more attuned to the differences between individual activities. Criticality risk modeling often requires calibration or judgment calls, but it is indifferent to how uniformly the floats are spread. Activity risk is sensitive to the presence of large outlier floats, but it is easy to calculate and does not require much calibration. You can even automate the adjustment of float outliers.

Compression and Risk

As discussed previously, risk decreases slightly with high compression, reflecting the intuitive observation that shorter projects are safer. Quantified risk modeling offers an explanation for this phenomenon. The only practical way of highly compressing a software project is to introduce parallel work. Chapter 9 listed several ideas for enabling parallel work, such as splitting activities and performing the less-dependent phases in parallel to other activities, or introducing additional activities that enable the parallel work. Figure 10-6 shows this effect in a qualitative manner.

Figure 10-6 High compression makes the network more parallel

Figure 10-6 depicts two networks, with the bottom diagram being the compressed version of the top diagram. The compressed solution has fewer critical activities, a shorter critical path, and more noncritical activities in parallel. When measuring the risk of such compressed projects, the presence of more activities with float and fewer critical activities decreases the risk value produced by both the criticality and activity risk models.

Execution Risk

While the design risk of a highly parallel project may be lower than the design risk of a less compressed solution, such a project is more challenging to execute because of the additional dependencies and the increased number of activities that need to be scheduled and tracked. Such a project will have demanding scheduling constraints and require a larger team. In essence, a highly compressed project has converted design risk into execution risk. You should measure the execution risk as well as the design risk. A good proxy for the expected execution risk is the complexity of the network. Chapter 12 discusses how to quantify execution complexity.

Risk Decompression

While compressing the project is likely to increase the risk, the opposite is also true (up to a point): By relaxing the project, you can decrease its risk. I call this technique risk decompression. You deliberately design the project for a later delivery date by introducing float along the critical path. Risk decompression is the best way to reduce the project’s fragility, its sensitivity to the unforeseen.

You should decompress the project when the available solutions are too risky. Other reasons for decompressing the project include concerns about the present prospects based on a poor past track record, facing too many unknowns, or a volatile environment that keeps changing its priorities and resources.

As discussed in Chapter 7, a classic mistake when trying to reduce risk is to pad estimations. This will actually make matters worse and decrease the probability of success. The whole point of decompression is to keep the original estimations unchanged and instead increase the float along all network paths.

At the same time, you should not over-decompress. Using the risk models, you can measure the effect of the decompression and stop when you reach your decompression target (discussed later in this section). Excessive decompression will have diminishing returns when all activities have high float. Any additional decompression beyond this point will not reduce the design risk, but will increase the overall overestimation risk and waste time.

You can decompress any project design solution, although you typically decompress only the normal solution. Decompression pushes the project a bit into the uneconomical zone (see Figure 10-2), increasing the project’s time and cost. When you decompress a project design solution, you still design it with the original staffing. Do not be tempted to consume the additional decompression float and reduce the staff—that defeats the purpose of risk decompression in the first place.

How To Decompress

A straightforward way of decompressing the project is to push the last activity or the last event in the project down the timeline. This adds float to all prior activities in the network. In the case of the network depicted in Figure 10-4, decompressing activity 16 by 10 days results in a criticality risk of 0.47 and an activity risk of 0.52. Decompressing activity 16 by 30 days results in a criticality risk of 0.3 and an activity risk of 0.36.

A more sophisticated technique is to also decompress one or two key activities along the critical path, such as activity 8 in Figure 10-4. In general, the further down the network you decompress, the more you need to decompress because any slip in an upstream activity can consume the float of the downstream activities. The earlier in the network you decompress, the less likely it is that all of the float you have introduced will be consumed.

Decompression Target

When decompressing a project, you should strive to decompress until the risk drops to 0.5. Figure 10-7 demonstrates this point on the ideal risk curve using a logistic function with asymptotes at 1 and 0.

Figure 10-7 The decompression target on the ideal risk curve

When the project has a very short duration, the value of risk is almost 1.0, and the risk is maximized. At that point the risk curve is almost flat. Initially, adding time to the project does not reduce the risk by much. With more time, at some point the risk curve starts descending, and the more time you give the project, the steeper the curve gets. However, with even more time, the risk curve starts leveling off, offering less reduction in risk for additional time. The point at which the risk curve is the steepest is the point with the best return on the decompression—that is, the most reduction in risk for the least amount of decompression. This point defines the risk decompression target. Since the logistic function in Figure 10-7 is a symmetric curve between 0 and 1, the tipping point is at a risk value of exactly 0.5.
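The tipping-point claim can be checked numerically. The sketch below uses a unit logistic as a stand-in for the ideal risk curve (the specific function is my assumption, for illustration) and shows that the slope is steepest exactly where the risk value is 0.5:

```python
import math

def ideal_risk(t):
    """A unit logistic descending from 1 toward 0 as time t grows."""
    return 1 / (1 + math.exp(t))

# The magnitude of the slope of this logistic is risk * (1 - risk),
# which peaks where risk = 0.5 (here, at t = 0).
for t in (-3, -1, 0, 1, 3):
    r = ideal_risk(t)
    slope = r * (1 - r)
    print(f"t={t:+d}  risk={r:.2f}  |slope|={slope:.3f}")
```

Running the loop shows the slope magnitude rising toward t = 0 and falling symmetrically after it, which is why 0.5 is the point of best return on decompression.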

To determine how the decompression target relates to cost, compare the actual risk curve with the direct cost curve (Figure 10-8). The actual risk curve is confined to a narrower range than the ideal risk curve and never approaches either 0 or 1, although it behaves similarly to a logistic function between its maximum and minimum values. As discussed at the beginning of this chapter, the steepest point of the risk curve (where concave becomes convex) is at minimum direct cost, which coincides with the decompression target (Figure 10-8).

Figure 10-8 Minimum direct cost coincides with risk at 0.5

Since the risk keeps descending to the right of 0.5, you can think of 0.5 as a minimum decompression target. Again, you should monitor the behavior of the risk curve and not over-decompress.

If the minimum direct cost point of the project is also the best point risk-wise, this makes it the optimal design point for the project, offering the least direct cost at the best risk. This point is neither too risky nor too safe, benefiting as much as possible from adding time to the project.

Risk Metrics

To end this chapter, here are a few easy-to-remember metrics and rules of thumb. As is the case with every design metric, you should use them as guidelines. A violation of the metrics is a red flag, and you should always investigate its cause.

  • Keep risk between 0.3 and 0.75. Your project should never have extreme risk values. Obviously, a risk value of 0 or 1.0 is nonsensical. The risk should not be too low: Since the criticality risk model cannot go below 0.25, you can round the lower possible limit of 0.25 up to 0.3 as the lower bound for any project. When compressing the project, long before the risk gets to 1.0 (a fully critical project), you should stop compressing. Even a risk value of 0.9 or 0.85 is still high. If the bottom quarter of 0 to 0.25 is disallowed, then for symmetry’s sake you should avoid the top quarter of risk values between 0.75 and 1.0.

  • Decompress to 0.5. The ideal decompression target is a risk of 0.5, as it targets the tipping point in the risk curve.

  • Do not over-decompress. As discussed, decompression beyond the decompression target has diminishing returns, and over-decompression increases the risk.

  • Keep normal solutions under 0.7. While elevated risk may be the price you pay for a compressed solution, it is inadvisable for a normal solution. Returning to the symmetry argument, if risk of 0.3 is the lower bound for all solutions, then risk of 0.7 is the upper bound for a normal solution. You should always decompress high-risk normal solutions.

You should make both risk modeling and risk metrics part of your project design. Constantly measure the risk to see where you are and where you are heading.
