12. Advanced Techniques

Project design is an intricate discipline, and the previous chapters covered only the basic concepts. This was deliberate, to allow for a moderate learning curve. There is more to project design, and in this chapter you will find additional techniques useful in almost any project, not just the very large or the very complex. What these techniques have in common is that they give you a better handle on risk and complexity. You will also see how to successfully handle even the most challenging and complex projects.

God Activities

As the name implies, god activities are activities too large for your project. “Too large” can be a relative term: a god activity may be too large with respect to the other activities in the project. A simple criterion for such a god activity is an activity whose duration differs by at least one standard deviation from the average duration of all the activities in the project. But god activities can also be too large in absolute terms. Durations in the 40–60 day range (or longer) are too large for a typical software project.

Your intuition and experience may already tell you to avoid such activities. Typically, god activities are mere placeholders for some great uncertainty lurking under the cover. The duration and effort estimation for a god activity is almost always low-grade. Consequently, the actual activity duration may be longer, potentially enough to derail the project. You should confront such dangers as soon as possible to ensure that you have a chance of meeting your commitments.

God activities also tend to deform the project design techniques shown in this book. They are almost always part of the critical path, rendering most critical path management techniques ineffective because the critical path’s duration and its position in the network gravitate toward the god activities. To make matters worse, the risk models for projects with god activities result in misleadingly low risk numbers. Most of the effort in such a project will be spent on the critical god activities, making the project for all practical purposes a high-risk project. However, the risk calculation will be skewed lower because the other activities orbiting the critical god activities will have high float. If you removed these satellite activities, the risk number would shoot up toward 1.0, correctly indicating the high risk resulting from the god activities.

Handling God Activities

The best course of action with god activities is to break them down into smaller independent activities. Subdividing god activities will markedly improve the quality of the estimation, reduce the uncertainty, and provide the correct risk value. But what if the scope of work is truly huge? You should treat such activities as mini-projects and compress them. Start by identifying internal phases of the god activities and finding ways of working in parallel across these phases inside each god activity. If that is not possible, you should look for ways of making the god activities less critical by getting them out of the way of the other activities in the project.

For instance, developing simulators for the god activities reduces other activities’ dependencies on the god activities themselves. This will enable working in parallel to the god activities, making the god activities less (or maybe not at all) critical. Simulators also reduce the uncertainty of the god activities by placing constraints on them that reveal hidden assumptions, making the detailed design of the god activities easier.

You should also consider ways of factoring the god activities into separate side projects. Factoring into a side project is especially important if the internal phases of the god activity are inherently sequential. This makes project management and progress tracking much easier. You must design integration points along the network to reduce the integration risk at the end. Extracting the god activities this way tends to increase the risk in the rest of the project (the other activities have much less float once the god activities are extracted). This is typically a good thing, because the project would otherwise have deceptively low risk numbers. This situation is so common that low risk numbers are often a signal to look for god activities.

Risk Crossover Point

The case study in Chapter 11 used the simple guidelines of risk lower than 0.75 and higher than 0.3 to include and exclude project design options. You can be more precise than general rules of thumb when deciding on your project design options.

In Figure 11-33, at the point of minimum direct cost and immediately to its left, the direct cost curve is basically flat, but the risk curve is steep. This is expected behavior: The direct cost reaches its maximum value at the most compressed solutions, while the risk typically reaches its maximum value before that. The only way the risk can reach its maximum before the direct cost does is if, to the left of minimum direct cost, the risk curve initially rises much faster than the direct cost curve. At the point of maximum risk (and a bit to its right), the risk curve is flat or almost flat, while the direct cost curve is fairly steep.

It follows that there must be a point left of minimum direct cost where the risk curve stops rising faster than the direct cost curve. I call that point the risk crossover point. Past the crossover point, the risk approaches its maximum, which indicates that you should avoid compressed solutions whose risk values are above the crossover. In most projects, the risk crossover point roughly coincides with a value of 0.75 on the risk curve.

The risk crossover point is a conservative point both because it is not at maximum risk and because it is based on the behavior of the risk and direct cost, rather than an absolute value of risk. That said, given the track record of most software projects, a bit of caution is never a bad thing.

Deriving the Crossover Point

Finding the risk crossover point requires comparing the rate of growth of the direct cost curve with that of the risk curve. You can do that analytically using some basic calculus, graphically in a spreadsheet, or with a numerical equation solver. The files accompanying this chapter demonstrate all three techniques in near-template form so that you can easily find the risk crossover point of your own project.

The rate of growth of a curve is expressed by its first derivative, so you have to compare the first derivative of the risk curve with the first derivative of the direct cost curve. The risk model in the example project of Chapter 11 is a third-degree polynomial of the form:

y = ax³ + bx² + cx + d

The first derivative of that polynomial is in the form of this second-degree polynomial:

y′ = 3ax² + 2bx + c

With the example project, the risk formula is:

R = 0.01t³ − 0.36t² + 3.67t − 11.07

Therefore, the first derivative of the risk is:

R′ = 0.03t² − 0.72t + 3.67

With the example project, the direct cost formula is:

C = 0.99t² − 21.32t + 136.57

Therefore, the first derivative of the direct cost is:

C′ = 1.98t − 21.32

There are two issues you need to overcome before you can compare the two derivatives. The first issue is that both curves are monotonically decreasing in the range between maximum risk and minimum direct cost (meaning the rates of growth of the two curves are negative numbers), so you must compare the absolute values of the rates of growth. The second issue is that the raw rates of growth are incompatible in magnitude: The risk values range between 0 and 1, while the cost values are around 30 for the example project. To correctly compare the two derivatives, you must first scale the risk values to the cost values at the point of maximum risk.

The recommended scaling factor is given by

F = C(tmr) / R(tmr)

where:

  • tmr is the time for maximum risk.

  • R(tmr) is the project’s risk formula value at tmr.

  • C(tmr) is the project’s cost formula value at tmr.

The risk curve is maximized when its first derivative, R′, is zero. Solving R′ = 0 for t yields a tmr of 8.3 months. The corresponding risk value, R, is 0.85, and the corresponding direct cost value is 28 man-months. The ratio of these two values, F, is 32.93, the scaling factor for the example project.

The acceptable risk level for the project occurs when all of the following conditions are met:

  • Time is to the left of the point of minimum risk of the project.

  • Time is to the right of the point of maximum risk of the project.

  • The risk changes faster than the cost, comparing the scaled absolute values of the two derivatives.

You can put these conditions together in the form of this expression:

F × |R′| > |C′|

Using the equations for the risk and direct cost derivatives as well as the scaling factor yields:

32.93 × |0.03t² − 0.72t + 3.67| > |1.98t − 21.32|

Solving this inequality provides the acceptable range for t:

9.03 < t < 12.31

The result is not one, but two crossover points, at 9.03 and 12.31 months. Figure 12-1 visualizes the behavior of the scaled risk and cost derivatives in absolute value. You can clearly see that the risk derivative in absolute value crosses over the cost derivative in absolute value in two places (hence crossover points).
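
If you prefer the numerical route, the sketch below illustrates it in Python with NumPy (my choice of tooling, not something the chapter prescribes; the chapter's accompanying files cover the same ground). It finds the crossover points by scanning the range between maximum and minimum risk for the values of t at which the scaled risk derivative still outpaces the cost derivative in absolute value. Because the polynomial coefficients quoted in the text are rounded, the printed numbers will only approximate the 8.3 months, 32.93, and 9.03–12.31 values above; with the full-precision coefficients from your own regression they should match.

import numpy as np

risk = np.poly1d([0.01, -0.36, 3.67, -11.07])   # R(t), third-degree risk fit
cost = np.poly1d([0.99, -21.32, 136.57])        # C(t), second-degree cost fit
d_risk, d_cost = risk.deriv(), cost.deriv()     # R'(t) and C'(t)

# Extremes of the risk curve: maximum risk (left root of R') and minimum risk (right root).
t_mr, t_min_risk = np.sort(np.real(d_risk.roots))
F = cost(t_mr) / risk(t_mr)                     # scaling factor at the point of maximum risk

# Acceptable zone: between the two extremes, wherever the scaled risk derivative
# is larger in absolute value than the cost derivative.
ts = np.linspace(t_mr, t_min_risk, 10_000)
zone = ts[F * np.abs(d_risk(ts)) > np.abs(d_cost(ts))]
print(f"t_mr = {t_mr:.2f}, F = {F:.2f}, crossover points at {zone[0]:.2f} and {zone[-1]:.2f} months")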

Figure 12-1 Risk crossover points

Math aside, the reason why there are two risk crossover points has to do with the semantics of the points from a project design perspective. At 9.03 months, the risk is 0.81; at 12.31 months, the risk is 0.28. Superimposing these values on the risk curve and the direct cost curve in Figure 12-2 reveals the true meaning of the crossover points.

Figure 12-2 Risk inclusion and exclusion zones

Project design solutions to the left of the 9.03-month risk crossover point are too risky. Project design solutions to the right of the 12.31-month risk crossover point are too safe. In between the two risk crossover points, the risk is “just right.”

Acceptable Risk and Design Options

The risk values at the crossover points of 0.81 and 0.28 agree closely with the rules of thumb of 0.75 and 0.30. For the example project, the acceptable risk zone includes the first compressed solution, the normal solution, and the decompression points of D4, D3, and D2 (see Figure 11-35). All of these points are practical design options. “Practical” in this context means the project stands a reasonable chance of meeting its commitments. The more compressed solutions are too risky, and the D1 point is too safe. You can further select between the decompression points by finding the best decompression target.

Finding the Decompression Target

As Chapter 10 pointed out, the risk level of 0.5 is the steepest point in the risk curve. This makes it the ideal decompression target because it offers the best return—that is, for the least amount of decompression, you get the most reduction in risk. This ideal point is the tipping point of risk, and therefore it is the minimum point of decompression.

If you have plotted the risk curve, you can see where that tipping point is located and select a decompression point at the tipping point or, more conservatively, to its right. This technique was used in Chapter 11 to recommend D3 in Figure 11-29 as the decompression target. However, merely eyeballing a chart is not good engineering practice. Instead, you should apply elementary calculus to identify the decompression target in a consistent and objective manner.

Given that the risk curve emulates a standard logistic function (at least between minimum and maximum risk), the steepest point in the curve also marks a twist or inflection point in the curve. To the left of that point the risk curve is concave, and to the right of it the risk curve is convex. Calculus tells us that at the inflection point, where concave becomes convex, the second derivative of the curve is zero. The ideal risk curve and its first two derivatives are shown graphically in Figure 12-3.

Figure 12-3 The inflection point as decompression target

Using the example project from Chapter 11 to demonstrate this technique, you have the risk equation as polynomial of the third degree. Its first and second derivatives are:

y = ax³ + bx² + cx + d
y′ = 3ax² + 2bx + c
y″ = 6ax + 2b

Equating the second derivative to zero provides this formula:

x = −b / (3a)

Since the risk model is:

R = 0.01t³ − 0.36t² + 3.67t − 11.07

the point at which the second derivative is zero is at 10.62 months:

t = 0.36 / (3 × 0.01) = 10.62

At 10.62 months, the risk value is 0.55, which differs by only 10% from the ideal target of 0.5. When plotted on the discrete risk curves in Figure 12-4, you can see that this value falls right between D4 and D3, substantiating the choice in Chapter 11 of D3 as the decompression target.
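
The same calculation takes only a few lines of code. The sketch below (Python with NumPy, my assumption rather than anything the chapter mandates) locates the inflection point of the cubic risk model. As before, the coefficients quoted in the text are rounded, so the output will only approximate the 10.62 months derived above.

import numpy as np

risk = np.poly1d([0.01, -0.36, 3.67, -11.07])    # R(t) = at^3 + bt^2 + ct + d
a, b = risk.coeffs[0], risk.coeffs[1]

t_target = -b / (3 * a)                          # inflection point, where the second derivative is zero
print(f"minimum decompression target: t = {t_target:.2f} months")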

Figure 12-4 Decompression target on the risk curves

Unlike in Chapter 11, which used visualization of the risk chart and a judgment call to identify the tipping point, the second derivative provides an objective and repeatable criterion. This is especially important when there is no immediately obvious visual risk tipping point or when the risk curve is skewed higher or lower, making the 0.5 guideline unusable.

Geometric Risk

The risk models presented in Chapter 10 all use a form of arithmetic mean of the floats to calculate the risk. Unfortunately, the arithmetic mean handles an uneven distribution of values poorly. For example, consider the series [1, 2, 3, 1000]. The arithmetic mean of that series is 252, which does not represent the values in the series well at all. This behavior is not unique to risk calculations, and any attempt at using an arithmetic mean in the face of very uneven distribution will yield an unsatisfactory result. In such a case it is better to use a geometric rather than an arithmetic mean.

The geometric mean of a series of n values is the nth root of the product of all the values in the series. Given a series of values a1 to an, the geometric mean of that series is:

Mean = (a1 × a2 × … × an)^(1/n)

For example, while the arithmetic mean of the series [2, 4, 6] is 4, the geometric mean is 3.63:

Mean = (2 × 4 × 6)^(1/3) = 3.63

The geometric mean is always less than or equal to the arithmetic mean of the same series of values:

(a1 × a2 × … × an)^(1/n) ≤ (a1 + a2 + … + an) / n

The two means are equal only when all values in the series are identical.

While initially the geometric mean looks like an algebraic oddity, it shines when it comes to an uneven distribution of values. In the geometric mean calculation, extreme outliers have much less effect on the result. For the example series of [1, 2, 3, 1000], the geometric mean is 8.8 and is a better representation of the first three numbers in the series.
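
To make the contrast concrete, here is a minimal sketch in plain Python (my choice of language; the chapter itself does not prescribe any tooling) that computes both means for the two series used above.

import math

def arithmetic_mean(values):
    return sum(values) / len(values)

def geometric_mean(values):
    # nth root of the product of the n values
    return math.prod(values) ** (1 / len(values))

for series in ([2, 4, 6], [1, 2, 3, 1000]):
    print(series, round(arithmetic_mean(series), 1), round(geometric_mean(series), 2))
# [2, 4, 6] 4.0 3.63
# [1, 2, 3, 1000] 251.5 8.8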

Geometric Criticality Risk

As with the arithmetic criticality risk, you can use the float color coding and the corresponding number of activities to calculate the geometric criticality risk. Instead of multiplying each weight by the number of activities of that color, you raise the weight to the power of that number. The geometric criticality formula is:

Risk = [(WC)^NC × (WR)^NR × (WY)^NY × (WG)^NG]^(1/N) / WC

where:

  • WC is the weight of critical activities.

  • WR is the weight of red activities.

  • WY is the weight of yellow activities.

  • WG is the weight of green activities.

  • NC is the number of critical activities.

  • NR is the number of red activities.

  • NY is the number of yellow activities.

  • NG is the number of green activities.

  • N is the number of activities in the project (N = NC + NR + NY + NG).

Using the example network of Figure 10-4, the geometric criticality risk is:

Risk = (4⁶ × 3⁴ × 2² × 1⁴)^(1/16) / 4 = 0.60

The corresponding arithmetic criticality risk for the same network is 0.69. As expected, the geometric criticality risk is slightly lower than the arithmetic criticality risk.
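
As a small check, the sketch below (plain Python, my choice of language) reproduces this number using the weights and activity counts implied by the calculation above: weights of 4, 3, 2, and 1 and counts of 6 critical, 4 red, 2 yellow, and 4 green activities.

W_C, W_R, W_Y, W_G = 4, 3, 2, 1          # criticality weights
N_C, N_R, N_Y, N_G = 6, 4, 2, 4          # activity counts per float color
N = N_C + N_R + N_Y + N_G                # 16 activities in total

product = (W_C ** N_C) * (W_R ** N_R) * (W_Y ** N_Y) * (W_G ** N_G)
risk = product ** (1 / N) / W_C
print(round(risk, 2))                    # prints 0.6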

Risk Value Range

Like the arithmetic criticality risk, the geometric criticality risk has the maximum value of 1.0 when all activities are critical and a minimum value of WG over WC when all activities in the network are green:

Risk = [(WC)⁰ × (WR)⁰ × (WY)⁰ × (WG)^N]^(1/N) / WC = [1 × 1 × 1 × (WG)^N]^(1/N) / WC = WG / WC

Geometric Fibonacci Risk

You can use the Fibonacci ratio between criticality weights to produce the geometric Fibonacci risk model. Given this definition of weights:

WY = φ × WG
WR = φ² × WG
WC = φ³ × WG

The geometric Fibonacci formula is:

Risk = [(φ³ × WG)^NC × (φ² × WG)^NR × (φ × WG)^NY × (WG)^NG]^(1/N) / (φ³ × WG)
     = [φ^(3NC + 2NR + NY) × (WG)^(NC + NR + NY + NG)]^(1/N) / (φ³ × WG)
     = [φ^(3NC + 2NR + NY) × (WG)^N]^(1/N) / (φ³ × WG)
     = φ^((3NC + 2NR + NY)/N) / φ³
     = φ^((3NC + 2NR + NY)/N − 3)

Risk Value Range

Like the arithmetic Fibonacci risk, the geometric Fibonacci risk has the maximum value of 1.0 when all activities are critical and a minimum value of 0.24 (φ⁻³) when all activities in the network are green.
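
The same style of sketch works for the geometric Fibonacci risk. The code below (plain Python, my choice) reuses the activity counts from the previous example and also checks the minimum value for an all-green network.

N_C, N_R, N_Y, N_G = 6, 4, 2, 4                   # activity counts per float color
N = N_C + N_R + N_Y + N_G
phi = (1 + 5 ** 0.5) / 2                          # golden ratio

risk = phi ** ((3 * N_C + 2 * N_R + N_Y) / N - 3)
print(round(risk, 2))                             # geometric Fibonacci risk for this network

# Minimum value, reached when every activity is green (N_C = N_R = N_Y = 0):
print(round(phi ** -3, 2))                        # prints 0.24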

Geometric Activity Risk

The geometric activity risk formula uses a geometric mean of the floats in the project. Critical activities have zero float, which creates a problem because the geometric mean will always be zero. The common workaround is to add 1 to all values in the series and subtract 1 from the resulting geometric mean.

The geometric activity risk formula is therefore:

Risk = 1 − [((F1 + 1) × (F2 + 1) × … × (FN + 1))^(1/N) − 1] / M

where:

  • Fi is the float of activity i.

  • N is the number of activities in the project.

  • M is the maximum float of any activity in the project or Max(F1, F2, …, FN).

Using the example network of Figure 10-4, the geometric activity risk would be:

Risk = 1 − [(1 × 1 × 1 × 1 × 1 × 1 × 31 × 31 × 31 × 31 × 11 × 11 × 6 × 6 × 6 × 6)^(1/16) − 1] / 30 = 0.87

The corresponding arithmetic activity risk for the same network is 0.67.
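
Here is the corresponding sketch (plain Python, my choice of language), using the floats implied by the calculation above for the network of Figure 10-4: six activities with zero float, four with a float of 30, two with 10, and four with 5.

import math

floats = [0] * 6 + [30] * 4 + [10] * 2 + [5] * 4
N = len(floats)
M = max(floats)

# Shift every float by 1 before taking the geometric mean, then shift back.
geo_mean = math.prod(f + 1 for f in floats) ** (1 / N) - 1
risk = 1 - geo_mean / M
print(round(risk, 2))                    # prints 0.87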

Risk Value Range

The maximum value of the geometric activity model approaches 1.0 as more activities become critical, but it is undefined when all activities are critical. The geometric activity risk has a minimum value of 0 when all activities have the same level of float. Unlike the arithmetic activity risk, with the geometric activity risk there is no need to adjust outliers of abnormally high float, and the floats do not need to be uniformly spread.

Geometric Risk Behavior

Both the geometric criticality risk and the geometric Fibonacci risk models yield results that are very similar to their arithmetic counterparts. However, the geometric activity formula does not track well with its arithmetic kin, and its value is much higher across the range. The result is that the geometric activity risk values typically do not conform to the risk value guidelines provided in this book.

Figure 12-5 illustrates the difference in behavior between the geometric risk models by plotting all of the risk curves of the example project from Chapter 11.

Figure 12-5 Geometric versus arithmetic risk models

You can see that the geometric criticality and geometric Fibonacci risk have the same general shape as the arithmetic models, only slightly lower, as expected. You can clearly observe the same risk tipping point. The geometric activity risk is greatly elevated, and its behavior is very different from that of the arithmetic activity risk. There is no easily discernible risk tipping point.

Why Geometric Risk?

The near-identical behavior of the arithmetic and geometric criticality (as well as Fibonacci) risk models illustrates that it does not matter much which one you use. The differences do not justify the time and effort involved in building yet another risk curve for the project. If anything, just for the sake of simplicity when explaining risk modeling to others, you should choose the arithmetic model. The geometric activity risk is clearly less useful than the arithmetic activity risk, but its utility in one case is why I decided to discuss geometric risk.

Geometric activity risk is the last resort when trying to calculate the risk of a project with god activities. Such a project in effect has very high risk since most of the effort is spent on the critical god activities. As explained previously, due to the size of the god activities, the other activities have considerable float, which in turn skews the arithmetic risk lower, giving you a false sense of safety. In contrast, the geometric activity risk model provides the expected high risk value for projects with god activities. You can produce a correlation model for the geometric activity risk and perform the same risk analysis as with the arithmetic model.

Figure 12-6 shows the geometric activity risk and its correlation model for the example project presented in Chapter 11.

Figure 12-6 Geometric activity risk model

The point of maximum risk, 8.3 months, is shared by both the arithmetic and geometric models. The minimum decompression target for the geometric activity model (where the second derivative is zero) comes at 10.94 months, similar to the 10.62 months of the arithmetic model and just to the right of D3. The geometric risk crossover points are 9.44 months and 12.25 months—a slightly narrower range than the 9.03 months and 12.31 months obtained when using the arithmetic activity risk model. As you can see, the results are largely similar for the two models, even though the behavior of the risk curve is very different.

Of course, instead of finding a way to calculate the risk of a project with god activities, you should fix the god activities as discussed previously. Geometric risk, however, allows you to deal with things the way they are, not the way they should be.

Execution Complexity

In the previous chapters, the discussion of project design focused on driving educated decisions before work starts. Only by quantifying the duration, cost, and risk can you decide if the project is affordable and feasible. However, two project design options could be similar in their duration, cost, and risk, but differ greatly in their execution complexity. Execution complexity in this context refers to how convoluted and challenging the project network is.

Cyclomatic Complexity

Cyclomatic complexity measures connectivity complexity. It is useful in measuring the complexity of anything that you can express as a network, including code and the project.

The cyclomatic complexity formula is:

Complexity = E − N + 2 × P

For project execution complexity:

  • E is the number of dependencies in the project.

  • N is the number of activities in the project.

  • P is the number of disconnected networks in the project.

In a well-designed project, P is always 1 because you should have a single network for your project. Multiple networks (P > 1) make the project more complex.

To demonstrate the cyclomatic complexity formula, given the network in Table 12-1, E equals 6, N is 5, and P is 1. The cyclomatic complexity is 3:

Complexity = 6 − 5 + 2 × 1 = 3

Table 12-1 Example network with cyclomatic complexity of 3

ID   Activity   Depends On
1    A
2    B
3    C          1, 2
4    D          1, 2
5    E          3, 4
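
As a small illustration, the sketch below (plain Python, my choice of language) computes the cyclomatic complexity of the network in Table 12-1 directly from its dependency list.

depends_on = {           # activity id -> the ids it depends on
    1: [], 2: [],
    3: [1, 2], 4: [1, 2],
    5: [3, 4],
}

E = sum(len(deps) for deps in depends_on.values())   # dependencies (edges)
N = len(depends_on)                                  # activities (nodes)
P = 1                                                # a single connected network
print(E - N + 2 * P)                                 # prints 3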

Project Type and Complexity

While there is no direct way to measure the execution complexity of a project, you can use the cyclomatic complexity formula as its proxy. The more internal dependencies the project has, the riskier and more challenging it is to execute. Any of these dependencies can be delayed, causing cascading delays in multiple other places in the project. The maximum cyclomatic complexity of a project with N activities is on the order of N², occurring when every one of the activities depends on all the other activities.

In general, the more parallel the project, the higher its execution complexity will be. At the very least, it is challenging to have a larger staff available in time for all the parallel activities. The parallel work (and the additional work required to enable the parallel work) increases both the workload and the team size. A larger team will be less efficient and more demanding to manage. Parallel work also results in higher cyclomatic complexity because the parallel work increases E faster than it increases N. At the extreme, a project with N activities starting at the same time and finishing together, where each activity is independent of all other activities and the activities are all done in parallel, has a cyclomatic complexity of N + 2. Such a project has a huge execution risk.

In much the same way, the more sequential the project, the simpler and less complex it will be to execute. At the extreme, the simplest project with N activities is a serial string of activities. Such a project has the minimum possible cyclomatic complexity of exactly 1. Subcritical projects with very few resources tend to resemble such long strings of activities. While the design risk of such a subcritical project is high (approaching 1.0), the execution risk is very low.

Empirically, I find that well-designed projects have a cyclomatic complexity of 10 or 12. While this level may seem low, you must understand that the chance of meeting your commitments is disproportionately related to the execution complexity. For example, a project with a cyclomatic complexity of 15 may be only 25% more complex than a project with a cyclomatic complexity of 12, but the lower-complexity project may be twice as likely to succeed. High execution complexity is therefore positively correlated with the likelihood of failure. The more complex the project, the more likely you are to fail to meet your commitments. In addition, successfully delivering one complex project is no guarantee that you will be able to repeat that success with another complex project.

It is certainly possible to repeatedly deliver projects with high cyclomatic complexity levels, but it takes time to build such capabilities across the organization. It requires a sound architecture, great project design within the risk guidelines, a team whose members are used to working together and are at peak productivity, and a top-notch project manager who pays meticulous attention to details and proactively handles conflicts. Lacking these ingredients, you should take active steps to reduce the execution complexity using the design-by-layers and network-of-networks techniques described later on in this chapter.

Compression and Complexity

Complexity tends to increase with compression and is likely to do so in a nonlinear manner. Ideally, the complexity of your project design solutions as a function of their durations will look like the dashed curve of Figure 12-7.

Figure 12-7 Project time–complexity curve

The problem with such classic nonlinear behavior is that it does not account for compressing the project by using more skilled resources without any change to the project network. The dashed line also presumes that complexity can be reduced further with an ever-increasing allocation of time, but, as previously stated, complexity has a hard minimum of 1. A better model of the project complexity is some kind of logistic function (the solid line in Figure 12-7).

The relatively flat area of the logistic function represents the case of working with better resources. The sharp rise on the left of the curve corresponds to parallel work and compressing the project. The sharp drop on the right of the curve represents the project’s subcritical solutions (which also take considerably more time). Figure 12-8 demonstrates this behavior by plotting the complexity curve of the example project from Chapter 11.

Figure 12-8 The example project time–complexity curve

Recall from Chapter 11 that even the most compressed solution was not materially more expensive than the normal solution. Complexity analysis reveals that the true cost of maximum compression in this case was a 25% increase in cyclomatic complexity—an indicator that the project execution is far more challenging and risky.

Very Large Projects

The project design methodology in this book works well regardless of scale. It does, however, become more challenging as the project gets bigger. The human brain has a maximum capacity for maintaining a mental picture of the details, constraints, and interdependencies within the project. At some project size, you will lose your ability to design the project. Most people can design a project that has up to 100 activities or so. With practice, this number can increase. A well-designed system and project make it possible to handle even a few hundred activities.

Megaprojects with many hundreds or even thousands of activities have their own level of complexity. They typically involve multiple sites, dozens or hundreds of people, huge budgets, and aggressive schedules. In fact, you typically see the last three in tandem because the company first commits to an aggressive schedule and then throws people and money at the project, hoping the schedule will yield.

The larger the project becomes, the more challenging the design and the more imperative it is to design the project. First, the larger the project, the more is at stake if it fails. Second, and even more importantly, you have to plan to work in parallel out of the gate because no one will wait 500 years—or, for that matter, even 5 years—for delivery. Making things worse, with a megaproject the heat will be on from the very first day, because such projects place the future of the company at stake, and many careers are on the line. You will be under the spotlight with managers swarming around like angry yellow jackets.

Almost without exception, megaprojects end up as megafailures. Size maps directly to poor outcomes.1 The larger the project, the larger the deviation from its commitments, with longer delays and higher costs relative to the initial schedule and budget. Megaprojects are modern-day failed ziggurats on a biblical scale.

1. Nassim Nicholas Taleb, Antifragile (Random House, 2012).

Complex Systems and Fragility

The fact that large projects are ordained to fail is not an accident, but rather a direct result of their complexity. In this context, it is important to distinguish between complex and complicated. Most software systems are complicated, not complex. A complicated system can still have deterministic behavior, and you can understand its inner workings exactly. Such a system will have known, repeatable responses to given inputs, and its past behavior is indicative of its future behavior. In contrast to a complicated system, the weather, the economy, and your body are complex systems. Complex systems are characterized by a lack of understanding of the internal mechanisms at play and an inability to predict behavior. This complex behavior is not necessarily due to numerous complicated internal parts. For example, three bodies orbiting one another form a complex system whose long-term behavior cannot be predicted. Even a double pendulum (one pendulum pivoting on the end of another) is a complex system. While neither of these examples is complicated, both are complex systems.

In the past, complex software systems were limited to mission-critical systems, where the underlying domain was inherently complex. Over the past two decades, due to increased systems connectivity, diversity, and the scale of cloud computing, enterprise systems and even just regular software systems now exhibit complex system traits.

A fundamental attribute of complex systems is that they respond in nonlinear ways to minute changes in the conditions. This is the last-snowflake effect, in which a single additional flake can cause an avalanche on a snow-laden mountain side.

The single snowflake is so risky because complexity grows nonlinearly with size. In large systems, the increase in complexity causes a commensurate increase in the risk of failure. The risk itself can be a highly nonlinear function of complexity, akin to an exponential function. Even if the base of that function is barely above 1, and the system grows slowly in size (one additional line of code at a time, or one more snowflake on the mountainside), over time the growth in complexity and its compounding effect on risk will cause a failure due to a runaway reaction.

Complexity Drivers

Complexity theory2 strives to explain why complex systems behave as they do. According to complexity theory, all complex systems share four key elements: connectivity, diversity, interactions, and feedback loops. Any nonlinear failure behavior is the product of these complexity drivers.

2. https://en.wikipedia.org/wiki/Complex_system

Even if the system is large, if the parts are disconnected, complexity will not raise its head. In a connected system with n parts, connectivity complexity grows in proportion to n² (a relationship known as Metcalfe’s law3). You could even make the case for connectivity complexity on the order of nⁿ due to ripple effects, where any single change causes n changes and each of those causes n additional changes, and so on.

3. https://en.wikipedia.org/wiki/Metcalfe's_law

The system can still have connected parts and not be that complex to manage and control if the parts are clones or simple variations of one another. On the other hand, the more diverse the system is (such as having different teams with their own tools, coding standards, or design), the more complex and error prone that system will be. For example, consider an airline that uses 20 different types of airplanes, each specific for its own market, with unique parts, oils, pilots, and maintenance schedules. This very complex system is bound to fail simply because of diversity. Compare that with an airline that uses just a single generic type of airplane that is not designed for any market in particular and can serve all markets, passengers, and ranges. This second airline is not just simpler to run: It is more robust and can respond much more quickly to changes in the marketplace. These ideas should resonate with the advantages of composable design discussed in Chapter 4.

You can even control and manage a connected diverse system as long as you do not allow intense interactions between the parts. Such interactions can have destabilizing unintended consequences across the system, often involving diverse aspects such as schedule, cost, quality, execution, performance, reliability, cash flow, customer satisfaction, retention, and morale. Unabated, these changes will trigger more interactions in the form of feedback loops. Such feedback loops magnify the problems to the point that input or state conditions that were not an issue in the past become capable of taking the system down.

Size, Complexity, and Quality

The other reason large projects fail has to do with quality. When a complex system depends on the completion of a series of tasks (such as a series of interactions between services or activities in a project), and when the failure of any task causes failure of the whole, any quality issue produces severe side effects, even if the components are very simple. This was demonstrated in 1986 when a 30-cent O-ring brought down a $3 billion space shuttle.

When the quality of the whole depends on the quality of all the components, the overall quality is the product of the qualities of the individual elements.4 The result is highly nonlinear decay behavior. For example, suppose the system performs a complex task composed of 10 smaller tasks, each having a near-perfect quality of 99%. In that case the aggregate quality is only 90% (0.99¹⁰ = 0.904).

4. Michael Kremer, “The O-Ring Theory of Economic Development,” Quarterly Journal of Economics 108, no. 3 (1993): 551-575.

Even this assumption of 99% quality or reliability is unrealistic because most software units are never tested to within 99% of all possible inputs, all possible interactions with all connecting components, all possible feedback loops of state changes, all deployments and customer environments, and so on. The realistic unit quality figures are probably lower. If each unit were tested and qualified only to a 90% level, the system quality would drop to 35% (0.90¹⁰ = 0.349). A roughly 10% decrease in quality per component degrades the overall quality from 90% to 35%.
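
The sketch below (plain Python, my choice of language) reproduces this multiplicative quality decay for the two unit-quality levels discussed above.

components = 10
for unit_quality in (0.99, 0.90):
    system_quality = unit_quality ** components
    print(f"{unit_quality:.0%} per unit across {components} units -> {system_quality:.0%} overall")
# 99% per unit across 10 units -> 90% overall
# 90% per unit across 10 units -> 35% overall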

The more components the system has, the worse the effect becomes, and the more vulnerable the system becomes to any quality issues. This explains why large projects often suffer from poor quality to the point of being unusable.

Network of Networks

The key to success in large projects is to negate the drivers of complexity by reducing the size of the project. You must approach the project as a network of networks. Instead of one very large project, you create several smaller, less complex projects that are far more likely to succeed. The cost will typically increase at least by a little, but the likelihood of failure will decrease considerably.

The network-of-networks approach carries one proviso: It must actually be feasible to build the project this way. If the project is feasible, then there is a good probability that the networks are not tightly coupled and that segmentation into separate subnetworks is possible. Otherwise, the project is destined to fail.

Once you have the network of networks, you design, manage, and execute each of them just like any other project.

Designing a Network of Networks

Since you do not know in advance if the segmentation is possible or what the network of networks looks like, you must engage in a preliminary mini-project whose mission is to discover the network of networks. There is never just one way of designing the network of networks; indeed, there are usually multiple possibilities for shape and structure. These possibilities are hardly ever equivalent because a few of them are likely to be easier to deal with than others. You must compare and contrast the various options.

As with all design efforts, your approach to designing the network of networks should be iterative. Start by designing the megaproject, and then chop it into individual manageable projects along and beside the mega critical path. Look for junctions where the networks interact. These junctions are a great place to start with the segmentation. Look for junctions not just of dependencies but also of time: If an entire set of activities would complete before another set could start, then there is a time junction, even if the dependencies are all intertwined. A more advanced technique is to look for a segmentation that minimizes the total cyclomatic complexity of the network of networks. In this case, P greater than 1 is acceptable for the total complexity, while each subnetwork has P of 1.

Figure 12-9 shows an example megaproject, and Figure 12-10 shows the resulting three independent subnetworks.

Figure 12-9 An example megaproject

Figure 12-10 The resulting network of networks

Quite often, the initial megaproject is just too messy for such work. When this is the case, investing the time to simplify or improve the design of the megaproject will help you identify the network of networks. Look for ways to reduce complexity by introducing planning assumptions and placing constraints on the megaproject. Force certain phases to complete before others start. Eliminate solutions masquerading as requirements.

The diagram in Figure 12-9 underwent several complexity reduction iterations to reach the state shown. The initial diagram was incomprehensible and unworkable.

Decoupling Networks

The network of networks will likely include some dependencies that scuttle the segmentation or somehow prevent parallel work across all networks, at least initially. You can address these by investing in the following network-decoupling techniques:

  • Architecture and interfaces

  • Simulators

  • Development standards

  • Build, test, and deployment automation

  • Quality assurance (not mere quality control)

Creative Solutions

Although there is no set formula for constructing the network of networks, the best guideline is to be creative. You will often find yourself resorting to creative solutions to nontechnical problems that stifle the segmentation. Perhaps political struggles and pushback concentrate parts of the megaproject instead of distributing them. In such cases, you need to identify the power structure and defuse the situation to allow for the segmentation. Perhaps cross-organizational concerns involving rivalries prevent proper communication and cooperation across the networks, manifesting as a rigid sequential flow of the project. Or maybe the developers are spread across separate locations, and management insists on giving each location some of the work, divided along functional lines. Such a decomposition has nothing to do with the correct network of networks or with where the real skills reside. You may need to propose a massive reorganization, including the possibility of relocating people, to have the organization reflect the network of networks, rather than the other way around (more on this topic in the next section on countering Conway’s law).

Perhaps some legacy group is mandated to be part of the project due to personal favors. Instead of segmentation, this creates a choke point for the project because everything else now revolves around the legacy group. One solution might be to convert the legacy group into a cross-network group of domain expert test engineers.

Finally, try several renderings of the network of networks by different people, for the simple reason that some may see simplicity where others do not. Given what is at stake, you must pursue every angle. Take your time to carefully design the network of networks. Avoid rushing. This will be especially challenging since everyone else will be aching to start work. Due to the project’s size, however, certain failure lurks without this crucial planning and structuring phase.

Countering Conway’s Law

In 1968, Melvin Conway coined Conway’s law,5 which states that organizations that design systems always produce designs that are copies of the communication structures of these organizations. According to Conway, a centralized, top-down organization can produce only centralized, top-down architectures—never distributed architectures. Similarly, an organization structured along functional lines will conceive only functional decompositions of systems. Certainly, in the age of digital communication, Conway’s law is not universal, but it is common.

5. Melvin E. Conway, “How Do Committees Invent?,” Datamation, 14, no. 5 (1968): 28-31.

If Conway’s law poses a threat to your success, a good practical way to counter it is to restructure the organization. To do so, you first establish the correct and adequate design, and then you reflect that design in the organizational structure, the reporting structure, and the communication lines. Do not shy away from proposing such a reorganization as part of your design recommendations at the SDP review.

Although Conway referred originally to system design, his law applies equally well to project design and to the nature of the network. If your project design includes a network of networks, you may have to accompany your design with a restructuring of the organization that mimics those networks. The degree to which you will have to counter Conway’s law even in a regular-size project is case-specific. Be aware of the organizational dynamics and devise the correct structure if your observation (or even your intuition) is telling you it is necessary.

Small Projects

On the other side of the scale from very large projects are small (or even very small) projects. Counterintuitively, it is important to carefully design such small projects. Small projects are even more susceptible to project design mistakes than are regular-size projects. Due to their size they respond much more to changes in their conditions. For example, consider the effects of assigning a person incorrectly. With a team of 15 people, such a mistake affects about 7% of the available resources. With a team of 5 people, it affects 20% of the project resources. A project may be able to survive a 7% mistake, but a 20% mistake is serious trouble. A large project may have the resource buffer to survive mistakes. With a small project, every mistake is critical.

On the positive side, small projects may be so simple that they do not require much project design. For example, if you have only a single resource, the project network is a long string of activities whose duration is the sum of the durations of all the activities. With very minimal project design, you know the duration and the cost. There is also no need to build the time–cost curve or to calculate the risk (it will be 1.0). Since most projects have some form of network that differs from a simple string or two, and since you should avoid subcritical projects, in practice you should almost always design even small projects.

Design by Layers

All the project design examples so far in this book have produced their network of activities based on the logical dependencies between activities. I call this approach design by dependencies. There is, however, another option—namely, building the project according to its architecture layers. This is a straightforward process when using The Method’s architectural structure. You could first build the Utilities, then the Resources and the ResourceAccess, followed by Engines, Managers, and Clients, as shown in Figure 12-11. I call this technique design by layers.

Figure 12-11 Project design by layers

As shown in Figure 12-11, the network diagram is basically a series of pulses, each corresponding to a layer in the architecture. While the pulses are sequential and often serialized, internally each pulse is constructed in parallel. The Method’s adherence to the closed architecture principle enables this parallel work inside a pulse.

When designing by layers, the schedule is similar to that of the same project designed by dependencies. Both approaches result in a similar critical path composed of the components of the architecture across the layers.

Caution

When designing by layers, do not forget to add nonstructural activities, such as explicit integration and system testing, to the network.

Pros and Cons

A downside to designing by layers is the increase in risk. In theory, if all services in each layer are of equal duration, then they are all critical, and the risk number approaches 1.0. Even if that is not the case, any delay in the completion of any layer immediately delays the entire project because subsequent pulses are put on hold. When designing by dependencies, however, only the critical activities run such risk of delaying the project. The best (and nearly mandatory) way of addressing the high risk of a design-by-layers project is to use risk decompression. Because almost all activities will be critical or near critical, the project will respond very well to decompression, as all activities in each pulse gain the additional float. To further compensate for the implicit risk of design by layers, you should decompress the project so that its risk is less than 0.5, perhaps to 0.4. This level of decompression suggests that projects designed by layers will take longer than projects designed by dependencies.

Designing by layers can increase the team size and, in turn, the direct cost of the project. With design by dependencies, you find the lowest level of resources allowing unimpeded progress along the critical path by trading float for resources. With design by layers, you may need as many resources as are required to complete the current layer. The team has to work in parallel on all activities within each pulse and complete all of them before beginning the next pulse. You must assume all the components in the current layer are required by the next layer.

With that in mind, designing by layers has the clear advantage of producing a very simple project design to execute. It is the best antidote for a complex project network and can reduce overall cyclomatic complexity by half or more. In theory, since the pulses are sequential, at any moment in time the project manager has to contend with only the execution complexity of each pulse and the support activities. The cyclomatic complexity of each pulse roughly matches the number of parallel activities. In a typical Method-based system, this cyclomatic complexity is as low as 4 or 5, whereas the cyclomatic complexity of projects designed by dependencies can be 50 or more.

Many projects in the software industry tolerate both schedule slips and overcapacity; therefore their real challenge is complexity, not duration or cost. When possible, with Method-based systems, I prefer designing by layers to address the otherwise risky and complex execution. As with most things when it comes to project design, designing by layers is predicated on having the right architecture in the first place.

You can combine the techniques of both design by layers and design by dependencies. For example, the example project in Chapter 11 moved all of the infrastructure Utilities to the beginning of the project, despite the fact that their logical dependencies would have allowed them to take place much later in the project. The rest of the project was designed based on logical dependencies.

Layers and Construction

Designing and building by layers is a perfect example of the design rule from Chapter 4: Features are always and everywhere aspects of integration, not implementation. Only after all the layers are complete can you integrate them into features. This implies that designing by layers is well suited to regular projects rather than larger and more complex projects with multiple independent subsystems. To return to the house analogy, with a simple house the construction is always by layers—typically the foundation, plumbing, walls, roof, and so on. With a large multistory building, each floor is its own separate project that contains plumbing, walls, ceiling, and other tasks.

A final observation is that designing the project by layers basically breaks the project into smaller subprojects. These smaller projects are done sequentially and are separated by junctions of time. This is akin to breaking a megaproject into smaller networks and carries very similar benefits.
