CHAPTER 5

When It Really Does Have to Work

There are projects in which some, maybe even most, of the possible outcomes are so threatening that their occurrence cannot be tolerated. Should something go wrong—should it not go to plan—there is no mitigation available. If you are driving a car and the engine malfunctions, it can be annoying, even frightening, but it will be a whole lot more final if the malfunctioning engine is in a spacecraft!

There are degrees of criticality, ranging from safety-critical performance in a nuclear power station to life-and-death rescue missions, to correct compliance with regulations set out in legislation. In each case, project failure always incurs severe penalties.

In these projects, the avoidance of risk drives the planning. The constraint at the top of the hierarchy is ‘quality’—total conformance to a pre-specified capability. This forces a modification to the usual planning process. The focus is on avoiding the possibility of events occurring that cannot be managed; on using processes whose known performance indicates very high levels of reliability with no surprises; and, inevitably, on testing.

The gold standard for testing is verification. Verification proves that the right processes were used in the right way. This is quality assurance, and it ensures that any errors that arise will be known and predictable.

The other approach is testing by validation. This quality control technique compares a product’s performance against its predicted performance, but only in those situations that the tester thought of and could find a way to simulate. In practice, this is a risky approach!

Testing—A Possible Solution?

Let’s consider this situation. Three months after the national qualification grades were released in the UK, there were still several thousand individuals who had not received their results. It was a national scandal, with politicians questioning whether the body responsible should keep its role.

The problem was caused, as is not unusual, by a newly upgraded IT system. It just didn’t work as specified, and it cost the Head of IT his job. The mess was sorted out manually for that year, but the next year’s results would be ready in a matter of months. It was made quite plain to the new appointee, Sally, that any failures next year could not and would not be tolerated. The CEO was clear that it could be the end of the organization.

So Sally initiated the ‘zero errors, no exceptions’ (ZENO) project. This was not a maintenance project to fix bugs. The primary outcome of the project was to prove to the stakeholders (internal and external) that the poor performance of the system would not, could not, happen again.

It has long been known that coding generates errors in a predictable ratio to the total amount of code. Different languages and different code generators have different gearing ratios, but the curve is generic. Clearance rates vary according to the aggressiveness of the testing, but once again, the general shape of the curve remains constant. The one niggling problem is the inability of any validation testing regime to demonstrate zero defects. It takes only one disconfirming instance to show that something held to be true is false, while thousands of confirming cases do not prove it. Reality can be so asymmetrical!

The vital management decision, therefore, is when to declare the system ‘safe’ to go live. Where do you draw the line? What would be acceptable to the ZENO project’s stakeholders? How can you know?

The project plan was based on two models: the defects removal curve and a V-development model.

The defect removal curve model, shown in Figure 5.1, requires that test cases be generated in volume, designed to challenge the logic and to test the input and output flows identified by the requirements. Good tests are those that identify a defect; poor tests are those that do not—an interesting inversion for some practitioners. And everything is measured: type of defect, severity, location, and rate. In the final analysis, it is the rate that matters. When the rate dips below, and stays below, an agreed level, the product is ready for release to the next stage.
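The release rule just described—track the defect discovery rate and release only once it has dipped below, and stayed below, an agreed level—can be sketched in a few lines. This is purely our illustration, not the ZENO project’s actual tooling; the function name, the weekly rates, and the threshold are all hypothetical.

```python
# Hypothetical sketch of the release decision driven by the defect
# removal curve (Figure 5.1). All numbers are illustrative.

def ready_for_release(weekly_defect_rates, accept_rate, sustain_weeks=3):
    """Return True once the defect discovery rate has stayed at or below
    the agreed 'accept' rate for sustain_weeks consecutive weeks."""
    run = 0
    for rate in weekly_defect_rates:
        run = run + 1 if rate <= accept_rate else 0
        if run >= sustain_weeks:
            return True
    return False

# Defects found per 100 test cases, week by week: the curve rises,
# peaks, then decays—the generic shape described in the text.
rates = [12, 18, 22, 15, 9, 4, 2, 1, 1]
print(ready_for_release(rates, accept_rate=2))  # True: held below 2 for 3 weeks
```

The point of the sketch is that ‘ready’ is not a judgment call made in the moment: it is a rule agreed with stakeholders in advance and applied mechanically to the measured rate.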


Figure 5.1 Idealized defect removal curve

Sally combined the defect removal curve with a V-model for determining when tests should be created. In the waterfall model of product development, testing and test cases are created late in the product life cycle, after the design and development stages are complete. It seems natural to batch the work up this way as the appropriate resources become available. In the V-model, however, the sequence and the way resources are engaged change—the planning is different. While development activities follow the V down and up, test creation—and some of the testing—doesn’t. It goes across, as shown in Figure 5.2.


Figure 5.2 The V-development model

This approach meant that the project had to regress the product all the way back to the original requirements. It will come as no surprise to those working in IT that finding these requirements was not as easy as it ought to have been. Additionally, in a year, the user requirements had morphed—what was considered ‘good performance’ had changed. Perhaps the most critical success factor for the ZENO project was persuading the internal stakeholders that testing was not simply a technical problem. They would be part of the solution.

The way tests were constructed and approved was also new to the organization. ‘Use cases’ are common analytical tools nowadays, and they are a good way of making the process of identifying requirements and business rules ‘real’ to the user community. What is less common is to combine them with developing detailed test criteria at the same time. This was done using Gilb and Finzi’s (1988) ‘clear-the-fog-from-the-target’ principle. They assert that: “All critical attributes can be specified in measurable, testable terms, and the worst-acceptable level can be identified.” Table 5.1 illustrates how this was applied in the ZENO designs.


Table 5.1 A non-functional requirement

Effective    | Scale | Test                                                           | Planned value | Currently | Standard value | Risk
Accurate     | %     | % of filled forms correctly processed from the test population | 99%           | 84%       | 84%            | H
Independent  | Time  | Hours to solo operation                                        | 5 hours       | 1.5 days  | 2 hours        | L
Efficient    | Count | Filled forms processed in one hour from the test population    | 9             | 5         | 9              | M

Each critical capability was documented, and the measurable performance criteria were agreed with the users. Following the Gilb and Finzi (1988) approach, the project split tests into two types. Tests that proved whether a capability was present—giving a simple binary outcome—were called ‘functional requirements.’ Tests designed to find out ‘how much’ of a capability was present were identified as ‘non-functional requirements’ (NFRs).

Suppose a system should provide the capability to divide two numbers. A simple test can demonstrate conformance to this requirement. But, for most users, it is how accurately, how quickly, and perhaps how easily the division function can be used that matters. These three are all NFRs. When scalable terms like accuracy, speed, and ease of use are being tested, precisely what is wanted is determined by defining them solely and entirely through a set of tests and agreed results.
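The distinction can be made concrete with a toy sketch. Everything below—the tolerance, the time budget, the test volumes—is our own illustrative assumption, not taken from the ZENO project: one binary functional test, then two NFR tests asking ‘how much.’

```python
import time

# Toy sketch: functional vs. non-functional tests for a division capability.
# Thresholds are illustrative assumptions, not real project values.

def divide(a, b):
    return a / b

# Functional requirement: the capability is present (binary outcome).
assert divide(10, 4) == 2.5

# NFR 1 - accuracy: result within an agreed tolerance of the true value.
assert abs(divide(1, 3) - 1 / 3) < 1e-12

# NFR 2 - speed: 100,000 divisions within an agreed (illustrative) budget.
start = time.perf_counter()
for _ in range(100_000):
    divide(355, 113)
assert time.perf_counter() - start < 1.0

print("all tests passed")
```

Note that the NFR tests only have meaning once a tolerance and a time budget have been agreed with the users; without those numbers, ‘accurate’ and ‘fast’ are just fog.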

The names of the attributes are just labels. So, in the example in Table 5.1, the NFR ‘effective’ does not mean that someone is ‘accurate, independent, and efficient.’ It means precisely that someone is ‘effective’ when he or she achieves the planned results for all three of the tests set out in the table.
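As a minimal sketch of this labeling rule, the check below mirrors Table 5.1: ‘effective’ is granted only when all three tests meet their planned values. The figures come from the table’s ‘Planned value’ and ‘Currently’ columns; the code structure itself (class and field names) is our assumption.

```python
from dataclasses import dataclass

# Sketch of an NFR defined 'solely and entirely by a set of tests and
# agreed results'. Data mirrors Table 5.1; structure is illustrative.

@dataclass
class NFRTest:
    attribute: str
    scale: str
    planned: float
    measured: float
    higher_is_better: bool = True

    def passed(self) -> bool:
        if self.higher_is_better:
            return self.measured >= self.planned
        return self.measured <= self.planned

tests = [
    NFRTest("Accurate", "%", planned=99, measured=84),
    NFRTest("Independent", "hours to solo operation",
            planned=5, measured=36, higher_is_better=False),  # 1.5 days
    NFRTest("Efficient", "forms/hour", planned=9, measured=5),
]

# 'Effective' means: planned value achieved on ALL three tests.
effective = all(t.passed() for t in tests)
print(effective)  # False: none of the current values meets plan
```

The label carries no meaning beyond the conjunction of its tests, which is exactly the point the paragraph above makes.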

The impact of using what was a testing schedule as the basis for defining the work packages, and for sequencing and pacing the work in the ZENO project, was profound. Tasks were sized based on the tests and the severity of the test values defined by the users. The pace of work—when the next work package could be opened by a resource—was determined by the defect clearance rate.

After four months of focused testing, the defect curves for the various capabilities of the system, and for the system as a whole, all fell below the ‘accept’ line, and Sally used this to gain agreement to put the system live.

The company is still there, with its reputation restored: as is Sally!

Process Conformance—the Only Solution

There are situations and projects where quality control on its own is not an option.

The need was to decommission a Magnox nuclear power station site. Safety was of paramount importance: for the owners, the project team, the contractors, and the public. It was set up as a project: there was a budget, an end date, a schedule of work, a sponsor, and all the other trappings. The various tasks, activities, and processes were set out in considerable detail—and strictly following the procedure was the only rule. Only by following due process could it be proved that all known possible risks had been addressed.

So when it became apparent, as work progressed, that neither the budget nor the end date was even remotely feasible, there was no contest; no wringing of hands. The project charged straight through these ‘constraints,’ because in that hierarchy staying safe was the only thing that mattered. In that regard, the project was planned and run more like a continuous operation than a bounded piece of work. The fundamental approach adopted was to use a verification process to prove safe conduct. In such circumstances, ‘process is king.’

Planning—the Only Way Out

In July 2018, the world held its breath as a skilled team fought to bring 12 boys and their coach out of a flooded cave. A documentary setting out the planning and approach can be found by scanning the QR code (Figure 5.3) or by typing the URL shown into a browser. It is on YouTube.


Figure 5.3 Cave rescue YouTube documentary

The planning is a perfect example of the approach to take when there is only one acceptable outcome for a project. That outcome wasn’t, by the way, that there would be no fatalities. Firstly, there was a fatality: the tragic loss of the brave Thai SEAL. Secondly, and this is abundantly clear in the documentary, all of the planning was done under the acceptance that there were likely to be casualties among the rescued. However, and this is also clear, the condition that had to be met by the planning was that no deaths would be caused by the process of extraction.

Let’s look at the planning and why it was carried out the way it was.

First, there were no standard processes or work practices to implement, so it was not like the Magnox problem. The crucial planning decision was to engage world experts in the specialist field of cave diving rescue and to ensure that each task was allocated to the most suitable expert.

Second, there were risks that they could not handle. One of them was a boy panicking while being shepherded out in the very frightening conditions. No management action guaranteed a good result. This meant that a way had to be found to eliminate the possible event called ‘boy panics.’ The strategy of avoiding the risk event led to a solution in which the boys were sedated. This, in turn, required that an innovative solution be found: how to move sedated bodies safely while submerged and in the pitch black?

The third observation is that, having found a potential solution—a five-point attachment facemask and a carrying strap—the approach was tested repeatedly with volunteers in a swimming pool. The planning had triggered a sandbox moment: no improvising! And the introduction of this solution underpinned much of the rest of the planning. The critical risk translated to making sure that the precious mask was never accidentally dislodged throughout the transit, because that would break the constraint: no deaths caused by the extraction.

It is well worth watching the short film as it is an example of courage, humans at their best, and a tour de force in project management planning where failure is not an option under challenging circumstances.

The Role of Risk in Planning

As we so often have to repeat when teaching risk in our workshops, risk management is not a back-covering exercise. We do not create long lists of risks so that, when the problem arises, the project manager can respond with “I told you so!” If a risk is on the risk log, it means you have chosen to do something about it; and that means costs, time, and management attention. As we see in the Project Mission Model™, risks contribute to our understanding of the scope of the project—what deliverables and activities we must include. So, when you agree with the client or other stakeholders that a risk should be included on the log, you are agreeing on what risk strategy, and what set of management actions, have a place in the plan.

The normal way of dealing with risk financially is through the allocation of contingency—a budgetary allowance for the known-unknowns—and Chapter 4 on budgeting describes this in more detail. In our workshops, project managers complain that while they may highlight risks to stakeholders, the next natural step of making allowances for them in the scope, budget, or schedule is blocked. It turns out nobody likes paying up front to prevent or reduce the probability of something that might happen. It seems to rankle with some operational managers, who would rather wait and see if it happens, and if it does, then… No wonder projects can find themselves ‘up the creek without a paddle!’

In the end-date and cost-constrained projects described in Chapters 3 and 4, the ‘wait until it happens’ approach is untenable; in mission-critical projects, it can result in, literally, death. The infeasibility of fix-on-failure in mission-critical projects justifies and forces the special approach to planning required. What might appear as contingency, and therefore optional, in other projects is in this type of project fully incorporated in the budget. In mission-critical projects, the level of contingency set aside will therefore be considerably lower, or even zero.

Planning When the Tolerance of Risk Is Low

When the top constraint is the need for a specific capability to be provably present—whether it is a working student awards system, a decommissioned nuclear power plant, or 12 boys and their coach brought out safely—then making trade-offs is almost always inappropriate.

In each of the stories in this chapter, the project had to find solutions that addressed critical risks: risks so perilous that it was better to treat them as CSFs. Each project had to achieve an outcome without fail, or it would be a failure—a concept closely aligned with that of a CSF.

The three project managers worked closely with SMEs, sometimes world experts, to understand the nature of the risks and to develop acceptable project strategies. (The term ‘acceptable’ is defined solely in terms of the degree to which the actions reduce the probability, or the impact, of the risk.) In the three projects, the principal constraint is that failure is not an option, and that influences the CPPRRSS process (Figure 5.4). The planning sequence is a tightly coupled iteration between risks and processes. And process is king. Innovation, if accepted at all, is kept in a sandbox until it has been proved, and is only used if essential to the completion of the project. Project management in these types of projects is, perhaps more than in any other, an exercise in a disciplined approach.


Figure 5.4 Planning steps for mission-critical projects

Project Management When the Tolerance of Risk Is Low

The real professional is one who knows the risks, their degree, their causes, and the action necessary to counter them, and shares this knowledge with his colleagues and clients. (Gilb and Finzi 1988).

Much of our work on the development of organizational and personal project management capability over the last 25 years was fueled by research we undertook on behalf of CSC—the giant US computer company—in 1989. The problems facing us were “Why is project performance so unpredictable?” and “When running projects that are mission-critical, are there special people who should be assigned as the project manager?”

Research conducted on some of CITI’s data at the University of Limerick (Leonard and Willis 2000) investigated whether domain specialism was valuable in high-performing project managers. The results are interesting. They found that when projects were complex and driven by general constraints such as time and cost, projects were more likely to be successful when run by experienced project managers rather than by SMEs. So the prejudice that all engineering projects are better run by engineers, and all IT projects by IT specialists, is ill-founded.

However, when the constraints are related to the content of the project, the weight of evidence supports the proposition that the project is better run by SMEs with project management understanding. The mission-critical planning discussed here gives us some insights into why this might be:

  • In all three projects, the project manager drew upon very specialist expertise—in the case of the cave rescue, the best experts in the world. As stated in the documentary, “You don’t make your best cave diver the leader of the project—they need to be in the cave.” But the project leader, the US Captain, had extensive experience of diving and had been involved in cave rescue situations. This meant he was able to command the respect of all of the team players, and he could evaluate and prioritize the input from his team. In these situations, whose voice should be listened to is a critical judgment, and that takes subject matter expertise.
  • “This is the way it is going to be.” At some point on these projects, the project manager has to take responsibility for the decision: ‘it is going to be this way.’ In the case of the ZENO project, Sally had to convince the Board and many other stakeholders that they were going to have to do something they felt had already been done, but that this time it would work. Her position as an expert in the field of software development and testing was crucial to deciding on the approach, and to being able to convince others.
  • In projects like the Magnox decommissioning, the weight of decision making is against innovation. Only use tried and tested processes; if none is available, then the deployment of any new means is going to be a very cautious endeavor. There is no ‘learning on the job.’ As we saw in the cave rescue, even something as simple as carrying the boys was repeatedly trialed in a swimming pool before the rescue. This caution is a characteristic of SMEs. Their very expertise is built on a disciplined approach and a value system that respects the prior experience and proven procedures of their fellow professionals.

Reflections

With any luck, you will never have to run a mission-critical project, but there are aspects of the approach to their management that are worth considering when planning your projects. For example:

  1. Do others consider you primarily a domain (technical) expert or a project manager?
  2. In what ways do you think this affects the way you approach planning and executing your project?
  3. Do you find that you are routinely adopting ‘make good’ (i.e., fix-on-failure) as the favored risk strategy?
  4. What do you do to gain acceptance for strategies such as risk reduction (paying ahead of time to reduce the likelihood of an event) and risk protection (paying ahead of time to reduce the impact should it happen)?
  5. Have you found yourself discouraging innovative approaches when planning and executing a project? Can you explain why?
  6. How do you go about getting your stakeholders and requirement givers to clarify what their acceptance criteria are for the capabilities your project has to deliver?