Chapter 2. Derive Metrics from Your Measurement Goals

Only good questions deserve good answers.

Oscar Wilde

Best Practice:

  • Apply the Goal-Question-Metric approach to determine meaningful metrics for achieving your goals.

  • Make the assumptions behind your metrics explicit and avoid common metric pitfalls.

  • This helps to manage the development process and improves the quality of its outcomes.

Every development process is aimed at gradual improvement, of the process as well as the product delivered. In order to determine progress, all kinds of measurements and metrics can be used, but not all measurements and metrics are meaningful in the specific context of an organization.

The Goal-Question-Metric approach (hereafter GQM) provides a simple structure for arriving at the right measurements for your context. These measurements allow you to manage the development process by assessing its status and progress, and to know when your goals are reached.

The GQM approach is a simple top-down approach: we start with goals, then questions that need answering to determine whether the goals are being met, and then metrics that can help answer those questions. So one goal may lead to multiple questions, and each question may in turn be answered by multiple metrics. The GQM idea itself is very simple: its purpose is to keep questioning whether you are coming closer to your goals.
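
To make the structure concrete, here is a minimal sketch of a GQM model as a Python data structure. The class and field names are ours, chosen for illustration; they are not part of the GQM method itself.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Metric:
        name: str                 # e.g., "Unit complexity in the codebase"
        description: str = ""

    @dataclass
    class Question:
        text: str                 # e.g., "How is complexity distributed?"
        metrics: List[Metric] = field(default_factory=list)

    @dataclass
    class Goal:
        description: str          # one goal...
        questions: List[Question] = field(default_factory=list)  # ...with multiple
                                  # questions, each answered by multiple metrics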

Motivation

On the face of it, GQM looks rather trivial. Of course metrics should answer the right questions. Using metrics for managing the development process provides you with facts instead of impressions. Unfortunately, it is fairly common not to use software-related metrics at all. So exactly how useful are metrics? That depends on your goals. This simple reasoning is the main advantage of using GQM: it gives a structured approach toward the right metrics instead of choosing metrics off the cuff.

From practice we know that choosing the right metrics in the right context is not trivial. If software metrics are used at all, they are often used without keeping the goal in mind, leading to negative side effects. With GQM you get the right clarity and methodology in order to avoid metric pitfalls, the four most common of which are as follows:1

Metric in a bubble

A metric is used without the proper context. You should track metrics over time or compare a specific metric to other cases. A metric is only valuable in comparison.

Treating the metric

The team optimizes the metrics but does not solve the problem that the metric is meant to signal. You should have a clear notion of the problem you address with a metric, and how the metric helps you in solving it.

One-track metric

A sole metric is used, which may lead to wrong conclusions. For example, measuring the total volume of a codebase, without measuring the degree of duplication, may give a false impression of progress. You should use several metrics that jointly address a goal (i.e., a problem that you want to solve).

Metrics galore

Too many metrics are used, while many of them do not add value. Typically this leads to a team ignoring the metrics, because they cannot interpret them meaningfully. You should limit the number of metrics you use, for instance, by not adding a metric that correlates too much with one you are already using.

In the next section, we explain how GQM works and how you can define meaningful metrics.

How to Apply the Best Practice

To help you understand how to apply the best practice, imagine that you are leading a team that is building a new system. Let us assume this concerns a modern web application that is well suited to Agile development. As an example, consider a system providing video streaming services to users over the Web.

Now suppose that development started not too long ago. Before, developers in your organization were accustomed to classic waterfall development (developing based on detailed designs and specifications in long iterations). Now you have decided that for this development Agile is suitable. You suggest dividing work into two teams of seven people. Team members have different expertise and experience, yet all of them have at least half a year of development experience in the system’s main programming language. You do not intend to follow the Agile Manifesto to the letter, but want to adhere to basic Agile practices:

  • You do development work in fixed-time iterations (sprints).

  • You demonstrate functionality after each sprint to user representatives.

  • You work with a prioritized backlog of system requirements.

  • You assign a Product Owner role, as a representative of “the business” that represents the interests of the users.

  • Development work is peer reviewed before it is pushed to the main development line (the trunk).

The team feels that it takes an unusual amount of time to change functionality. Your initial diagnosis is that it may be due to high code complexity (many logical decision paths), so you are interested in tracking the complexity of the codebase.

Goal

Consider that with measurements you want to obtain meaningful values about relevant characteristics. To make sure that you are measuring the right things, a measurement should follow from a goal that it helps to achieve.

So the first step is to determine a specific goal. Every goal should have five characteristics. This helps you define a goal in a meaningful way. A goal should clearly describe these elements:

Object of examination

What are we looking at? In this case, that refers to the software that you are managing.

Purpose

Why are we examining the object?

Focus

What attribute or characteristic of the object are we interested in? Depending on your measurement purpose, a characteristic could be, for example, performance, maintainability, functional fit, or some other attribute.

Viewpoint

Who is the primary target audience? This concerns the perspective from which the goal is defined; it may be you as an observer.

Environment

In which context does the object exist?

You can write this down by filling out a table or in the form of a sentence. By filling out the table first, you can derive a sentence that includes all five aspects. Table 2-1 shows an example.

Table 2-1. Example for defining a goal for reducing code complexity

Topic                  Response
Object of examination  Total codebase
Purpose                To reduce
Focus                  Complexity of units
Viewpoint              Development team
Environment            Streaming application development

In sentence form, the goal would be “to analyze the total codebase of the streaming application in order to reduce the complexity of units from the viewpoint of the development team.”

Filling out such a table may appear tedious, but it is little work and it forces you to make the problem explicit. That, in turn, aids in formulating the right questions.
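
As an illustration, a small Python sketch that turns the five goal elements into the sentence form above (the function and its parameters are ours, mirroring the rows of Table 2-1):

    def goal_sentence(obj, purpose, focus, viewpoint, environment):
        """Render the five GQM goal elements as a single sentence."""
        return (f"Analyze the {obj} of the {environment} "
                f"in order {purpose.lower()} the {focus} "
                f"from the viewpoint of the {viewpoint}.")

    print(goal_sentence("total codebase", "To reduce", "complexity of units",
                        "development team", "streaming application"))
    # Analyze the total codebase of the streaming application in order to
    # reduce the complexity of units from the viewpoint of the development team.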

Question

The next step is to formulate questions related to your goal that help achieve that goal. There is no strict bound on the number or type of questions. As an example, consider the questions in Table 2-2.

Table 2-2. Questions in this GQM model

Question  Description
Q1        How is the complexity of units in the codebase distributed?
Q2        Does complexity in units stay the same or change over time?
Q3        Is there a correlation between complexity of units and time to market?

With these questions in mind, we can start to think of the right metrics.

Metric

Consider that metrics can appear in different forms. A metric can be either objective (its value is not dependent on who measures it) or subjective (its value is dependent on who measures it and tends to involve an opinion or estimation). In addition, it might be direct (its value originates directly from observation) or derived (its value is calculated or deduced in some way).

Suppose that we use a uniform measure of the complexity of code units (e.g., McCabe complexity2). Then we can use a tool to go through the codebase and rank the units by their complexity. That would be an objective, direct measurement that quickly answers Q1. Once we start tracking the metric over time, we can confidently answer Q2. For instance, we can measure the number of units that become more complex relative to the total number of units.
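
For a Python codebase, such a ranking could look like the following sketch, which uses the open source radon library to compute McCabe complexity per unit. This assumes radon is installed and takes “unit” to mean a function, method, or class.

    from pathlib import Path

    from radon.complexity import cc_visit  # McCabe (cyclomatic) complexity

    def rank_units_by_complexity(src_dir):
        """Return (complexity, unit) pairs, most complex first (answers Q1)."""
        results = []
        for path in Path(src_dir).rglob("*.py"):
            for block in cc_visit(path.read_text()):  # one block per function/method/class
                results.append((block.complexity, f"{path}:{block.name}"))
        return sorted(results, reverse=True)

    # The ten most complex units in the codebase:
    for complexity, unit in rank_units_by_complexity("src")[:10]:
        print(complexity, unit)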

To measure the correlation between unit complexity and time to market, we need a measure of time to market. This could be the total time from receiving a requirement up to user acceptance of its implementation. That means you need two observations: when the requirement is received and when its code is put into production. After every release, you could then measure the correlation in some form (e.g., by calculating a correlation metric such as the Spearman coefficient of the two measurements3). That lets you observe whether there is indeed a relationship and whether that relationship changes over time.
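
As a sketch of how that could be computed, using scipy’s implementation of the Spearman coefficient (the numbers below are made up for illustration):

    from scipy.stats import spearmanr

    # One observation per delivered requirement (illustrative data): the average
    # McCabe complexity of the units touched, and the time to market in days.
    avg_unit_complexity = [3.1, 4.8, 2.2, 7.5, 5.0, 6.3]
    time_to_market_days = [12, 20, 9, 35, 18, 28]

    rho, p_value = spearmanr(avg_unit_complexity, time_to_market_days)
    print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
    # A rho close to +1 suggests that more complex units go together with a
    # longer time to market; a rho near 0 suggests no monotonic relationship.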

Consider how important it is to make your assumptions explicit. The preceding correlation is derived from the metrics of unit complexity and time to market. It is slightly subjective, as it depends on how you define when functionality “hits the market.” When that definition is explicit, you can interpret the measurements consistently.

Of course, not all requirements are of equal functional and technical difficulty. Therefore you can choose to measure deviations of the estimations that the development team makes when planning implementation of requirements. That has the risk of essentially measuring estimation skills, not code complexity. The relationship between unit complexity and development team estimates could be included as a metric itself, but here we choose to ignore that.

Table 2-3 summarizes the metrics that complete the GQM model.

Table 2-3. Metrics in this GQM model

Metric #  Metric description                                         Corresponding question
M1        Unit complexity in the codebase                            Answers Q1
M2        Complexity of units written for new functionality          Answers Q2 and Q3
M2a       Percentage of code with certain complexity (e.g., volume   Answers Q2
          percentage of units with McCabe complexity of 5)
M3        Time from specification to acceptance of functionality     Answers Q3

The team lead can now decide what to measure and how. In this particular case of complexity in the codebase, the development team may decide to actively reduce that complexity (e.g., by splitting complex units into simpler ones). But if complexity turns out to be uncorrelated with time to market, Q3 is answered and M3 becomes irrelevant. Such metrics also require careful consideration. For the purpose of testing and maintenance, the average complexity of a codebase is much less important than “hotspots” of complexity that are hard to understand and test. To identify hotspots you would need to define categories such as “Good,” “OK,” “Hard to test,” and “Untestable,” as in the sketch that follows.
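
A minimal sketch of such a categorization; the thresholds below are illustrative choices, not values prescribed by the GQM approach:

    def risk_category(mccabe_complexity):
        """Map a unit's McCabe complexity to an illustrative risk category."""
        if mccabe_complexity <= 5:
            return "Good"
        elif mccabe_complexity <= 10:
            return "OK"
        elif mccabe_complexity <= 25:
            return "Hard to test"
        else:
            return "Untestable"

    print(risk_category(3), risk_category(42))  # Good Untestable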

Goals may also change: if the system owner turns out to be satisfied with the time to market, there is no urgency to reduce complexity given this particular goal and the team may ask itself “why are we measuring again?”. Obviously, for the sake of maintainability, there are still good reasons to keep a codebase as simple as possible. In practice it is common to reason back from those metrics to determine whether they actually bring you closer to your goals.

Important

If a metric appears not to help in achieving your goal, remove it. If a specific GQM model becomes obsolete, do not use it until you need it again.

Make Assumptions about Your Metrics Explicit

Simplifying a metric itself is easier than trying to correct the metric for all possible objections. The most common objection is that the metric does not measure what it promises to measure, because of various circumstances. But that does not render the metric useless. You should make explicit the assumptions about the context in which the metric is valid. Of course, circumstances may change, invalidating those assumptions, but with explicit assumptions it remains clear when the metric can still be trusted.

Let us discuss some common assumptions underlying software development metrics (these are not necessarily true, but they help you in making your own assumptions explicit):

Assumption: Measurements are comparable

For practical purposes, you will need to assume that measurements have comparable meanings. When comparing data we at least need to assume that it is administered completely (no details are left out), correctly (intended data is recorded), and consistently (data is administered in the same manner/format).

As an example, consider that you want to measure issue resolution time (issues here defined as including both defects and features). You can greatly simplify measurements when you assume that issues are of approximately the same work size. Consider what this means for the process of registering issues.

This assumption forms a good reason to administer issues in approximately equal sizes. To achieve this, issues should somehow reflect how much work they take to solve. This may be a simple classification such as low, medium, or high effort or something more detailed (e.g., hours, story points, etc.).4

Then we also need to assume that the classification scale has equally spaced steps. This avoids a situation in which “low” and “medium” effort are about the same amount of work while “high” is tenfold that. In this example, the difference between “low” and “medium” effort should equal the difference between “medium” and “high” effort.
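
For example, a tiny sketch of such an equally spaced scale (the hour values are illustrative):

    # Map effort classes to nominal hours with equal spacing, so that
    # "medium" - "low" equals "high" - "medium".
    EFFORT_HOURS = {"low": 4, "medium": 8, "high": 12}

    assert (EFFORT_HOURS["medium"] - EFFORT_HOURS["low"]
            == EFFORT_HOURS["high"] - EFFORT_HOURS["medium"])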

Assumption: Trends are more important than precise facts

A typical assumption is that a trend line that compares measurements over time is more meaningful than the precise numbers. The trend will reflect how things are changing, instead of precise facts. This implies that outliers should have a small impact.

Assumption: Averages are more important than outliers

Trends focus on average movements. Outliers are sometimes just that, outliers. So this assumption is not always appropriate. An outlier may signify a problem, for example, when it takes extraordinarily long to run certain tests or to solve bugs. However, to simplify your measurements, you may assume that some circumstances are not worth correcting for. Consider measuring issue resolution time. Using business hours between identification and resolution would be more accurate than the number of calendar days between them. However, you may choose to ignore the difference when you assume that the impact is small and that the distribution is consistent.
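
A sketch of the two options, using numpy’s business-day counting (the dates are made up for illustration):

    import numpy as np

    identified, resolved = "2024-03-01", "2024-03-11"  # a Friday, and the Monday ten days later

    calendar_days = (np.datetime64(resolved) - np.datetime64(identified)).astype(int)
    business_days = np.busday_count(identified, resolved)

    print(calendar_days, business_days)  # 10 calendar days, but only 6 business days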

These assumptions are about the metrics and their underlying behavior. Let us move on to the usage of these metrics.

Find Explanations Instead of Judgments When Metrics Deviate from Expectations

When you find a metric that deviates from the average, resist the urge to judge it as good or bad too soon. It is more important to find explanations for why metrics deviate. That way you can put the metrics into practice for achieving their goal.

Consider a GQM goal aimed at understanding what influences team productivity, and let us say that issue resolution time has grown over the last month. The situation has not necessarily deteriorated; the number has simply grown. Investigate this difference by asking yourself why the deviation may have occurred:

  • What are the most plausible explanations?

  • Are there reasons to believe that the assumptions are not valid anymore?

  • Is this period different from other periods? Are there special circumstances (e.g., holidays, inexperienced new team members)?

Using Norms with the GQM Approach

In many cases, metrics become useful and supportive only when there is a clear definition of good and bad values; in other words, there has to be a norm. You can add norms to GQM models by defining them for each metric. In the example in this chapter, a norm on complexity could be that McCabe unit complexity should be below 5 in at least 75% of the code (Figure 2-1). Once you agree upon a norm, you can visualize a metric as, for example, a distribution over different risk categories, to see what percentage of the units is too complex. Such a norm may also serve as the basis of a Definition of Done (DoD, see Chapter 3).

Figure 2-1. Example of code volume distribution of unit complexity
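
A sketch of checking such a norm against measured units. For simplicity this counts units rather than weighting them by code volume, and the numbers are illustrative:

    def meets_norm(unit_complexities, threshold=5, required_fraction=0.75):
        """True if at least `required_fraction` of units stay below `threshold`."""
        below = sum(1 for c in unit_complexities if c < threshold)
        return below / len(unit_complexities) >= required_fraction

    print(meets_norm([2, 3, 4, 4, 6, 12, 3, 2]))  # 6 of 8 units below 5 -> True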

A variation on this is a shifting norm: you define an end goal, such as “the unit test coverage of our system should be 80%,” and continuously move the norm up to the highest state achieved so far. That shows whether you are “setting new records.”
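
A sketch of such a shifting norm for the test-coverage example (the coverage figures are made up):

    end_goal = 80.0   # target unit test coverage, in percent
    norm = 60.0       # starting norm

    for coverage in [58.0, 62.5, 61.0, 66.0]:  # measured after each sprint
        if coverage > norm:                     # a "new record"
            norm = min(coverage, end_goal)      # ratchet the norm upward
        print(f"coverage {coverage:.1f}%, norm now {norm:.1f}%")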

Common Objections to GQM

The GQM approach is a powerful way to determine the right metrics for your problem. Often, measurement goals are phrased based on the software metrics that are already available in tooling. With GQM in mind, this is clearly the wrong order of things: metrics should support decision making, instead of metrics defining what is important. In practice this means that you may need to install different tooling for measuring purposes.

Important

Goals come before questions, questions come before metrics. Measure what you need to measure instead of measuring what you can measure.

Objection: Yes, Good Metric, but We Cannot Measure This

“That is a good metric, but we will not be able to measure it correctly enough to rely on it.”

Even when the GQM approach is used to define metrics, in our practice we often encounter objections against measuring in general. Typically the objections are not about the metrics themselves; they tend to take the form of “Yes, it is a useful metric, but we cannot use it because…”. Table 2-4 discusses common objections and possible solutions.

Table 2-4. Common objections to metrics and possible solutions

Objection: Measurements may be interpreted in the wrong manner by management.
Solution: Help management to interpret them, and simplify the measurements.

Objection: Metrics may act as a wrong incentive (leading to “treating the metric”).
Solution: Do not make the metric itself a KPI or a criterion for job evaluation.

Objection: Data is registered in an inconsistent, incomplete, or incorrect manner.
Solution: Automate measurements, and help your colleagues register data consistently by defining a step-by-step process.

Objection: The goal for measuring is unclear or outdated.
Solution: Revisit the GQM model.

Objection: This metric is not appropriate or specific for our goals.
Solution: Explain the goal and assumptions of the metric (using GQM). Explain whether alternatives (if any) are equally simple to gather and use. Possibly revisit the GQM model.

These objections may all be relevant in a particular situation, but having a metric and using it are separate things. It may be fine to have an inconsistent metric as long as you acknowledge that it is inconsistent and use it for rough trend analysis only. A typical example is “those” team members who are lax in registering their hours (late, incomplete, or lacking detail). If you acknowledge that, you should either focus on administrative discipline to get the registration right, or accept it as a given and look at “the bigger picture” instead of precise values.

In general, having a measurement is just another data point. In that sense we believe it is better than having no measurement at all and fully relying on gut feeling. Metrics may be used or ignored, as not all metrics are “born equal” in terms of quality.

We will use the GQM approach throughout the book. As such, you will see GQM return in all the subsequent chapters.

1 Originally published in the following article: Eric Bouwers, Joost Visser, and Arie van Deursen, “Getting What You Measure,” Communications of the ACM, Vol. 55, No. 7 (2012), pp. 54–59.

2 McCabe complexity is a standard measure for the complexity of code; it can be interpreted as the minimum number of test cases that you need to write for a unit and it is based on the number of decision statements in which code can take more than one direction.

3 As an example, the Spearman coefficient is a standard measure for correlation, but its mechanics are outside the scope of this book.

4 See Chapter 3.
