Evidence-based software engineering

T. Dybå*; G.R. Bergersen; D.I.K. Sjøberg*
* SINTEF ICT, Trondheim, Norway
University of Oslo, Norway

Abstract

A decade ago, Kitchenham, Dybå and Jørgensen coined the term and provided the foundations for evidence-based software engineering (EBSE). A trilogy of papers was written for researchers, practitioners, and educators. They suggested that practitioners consider EBSE as a mechanism to support and improve their technology adoption decisions, and that researchers should use systematic literature reviews as a methodology for performing unbiased aggregation of empirical results. This spurred significant international activity and a renewed focus on research methods and theory, as well as on the future of empirical methods in SE research.

Keywords

Evidence-based software engineering (EBSE); Systematic literature reviews (SLRs); Contextualizing; GRADE; Reversing effect theory

Introduction

I believe in evidence. I believe in observation, measurement, and reasoning, confirmed by independent observers. I'll believe anything, no matter how wild and ridiculous, if there is evidence for it. The wilder and more ridiculous something is, however, the firmer and more solid the evidence will have to be.

Isaac Asimov

A decade ago, Kitchenham, Dybå and Jørgensen coined the term and provided the foundations for evidence-based software engineering (EBSE). A trilogy of papers was written for researchers [1], practitioners [2], and educators [3]. They suggested that practitioners consider EBSE as a mechanism to support and improve their technology adoption decisions, and that researchers should use systematic literature reviews as a methodology for performing unbiased aggregation of empirical results. This spurred significant international activity and a renewed focus on research methods and theory, as well as on the future of empirical methods in SE research [4].

The Aim and Methodology of EBSE

EBSE aims to improve decision making related to software development and maintenance by integrating the best current evidence from research with practical experience and human values.

Based on the stages in evidence-based medicine, EBSE involves five steps [2]:

1. Ask an answerable question. The main challenge in this step is to convert a practical problem into a question that’s specific enough to be answered, but not so specific that you don’t get any answers. Partitioning the question into the main intervention or action you’re interested in, the context or specific situation of interest, and the main outcomes or effect of interest, makes it easier not only to go from general problem descriptions to specific questions, but also to think about what kind of information you need to answer the question. (A small sketch following this list shows one way to write down such a question.)

2. Find the best evidence. Finding an answer to your question includes selecting an appropriate information resource and executing a search strategy. There are several information sources you can use. You can, for example, get viewpoints from your customers or the software’s users, colleagues or an expert, use what you’ve learned as a student or in professional courses, use your own experience, or search for research-based evidence.

3. Critically appraise the evidence. Unfortunately, published research isn’t always of good quality; the problem under study might be unrelated to practice, or the research method might have weaknesses such that you can’t trust the results. To assess whether research is of good quality and is applicable to practice, you must be able to critically appraise the evidence.

4. Apply the evidence. To employ the evidence in your decision-making, you integrate it with your practical experience, your customers’ requirements, and your knowledge of the concrete situation’s specific circumstances, and then you apply it in practice. However, this procedure isn’t straightforward and, among other things, depends on the type of technology you’re evaluating.

5. Evaluate performance. The final step is to consider how well you perform each step of EBSE and how you might improve your use of it. In particular, you should ask yourself how well you’re integrating evidence with practical experience, customer requirements, and your knowledge of the specific circumstances. You must also assess whether the change has been effective.
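
To make steps 1 and 2 a little more concrete, here is a minimal sketch in Python (our own illustration; the class name, fields, and example question are hypothetical and not part of any EBSE guideline) that records a practical problem as a structured, answerable question and derives a naive search string from it:

    from dataclasses import dataclass

    @dataclass
    class EbseQuestion:
        """A practical problem partitioned as suggested in step 1 (illustrative fields)."""
        intervention: str   # the main technology or action of interest
        context: str        # the specific situation it would be used in
        outcome: str        # the main effect you care about

        def search_string(self) -> str:
            # A naive search string for step 2 (finding the best evidence).
            return f'"{self.intervention}" AND "{self.context}" AND "{self.outcome}"'

    # Example: a vague problem ("should we adopt pair programming?") made specific.
    question = EbseQuestion(
        intervention="pair programming",
        context="maintenance of a large legacy system",
        outcome="defect density",
    )
    print(question.search_string())

Writing the question down in this form forces you to state the intervention, the context, and the outcome explicitly before you start searching for evidence.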

It’s important to note that recommendations on evidence-based medicine tend to be context independent and implicitly universal, while software engineering prescriptions are contingent and sensitive to variation in the organizational context.

Contextualizing Evidence

What works for whom, where, when, and why is the ultimate question of EBSE [5]. Still, the empirical research seems mostly concerned with identifying universal relationships that are independent of how work settings and other contexts interact with the processes important to software practice. Questions of “What is best?” seem to prevail. For example, “Which is better: pair or solo programming? Test-first or test-last?”

However, just as the question of whether a helicopter is better than a bicycle is meaningless, so are these questions, because the answers depend on the settings and goals of the projects studied. Practice settings are rarely, if ever, the same. For example, the environments of software organizations differ, as do their sizes, customer types, countries or geography, and history. All these factors influence engineering practices in unique ways. Additionally, the human factors underlying the organizational culture differ from one organization to the next, and also influence the way software is developed.

We know these issues and the ways they interrelate are important for the successful uptake of research into practice. However, the nature of these relationships is poorly understood. Consequently, we can’t a priori assume that the results of a particular study apply outside the specific context in which it was run. Taking something out of context leads to misunderstanding; for instance, “I am attached to you” has very different meanings for a person in love and a handcuffed prisoner.
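
A toy example (the numbers and contexts below are invented, not results from any study) shows why a bare “Which is better?” question is underspecified until the context is pinned down:

    # Purely invented effect-size estimates (standardized mean differences),
    # only to illustrate that the "best" choice can reverse with the context.
    effect_of_pair_programming = {
        ("junior developers", "complex change task"): +0.6,   # pairs do markedly better
        ("senior developers", "simple change task"): -0.2,    # pairs do slightly worse
    }

    def recommendation(context):
        effect = effect_of_pair_programming[context]
        return "pair programming" if effect > 0 else "solo programming"

    print(recommendation(("junior developers", "complex change task")))  # pair programming
    print(recommendation(("senior developers", "simple change task")))   # solo programming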

Strength of Evidence

Several systems exist for making judgments about the strength of evidence of a body of knowledge. Most of these systems suggest that the strength of evidence can be based on a hierarchy with systematic reviews and randomized experiments at the top and evidence from observational studies and expert opinion at the bottom. The inherent weakness of such evidence hierarchies is that randomized experiments are not always feasible and that, in some instances, observational studies may provide better evidence.

To cope with these weaknesses of evidence hierarchies, the GRADE working group ([6], see Table 1) grades the overall strength of evidence of systematic reviews as high, moderate, low, or very low.

Table 1

Definitions used for grading the strength of evidence [6]

High: Further research is very unlikely to change our confidence in the estimate of effect
Moderate: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate
Low: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate
Very low: Any estimate of effect is very uncertain

According to GRADE, the strength of evidence is determined by combining four key elements: study design (whether the studies are randomized trials or observational studies), study quality (how well methods were described, how issues of bias, validity, and reliability were addressed, and how methods of data collection and analysis were explained), consistency (the similarity of estimates of effect across studies), and directness (the extent to which the people, interventions, and outcome measures are similar to those of interest). Combining these four components, Dybå and Dingsøyr [7] found that the strength of the evidence regarding the benefits and limitations of agile methods, and for decisions related to their adoption, was very low; only two of the studies in their review were randomized trials. Hence, they concluded that any estimate of effect based on the research evidence on agile software development is very uncertain. However, their systematic review was published in 2008, and many more empirical studies of agile software development have been published since then.
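
As a rough sketch of how these four elements might be combined (a deliberately simplified illustration in Python; GRADE itself is a structured judgment process rather than a formula, and the boolean flags and downgrading rule below are our own assumptions), one can start from the study design and downgrade the overall grade for each serious concern about quality, consistency, or directness:

    GRADES = ["very low", "low", "moderate", "high"]

    def grade_strength(randomized: bool, quality_concern: bool,
                       inconsistent: bool, indirect: bool) -> str:
        """Start from the study design, then downgrade one level per serious concern.

        This mimics the spirit of GRADE's four elements; the real procedure
        involves explicit, documented judgments rather than boolean flags.
        """
        level = GRADES.index("high") if randomized else GRADES.index("low")
        for concern in (quality_concern, inconsistent, indirect):
            if concern:
                level = max(level - 1, 0)
        return GRADES[level]

    # Example in the spirit of the agile review discussed above: mostly
    # observational studies with quality and consistency concerns.
    print(grade_strength(randomized=False, quality_concern=True,
                         inconsistent=True, indirect=False))  # -> "very low"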

Evidence and Theory

Consider, for example, the “reversing effect theory” (cf. Sjøberg et al., this book). Why would this theory be worthy of being believed by contemporary software engineers? A plausible answer is that the evidence available to them strongly suggests that it is true. Paradigmatically, theories that are worthy of being believed enjoy that status in virtue of the availability of evidence sufficient to justify belief in their truth.

Evidence that tells in favor of a given theory confirms, or supports, that theory. On the other hand, evidence that tells against a theory disconfirms, or weakens, that theory. Of course, a given piece of evidence might confirm or disconfirm a theory to a greater or lesser degree. “Verification” signifies the maximal degree of confirmation: evidence verifies a theory just in case it conclusively establishes that the theory in question is true (even though, in practice, one can never be absolutely certain). At the opposite end of the spectrum, “falsification” signifies the maximal degree of disconfirmation: evidence falsifies a theory just in case it conclusively establishes that the theory in question is false [8].

In considering questions about how a given body of evidence bears on a theory, it is crucial to distinguish between the balance of the evidence and its weight. Intuitively, the balance of the evidence concerns how decisively the evidence tells for or against the theory. On the other hand, the weight of the evidence is a matter of how substantial the evidence is. However, for any body of evidence, there will always be alternative theories that constitute equally good explanations of that evidence [8].
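
One way to make the distinction concrete is a toy Bayesian sketch (our own illustration, not something proposed in [8]): treat each study as telling for or against the theory and summarize the body of evidence with a Beta posterior. The posterior mean reflects the balance of the evidence, while its spread shrinks as the weight, the sheer amount of evidence, grows, even when the balance stays roughly the same:

    from math import sqrt

    def beta_summary(supporting: int, opposing: int):
        """Posterior Beta(a, b) with a uniform prior: mean tracks the balance,
        standard deviation shrinks with the weight of the evidence."""
        a, b = supporting + 1, opposing + 1
        mean = a / (a + b)
        sd = sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
        return mean, sd

    # Roughly the same balance of evidence (3:1 in favor), very different weight.
    print(beta_summary(3, 1))    # small body of evidence: mean ~0.67, sd ~0.18
    print(beta_summary(30, 10))  # larger body of evidence: mean ~0.74, sd ~0.07

In this example both bodies of evidence point in the same direction to roughly the same degree, but the larger one carries far more weight, so conclusions drawn from it are correspondingly more stable.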

References

[1] Kitchenham B.A., Dybå T., Jørgensen M. Evidence-based software engineering. In: Proc. 26th international conference on software engineering (ICSE 2004), Edinburgh, Scotland, 23–28 May; 2004:273–281.

[2] Dybå T., Kitchenham B.A., Jørgensen M. Evidence-based software engineering for practitioners. IEEE Softw. 2005;22(1):58–65.

[3] Jørgensen M., Dybå T., Kitchenham B.A. Teaching evidence-based software engineering to university students. In: Proc. 11th international software metrics symposium (Metrics 2005), Como, Italy, 19–22 September 2005; 2005.

[4] Sjøberg D.I.K., Dybå T., Jørgensen M. The future of empirical methods in software engineering research. In: Briand L., Wolf A., eds. 29th international conference on software engineering (ICSE'07), Minneapolis, Minnesota, USA, 20–26 May. Future of software engineering. Los Alamitos, CA: IEEE Computer Society Press; 2007:358–378.

[5] Dybå T. Contextualizing empirical evidence. IEEE Softw. 2013;30(1):81–83.

[6] Grades of Recommendation, Assessment, Development, and Evaluation (GRADE) Working Group. Grading quality of evidence and strength of recommendations. BMJ. 2004;328(7454):1490–1494.

[7] Dybå T., Dingsøyr T. Empirical studies of agile software development: a systematic review. Inf Softw Technol. 2008;50(9–10):833–859.

[8] Kelly T. Evidence: fundamental concepts and the phenomenal conception. Philos Compass. 2008;3(5):933–955.
