Chapter 19

Sharing Models

Challenges and Methods

Abstract

This part of Data Science for Software Engineering: Sharing Data and Models explores ensemble learners and multiobjective optimizers as applied to software engineering. Novel incremental ensemble learners are explained, along with one of the largest ensemble learning experiments yet attempted in effort estimation. It turns out that the specific goals of the learning have an effect on what is learned and, for this reason, this part also explores multigoal reasoning. We show that multigoal optimizers can significantly improve effort estimation results.

So far, we have been concerned with sharing data from one project to another. Now we turn to a more complex issue—how to share models from one project to another.

The key concepts in this part of the book are ensembles and multiobjective optimization. Ensembles are committees of artificially generated experts where each expert is trained on slightly different sections of the data.

 Chapter 20 offers an introduction to ensemble-based learning and presents experiments with ensemble learning and effort estimation.

 Chapter 21 extends ensembles with online learning. Specifically, it discusses how to incrementally modify a model as new data arrive. It proposes a novel dynamic method whereby a “toolbox” of different models is built by adapting multiple models. When new data arrive, this method asks each item in the toolbox how well suited it is to making the next prediction. The resulting prediction is a weighted sum in which each model's prediction is multiplied by that model's confidence in it.

 Chapter 22 offers a very large experiment in ensemble-based learning. After studying 90 different learners, it proposes a selection rule for adding learners to an ensemble. The resulting ensemble is shown to perform much better than any of the 90 individual learners.
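The confidence-weighted combination described for Chapter 21 can be illustrated with a minimal sketch. It is not the method from that chapter: the function name `weighted_prediction`, the example numbers, and the choice to normalize the confidences so they sum to one are all assumptions made here for illustration.

```python
from typing import Sequence


def weighted_prediction(predictions: Sequence[float],
                        confidences: Sequence[float]) -> float:
    """Combine per-model predictions, weighting each by its confidence."""
    if len(predictions) != len(confidences):
        raise ValueError("need one confidence per prediction")
    total = sum(confidences)
    if total <= 0:
        raise ValueError("confidences must sum to a positive value")
    # Assumption: normalize the confidences so the combined estimate stays
    # on the same scale as the individual predictions.
    return sum(p * c / total for p, c in zip(predictions, confidences))


# Hypothetical toolbox of three effort estimators (values in person-months).
estimate = weighted_prediction([100.0, 200.0, 400.0], [0.5, 0.25, 0.25])
print(estimate)
```

In this sketch, a model that reports low confidence contributes little to the final estimate, which is the intuition behind asking each tool how well it fits the job.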

After that, this part of the book turns to multiobjective optimization. The motto of multiobjective optimization is “know your goals.” In terms of sharing models, the lesson of this kind of optimization is that model sharing between projects works best when those projects understand and share their goals:

 Chapter 23 shows that the same data can generate radically different models, depending on the goals of the learning. This means that unless two projects share the same goals, it is pointless to try to share models between them.

 Given that models are goal-dependent, it is important to reason explicitly about those goals. Chapter 24 shows that multigoal optimizers can significantly improve effort estimation results.
