Chapter 9. Ensemble Learning and Dimensionality Reduction

In this chapter, we will cover the following recipes:

  • Recursively eliminating features
  • Applying principal component analysis for dimensionality reduction
  • Applying linear discriminant analysis for dimensionality reduction
  • Stacking and majority voting for multiple models
  • Learning with random forests
  • Fitting noisy data with the RANSAC algorithm
  • Bagging to improve results
  • Boosting for better learning
  • Nesting cross-validation
  • Reusing models with joblib
  • Hierarchically clustering data
  • Taking a Theano tour

Introduction

In the 1983 movie WarGames, a computer made life-and-death decisions that could have resulted in World War III. As far as I know, technology wasn't able to pull off such feats at the time. However, in 1997, the Deep Blue supercomputer did manage to beat the reigning world chess champion. In 2005, a Stanford self-driving car drove by itself for more than 130 kilometers through a desert. In 2007, the car of another team drove through regular traffic for more than 50 kilometers. In 2011, the Watson computer won a quiz show against human opponents. If we assume that computer hardware is the limiting factor, we can try to extrapolate into the future. Ray Kurzweil did just that, and according to him, we can expect human-level intelligence around 2029.

In this chapter, we will focus on the simpler problem of forecasting the weather for the next day. We will assume that today's weather depends on yesterday's weather. Theoretically, a butterfly flapping its wings at one location could trigger a chain of events that causes a snowstorm thousands of kilometers away (the butterfly effect). This is not impossible, but it is very improbable. However, given many such incidents, a similar scenario will occur more often than you might suspect.
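To make this setup concrete, here is a minimal sketch of how a one-day-ahead dependence can be turned into a supervised learning problem with pandas. The synthetic TEMP series and the TEMP_YESTERDAY column name are illustrative assumptions, not the chapter's actual data:

    import numpy as np
    import pandas as pd

    # Hypothetical daily temperature series standing in for real weather data.
    rng = np.random.default_rng(42)
    temps = pd.Series(rng.normal(10, 5, size=365),
                      index=pd.date_range('2000-01-01', periods=365),
                      name='TEMP')

    # Yesterday's temperature becomes the feature, today's the target.
    X = temps.shift(1).iloc[1:].to_frame('TEMP_YESTERDAY')
    y = temps.iloc[1:]
    print(X.head())
    print(y.head())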

It is impossible to take all possible factors into account. In fact, we will make our life easier by ignoring some of the data we have available. We will apply classification and regression algorithms, as well as hierarchical clustering. Evaluation of the results is deferred to Chapter 10, Evaluating Classifiers, Regressors, and Clusters. If you are curious about the confusion matrix mentioned in the classification recipes, jump ahead to the Getting classification straight with the confusion matrix recipe.

Most artificial intelligence systems nowadays are, in fact, not that smart. A judge in a court of law can make wrong decisions because he or she is biased or having a bad day. A panel of multiple judges should perform better. This is comparable to a machine learning project, in which we worry about overfitting and underfitting. Ensemble learning is a solution to this conundrum; it basically means combining multiple learners in a clever way so that their combined predictions are better than those of any single learner.
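As a minimal illustration of the idea (assuming scikit-learn and its bundled Iris dataset, not the weather data used later in the chapter), the following combines three different classifiers by majority voting:

    from sklearn.datasets import load_iris
    from sklearn.ensemble import VotingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    # Combine three learners whose mistakes are unlikely to be identical.
    ensemble = VotingClassifier(estimators=[
        ('lr', LogisticRegression(max_iter=1000)),
        ('tree', DecisionTreeClassifier(random_state=0)),
        ('nb', GaussianNB())],
        voting='hard')

    print(cross_val_score(ensemble, X, y, cv=5).mean())

If the individual learners make largely independent errors, the majority vote tends to outperform any single one of them, which is exactly the "panel of judges" effect described above.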

A major part of this chapter is about hyperparameter optimization. Hyperparameters are the configuration parameters of classifiers and regressors that we set ourselves rather than learn from the data. To check for overfitting or underfitting, we can use learning curves, which show training and test scores for varying training set sizes. We can also vary the value of a single hyperparameter with validation curves.
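As a quick sketch of both tools with scikit-learn (the estimator, the dataset, and the max_depth range are illustrative assumptions, not the chapter's own choices):

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.model_selection import learning_curve, validation_curve
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    # Training and test scores for increasing training set sizes.
    sizes, train_scores, test_scores = learning_curve(
        DecisionTreeClassifier(random_state=0), X, y, cv=5,
        train_sizes=np.linspace(0.2, 1.0, 5))
    print(sizes, train_scores.mean(axis=1), test_scores.mean(axis=1))

    # Scores while varying a single hyperparameter (max_depth here).
    train_scores, test_scores = validation_curve(
        DecisionTreeClassifier(random_state=0), X, y,
        param_name='max_depth', param_range=range(1, 6), cv=5)
    print(train_scores.mean(axis=1), test_scores.mean(axis=1))

A large gap between training and test scores suggests overfitting, while low scores on both suggest underfitting.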
