Understanding the decision rules

To build the perfect tree, you would want to split the data at the most informative feature, resulting in the purest daughter nodes. However, this simple idea leads to some practical challenges:

  • It's not actually clear what "most informative" means. We need a concrete score function, a mathematical measure that describes how informative a feature is.
  • To find the best split, we have to search over all possible splits at every decision node (the sketch after the following list shows a brute-force version of this search).

Fortunately, the decision tree algorithm takes care of both of these steps for you. Two of the most commonly used criteria that scikit-learn supports are the following:

  • criterion='gini': The Gini impurity measures how often a randomly chosen data point from a node would be misclassified if it were labeled according to the node's class distribution; the aim is to minimize this probability. A perfect split of the data, where each subgroup contains data points of a single target label, would result in a Gini impurity of 0. We can measure the Gini impurity of every possible split of the tree and then choose the one that yields the lowest value. It is commonly used in classification and regression trees (CART).
  • criterion='entropy' (also known as information gain): In information theory, entropy is a measure of the amount of uncertainty associated with a signal or distribution. A perfect split of the data would have 0 entropy. We can measure the entropy of every possible split of the tree and then choose the one that yields the lowest entropy.
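To make these criteria concrete, here is a minimal sketch (not scikit-learn's actual implementation) that computes the Gini impurity and entropy of a set of labels and then brute-forces the best threshold for a single feature. The gini, entropy, and split_score helpers and the toy data are illustrative assumptions:

import numpy as np

def gini(labels):
    # Gini impurity: 1 - sum of squared class probabilities
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    # Shannon entropy in bits: -sum(p * log2(p))
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def split_score(feature, labels, threshold, impurity=gini):
    # Weighted impurity of the two daughter nodes produced by a split;
    # lower is better, and 0 means both daughters are pure
    left, right = labels[feature <= threshold], labels[feature > threshold]
    n = len(labels)
    return len(left) / n * impurity(left) + len(right) / n * impurity(right)

# Toy data: one feature, two classes that separate cleanly at 2.5
feature = np.array([1.0, 2.0, 3.0, 4.0])
labels = np.array([0, 0, 1, 1])

# Brute-force search: try the midpoint between every pair of adjacent
# feature values and keep the threshold with the lowest weighted impurity
values = np.sort(feature)
candidates = (values[:-1] + values[1:]) / 2
best = min(candidates, key=lambda t: split_score(feature, labels, t))
print(best, split_score(feature, labels, best))   # 2.5 0.0

Swapping impurity=entropy into split_score gives the entropy-based version of the same search; in both cases the perfectly pure split at 2.5 scores 0.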

In scikit-learn, you can specify the split criterion in the constructor of the decision tree classifier. For example, if you want to use entropy, you would type the following:

In [29]: dtc_entropy = tree.DecisionTreeClassifier(criterion='entropy')
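To compare the two criteria in practice, the following sketch fits one classifier per criterion and prints the test accuracy; the use of scikit-learn's bundled Iris dataset, the train/test split, and the random_state values are assumptions for illustration, not part of the original example:

from sklearn import datasets, model_selection, tree

# Iris ships with scikit-learn; the split and random_state are arbitrary choices
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    iris.data, iris.target, random_state=42)

dtc_gini = tree.DecisionTreeClassifier(criterion='gini', random_state=42)
dtc_entropy = tree.DecisionTreeClassifier(criterion='entropy', random_state=42)

for name, clf in [('gini', dtc_gini), ('entropy', dtc_entropy)]:
    clf.fit(X_train, y_train)
    print(name, clf.score(X_test, y_test))

On a simple dataset like this, both criteria usually produce very similar trees and scores; the choice mostly matters for larger, noisier datasets.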
