How it works...

The gradient boosting algorithm works in stages: at each stage, it determines the best data partitioning and computes the residuals of the current model. The successive model then fits the residuals from the previous stage, building a new model that reduces the residual variance (the error). The reduction of the residual variance follows the functional gradient descent technique, which minimizes the residual variance by descending along its derivative, as shown here:

Gradient descent method
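
As a brief sketch of this idea (the notation here is assumed, not taken from the recipe): at stage m, a new learner h_m is fitted to the negative gradient of the loss L (which, for squared error, is exactly the residuals) and added to the current model, scaled by the learning rate \nu (the shrinkage parameter):

$$F_m(x) = F_{m-1}(x) + \nu\, h_m(x), \qquad h_m \approx -\left.\frac{\partial L(y, F(x))}{\partial F(x)}\right|_{F = F_{m-1}}$$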

In this recipe, we use the gradient boosting method from the gbm package to classify the telecom churn dataset. To begin the classification, we first install and load the gbm package. Then, we use the gbm function to train the classification model. Here, as our prediction target is the churn attribute, which is a binary outcome, we set the distribution argument to bernoulli. Also, we set the number of trees to fit to 1,000 in the n.trees argument, the maximum depth of variable interactions to 7 in interaction.depth, the learning rate of the step-size reduction to 0.01 in shrinkage, and the number of cross-validation folds to 3 in cv.folds. After the model is fitted, we can use the summary function to obtain the relative influence of each variable as both a table and a figure. The relative influence shows the reduction in the sum of squared errors attributable to each variable. Here, we find that total_day_minutes is the most influential variable in reducing the loss function.
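
The following is a minimal sketch of this training step; the object names (trainset, churn.gbm) are illustrative, and it assumes the churn attribute is a yes/no factor that must be recoded to 0/1, as the bernoulli distribution requires a numeric 0/1 response:

> install.packages("gbm")
> library(gbm)
> set.seed(2)   # illustrative seed, for reproducible cross-validation folds
> trainset$churn = ifelse(trainset$churn == "yes", 1, 0)   # assumed yes/no coding
> churn.gbm = gbm(churn ~ ., data = trainset, distribution = "bernoulli",
+                 n.trees = 1000, interaction.depth = 7, shrinkage = 0.01,
+                 cv.folds = 3)
> summary(churn.gbm)   # relative influence of each variable, as table and bar plot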

Next, we use the gbm.perf function to find the optimal number of iterations. Here, we estimate this number with cross-validation by setting the method argument to "cv". The function also generates a plot with two curves, where the black line plots the training error and the green one plots the validation error. The error here is measured as Bernoulli deviance, following the distribution we defined earlier in the training stage. The blue dashed line on the plot marks the optimal iteration.
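
A minimal sketch of this step, reusing the churn.gbm object fitted above:

> churn.iter = gbm.perf(churn.gbm, method = "cv")   # returns the optimal iteration count and draws the error plot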

Then, we use the predict function to obtain the log odds value of each testing case, as returned under the Bernoulli loss function. To get the best prediction result, one can set the n.trees argument to the optimal iteration number. However, as the returned value is a log odds value, we still have to determine the best cutoff for assigning class labels. Therefore, we use the roc function to generate an ROC curve and obtain the cutoff with the maximum accuracy.
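
Here is a sketch of the prediction and ROC steps; it assumes the pROC package (which provides roc) and a testset whose churn column is recoded the same way as the training data:

> churn.predict = predict(churn.gbm, testset, n.trees = churn.iter)
> # for distribution = "bernoulli", predict returns values on the log odds scale
> install.packages("pROC")
> library(pROC)
> testset$churn = ifelse(testset$churn == "yes", 1, 0)   # same recoding as trainset
> churn.roc = roc(testset$churn, churn.predict)
> plot(churn.roc)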

Finally, we can use the coords function to retrieve the best cutoff threshold and the ifelse function to determine the class label from the log odds value. Now, we can use the table function to generate the classification table and see how accurate the classification model is.
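
A sketch of this final step; note that, as an assumption here, coords with x = "best" selects the threshold by the Youden index by default (best.method = "youden"), which is one common way of choosing a cutoff:

> churn.cutoff = coords(churn.roc, "best", ret = "threshold")
> # recent pROC versions return a one-row data frame, older ones a named
> # vector; unlist() handles both cases
> churn.label = ifelse(churn.predict > as.numeric(unlist(churn.cutoff)), 1, 0)
> table(testset$churn, churn.label)   # classification table of actual vs. predicted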
