The package that we will use is neuralnet. The function in neuralnet calls for a formula like the ones we have used elsewhere, such as y ~ x1 + x2 + x3 + x4, data = df. In the past, we used y ~ . to specify all the other variables in the data as inputs. However, neuralnet does not accommodate this shorthand at the time of writing. The way around this limitation is to use the as.formula() function. After first creating an object of the variable names, we will use it as an input in order to paste the variables properly on the right side of the equation:
> n = names(shuttleTrain)
> n
 [1] "stability.xstab" "error.MM"        "error.SS"
 [4] "error.XL"        "sign.pp"         "wind.tail"
 [7] "magn.Medium"     "magn.Out"        "magn.Strong"
[10] "vis.yes"         "use"
> form <- as.formula(paste("use ~", paste(n[!n %in% "use"], collapse = " + ")))
> form
use ~ stability.xstab + error.MM + error.SS + error.XL + sign.pp +
    wind.tail + magn.Medium + magn.Out + magn.Strong + vis.yes
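Incidentally, if you would rather skip the paste() gymnastics, base R's reformulate() builds the same formula in a single step; a quick sketch, which should produce a result identical to form above:

> form = reformulate(setdiff(n, "use"), response = "use")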
Keep this technique in mind for your own use as it may come in quite handy. In the neuralnet package, the function that we will use is appropriately named neuralnet(). Other than the formula, there are four other critical arguments that we will need to examine:

- hidden: This is the number of hidden neurons in each layer, which can be up to three layers; the default is 1
- act.fct: This is the activation function, with the default logistic and tanh also available
- err.fct: This is the function used to calculate the error, with the default "sse"; as we are dealing with binary outcomes, we will use "ce" for cross-entropy
- linear.output: This is a logical argument on whether or not to ignore act.fct, with the default TRUE, so for our data, this will need to be FALSE

You can also specify the algorithm. The default is resilient backpropagation ("rprop+"), and we will use it along with the default of one hidden neuron:
> fit = neuralnet(form, data=shuttleTrain, err.fct="ce", linear.output=FALSE)
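For clarity, here is the same call with each of the arguments above spelled out; this is purely an illustrative sketch (assigned to a hypothetical fitExplicit so as not to refit and overwrite fit, since each run starts from new random weights):

> fitExplicit = neuralnet(form, data = shuttleTrain,
+   hidden = 1,             # one hidden neuron (the default)
+   act.fct = "logistic",   # logistic activation (the default)
+   err.fct = "ce",         # cross-entropy error for our binary outcome
+   linear.output = FALSE,  # do not ignore act.fct at the output neuron
+   algorithm = "rprop+")   # resilient backpropagation (the default)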
Here are the overall results:
> fit$result.matrix
                                          1
error                        0.008356573154
reached.threshold            0.008074195519
steps                      689.000000000000
Intercept.to.1layhid1       -4.285590085839
stability.xstab.to.1layhid1  1.929309791976
error.MM.to.1layhid1        -1.724487179503
error.SS.to.1layhid1        -2.634918418999
error.XL.to.1layhid1        -0.438426863298
sign.pp.to.1layhid1         -0.857597732824
wind.tail.to.1layhid1        0.095087222283
magn.Medium.to.1layhid1     -0.016988896781
magn.Out.to.1layhid1         1.861116820668
magn.Strong.to.1layhid1      0.121644784735
vis.yes.to.1layhid1          6.272628328980
Intercept.to.use            32.409085601639
1layhid.1.to.use           -67.336820525475
We can see that the error is extremely low at 0.0084. The algorithm required 689 steps to reach the threshold, which is the point at which the absolute partial derivatives of the error function become smaller than this value (default = 0.01). The highest weight of the first neuron is vis.yes.to.1layhid1 at 6.27.
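If you only want the fit summary rather than the full weight listing, you can index the result matrix by row name; a quick sketch:

> fit$result.matrix[c("error", "reached.threshold", "steps"), ]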
You can also look at what are known as generalized weights. According to the neuralnet package authors, the generalized weight is defined as the contribution of the ith covariate to the log-odds:

"The generalized weight expresses the effect of each covariate xi and thus has an analogous interpretation as the ith regression parameter in regression models. However, the generalized weight depends on all other covariates." (Günther and Fritsch, 2010)

The weights can be called and examined. I've abbreviated the output to the first four variables and six observations only. Note that if you sum each row, you will get the same number, which means that the weights are equal for each covariate combination:
> head(fit$generalized.weights[[1]])
            [,1]         [,2]         [,3]        [,4]
1   -4.826708143  4.314287082  6.591985508 1.096847443
2   -4.751585064  4.247139344  6.489387581 1.079776065
6   -5.884240759  5.259548151  8.036290710 1.337166913
9  -11.346058891 10.141519613 15.495665693 2.578340208
10 -11.104906734  9.925969054 15.166316688 2.523539479
11 -10.952642060  9.789869358 14.958364085 2.488938025
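For reference, the definition behind this quantity (following Günther and Fritsch, 2010) is the partial derivative of the log-odds of the predicted output $o(\mathbf{x})$ with respect to the ith covariate:

$$\tilde{w}_i = \frac{\partial}{\partial x_i}\,\log\!\left(\frac{o(\mathbf{x})}{1 - o(\mathbf{x})}\right)$$

This is why the interpretation parallels a regression coefficient: it measures how the log-odds respond to a change in $x_i$, evaluated at each observation.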
To visualize the neural network, simply use the plot() function:
> plot(fit)
The following is the output of the preceding command:

[Figure: plot of the fitted network, labeling the synaptic weight on each connection along with the intercept (bias) weights.]
This plot shows the weights of the variables and intercepts. You can also examine the generalized weights in a plot. Let's look at vis.yes versus wind.tail, which has a low overall synaptic weight. Notice how vis.yes is skewed and wind.tail has an even distribution of weights, implying little predictive power:
> par(mfrow=c(1,2))
> gwplot(fit, selected.covariate = "vis.yes")
> gwplot(fit, selected.covariate = "wind.tail")
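If you would rather scan every input at once than pick covariates one at a time, a loop over the names stored earlier in n works; a sketch, where the 2 x 5 grid assumes our ten input variables:

> par(mfrow = c(2, 5))
> for (v in n[!n %in% "use"]) {
+   gwplot(fit, selected.covariate = v)
+ }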
We now want to see how well the model performs. This is done with the compute() function, specifying the fitted model and covariates. The syntax will be the same for the predictions on the test and train sets. Once computed, a list of the predictions is created with $net.result:
> res = compute(fit, shuttleTrain[,1:10])
> predTrain = res$net.result
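As an aside, more recent releases of the neuralnet package also provide a standard predict() method; to the best of my knowledge it returns the same probability matrix as compute()$net.result, but treat that equivalence as an assumption to verify against your installed version:

> # predict() in newer neuralnet versions (assumed equivalent to compute):
> # predTrain = predict(fit, shuttleTrain[, 1:10])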
These results are in probabilities, so let's turn them into 0 or 1 and follow this up with a confusion matrix:
> predTrain = ifelse(predTrain>=0.5,1,0)
> table(predTrain, shuttleTrain$use)
predTrain  0  1
        0 81  0
        1  0 99
Lo and behold, the neural network model has achieved 100 percent accuracy. We will now hold our breath and see how it does on the test set:
> res2 = compute(fit, shuttleTest[,1:10])
> predTest = res2$net.result
> predTest = ifelse(predTest>=0.5,1,0)
> table(predTest, shuttleTest$use)
predTest  0  1
       0 29  0
       1  1 46
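If you prefer a single accuracy figure to reading the confusion matrix, a one-line sketch; from the table above it works out to (29 + 46) / 76, or roughly 98.7 percent:

> mean(predTest == shuttleTest$use)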
Only one false positive in the test set. If you want to identify which one it was, use the which() function to single it out, as follows:
> which(predTest==1 & shuttleTest$use==0)
[1] 62
> shuttleTest[62,]
    stability.xstab error.MM error.SS error.XL sign.pp
203               0        1        0        0       1
    wind.tail magn.Medium magn.Out magn.Strong vis.yes
203         0           0        0           1       1
    use
203   0
It is row 62 in the test set and observation 203 in the full dataset. Can we improve on this result and achieve 100 percent accuracy in the test set? To do this, we will increase the complexity by specifying three neurons in the first layer and two neurons in the second layer:
> fit2 = neuralnet(form, data=shuttleTrain, hidden=c(3,2), err.fct="ce", linear.output=FALSE)
The plot results now get quite busy and extremely hard to interpret:
> plot(fit2)
It's now time to see how this model performs on both datasets:
> res = compute(fit2, shuttleTrain[,1:10])
> predTrain = res$net.result
> predTrain = ifelse(predTrain>=0.5,1,0)
> table(predTrain, shuttleTrain$use)
predTrain  0  1
        0 81  0
        1  0 99
Now perform this on the test data as well:
> res2 = compute(fit2, shuttleTest[,1:10])
> predTest = res2$net.result
> predTest = ifelse(predTest>=0.5,1,0)
> table(predTest, shuttleTest$use)
predTest  0  1
       0 29  1
       1  1 45
However, we've now added a false negative to the mix; the added complexity has increased the out-of-sample error, as we can see here:
> which(predTest==1 & shuttleTest$use==0)
[1] 62
> which(predTest==0 & shuttleTest$use==1)
[1] 63
The false positive observation has not changed from the prior fit, and row 63 marks the false negative. In this case, adding complexity did not improve the test set's performance; in fact, the model generalizes less well than the simpler one.
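One closing caveat: neuralnet() initializes its weights randomly, so the errors, step counts, and even the confusion matrices shown above may differ slightly from run to run. If you want reproducible fits, set a seed immediately before each call; a minimal sketch, where the seed value 123 is arbitrary:

> set.seed(123)
> fit2 = neuralnet(form, data = shuttleTrain, hidden = c(3, 2),
+   err.fct = "ce", linear.output = FALSE)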