Neural networks

Neural networks, inspired by the way biological brains are connected, consist of many neurons, or computational modules, organized in layers. Data is provided at the input layer and predictions are produced at the output layer. All intermediate layers are called hidden layers. Neurons that belong to the same layer are not connected to each other, only to neurons in other layers. Each neuron can have multiple inputs; each input is multiplied by a specific weight, and the sum of the weighted inputs is passed to an activation function that defines the neuron's output. Common activation functions include the following:

Sigmoid
Tanh
ReLU
Linear
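For reference, these activations can be written as simple functions. The following is a minimal NumPy sketch; the function names are illustrative and not part of scikit-learn's API:

    import numpy as np

    def sigmoid(x):
        # Squashes inputs into the (0, 1) range
        return 1.0 / (1.0 + np.exp(-x))

    def tanh(x):
        # Squashes inputs into the (-1, 1) range
        return np.tanh(x)

    def relu(x):
        # Passes positive inputs through, zeroes out negative inputs
        return np.maximum(0.0, x)

    def linear(x):
        # Identity activation; the output equals the weighted sum
        return x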

The network's goal is to optimize each neuron's weights so that the cost function is minimized. Neural networks can be used either for regression, where the output layer consists of a single neuron, or for classification, where it consists of many neurons, usually equal to the number of classes. A number of optimization algorithms, or optimizers, are available for neural networks. The most common is stochastic gradient descent, or SGD. The main idea is that the weights are updated based on the direction and magnitude of the error's gradient (its first derivative with respect to the weights), multiplied by a factor called the learning rate.
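To make this concrete, here is a rough sketch of a single SGD step for one linear neuron with a squared-error cost; the helper name and setup are illustrative only, not a library function:

    import numpy as np

    def sgd_update(w, x, y, learning_rate=0.01):
        # Single-sample SGD step for a linear neuron with squared error:
        # prediction = w . x, error = prediction - y
        prediction = np.dot(w, x)
        error = prediction - y
        gradient = error * x          # dE/dw for E = 0.5 * error**2
        # Move the weights against the gradient, scaled by the learning rate
        return w - learning_rate * gradient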

Variations and extensions have been proposed that take into account the second derivative, adapt the learning rate, or use the momentum of previous weight changes to update the weights.

Although the concept of neural networks has existed for a long time, their popularity has recently increased greatly with the advent of deep learning. Modern architectures consist of convolutional layers, where each layer's weights consist of matrices and the output is calculated by sliding the weight matrix over the input. Another type of layer, the max pooling layer, calculates its output as the maximum input element, again by sliding a fixed-size window over the input. Recurrent layers retain information about their previous states. Finally, fully connected layers are made up of traditional neurons, as described previously.
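To illustrate the sliding-window idea, the following is a minimal NumPy sketch of a one-dimensional convolution and a max pooling step. It is simplified (no padding, stride of one for the convolution), and the helpers are illustrative only, not part of scikit-learn:

    import numpy as np

    def conv1d(inputs, kernel):
        # Slide the weight vector (kernel) over the input and
        # compute a dot product at each position
        n = len(inputs) - len(kernel) + 1
        return np.array([np.dot(inputs[i:i + len(kernel)], kernel)
                         for i in range(n)])

    def max_pool1d(inputs, window=2):
        # Slide a fixed-size, non-overlapping window over the input
        # and keep the maximum element of each window
        return np.array([inputs[i:i + window].max()
                         for i in range(0, len(inputs) - window + 1, window)])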

Scikit-learn implements traditional neural networks under the sklearn.neural_network package. Once again, using the preceding examples, we'll try to model the diabetes and breast cancer datasets. On the diabetes dataset, we'll use MLPRegressor with Stochastic Gradient Descent (SGD) as the optimizer, with mlpr = MLPRegressor(solver='sgd'). Without any further fine-tuning, we achieve an R2 of 0.64 and an MSE of 1977. On the breast cancer dataset, using the Limited-memory Broyden–Fletcher–Goldfarb–Shanno (LBFGS) optimizer, with mlpc = MLPClassifier(solver='lbfgs'), we get a classification accuracy of 93% and a competent confusion matrix. The following table shows the neural network's confusion matrix for the breast cancer dataset:

n = 169               Predicted: Malignant    Predicted: Benign
Target: Malignant     35                      4
Target: Benign        8                       122
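These results can be reproduced along the following lines. Note that the exact train/test split and random state used above are not given in the text, so this sketch uses an illustrative 70/30 split; the scores will therefore differ slightly between runs:

    from sklearn.datasets import load_breast_cancer, load_diabetes
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier, MLPRegressor
    from sklearn.metrics import (accuracy_score, confusion_matrix,
                                 mean_squared_error, r2_score)

    # Diabetes: regression with an SGD-trained MLP
    X, y = load_diabetes(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
    mlpr = MLPRegressor(solver='sgd')
    mlpr.fit(X_train, y_train)
    preds = mlpr.predict(X_test)
    print('R2:', r2_score(y_test, preds))
    print('MSE:', mean_squared_error(y_test, preds))

    # Breast cancer: classification with an LBFGS-trained MLP
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
    mlpc = MLPClassifier(solver='lbfgs')
    mlpc.fit(X_train, y_train)
    preds = mlpc.predict(X_test)
    print('Accuracy:', accuracy_score(y_test, preds))
    print(confusion_matrix(y_test, preds))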

A very important note on neural networks: the initial weights of a network are randomly initialized. Thus, the same code can perform differently if it is executed several times. In order to ensure non-random (non-stochastic) execution, the initial random state of the network must be fixed. The two scikit-learn classes implement this feature through the random_state parameter in the object constructor. In order to set the random state to a specific seed value, the constructor must be called as follows: mlpc = MLPClassifier(solver='lbfgs', random_state=12418).
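As a minimal illustration of the effect, two models built with the same fixed seed start from the same initial weights and therefore learn identically:

    from sklearn.datasets import load_breast_cancer
    from sklearn.neural_network import MLPClassifier

    X, y = load_breast_cancer(return_X_y=True)

    # Same fixed random_state, so both runs produce identical models
    mlpc_a = MLPClassifier(solver='lbfgs', random_state=12418).fit(X, y)
    mlpc_b = MLPClassifier(solver='lbfgs', random_state=12418).fit(X, y)
    print((mlpc_a.predict(X) == mlpc_b.predict(X)).all())  # True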