Logistic units

As a starting point, we use the idea of a logistic unit, a simplified model of a neuron. It consists of a set of inputs, an output, and an activation function. The activation function performs a calculation on the set of inputs and produces an output. Here, we set the activation function to the sigmoid that we used for logistic regression in the previous chapter:

h_w(x) = g(w^T x), where g(z) = 1 / (1 + e^{-z})

We have two input units, x1 and x2, and a bias unit, x0, that is set to one. These are fed into a hypothesis function that uses the sigmoid logistic function and a weight vector, w, which parameterizes it. The feature vector, consisting of binary values, and the parameter vector for the preceding example are the following:

x = [x_0, x_1, x_2]^T, with x_0 = 1 and x_1, x_2 ∈ {0, 1};  w = [w_0, w_1, w_2]^T
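To make this concrete, the following is a minimal sketch of a single logistic unit in Python, assuming NumPy; the function names logistic_unit and sigmoid and the example weights are our own illustrative choices:

import numpy as np

def sigmoid(z):
    # the logistic (sigmoid) activation function
    return 1.0 / (1.0 + np.exp(-z))

def logistic_unit(x, w):
    # a single logistic unit: x includes the bias term x0 = 1,
    # w is the corresponding weight vector
    return sigmoid(np.dot(w, x))

x = np.array([1.0, 0.0, 1.0])        # bias x0 = 1, binary inputs x1 = 0, x2 = 1
w = np.array([-10.0, 20.0, 20.0])    # illustrative weights
print(logistic_unit(x, w))           # close to 1, since -10 + 20*1 = 10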

To see how we can get this to perform logical functions, let's give the model some weights. We can write the hypothesis as a function of the sigmoid, g, and our weights. To get started, we will simply choose the weights by hand; we will see shortly how to train the model to learn its own weights. Let's say that we set our weights such that we have the following hypothesis function:

h_w(x) = g(-30 + 20x_1 + 20x_2)

We feed our model some simple labeled data and construct a truth table:

x1  x2  y
0   0   1
0   1   0
1   0   0
1   1   1

Although this data appears relatively simple, the decision boundary that is needed to separate the classes is not. Our target variable, y, forms the logical XNOR of the input variables: the output is 1 only when x1 and x2 are both 0 or both 1.

Here, our hypothesis has given us a logical AND. That is, it returns a 1 when both x1 and x2 are 1. By setting the weights to other values, we can get our single artificial neuron to form other logical functions.

For example, the following weights give us the logical OR function:

h_w(x) = g(-10 + 20x_1 + 20x_2)

To perform an XNOR, we combine the AND, OR, and NOT functions. To perform negation, that is, a logical NOT, we simply choose large negative weights for the input variable that we want to negate.
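As a rough sketch of this idea, the following Python snippet builds AND, OR, and NOT from single logistic units; the particular weight values are illustrative choices (any weights that push the sigmoid firmly towards 0 or 1 will do), and the unit helper is our own:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def unit(weights, *inputs):
    # a single logistic unit; the bias input x0 = 1 is prepended
    x = np.array([1.0] + list(inputs))
    return sigmoid(np.dot(weights, x))

# illustrative weight choices that saturate the sigmoid towards 0 or 1
and_w = np.array([-30.0, 20.0, 20.0])   # 1 only when x1 = x2 = 1
or_w = np.array([-10.0, 20.0, 20.0])    # 1 when either input is 1
not_w = np.array([10.0, -20.0])         # large negative weight negates x1

print("x1 x2 AND OR NOT(x1)")
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2,
              round(unit(and_w, x1, x2)),
              round(unit(or_w, x1, x2)),
              round(unit(not_w, x1)))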

Logistic units are connected together to form artificial neural networks. These networks consist of an input layer, one or more hidden layers, and an output layer. Each unit has an activation function, here the sigmoid, and is parameterized by the weight matrix W.


We can write out the activation functions for each of the units in the hidden layer:

a_1^{(2)} = g(w_{10}^{(1)} x_0 + w_{11}^{(1)} x_1 + w_{12}^{(1)} x_2)
a_2^{(2)} = g(w_{20}^{(1)} x_0 + w_{21}^{(1)} x_1 + w_{22}^{(1)} x_2)
a_3^{(2)} = g(w_{30}^{(1)} x_0 + w_{31}^{(1)} x_1 + w_{32}^{(1)} x_2)

The activation function for the output layer is as follows:

h_W(x) = a_1^{(3)} = g(w_{10}^{(2)} a_0^{(2)} + w_{11}^{(2)} a_1^{(2)} + w_{12}^{(2)} a_2^{(2)} + w_{13}^{(2)} a_3^{(2)})

More generally, the mapping from a given layer, j, to the layer j + 1 is determined by a parameter or weight matrix, W(j). The superscript (j) indicates the layer, and the subscript, i, denotes a unit in that layer; the individual weights carry their matrix index in the subscript.

Note that the dimension of the parameter matrix for each layer will be the number of units in the next layer multiplied by the number of units in the current layer plus 1; the plus 1 accounts for x0, the bias unit. More formally, we can write the dimension of the parameter matrix for a given layer, j, as follows:

s_{j+1} × (s_j + 1)

Here, s_{j+1} refers to the number of units in the next, or forward, layer, and s_j + 1 refers to the number of units in the current layer plus 1 (the extra unit being the bias unit, x0).
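For example, if layer j has two units (plus the bias unit) and layer j + 1 has three units, the weight matrix W(j) has dimension 3 × (2 + 1), that is, 3 × 3.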

Let's now look at how we can calculate these activation functions using a vector implementation. We can write these functions more compactly by defining a new term, Z, which consists of the weighted linear combination of the input values for each unit on a given layer. Here is an example:

z_1^{(2)} = w_{10}^{(1)} x_0 + w_{11}^{(1)} x_1 + w_{12}^{(1)} x_2, so that a_1^{(2)} = g(z_1^{(2)})

We are just replacing everything in the inner term of our activation function with the single term, z. Here, the superscript (2) represents the layer number, and the subscript 1 indicates the unit in that layer. So, more generally, the vector that defines the activations for the layer j is as follows:

z^{(j)} = W^{(j-1)} a^{(j-1)} and a^{(j)} = g(z^{(j)})

So, in our three-layer example, the output layer can be defined as follows:

h_W(x) = a^{(3)} = g(z^{(3)}) = g(W^{(2)} a^{(2)})
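We can sketch this vectorized forward pass in Python as follows (assuming NumPy; the forward helper and the randomly initialized weight matrices below are illustrative placeholders rather than learned values):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights):
    # propagate an input vector x through the network;
    # weights is a list of matrices, one per layer, each of shape
    # (units in next layer) x (units in current layer + 1)
    a = x
    for W in weights:
        a = np.concatenate(([1.0], a))   # add the bias unit a0 = 1
        z = W.dot(a)                     # z(j+1) = W(j) a(j)
        a = sigmoid(z)                   # a(j+1) = g(z(j+1))
    return a

# a three-layer example: two inputs, three hidden units, one output unit
rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 3))   # 3 x (2 + 1)
W2 = rng.normal(size=(1, 4))   # 1 x (3 + 1)
print(forward(np.array([0.0, 1.0]), [W1, W2]))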

We can see how features are learned by first looking at just the three units on the single hidden layer and how they map their inputs to the input of the single unit on the output layer. The output unit is simply performing logistic regression, but using the set of features a^{(2)}. The difference is that these features have themselves been computed by the hidden layer, using weights learned from the raw features at the input layer. Through hidden layers, we can start to fit more complicated nonlinear functions.

We can solve our XNOR problem using the following neural net architecture:


Here, we have three units on the input layer, two units plus the bias unit on the single hidden layer, and one unit on the output layer. We can set the weights for the first unit in the hidden layer (not including the bias unit) to perform the logical function x1 AND x2. The weights for the second unit perform the function (NOT x1) AND (NOT x2). Finally, our output layer performs the OR function.

We can write our activation functions as follows:

a_1^{(2)} = g(-30 + 20x_1 + 20x_2)
a_2^{(2)} = g(10 - 20x_1 - 20x_2)
h_W(x) = a_1^{(3)} = g(-10 + 20a_1^{(2)} + 20a_2^{(2)})

The truth table for this network looks like this:

x1  x2  a_1^{(2)}  a_2^{(2)}  h_W(x)
0   0   0          1          1
0   1   0          0          0
1   0   0          0          0
1   1   1          0          1
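As a quick check, a small Python sketch of this network reproduces the truth table; the weight magnitudes are again our own illustrative choices that saturate the sigmoid towards 0 or 1:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# hidden layer: the first unit computes x1 AND x2,
# the second computes (NOT x1) AND (NOT x2)
W1 = np.array([[-30.0, 20.0, 20.0],
               [10.0, -20.0, -20.0]])
# output layer: a1 OR a2
W2 = np.array([[-10.0, 20.0, 20.0]])

print("x1 x2 a1 a2 h")
for x1 in (0, 1):
    for x2 in (0, 1):
        a = sigmoid(W1.dot(np.array([1.0, x1, x2])))
        h = sigmoid(W2.dot(np.concatenate(([1.0], a))))[0]
        print(x1, x2, round(a[0]), round(a[1]), round(h))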

To perform multiclass classification with neural networks, we use architectures with an output unit for each class that we are trying to classify. The network outputs a vector of binary numbers, with 1 indicating that the class is present. This output variable is an i-dimensional vector, where i is the number of output classes. The output space for four classes, for example, would look like this:

[1, 0, 0, 0]^T, [0, 1, 0, 0]^T, [0, 0, 1, 0]^T, [0, 0, 0, 1]^T

Our goal is to define a hypothesis function to approximately equal one of these four vectors:

h_W(x) ≈ [1, 0, 0, 0]^T when the first class is present, h_W(x) ≈ [0, 1, 0, 0]^T when the second class is present, and so on.

This is essentially a one versus all representation.
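As a minimal sketch of this encoding (the helper names one_hot and predicted_class are our own), labels are turned into one versus all target vectors, and a prediction is read off by taking the largest output unit:

import numpy as np

def one_hot(label, n_classes):
    # encode an integer class label as a one versus all target vector
    y = np.zeros(n_classes)
    y[label] = 1.0
    return y

def predicted_class(output):
    # read off the prediction by taking the largest output unit
    return int(np.argmax(output))

print(one_hot(2, 4))                                     # [0. 0. 1. 0.]
print(predicted_class(np.array([0.1, 0.7, 0.1, 0.1])))   # 1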

We can describe a neural network architecture by the number of layers, L, and the number of units in each layer, s_l, where the subscript indicates the layer number. For convenience, I am going to define a variable, t, indicating the number of units on the layer l + 1, where l + 1 is the forward layer, that is, the layer to the right-hand side of the diagram.
