Logistic units

As a starting point, we use the idea of a logistic unit, a simplified model of a neuron. It consists of a set of inputs, an output, and an activation function. The activation function performs a calculation on the set of inputs and produces an output. Here, we set the activation function to the sigmoid that we used for logistic regression in the previous chapter:

h_w(x) = g(w^T x), where g(z) = 1 / (1 + e^{-z})

We have two input units, x1 and x2, and a bias unit, x0, that is set to one. These are fed into a hypothesis function that uses the sigmoid logistic function and a weight vector, w, which parameterizes it. The feature vector, consisting of binary values, and the parameter vector for the preceding example are the following:

x = [x_0, x_1, x_2]^T, with x_0 = 1 and x_1, x_2 ∈ {0, 1};  w = [w_0, w_1, w_2]^T
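To make this concrete, the following is a minimal sketch of a single logistic unit in Python, assuming NumPy; the function names logistic_unit and sigmoid and the example weights are our own illustrative choices:

import numpy as np

def sigmoid(z):
    # the logistic (sigmoid) activation function
    return 1.0 / (1.0 + np.exp(-z))

def logistic_unit(x, w):
    # a single logistic unit: x includes the bias term x0 = 1,
    # w is the corresponding weight vector
    return sigmoid(np.dot(w, x))

x = np.array([1.0, 0.0, 1.0])        # bias x0 = 1, binary inputs x1 = 0, x2 = 1
w = np.array([-10.0, 20.0, 20.0])    # illustrative weights
print(logistic_unit(x, w))           # close to 1, since -10 + 20*1 = 10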

To see how we can get this to perform logical functions, let's give the model some weights. We can write the hypothesis as a function of the sigmoid, g, and our weights. To get started, we will simply choose the weights by hand; we will see shortly how to train the model to learn its own weights. Let's say that we set our weights such that we have the following hypothesis function:

h_w(x) = g(-30 + 20x_1 + 20x_2)

We feed our model some simple labeled data and construct a truth table:

x1  x2  y
0   0   1
0   1   0
1   0   0
1   1   1

Although this data appears relatively simple, the decision boundary that is needed to separate the classes is not. Our target variable, y, forms the logical XNOR of the input variables: the output is 1 only when x1 and x2 are both 0 or both 1.

Here, our hypothesis has given us a logical AND. That is, it returns a 1 when both x1 and x2 are 1. By setting the weights to other values, we can get our single artificial neuron to form other logical functions.

For example, the following weights give us the logical OR function:

h_w(x) = g(-10 + 20x_1 + 20x_2)

To perform an XNOR, we combine the AND, OR, and NOT functions. To perform negation, that is, a logical NOT, we simply choose large negative weights for the input variable that we want to negate.
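As a rough sketch of this idea, the following Python snippet builds AND, OR, and NOT from single logistic units; the particular weight values are illustrative choices (any weights that push the sigmoid firmly towards 0 or 1 will do), and the unit helper is our own:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def unit(weights, *inputs):
    # a single logistic unit; the bias input x0 = 1 is prepended
    x = np.array([1.0] + list(inputs))
    return sigmoid(np.dot(weights, x))

# illustrative weight choices that saturate the sigmoid towards 0 or 1
and_w = np.array([-30.0, 20.0, 20.0])   # 1 only when x1 = x2 = 1
or_w = np.array([-10.0, 20.0, 20.0])    # 1 when either input is 1
not_w = np.array([10.0, -20.0])         # large negative weight negates x1

print("x1 x2 AND OR NOT(x1)")
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2,
              round(unit(and_w, x1, x2)),
              round(unit(or_w, x1, x2)),
              round(unit(not_w, x1)))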

Logistic units are connected together to form artificial neural networks. These networks consist of an input layer, one or more hidden layers, and an output layer. Each unit has an activation function, here the sigmoid, and is parameterized by the weight matrix W.


We can write out the activation functions for each of the units in the hidden layer:

a_1^{(2)} = g(w_{10}^{(1)} x_0 + w_{11}^{(1)} x_1 + w_{12}^{(1)} x_2)
a_2^{(2)} = g(w_{20}^{(1)} x_0 + w_{21}^{(1)} x_1 + w_{22}^{(1)} x_2)
a_3^{(2)} = g(w_{30}^{(1)} x_0 + w_{31}^{(1)} x_1 + w_{32}^{(1)} x_2)

The activation function for the output layer is as follows:

h_W(x) = a_1^{(3)} = g(w_{10}^{(2)} a_0^{(2)} + w_{11}^{(2)} a_1^{(2)} + w_{12}^{(2)} a_2^{(2)} + w_{13}^{(2)} a_3^{(2)})

More generally, the mapping from a given layer, j, to the layer j + 1 is determined by a parameter or weight matrix, W(j). The superscript (j) indicates the layer, and the subscript, i, denotes a unit in that layer; the individual weights carry their matrix index in the subscript.

Note that the dimension of the parameter matrix for each layer will be the number of units in the next layer multiplied by the number of units in the current layer plus 1; the plus 1 accounts for x0, the bias unit. More formally, we can write the dimension of the parameter matrix for a given layer, j, as follows:

s_{j+1} × (s_j + 1)

Here, s_{j+1} refers to the number of units in the next, or forward, layer, and s_j + 1 refers to the number of units in the current layer plus 1 (the extra unit being the bias unit, x0).
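For example, if layer j has two units (plus the bias unit) and layer j + 1 has three units, the weight matrix W(j) has dimension 3 × (2 + 1), that is, 3 × 3.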

Let's now look at how we can calculate these activation functions using a vector implementation. We can write these functions more compactly by defining a new term, Z, which consists of the weighted linear combination of the input values for each unit on a given layer. Here is an example:

z_1^{(2)} = w_{10}^{(1)} x_0 + w_{11}^{(1)} x_1 + w_{12}^{(1)} x_2, so that a_1^{(2)} = g(z_1^{(2)})

We are just replacing everything in the inner term of our activation function with the single term, z. Here, the superscript (2) represents the layer number, and the subscript 1 indicates the unit in that layer. So, more generally, the vector that defines the activations for the layer j is as follows:

z^{(j)} = W^{(j-1)} a^{(j-1)} and a^{(j)} = g(z^{(j)})

So, in our three-layer example, the output layer can be defined as follows:

h_W(x) = a^{(3)} = g(z^{(3)}) = g(W^{(2)} a^{(2)})
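We can sketch this vectorized forward pass in Python as follows (assuming NumPy; the forward helper and the randomly initialized weight matrices below are illustrative placeholders rather than learned values):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights):
    # propagate an input vector x through the network;
    # weights is a list of matrices, one per layer, each of shape
    # (units in next layer) x (units in current layer + 1)
    a = x
    for W in weights:
        a = np.concatenate(([1.0], a))   # add the bias unit a0 = 1
        z = W.dot(a)                     # z(j+1) = W(j) a(j)
        a = sigmoid(z)                   # a(j+1) = g(z(j+1))
    return a

# a three-layer example: two inputs, three hidden units, one output unit
rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 3))   # 3 x (2 + 1)
W2 = rng.normal(size=(1, 4))   # 1 x (3 + 1)
print(forward(np.array([0.0, 1.0]), [W1, W2]))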

We can see how features are learned by first looking at just the three units on the single hidden layer and how they map their inputs to the input of the single unit on the output layer. The output unit is simply performing logistic regression, but using the set of features a^{(2)}. The difference is that these features have themselves been computed by the hidden layer, using weights learned from the raw features at the input layer. Through hidden layers, we can start to fit more complicated nonlinear functions.

We can solve our XNOR problem using the following neural net architecture:


Here, we have three units on the input layer, two units plus the bias unit on the single hidden layer, and one unit on the output layer. We can set the weights for the first unit in the hidden layer (not including the bias unit) to perform the logical function x1 AND x2. The weights for the second unit perform the function (NOT x1) AND (NOT x2). Finally, our output layer performs the OR function.

We can write our activation functions as follows:

a_1^{(2)} = g(-30 + 20x_1 + 20x_2)
a_2^{(2)} = g(10 - 20x_1 - 20x_2)
h_W(x) = a_1^{(3)} = g(-10 + 20a_1^{(2)} + 20a_2^{(2)})

The truth table for this network looks like this:

x1  x2  a_1^{(2)}  a_2^{(2)}  h_W(x)
0   0   0          1          1
0   1   0          0          0
1   0   0          0          0
1   1   1          0          1
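As a quick check, a small Python sketch of this network reproduces the truth table; the weight magnitudes are again our own illustrative choices that saturate the sigmoid towards 0 or 1:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# hidden layer: the first unit computes x1 AND x2,
# the second computes (NOT x1) AND (NOT x2)
W1 = np.array([[-30.0, 20.0, 20.0],
               [10.0, -20.0, -20.0]])
# output layer: a1 OR a2
W2 = np.array([[-10.0, 20.0, 20.0]])

print("x1 x2 a1 a2 h")
for x1 in (0, 1):
    for x2 in (0, 1):
        a = sigmoid(W1.dot(np.array([1.0, x1, x2])))
        h = sigmoid(W2.dot(np.concatenate(([1.0], a))))[0]
        print(x1, x2, round(a[0]), round(a[1]), round(h))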

To perform multiclass classification with neural networks, we use architectures with an output unit for each class that we are trying to classify. The network outputs a vector of binary numbers, with 1 indicating that the class is present. This output variable is an i-dimensional vector, where i is the number of output classes. The output space for four classes, for example, would look like this:

[1, 0, 0, 0]^T, [0, 1, 0, 0]^T, [0, 0, 1, 0]^T, [0, 0, 0, 1]^T

Our goal is to define a hypothesis function to approximately equal one of these four vectors:

h_W(x) ≈ [1, 0, 0, 0]^T when the first class is present, h_W(x) ≈ [0, 1, 0, 0]^T when the second class is present, and so on.

This is essentially a one versus all representation.
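As a minimal sketch of this encoding (the helper names one_hot and predicted_class are our own), labels are turned into one versus all target vectors, and a prediction is read off by taking the largest output unit:

import numpy as np

def one_hot(label, n_classes):
    # encode an integer class label as a one versus all target vector
    y = np.zeros(n_classes)
    y[label] = 1.0
    return y

def predicted_class(output):
    # read off the prediction by taking the largest output unit
    return int(np.argmax(output))

print(one_hot(2, 4))                                     # [0. 0. 1. 0.]
print(predicted_class(np.array([0.1, 0.7, 0.1, 0.1])))   # 1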

We can describe a neural network architecture by the number of layers, L, and the number of units in each layer, s_l, where the subscript indicates the layer number. For convenience, I am going to define a variable, t, indicating the number of units on the layer l + 1, where l + 1 is the forward layer, that is, the layer to the right-hand side of the diagram.
