Perceptron as an R6 class

The perceptron is the simplest neural network: an input layer connected directly to a single output, with no hidden layers. Its activation function is the Heaviside function (a step function at the origin) applied to the weighted sum of the inputs; folding the bias term into the weights is what lets the step sit at the origin.
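Before building the class, here is a minimal sketch of the decision rule, with made-up weights purely for illustration: the prediction is just a thresholded dot product, and the bias is handled by prepending a constant -1 input:

# Minimal sketch of the perceptron decision rule (illustrative only)
heaviside <- function(z) ifelse(z > 0, 1, 0)

w <- c(0.5, 1, 1)      # hypothetical weights; the first entry is the bias weight
x <- c(-1, 1, 0)       # input (1, 0) with the constant bias input -1 prepended
heaviside(sum(w * x))  # returns 1, since 0.5*(-1) + 1*1 + 1*0 = 0.5 > 0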

This is the skeleton of the class:

library(R6)
Perceptron <- R6Class("Perceptron",
  public = list(
    threshold = NULL,
    dim = NULL,
    n_iter = NULL,
    learning_rate = NULL,
    w = NULL,
    initialize = function(threshold = 0, learning_rate = 0.25,
                          n_iter = 100, dim = 2){
      self$n_iter <- n_iter
      self$threshold <- threshold
      self$learning_rate <- learning_rate
    }
    , forward = function(x){
    }
    , backward = function(t, y, x){
    }
    , train = function(X, t){
    }
    , predict = function(X){
      X <- cbind(-1, X)  # add bias
      preds <- c()
      for(i in 1:nrow(X)){
        preds[i] <- self$forward(X[i,])
      }
      return(preds)
    }
  )
)

Although R6 supports private members, we do not really need them at this point; the public methods declared above will do. Note that you do need to declare every field you plan to use, initializing it with NULL; R6 objects are locked by default, so assigning to an undeclared field results in an error.
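For example, here is a throwaway class of our own that assigns to a field that was never declared; because R6 objects are locked by default, instantiating it fails:

library(R6)
Broken <- R6Class("Broken",
  public = list(
    initialize = function(){
      self$w <- 1  # 'w' was never declared as a field
    }
  )
)
# Broken$new() errors: "cannot add bindings to a locked environment"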

Now we can implement the perceptron, filling in the blanks in the preceding skeleton:

library(R6)
Perceptron <- R6Class("Perceptron",
  public = list(
    threshold = NULL,
    dim = NULL,
    n_iter = NULL,
    learning_rate = NULL,
    w = NULL,
    initialize = function(threshold = 0,
                          learning_rate = 0.25,
                          n_iter = 100, dim = 2)
    {
      self$n_iter <- n_iter
      self$threshold <- threshold
      self$learning_rate <- learning_rate
      self$dim <- dim
      # one weight per input dimension, plus one for the bias
      self$w <- matrix(runif(self$dim + 1), ncol = self$dim + 1)
    }
    , forward = function(x){
      # thresholded dot product (Heaviside activation)
      dot_product <- sum(as.numeric(x) * self$w)
      y <- ifelse(dot_product > self$threshold, 1, 0)
      return(y)
    }
    , backward = function(t, y, x){
      x <- as.numeric(x)  # a data frame row becomes a plain numeric vector
      # perceptron learning rule: w_j <- w_j + eta * (t - y) * x_j
      for(j in 1:length(x)){
        self$w[j] <- self$w[j] + self$learning_rate * (t - y) * x[j]
      }
    }
    , train = function(X, t){
      X <- cbind(-1, X)  # add bias term
      n_examples <- nrow(X)

      for(iter in 1:self$n_iter){
        for(i in 1:n_examples){
          y_i <- self$forward(X[i,])
          self$backward(t[i], y_i, X[i,])
        }
        if(iter %% 20 == 0){
          cat("Iteration: ", iter, "\n")
          print("Weights: ")
          print(unlist(self$w))
        }
      }
    }
    , predict = function(X){
      X <- cbind(-1, X)  # add bias
      preds <- c()
      for(i in 1:nrow(X)){
        preds[i] <- self$forward(X[i,])
      }
      return(preds)
    }
  )
)
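To make the backward step concrete, here is one update of the learning rule with made-up numbers (purely illustrative):

# One hypothetical update: w_j <- w_j + eta * (t - y) * x_j
w   <- c(0.5, 0.2, -0.1)   # current weights, bias weight first
x   <- c(-1, 1, 0)         # input (1, 0) with the bias input -1 prepended
eta <- 0.25                # learning rate
t_i <- 1                   # target label
y_i <- 0                   # the prediction was wrong, so the weights move
w + eta * (t_i - y_i) * x  # updated weights: 0.25 0.45 -0.10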

How can you test that your implementation is correct? Well, your network should be able to correctly predict the labels of the following data (the OR function) after a few iterations:

x1  x2  t
 0   0  0
 1   0  1
 0   1  1
 1   1  1

To test your implementation, first create a data frame containing this dataset:

x1 <- c(0,0,1,1)
x2 <- c(0,1,0,1)
t <- c(0,1,1,1)
X <- data.frame(x1=x1, x2=x2)

Now, let's initialize an instance of the class:

lr <- Perceptron$new(n_iter=100, dim=ncol(X))
lr

Next, we call the train method:

lr$train(X,t)
lr$w

And, finally, predict; if training has converged, the predictions should match t:

lr$predict(X)

To get an insight into what the perceptron is doing, we will draw the decision boundary, that is, the line the algorithm uses as its classification criterion.

First, we prepare a data frame for plotting, as we will use the ggplot2 library; the target is converted to a factor so that ggplot2 treats it as a discrete class label:

df <- as.data.frame(X)
df$t <- as.factor(t)

Then, we get the coefficients:

# Get the line
w0 <- as.numeric(lr$w[1])
w1 <- as.numeric(lr$w[2])
w2 <- as.numeric(lr$w[3])

And finally, we create the line. On the boundary, the weighted sum equals the threshold: -w0 + w1*x1 + w2*x2 = 0 (remember that the bias input is -1), which solves to x2 = (w0 - w1*x1)/w2:

x1_vals <- seq(-0.15, 1, 0.1)
x2_vals <- (w0 - w1*x1_vals)/w2
boundary <- data.frame(x1_vals=x1_vals, x2_vals=x2_vals)
# Plot decision boundary; constant sizes go outside aes(),
# otherwise ggplot2 adds spurious legend entries
library(ggplot2)
ggplot()+
  geom_point(data=df, aes(x=x1, y=x2, color=t), size=2)+
  geom_line(data=boundary, aes(x=x1_vals, y=x2_vals), size=1)+
  theme_bw()

This gives us the following output:

[Figure: The OR function and the decision boundary]

This shows that the perceptron can separate this dataset, although not in a unique way: different random initializations can lead to different, equally valid separating lines.

The functionality of the perceptron is quite limited, and it is only shown here to illustrate how to implement classes in R. For instance, the perceptron is unable to separate datasets as simple as the one shown below:

[Figure: The XOR function]

As you can see, there is no way to draw a straight line that separates both classes, which shows the limitations of the perceptron.

There are two ways around this. One is to create additional features, such as x1*x2, which would indeed make the data separable (see the sketch below). The other is to build a more complicated decision mechanism (a non-linear decision boundary). We will, in some sense, show you how to do the latter.
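As a quick illustration of the first workaround (a sketch of our own, reusing the Perceptron class defined above): adding the product x1*x2 as a third feature makes the XOR data linearly separable in three dimensions, so the perceptron can in principle learn it, although the number of iterations needed depends on the random initial weights:

x1 <- c(0,0,1,1)
x2 <- c(0,1,0,1)
t_xor <- c(0,1,1,0)
# Engineered feature: the product x1*x2
X_xor <- data.frame(x1=x1, x2=x2, x3=x1*x2)
p <- Perceptron$new(n_iter=100, dim=ncol(X_xor))
p$train(X_xor, t_xor)
p$predict(X_xor)  # should match t_xor once training has converged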
