The multilayer perceptron (MLP) is the simplest feed-forward neural network. It overcomes the main limitation of the original perceptron, which could learn only linearly separable patterns from the data. It achieves this by introducing at least one hidden layer, allowing the network to learn a representation of the data in which the classes become linearly separable.
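Concretely (notation mine, not from the original post), a one-hidden-layer MLP with activation $g$ maps an input $x$ to

$$f(x) = \operatorname{softmax}\!\left(W_2\, g(W_1 x + b_1) + b_2\right),$$

where $W_1, b_1$ parametrize the hidden layer and $W_2, b_2$ the output layer. This is exactly the FullyConnected → Activation → FullyConnected → SoftmaxOutput pipeline configured later in this post.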
Article: http://www.jakubglinka.com/2017-02-21-neural_networks_part1/ Code: https://github.com/jakubglinka/posts/blob/master/neural_networks_part1/neural_networks_part1.Rmd
To reproduce the animated plot at the end of this post, first install the gganimate package:

```r
# install gganimate from GitHub
devtools::install_github("dgrtwo/gganimate")
```
MXNet is an open-source deep learning framework that allows you to define, train, and deploy deep neural networks on a wide array of devices, from cloud infrastructure to mobile devices. It also lets you mix symbolic and imperative programming flavors, for example to define custom loss functions and accuracy measures.
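As a small illustration of that flexibility, here is a sketch of a custom accuracy metric built with `mx.metric.custom` from the R package. The metric name and the column-major layout of `pred` are assumptions on my part; the built-in `mx.metric.accuracy` used later in this post plays the same role.

```r
# A minimal sketch of a custom evaluation metric: mx.metric.custom wraps a
# function of (label, pred) returning a single number; the resulting object
# can be passed as eval.metric to mx.model.FeedForward.create.
acc.custom <- mx.metric.custom("custom-accuracy", function(label, pred) {
  # pred is assumed to hold class probabilities with one column per example,
  # so the predicted label is the row index of the maximum, minus one
  pred.label <- max.col(t(pred)) - 1
  mean(label == pred.label)
})
```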
The MXNet package exposes a so-called symbolic API to R users. Its purpose is to provide a user-friendly way of building neural networks while abstracting the computational details away to MXNet's specialized engine.
The most important symbols:

- `mx.symbol.Variable`: defines variables (input data, labels, …)
- `mx.symbol.FullyConnected`: affine transformation of the input tensor
- `mx.symbol.Activation`: applies an activation function element-wise to the input tensor
- `mx.symbol.SoftmaxOutput`: defines the final transformation of the data and the loss function
```r
# required packages
suppressPackageStartupMessages(library(ggplot2))
library(tidyverse)
library(lubridate)
library(data.table)
# suppressPackageStartupMessages(library(rstan))
library(mxnet)
library(mlbench)
require(gganimate)
source("mult_ggplot.R")
```
According to the theorem first proved by George Cybenko for the sigmoid activation function, a "feed-forward network with a single hidden layer containing a finite number of neurons (i.e., a multilayer perceptron) can approximate continuous functions on compact subsets of $\mathbb{R}^n$, under mild assumptions on the activation function."
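In symbols (my paraphrase of the theorem, not a quote): for any continuous $f$ on a compact set $K \subset \mathbb{R}^n$ and any $\varepsilon > 0$, there exist a finite $N$ and parameters $v_i, w_i, b_i$ such that

$$F(x) = \sum_{i=1}^{N} v_i\, \sigma\!\left(w_i^{\top} x + b_i\right) \quad \text{satisfies} \quad |F(x) - f(x)| < \varepsilon \ \text{ for all } x \in K.$$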
Let's put the MLP to the test then. For this purpose I will use the spirals dataset from the mlbench package.
```r
set.seed(2015)

############ spirals dataset #############
s <- sample(x = c("train", "test"), size = 1000,
            prob = c(.8, .2), replace = TRUE)

dta <- mlbench.spirals(n = 1000, cycles = 1.2, sd = .03)
dta <- cbind(dta[["x"]], as.integer(dta[["classes"]]) - 1)
colnames(dta) <- c("x", "y", "label")

######### train, validate, test ##########
dta.train <- dta[s == "train", ]
dta.test  <- dta[s == "test", ]

dta.plot <- data.frame(
  x = dta[, 1],
  y = dta[, 2],
  label = as.factor(dta[, 3]),
  sample = factor(s, levels = c("train", "test")))

ggplot(aes(x = x, y = y), data = dta.plot) +
  geom_point(aes(colour = label), size = 3) +
  geom_point(size = 3, shape = 1) +
  theme_classic() +
  ggtitle("Spirals dataset N = 1000")
```
Below is an example of code that configures a perceptron with one hidden layer:

```r
########## Network configuration ##########
# input variable
data <- mx.symbol.Variable("data")
# affine transformation
fc <- mx.symbol.FullyConnected(data, num.hidden = 10)
# non-linear activation
act <- mx.symbol.Activation(data = fc, act_type = "relu")
# affine transformation
fc <- mx.symbol.FullyConnected(act, num.hidden = 2)
# softmax output and cross-entropy loss
mlp <- mx.symbol.SoftmaxOutput(fc)
```
Feed-forward networks are trained with an iterative, gradient-descent-type algorithm. During a single forward pass only a subset of the data, called a batch, is used; the process is repeated until all training examples have been used, which is called an epoch. After every epoch MXNet reports the training accuracy:
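As a quick sanity check on the terminology (the batch size of 128 is `mx.model.FeedForward.create`'s default `array.batch.size`, not something set explicitly in this post):

```r
# illustrative arithmetic: with ~800 training rows (after the 80/20 split)
# and the default batch size of 128, one epoch is 7 gradient updates
n.train    <- nrow(dta.train)   # roughly 800
batch.size <- 128               # default array.batch.size in mxnet's R API
ceiling(n.train / batch.size)   # batches (i.e. updates) per epoch
```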
```r
############## basic training #############
mx.set.seed(2014)
model <- mx.model.FeedForward.create(
  symbol = mlp,
  X = dta.train[, c("x", "y")],
  y = dta.train[, c("label")],
  num.round = 5,
  array.layout = "rowmajor",
  learning.rate = 1,
  eval.metric = mx.metric.accuracy)
```
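Once trained, the model can be queried with `predict()`. A small sketch, mirroring the calls used later in this post: the R interface returns class probabilities with one column per example, so we take the column-wise argmax to recover labels.

```r
# class probabilities for the held-out data: one column per example
prob <- predict(model, X = dta.test[, c("x", "y")], array.layout = "rowmajor")
# hard 0/1 predictions and the resulting test accuracy
pred.label <- apply(prob, 2, which.max) - 1
mean(pred.label == dta.test[, "label"])
```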
In order to stop training when the improvement in accuracy falls below a certain tolerance, we need to add a custom callback to the feed-forward procedure. It is called after every epoch to check whether the algorithm is still making progress. If not, it terminates the optimization procedure and returns the results.
```r
######### custom stopping criterion ########
mx.callback.train.stop <- function(tol = 1e-3, mean.n = 1e2,
                                   period = 100, min.iter = 100) {
  function(iteration, nbatch, env, verbose = TRUE) {
    continue <- TRUE
    if (nbatch == 0 & !is.null(env$metric)) {
      acc.train <- env$metric$get(env$train.metric)$value
      if (is.null(env$acc.log)) {
        env$acc.log <- acc.train
      } else {
        # stop when accuracy has plateaued (within tol of its recent mean
        # and of its running maximum) after min.iter epochs, or is perfect
        if ((abs(acc.train - mean(tail(env$acc.log, mean.n))) < tol &
             abs(acc.train - max(env$acc.log)) < tol &
             iteration > min.iter) | acc.train == 1) {
          cat("Training finished with final accuracy: ",
              round(acc.train * 100, 2), " %\n", sep = "")
          continue <- FALSE
        }
        env$acc.log <- c(env$acc.log, acc.train)
      }
      # progress report every `period` epochs
      if (iteration %% period == 0) {
        cat("[", iteration, "]", " training accuracy: ",
            round(acc.train * 100, 2), " %\n", sep = "")
      }
    }
    return(continue)
  }
}

####### training with custom stopping ######
mx.set.seed(2014)
model <- mx.model.FeedForward.create(
  symbol = mlp,
  X = dta.train[, c("x", "y")],
  y = dta.train[, c("label")],
  num.round = 2000,
  array.layout = "rowmajor",
  learning.rate = 1,
  epoch.end.callback = mx.callback.train.stop(),
  eval.metric = mx.metric.accuracy,
  verbose = FALSE)
```
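The defaults above are what the training call uses; the thresholds can be tightened if needed, e.g. (hypothetical values):

```r
# a stricter stopping rule: report every 50 epochs, require at least 500
# epochs, and a plateau tolerance of 1e-4 over the last 50 recorded values
cb <- mx.callback.train.stop(tol = 1e-4, mean.n = 50, period = 50, min.iter = 500)
```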
How many hidden units do we actually need? Let's re-run the training for 2 to 10 hidden nodes and record the train accuracy, test accuracy, and test cross-entropy loss for each configuration:

```r
grid <- 2:10
res <- lapply(grid, function(xx) {

  # network with xx hidden units
  data <- mx.symbol.Variable("data")
  fc <- mx.symbol.FullyConnected(data, num.hidden = xx)
  act <- mx.symbol.Activation(data = fc, act_type = "relu")
  fc <- mx.symbol.FullyConnected(act, num.hidden = 2)
  mlp <- mx.symbol.SoftmaxOutput(fc)

  mx.set.seed(2014)
  model <- mx.model.FeedForward.create(
    symbol = mlp,
    X = dta.train[, c("x", "y")],
    y = dta.train[, c("label")],
    num.round = 2000,
    array.layout = "rowmajor",
    learning.rate = 1,
    epoch.end.callback = mx.callback.train.stop(),
    eval.metric = mx.metric.accuracy,
    verbose = FALSE)

  # test accuracy
  pred.test <- apply(
    predict(model, X = dta.test[, c("x", "y")],
            array.layout = "rowmajor"),
    2, which.max)
  test.acc <- e1071::classAgreement(table(pred.test, dta.test[, "label"]))[[1]]

  # train accuracy
  pred.train <- apply(
    predict(model, X = dta.train[, c("x", "y")],
            array.layout = "rowmajor"),
    2, which.max)
  train.acc <- e1071::classAgreement(table(pred.train, dta.train[, "label"]))[[1]]

  # test cross-entropy loss
  logp <- t(log(predict(model, X = dta.test[, c("x", "y")],
                        array.layout = "rowmajor")))
  loss <- -mean(logp[, 1] * (dta.test[, "label"] == 0) +
                logp[, 2] * (dta.test[, "label"] == 1))

  return(list(train = train.acc, test = test.acc, loss = loss, model = model))
})
```
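The loss computed above is the average test cross-entropy; in symbols (notation mine), for $N$ test points with labels $y_n \in \{0, 1\}$ and predicted class probabilities $p_{n,k}$:

$$\mathcal{L} = -\frac{1}{N}\sum_{n=1}^{N} \log p_{n,\,y_n}.$$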
```r
library(ggplot2)
library(dplyr)
library(reshape2)

tab <- res %>%
  lapply(function(xx) data.frame(xx[c("train", "test")])) %>%
  do.call(rbind, .)
tab$hidden.nodes <- grid
tab <- tab %>%
  melt(id.vars = c("hidden.nodes"), measure.vars = c("train", "test"))
names(tab) <- c("hidden.nodes", "sample", "accuracy")

ggplot(aes(x = hidden.nodes), data = tab) +
  geom_line(aes(y = accuracy, colour = sample), linetype = 2) +
  geom_point(aes(y = accuracy, colour = sample, shape = sample), size = 3) +
  theme_classic() +
  scale_y_continuous(labels = function(xx) paste0(round(xx * 100, 1), "%")) +
  ggtitle("Accuracy vs number of hidden nodes")
```
Finally, let's animate how the decision boundary evolves as hidden units are added:

```r
grid <- expand.grid(x = seq(-1.1, 1.1, by = 0.05),
                    y = seq(-1.1, 1.1, by = 0.05))

# class probabilities over the grid for every fitted model (one frame each)
dtas <- lapply(seq_along(res), function(xx) {
  prob <- t(predict(res[[xx]]$model, X = as.matrix(grid),
                    array.layout = "rowmajor"))
  dta.plot <- bind_rows(
    data.frame(grid, prob = prob[, 1], cls = "0", stringsAsFactors = FALSE),
    data.frame(grid, prob = prob[, 2], cls = "1", stringsAsFactors = FALSE))
  dta.plot$frame <- xx - 1
  dta.plot
})
dta.plot <- bind_rows(dtas)

# one decision-boundary contour layer per frame
contours <- lapply(0:8, function(f)
  geom_contour(aes(x = x, y = y, z = as.numeric(prob > .5),
                   group = cls, color = cls),
               bins = 2, data = dta.plot[dta.plot$frame == f, ]))

# training points, shown in every frame
dta.points <- data.frame(x = dta.train[, 1], y = dta.train[, 2],
                         frame = NA, cls = as.character(dta.train[, 3]))

p <- ggplot(aes(frame = frame), data = dta.plot) +
  geom_point(aes(x = x, y = y, col = cls, size = prob, frame = frame),
             alpha = .7, data = dta.plot[dta.plot$prob >= .5, ]) +
  scale_size(range = c(0.8, 2)) +
  contours +
  theme_classic() +
  geom_point(aes(x = x, y = y, col = cls), size = 3, data = dta.points) +
  geom_point(aes(x = x, y = y), size = 3, shape = 1, data = dta.points) +
  ggtitle("Multilayer Perceptron decision boundary")

animation::ani.options(interval = 2)
gganimate(p, filename = "img/mlp_evolution.gif", title_frame = FALSE)
```