https://github.com/apache/incubator-mxnet/blob/master/R-package/vignettes/CatsDogsFinetune.Rmd
MNIST is a handwritten digits image data set created by Yann LeCun. Every digit is represented by a 28x28 image. It has become a standard data set to test classifiers on simple image input. Neural network is no doubt a strong model for image classification tasks. There's a long-term hosted competition on Kaggle using this data set. We will present the basic usage of mxnet to compete in this challenge.
First, let us download the data from here, and put them under the data/
folder in your working directory.
Then we can read them in R and convert to matrices.
require(mxnet)
# read from files residing in the package extdata <- system.file("extdata", package = "rDeepThought") train <- read.csv(paste(extdata, "train_digits.csv", sep = "/"), header=TRUE) test <- read.csv(paste(extdata, "test_digits.csv", sep = "/"), header=TRUE) # [1] 42000 785 # [1] 28000 784
# Read from files in this folder train <- read.csv("train.csv", header=TRUE) test <- read.csv("test.csv", header=TRUE)
download.file('https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/R/data/mnist_csv.zip', destfile = 'mnist_csv.zip') unzip('mnist_csv.zip', exdir = '.')
train <- data.matrix(train) test <- data.matrix(test) train.x <- train[,-1] train.y <- train[,1] dim(train) dim(test)
Besides using the csv files from kaggle, you can also read the orginal MNIST dataset into R.
This is not necessary if you already have the train,csv and test.csv.
load_image_file <- function(filename) { f = file(filename, 'rb') readBin(f, 'integer', n = 1, size = 4, endian = 'big') n = readBin(f,'integer', n = 1, size = 4, endian = 'big') nrow = readBin(f,'integer', n = 1, size = 4, endian = 'big') ncol = readBin(f,'integer', n = 1, size = 4, endian = 'big') x = readBin(f, 'integer', n = n * nrow * ncol, size = 1, signed = F) x = matrix(x, ncol = nrow * ncol, byrow = T) close(f) x } load_label_file <- function(filename) { f = file(filename, 'rb') readBin(f,'integer', n = 1, size = 4, endian = 'big') n = readBin(f,'integer', n = 1, size = 4, endian = 'big') y = readBin(f,'integer', n = n, size = 1, signed = F) close(f) y } train.x <- load_image_file('mnist/train-images-idx3-ubyte') test.y <- load_image_file('mnist/t10k-images-idx3-ubyte') train.y <- load_label_file('mnist/train-labels-idx1-ubyte') test.y <- load_label_file('mnist/t10k-labels-idx1-ubyte')
Here every image is represented as a single row in train/test. The greyscale of each image falls in the range [0, 255], we can linearly transform it into [0,1] by
train.x <- t(train.x/255) test <- t(test/255) dim(train.x) dim(test)
We also transpose the input matrix to npixel x nexamples, which is the column major format accepted by mxnet (and the convention of R).
In the label part, we see the number of each digit is fairly even:
table(train.y) # 0 1 2 3 4 5 6 7 8 9 # 4132 4684 4177 4351 4072 3795 4137 4401 4063 4188
Now we have the data. The next step is to configure the structure of our network.
data <- mx.symbol.Variable("data") fc1 <- mx.symbol.FullyConnected(data, name="fc1", num_hidden=128) act1 <- mx.symbol.Activation(fc1, name="relu1", act_type="relu") fc2 <- mx.symbol.FullyConnected(act1, name="fc2", num_hidden=64) act2 <- mx.symbol.Activation(fc2, name="relu2", act_type="relu") fc3 <- mx.symbol.FullyConnected(act2, name="fc3", num_hidden=10) softmax <- mx.symbol.SoftmaxOutput(fc3, name="sm")
mxnet
, we use its own data type symbol
to configure the network. data <- mx.symbol.Variable("data")
use data
to represent the input data, i.e. the input layer.fc1 <- mx.symbol.FullyConnected(data, name="fc1", num_hidden=128)
. This layer has data
as the input, its name and the number of hidden neurons.act1 <- mx.symbol.Activation(fc1, name="relu1", act_type="relu")
. The activation function takes the output from the first hidden layer fc1
.act1
as the input, with its name as "fc2" and the number of hidden neurons as 64.act1
, except we have a different input source and name.If you are a big fan of the %>%
operator, you can also define the network as below:
library(magrittr) softmax <- mx.symbol.Variable("data") %>% mx.symbol.FullyConnected(name = "fc1", num_hidden = 128) %>% mx.symbol.Activation(name = "relu1", act_type = "relu") %>% mx.symbol.FullyConnected(name = "fc2", num_hidden = 64) %>% mx.symbol.Activation(name = "relu2", act_type = "relu") %>% mx.symbol.FullyConnected(name="fc3", num_hidden=10) %>% mx.symbol.SoftmaxOutput(name="sm")
We are almost ready for the training process. Before we start the computation, let's decide what device should we use.
devices <- mx.cpu()
devices <- mx.gpu()
Here we assign CPU or GPU to mxnet
. After all these preparation, you can run the following command to train the neural network! Note that mx.set.seed
is the correct function to control the random process in mxnet
.
mx.set.seed(0) tic <- proc.time() model <- mx.model.FeedForward.create(softmax, X = train.x, y = train.y, ctx = devices, num.round = 5, array.batch.size = 100, learning.rate = 0.07, momentum = 0.9, eval.metric = mx.metric.accuracy, initializer = mx.init.uniform(0.07), batch.end.callback = mx.callback.log.train.metric(100)) print(proc.time() - tic) # GPU: [5] Train-accuracy=0.981500000000003 # user system elapsed # 7.68 8.51 7.56 # 7.47 9.00 7.31 # 7.67 8.75 7.19
To make prediction, we can simply write
preds <- predict(model, test) dim(preds) # [1] 10 28000
It is a matrix with 28000 rows and 10 cols, containing the desired classification probabilities from the output layer. To extract the maximum label for each row, we can use the max.col
in R:
pred.label <- max.col(t(preds)) - 1 table(pred.label) # 0 1 2 3 4 5 6 7 8 9 # 2810 3201 2863 2864 2785 2485 2828 2783 2709 2672 # ----- GPU # 0 1 2 3 4 5 6 7 8 9 # 2850 3205 2842 2765 2824 2518 2838 2741 2735 2682 # 0 1 2 3 4 5 6 7 8 9 # 2850 3205 2842 2765 2824 2518 2838 2741 2735 2682
Why the difference?
With a little extra effort in the csv format, we can have our submission to the competition!
submission <- data.frame(ImageId=1:ncol(test), Label=pred.label) write.csv(submission, file='submission.csv', row.names=FALSE, quote=FALSE)
Next we are going to introduce a new network structure: LeNet. It is proposed by Yann LeCun to recognize handwritten digits. Now we are going to demonstrate how to construct and train an LeNet in mxnet
.
First we construct the network:
require(mxnet) # input data <- mx.symbol.Variable('data') # first conv conv1 <- mx.symbol.Convolution(data=data, kernel=c(5,5), num_filter=20) tanh1 <- mx.symbol.Activation(data=conv1, act_type="tanh") pool1 <- mx.symbol.Pooling(data=tanh1, pool_type="max", kernel=c(2,2), stride=c(2,2)) # second conv conv2 <- mx.symbol.Convolution(data=pool1, kernel=c(5,5), num_filter=50) tanh2 <- mx.symbol.Activation(data=conv2, act_type="tanh") pool2 <- mx.symbol.Pooling(data=tanh2, pool_type="max", kernel=c(2,2), stride=c(2,2)) # first fullc flatten <- mx.symbol.Flatten(data=pool2) fc1 <- mx.symbol.FullyConnected(data=flatten, num_hidden=500) tanh3 <- mx.symbol.Activation(data=fc1, act_type="tanh") # second fullc fc2 <- mx.symbol.FullyConnected(data=tanh3, num_hidden=10) # loss lenet <- mx.symbol.SoftmaxOutput(data=fc2)
Then let us reshape the matrices (after normalization) into arrays:
train.array <- train.x dim(train.array) <- c(28, 28, 1, ncol(train.x)) test.array <- test dim(test.array) <- c(28, 28, 1, ncol(test))
Next we are going to compare the training speed on different devices, so the definition of the devices goes first:
n.gpu <- 1 device.cpu <- mx.cpu() device.gpu <- lapply(0:(n.gpu-1), function(i) { mx.gpu(i) })
As you can see, we can pass a list of devices, to ask mxnet to train on multiple GPUs (you can do similar thing for cpu, but since internal computation of cpu is already multi-threaded, there is less gain than using GPUs).
We start by training on CPU first. Because it takes a bit time to do so, we will only run it for one iteration.
mx.set.seed(0) tic <- proc.time() model <- mx.model.FeedForward.create(lenet, X = train.array, y = train.y, ctx = device.cpu, num.round = 1, array.batch.size = 100, learning.rate = 0.05, momentum = 0.9, wd = 0.00001, eval.metric = mx.metric.accuracy, batch.end.callback = mx.callback.log.train.metric(100)) print(proc.time() - tic) # user system elapsed # 146.65 141.76 50.92
mx.set.seed(0) tic <- proc.time() model <- mx.model.FeedForward.create(lenet, X = train.array, y = train.y, ctx = device.gpu, num.round = 5, array.batch.size = 100, learning.rate = 0.05, momentum = 0.9, wd = 0.00001, eval.metric = mx.metric.accuracy, batch.end.callback = mx.callback.log.train.metric(100)) print(proc.time() - tic) # user system elapsed # 17.23 69.32 41.19
As you can see by using GPU, we can get a much faster speedup in training! Finally we can submit the result to Kaggle again to see the improvement of our ranking!
preds <- predict(model, test.array) pred.label <- max.col(t(preds)) - 1 submission <- data.frame(ImageId=1:ncol(test), Label=pred.label) write.csv(submission, file='submission.csv', row.names=FALSE, quote=FALSE)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.