Introduction

EdNet is a package for training deep neural network models with fully connected layers on a number of different regression and classification tasks. This package was built as a personal project to develop my coding skills and my understanding of neural networks and the algorithms used to train them. While there are many deep learning libraries out there that are faster and contain far greater functionality, this one is simple and easy to use in R. You may find it useful if you want a neural network package in R to play around with in order to improve your understanding and hone your intuition on network architectures and hyperparameters.

Currently EdNet supports binary and multiclass classification as well as regression for the following distributions: Normal, Poisson, Gamma, Tweedie. Drop-out and L1 and L2 regularisation are also supported, as well as training with weights and offset models. Training can be with or without mini-batch learning and with Gradient Descent, Momentum, RMSProp, or Adam optimisation.

Installation

EdNet can be installed directly from GitHub using the devtools package. The code is below.

# install devtools first if needed: install.packages("devtools")
devtools::install_github("EdwinGraham/EdNet")

Binary classification example

EdNet comes with a number of dummy datasets with which to try out different model architectures and parameters to begin to get a feel for them. For this example we will load the dummy binary dataset into our environment.

library(EdNet)
data("dfBinary")
head(dfBinary)

There are three normally distributed random variables, x1, x2 and x3, and a binary target variable that depends in some probabilistic way on the other variables. We will train a neural network with 5 hidden layers for 40 epochs, using the first 200 rows as holdout data.

modelFit <- EdNetTrain(X=as.matrix(dfBinary[, 1:3]),
                       Y=as.matrix(dfBinary[, 4]),
                       family="binary",
                       learning_rate=0.01,
                       num_epochs=40,
                       hidden_layer_dims=c(64, 64, 32, 32, 16),
                       hidden_layer_activations="relu",
                       optimiser="Adam",
                       mini_batch_size=32,
                       print_every_n=40L,
                       dev_set = 1:200)

The costs are calculated on the final mini-batch of each epoch, which explains the noisiness.

There may be a better possible set of parameters; for example, we could add drop-out or regularisation. Now that we have built a model, we can use the generic predict function to score probabilities onto the dataset.

dfBinary$prediction <- predict(modelFit, as.matrix(dfBinary[, 1:3]))
head(dfBinary)

Looking only at the rows we held out, we can calculate an accuracy metric.

devSetAccuracy <- mean((dfBinary$prediction[1:200]>.5)==dfBinary$target[1:200])
devSetAccuracy

The model we've built achieves 85% accuracy on the hold-out data, so we are doing a lot better than random.
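For comparison, we can compute the accuracy of always predicting the majority class on the hold-out rows (an illustrative baseline in base R, assuming the target is coded 0/1).

p <- mean(dfBinary$target[1:200])  # proportion of positive cases in the hold-out
max(p, 1 - p)                      # accuracy of always predicting the majority class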

Multiclass classification with Iris data

EdNet also supports multiclass classification. The following example uses the famous Iris dataset (from the stats package) to train a neural network to predict iris species using sepal and petal lengths and widths.

First we bring the Iris data into the environment.

data("iris")
head(iris)

# Plot sepal length v sepal width and colour by species
plot(iris$Sepal.Length, iris$Sepal.Width, col=iris$Species, pch=16,
     xlab="Sepal Length", ylab="Sepal Width", main="Iris species by sepal length and width")
legend("topright",
       legend=c("setosa", "versicolor", "virginica"),
       col=1:3, pch=16)

Before fitting the model, we need to prepare the data. For multiclass learning, EdNetTrain requires the target labels to be one-hot encoded, and it is also good practice to normalise the input features before training.

X <- sapply(iris[, 1:4], normalise)
head(X)

Y <- onehotEncode(iris$Species)
head(Y)

We will hold out 20 rows at random (there are 150 altogether) in order to test that the model works as well as we expect.

set.seed(1984)
holdoutRows <- sample(1:150, 20)

We're now ready to train a model. We will train a network with 3 fully-connected hidden layers with dimensions 128, 64 and 32. Each of these layers will use a relu activation function and a drop-out keep probability of 50%. We will train for 40 epochs with a mini-batch size of 16, using Adam optimisation and a small regularisation penalty (lambda of 0.001).

modelFit <- EdNetTrain(X,
                       Y,
                       family="multiclass",
                       learning_rate=0.01,
                       num_epochs=40,
                       hidden_layer_dims=c(128, 64, 32),
                       hidden_layer_activations="relu",
                       optimiser="Adam",
                       lambda=0.001,
                       keep_prob=0.5,
                       mini_batch_size=16,
                       print_every_n=40L,
                       dev_set=holdoutRows)

We can generate the predicted class probabilities by using the predict function.

classProbs <- predict(modelFit, X)
head(classProbs)

Using the predictedClass function we can calculate the predicted class for each row (which is the class with the highest probability.)

iris$predictions <- predictedClass(classProbs, levels(iris$Species))
head(iris)
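Equivalently, as an illustrative base-R check (assuming the columns of classProbs are in the same order as levels(iris$Species), as passed to predictedClass above), the predicted class is simply the column with the highest probability in each row.

head(levels(iris$Species)[max.col(classProbs)])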

Plotting these predictions on the holdout only (it is easy to generate good predictions on the training data.)

dev <- iris[holdoutRows, ]

par(mfrow=c(1,2))
plot(dev$Sepal.Length, dev$Sepal.Width, col=dev$Species, pch=16,
     xlab="Sepal Length", ylab="Sepal Width", main="Actual")
plot(dev$Sepal.Length, dev$Sepal.Width, col=dev$predictions, pch=16,
     xlab="Sepal Length", ylab="Sepal Width", main="Predicted")
legend("topright", inset=0.07,
       legend=c("setosa", "versicolor", "virginica"),
       col=1:3, pch=16)

Predictions on the holdout data set have 100% accuracy!
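We can confirm this directly in base R by comparing the predicted and actual species on the holdout rows.

mean(iris$predictions[holdoutRows] == iris$Species[holdoutRows])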

Poisson regression example with offset model

For this example we use a dummy Poisson dataset with 3 features, an exposure column and a target column. The target column is integer count data and the exposure column represents the length of time that the Poisson process ran for. The underlying rate depends on the three feature columns in some way (that we want to model.)

data("dfPoisson")
head(dfPoisson)

The target should depend on exposure in a well-understood linear way that we need to account for in building the model.

plot(dfPoisson$exposure, dfPoisson$target)

To take exposure into account we use an offset. We need to apply the log function to exposure as the offset is applied in training before the final activation function (which is the exponential function in this case.)
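To see why the log is needed, note that with an exponential final activation, adding log(exposure) to the pre-activation value simply multiplies the predicted rate by the exposure. A minimal base-R illustration with made-up numbers (not EdNet internals):

eta <- 0.7                  # hypothetical pre-activation value for one row
exposure <- 2.5
exp(eta + log(exposure))    # offset applied before the exponential activation
exposure * exp(eta)         # identical: exposure multiplied by the predicted rate

We now fit the model with log(exposure) as the offset.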

modelFit <- EdNetTrain(X=as.matrix(dfPoisson[, 1:3]),
                       Y=as.matrix(dfPoisson[, 5]),
                       offset=log(as.matrix(dfPoisson[, 4])),
                       family="poisson",
                       learning_rate=0.01,
                       num_epochs=40,
                       hidden_layer_dims=c(128, 32, 8),
                       hidden_layer_activations="relu",
                       optimiser="Adam",
                       lambda = 0.004,
                       keep_prob=0.8,
                       mini_batch_size=8,
                       print_every_n=40L,
                       dev_set=1:200)

Having built a model, we can use predict to put the predicted rate per unit of exposure onto the original data.

dfPoisson$predictedRate <- predict(modelFit, as.matrix(dfPoisson[, 1:3]))
dfPoisson$actualRate <- dfPoisson$target/dfPoisson$exposure
head(dfPoisson)

Plotting the actual v predicted rate on the hold-out data shows that the model is broadly picking up the correct trend.

plot(dfPoisson$predictedRate[1:200], dfPoisson$actualRate[1:200])
abline(0, 1)  # y = x reference line

Gamma distribution example and use of checkpoint model

This example uses dummy Gamma distributed data and makes use of the checkpoint feature to continue learning from an existing model. First we bring in the data.

data("dfGamma")
head(dfGamma)

Now we train a model using gradient descent for 15 epochs without using mini-batches.

modelFit <- EdNetTrain(X=as.matrix(dfGamma[, 1:3]),
                       Y=as.matrix(dfGamma[, 4]),
                       family="gamma",
                       learning_rate=0.002,
                       num_epochs=15,
                       hidden_layer_dims=c(128, 32, 8),
                       hidden_layer_activations="relu",
                       optimiser="GradientDescent",
                       print_every_n=15,
                       dev_set=1:200)

The cost function on the training set is decreasing monotonically since we are not using drop-out or mini-batches. However, the learning seems to have stalled. We can use the first model as a checkpoint and continue training for another 25 epochs with Adam optimisation, mini-batch gradient descent, drop-out and regularisation.

modelFit <- EdNetTrain(X=as.matrix(dfGamma[, 1:3]),
                       Y=as.matrix(dfGamma[, 4]),
                       family="gamma",
                       learning_rate=0.005,
                       num_epochs=25,
                       optimiser="Adam",
                       lambda=0.004,
                       keep_prob=0.8,
                       mini_batch_size=8,
                       print_every_n=25,
                       dev_set=1:200,
                       checkpoint=modelFit)

Changing the optimisation parameters has caused a significant improvement. We can use predict to put the predictions back on the original dataset.

dfGamma$predictions <- predict(modelFit, as.matrix(dfGamma[, 1:3]))
head(dfGamma)

A plot on the hold-out data shows a reasonable correlation between actual values and predictions (the data is reasonably noisy.)

plot(dfGamma$predictions[1:200], dfGamma$target[1:200])
abline(0, 1)  # y = x reference line
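As a quick numerical companion to the plot (an illustrative check in base R), we can compute the correlation between predictions and actuals on the hold-out rows.

cor(dfGamma$predictions[1:200], dfGamma$target[1:200])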

Tweedie distribution example with weights

The final example uses dummy Tweedie distributed data (continuous with a positive mass at zero.) The target variable is drawn from a compound Poisson-Gamma distribution where the Poisson part of the distribution also has a variable exposure as in the Poisson example above. This is similar to an insurance claim cost example, where different policies may have different periods of exposure.
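To make the shape of such a variable concrete, here is a toy simulation of a compound Poisson-Gamma draw in base R (the parameters are made up and are not those used to generate dfTweedie).

set.seed(1)
nEvents <- rpois(1000, lambda=0.3)                                   # event count per row
y <- sapply(nEvents, function(k) sum(rgamma(k, shape=2, rate=1)))    # total severity per row
mean(y == 0)   # a positive proportion of exact zeros; the rest is continuous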

data("dfTweedie")
head(dfTweedie)

We can see that in this example (unlike most insurance examples) only a small proportion of rows have a value of zero.

mean(dfTweedie$target==0)

When modelling using the Tweedie distribution we need to specify a power parameter between 1 and 2. In order to adjust for the exposure, we're modelling the target per unit of exposure and using exposure as a weight.

X <- as.matrix(dfTweedie[, 1:3])
Y <- as.matrix(dfTweedie$target/dfTweedie$exposure)
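A quick sanity check on this construction (base R only): with exposure as the weight, the weighted mean of the per-exposure target recovers the overall observed rate.

weighted.mean(as.vector(Y), dfTweedie$exposure)
sum(dfTweedie$target) / sum(dfTweedie$exposure)   # same value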

modelFit <- EdNetTrain(X=X,
                       Y=Y,
                       weight=dfTweedie$exposure,
                       family="tweedie",
                       tweediePower=1.3,
                       learning_rate=0.001,
                       num_epochs=40,
                       hidden_layer_dims=c(128, 64, 32, 8),
                       hidden_layer_activations="relu",
                       optimiser="Adam",
                       lambda=0.01,
                       keep_prob=0.8,
                       mini_batch_size=8,
                       print_every_n=40,
                       dev_set=1:200)

Using predict we can again put the model predictions back on the original data.

dfTweedie$predictedRate <- predict(modelFit, X)
dfTweedie$actualRate <- dfTweedie$target/dfTweedie$exposure
head(dfTweedie)

Plotting the actual v predicted rate on the hold-out data.

plot(dfTweedie$predictedRate[1:200], dfTweedie$actualRate[1:200])
abline(0, 1)  # y = x reference line
