```r
knitr::opts_chunk$set(echo = TRUE)
require(automl)
```
This document is intended to answer the following questions: why automl, how it works, and how to use it.
The automl package provides:
- the latest Deep Learning tricks (those who have taken Andrew Ng’s MOOC on Coursera will be in familiar territory)
- hyperparameter autotuning with a metaheuristic (PSO)
- experimental features, with more to come (you’re welcome to join as a coauthor!)
Deploying and maintaining most Deep Learning frameworks means: Python...
The R language is so simple to install and maintain in production environments that using a pure R-based package for deep learning is an obvious choice!
Disadvantages:
1st disadvantage: you have to manually test different combinations of parameters (number of layers, nodes, activation function, etc.) and then also manually tune the training hyperparameters (learning rate, momentum, mini-batch size, etc.)
2nd disadvantage: for those who are not mathematicians, calculating the derivative of a new cost or activation function may be an issue.
The Particle Swarm Optimization algorithm is simple and powerful.
In a few words: the first step consists of randomly throwing a set of particles into a search space, and the following steps let the swarm discover the best solution while converging.
The video tutorial from Yarpiz is a great resource.
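To make the idea concrete, here is a minimal PSO sketch in plain R (an illustration only, not the package’s implementation; the toy objective f and the coefficients w, c1 and c2 are arbitrary choices):

```r
# Minimal PSO: minimize a toy objective over 2 dimensions.
f <- function(x) sum((x - 3)^2)                     # optimum at c(3, 3)
npart <- 20; ndim <- 2
pos <- matrix(runif(npart * ndim, -10, 10), npart, ndim)  # random start
vel <- matrix(0, npart, ndim)
pbest <- pos                                        # per-particle best positions
pbestval <- apply(pos, 1, f)
gbest <- pbest[which.min(pbestval), ]               # swarm-wide best
w <- 0.7; c1 <- 1.5; c2 <- 1.5                      # inertia / acceleration
for (it in 1:100) {
  r1 <- matrix(runif(npart * ndim), npart, ndim)
  r2 <- matrix(runif(npart * ndim), npart, ndim)
  gmat <- matrix(gbest, npart, ndim, byrow = TRUE)
  vel <- w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gmat - pos)
  pos <- pos + vel
  val <- apply(pos, 1, f)
  improved <- val < pbestval
  pbest[improved, ] <- pos[improved, ]
  pbestval[improved] <- val[improved]
  gbest <- pbest[which.min(pbestval), ]
}
gbest  # converges toward c(3, 3)
```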
The automl package was born from the idea of using the PSO metaheuristic to address the disadvantages identified above.
And last but not least: use R and R only :-)
3 functions are available:
- automl_train_manual: the manual mode to train a model
- automl_train: the automatic mode to train a model
- automl_predict: the prediction function to apply a trained model to data
Mix 1 consists of using the PSO algorithm to optimize the hyperparameters: each particle corresponds to a set of hyperparameters.
The automl_train function was made for that.
Mix 2 is experimental: it consists of using the PSO algorithm to optimize the weights of the neural network in place of gradient descent: each particle corresponds to a set of neural network weight matrices.
The automl_train_manual function does that too (see the sketch below).
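In short, the two entry points look like this (a hedged sketch: xmat and ymat stand for any feature matrix and target, built as in the examples below, and the parameter values are illustrative):

```r
# Mix 1: PSO searches the hyperparameter space, gradient descent trains
mdl1 <- automl_train(Xref = xmat, Yref = ymat,
                     autopar = list(psopartpopsize = 15))

# Mix 2 (experimental): PSO optimizes the network weights directly
mdl2 <- automl_train_manual(Xref = xmat, Yref = ymat,
                            hpar = list(modexec = 'trainwpso'))
```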
For those who will laugh at seeing deep learning applied with one hidden layer to the 150-record Iris data set, I will say: you’re perfectly right :-)
The goal at this stage is simply to take the first steps.
Subject: predict Sepal.Length given other Iris parameters
1st, with gradient descent and the default hyperparameter values for the learning rate (0.001) and mini-batch size (32):
```r
data(iris)
xmat <- cbind(iris[, 2:4], as.numeric(iris$Species))
ymat <- iris[, 1]
amlmodel <- automl_train_manual(Xref = xmat, Yref = ymat)
```
```r
res <- cbind(ymat, automl_predict(model = amlmodel, X = xmat))
colnames(res) <- c('actual', 'predict')
head(res)
```
:-[] no pain, no gain ...
After some manual fine-tuning of the learning rate, mini-batch size, and number of iterations (epochs):
```r
data(iris)
xmat <- cbind(iris[, 2:4], as.numeric(iris$Species))
ymat <- iris[, 1]
amlmodel <- automl_train_manual(Xref = xmat, Yref = ymat,
                                hpar = list(learningrate = 0.01,
                                            minibatchsize = 2^2,
                                            numiterations = 30))
```
```r
res <- cbind(ymat, automl_predict(model = amlmodel, X = xmat))
colnames(res) <- c('actual', 'predict')
head(res)
```
A better result, but it took human effort!
Same subject: predict Sepal.Length given other Iris parameters
```r
data(iris)
xmat <- as.matrix(cbind(iris[, 2:4], as.numeric(iris$Species)))
ymat <- iris[, 1]
start.time <- Sys.time()
amlmodel <- automl_train(Xref = xmat, Yref = ymat,
                         autopar = list(psopartpopsize = 15,
                                        numiterations = 5,
                                        auto_layers_max = 1,
                                        nbcores = 4))
end.time <- Sys.time()
cat(paste('time elapsed:', end.time - start.time, '\n'))
```
```r
res <- cbind(ymat, automl_predict(model = amlmodel, X = xmat))
colnames(res) <- c('actual', 'predict')
head(res)
```
It’s even better, with no human effort, just machine time.
Windows users won’t benefit from parallelization; the function uses the parallel package included with base R...
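A small, hedged sketch for choosing nbcores (the “leave one core free” margin is just a common convention, not a package requirement):

```r
# Cap nbcores to what the machine offers, leaving one core for the OS
ncores <- max(1, parallel::detectCores() - 1)
ncores
```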
Same subject: predict Sepal.Length given other Iris parameters
```r
data(iris)
xmat <- as.matrix(cbind(iris[, 2:4], as.numeric(iris$Species)))
ymat <- iris[, 1]
amlmodel <- automl_train_manual(Xref = xmat, Yref = ymat,
                                hpar = list(modexec = 'trainwpso',
                                            numiterations = 30,
                                            psopartpopsize = 50))
```
```r
res <- cbind(ymat, automl_predict(model = amlmodel, X = xmat))
colnames(res) <- c('actual', 'predict')
head(res)
```
Pretty good too; even better, in fact!
Same subject: predict Sepal.Length given other Iris parameters
Let’s try Mean Absolute Percentage Error instead of Mean Squared Error.
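For reference, the custom cost string built below computes MAPE, dropping the infinite terms that appear when an actual value $y_i$ is zero:

$$J = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$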
```r
data(iris)
xmat <- as.matrix(cbind(iris[, 2:4], as.numeric(iris$Species)))
ymat <- iris[, 1]
# custom cost formula (MAPE), built step by step:
f <- 'J=abs((y-yhat)/y)'                          # absolute percentage errors
f <- c(f, 'J=sum(J[!is.infinite(J)],na.rm=TRUE)') # sum finite terms (drop Inf from y = 0, and NAs)
f <- c(f, 'J=(J/length(y))')                      # average over observations
f <- paste(f, collapse = ';')
amlmodel <- automl_train_manual(Xref = xmat, Yref = ymat,
                                hpar = list(modexec = 'trainwpso',
                                            numiterations = 30,
                                            psopartpopsize = 50,
                                            costcustformul = f))
```
```r
res <- cbind(ymat, automl_predict(model = amlmodel, X = xmat))
colnames(res) <- c('actual', 'predict')
head(res)
```
Subject: predict Species given other Iris parameters
Softmax is available with PSO; no derivative needed ;-)
```r
data(iris)
xmat <- iris[, 1:4]
lab2pred <- levels(iris$Species)
lghlab <- length(lab2pred)
iris$Species <- as.numeric(iris$Species)
# one-hot encode the species labels
ymat <- matrix(seq(from = 1, to = lghlab, by = 1), nrow(xmat), lghlab,
               byrow = TRUE)
ymat <- (ymat == as.numeric(iris$Species)) + 0
amlmodel <- automl_train_manual(Xref = xmat, Yref = ymat,
                                hpar = list(modexec = 'trainwpso',
                                            layersshape = c(10, 0),
                                            layersacttype = c('relu', 'softmax'),
                                            layersdropoprob = c(0, 0),
                                            numiterations = 50,
                                            psopartpopsize = 50))
```
```r
res <- cbind(ymat, automl_predict(model = amlmodel, X = xmat))
colnames(res) <- c(paste('act', lab2pred, sep = '_'),
                   paste('pred', lab2pred, sep = '_'))
head(res)
tail(res)
```
Same subject: predict Species given other Iris parameters
1st example: gradient descent with 2 hidden layers of 10 nodes each, using different activation functions for the hidden layers
```r
data(iris)
xmat <- iris[, 1:4]
lab2pred <- levels(iris$Species)
lghlab <- length(lab2pred)
iris$Species <- as.numeric(iris$Species)
# one-hot encode the species labels
ymat <- matrix(seq(from = 1, to = lghlab, by = 1), nrow(xmat), lghlab,
               byrow = TRUE)
ymat <- (ymat == as.numeric(iris$Species)) + 0
amlmodel <- automl_train_manual(Xref = xmat, Yref = ymat,
                                hpar = list(layersshape = c(10, 10, 0),
                                            layersacttype = c('tanh', 'relu', ''),
                                            layersdropoprob = c(0, 0, 0)))
```
nb: the last activation type may be left blank (it will be set automatically)
2nd example: with gradient descent and no hidden layer (logistic regression)
```r
data(iris)
xmat <- iris[, 1:4]
lab2pred <- levels(iris$Species)
lghlab <- length(lab2pred)
iris$Species <- as.numeric(iris$Species)
ymat <- matrix(seq(from = 1, to = lghlab, by = 1), nrow(xmat), lghlab,
               byrow = TRUE)
ymat <- (ymat == as.numeric(iris$Species)) + 0
amlmodel <- automl_train_manual(Xref = xmat, Yref = ymat,
                                hpar = list(layersshape = c(0),
                                            layersacttype = c('sigmoid'),
                                            layersdropoprob = c(0)))
```
We save the model to continue training later (see the next section):
```r
amlmodelsaved <- amlmodel
```
Subject: continue training on saved model (model saved above in last section)
```r
amlmodel <- automl_train_manual(Xref = xmat, Yref = ymat,
                                hpar = list(numiterations = 100,
                                            psopartpopsize = 50),
                                mdlref = amlmodelsaved)
```
We can see the error continuing to decrease from the previous training.
Training continued with the same parameters, but note that we were able to change the number of iterations.
Same subject: predict Species given other Iris parameters
Let’s try the automatic approach in 2 steps, with the same logistic regression architecture:
1st step: the goal is performance (overfitting)
2nd step: the goal is robustness (regularization)
```r
data(iris)
xmat <- iris[, 1:4]
lab2pred <- levels(iris$Species)
lghlab <- length(lab2pred)
iris$Species <- as.numeric(iris$Species)
ymat <- matrix(seq(from = 1, to = lghlab, by = 1), nrow(xmat), lghlab,
               byrow = TRUE)
ymat <- (ymat == as.numeric(iris$Species)) + 0
amlmodel <- automl_train(Xref = xmat, Yref = ymat,
                         hpar = list(layersshape = c(0),
                                     layersacttype = c('sigmoid'),
                                     layersdropoprob = c(0)),
                         autopar = list(auto_runtype = '2steps'))
```
Compared to the previous runs (in the sections above), the gap between the training and cross-validation errors is much smaller.
Automatically :-)
- refactor the code to be object oriented
- manage transfer learning from existing frameworks
- implement CNN
- implement RNN
- ...
-> I won’t do it alone; let’s create a team!
https://aboulaboul.github.io/automl
https://github.com/aboulaboul/automl