Description Usage Details Value See Also Examples
Constructs a custom mxLSTM model for use in the caret train logic. It behaves slightly differently from the usual caret models as retrieved by getModelInfo. See Details.
Usage

getLSTMmodel()

Details
Model setup

The model is an LSTM recurrent neural network with an rmsprop optimizer.
Purpose

The purpose of the custom model is the following:

- Allow a regression within caret that predicts multiple y variables in one model.
- Allow for scaling of y. Possible options are c('scale', 'center', 'minMax').
- If, e.g., a PCA is conducted in the preprocessing, the resulting inputs can be scaled again via the preprocessing options c('scaleAgain', 'centerAgain').
Usage

The model differs from 'usual' caret models in its usage. Differences when using it in train:

- Usually, the formula would be, for example, y1+y2+y3 ~ x1+x2+x3. caret does not allow this specification, therefore a workaround is used: construct a column dummy = y1 and specify the formula as dummy ~ x1+x2+x3+y1+y2+y3.
- Determine the x and y variables with the arguments xVariables = c('x1', 'x2', 'x3') and yVariables = c('y1', 'y2', 'y3').
- Don't use the caret preProcess argument. Use preProcessX and preProcessY instead.
- Don't specify preProcOptions in the trainControl call. Specify them in the call to train. They will be valid for preProcessX only, since y pre-processing does not require further arguments.
- preProcessX can be anything that caret allows, plus c('scaleAgain', 'centerAgain') for scaling as a last preprocessing step. preProcessY can include c('scale', 'center', 'minMax').
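The dummy-column workaround above can be sketched as follows (a minimal illustration with placeholder data; the commented train call only indicates where xVariables/yVariables go):

```r
## hypothetical data with three inputs and three outcomes
datTrain <- data.frame(x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10),
                       y1 = rnorm(10), y2 = rnorm(10), y3 = rnorm(10))

## caret cannot handle y1+y2+y3 on the left-hand side,
## so duplicate one outcome as 'dummy' and put all y's on the right
datTrain$dummy <- datTrain$y1
form <- dummy ~ x1 + x2 + x3 + y1 + y2 + y3

## the real x/y split is then declared via xVariables/yVariables:
## train(form, data = datTrain, method = getLSTMmodel(),
##       xVariables = c('x1', 'x2', 'x3'),
##       yVariables = c('y1', 'y2', 'y3'), ...)
```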
For transforming the input to the LSTM, the following additional arguments must be specified in the call to train:

- seqLength: the sequence length (number of rows per input sequence).
- seqFreq: frequency of sequence starts (in rows). If smaller than seqLength, sequences overlap.
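As a concrete illustration of how seqLength and seqFreq interact (plain base R, no mxLSTM required): with seqLength = 5 and seqFreq = 2 on 12 rows, a new sequence starts every 2nd row, so consecutive sequences share 3 rows:

```r
nRows     <- 12
seqLength <- 5  ## rows per sequence
seqFreq   <- 2  ## distance between sequence starts

## start rows of all complete sequences
starts <- seq(1, nRows - seqLength + 1, by = seqFreq)
starts  ## 1 3 5 7

## the first two sequences cover rows 1..5 and 3..7,
## i.e. they overlap in rows 3..5 because seqFreq < seqLength
```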
Additional arguments to the fit function:

- testData: dataset for evaluating performance after each epoch.
- initialModel: can be specified if the aim is to continue training on an existing model. Has to be the output of a call to fitLSTMmodel. ATTENTION! Be sure to specify the same xVariables, yVariables, hidden layers, and preprocessing steps as in the original training.
- seed: optional random seed that is set before model training for reproducibility.
Additional arguments to the predict function:

- fullSequence: Boolean. If FALSE, only the last output of each sequence is returned. If TRUE, the output for the whole sequence is returned.

Different prediction function

For predicting from the model as returned by caret's train, you have to use the predictAll function. This will call the internal predict function of getLSTMmodel, returning predictions for all y variables.
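A short sketch of the two prediction modes (assuming caret_lstm is a model fitted via train with method = getLSTMmodel() and datTest a matching dataset, as in the Examples):

```r
## one prediction per sequence: only the last time step is returned
predLast <- predictAll(caret_lstm, newdata = datTest, fullSequence = FALSE)

## one prediction per time step of every sequence
predFull <- predictAll(caret_lstm, newdata = datTest, fullSequence = TRUE)
```

Both calls return predictions for all y variables, so with fullSequence = TRUE the result has one row per time step rather than one row per sequence.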
Tuning parameters

- num.epoch: number of training epochs.
- batch.size: batch size.
- layer1: number of hidden units in LSTM layer 1.
- layer2: number of hidden units in LSTM layer 2.
- layer3: number of hidden units in LSTM layer 3.
- dropout1: dropout probability for LSTM layer 1.
- dropout2: dropout probability for LSTM layer 2.
- dropout3: dropout probability for LSTM layer 3.
- activation: activation function for the update layers in the LSTM cells; "relu" or "tanh".
- shuffle: Boolean. Should the training batches be randomly reordered? Each sequence, of course, stays in its native order.
- weight.decay: defaults to 0.
- learningrate.momentum: gamma1. See the API description of mx.opt.rmsprop.
- momentum: gamma2. See the API description of mx.opt.rmsprop.
- clip_gradients: see the API description of mx.opt.rmsprop.
Other specific features

- Plot training history: it is possible to plot the training history of an mxLSTM model with plot_trainHistory.
- Restore checkpoint from a specified epoch: it is possible to restore the model weights after a given epoch with the function restoreLstmCheckpoint.
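A hedged sketch of these two conveniences; plot_trainHistory on the final model is taken from the Examples below, while the exact argument names of restoreLstmCheckpoint are an assumption and should be checked against its help page:

```r
## plot the loss/metric curves recorded during training
plot_trainHistory(caret_lstm$finalModel)

## roll the weights back to their state after a given epoch
## (the 'epoch' argument name is assumed; see ?restoreLstmCheckpoint)
restored <- restoreLstmCheckpoint(caret_lstm$finalModel, epoch = 30)
```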
Value

A list of functions similar to the output of caret's getModelInfo.

See Also

saveCaretLstmModel, loadCaretLstmModel, plot_trainHistory, fitLSTMmodel, predictLSTMmodel, getPreProcessor, predictPreProcessor, invertPreProcessor
Examples

## Not run:
library(mxLSTM)
library(data.table)
library(caret)
###########################################################
## perform a regression with mxLSTM
## on dummy data
## simple data: one numeric output as a function of two numeric inputs.
## including lag values
## with some noise.
set.seed(200)
mx.set.seed(200)
nObs <- 20000
dat <- data.table(x = runif(n = nObs, min = 1000, max = 2000),
y = runif(n = nObs, min = -10, max = 10))
## create targets from current and lagged inputs
## (data.table's shift() provides the lag)
dat[, target := 0.5 * x + 0.7 * shift(y, 3) - 0.2 * shift(x, 5)]
dat[, target2 := 0.1 * x + 0.3 * shift(y, 1) - 0.4 * shift(x, 2)]
## add some noise
dat[, target := target + rnorm(nObs, 0, 10)]
dat[, target2 := target2 + rnorm(nObs, 0, 10)]
## convert to mxLSTM input
dat <- transformLSTMinput(dat = dat, targetColumn = c("target", "target2"), seq.length = 5)
## convert to caret input
dat <- lstmInput2caret(dat)
## split into train and test set
trainIdx <- sample(seq_len(nrow(dat)), as.integer(nrow(dat) / 3))
evalIdx <- sample(seq_len(nrow(dat))[-trainIdx], as.integer(nrow(dat) / 3))
testIdx <- seq_len(nrow(dat))[-c(trainIdx, evalIdx)]
datTrain <- dat[trainIdx,]
datEval <- dat[evalIdx,]
datTest <- dat[testIdx,]
## define caret trainControl
thisTrainControl <- trainControl(method = "cv",
number = 2,
verboseIter = TRUE)
## do the training
## grid for defining the parameters of the mxNet model
lstmGrid <- expand.grid(layer1 = 64, layer2 = 0, layer3 = 0,
weight.decay = 0, dropout1 = 0, dropout2 = 0, dropout3 = 0,
learningrate.momentum = 0.95,
momentum = 0.1, num.epoch = 50,
batch.size = 128, activation = "relu", shuffle = TRUE, stringsAsFactors = FALSE)
## construct formula with all variables on the right-hand side
form <- formula(paste0("dummy~", paste0(setdiff(names(datTrain), "dummy"), collapse = "+")))
caret_lstm <- train(form = form,
data = datTrain,
testData = datEval,
method = getLSTMmodel(), ## get our custom model
xVariables = c("x", "y"), ## define predictors
yVariables = c("target", "target2"), ## define outcomes
preProcessX = c("pca", "scaleAgain", "centerAgain"),
preProcessY = c("scale", "center"), ## in case of multiple y, this makes sense imho
debugModel = FALSE,
trControl = thisTrainControl,
tuneGrid = lstmGrid,
learning.rate = c("1" = 0.02, "40" = 0.0002), ## adaptive learningrate that changes at epoch 40
optimizeFullSequence = FALSE
)
## get nice output of training history
plot_trainHistory(caret_lstm$finalModel)
## get predictions for the datasets
predTrain <- predictAll(caret_lstm, newdata = datTrain, fullSequence = FALSE)
predEval <- predictAll(caret_lstm, newdata = datEval, fullSequence = FALSE)
predTest <- predictAll(caret_lstm, newdata = datTest, fullSequence = FALSE)
## get nice goodness of fit plots.
plot_goodnessOfFit(predicted = predTrain$target, observed = datTrain$target_seq5Y)
plot_goodnessOfFit(predicted = predTrain$target2, observed = datTrain$target2_seq5Y)
plot_goodnessOfFit(predicted = predTest$target, observed = datTest$target_seq5Y)
plot_goodnessOfFit(predicted = predTest$target2, observed = datTest$target2_seq5Y)
## save the model
saveCaretLstmModel(caret_lstm, "testModel")
## End(Not run)