Description Usage Details Value See Also Examples
Constructs a custom mxLSTM model for use in the caret train logic. It behaves slightly differently from the usual caret models as retrieved by getModelInfo. See Details.
Usage

getLSTMmodel()

Details
Model setup

The model is an LSTM recurrent neural network with an rmsprop optimizer.
Purpose

The purpose of the custom model is the following:

- Allow a regression within caret that predicts multiple y variables in one model.
- Allow for scaling of y. Possible options are c('scale', 'center', 'minMax').
- If, e.g., a PCA is conducted in the preprocessing, the resulting inputs can be scaled again via the preprocessing options c('scaleAgain', 'centerAgain').
Usage

The model differs from 'usual' caret models in its usage. Differences when using it in train:

- Usually, the formula would be, for example, y1+y2+y3 ~ x1+x2+x3. caret does not allow this specification, therefore a workaround is used: construct a column dummy = y1 and specify the formula as dummy ~ x1+x2+x3+y1+y2+y3.
- Determine the x and y variables with the arguments xVariables = c('x1', 'x2', 'x3') and yVariables = c('y1', 'y2', 'y3').
- Don't use the caret preProcess argument. Use preProcessX and preProcessY instead.
- Don't specify preProcOptions in the trainControl call. Specify them in the call to train. They will be valid for preProcessX only, since y pre-processing does not require further arguments.
- preProcessX can be anything that caret allows, plus c('scaleAgain', 'centerAgain') for scaling as a last preprocessing step. preProcessY can include c('scale', 'center', 'minMax').
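The dummy-column workaround above can be sketched as follows (a minimal illustration with placeholder data; the commented train call only indicates where xVariables/yVariables go):

```r
## hypothetical data with three inputs and three outcomes
datTrain <- data.frame(x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10),
                       y1 = rnorm(10), y2 = rnorm(10), y3 = rnorm(10))

## caret cannot handle y1+y2+y3 on the left-hand side,
## so duplicate one outcome as 'dummy' and put all y's on the right
datTrain$dummy <- datTrain$y1
form <- dummy ~ x1 + x2 + x3 + y1 + y2 + y3

## the real x/y split is then declared via xVariables/yVariables:
## train(form, data = datTrain, method = getLSTMmodel(),
##       xVariables = c('x1', 'x2', 'x3'),
##       yVariables = c('y1', 'y2', 'y3'), ...)
```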
For transforming the input to the LSTM, the following additional arguments must be specified in the call to train:

- seqLength: the sequence length (number of rows per input sequence).
- seqFreq: frequency of sequence starts (in rows). If smaller than seqLength, sequences overlap.
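As a concrete illustration of how seqLength and seqFreq interact (plain base R, no mxLSTM required): with seqLength = 5 and seqFreq = 2 on 12 rows, a new sequence starts every 2nd row, so consecutive sequences share 3 rows:

```r
nRows     <- 12
seqLength <- 5  ## rows per sequence
seqFreq   <- 2  ## distance between sequence starts

## start rows of all complete sequences
starts <- seq(1, nRows - seqLength + 1, by = seqFreq)
starts  ## 1 3 5 7

## the first two sequences cover rows 1..5 and 3..7,
## i.e. they overlap in rows 3..5 because seqFreq < seqLength
```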
Additional arguments to the fit function:

- testData: dataset for evaluating performance after each epoch.
- initialModel: can be specified if the aim is to continue training on an existing model. Has to be the output of a call to fitLSTMmodel. ATTENTION! Be sure to specify the same xVariables, yVariables, hidden layers, and preprocessing steps as in the original training.
- seed: optional random seed that is set before model training for reproducibility.
Additional arguments to the predict function:

- fullSequence: Boolean. If FALSE, only the last output of each sequence is returned. If TRUE, the output for the whole sequence is returned.

Different prediction function

For predicting from the model as returned by caret's train, you have to use the predictAll function. This will call the internal predict function of getLSTMmodel, returning predictions for all y variables.
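A short sketch of the two prediction modes (assuming caret_lstm is a model fitted via train with method = getLSTMmodel() and datTest a matching dataset, as in the Examples):

```r
## one prediction per sequence: only the last time step is returned
predLast <- predictAll(caret_lstm, newdata = datTest, fullSequence = FALSE)

## one prediction per time step of every sequence
predFull <- predictAll(caret_lstm, newdata = datTest, fullSequence = TRUE)
```

Both calls return predictions for all y variables, so with fullSequence = TRUE the result has one row per time step rather than one row per sequence.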
Tuning parameters

- num.epoch: number of training epochs.
- batch.size: batch size.
- layer1: number of hidden units in LSTM layer 1.
- layer2: number of hidden units in LSTM layer 2.
- layer3: number of hidden units in LSTM layer 3.
- dropout1: dropout probability for LSTM layer 1.
- dropout2: dropout probability for LSTM layer 2.
- dropout3: dropout probability for LSTM layer 3.
- activation: activation function for the update layers in the LSTM cells; "relu" or "tanh".
- shuffle: Boolean. Should the training batches be randomly reordered? Each sequence, of course, stays in its native order.
- weight.decay: defaults to 0.
- learningrate.momentum: gamma1. See the API description of mx.opt.rmsprop.
- momentum: gamma2. See the API description of mx.opt.rmsprop.
- clip_gradients: see the API description of mx.opt.rmsprop.
Other specific features

- Plot training history: it is possible to plot the training history of an mxLSTM model with plot_trainHistory.
- Restore checkpoint from a specified epoch: it is possible to restore the model weights after a given epoch with the function restoreLstmCheckpoint.
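A hedged sketch of these two conveniences; plot_trainHistory on the final model is taken from the Examples below, while the exact argument names of restoreLstmCheckpoint are an assumption and should be checked against its help page:

```r
## plot the loss/metric curves recorded during training
plot_trainHistory(caret_lstm$finalModel)

## roll the weights back to their state after a given epoch
## (the 'epoch' argument name is assumed; see ?restoreLstmCheckpoint)
restored <- restoreLstmCheckpoint(caret_lstm$finalModel, epoch = 30)
```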
Value

A list of functions similar to the output of caret's getModelInfo.

See Also

saveCaretLstmModel, loadCaretLstmModel, plot_trainHistory, fitLSTMmodel, predictLSTMmodel, getPreProcessor, predictPreProcessor, invertPreProcessor
Examples

## Not run:
library(mxLSTM)
library(data.table)
library(caret)
###########################################################
## perform a regression with mxLSTM
## on dummy data
## simple data: one numeric output as a function of two numeric inputs.
## including lag values
## with some noise.
set.seed(200)
mx.set.seed(200)
nObs <- 20000
dat <- data.table(x = runif(n = nObs, min = 1000, max = 2000),
y = runif(n = nObs, min = -10, max = 10))
## create targets from current and lagged inputs
## (data.table's shift() provides the lag)
dat[, target := 0.5 * x + 0.7 * shift(y, 3) - 0.2 * shift(x, 5)]
dat[, target2 := 0.1 * x + 0.3 * shift(y, 1) - 0.4 * shift(x, 2)]
## add some noise
dat[, target := target + rnorm(nObs, 0, 10)]
dat[, target2 := target2 + rnorm(nObs, 0, 10)]
## convert to mxLSTM input
dat <- transformLSTMinput(dat = dat, targetColumn = c("target", "target2"), seq.length = 5)
## convert to caret input
dat <- lstmInput2caret(dat)
## split into train and test set
trainIdx <- sample(seq_len(nrow(dat)), as.integer(nrow(dat) / 3))
evalIdx <- sample(seq_len(nrow(dat))[-trainIdx], as.integer(nrow(dat) / 3))
testIdx <- seq_len(nrow(dat))[-c(trainIdx, evalIdx)]
datTrain <- dat[trainIdx,]
datEval <- dat[evalIdx,]
datTest <- dat[testIdx,]
## define caret trainControl
thisTrainControl <- trainControl(method = "cv",
number = 2,
verboseIter = TRUE)
## do the training
## grid for defining the parameters of the mxNet model
lstmGrid <- expand.grid(layer1 = 64, layer2 = 0, layer3 = 0,
weight.decay = 0, dropout1 = 0, dropout2 = 0, dropout3 = 0,
learningrate.momentum = 0.95,
momentum = 0.1, num.epoch = 50,
batch.size = 128, activation = "relu", shuffle = TRUE, stringsAsFactors = FALSE)
## construct formula with all variables on the right-hand side
form <- formula(paste0("dummy~", paste0(setdiff(names(datTrain), "dummy"), collapse = "+")))
caret_lstm <- train(form = form,
data = datTrain,
testData = datEval,
method = getLSTMmodel(), ## get our custom model
xVariables = c("x", "y"), ## define predictors
yVariables = c("target", "target2"), ## define outcomes
preProcessX = c("pca", "scaleAgain", "centerAgain"),
preProcessY = c("scale", "center"), ## in case of multiple y, this makes sense imho
debugModel = FALSE,
trControl = thisTrainControl,
tuneGrid = lstmGrid,
learning.rate = c("1" = 0.02, "40" = 0.0002), ## adaptive learningrate that changes at epoch 40
optimizeFullSequence = FALSE
)
## get nice output of training history
plot_trainHistory(caret_lstm$finalModel)
## get predictions for the datasets
predTrain <- predictAll(caret_lstm, newdata = datTrain, fullSequence = FALSE)
predEval <- predictAll(caret_lstm, newdata = datEval, fullSequence = FALSE)
predTest <- predictAll(caret_lstm, newdata = datTest, fullSequence = FALSE)
## get nice goodness of fit plots.
plot_goodnessOfFit(predicted = predTrain$target, observed = datTrain$target_seq5Y)
plot_goodnessOfFit(predicted = predTrain$target2, observed = datTrain$target2_seq5Y)
plot_goodnessOfFit(predicted = predTest$target, observed = datTest$target_seq5Y)
plot_goodnessOfFit(predicted = predTest$target2, observed = datTest$target2_seq5Y)
## save the model
saveCaretLstmModel(caret_lstm, "testModel")
## End(Not run)