mxLSTM: A library facilitating regression analysis with LSTMs


Description

Provides functions for fitting and evaluating regressions with LSTMs.

Builds an LSTM model

Usage

mxLSTM(x, y, num.epoch, test.x = NULL, test.y = NULL, num.hidden,
  optimizeFullSequence = FALSE, dropoutLstm = num.hidden * 0,
  zoneoutLstm = num.hidden * 0, batchNormLstm = FALSE, gammaInit = 1,
  batch.size = 128, activation = "relu", optimizer = "rmsprop",
  learning.rate = 0.002, initializer = mx.init.Xavier(), shuffle = TRUE,
  initialModel = NULL, ...)

Arguments

x

array containing the features:

  • Dimension 1: one entry for each feature

  • Dimension 2: one entry for each element in the sequence

  • Dimension 3: one entry for each training event

Use transformLSTMinput to transform data.frames into this structure.

y

array containing the target labels:

  • Dimension 1: one entry for each output variable

  • Dimension 2: one entry for each element in the sequence

  • Dimension 3: one entry for each training event

Use transformLSTMinput to transform data.frames into this structure.

num.epoch

integer. Number of training epochs over the full dataset.

test.x

same as x, but for testing, not for training

test.y

same as y, but for testing, not for training

num.hidden

integer vector of flexible length. For each entry, an LSTM layer with the corresponding number of neurons is created.

optimizeFullSequence

Boolean. If TRUE, every sequence element is included in the output and contributes to the loss. If FALSE (the default), only the last element of each sequence is used to optimize the model, and outputs for the remaining elements are not returned.

dropoutLstm

numeric vector of the same length as num.hidden. Specifies the dropout probability for each LSTM layer. Dropout is applied according to Cheng et al., "An Exploration of Dropout with LSTMs". Differences from that paper: a constant dropout rate is used, and dropout is applied per element.

zoneoutLstm

numeric vector of the same length as num.hidden. Specifies the zoneout probability for each LSTM layer. Zoneout is implemented according to Krueger et al. 2017, "Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations".

batchNormLstm

logical. If TRUE, each LSTM layer is batch normalized according to the recommendations in T. Cooijmans et al., ICLR 2017, "Recurrent Batch Normalization".

gammaInit

numeric value. Will be used to initialize the gamma matrices of the batch normalization layers. Cooijmans et al. recommend 0.1 (for use with tanh activation); the mxnet default is 1. In my experience, 0.1 works very poorly with relu activation.

batch.size

integer. Number of training samples per mini-batch.

activation

activation function for the update layers in the LSTM cells. Either "relu" or "tanh".

optimizer

character specifying the type of optimizer to use.

learning.rate

learning rate for the optimizer. Can be a single number or a named vector for an adaptive learning rate. If it is a vector, the names specify the epoch at which the corresponding value becomes active. For example, learning.rate = c("1" = 0.004, "30" = 0.002, "50" = 0.0005) trains epochs 1 to 29 with 0.004, epochs 30 to 49 with 0.002, and epoch 50 onwards with 0.0005.

initializer

random initializer for weights

shuffle

Boolean. Should the training data be reordered randomly prior to training? (reorders full sequences, order within each sequence is unaffected.)

initialModel

mxLSTM model. If provided, all weights are initialized based on the given model.

...

Additional arguments passed on to the optimizer.

Details

The sequence length is inferred from the input (dimension 2 of x).
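
For orientation, a minimal sketch of the expected input layout (the dimension sizes below are illustrative, not defaults):

  nFeatures <- 2     # dimension 1: one entry per feature
  seqLength <- 5     # dimension 2: one entry per sequence element
  nEvents   <- 1000  # dimension 3: one entry per training event
  x <- array(rnorm(nFeatures * seqLength * nEvents),
             dim = c(nFeatures, seqLength, nEvents))
  y <- array(rnorm(seqLength * nEvents),
             dim = c(1, seqLength, nEvents))
  dim(x)[2]  # the sequence length that mxLSTM infers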

Value

object of class mxLSTM: a list containing the model symbol, arg.params, aux.params, a training log, and the variable names.
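
A minimal sketch of inspecting the returned object after training (assuming a fitted model as in the Examples below):

  str(model, max.level = 1)  # shows the symbol, arg.params, aux.params, log, and variable names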

See Also

fitLSTMmodel, predictLSTMmodel, getLSTMmodel, plot_trainHistory

Examples

## Not run: 
library(mxLSTM)
library(data.table)
## simple data: two numeric outputs as a function of two numeric inputs.
## including lag values
## with some noise.
nObs <- 20000
dat <- data.table(x = runif(n = nObs, min = 1000, max = 2000),
                  y = runif(n = nObs, min = -10, max = 10))
## create target
dat[, target := 0.5 * x + 0.7 * shift(y, 3) - 0.2 * shift(x, 5)]
dat[, target2 := 0.1 * x + 0.3 * shift(y, 1) - 0.4 * shift(x, 2)]
dat[, target := target + rnorm(nObs, 0, 10)]
dat[, target2 := target2 + rnorm(nObs, 0, 10)]

## convert to mxLSTM input
dat <- transformLSTMinput(dat = dat, targetColumn = c("target", "target2"), seq.length = 5)

## split into training and test set
trainIdx <- sample(dim(dat$x)[3], as.integer(dim(dat$x)[3]/2))
testIdx  <- seq_len(dim(dat$x)[3])[-trainIdx]

## train model
model <- mxLSTM(x = dat$x[,,trainIdx], 
                y = dat$y[,,trainIdx], 
                num.epoch = 50, 
                num.hidden = 64, 
                dropoutLstm = 0, 
                zoneoutLstm = 0.01, 
                batchNormLstm = TRUE, 
                batch.size = 128, 
                optimizer = "rmsprop",
                learning.rate =  c("1" = 0.005, "20" = 0.002, "40" = 0.0005))

## plot training history
plot_trainHistory(model)

## get some predictions (on test set)
predTest <- predictLSTMmodel(model = model, dat = dat$x[,,testIdx], fullSequence = FALSE)

## nice plot
plot_goodnessOfFit(predicted = predTest$y1, observed = dat$y[1,5, testIdx])
plot_goodnessOfFit(predicted = predTest$y2, observed = dat$y[2,5, testIdx])

## save the model
## saveLstmModel(model, "testModel")

## End(Not run)
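
A sketch of warm-starting a second training run from the fitted model above via initialModel (keeping the architecture-related settings identical to the initial model is assumed to be required; epoch count and learning rate below are chosen arbitrarily):

## Not run: 
model2 <- mxLSTM(x = dat$x[,,trainIdx],
                 y = dat$y[,,trainIdx],
                 num.epoch = 10,
                 num.hidden = 64,
                 batchNormLstm = TRUE,
                 batch.size = 128,
                 optimizer = "rmsprop",
                 learning.rate = 0.0005,
                 initialModel = model)

## End(Not run)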
