Forecasting with Long Short-Term Memory (LSTM) Recurrent Neural Networks [Hochreiter, 1997].


In simple terms, a recurrent neural network feeds its input not only forward from layer to layer but also back in a loop to specific layers. This loop is what makes the network recurrent. LSTM is an improvement designed to mitigate the problem of 'vanishing gradients'.
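The following base-R sketch makes the loop-back concrete by running a single LSTM cell over a short input sequence. It is not FcLSTM's internal code. It follows the gate equations of keras's layer_lstm, where activation (here ActivationFunction, default "relu") is applied to the candidate and the cell output, and recurrent_activation (RecurrentActivation, default "sigmoid") is applied to the input, forget and output gates. The weights are random, and the names lstm_step, W, U and b are hypothetical.

```r
# Hypothetical sketch of one LSTM time step, not FcLSTM's internal code
sigmoid <- function(z) 1 / (1 + exp(-z))
relu <- function(z) pmax(z, 0)

# x: input at time t; h_prev, c_prev: hidden and cell state from time t - 1
# W, U, b: named lists with input weights, recurrent weights and biases for
# the input gate "i", forget gate "f", output gate "o" and candidate "g"
lstm_step <- function(x, h_prev, c_prev, W, U, b,
                      act = relu, rec_act = sigmoid) {
  pre <- function(k) as.vector(W[[k]] %*% x + U[[k]] %*% h_prev) + b[[k]]
  i <- rec_act(pre("i"))
  f <- rec_act(pre("f"))
  o <- rec_act(pre("o"))
  g <- act(pre("g"))
  c_new <- f * c_prev + i * g   # cell state: the memory carried forward
  h_new <- o * act(c_new)       # hidden state: fed back at the next step
  list(h = h_new, c = c_new)
}

set.seed(1)
n_units <- 4                    # cf. the Neurons argument
gates <- c("i", "f", "o", "g")
W <- setNames(lapply(gates, function(k) matrix(rnorm(n_units), n_units, 1)), gates)
U <- setNames(lapply(gates, function(k)
  matrix(rnorm(n_units^2, sd = 0.3), n_units, n_units)), gates)
b <- setNames(lapply(gates, function(k) rep(0, n_units)), gates)

# Loop over a short sequence: the state from step t - 1 is fed back at step t
state <- list(h = rep(0, n_units), c = rep(0, n_units))
for (x_t in sin(seq(0, 2 * pi, length.out = 12))) {
  state <- lstm_step(x_t, state$h, state$c, W, U, b)
}
```

Because the output gate lies in (0, 1) and relu is non-negative, every hidden-state entry stays non-negative with these activation choices.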

The procedure works as follows: we start with centered and scaled time series data given as a numerical vector, i.e., equidistant observations. Centering is not strictly necessary; the series only needs to vary within the interval [-1,1]. Furthermore, the forecast length has to be set with ForecastHorizon.
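A minimal way to bring a series into [-1,1] is min-max scaling, together with its inverse so that forecasts can be mapped back to the original units. The helpers scale_pm1 and unscale_pm1 below are hypothetical and not part of FcLSTM; min-max scaling is only one choice, and it is sensitive to outliers.

```r
# Hypothetical helpers (not part of FcLSTM): min-max scaling into [-1, 1]
scale_pm1 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  y <- 2 * (x - rng[1]) / (rng[2] - rng[1]) - 1
  list(y = y, lo = rng[1], hi = rng[2])
}

# Inverse transformation, e.g. to bring a forecast back to the original scale
unscale_pm1 <- function(y, lo, hi) (y + 1) / 2 * (hi - lo) + lo

x <- c(3, 7, 5, 11, 9)
s <- scale_pm1(x)
s$y                             # -1.0  0.0 -0.5  1.0  0.5
unscale_pm1(s$y, s$lo, s$hi)    # 3 7 5 11 9
```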


FcLSTM(DataVec, SplitAt, ForecastHorizon,
       Seasonality = 28, Scaled = TRUE, ErrorLoss = "MSE", Epochs = 100,
       Neurons = 28, ActivationFunction = "relu", RecurrentActivation = "sigmoid",
       Batch_size = 1, Time, PlotIt = FALSE, Silent = TRUE, ...)



DataVec: [1:n] numerical vector of regular (equidistant) time series data.


SplitAt: Index of the row at which DataVec is divided into training and test data. If not given, n is used.


ForecastHorizon: Scalar defining the number of time steps to forecast ahead.


Seasonality: Main seasonality of the data, used for generating batches of data. Default is 28.


Scaled: If TRUE (default), the data are scaled automatically.


ErrorLoss: Error used in the loss function, one of "MRD", "SRD", "MSE" or "MAE". Default is "MSE".


Epochs: Number of epochs to train the model, see epochs in fit in [keras]. Default is 100.


Neurons: Number of units per layer, see units in layer_lstm. Default is 28.


ActivationFunction: Activation function to use, see activation in layer_lstm and [Goodfellow, 2016] for details. Default is "relu".


RecurrentActivation: Activation function to use for the recurrent step, see recurrent_activation in layer_lstm and [Goodfellow, 2016] for details. Default is "sigmoid".


Batch_size: Number of samples per gradient update, see batch_size in fit in [keras]. The batch size is the number of data samples in one forward/backward pass of an RNN before a weight update. Default is 1.

The batch size should not be chosen too high in relation to ForecastHorizon.


Time: Optional, [1:n] character vector of time stamps with the same length as the data.


PlotIt: Optional. FALSE (default): do nothing. TRUE: plot the forecast versus the validation set.


Silent: Optional. If FALSE, various outputs of keras are printed. Default is TRUE.


...: Further arguments passed to layer_lstm.
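How SplitAt, Seasonality and ForecastHorizon turn DataVec into training samples is not spelled out on this page. The base-R sketch below shows one common scheme, under the assumption that each input sample is a window of Seasonality consecutive training values and its target is the following ForecastHorizon values. The helper split_and_window is hypothetical, and FcLSTM's internal batching may differ. The 3D input array follows the samples x time steps x features layout that keras LSTM layers expect.

```r
# Hypothetical sketch: split DataVec at SplitAt and cut the training part
# into (input window, target) pairs; FcLSTM's internal batching may differ
split_and_window <- function(DataVec, SplitAt, Seasonality, ForecastHorizon) {
  train <- DataVec[seq_len(SplitAt)]        # [1:k]
  test <- DataVec[-seq_len(SplitAt)]        # [(k+1):n], empty if k == n
  n_samples <- length(train) - Seasonality - ForecastHorizon + 1
  stopifnot(n_samples >= 1)
  starts <- seq_len(n_samples)
  X <- do.call(rbind, lapply(starts, function(s) train[s:(s + Seasonality - 1)]))
  Y <- do.call(rbind, lapply(starts, function(s)
    train[(s + Seasonality):(s + Seasonality + ForecastHorizon - 1)]))
  # keras LSTM layers expect inputs shaped samples x time steps x features
  list(X = array(X, dim = c(nrow(X), ncol(X), 1)), Y = Y, test = test)
}

w <- split_and_window(1:20, SplitAt = 15, Seasonality = 4, ForecastHorizon = 2)
dim(w$X)      # 10 4 1
w$Y[1, ]      # 5 6
w$test        # 16 17 18 19 20

# With Batch_size samples per gradient update, one epoch performs
# ceiling(n_samples / Batch_size) weight updates, e.g. for Batch_size = 3:
ceiling(dim(w$X)[1] / 3)   # 4
```

Building the windows with rbind rather than sapply keeps the matrices in samples x values orientation even when Seasonality or ForecastHorizon is 1.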


In this approach, the recurrent ANN has several internal parameters set as is usual in deep learning, see [Goodfellow, 2016] for details. The last layer is a densely connected NN layer wrapped in a time_distributed layer. Currently, only one hidden layer is used.

An epoch is one complete forward/backward pass over the training data, so Epochs sets the total number of such passes. Typically, more epochs improve model performance until overfitting occurs, at which point the validation accuracy/loss stops improving.
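The point at which further epochs stop helping can be read off the validation losses recorded during training. The base-R sketch below returns the epoch with the lowest validation loss and stops scanning once the loss has not improved for patience epochs, which is the idea behind early stopping. The helper best_epoch and the loss values are purely illustrative; this page does not state that FcLSTM uses early stopping.

```r
# Hypothetical helper: epoch with the lowest validation loss; scanning
# stops after `patience` epochs without improvement (early stopping idea)
best_epoch <- function(val_loss, patience = 10) {
  best <- 1
  for (e in seq_along(val_loss)) {
    if (val_loss[e] < val_loss[best]) {
      best <- e
    } else if (e - best >= patience) {
      break
    }
  }
  best
}

# Illustrative validation losses: improvement first, then overfitting
val_loss <- c(1.00, 0.80, 0.60, 0.55, 0.56, 0.58, 0.60, 0.63)
best_epoch(val_loss, patience = 3)   # 4
```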

The data should be scaled to the interval [-1,1] and have a "sound" distribution, see [Goodfellow, 2016; Mörchen 2006].

Gradients vanish when values between zero and one (e.g., derivatives of the activation function) are multiplied many times during backpropagation, because the product then shrinks towards zero. As a result, the weights of a recurrent ANN with many layers ('deep learning') barely change during training.
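A short numerical illustration: the derivative of the sigmoid is at most 0.25 (at z = 0), so even in this best case a chain of k such factors shrinks like 0.25^k. The sketch deliberately ignores the weights, which also enter the product in a real network.

```r
sigmoid <- function(z) 1 / (1 + exp(-z))
dsigmoid <- function(z) sigmoid(z) * (1 - sigmoid(z))  # largest value 0.25 at z = 0

# Product of k such derivatives, as in backpropagation through k layers/steps
k <- c(1, 5, 10, 20, 50)
grad_factor <- dsigmoid(0)^k
grad_factor    # about 0.25, 9.8e-04, 9.5e-07, 9.1e-13, 7.9e-31
```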

ErrorLoss defines the objective function to be minimized; for the pre-coded functions, see loss in compile in [keras]. Custom loss functions can also be supplied if they are written in keras backend syntax (e.g., tensor_srd). The 'Adam' optimizer is used here [Kingma/Ba, 2014].
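For reference, the sketch below defines the standard MSE and MAE objectives in base R and minimizes the MSE with the Adam update rule of [Kingma/Ba, 2014] on a toy problem (fitting the single weight w in y = w * x). It is not keras's implementation. beta1, beta2 and eps are the defaults proposed by Kingma/Ba, while the learning rate of 0.05 (instead of their default 0.001) is an assumption chosen so the toy problem converges quickly. "MRD" and "SRD" are not sketched because their definitions are not given on this page.

```r
mse <- function(y, y_hat) mean((y - y_hat)^2)
mae <- function(y, y_hat) mean(abs(y - y_hat))

# Adam minimizing the MSE of the toy model y_hat = w * x
adam_fit <- function(x, y, lr = 0.05, beta1 = 0.9, beta2 = 0.999,
                     eps = 1e-8, steps = 1000) {
  w <- 0; m <- 0; v <- 0
  for (t in seq_len(steps)) {
    g <- mean(2 * (w * x - y) * x)          # gradient dMSE/dw
    m <- beta1 * m + (1 - beta1) * g        # running mean of the gradient
    v <- beta2 * v + (1 - beta2) * g^2      # running mean of the squared gradient
    m_hat <- m / (1 - beta1^t)              # bias corrections
    v_hat <- v / (1 - beta2^t)
    w <- w - lr * m_hat / (sqrt(v_hat) + eps)
  }
  w
}

x <- seq(0.1, 1, by = 0.1)
y <- 3 * x
w <- adam_fit(x, y)   # close to the true weight 3
```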


A list with the following elements, where k denotes SplitAt:


Pointer to the ANN model generated by keras; the model itself is not directly available in R.


Output of fit in [keras]


Forecast generated by the ANN model, using the last portion of the training set (of length ForecastHorizon) as input to predict from. The test data stays untouched.


[(k+1):n] vector, the part of DataVec not used in the model (test data)


[(k+1):n] vector, time of the test data, if Time is given


[1:k] vector, the part of DataVec used in the model (training data)


[1:k] vector, time of the training data, if Time is given


[1:k] vector, forecasted value using TrainData


# keras and tensorflow have to be installed in Python, and Python must be callable from the console

# Steps are:



# Execute the below



# Todo: integrate dropout (randomly removing units from NNs during training) to improve generalisation (Hinton et al., 2012).


Michael Thrun


# Monthly sunspot numbers: autocorrelation above 0.5 for a lag of about 10 years
# (maximum at 125 months)
sunspots = as.numeric(datasets::sunspot.month)

# Reduce the extent of outliers by sqrt, then scale with robust quantiles
sub = sqrt(sunspots)
quants = quantile(sub, c(0.01, 0.5, 0.99), na.rm = TRUE)
qmin = quants[[1]]
qmax = quants[[3]]
denom = qmax - qmin
data = (sub - qmin) / denom

# We are ready to apply the LSTM procedure with a batch size of 40
## Not run: 
results = FcLSTM(data, ForecastHorizon = 1, Batch_size = 40, Seasonality = 48,
                 Epochs = 300, ErrorLoss = "MRD")

# Get the forecast from the returned list
fc = results$Forecast

# Rescale the forecast to the original units (inverts the scaling and the sqrt)
fc_rescaled = (denom * fc + qmin)^2

# Plot the last 120 months of the original data followed by the forecast
hist_tail = tail(sunspots, 120)
plot(hist_tail, type = "l", xlim = c(1, 120 + length(fc_rescaled)),
     ylim = range(hist_tail, fc_rescaled))
points(120 + seq_along(fc_rescaled), fc_rescaled, col = "red")

## End(Not run)

