FcLSTM | R Documentation |
In simple terms, a recurrent network feeds its input not only forward from layer to layer but also back in a loop to specific layers; this feedback loop is what makes it recurrent. LSTM (long short-term memory) [Hochreiter/Schmidhuber, 1997] is an improvement that addresses the problem of 'vanishing gradients'.
The procedure of this method works as follows:
We start out with centered and scaled time series data (centering is not strictly necessary; we only need a time series varying in the interval [-1,1]) given as a numerical vector DataVec (hence assumed to be equidistant). Furthermore, one has to set the forecast length ForecastHorizon.
FcLSTM(DataVec, SplitAt, ForecastHorizon,
Seasonality = 28, Scaled = TRUE, ErrorLoss = "MSE", Epochs = 100,
Neurons = 28, ActivationFunction = "relu", RecurrentActivation = "sigmoid",
Batch_size = 1, Time, PlotIt = FALSE, Silent = TRUE,...)
DataVec |
[1:n] numerical vector of regular (equidistant) time series data. |
SplitAt |
Index at which DataVec is divided into training and test data. If not given, n is used |
ForecastHorizon |
Scalar defining the number of time steps to forecast ahead |
Seasonality |
Main seasonality of the data; used for generating batches of data. Default is 28 |
Scaled |
TRUE: automatic scaling |
ErrorLoss |
Error measure used as loss function, either "MRD", "SRD", "MSE" or "MAE". Default is "MSE" |
Epochs |
Number of epochs to train the model |
Neurons |
Number of units per layer |
ActivationFunction |
Activation function to use; please see [Goodfellow, 2016] for details. |
RecurrentActivation |
Recurrent activation function to use; please see [Goodfellow, 2016] for details. |
Batch_size |
Number of samples per gradient update. The batch size should not be chosen too large in relation to ForecastHorizon. |
Time |
Optional, [1:n] character vector of time stamps with the same length as the data. |
PlotIt |
Optional, FALSE (default): do nothing. TRUE: plots the forecast versus the validation set. |
Silent |
Optional, if FALSE, various outputs of keras are printed. Default is TRUE |
... |
Further arguments for |
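SplitAt and Seasonality determine which part of the data is used for training and how it is cut into batches. The internal batching of FcLSTM is not documented here, so the following base-R sketch only assumes a common sliding-window scheme: each input sample consists of Seasonality consecutive values, and the target is the value ForecastHorizon steps after the window. The helper make_windows is hypothetical and not part of the package.

```r
# Hypothetical helper, not part of FcLSTM: cut a series into sliding windows.
# Row i of X holds the Seasonality consecutive values x[i], ..., x[i + Seasonality - 1];
# y[i] is the value ForecastHorizon steps after the end of that window.
make_windows <- function(x, Seasonality, ForecastHorizon = 1) {
  n_samples <- length(x) - Seasonality - ForecastHorizon + 1
  if (n_samples < 1) stop("series is too short for the chosen window and horizon")
  X <- do.call(rbind, lapply(seq_len(n_samples), function(i) x[i:(i + Seasonality - 1)]))
  y <- x[seq_len(n_samples) + Seasonality + ForecastHorizon - 1]
  list(X = X, y = y)
}

# Usage: toy series 1..10, windows of length 3, one step ahead
w <- make_windows(1:10, Seasonality = 3, ForecastHorizon = 1)
dim(w$X)  # 7 3
w$y       # 4 5 6 7 8 9 10
```

A larger Seasonality gives each sample a longer memory of the past but leaves fewer samples for training.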
In this approach, several internal parameters of the recurrent ANN are set as is common in deep learning; see [Goodfellow, 2016] for details. The last layer is a densely connected NN layer wrapped in a time_distributed layer. Currently only one hidden layer is used.
The Epochs are the number of complete passes (forward and backward) through the training data. Typically, more epochs improve model performance until overfitting occurs, at which point the validation accuracy/loss no longer improves.
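To make the relation between Epochs and Batch_size concrete: with N training samples, one epoch consists of ceiling(N / Batch_size) gradient updates, which is the usual convention for mini-batch training (the last batch may be smaller). The sample count of 1000 below is an arbitrary illustration, not a value used by FcLSTM.

```r
# Gradient updates per epoch: one update per mini-batch of Batch_size samples.
updates_per_epoch <- function(n_samples, Batch_size) ceiling(n_samples / Batch_size)

# Total gradient updates over the whole training run.
total_updates <- function(n_samples, Batch_size, Epochs) {
  Epochs * updates_per_epoch(n_samples, Batch_size)
}

total_updates(1000, Batch_size = 40, Epochs = 300)  # 300 * 25 = 7500
```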
The data should be scaled to the interval [-1,1] and have a "sound" distribution, see [Goodfellow, 2016; Mörchen, 2006].
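One simple way to bring a series into [-1,1] is min-max scaling, which also has to be inverted to compare forecasts with the original data. This is a generic sketch and does not reproduce the automatic scaling of Scaled = TRUE, whose exact method is not described here; the function names are hypothetical.

```r
# Min-max scaling of a numeric vector to the interval [-1, 1].
# Returns the scaled values together with the bounds needed to invert it.
scale_to_unit <- function(x) {
  lo <- min(x, na.rm = TRUE)
  hi <- max(x, na.rm = TRUE)
  list(scaled = 2 * (x - lo) / (hi - lo) - 1, lo = lo, hi = hi)
}

# Inverse transformation back to the original units.
unscale_from_unit <- function(z, lo, hi) (z + 1) / 2 * (hi - lo) + lo

s <- scale_to_unit(c(2, 4, 6, 10))
s$scaled                              # -1.0 -0.5 0.0 1.0
unscale_from_unit(s$scaled, s$lo, s$hi)  # 2 4 6 10
```

Min-max scaling is sensitive to outliers; the example at the end of this page therefore uses a square-root transform and robust quantiles instead of the raw minimum and maximum.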
Gradients vanish when factors between zero and one are multiplied many times during backpropagation, because the product, and hence the gradient, shrinks towards zero. As a result, the weights of a recurrent ANN with many layers ('deep learning') would not change significantly.
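A small numerical illustration: the derivative of the sigmoid activation is at most 0.25, and the chain rule contributes one such factor per step. This sketch deliberately ignores the weight matrices, so it shows only the best case for the activation factors, not a full backpropagation.

```r
# Logistic sigmoid and its derivative; the derivative peaks at 0.25 for x = 0.
sigmoid <- function(x) 1 / (1 + exp(-x))
dsigmoid <- function(x) sigmoid(x) * (1 - sigmoid(x))

# Product of k derivative factors in the best case (all equal to 0.25).
k <- 1:10
best_case <- dsigmoid(0)^k
best_case[10]   # 0.25^10 = 9.536743e-07
```

Even in this most favorable case the gradient signal shrinks by roughly six orders of magnitude over ten steps, which is the effect the LSTM cell is designed to counter.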
ErrorLoss defines the objective function to be minimized. If you want to use a pre-coded function, see loss in compile in [keras]. You can also pass custom loss functions written in keras backend syntax (e.g. tensor_srd). The 'Adam' optimizer is used here [Kingma/Ba, 2014].
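For reference, the two standard measures among the ErrorLoss options can be written in plain R as below. MRD and SRD are not reproduced because their definitions are not given in this documentation; inside FcLSTM the loss is evaluated by keras on tensors, not by these R functions.

```r
# Mean squared error and mean absolute error between observed y and forecast yhat.
mse <- function(y, yhat) mean((y - yhat)^2)
mae <- function(y, yhat) mean(abs(y - yhat))

y    <- c(1, 2, 3)
yhat <- c(1, 2, 5)
mse(y, yhat)  # (0 + 0 + 4) / 3 = 1.333333
mae(y, yhat)  # (0 + 0 + 2) / 3 = 0.6666667
```

MSE penalizes the single large error more strongly than MAE does, which matters for series with outliers such as sunspot counts.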
List of
Model |
Pointer to an ANN model generated by keras; the model is not directly available in R |
FitStats |
Output of |
Forecast |
Forecast generated by the ANN model where we put in the last portion of the training set of length |
TestData |
[(k+1):n] vector, the part of DataVec not used in the model |
TestTime |
[(k+1):n] vector, time of the data not used in the model |
TrainData |
[1:k] vector, the part of DataVec used in the model |
TrainTime |
[1:k] vector, time of the training data if given |
TrainingForecast |
[1:k] vector, forecasted values using TrainData |
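The index ranges of TrainData and TestData can be reproduced with a simple split. The sketch assumes that k equals SplitAt, which is inferred from the argument description rather than stated explicitly; split_series is a hypothetical helper.

```r
# Hypothetical helper: split a series at index SplitAt (= k) into
# TrainData = x[1:k] and TestData = x[(k+1):n]; TestData is empty if k == n.
split_series <- function(x, SplitAt = length(x)) {
  n <- length(x)
  stopifnot(SplitAt >= 1, SplitAt <= n)
  list(TrainData = x[seq_len(SplitAt)],
       TestData  = x[seq_len(n - SplitAt) + SplitAt])
}

parts <- split_series(1:10, SplitAt = 8)
parts$TrainData  # 1 2 3 4 5 6 7 8
parts$TestData   # 9 10
```

Using seq_len instead of (k+1):n avoids the R pitfall where (n+1):n would count backwards when no test data remain.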
# keras and tensorflow have to be installed in Python, and Python has to be callable from the console
# Steps are:
devtools::install_github("rstudio/tensorflow")
devtools::install_github("rstudio/keras")
# Then execute:
tensorflow::install_tensorflow()
tensorflow::tf_config()
# ToDo: integrate dropout (randomly removing units from the NN during training) to improve generalization (Hinton et al., 2012).
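Since dropout is not yet part of FcLSTM, the following is only a generic sketch of the technique from [Hinton et al., 2012] in its common "inverted" form, applied to a plain vector of activations; the function dropout is hypothetical and unrelated to the keras layers used internally.

```r
# Inverted dropout: each unit is kept with probability 1 - rate, and kept units
# are rescaled by 1 / (1 - rate) so the expected activation stays the same.
# It would be applied only during training, not when forecasting.
dropout <- function(a, rate) {
  stopifnot(rate >= 0, rate < 1)
  keep <- runif(length(a)) >= rate
  a * keep / (1 - rate)
}

set.seed(1)
dropout(rep(1, 8), rate = 0.5)   # each entry is either 0 or 2
```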
Michael Thrun
Goodfellow, I., Bengio, Y., & Courville, A.: Deep learning, (Vol. 1), Cambridge: MIT Press, 2016.
Kingma, D. P., Ba, J.: Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980, 2014.
Hochreiter, S., Schmidhuber, J.: Long short-term memory, Neural Computation, Vol. 9(8), pp. 1735-1780, 1997.
Mörchen, F.: Time series knowledge mining, Görich & Weiershäuser, 2006.
Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. R.: Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580, 2012.
keras and tensorflow.
# Sunspots: the autocorrelation at a lag of about 10 years is above 0.5
# (maximum at 125 months)
data = as.numeric(datasets::sunspot.month)
# reduce the extent of outliers by sqrt, then scale with the 1% and 99% quantiles
sub = sqrt(data)
quants = unname(quantile(sub, c(0.01, 0.5, 0.99), na.rm = TRUE))
q_min = quants[1]
q_max = quants[3]
denom = q_max - q_min
scaled = (sub - q_min) / denom
# We are ready to apply the LSTM procedure with a batch size of 40
## Not run:
results = FcLSTM(scaled, ForecastHorizon = 1, Batch_size = 40, Seasonality = 48, Epochs = 300, ErrorLoss = "MRD")
# Get the forecast from the returned list
fc = results$Forecast
# Undo the scaling and the sqrt so the forecast is comparable to the original data
fc_rescaled = (denom * fc + q_min)^2
# Plot the last 120 months of the original data and append the forecast in red
n = length(data)
plot((n - 119):n, tail(data, 120), type = "l",
     xlim = c(n - 119, n + length(fc_rescaled)),
     xlab = "Month index", ylab = "Sunspot number")
points(n + seq_along(fc_rescaled), fc_rescaled, col = "red")
## End(Not run)