
nowcastLSTM

New in v0.2.2: ability to get uncertainty intervals for predictions and predictions on synthetic vintages.

New in v0.2.0: ability to get feature contributions to the model and perform automatic hyperparameter tuning and variable selection. There is no need to write this outside of the library anymore.

R wrapper for the nowcast_lstm Python library. MATLAB and Julia wrappers also exist. The library implements long short-term memory neural networks for economic nowcasting. More background is available in this paper in the Journal of Official Statistics.

Installation and set up

Installing the library: Install devtools with install.packages("devtools"). Then, from R, run: devtools::install_github("dhopp1/nowcastLSTM"). If you get errors about packages being built on different versions of R, try running Sys.setenv(R_REMOTES_NO_ERRORS_FROM_WARNINGS="true"), then run the install command again.

Note on updating the library: This R wrapper is not versioned. When there is a new version of the library, update the Python library by running pip install nowcast-lstm==0.2.3 (substitute 0.2.3 with whatever the latest version is) from the command line, then from R rerun devtools::install_github("dhopp1/nowcastLSTM"). This should give you access to the latest functionality in R.

Installing Python: If you already have Python installed on your system, simply follow the install instructions from the nowcast_lstm Python library and point initialize_session to the path where your Python is installed later on. If you don't have Python installed on your system, run the following commands in R once you've run devtools::install_github("dhopp1/nowcastLSTM"):

library(reticulate)
install_miniconda(path = miniconda_path(), update = TRUE, force = FALSE)
# pass the packages as a character vector, one element per package
py_install(c("dill", "numpy", "pandas", "pmdarima", "torch", "nowcast-lstm"), conda = miniconda_path(), pip = TRUE)

Example: nowcastLSTM_example.zip contains an R Markdown file with a dataset and more detailed example of usage in R.

Set up

Once all Python libraries are installed, run the initialize_session function in R each time you use the library.

library(nowcastLSTM)

# this function should be run at the beginning of every session. Python path can be left empty to use the system default
initialize_session(python_path = "path_to/python")

# if you installed Python via reticulate, use this. You may get a warning about requesting one path and getting another, but it should work regardless.
initialize_session(python_path = miniconda_path())

# use this to set Python location permanently
Sys.setenv(RETICULATE_PYTHON = "path_to/python")

Background

LSTM neural networks have been used for nowcasting before, combining the strengths of artificial neural networks with a temporal aspect. However, their use in nowcasting economic indicators remains limited, no doubt in part due to the difficulty of obtaining results in existing deep learning frameworks. This library seeks to streamline the process of obtaining results in the hopes of expanding the domains to which LSTM can be applied.

While neural networks are flexible and this framework may be able to get sensible results on levels, the model architecture was developed to nowcast growth rates of economic indicators. As such, training inputs should ideally be stationary and seasonally adjusted.
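As a rough illustration of moving from levels to growth rates, the base R sketch below computes period-over-period percentage changes. The helper name growth_rate and the numbers are hypothetical and not part of the library, and the sketch does not perform seasonal adjustment.

```r
# Period-over-period growth rate in percent.
# The first observation has no predecessor, so its growth rate is NA.
growth_rate <- function(x) {
  c(NA_real_, diff(x) / head(x, -1) * 100)
}

# Hypothetical monthly index levels (illustrative numbers only)
index_levels <- c(100, 102, 101, 105)
growth <- growth_rate(index_levels)
print(round(growth, 2))
```

The same helper could be applied column by column to the monthly series of a dataframe before passing it to the model.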

Further explanation of the background problem can be found in this UNCTAD research paper. Further explanation and results can be found in this UNCTAD research paper.

Quick usage

Given data, a dataframe with a date column, monthly data, and a quarterly target series to run the model on, usage is as follows:

library(nowcastLSTM)
initialize_session()

# this command will instantiate and train an LSTM network
# due to quirks with using Python from R, the python_model_name argument should be set to the same name used for the R object it is assigned to.
model <- LSTM(data, "target_col_name", n_timesteps=12, python_model_name = "model") # default parameters with 12 timestep history
#model <- LSTM(data, "target_col_name", n_timesteps=12, n_models=10, seeds=c(1:10), python_model_name = "model") # For reproducibility on a single machine/system, give a list of manual seeds as long as the n_models parameter. Reproducibility across machines is not guaranteed.


predict(model, data) # predictions on the training set

# predicting on a test set, which is the same dataframe as the training data + newer data
# this will give predictions for all dates, but only predictions after the training data ends should be considered for testing
predict(model, test_data)

# to gauge performance on artificial data vintages
ragged_preds(model, pub_lags, lag, test_data)

# save a trained model
# python_model_name should be the same value used when the model was initially trained
save_lstm(model, "trained_model.pkl", python_model_name = "model")

# load a previously trained model
# due to quirks with using Python from R, the python_model_name argument should be set to the same name used for the R object it is assigned to.
trained_model <- load_lstm("trained_model.pkl", python_model_name = "trained_model")
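The data and test_data objects used above are described only in words, so here is a minimal base R sketch of what such dataframes might look like. The column names x1 and x2, the random values, and the choice to record the quarterly target in the last month of each quarter (with NA in the other months) are assumptions made for illustration. Check the example file for the layout the library actually expects.

```r
# Monthly dates covering two years (first day of each month)
dates <- seq(as.Date("2020-01-01"), as.Date("2021-12-01"), by = "month")
n <- length(dates)

set.seed(1)
data <- data.frame(
  date = dates,
  x1 = rnorm(n),               # hypothetical monthly indicator
  x2 = rnorm(n),               # hypothetical monthly indicator
  target_col_name = NA_real_   # quarterly target, filled in below
)

# Assumption: the quarterly value sits in March, June, September and December,
# and the target is NA in all other months
quarter_end <- as.integer(format(data$date, "%m")) %% 3 == 0
data$target_col_name[quarter_end] <- rnorm(sum(quarter_end))

# Training data: observations through the end of 2020
train_data <- data[data$date <= as.Date("2020-12-01"), ]

# Test data: the same dataframe as the training data plus the newer rows
test_data <- data
```

With this split, only predictions dated after 2020-12-01 would be considered out-of-sample.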

Model selection

To ease variable and hyperparameter selection, the library can carry out this process automatically. See the example file or run ? on the functions for more information.

# case where the hyperparameters are given and you want to select which variables go into the model
selected_variables <- variable_selection(data, "target_col_name", n_timesteps=12) # default parameters with 12 timestep history

# case where the variables are given and you want to select hyperparameters
performance <- hyperparameter_tuning(data, "target_col_name", n_timesteps=12, n_hidden_grid=c(10,20))

# case where you want to select both variables and hyperparameters for the model
performance <- select_model(data, "target_col_name", n_timesteps=12, n_hidden_grid=c(10,20))

Prediction uncertainty

Produce estimates along with lower and upper bounds of an uncertainty interval. See the example file or run ? on the functions for more information.

interval_preds <- interval_predict(
  model,
  test_data,
  interval = 0.95
)

ragged_interval_preds <- ragged_interval_predict(
  model, 
  pub_lags, 
  lag = 2, 
  data = test_data, 
  interval = 0.95
)
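The pub_lags argument passed to ragged_preds and ragged_interval_predict is not defined in this README. The base R sketch below shows one way to think about an artificial data vintage, assuming pub_lags holds one integer per input variable giving how many of the most recent periods are not yet published. The helper make_vintage is hypothetical, is not the library's implementation, and ignores the lag argument. Run ? on the functions for the authoritative definitions.

```r
# Set the most recent pub_lags[i] observations of the i-th variable to NA,
# mimicking data that has not been published yet.
# Illustration only: not the library's implementation, and `lag` is ignored.
make_vintage <- function(data, variables, pub_lags) {
  stopifnot(length(variables) == length(pub_lags))
  n <- nrow(data)
  for (i in seq_along(variables)) {
    if (pub_lags[i] > 0) {
      rows <- tail(seq_len(n), pub_lags[i])
      data[rows, variables[i]] <- NA
    }
  }
  data
}

# Hypothetical six-month dataset with two input variables
example <- data.frame(
  date = seq(as.Date("2021-01-01"), by = "month", length.out = 6),
  x1 = 1:6,
  x2 = 11:16
)

# x1 is assumed to be published with a one-month delay, x2 with two months
pub_lags <- c(1, 2)
vintage <- make_vintage(example, c("x1", "x2"), pub_lags)
print(vintage)
```

The ragged right edge of vintage is the kind of input a nowcast would see in real time, which is why performance on such vintages can differ from performance on complete data.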

LSTM parameters

LSTM outputs

Assuming a model has been instantiated and trained with model <- LSTM(...), the following functions are available. Run help(function) on any of them to find out more about them and their parameters. Other information, like training loss, is available in the trained model object, accessed via $, e.g. model$train_loss:



dhopp1/nowcastLSTM documentation built on May 7, 2024, 9:30 a.m.