step_distributed_lag: Create a distributed lagged predictor
In hydrorecipes: Hydrogeology Steps for the 'recipes' Package

Description

step_distributed_lag creates a specification of a recipe step that will add new basis lag columns. The new columns will contain NA values in the leading rows, up to the maximum lag, because a full lag history is not yet available there. These rows can be removed with recipes::step_naomit(). The step is inspired by the dlnm package. For large datasets with large maximum lags, the convolution is done in the frequency domain for efficiency. Samples should be ordered and regularly spaced (e.g. a regular time series or regular spatial sampling).
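As a rough illustration of the frequency-domain remark above, the following base R sketch compares a direct causal convolution with an FFT-based one. It is not the package's internal code: the input series, the exponential kernel, and all variable names are assumptions made up for this example.

```r
# Sketch only: direct vs. FFT-based causal convolution using base R.
set.seed(1)
x <- rnorm(200)             # made-up, regularly spaced input series
kernel <- exp(-(0:24) / 5)  # hypothetical lag kernel for lags 0..24
n <- length(x)
m <- length(kernel)

# Time-domain convolution; the first m - 1 values are NA because
# a full lag history is not available yet.
direct <- as.numeric(
  stats::filter(x, kernel, method = "convolution", sides = 1)
)

# Frequency-domain convolution: zero-pad both series to n + m - 1,
# multiply their spectra, then invert (R's inverse FFT is unnormalized).
n_pad <- n + m - 1
spec <- fft(c(x, rep(0, m - 1))) * fft(c(kernel, rep(0, n - 1)))
full <- Re(fft(spec, inverse = TRUE)) / n_pad

# Keep the first n values and blank the rows without a full lag history.
via_fft <- full[seq_len(n)]
via_fft[seq_len(m - 1)] <- NA

all.equal(direct, via_fft)
```

The two results should agree to within floating-point tolerance; the FFT route tends to pay off only when both the series and the maximum lag are large.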

Usage

step_distributed_lag(
  recipe,
  ...,
  role = "predictor",
  trained = FALSE,
  knots = NULL,
  basis_mat = NULL,
  spline_fun = splines::ns,
  options = list(intercept = TRUE),
  prefix = "distributed_lag_",
  keep_original_cols = FALSE,
  columns = NULL,
  skip = FALSE,
  id = rand_id("distributed_lag")
)

Arguments

`recipe`	A recipe object. The step will be added to the sequence of operations for this recipe.
`...`	One or more selector functions to choose variables for this step. See `selections()` for more details.
`role`	For model terms created by this step, what analysis role should they be assigned? By default, the new columns created by this step from the original variables will be used as predictors in a model.
`trained`	A logical to indicate if the quantities for preprocessing have been estimated.
`knots`	An integer vector of breakpoints to define the spline. These should include the `Boundary.knots`. See splines for more info.
`basis_mat`	The matrix of basis kernels to convolve. This is `NULL` until computed by `prep.recipe()`. It can also be supplied directly as an object generated by the splines or splines2 packages that has `knots` and `Boundary.knots` attributes. When supplied this way, the knots are taken from `basis_mat` and not from the `knots` argument.
`spline_fun`	Function used for calculating `basis_mat`. This should return an object having `knots` and `Boundary.knots` attributes.
`options`	The arguments to pass to `spline_fun`.
`prefix`	A prefix for generated column names. Defaults to "distributed_lag_".
`keep_original_cols`	A logical to keep the original variables in the output. Defaults to `FALSE`.
`columns`	A character string of variable names that will be populated elsewhere.
`skip`	A logical. Should the step be skipped when the recipe is baked by `bake()`? While all operations are baked when `prep()` is run, some operations may not be able to be conducted on new data (e.g. processing the outcome variable(s)). Care should be taken when using `skip = TRUE` as it may affect the computations for subsequent operations.
`id`	A character string that is unique to this step to identify it.
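
The `knots` and `Boundary.knots` attributes mentioned for `basis_mat` and `spline_fun` can be inspected on any spline basis object. The short sketch below uses `splines::ns()` over lags 0 to 72 with interior knots chosen arbitrarily for illustration; it only shows where the attributes live, not how the step reads them.

```r
library(splines)

# Natural spline basis over lags 0..72; the interior knots are illustrative.
b <- ns(0:72, knots = c(3, 16), intercept = TRUE)

attr(b, "knots")           # interior knots as supplied
attr(b, "Boundary.knots")  # defaults to the range of the lags

dim(b)                     # one row per lag, one column per basis kernel
```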

Details

This step assumes that the data are already in the proper sequential order for lagging. The input should be sampled at a regular interval (time, space, etc.). When the recipe is baked, a set of vectors resulting from the convolution of the input vector with a basis matrix is returned. Distributed lags can be used to model a delayed response to an input in a flexible manner with fewer regressor terms. The method achieves this by convolving an input stress with a basis lag matrix (commonly built from a spline function), which yields far fewer regressors than one term per lag while still being able to describe both fast and slow responses.
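
The idea can be sketched in base R without the package. The example below is a minimal illustration under stated assumptions, not the step's implementation: the input series is simulated, the interior knots are arbitrary, and the column names simply reuse the default prefix for readability.

```r
library(splines)

set.seed(42)
n <- 300
max_lag <- 72
x <- cumsum(rnorm(n))  # made-up, regularly sampled input stress

# Spline basis evaluated over the lags: each column is one basis kernel.
lags <- 0:max_lag
basis <- ns(lags, knots = c(4, 16), Boundary.knots = c(0, max_lag),
            intercept = TRUE)

# Convolve the input with each basis column (causal, so the first
# max_lag rows are NA).
dl <- apply(basis, 2, function(k) {
  as.numeric(stats::filter(x, k, method = "convolution", sides = 1))
})
colnames(dl) <- paste0("distributed_lag_", seq_len(ncol(dl)))

ncol(dl)                  # number of regressors, versus max_lag + 1 raw lags
sum(complete.cases(dl))   # rows left after dropping the leading NA rows
```

With two interior knots and an intercept, the natural spline basis has four columns, so four regressors stand in for the 73 individual lag terms that an unconstrained lag model would need.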

Value

An updated version of recipe with the new step added to the sequence of any existing operations.

References

Almon, S (1965). The Distributed Lag Between Capital Appropriations and Expenditures. Econometrica 33(1), 178.

Gasparrini A. Distributed lag linear and non-linear models in R: the package dlnm. Journal of Statistical Software. 2011; 43(8):1-20. https://doi.org/10.18637/jss.v043.i08

Examples

library(recipes)
library(hydrorecipes)

data(wipp30)

# base recipe: model wl as a function of baro
rec_base <- recipe(wl ~ baro, data = wipp30)

# default uses splines::ns
rec <- rec_base |>
  step_distributed_lag(baro,
                       knots = log_lags(4, 72)) |>
  prep()

# use a different spline function
rec <- rec_base |>
  step_distributed_lag(baro,
                       spline_fun = splines::bs,
                       options = list(intercept = TRUE,
                                      degree = 4L),
                       knots = log_lags(4, 72)) |>
  prep()

# specify basis_mat directly; knots are taken from its attributes
basis_mat <- splines2::mSpline(0:72, knots = c(3,16))
rec <- rec_base |>
  step_distributed_lag(baro,
                       basis_mat = basis_mat) |>
  prep()

hydrorecipes documentation built on June 27, 2022, 9:06 a.m.