step_distributed_lag: Create a distributed lagged predictor

step_distributed_lag    R Documentation

Description

step_distributed_lag creates a specification of a recipe step that will add new basis lag columns. The new columns will contain NA values in the leading rows, up to the maximum lag; these rows can be removed with recipes::step_naomit(). The inspiration for this step comes from the dlnm package. For large datasets with large maximum time lags, the convolution is done in the frequency domain for efficiency. Samples should be ordered and regularly spaced (e.g. a regular time series or regular spatial sampling).
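To illustrate the frequency-domain remark, the following base R sketch compares a direct lagged sum with an FFT-based convolution for a single kernel. The helper names conv_direct and conv_fft are hypothetical and are not part of hydrorecipes; the package's internal implementation may differ.

```r
# Hypothetical helper: direct (time-domain) lagged sum of x against kernel k.
# k[1] is the weight for lag 0, k[m] the weight for the maximum lag.
conv_direct <- function(x, k) {
  n <- length(x)
  m <- length(k)
  out <- rep(NA_real_, n)
  for (i in m:n) {
    # x[i], x[i - 1], ..., x[i - m + 1] line up with lags 0..(m - 1)
    out[i] <- sum(k * x[i:(i - m + 1)])
  }
  out
}

# Hypothetical helper: the same result via multiplication in the
# frequency domain.
conv_fft <- function(x, k) {
  n <- length(x)
  m <- length(k)
  n_pad <- n + m - 1                      # zero-pad to avoid circular wrap-around
  fx <- fft(c(x, rep(0, n_pad - n)))
  fk <- fft(c(k, rep(0, n_pad - m)))
  full <- Re(fft(fx * fk, inverse = TRUE)) / n_pad
  out <- full[seq_len(n)]
  out[seq_len(m - 1)] <- NA_real_         # rows without a full lag window
  out
}

set.seed(1)
x <- cumsum(rnorm(500))   # synthetic, regularly spaced input
k <- exp(-(0:48) / 10)    # illustrative kernel covering lags 0..48

res_direct <- conv_direct(x, k)
res_fft <- conv_fft(x, k)
all.equal(res_direct, res_fft)
```

The direct loop costs on the order of length(x) * length(k) operations, while the FFT route scales roughly with n log n, which is why the frequency domain pays off for long series with large maximum lags.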

Usage

step_distributed_lag(
  recipe,
  ...,
  role = "predictor",
  trained = FALSE,
  knots = NULL,
  basis_mat = NULL,
  spline_fun = splines::ns,
  options = list(intercept = TRUE),
  prefix = "distributed_lag_",
  keep_original_cols = FALSE,
  columns = NULL,
  skip = FALSE,
  id = rand_id("distributed_lag")
)

Arguments

recipe

A recipe object. The step will be added to the sequence of operations for this recipe.

...

One or more selector functions to choose variables for this step. See selections() for more details.

role

For model terms created by this step, what analysis role should they be assigned? By default, the new columns created by this step from the original variables will be used as predictors in a model.

trained

A logical to indicate if the quantities for preprocessing have been estimated.

knots

An integer vector of breakpoints to define the spline. These should include the Boundary.knots. See splines for more info.

basis_mat

The matrix of basis kernels to convolve. This is NULL until computed by prep.recipe(). It can also be specified as an object generated by the splines or splines2 packages that has knots and Boundary.knots attributes. If specified this way, knots will be obtained from basis_mat and not from the knots argument.

spline_fun

Function used for calculating basis_mat. This should return an object having knots and Boundary.knots attributes.

options

The arguments to pass to spline_fun.

prefix

A prefix for generated column names. Defaults to "distributed_lag_".

keep_original_cols

A logical to keep the original variables in the output. Defaults to FALSE.

columns

A character string of variable names that will be populated elsewhere.

skip

A logical. Should the step be skipped when the recipe is baked by bake()? While all operations are baked when prep() is run, some operations may not be able to be conducted on new data (e.g. processing the outcome variable(s)). Care should be taken when using skip = TRUE as it may affect the computations for subsequent operations.

id

A character string that is unique to this step to identify it.
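The knots, basis_mat, and spline_fun arguments all rely on spline objects that carry knots and Boundary.knots attributes. As a small sketch using only the splines package that ships with R (the lag range and knot positions are illustrative, chosen to match the examples on this page), these attributes can be inspected as follows:

```r
library(splines)

lags <- 0:72   # lags from 0 to the maximum lag

# Natural cubic spline basis over the lags, with two interior knots
b <- ns(lags, knots = c(3, 16), intercept = TRUE)

attr(b, "knots")            # interior knots
attr(b, "Boundary.knots")   # by default, the range of the lags
dim(b)                      # one row per lag, one column per basis function
```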

Details

This step assumes that the data are already in the proper sequential order for lagging. The input should be sampled at a regular interval (time, space, etc.). When the recipe is baked, the step returns a set of vectors resulting from the convolution of each selected input vector with the columns of a basis matrix. Distributed lags can be used to model a delayed response to an input in a flexible manner with fewer regressor terms. The method achieves this by convolving an input stress with a basis lag matrix (commonly a spline function), which leads to a set of regressors with fewer terms than using a separate regressor for every lag, but still capable of describing both fast and slow responses.
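The idea can be sketched in base R. The helper distributed_lag_sketch below is hypothetical and not part of hydrorecipes; it uses a plain loop rather than the package's own convolution code, and the data, maximum lag, and knots are made up for illustration.

```r
library(splines)

# Hypothetical helper: convolve x with every column of a basis matrix.
# Row j of `basis` holds the basis values for lag j - 1.
distributed_lag_sketch <- function(x, basis) {
  basis <- matrix(as.numeric(basis), nrow = nrow(basis))  # drop spline attributes
  n <- length(x)
  n_lag <- nrow(basis)
  out <- matrix(NA_real_, nrow = n, ncol = ncol(basis))
  for (i in n_lag:n) {
    # x[i], x[i - 1], ..., x[i - n_lag + 1] line up with lags 0..max_lag
    window <- x[i:(i - n_lag + 1)]
    out[i, ] <- colSums(basis * window)
  }
  out
}

set.seed(1)
x <- rnorm(200)   # synthetic, regularly sampled input
max_lag <- 24
basis <- ns(0:max_lag, knots = c(2, 6, 12), intercept = TRUE)

lagged <- distributed_lag_sketch(x, basis)
dim(lagged)       # far fewer columns than the max_lag + 1 individual lags
head(lagged, 3)   # leading rows are NA until a full lag window is available
```

Each output column is a weighted sum of the current and previous max_lag input values, so a handful of columns stands in for max_lag + 1 separately lagged copies of the input.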

Value

An updated version of recipe with the new step added to the sequence of any existing operations.

References

Almon, S (1965). The Distributed Lag Between Capital Appropriations and Expenditures. Econometrica 33(1), 178.

Gasparrini, A (2011). Distributed lag linear and non-linear models in R: the package dlnm. Journal of Statistical Software 43(8), 1-20. https://doi.org/10.18637/jss.v043.i08

See Also

step_lead_lag(), recipes::step_lag()

Other row operation steps: step_lead_lag()

Examples

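# load the packages that provide recipe() and step_distributed_lag()
library(recipes)
library(hydrorecipes)

data(wipp30)

rec_base <- recipe(wl ~ baro, data = wipp30)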

# default uses splines::ns
rec <- rec_base |>
  step_distributed_lag(baro,
                       knots = log_lags(4, 72)) |>
  prep()

# use different spline function
rec <- rec_base |>
  step_distributed_lag(baro,
                       spline_fun = splines::bs,
                       options = list(intercept = TRUE,
                                      degree = 4L),
                       knots = log_lags(4, 72)) |>
  prep()

# specify basis_mat
basis_mat <- splines2::mSpline(0:72, knots = c(3,16))
rec <- rec_base |>
  step_distributed_lag(baro,
                       basis_mat = basis_mat) |>
  prep()


hydrorecipes documentation built on June 27, 2022, 9:06 a.m.