View source: R/step_distributed_lag.R
step_distributed_lag | R Documentation |
step_distributed_lag creates a specification of a recipe step that will add new basis lag columns. The new data will include NA values up to the maximum lag. These can be removed with recipes::step_naomit(). The inspiration for this step comes from the dlnm package. For large datasets with large maximum time lags, convolution is done in the frequency domain for efficiency. Samples should be ordered and have regular spacing (i.e. regular time series, regular spatial sampling).
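The equivalence between lagged convolution in the time domain and in the frequency domain can be illustrated in base R. This is a minimal sketch and not the package's implementation: the function names `lag_convolve_direct` and `lag_convolve_fft`, the simulated input, and the exponential kernel are all hypothetical choices made for illustration.

```r
# Direct (time-domain) version: out[t] = sum_k kernel[k + 1] * x[t - k]
lag_convolve_direct <- function(x, kernel) {
  n <- length(x)
  m <- length(kernel)
  out <- rep(NA_real_, n)            # first m - 1 values stay NA
  for (t in m:n) {
    out[t] <- sum(kernel * x[t - 0:(m - 1)])
  }
  out
}

# Frequency-domain version: zero-pad, multiply the spectra, invert
lag_convolve_fft <- function(x, kernel) {
  n <- length(x)
  m <- length(kernel)
  n_pad <- stats::nextn(n + m - 1)   # padding avoids circular wrap-around
  x_pad <- c(x, rep(0, n_pad - n))
  k_pad <- c(kernel, rep(0, n_pad - m))
  full <- Re(fft(fft(x_pad) * fft(k_pad), inverse = TRUE)) / n_pad
  out <- full[seq_len(n)]
  out[seq_len(m - 1)] <- NA_real_    # incomplete lags, as in the direct version
  out
}

set.seed(1)
x <- rnorm(200)                      # regularly spaced, ordered input
kernel <- exp(-(0:24) / 5)           # weights for lags 0 to 24
direct <- lag_convolve_direct(x, kernel)
fast <- lag_convolve_fft(x, kernel)
```

Both versions leave the first 24 values (the maximum lag) as NA. The direct loop costs on the order of n times m operations, whereas the FFT route scales roughly as n log n, which is why it pays off for long series with long lags.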
step_distributed_lag(
  recipe,
  ...,
  role = "predictor",
  trained = FALSE,
  knots = NULL,
  basis_mat = NULL,
  spline_fun = splines::ns,
  options = list(intercept = TRUE),
  prefix = "distributed_lag_",
  keep_original_cols = FALSE,
  columns = NULL,
  skip = FALSE,
  id = rand_id("distributed_lag")
)
recipe |
A recipe object. The step will be added to the sequence of operations for this recipe. |
... |
One or more selector functions to choose variables
for this step. See |
role |
For model terms created by this step, what analysis role should they be assigned? By default, the new columns created by this step from the original variables will be used as predictors in a model. |
trained |
A logical to indicate if the quantities for preprocessing have been estimated. |
knots |
An integer vector of breakpoints to define the spline. These should
include the |
basis_mat |
The matrix of basis kernels to convolve. This is
|
spline_fun |
Function used for calculating |
options |
The arguments to pass to |
prefix |
A prefix for generated column names. Defaults to "distributed_lag_". |
keep_original_cols |
A logical to keep the original variables in the output. Defaults to FALSE. |
columns |
A character string of variable names that will be populated elsewhere. |
skip |
A logical. Should the step be skipped when the
recipe is baked by |
id |
A character string that is unique to this step to identify it. |
This step assumes that the data are already in the proper sequential order for lagging. The input should be sampled at a regular interval (time, space, etc.). When the recipe is baked, a set of vectors resulting from the convolution of a vector and a basis matrix is returned. Distributed lags can be used to model a delayed response to an input in a flexible manner with fewer regressor terms. The method achieves this by convolving an input stress with a basis lag matrix (commonly a spline function), which yields a set of regressors with fewer terms that is still capable of describing both fast and slow responses.
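The idea of replacing many individual lag terms with a few basis-convolved regressors can be sketched in base R. This is an illustrative sketch under stated assumptions, not the step's internal code: the helper `convolve_basis`, the column names, the interior knots at lags 4 and 16, and the simulated data are all hypothetical.

```r
# Describe lags 0..72 with a natural spline basis instead of 73 lag terms
max_lag <- 72
lags <- 0:max_lag
basis <- splines::ns(
  lags,
  knots = c(4, 16),                  # hypothetical interior knots
  Boundary.knots = c(0, max_lag),
  intercept = TRUE
)
# Plain numeric matrix: one row per lag, one column per basis kernel
basis_mat <- matrix(as.numeric(basis), nrow = nrow(basis))

# Convolve the input with each basis column (hypothetical helper).
# stats::filter(sides = 1) computes sum_k kernel[k + 1] * x[t - k] and
# returns NA for the first nrow(basis_mat) - 1 rows.
convolve_basis <- function(x, basis_mat) {
  out <- vapply(
    seq_len(ncol(basis_mat)),
    function(j) {
      as.numeric(stats::filter(x, basis_mat[, j],
                               method = "convolution", sides = 1))
    },
    numeric(length(x))
  )
  colnames(out) <- paste0("basis_lag_", seq_len(ncol(basis_mat)))
  out
}

# Simulated input and a delayed response with an exponential lag kernel
set.seed(42)
x <- rnorm(1000)
true_kernel <- exp(-lags / 10)
y <- as.numeric(stats::filter(x, true_kernel,
                              method = "convolution", sides = 1)) +
  rnorm(1000, sd = 0.1)

regressors <- convolve_basis(x, basis_mat)   # 4 columns instead of 73
fit <- lm(y ~ regressors)                    # NA rows are dropped by lm()

# Lag response implied by the fitted coefficients, one value per lag
est_kernel <- as.numeric(basis_mat %*% coef(fit)[-1])
```

Here four regressors stand in for 73 lagged copies of the input, and `est_kernel` maps the fitted coefficients back to a response at every lag. How closely it tracks the true kernel depends on the knot placement, which is the role of the `knots` argument.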
An updated version of recipe with the new step added to the sequence of any existing operations.
Almon, S. (1965). The Distributed Lag Between Capital Appropriations and Expenditures. Econometrica 33(1), 178-196.
Gasparrini A. Distributed lag linear and non-linear models in R: the package dlnm. Journal of Statistical Software. 2011; 43(8):1-20. https://doi.org/10.18637/jss.v043.i08
step_lead_lag()
recipes::step_lag()
Other row operation steps:
step_lead_lag()
data(wipp30)

rec_base <- recipe(wl ~ baro, data = wipp30)

# default uses splines::ns
rec <- rec_base |>
  step_distributed_lag(baro, knots = log_lags(4, 72)) |>
  prep()

# use different spline function
rec <- rec_base |>
  step_distributed_lag(baro,
                       spline_fun = splines::bs,
                       options = list(intercept = TRUE, degree = 4L),
                       knots = log_lags(4, 72)) |>
  prep()

# specify basis_mat
basis_mat <- splines2::mSpline(0:72, knots = c(3, 16))
rec <- rec_base |>
  step_distributed_lag(baro, basis_mat = basis_mat) |>
  prep()