step_lead_lag: Create a lead predictor
In hydrorecipes: Hydrogeology Steps for the 'recipes' Package

View source: R/step_lead_lag.R

Description

step_lead_lag creates a specification of a recipe step that will add new columns that are shifted forward (lag) or backward (lead). By default, the new columns contain NA values at the positions where the shift moves values past the start or end of the series. These rows can be removed with recipes::step_naomit(). Samples should be ordered and regularly spaced (e.g. a regular time series or regular spatial sampling).
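To illustrate the shift direction and where the NA values appear, here is a minimal base R sketch. `shift_vec()` is a hypothetical helper, not part of hydrorecipes, and the column names only mimic the default prefix; the names that step_lead_lag() actually generates may differ.

```r
# Hypothetical helper (not part of hydrorecipes): shift a regularly spaced
# vector by k observations. Positive k lags, negative k leads, and positions
# with no source value are filled with NA.
shift_vec <- function(x, k) {
  n <- length(x)
  src <- seq_len(n) - k                # index of the value that lands at each row
  out <- rep(NA_real_, n)
  ok <- src >= 1 & src <= n
  out[ok] <- x[src[ok]]
  out
}

x <- c(10, 20, 30, 40, 50)
lags <- -2:2
shifted <- sapply(lags, function(k) shift_vec(x, k))
colnames(shifted) <- paste0("lead_lag_", lags)  # illustrative names only
print(shifted)

# Rows containing any NA are the ones recipes::step_naomit() would drop
complete <- shifted[complete.cases(shifted), , drop = FALSE]
print(complete)
```

With leads and lags of up to two observations on a five-value series, only the middle row has no NA, which is why shifting by large amounts can remove a noticeable number of rows once step_naomit() is applied.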

Usage

step_lead_lag(
  recipe,
  ...,
  role = "predictor",
  trained = FALSE,
  lag = 1,
  n_subset = 1,
  n_shift = 0,
  prefix = "lead_lag_",
  keep_original_cols = FALSE,
  columns = NULL,
  skip = FALSE,
  id = rand_id("lead_lag")
)

Arguments

`recipe`	A recipe object. The step will be added to the sequence of operations for this recipe.
`...`	One or more selector functions to choose variables for this step. See `selections()` for more details.
`role`	For model terms created by this step, what analysis role should they be assigned? By default, the new columns created by this step from the original variables will be used as predictors in a model.
`trained`	A logical to indicate if the quantities for preprocessing have been estimated.
`lag`	A vector of integers. Each selected column is shifted by each value in the vector. Positive values lag the column; negative values are also accepted and lead it (i.e. the reverse of lagging).
`n_subset`	A single integer. Keep every `n_subset`-th observation of the result.
`n_shift`	A single integer. The number of observations by which to shift the results.
`prefix`	A prefix for generated column names. Defaults to "lead_lag_".
`keep_original_cols`	A logical to keep the original variables in the output. Defaults to `FALSE`.
`columns`	A character string of variable names that will be populated (eventually) by the `terms` argument.
`skip`	A logical. Should the step be skipped when the recipe is baked by `bake()`? While all operations are baked when `prep()` is run, some operations may not be able to be conducted on new data (e.g. processing the outcome variable(s)). Care should be taken when using `skip = TRUE` as it may affect the computations for subsequent operations.
`id`	A character string that is unique to this step to identify it.
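A minimal base R sketch of how `lag`, `n_subset` and `n_shift` might interact, assuming the shifts are computed on the full series first and rows are then kept at `seq(1 + n_shift, n, by = n_subset)`. The helper `lag_subset()` is hypothetical, and this row indexing is an assumption rather than something taken from the hydrorecipes source.

```r
# Hypothetical sketch: build every requested shift on the full-resolution
# series first, then keep every n_subset-th row starting after n_shift rows.
# This row indexing is an assumption, not the hydrorecipes implementation.
lag_subset <- function(x, lags, n_subset = 1, n_shift = 0) {
  n <- length(x)
  shift_one <- function(k) {
    src <- seq_len(n) - k              # positive k = lag, negative k = lead
    out <- rep(NA_real_, n)
    ok <- src >= 1 & src <= n
    out[ok] <- x[src[ok]]
    out
  }
  full <- matrix(vapply(lags, shift_one, numeric(n)), nrow = n)
  colnames(full) <- paste0("lead_lag_", lags)  # illustrative names only
  rows <- seq(from = 1 + n_shift, to = n, by = n_subset)
  full[rows, , drop = FALSE]
}

x <- as.numeric(1:10)
res <- lag_subset(x, lags = -1:1, n_subset = 2, n_shift = 1)
print(res)
```

Under these assumptions the first kept row (observation 2) still carries observation 1 as its one-step lag, even though observation 1 itself is dropped. Subsetting after shifting is what lets a smaller output keep the full lag/lead history.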

Details

This step assumes that the data are already in the proper sequential order for lagging. It allows a vector to be shifted forward (lag) or backward (lead). Although forward shifts are the usual choice for lagged responses, a backward shift can be useful in some cases. One example is an unknown clock error between two sensors, which can make the response appear to occur before the input. Another is a cyclical signal whose alignment is unknown. The data can also be subsetted efficiently during the lag/lead process, which produces smaller model inputs while still using the entire lag/lead history.
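A sketch of the clock-error case on simulated data, using a hypothetical `shift()` helper. The response is constructed so that it appears two samples before the input, and scanning correlations over negative and positive shifts recovers that offset as a negative (lead) value. This only illustrates why leads can be useful; it is not a procedure provided by hydrorecipes.

```r
set.seed(1)
n <- 200
x <- rnorm(n)        # simulated input signal
offset <- 2          # hypothetical clock error, in samples

# Hypothetical helper: positive k lags, negative k leads, NA where undefined
shift <- function(v, k) {
  src <- seq_along(v) - k
  out <- rep(NA_real_, length(v))
  ok <- src >= 1 & src <= length(v)
  out[ok] <- v[src[ok]]
  out
}

# Response recorded with a clock running 'offset' samples fast:
# y[t] equals x[t + offset], so the response appears to precede the input
y <- shift(x, -offset)

# Correlate the response with the input at each candidate shift
lags <- -4:4
r <- vapply(lags, function(k) cor(y, shift(x, k), use = "complete.obs"),
            numeric(1))
best <- lags[which.max(r)]
print(round(setNames(r, lags), 3))
print(best)
```

The best alignment is a shift of -2, i.e. a lead. A model restricted to non-negative lags could not use this alignment, whereas a `lag` vector that includes negative values (for example `lag = -4:4`) can.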

Value

An updated version of recipe with the new step added to the sequence of any existing operations.

Examples

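library(recipes)
library(hydrorecipes)

data(wipp30)

# Leads of 2 and 1, the unshifted series, and lags of 1 and 2 for baro
recipe(wl ~ ., data = wipp30) |>
  step_lead_lag(baro, lag = -2:2, n_subset = 1, n_shift = 0) |>
  prep()

# Same shifts, keeping every second observation
recipe(wl ~ ., data = wipp30) |>
  step_lead_lag(baro, lag = -2:2, n_subset = 2, n_shift = 0) |>
  prep()

# Every second observation, with the results shifted by one observation
recipe(wl ~ ., data = wipp30) |>
  step_lead_lag(baro, lag = -2:2, n_subset = 2, n_shift = 1) |>
  prep()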

hydrorecipes documentation built on June 27, 2022, 9:06 a.m.