lag_column: Lag a Column Based on Date and Time Range

lag_column R Documentation

Description

This function generates a lagged version of a column by matching observations on a date variable. The lag can be a single value or a range running from lag to max_lag. NA values can optionally be dropped.

Usage

lag_column(column, date, lag, max_lag = lag, drop_na = TRUE)

Arguments

column

A numeric vector or column to be lagged.

date

A vector representing dates corresponding to the column. This should be in a date or datetime format.

lag

The minimum lag to apply to column, in the units of date (days, hours, etc.). Usually an integer; the Examples also pass a period such as months(3).

max_lag

The maximum lag to apply to column, in the same units as lag. Defaults to lag, which gives a single fixed lag rather than a range.

drop_na

A logical value indicating whether to drop NA values from the resulting lagged column. Defaults to TRUE.
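
The units of lag and max_lag follow from how the date vector is stored. As an illustration in base R only (whether lag_column subtracts the lag in exactly this way is an assumption, not something this page states), integer arithmetic on a Date moves by days, while on a POSIXct datetime it moves by seconds, so an hourly lag is safer to express as a difftime:

```r
# Base R date arithmetic only; this does not call lag_column().
day <- as.Date("2023-01-10")
day_minus_3 <- day - 3L                      # Date arithmetic counts days

stamp <- as.POSIXct("2023-01-10 12:00:00", tz = "UTC")
stamp_minus_3 <- stamp - 3                   # POSIXct arithmetic counts seconds
stamp_minus_1h <- stamp - as.difftime(1, units = "hours")

print(day_minus_3)
print(stamp_minus_3)
print(stamp_minus_1h)
```

If the date column is a datetime, it is worth checking a single known observation by hand to confirm which unit the lag is being interpreted in.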

Value

A vector of the same length as column, containing the lagged values. If no matching dates are found within the lag window, NA is returned for that position.
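
To make the lag window concrete, here is a minimal base R sketch of one plausible reading of the behaviour described above. It is not the tidyfinance implementation: the helper name lag_window_sketch is hypothetical, and two details are assumptions, namely that the most recent observation inside the window [date - max_lag, date - lag] is the one returned, and that drop_na removes missing source values before matching.

```r
# Hypothetical helper, NOT the package's code: illustrates window matching.
lag_window_sketch <- function(column, date, lag, max_lag = lag, drop_na = TRUE) {
  stopifnot(length(column) == length(date))

  # Assumption: drop_na removes missing source values before matching
  keep <- if (drop_na) !is.na(column) else rep(TRUE, length(column))
  src_date <- date[keep]
  src_value <- column[keep]

  out <- rep(NA_real_, length(column))
  for (i in seq_along(date)) {
    lower <- date[i] - max_lag
    upper <- date[i] - lag
    in_window <- which(src_date >= lower & src_date <= upper)
    if (length(in_window) > 0) {
      # Assumption: take the most recent observation inside the window
      newest <- in_window[which.max(as.numeric(src_date[in_window]))]
      out[i] <- src_value[newest]
    }
    # Otherwise the position stays NA, as described under Value
  }
  out
}

dates <- as.Date("2023-01-01") + 0:4
values <- c(10, 20, NA, 40, 50)

res_drop <- lag_window_sketch(values, dates, lag = 1, max_lag = 2, drop_na = TRUE)
res_keep <- lag_window_sketch(values, dates, lag = 1, max_lag = 2, drop_na = FALSE)
print(res_drop)
print(res_keep)
```

Under these assumptions the output always has the same length as the input, and drop_na only changes which source observation fills a position, not how many positions are returned.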

Examples

# Basic example with a vector
dates <- as.Date("2023-01-01") + 0:9
values <- rnorm(10)
lagged_values <- lag_column(values, dates, lag = 1, max_lag = 3)

# Example using a tibble and dplyr::group_by
data <- tibble::tibble(
  permno = rep(1:2, each = 10),
  date = rep(seq.Date(as.Date('2023-01-01'), by = "month", length.out = 10), 2),
  size = runif(20, 100, 200),
  bm = runif(20, 0.5, 1.5)
)

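# months(3) relies on lubridate's months() method for numbers;
# attaching lubridate makes that dependency explicit
library(lubridate)

# Lag size and bm by 3 to 6 months within each permno
data |>
  dplyr::group_by(permno) |>
  dplyr::mutate(
    dplyr::across(c(size, bm),
                  \(x) lag_column(x, date, months(3), months(6), drop_na = TRUE))
  ) |>
  dplyr::ungroup()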


tidyfinance documentation built on Sept. 11, 2024, 7:08 p.m.