hop_index: Hop relative to an index
In DavisVaughan/slurrr: Sliding Window Functions

Description

hop_index() is the lower-level engine that powers slide_index(). It has slightly different invariants than slide_index(), and is useful when you either need to hand-craft boundary values, or want to compute a result whose size differs from that of .x.

Usage

hop_index(.x, .i, .starts, .stops, .f, ...)

hop_index_vec(.x, .i, .starts, .stops, .f, ..., .ptype = NULL)

Arguments

`.x`	`[vector]` The vector to iterate over and apply `.f` to.
`.i`	`[vector]` The index vector that determines the window sizes. It is fairly common to supply a date vector as the index, but not required. There are 3 restrictions on the index: (1) the size of the index must match the size of `.x`; the two will not be recycled to a common size; (2) the index must be an increasing vector, although duplicate values are allowed; (3) the index cannot have missing values.
`.starts`, `.stops`	`[vector]` Vectors of boundary values that make up the windows to bucket `.i` with. Both `.starts` and `.stops` will be recycled to their common size, and that common size will be the size of the result. Both vectors will be cast to the type of `.i` using `vctrs::vec_cast()`. These boundaries are both inclusive, meaning that the slice of `.x` that will be used in each call to `.f` is where `.i >= start & .i <= stop` returns `TRUE`.
`.f`	`[function / formula]` If a function, it is used as is. If a formula, e.g. `~ .x + 2`, it is converted to a function. There are three ways to refer to the arguments: for a single-argument function, use `.`; for a two-argument function, use `.x` and `.y`; for more arguments, use `..1`, `..2`, `..3`, etc. This syntax allows you to create very compact anonymous functions.
`...`	Additional arguments passed on to the mapped function.
`.ptype`	`[vector(0) / NULL]` A prototype corresponding to the type of the output. If `NULL`, the default, the output type is determined by computing the common type across the results of the calls to `.f`. If supplied, the result of each call to `.f` will be cast to that type, and the final output will have that type. If `getOption("vctrs.no_guessing")` is `TRUE`, `.ptype` must be supplied. This is a way to make production code demand fixed types.
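
The inclusive boundary rule described for `.starts` and `.stops` can be illustrated with a small base R sketch. The helper `hop_index_sketch()` below is hypothetical and is not the package's implementation: it skips the casting to the type of `.i`, only recycles size-1 boundaries, and does not support formulas. It is meant only to show which elements of `.x` reach `.f` for each window, and that the result has one element per window rather than one per element of `.x`.

```r
# Hypothetical helper (not part of the package): a base R sketch of the
# inclusive windowing rule `.i >= start & .i <= stop`.
hop_index_sketch <- function(x, i, starts, stops, f, ...) {
  # The index must match `x` in size, be increasing, and have no NAs.
  stopifnot(length(x) == length(i), !anyNA(i), !is.unsorted(i))

  # Simplified recycling: only size-1 boundaries are recycled here.
  n <- max(length(starts), length(stops))
  if (length(starts) == 1L) starts <- rep(starts, n)
  if (length(stops) == 1L) stops <- rep(stops, n)
  stopifnot(length(starts) == n, length(stops) == n)

  # One call to `f` per window; both boundaries are inclusive.
  lapply(seq_len(n), function(k) {
    in_window <- i >= starts[k] & i <= stops[k]
    f(x[in_window], ...)
  })
}

i <- c(1, 2, 2, 5, 9)
x <- c(10, 20, 30, 40, 50)

# Two windows, [1, 2] and [3, 9], so the result has size 2, not 5.
out <- hop_index_sketch(x, i, starts = c(1, 3), stops = c(2, 9), f = sum)
```

The first window picks up both duplicated index values equal to `2` because the stop boundary is inclusive.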

Value

A vector fulfilling the following invariants:

`hop_index()`

vec_size(hop_index(.x, .i, .starts, .stops)) == vec_size_common(.starts, .stops)
vec_ptype(hop_index(.x, .i, .starts, .stops)) == list()

`hop_index_vec()`

vec_size(hop_index_vec(.x, .i, .starts, .stops)) == vec_size_common(.starts, .stops)
vec_size(hop_index_vec(.x, .i, .starts, .stops)[[1]]) == 1L
vec_ptype(hop_index_vec(.x, .i, .starts, .stops, .ptype = ptype)) == ptype
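
As a rough illustration of the `hop_index_vec()` invariants, the base R sketch below uses `vapply()` to require that every call to `.f` returns a single value of a fixed type. The helper `hop_index_vec_sketch()` is hypothetical, and hard-coding a double result is an assumption standing in for `.ptype = double()`; the package itself relies on vctrs casting rather than `vapply()`.

```r
# Hypothetical helper (not part of the package): each window must produce
# exactly one double, and the results are combined into an atomic vector.
hop_index_vec_sketch <- function(x, i, starts, stops, f) {
  stopifnot(length(x) == length(i), length(starts) == length(stops))
  vapply(
    seq_along(starts),
    function(k) f(x[i >= starts[k] & i <= stops[k]]),
    FUN.VALUE = numeric(1)
  )
}

i <- as.Date("2019-01-01") + c(0, 1, 40)
x <- c(1, 2, 3)
starts <- as.Date(c("2019-01-01", "2019-02-01"))
stops <- as.Date(c("2019-01-31", "2019-02-28"))

# One value per window: the size follows the boundaries, not `x`.
out <- hop_index_vec_sketch(x, i, starts, stops, mean)

# A function returning more than one value per window violates the
# size-1 invariant, which this sketch surfaces as an error.
bad <- tryCatch(
  hop_index_vec_sketch(x, i, starts, stops, range),
  error = function(e) e
)
```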

Examples

library(vctrs)
library(lubridate, warn.conflicts = FALSE)

# ---------------------------------------------------------------------------
# Returning a size smaller than `.x`

i <- as.Date("2019-01-25") + c(0, 1, 2, 3, 10, 20, 35, 42, 45)

# slide_index() allows you to slide relative to `i`
slide_index(i, i, ~.x, .before = weeks(1))

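# But you might be more interested in coarser summaries. This groups
# by year-month and applies `.f` to 2-month windows (the current month and
# the one before it). The index counts whole months, so `.before = 1`
# means "one month back" rather than "one year back".
i_yearmonth <- year(i) * 12 + month(i)
slide_index(i, i_yearmonth, ~.x, .before = 1)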

# ^ This works nicely when working with dplyr if you are trying to create
# a new column in a data frame, but you'll notice that there are really only
# 3 months, so only 3 values are being calculated. If you only want to return
# a vector of those 3 values, you can use `hop_index()`. You'll have to
# hand craft the boundaries, but this is a general strategy
# I've found useful:
first_start <- floor_date(i[1], "months")
last_stop <- ceiling_date(i[length(i)], "months")
dates <- seq(first_start, last_stop, "1 month")
inner <- dates[2:(length(dates) - 1L)]
starts <- vec_c(first_start, inner)
stops <- vec_c(inner - 1, last_stop)

hop_index(i, i, starts, stops, ~.x)

# ---------------------------------------------------------------------------
# Non-existent dates with `lubridate::months()`

# Imagine you want to compute a 1 month rolling average on this
# irregular daily data.
i <- vec_c(as.Date("2019-02-27") + 0:3, as.Date("2019-03-27") + 0:5)
x <- rnorm(vec_size(i))

# You might try `slide_index()` like this, but you'd run into this error
library(rlang)

with_options(
  catch_cnd(
    slide_index(x, i, mean, .before = months(1))
  ),
  rlang_backtrace_on_error = current_env()
)

# This is because when you actually compute the `.i - .before` sequence,
# you hit non-existent dates, e.g. `"2019-03-29" - months(1)` doesn't exist.
i - months(1)

# To get around this, lubridate provides `add_with_rollback()`,
# and the shortcut operator `%m-%`, which subtracts the month, then rolls
# back to the last day of that month if the result would be `NA`. You can
# manually generate boundaries, then provide them to `hop_index()`.
starts <- i %m-% months(1)
stops <- i

hop_index(x, i, starts, stops, mean)

hop_index(i, i, starts, stops, ~.x)
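
To make the rollback behaviour used for `starts` more concrete, here is a base R sketch of "subtract one month, then clamp to the last day of the target month". The helper `minus_one_month_rollback()` is hypothetical and is not how lubridate implements `%m-%`; it only mirrors the rollback-to-last-day rule for `Date` inputs.

```r
# Hypothetical helper (not lubridate's implementation): subtract one month
# from a Date vector, rolling back to the last day of the target month when
# the day of the month does not exist there.
minus_one_month_rollback <- function(d) {
  lt <- as.POSIXlt(d)

  # Count months since year 0, then step back one month.
  target <- (lt$year + 1900L) * 12L + lt$mon - 1L
  following <- target + 1L

  first_of_target <- as.Date(
    sprintf("%04d-%02d-01", target %/% 12L, target %% 12L + 1L)
  )
  first_of_following <- as.Date(
    sprintf("%04d-%02d-01", following %/% 12L, following %% 12L + 1L)
  )

  # Number of days in the target month.
  days_in_target <- as.integer(first_of_following - first_of_target)

  # Clamp the day of the month to what the target month can hold.
  day <- pmin(lt$mday, days_in_target)
  first_of_target + (day - 1L)
}

d <- as.Date(c("2019-03-27", "2019-03-29", "2019-01-31"))
rolled <- minus_one_month_rollback(d)
```

With these inputs, `"2019-03-29"` maps to the last day of February 2019 instead of producing `NA`, which is why boundaries built this way can be passed to `hop_index()` without error.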

DavisVaughan/slurrr documentation built on Feb. 17, 2025, 3:12 p.m.