hop_index: Hop relative to an index

View source: R/hop-index.R

hop_indexR Documentation

Hop relative to an index

Description

hop_index() is the lower level engine that powers slide_index(). It has slightly different invariants than slide_index(), and is useful when you either need to hand craft boundary values, or want to compute a result with a size that is different from .x.

Usage

hop_index(.x, .i, .starts, .stops, .f, ...)

hop_index_vec(.x, .i, .starts, .stops, .f, ..., .ptype = NULL)

Arguments

.x

⁠[vector]⁠

The vector to iterate over and apply .f to.

.i

⁠[vector]⁠

The index vector that determines the window sizes. It is fairly common to supply a date vector as the index, but not required.

There are 3 restrictions on the index:

  • The size of the index must match the size of .x, they will not be recycled to their common size.

  • The index must be an increasing vector, but duplicate values are allowed.

  • The index cannot have missing values.

.starts, .stops

⁠[vector]⁠

Vectors of boundary values that make up the windows to bucket .i with. Both .starts and .stops will be recycled to their common size, and that common size will be the size of the result. Both vectors will be cast to the type of .i using vctrs::vec_cast(). These boundaries are both inclusive, meaning that the slice of .x that will be used in each call to .f is where .i >= start & .i <= stop returns TRUE.

.f

⁠[function / formula]⁠

If a function, it is used as is.

If a formula, e.g. ~ .x + 2, it is converted to a function. There are three ways to refer to the arguments:

  • For a single argument function, use .

  • For a two argument function, use .x and .y

  • For more arguments, use ..1, ..2, ..3 etc

This syntax allows you to create very compact anonymous functions.

...

Additional arguments passed on to the mapped function.

.ptype

⁠[vector(0) / NULL]⁠

A prototype corresponding to the type of the output.

If NULL, the default, the output type is determined by computing the common type across the results of the calls to .f.

If supplied, the result of each call to .f will be cast to that type, and the final output will have that type.

If getOption("vctrs.no_guessing") is TRUE, the .ptype must be supplied. This is a way to make production code demand fixed types.

Value

A vector fulfilling the following invariants:

hop_index()

  • vec_size(hop_index(.x, .starts, .stops)) == vec_size_common(.starts, .stops)

  • vec_ptype(hop_index(.x, .starts, .stops)) == list()

hop_index_vec()

  • vec_size(hop_index_vec(.x, .starts, .stops)) == vec_size_common(.starts, .stops)

  • vec_size(hop_index_vec(.x, .starts, .stops)[[1]]) == 1L

  • vec_ptype(hop_index_vec(.x, .starts, .stops, .ptype = ptype)) == ptype

See Also

slide(), slide_index(), hop_index2()

Examples

library(vctrs)
library(lubridate, warn.conflicts = FALSE)

# ---------------------------------------------------------------------------
# Returning a size smaller than `.x`

i <- as.Date("2019-01-25") + c(0, 1, 2, 3, 10, 20, 35, 42, 45)

# slide_index() allows you to slide relative to `i`
slide_index(i, i, ~.x, .before = weeks(1))

# But you might be more interested in coarser summaries. This groups
# by year-month and computes 2 `.f` on 2 month windows.
i_yearmonth <- year(i) + (month(i) - 1) / 12
slide_index(i, i_yearmonth, ~.x, .before = 1)

# ^ This works nicely when working with dplyr if you are trying to create
# a new column in a data frame, but you'll notice that there are really only
# 3 months, so only 3 values are being calculated. If you only want to return
# a vector of those 3 values, you can use `hop_index()`. You'll have to
# hand craft the boundaries, but this is a general strategy
# I've found useful:
first_start <- floor_date(i[1], "months")
last_stop <- ceiling_date(i[length(i)], "months")
dates <- seq(first_start, last_stop, "1 month")
inner <- dates[2:(length(dates) - 1L)]
starts <- vec_c(first_start, inner)
stops <- vec_c(inner - 1, last_stop)

hop_index(i, i, starts, stops, ~.x)

# ---------------------------------------------------------------------------
# Non-existant dates with `lubridate::months()`

# Imagine you want to compute a 1 month rolling average on this
# irregular daily data.
i <- vec_c(as.Date("2019-02-27") + 0:3, as.Date("2019-03-27") + 0:5)
x <- rnorm(vec_seq_along(i))

# You might try `slide_index()` like this, but you'd run into this error
library(rlang)

with_options(
  catch_cnd(
    slide_index(x, i, mean, .before = months(1))
  ),
  rlang_backtrace_on_error = current_env()
)

# This is because when you actually compute the `.i - .before` sequence,
# you hit non-existant dates. i.e. `"2019-03-29" - months(1)` doesn't exist.
i - months(1)

# To get around this, lubridate provides `add_with_rollback()`,
# and the shortcut operation `%m-%`, which subtracts the month, then rolls
# forward/backward if it hits an `NA`. You can manually generate boundaries,
# then provide them to `hop_index()`.
starts <- i %m-% months(1)
stops <- i

hop_index(x, i, starts, stops, mean)

hop_index(i, i, starts, stops, ~.x)


DavisVaughan/slurrr documentation built on Oct. 29, 2024, 2:28 p.m.