make_ref_unique: Make a lookup vector consisting of unique elements after...

View source: R/make_ref_unique.R

make_ref_unique    R Documentation

Make a lookup vector consisting of unique elements after applying fun

Description

For a vector x, apply fun to the unique values of x and return the result, named by those unique values.

Usage

make_ref_unique(x, dropNA = TRUE, fun, ...)

Arguments

x

(required) An input atomic vector

dropNA

Should NA be excluded after taking the unique values of x? Defaults to TRUE

fun

A character vector of length 1, denoting the function to apply to x

...

Additional (ideally named) arguments to pass to fun, if applicable

Details

This is a convenient way to create lookup vectors that can be used to transform the original vector via character subscripting. In some cases this is significantly more efficient than applying the transformation to the input directly.

Any improvement in performance depends heavily on the complexity of the operation (fun and any parameters passed via ...), as well as on the length and cardinality of x.
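The lookup technique described above can be sketched in base R alone. This is a minimal illustration of the general idea, not the package's implementation; the input vector and the choice of sub() are assumptions made for the example.

```r
# Hypothetical input: a character vector with many repeated values
x <- c("ABC101", "BCD202", "ABC101", "CDA303", "BCD202", "ABC101")

# Apply the transformation once per distinct value and name the
# results by the original values
ux <- unique(x)
lookup <- setNames(sub(pattern = "ABC", replacement = "", x = ux), ux)

# Expand back to the full length via character subscripting
via_lookup <- unname(lookup[x])

# Same result as transforming every element directly
identical(via_lookup, sub(pattern = "ABC", replacement = "", x = x))
```

Here sub() runs on three values instead of six; the saving only becomes meaningful when x is long and has few distinct values relative to its length.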

Value

A named vector of length length(unique(x)), or length(unique(x[!is.na(x)])) if dropNA == TRUE. A message is also emitted reporting the reduction in cardinality. If no reduction was detected, i.e. x was already entirely unique, the message instead states that this function offers effectively no benefit.
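The two lengths mentioned above differ only when x contains NA. The following base R sketch uses a made-up vector to show the difference, and also shows how character subscripting treats NA when the lookup has no NA entry; it illustrates base R behavior only and does not call make_ref_unique.

```r
# Hypothetical input containing missing values
x <- c("a", "b", NA, "a", NA, "c")

n_unique_all  <- length(unique(x))             # NA counts as a distinct value
n_unique_nona <- length(unique(x[!is.na(x)]))  # NA excluded

# A lookup built without an NA entry (toupper is just an example transform)
ux <- unique(x[!is.na(x)])
lookup <- setNames(toupper(ux), ux)

# Subscripting by NA yields NA, so missing inputs stay missing
result <- unname(lookup[x])
```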

Note

fun should be passed as an explicitly named argument, e.g. fun = "gsub", because dropNA precedes it in the argument order. It may also be necessary to pass x explicitly within ... when fun does not accept x as its first argument and the corresponding argument is named something other than x.
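The reason for naming fun follows from R's positional argument matching. The sketch below uses a hypothetical stub, show_binding, that copies only the argument order from the Usage section and reports how its arguments were matched; it is not part of the package and performs no transformation.

```r
# Hypothetical stub with the same argument order as in Usage
show_binding <- function(x, dropNA = TRUE, fun, ...) {
  fun_missing <- missing(fun)
  list(dropNA = dropNA, fun_missing = fun_missing)
}

# Positional: "sub" is matched to dropNA, and fun is left unset
positional <- show_binding(c("a", "b"), "sub")

# Named: fun is matched by name, and dropNA keeps its default
named <- show_binding(c("a", "b"), fun = "sub")
```

In the positional call the function name silently lands in dropNA, which is why the named form is the safer habit.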

Examples

# an ideal scenario
set.seed(10)
vec_x <- rep(
  replicate(1E4,
            paste0(
              paste(sample(LETTERS[1:4], 3, replace = TRUE), collapse = ""),
              sample(100L:1000L, 1)
            )),
  200
)

# using a lookup table to subscript
system.time(
  via_subscript <- make_ref_unique(vec_x, fun = "sub", pattern = "ABC", replacement = "")[vec_x]
)
# versus direct application
system.time(via_direct <- sub(x = vec_x, pattern = "ABC", replacement = ""))

# check
identical(unname(via_subscript), via_direct)

slin30/wzMisc documentation built on Jan. 27, 2023, 1 a.m.