extras: Extra utilities

setdiff_ R Documentation

Extra utilities

Description

Extra utilities

Usage

setdiff_(x, y, dups = TRUE)

intersect_(x, y, dups = TRUE)

x %in_% table

x %!in_% table

sample_(x, size = vector_length(x), replace = FALSE, prob = NULL)

val_insert(x, value, n = NULL, prop = NULL)

na_insert(x, n = NULL, prop = NULL)

vector_length(x)

cheapr_var(x, na.rm = TRUE)

cheapr_rev(x)

cheapr_sd(x, na.rm = TRUE)

rev_(x)

sd_(x, na.rm = TRUE)

var_(x, na.rm = TRUE)

with_local_seed(expr, .seed = NULL, .envir = environment(), ...)

Arguments

x

A vector or data frame.

y

A vector or data frame.

dups

Should duplicates be kept? Default is TRUE.

table

See ?collapse::fmatch

size

See ?sample.

replace

See ?sample.

prob

See ?sample.

value

The scalar value to insert randomly into your vector.

n

Number of scalar values (or NA) to insert randomly into your vector.

prop

Proportion of scalar values (or NA) to insert randomly into your vector.

na.rm

Should NA values be ignored in var_() and sd_()? Default is TRUE.

expr

Expression to evaluate with a local seed. Evaluating it has no effect on the global RNG state.

.seed

A local seed that is only used inside with_local_seed(). After the expression has been evaluated, the original RNG state is restored.

.envir

Environment in which to evaluate the expression.

...

Further arguments passed on to set.seed().

Value

intersect_() returns a vector of common values between x and y.
setdiff_() returns a vector of values in x but not y.
%in_% and %!in_% both return a logical vector signifying if the values of x exist or don't exist in table respectively.
sample_() is an alternative to sample() that natively samples data frame rows through sset(). It also does not have a special case when length(x) is 1.
val_insert() inserts scalar values randomly into your vector. Useful for replacing lots of data with a single value.
na_insert() inserts NA values randomly into your vector. Useful for generating missing data.
var_() returns the variance of a numeric vector. Integer vectors are not coerced, which makes it very cheap.
rev_() is a much cheaper version of rev().
with_local_seed() offers no speed improvement but is handy for evaluating RNG-based expressions such as rnorm() without affecting the global RNG state. It runs the expression in a sort of independent 'container', with an optional seed for that 'container' for reproducibility. The rationale for including it in 'cheapr' is that it can reduce the need to set many seed values, especially when comparing the outputs of several RNG expressions. Put another way, with_local_seed() is a helper for writing reproducible code without the side effects that calling set.seed() directly would otherwise cause.
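
The save-and-restore behaviour described for with_local_seed() can be sketched in base R. This is only a minimal sketch, assuming that the global .Random.seed is saved before the expression is evaluated and put back afterwards. local_seed_sketch() is a hypothetical name used for illustration and is not the cheapr implementation.

```r
# Hypothetical helper written for illustration only;
# this is NOT the cheapr source code.
local_seed_sketch <- function(expr, seed = NULL) {
  global_env <- globalenv()
  # Make sure an RNG state exists so there is something to restore
  if (!exists(".Random.seed", envir = global_env, inherits = FALSE)) {
    set.seed(NULL)
  }
  old_state <- get(".Random.seed", envir = global_env, inherits = FALSE)
  # Restore the saved state on exit, even if `expr` throws an error
  on.exit(assign(".Random.seed", old_state, envir = global_env), add = TRUE)
  if (!is.null(seed)) {
    set.seed(seed)
  }
  # `expr` is a lazily evaluated promise, so it is only forced here,
  # after the optional seed has been set
  expr
}

set.seed(1)
state_before <- .Random.seed
draws <- local_seed_sketch(rnorm(3), seed = 42)

# The global RNG state is the same as before the call
identical(state_before, .Random.seed)
```

Restoring the state inside on.exit() means the global RNG is left as it was even when the expression fails part-way through.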

Examples

library(cheapr)
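
# Set operations (an illustrative sketch with made-up vectors;
# the base R functions are shown only for comparison)
x <- c(1L, 2L, 3L, 4L, 5L)
y <- c(3L, 4L, 5L, 6L, 7L)

# Values in x but not in y, and values common to both
setdiff_(x, y)
intersect_(x, y)

# Membership tests agree with base R's `%in%` and its negation
all((x %in_% y) == (x %in% y))
all((x %!in_% y) == !(x %in% y))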

# Using `with_local_seed()`

# The 2 statements below produce the same values

# Statement 1
set.seed(123456789)
res <- rnorm(10)

# Statement 2
res2 <- with_local_seed(rnorm(10), .seed = 123456789)

# They are the same
identical(res, res2)

# As an example we can see that the RNG is unaffected by generating
# random uniform deviates in batches between calls to `with_local_seed()`
# and comparing to the first result

set.seed(123456789)
batch1 <- rnorm(2)

with_local_seed(runif(10))
batch2 <- rnorm(2)
with_local_seed(runif(10))
batch3 <- rnorm(1)
with_local_seed(runif(10))
batch4 <- rnorm(5)

# Combining the batches produces the same result,
# therefore `with_local_seed()` did not interrupt the RNG sequence
identical(c(batch1, batch2, batch3, batch4), res)

# It can be useful in multiple comparisons: each call starts from the
# same global RNG state, so the outputs are identical
out1 <- with_local_seed(rnorm(5))
out2 <- with_local_seed(rnorm(5))
out3 <- with_local_seed(rnorm(5))

identical(out1, out2)
identical(out1, out3)
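
# The remaining helpers (an illustrative sketch; the input vector is
# made up, and the NA count assumes the input has no NA to begin with)
x <- 1:20

# Insert 5 NA values, or 5 zeros, at random positions
x_na <- na_insert(x, n = 5)
x_zero <- val_insert(x, 0L, n = 5)
sum(is.na(x_na))

# var_(), sd_() and rev_() agree with their base R counterparts
all.equal(var_(x_na), var(x_na, na.rm = TRUE))
all.equal(sd_(x_na), sd(x_na, na.rm = TRUE))
all(rev_(x) == rev(x))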


cheapr documentation built on Nov. 28, 2025, 5:06 p.m.