knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(rbatteries)

For the most part, sample() does the job very well. Given some vector, we can conveniently resample it with replacement:

x <- 1:10
sample(x, replace = TRUE)

But what if the vector x is of size one?

x <- 42
sample(x, replace = TRUE)

Suddenly the call is interpreted as sampling from the integers 1 to 42, inclusive.
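A small check makes the change in interpretation concrete. It uses only base R; the seed is there purely for reproducibility.

```r
set.seed(1)                          # only for reproducibility
draws <- sample(42, replace = TRUE)  # same call as above, with x inlined

length(draws)          # 42 values are returned, not 1
all(draws %in% 1:42)   # TRUE: every draw lies in 1..42
```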

Looking at the source of sample, we find the following branch (lightly adapted here so that it runs outside the function body):

if (length(x) == 1L && is.numeric(x) && is.finite(x) && x >= 1) {
  # Inside sample() this reads `if (missing(size)) size <- x`.
  # missing() is only available inside a function, so size is set directly.
  size <- x
  sample.int(x, size, replace = TRUE)
}

This behaviour is documented in the details section of the help page of sample.

resample(), on the other hand, behaves more intuitively.

resample(x, replace = TRUE)
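
For intuition, here is a minimal sketch of how such a function can be written. It follows the resample example given in the help page of sample. The name resample_sketch is hypothetical, and this is not necessarily how rbatteries implements resample.

```r
# Hypothetical stand-in; named differently to avoid masking rbatteries::resample.
# Sampling positions rather than values keeps a length-one x from being
# reinterpreted as 1:x.
resample_sketch <- function(x, ...) x[sample.int(length(x), ...)]

resample_sketch(42, replace = TRUE)            # 42
resample_sketch(42, size = 3, replace = TRUE)  # 42 42 42
```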

Resample for upsampling and downsampling

x <- 1:5
resample(x, n = 10)

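This does not work as intended: sample.int already has an argument named n (the number of items to sample from), so the n passed here is not interpreted as the desired sample size.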
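
The clash can be reproduced with base R alone. The sketch assumes that the extra arguments are forwarded to sample.int(length(x), ...), as in the resample example from the help page of sample; whether rbatteries does exactly this is an assumption.

```r
set.seed(1)                    # only for reproducibility

# n = 10 is matched by name, so the positional 5 falls through to `size`:
# this draws 5 indices from 1..10, not 10 indices from 1..5.
idx <- sample.int(5, n = 10)

length(idx)          # 5
all(idx %in% 1:10)   # TRUE

# Any index above length(x) produces NA when used for subsetting.
(1:5)[c(2, 7)]       # 2 NA
```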

Upsampling also results in NA values whenever replace = FALSE. This is not meaningful. There are two ways to solve this.

This is not straightforward to change: what should happen if the user of resample calls it with replace = FALSE and n >= length(x)? Issuing a warning seems like the best way forward.
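
A minimal sketch of the warning approach follows. The helper resample_n and its arguments are hypothetical, not part of rbatteries. It also assumes that the problematic case is n strictly greater than length(x), since n equal to length(x) without replacement is just a permutation.

```r
# Hypothetical helper illustrating the warning approach.
resample_n <- function(x, n = length(x), replace = FALSE) {
  if (!replace && n > length(x)) {
    # Assumption: fall back to sampling with replacement instead of returning NAs.
    warning("n exceeds length(x); sampling with replacement instead.")
    replace <- TRUE
  }
  x[sample.int(length(x), size = n, replace = replace)]
}

x <- 1:5
up <- suppressWarnings(resample_n(x, n = 10))  # upsampling
length(up)   # 10
anyNA(up)    # FALSE

down <- resample_n(x, n = 3)                   # downsampling, no warning
length(down) # 3
```

Whether to fall back silently, warn, or stop with an error is a design choice; the warning keeps the call usable while still flagging that replace = FALSE could not be honoured.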

Resampling datasets

It is customary for the rows of a data frame to denote observations and for the columns to correspond to variables[^regressors].

[^regressors]: Also called regressors, features, and of course covariates.

Take the iris dataset as an example. It contains `r nrow(iris)` observations, and say we want to downsample to three rows, for whatever reason.

resample(iris, n = 3)

This is not the expected result. The reason is that we are resampling along the first dimension, which for a data frame is the one designated to variables, not observations.

Here we have multiple ways of tackling this issue.
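
One of these ways, sketched with base R only, is to sample row indices explicitly and subset on the row dimension. This illustrates the idea and is not the rbatteries interface.

```r
# A data frame is a list of columns, so its length is the number of variables
# and single-index subsetting selects columns.
length(iris)            # 5
names(iris[c(1, 3)])    # "Sepal.Length" "Petal.Length"

# Sampling row indices and subsetting rows gives the intended downsample.
rows <- sample.int(nrow(iris), size = 3)
iris_3 <- iris[rows, , drop = FALSE]
dim(iris_3)             # 3 5
```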

A possible scenario is a one-column data.frame that we wish to resample, e.g.:

sepal_length_10 <- resample(iris$Sepal.Length, n = 10)
sepal_length_10
class(sepal_length_10)
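
Extracting the column with $ yields a plain vector, so the result above is no longer a data.frame. If the one-column data.frame structure should survive, one option in base R is to subset rows with drop = FALSE. The sketch below does not use the rbatteries API.

```r
rows <- sample.int(nrow(iris), size = 10)

# drop = FALSE keeps the data.frame structure even with a single column.
sepal_df <- iris[rows, "Sepal.Length", drop = FALSE]
class(sepal_df)                     # "data.frame"
dim(sepal_df)                       # 10 1

# Without drop = FALSE the same subset collapses to a numeric vector.
class(iris[rows, "Sepal.Length"])   # "numeric"
```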


CGMossa/rbatteries documentation built on Oct. 30, 2019, 5:29 a.m.