knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(rbatteries)
For the most part, `sample()` does the job very well. Given some vector, we can resample with replacement rather conveniently:
x <- 1:10
sample(x, replace = TRUE)
But what if the vector `x` is of size one?
x <- 42
sample(x, replace = TRUE)
Suddenly the interpretation becomes: sample from 1 up to and including 42.
Looking at how `sample` is defined, the relevant branch is essentially:
if (length(x) == 1L && is.numeric(x) && is.finite(x) && x >= 1) {
  # if (missing(size)) # missing is not available outside of function decl.
  #   size <- x
  size <- x # replaced with this
  sample.int(x, size, replace = TRUE)
}
This behaviour is documented in the details section of the help page of `sample`.
`resample`, on the other hand, behaves more intuitively.
resample(x, replace = TRUE)
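This behaviour can be obtained by sampling positions instead of values. The following is a minimal sketch modelled on the `resample` example from the `?sample` help page; the name `resample_sketch` is hypothetical, and the actual implementation in rbatteries may differ.

```r
# Hypothetical helper: sample positions, then index into x.
# sample.int() always treats its first argument as a count,
# so a length-one x is never reinterpreted as 1:x.
resample_sketch <- function(x, ...) x[sample.int(length(x), ...)]

x <- 42
resample_sketch(x, replace = TRUE)
#> [1] 42
```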
`resample` always behaves like this.

x <- 1:5
resample(x, n = 10)
This does not work, because `sample.int` already has an argument named `n`.
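To illustrate the clash, here is a sketch that assumes `resample` forwards its extra arguments to `sample.int()`, as in the `?sample` example; whether rbatteries does exactly this is an assumption. A named `n` then binds to the first formal of `sample.int()`, and the vector length is pushed into `size`.

```r
x <- 1:5

# Assumed forwarding call: what resample(x, n = 10) might turn into.
# Named matching gives n = 10; length(x) = 5 falls through to size.
idx <- sample.int(length(x), n = 10)

length(idx)  # 5 draws, not the 10 that were asked for
range(idx)   # positions are drawn from 1:10, not 1:5

# Any position above length(x) yields NA when indexing
x[idx]
```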
Fix `resample` using `missing()`, and also add a unit test for this.

Upsampling also results in `NA` values whenever `replace = FALSE`. This is not meaningful. There are two ways to solve this:
Set `replace = TRUE` whenever `n >= length(x)`.
It is not straightforward to change, because what if the user of `resample` calls it with `replace = FALSE` and `n >= length(x)`? A warning seems like the best way forward.
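A minimal sketch of the warning approach, assuming a signature `resample(x, n, replace)` that this vignette does not spell out. The name `resample_warn` and the fallback to sampling with replacement are assumptions, not the rbatteries implementation. The check uses `n > length(x)`, since drawing exactly `length(x)` values without replacement is still a valid permutation.

```r
# Hypothetical variant of resample(); not the rbatteries implementation.
resample_warn <- function(x, n, replace = FALSE) {
  # missing() is usable here because we are inside a function body
  if (missing(n)) n <- length(x)
  if (!replace && n > length(x)) {
    warning("n exceeds length(x); sampling with replacement instead")
    replace <- TRUE
  }
  x[sample.int(length(x), size = n, replace = replace)]
}

x <- 1:5
up <- resample_warn(x, n = 10)  # emits the warning above
length(up)
#> [1] 10
anyNA(up)
#> [1] FALSE
```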
`replace = FALSE` and `n >= length(x)`.

It is customary for the rows of a data frame to denote the observations and for the columns to correspond to variables[^regressors].
[^regressors]: Also called regressors, features, and of course covariates.
Take the `iris` dataset as an example. It contains `r nrow(iris)` observations, and say we want to downsample to three rows, for whatever reason.
resample(iris, n = 3)
This is not the expected result. The reason is that a data frame is a list of columns, so `resample` ends up drawing variables rather than observations.
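This can be checked directly in base R, where `length()` of a data frame counts variables and single-bracket indexing with one index vector selects them. A short illustration using only base functions:

```r
length(iris)  # number of columns (variables)
#> [1] 5
nrow(iris)    # number of rows (observations)
#> [1] 150

# Single-bracket indexing with one index vector picks columns, not rows
names(iris[c(1, 3)])
#> [1] "Sepal.Length" "Petal.Length"
```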
Here we have multiple ways of tackling this issue.
Make `resample` a generic, with specific behaviour for data frames.

`x` is of non-vector type.

A possible scenario: a one-column `data.frame` that we wish to `resample`, e.g.
sepal_length_10 <- resample(iris$Sepal.Length, n = 10)
sepal_length_10
class(sepal_length_10)
`data.frame`, with one column named `Sepal.Length`.
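One way to get a data-frame result is the generic approach mentioned earlier. The following is a minimal sketch using S3, where the name `resample2`, its methods, and the `n` argument are hypothetical and not part of rbatteries.

```r
# Hypothetical S3 generic; not the rbatteries API.
resample2 <- function(x, n, ...) UseMethod("resample2")

# Vectors: sample positions along the vector
resample2.default <- function(x, n = length(x), replace = FALSE, ...) {
  x[sample.int(length(x), size = n, replace = replace)]
}

# Data frames: sample rows; drop = FALSE keeps a one-column
# data frame from collapsing to a plain vector
resample2.data.frame <- function(x, n = nrow(x), replace = FALSE, ...) {
  x[sample.int(nrow(x), size = n, replace = replace), , drop = FALSE]
}

one_col <- iris["Sepal.Length"]
out <- resample2(one_col, n = 10)
class(out)
#> [1] "data.frame"
dim(out)
#> [1] 10  1
```

Dispatching on the class keeps the vector behaviour unchanged while giving data frames row-wise semantics.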