knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)

gsample

gsample offers a drop-in replacement for the R base::sample() functions for random sampling, with considerably better performance for the case of weighted sampling without replacement (both from the speed and memory point of view).

Installation

You can install gsample from GitHub with:

# install.packages("devtools")
devtools::install_github("vgherard/gsample")

Example

The gsample API is identical to the one of base::sample():

library(gsample)
n <- 1e3
size <- 1e2
prob <- exp(rnorm(n, sd = 3))
gsample.int(n, size, prob = prob)
x <- letters
size <- 10
prob <- exp(rnorm(length(letters), sd = 3))
gsample(x, size, prob = prob)

Here are some simple benchmark comparisons with base::sample():

library(dplyr)
library(ggplot2)
set.seed(840)
n <- 1e6
prob <- rexp(n)

bm <- lapply(10 ^ (1:5), function(size) {
    bench::mark(
        gsample.int(n, size, prob = prob),
        sample.int(n, size, prob = prob),
        check = FALSE
        ) %>%
        select(expression, median) %>%
        mutate(expression = as.character(expression)) %>%
            mutate(size = size)
    }) 

bind_rows(bm) %>%
    ggplot(aes(x = size, y = 1e3 * median, colour = expression)) +
        geom_line() +
        scale_x_continuous(trans = "log10") +
        scale_y_continuous("median (ms)", trans = "log10")


vgherard/gsample documentation built on Jan. 26, 2021, 1:41 a.m.