blb: Bag of little bootstraps

View source: R/bootstraps.R

blb R Documentation

Bag of little bootstraps

Description

Bag of little bootstraps (BLB) as described by Kleiner et al. (2014), implemented with adaptive convergence checking.
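For orientation, the basic procedure can be sketched as follows. This is a minimal illustration of plain BLB with a fixed number of subsets and resamples and without the adaptive convergence checking; blb_sketch and its arguments b, s, and r are hypothetical names used only here, and the actual implementation of blb() may differ.

```r
# Hypothetical helper (not part of the package): plain BLB without convergence checks
blb_sketch <- function(data, b, s, r, fun_estimator, fun_metric) {
  n <- nrow(data)
  metrics <- lapply(seq_len(s), function(j) {
    # Draw the j-th subset of b rows without replacement
    x <- data[sample.int(n, size = b), , drop = FALSE]
    est <- lapply(seq_len(r), function(k) {
      # The k-th resample is a vector of multinomial counts that sum to n
      w <- as.vector(rmultinom(1, size = n, prob = rep(1 / b, b)))
      fun_estimator(x, w)
    })
    est <- do.call(rbind, est) # r rows, one column per estimator
    # Quality assessment of each estimator within subset j
    m <- apply(est, 2, fun_metric)
    if (is.null(dim(m))) {
      m <- matrix(m, nrow = 1, dimnames = list(NULL, colnames(est)))
    }
    m
  })
  # Average the quality assessments across the s subsets
  Reduce(`+`, metrics) / s
}

set.seed(1)
d <- data.frame(y = rnorm(2000, mean = 5))
res <- blb_sketch(
  d, b = 200, s = 3, r = 50,
  fun_estimator = function(x, weights) {
    c(mean = weighted.mean(x[, "y"], weights))
  },
  fun_metric = function(x) quantile(x, probs = c(0.025, 0.975))
)
```

In this sketch, res has one row per quantile and one column per estimator.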

Usage

blb(
  data,
  subset_size_b = nrow(data)^0.7,
  n_subsets = NA,
  n_resamples = 100,
  window_subsets = 3,
  window_resamples = 20,
  epsilon = 0.05,
  fun_estimator = NULL,
  fun_metric = NULL
)

Arguments

data

A two-dimensional numerical data object.

subset_size_b

An integer value. The number of rows in each bootstrap subset. Kleiner et al. (2014) suggest, based on empirical results, a value of nrow(data) ^ lambda with lambda = 0.7 (lambda may lie between 0.1 and 1).
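As a small worked example of how lambda translates into a subset size, the snippet below computes nrow(data) ^ lambda for a data set of 10000 rows. Rounding to a whole number with round() is an assumption made for this illustration; how blb() itself converts a non-integer value is not stated here.

```r
n <- 10000 # number of rows in the full data set
lambdas <- c(0.5, 0.7, 0.9)

# Subset sizes b = n^lambda, rounded to whole rows (assumption for illustration)
b <- round(n^lambdas)

# Fraction of the full data set held by a single subset
fraction <- b / n
```

With the default lambda = 0.7, each subset holds 631 of the 10000 rows, or roughly 6% of the data.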

n_subsets

An integer value. The upper limit on the number of sampled subsets, s. If NA, then all subsets are sampled. If convergence is achieved earlier, then fewer than s subsets are processed.

n_resamples

An integer value. The upper limit on the number of Monte Carlo iterations (resamples, r) carried out on each subset. Kleiner et al. (2014) found empirically that a value of r = 100 worked well for confidence intervals. If convergence is achieved earlier, then fewer than r resamples are processed.

window_subsets

An integer value. The window size of the number of previous subsets to consider for adaptive convergence checking.

window_resamples

An integer value. The window size of the number of previous resamples to consider for adaptive convergence checking.

epsilon

A positive numerical value. The acceptable relative error to determine convergence.
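The interplay of window size and epsilon can be sketched as follows. This is one reading of the relative-error criterion in Kleiner et al. (2014), not necessarily the check used internally by blb(); has_converged is a hypothetical helper, and z is assumed to be a matrix with one row per iteration (subset or resample) and one column per monitored quantity.

```r
# Hypothetical helper (not part of the package)
has_converged <- function(z, window, epsilon) {
  n_iter <- nrow(z)
  # Not enough history yet to fill the window
  if (n_iter <= window) {
    return(FALSE)
  }
  current <- z[n_iter, ]
  # Mean relative change between the current and each of the previous
  # `window` iterations
  rel_change <- vapply(
    seq_len(window),
    function(j) mean(abs(z[n_iter - j, ] - current) / abs(current)),
    FUN.VALUE = numeric(1)
  )
  all(rel_change <= epsilon)
}

z <- matrix(c(1, 1.5, 1.98, 2.01, 2), ncol = 1)
conv_w2 <- has_converged(z, window = 2, epsilon = 0.05)
conv_w3 <- has_converged(z, window = 3, epsilon = 0.05)
```

Here the last two changes are well below 5%, so the series counts as converged for a window of 2 but not for a window of 3, which still includes the value 1.5.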

fun_estimator

A function with two arguments, x and weights, where x is the j-th subset of data and weights is a vector of counts defining the k-th resample of the j-th subset (see examples). The return value is a named vector of the estimator(s) of interest.
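To illustrate what such a vector of counts looks like, the sketch below draws multinomial counts for one subset and shows that a weighted estimate equals the estimate on the explicitly expanded resample. That the counts sum to nrow(data) follows the description in Kleiner et al. (2014) and is an assumption about blb(); the sizes n and b are arbitrary.

```r
set.seed(42)
n <- 1000 # rows in the full data set (arbitrary for this illustration)
b <- 50   # rows in one subset

x <- data.frame(y = rnorm(b))

# One resample: how often each of the b subset rows is drawn in n draws
weights <- as.vector(rmultinom(1, size = n, prob = rep(1 / b, b)))

# Using the counts as weights avoids materializing all n rows
est_weighted <- weighted.mean(x[, "y"], weights)
est_expanded <- mean(rep(x[, "y"], times = weights))
```

Both estimates agree up to floating-point error, which is why fun_estimator only needs the subset and the counts.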

fun_metric

A function with one argument x which will be applied to each element of the results of fun_estimator across all resamples of the j-th subset. The return value is a named vector of the estimator(s) of quality assessment for the estimator(s) of interest.
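A fun_metric can return several quality measures at once. The sketch below applies one such function to a simulated vector that stands in for the resampled values of a single estimator; the simulated values and the choice of measures are illustrative assumptions only.

```r
set.seed(7)
# Stand-in for r = 100 resampled estimates of one estimator
estimates <- rnorm(100, mean = 2, sd = 0.1)

# Standard error plus a 95% percentile interval as a named vector
fun_metric <- function(x) {
  c(se = sd(x), quantile(x, probs = c(0.025, 0.975)))
}

quality <- fun_metric(estimates)
```

The names of the returned vector ("se", "2.5%", "97.5%") identify each quality measure.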

Value

A two-dimensional object of means across all subsets, where the rows represent the estimator(s) of quality assessment, i.e., the output of fun_metric, and the columns represent the estimator(s) of interest, i.e., the output of fun_estimator.

References

Kleiner, A., A. Talwalkar, P. Sarkar, and M. I. Jordan. 2014. A scalable bootstrap for massive data. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 76:795–816.

See Also

Function drBLB of package datadr (https://github.com/delta-rho/datadr).

Examples

# Simulate two predictors; set a seed so that the example is reproducible
set.seed(123)
n <- 10000
xt <- seq(0, 10, length.out = n)
ex_data <- data.frame(
  x1 = sample(xt),
  x2 = sample(xt)
)

# Linear model with intercept 1, slopes 2 (x1) and 3 (x2), and standard normal noise
ex_data[, "y"] <-
  1 + rnorm(n, 0, 1) + 2 * ex_data[, "x1"] + 3 * ex_data[, "x2"]

# Estimate coefficients with BLB
blb(
  data = ex_data,
  fun_estimator = function(x, weights) {
    coef(lm(
      y ~ x1 + x2,
      data = x,
      weights = weights / max(weights)
    ))
  },
  fun_metric = function(x) {
    quantile(x, probs = c(0.025, 0.5, 0.975))
  }
)


DrylandEcology/rSW2utils documentation built on Dec. 9, 2023, 10:44 p.m.