drBLB: Bag of Little Bootstraps Transformation Method

Description

Applies the bag of little bootstraps (BLB) to a subset of a ddf. It is intended to be called inside a transformation (see Examples).

Usage

drBLB(x, statistic, metric, R, n)

Arguments

x

a subset of a ddf (distributed data frame)

statistic

a function to apply to the subset, specifying the statistic to compute. It must have arguments 'data' and 'weights' (see Details) and must return a vector in which each element is a statistic of interest.

metric

a function specifying the metric to be applied to the R bootstrap samples of each statistic returned by statistic. Expects an input vector and should output a vector.

R

the number of bootstrap samples

n

the total number of observations in the data
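As a rough illustration of the shapes these arguments describe, the sketch below defines one possible statistic and one possible metric in base R. The names wstat and ci90 and the column name value are hypothetical, chosen only for this example; nothing here calls drBLB itself.

```r
# Hypothetical statistic: weighted mean and weighted variance of one column.
# `data` is a data frame (one subset); `weights` has one entry per row.
wstat <- function(data, weights) {
  m <- weighted.mean(data$value, weights)
  v <- sum(weights * (data$value - m)^2) / sum(weights)
  c(mean = m, var = v)  # a vector: one element per statistic of interest
}

# Hypothetical metric: a 90% percentile interval computed from the
# bootstrap replicates of a single statistic (vector in, vector out).
ci90 <- function(x) quantile(x, c(0.05, 0.95))

# Toy inputs, only to show what each function returns
toy <- data.frame(value = c(1, 2, 3, 4))
s <- wstat(toy, weights = c(1, 1, 1, 1))
m <- ci90(c(0.9, 1.0, 1.1, 1.2))
```

Here the statistic returns two values and the metric returns two values per statistic, so each subset would contribute four numbers in total.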

Details

The statistic function must accept a weights argument because BLB requires every bootstrap resample to have size n, the total number of observations, even though it is drawn from a much smaller subset. Materializing n rows for each resample is impractical when n is very large, so each resample is instead represented by weights on the rows of the subset (see the reference for details). Consequently, only methods that accept weights can legitimately be used here.
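The weighting idea can be sketched in base R as follows. This is a minimal illustration of the approach described in the reference, not the internal code of drBLB; the sizes n, b and R, the simulated data, and the helper name stat are all assumptions made for the example.

```r
set.seed(1)

n <- 100000  # hypothetical total number of observations
b <- 500     # hypothetical number of rows in one subset
R <- 50      # hypothetical number of bootstrap resamples

# Simulated subset of b rows (illustrative data only)
subset_df <- data.frame(x = rnorm(b))
subset_df$y <- 2 + 3 * subset_df$x + rnorm(b)

# One resample of size n from the b rows, stored as b counts instead of n rows
w <- as.vector(rmultinom(1, size = n, prob = rep(1 / b, b)))

# A statistic that accepts weights, so the counts can stand in for the rows
stat <- function(data, weights) {
  coef(lm(y ~ x, data = data, weights = weights))
}

# R resamples: each column of W is one set of counts summing to n
W <- rmultinom(R, size = n, prob = rep(1 / b, b))
boot <- apply(W, 2, function(wt) stat(subset_df, wt))  # 2 x R matrix

# Metric applied to the R replicates of each coefficient
ci <- apply(boot, 1, quantile, probs = c(0.05, 0.95))
```

Each resample costs memory proportional to b rather than n, which is why the statistic has to be expressible with weights.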

Author(s)

Ryan Hafen

References

Kleiner, Ariel, et al. "A scalable bootstrap for massive data." Journal of the Royal Statistical Society: Series B (Statistical Methodology) 76.4 (2014): 795-816.

See Also

divide, recombine

Examples

## Not run: 
# BLB is meant to run on random replicate divisions
rrAdult <- divide(adult, by = rrDiv(1000), update = TRUE)

adultBlb <- rrAdult %>% addTransform(function(x) {
  drBLB(x,
    statistic = function(x, weights)
      coef(glm(incomebin ~ educationnum + hoursperweek + sex,
        data = x, weights = weights, family = binomial())),
    metric = function(x)
      quantile(x, c(0.05, 0.95)),
    R = 100,
    n = nrow(rrAdult)
  )
})

# compute the mean of the resulting CI limits
# (this will take a little bit of time because of resampling)
coefs <- recombine(adultBlb, combMean)
matrix(coefs, ncol = 2, byrow = TRUE)

## End(Not run)

datadr documentation built on May 1, 2019, 8:06 p.m.