# drBLB: Bag of Little Bootstraps Transformation Method

In datadr: Divide and Recombine for Large, Complex Data

## Description

Bag of little bootstraps (BLB) transformation method.

## Usage

```r
drBLB(x, statistic, metric, R, n)
```

## Arguments

| Argument | Description |
|---|---|
| `x` | a subset of a ddf |
| `statistic` | a function to apply to the subset, specifying the statistic to compute. Must have arguments 'data' and 'weights' (see Details). Must return a vector, where each element is a statistic of interest. |
| `metric` | a function specifying the metric to be applied to the `R` bootstrap samples of each statistic returned by `statistic`. Expects an input vector and should output a vector. |
| `R` | the number of bootstrap samples |
| `n` | the total number of observations in the data |
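To illustrate the contract described for `statistic` and `metric`, here is a minimal base R sketch. The names `wtdMeanStat` and `ciMetric` and the toy data are hypothetical and not part of datadr; the sketch only checks that the inputs and outputs have the expected shapes.

```r
# Hypothetical statistic: weighted means of two columns.
# Takes 'data' and 'weights' and returns a named numeric vector,
# one element per statistic of interest.
wtdMeanStat <- function(data, weights) {
  c(
    meanX = weighted.mean(data$x, weights),
    meanY = weighted.mean(data$y, weights)
  )
}

# Hypothetical metric: a 90% interval computed from the bootstrap
# replicates of a single statistic (vector in, vector out).
ciMetric <- function(v) quantile(v, c(0.05, 0.95))

# Toy subset with equal weights, used only to check the shapes.
toy <- data.frame(x = c(1, 2, 3, 4), y = c(10, 20, 30, 40))
stat <- wtdMeanStat(toy, weights = rep(1, nrow(toy)))
ci <- ciMetric(c(1.8, 2.1, 2.4, 2.6, 2.9))
```

Functions shaped like these could presumably be passed as the `statistic` and `metric` arguments, since each statistic in the returned vector gets its own metric output.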

## Details

The `statistic` function must accept a `weights` argument because BLB draws every resample with size `n`, the size of the full data, even though each subset is much smaller. To make this computationally feasible for very large `n`, each resample is represented by `weights` on the rows of the subset rather than by an explicit sample of size `n` (see reference for details). Therefore, only methods with a weights option can legitimately be used here.
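The weighting idea can be sketched in base R as follows. This is an illustration of the approach in the reference under simplifying assumptions (a toy subset and a weighted mean as the statistic), not necessarily how `drBLB` is implemented internally; all variable names are hypothetical.

```r
set.seed(1)

b <- 50      # size of the subset actually held in memory
n <- 100000  # total number of observations in the full data
R <- 200     # number of bootstrap resamples

sub <- data.frame(x = rnorm(b))  # toy subset

# Each column holds counts saying how many times each of the b rows
# would appear in a resample of size n; every column sums to n.
counts <- rmultinom(R, size = n, prob = rep(1 / b, b))

# A statistic that accepts weights can use the counts directly,
# so no resample of size n is ever materialized.
boot <- apply(counts, 2, function(w) weighted.mean(sub$x, w))

# Summarize the R replicates, e.g. with a 90% interval.
ci <- quantile(boot, c(0.05, 0.95))
```

Storing `b` counts per resample instead of `n` sampled rows is what keeps the cost tied to the subset size rather than to the size of the full data.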

## Author(s)

Ryan Hafen

## References

Kleiner, Ariel, et al. "A scalable bootstrap for massive data." Journal of the Royal Statistical Society: Series B (Statistical Methodology) 76.4 (2014): 795-816.

## See Also

`divide`, `recombine`

## Examples

```r
## Not run: 
# BLB is meant to run on random replicate divisions
rrAdult <- divide(adult, by = rrDiv(1000), update = TRUE)

adultBlb <- rrAdult %>% addTransform(function(x) {
  drBLB(x,
    statistic = function(x, weights)
      coef(glm(incomebin ~ educationnum + hoursperweek + sex,
        data = x, weights = weights, family = binomial())),
    metric = function(x)
      quantile(x, c(0.05, 0.95)),
    R = 100,
    n = nrow(rrAdult)
  )
})

# compute the mean of the resulting CI limits
# (this will take a little bit of time because of resampling)
coefs <- recombine(adultBlb, combMean)
matrix(coefs, ncol = 2, byrow = TRUE)

## End(Not run)
```