Bag of little bootstraps transformation method
drBLB(x, statistic, metric, R, n)
x | a subset of a ddf
statistic | a function to apply to the subset specifying the statistic to compute. Must have arguments 'data' and 'weights' (see details). Must return a vector, where each element is a statistic of interest.
metric | a function specifying the metric to be applied to the R bootstrap samples of each statistic returned by statistic
R | the number of bootstrap samples
n | the total number of observations in the data
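The argument descriptions above can be made concrete with a small sketch of what a conforming statistic and metric pair might look like. This is an illustration only: the toy data frame, its column names, and the helper names stat_fun and metric_fun are hypothetical, and it uses base R without calling drBLB itself.

```r
# Hypothetical toy subset; the column names are made up for illustration.
toy <- data.frame(y = c(1, 3, 2, 5, 4), x = c(0, 1, 0, 1, 1))

# Hypothetical resampling weights: how many times each row is "drawn".
w <- c(2, 0, 1, 3, 4)

# A statistic: takes the subset and a weight vector and returns a vector
# in which each element is one statistic of interest.
stat_fun <- function(data, weights) {
  # Store the weights as a column so lm() finds them in 'data'.
  data$.w <- weights
  coef(lm(y ~ x, data = data, weights = .w))
}

# A metric: takes the bootstrap values of one statistic (a numeric vector)
# and summarizes them, here as a 90% interval.
metric_fun <- function(x) quantile(x, c(0.05, 0.95))

est <- stat_fun(toy, w)                          # intercept and slope
ci <- metric_fun(c(0.8, 1.1, 0.9, 1.3, 1.0))     # two interval limits
```

Any modeling function with a weights option (lm, glm, and so on) could take the place of lm in this sketch.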
It is necessary to specify weights as a parameter to the statistic function because for BLB to work efficiently, it must resample each time with a sample of size n. To make this computationally possible for very large n, we can use weights (see reference for details). Therefore, only methods with a weights option can legitimately be used here.
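The role of the weights can be illustrated with a minimal base R sketch. It assumes that a size-n resample drawn from a subset of b rows is represented by multinomial counts, as described in the referenced paper. It is not the drBLB source, and all variable names are hypothetical.

```r
set.seed(1)
b <- 50     # number of rows in the subset
n <- 5000   # total number of observations in the full data
subset_vals <- rnorm(b)

# One resample of size n from the b subset rows, stored as counts:
# w[i] is the number of times row i is drawn.
w <- as.vector(rmultinom(1, size = n, prob = rep(1 / b, b)))

# Weighted statistic computed on only b values ...
weighted_est <- weighted.mean(subset_vals, w)

# ... matches the statistic on the explicitly expanded n-row resample.
expanded <- rep(subset_vals, times = w)
expanded_est <- mean(expanded)

sum(w) == n
all.equal(weighted_est, expanded_est)
```

Storing b counts instead of n resampled rows is what keeps each resample cheap when n is very large, provided the statistic accepts weights.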
Ryan Hafen
Kleiner, Ariel, et al. "A scalable bootstrap for massive data." Journal of the Royal Statistical Society: Series B (Statistical Methodology) 76.4 (2014): 795-816.
## Not run:
# BLB is meant to run on random replicate divisions
rrAdult <- divide(adult, by = rrDiv(1000), update = TRUE)

adultBlb <- rrAdult %>% addTransform(function(x) {
  drBLB(x,
    # statistic: logistic regression coefficients fit with resampling weights
    statistic = function(x, weights)
      coef(glm(incomebin ~ educationnum + hoursperweek + sex,
        data = x, weights = weights, family = binomial())),
    # metric: 90% interval over the R bootstrap values of each coefficient
    metric = function(x)
      quantile(x, c(0.05, 0.95)),
    R = 100,
    n = nrow(rrAdult)
  )
})

# compute the mean of the resulting CI limits
# (this will take a little bit of time because of resampling)
coefs <- recombine(adultBlb, combMean)
matrix(coefs, ncol = 2, byrow = TRUE)
## End(Not run)