View source: R/make_bootstrap_weights.R
make_doubled_half_bootstrap_weights | R Documentation |
Creates bootstrap replicate weights using the method of Antal and Tillé (2014). This method is applicable to single-stage sample designs, potentially with stratification and clustering. It can be used for designs that use simple random sampling without replacement or unequal probability sampling without replacement. One advantage of this method is that it yields integer replicate factors of 0, 1, 2, or 3.
make_doubled_half_bootstrap_weights(
  num_replicates = 100,
  samp_unit_ids,
  strata_ids,
  samp_unit_sel_probs,
  output = "weights"
)
num_replicates | Positive integer giving the number of bootstrap replicates to create. |
samp_unit_ids | Vector of sampling unit IDs. |
strata_ids | Vector of strata IDs for each sampling unit at each stage of sampling. |
samp_unit_sel_probs | Vector of selection probabilities for each sampling unit. |
output | Either "weights" (the default) or "factors". |
For stratified sampling, the replicate factors are generated independently in each stratum. For cluster sampling at a given stage, the replicate factors are generated at the cluster level and then the cluster's replicate factors are applied to all units in the cluster.
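The cluster-level step can be pictured with a small base R sketch. It is only an illustration of how cluster factors carry over to units, not the package's internal code: the toy data, the object names (`unit_data`, `cluster_factors`, `unit_factors`), and the factor values are all made up for the example.

```r
# Hypothetical toy sample: six units grouped into three clusters (PSUs)
unit_data <- data.frame(
  unit_id    = 1:6,
  cluster_id = c("A", "A", "B", "B", "B", "C"),
  stringsAsFactors = FALSE
)

# Hypothetical cluster-level replicate factors (values made up for illustration):
# one row per cluster, one column per replicate
cluster_factors <- matrix(
  c(2, 0,
    0, 2,
    1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A", "B", "C"), c("rep_1", "rep_2"))
)

# Look up each unit's cluster by row name, so that every unit
# inherits the replicate factors of the cluster it belongs to
unit_factors <- cluster_factors[unit_data$cluster_id, , drop = FALSE]
rownames(unit_factors) <- unit_data$unit_id

unit_factors
```

In a stratified design the same lookup would be done separately within each stratum, since the factors are generated independently per stratum.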
In the case of unequal probability sampling, this bootstrap method is only recommended for high entropy sampling methods (i.e., most methods other than systematic sampling).
See Section 7 of Antal and Tillé (2014) for a clear description of how the replicates are formed. The paper presents two options for the resampling probabilities used in replication: the R function uses the option referred to in the paper as "the π-bootstrap."
A matrix with the same number of rows as samp_unit_ids and the number of columns equal to the value of the argument num_replicates.
Specifying output = "factors" returns a matrix of replicate adjustment factors which can later be multiplied by the full-sample weights to produce a matrix of replicate weights. Specifying output = "weights" returns the matrix of replicate weights, where the full-sample weights are inferred using samp_unit_sel_probs.
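The relationship between the two output types can be sketched in base R. This is a rough illustration rather than the function's actual code: it assumes the full-sample weights are the inverses of the selection probabilities, and the probabilities, factor values, and object names (`sel_probs`, `rep_factors`, `rep_weights`) are hypothetical.

```r
# Hypothetical selection probabilities for four sampling units
sel_probs <- c(0.10, 0.25, 0.50, 0.20)

# Assumption: full-sample weights are inverse selection probabilities
full_sample_weights <- 1 / sel_probs

# Hypothetical replicate factors (values made up for illustration):
# rows are sampling units, columns are replicates; matrix() fills by column
rep_factors <- matrix(
  c(2, 0, 2, 0,
    0, 2, 0, 2,
    1, 1, 1, 1),
  nrow = 4, ncol = 3
)

# Multiplying a matrix by a vector whose length equals the number of rows
# scales each row by that unit's full-sample weight
rep_weights <- rep_factors * full_sample_weights

rep_weights
```

Keeping the factors separate can be convenient when the full-sample weights will be adjusted later (for example, for nonresponse), since the same factors can then be applied to the adjusted weights.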
Antal, E. and Tillé, Y. (2014). "A new resampling method for sampling designs without replacement: The doubled half bootstrap." Computational Statistics, 29(5), 1345-1363. https://doi.org/10.1007/s00180-014-0495-0
If the survey design can be accurately represented using svydesign, then it is easier to simply use as_bootstrap_design with argument type = "Antal-Tille".
Use estimate_boot_reps_for_target_cv to help choose the number of bootstrap replicates.
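A minimal sketch of that shortcut, assuming the survey and svrep packages are installed. The dataset (apistrat from the survey package, a stratified simple random sample), the number of replicates, and the target CV values are choices made for this illustration only; the argument names replicates, svrepstat, and target_cv are as I understand the svrep interface and should be checked against the installed version.

```r
library(survey)
library(svrep)

# Illustrative data: a stratified simple random sample of schools
data("api", package = "survey")
strat_design <- svydesign(
  data = apistrat,
  ids = ~ 1,
  strata = ~ stype,
  weights = ~ pw,
  fpc = ~ fpc
)

# Convert the design directly to a bootstrap replicate design
set.seed(2022)
boot_design <- as_bootstrap_design(
  strat_design,
  type = "Antal-Tille",
  replicates = 100
)

# Estimate a total, then ask roughly how many replicates would be needed
# for the bootstrap standard error to reach several target CVs
enroll_total <- svytotal(x = ~ enroll, design = boot_design)
estimate_boot_reps_for_target_cv(
  svrepstat = enroll_total,
  target_cv = c(0.01, 0.05, 0.10)
)
```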
library(survey)
library(svrep)

# Example 1: A cluster sample
data('library_multistage_sample', package = 'svrep')

replicate_factors <- make_doubled_half_bootstrap_weights(
  num_replicates = 5,
  samp_unit_ids = library_multistage_sample$PSU_ID,
  strata_ids = rep(1, times = nrow(library_multistage_sample)),
  samp_unit_sel_probs = library_multistage_sample$PSU_SAMPLING_PROB,
  output = "factors"
)

# Example 2: A single-stage sample selected with unequal probabilities, without replacement

## Load an example dataset of U.S. counties with 2004 Presidential vote counts
data("election", package = 'survey')

pps_wor_design <- svydesign(
  data = election_pps,
  pps = "overton",
  fpc = ~ p, # Inclusion probabilities
  ids = ~ 1
)

## Create bootstrap replicate weights
set.seed(2022)
bootstrap_replicate_weights <- make_doubled_half_bootstrap_weights(
  num_replicates = 5000,
  samp_unit_ids = pps_wor_design$cluster[,1],
  strata_ids = pps_wor_design$strata[,1],
  samp_unit_sel_probs = pps_wor_design$prob
)

## Create a replicate design object with the survey package
bootstrap_rep_design <- svrepdesign(
  data = pps_wor_design$variables,
  repweights = bootstrap_replicate_weights,
  weights = weights(pps_wor_design, type = "sampling"),
  type = "bootstrap"
)

## Compare std. error estimates from bootstrap versus linearization
data.frame(
  'Statistic' = c('total', 'mean'),
  'SE (bootstrap)' = c(
    SE(svytotal(x = ~ Bush, design = bootstrap_rep_design)),
    SE(svymean(x = ~ I(Bush/votes), design = bootstrap_rep_design))
  ),
  'SE (Overton\'s PPS approximation)' = c(
    SE(svytotal(x = ~ Bush, design = pps_wor_design)),
    SE(svymean(x = ~ I(Bush/votes), design = pps_wor_design))
  ),
  check.names = FALSE
)