Rao-Wu Bootstrap in surveysd
In surveysd: Survey Standard Error Estimation for Cumulated Estimates and their Differences in Complex Panel Designs

options(rmarkdown.html_vignette.check_title = FALSE)
knitr::opts_chunk$set(eval = FALSE)

In the following we describe an extension of the methodology implemented in surveysd. While the default rescaling method described in vignette("Methodology") follows the approach of Preston, this vignette introduces an alternative rescaling method based on the Rao-Wu bootstrap [@raowu1988].

Mathematical Formulation

More formally, the replicate weights are constructed as follows: $$ w^_{hi} = \left(1 - \lambda_h + \lambda_h \cdot \frac{n_h}{m_h} \cdot r^{hi} \right) \cdot w{hi} $$ with $$ \lambda_h = \sqrt{\frac{m_h(1 - f_h)}{n_h - 1}} $$ where:
- ( w_{hi} ) is the design weight for PSU ( i ) in stratum ( h ), ( w_h = N_h / n_h ) is the average design weight for the entire stratum ( h ),
- ( N_h ) is the total number of units in stratum ( h ),
- ( n_h ) is the number of PSUs in the original sample for stratum ( h ),
- ( r^*_{hi} \in {0, 1, 2, \dots} ) is a resampling indicator that represents how many times PSU ( i ) in stratum ( h ) was drawn in the bootstrap ,
- ( m_h = n_h - 1 ) is the number of units to be drawn in each replicate for stratum ( h ),
- ( \lambda_h) is the scaling factor used to adjust the weights during the bootstrap process,
- and ( f_h = n_h / N_h ) is the sampling fraction in stratum ( h ).

Use of Rao-Wu

The Rao-Wu bootstrap is particularly suited for complex survey designs with stratification and multi-stage selection. The design-based variance estimation benefits from a resampling method that accounts for the primary sampling stage and incorporates finite population corrections (FPC) directly in the replicate weights. The method is appropriate when the sampling fraction in the first stage (i.e., the share of selected PSUs within strata) is relatively small, typically below 10%.

Rao-Wu is not suitable when:
- The sampling fraction in the first stage is large and the first-stage sampling is without replacement.
- The survey does not include PSU-level identifiers, making a first-stage resampling infeasible since the method requires resampling entire PSUs to reflect first-stage sampling variability.
- The design is single-stage, where methods like the Preston bootstrap may be more appropriate.

In those cases, it is recommended to fall back on alternative bootstrap methods such as the default Preston approach implemented in surveysd, which offers more flexibility for a wider range of designs without PSU information.

Single PSUs

When dealing with multistage sampling designs, the issue of single PSUs, e.g. a single response unit at a stage or in a stratum, can arise. When applying resampling methods such as the Rao-Wu bootstrap, these single PSUs can introduce challenges in the resampling process. In the Rao-Wu method, we adjust for the presence of single PSUs by combining them with the next smallest stratum or cluster before applying the resampling procedure. This ensures that the resampling reflects the structure of the original design while maintaining appropriate variance estimates for the total and other statistics of interest.

Example Implementation

Load Dataset

library(surveysd)

set.seed(1234)
eusilc <- demo.eusilc(n = 2, prettyNames = TRUE)

eusilc[1:5, .(year, povertyRisk, gender, pWeight)]

Draw bootstrap replicates

For the bootstrap select 'method = "Rao-Wu"'. Otherwise the default "Preston" is used.

dat_boot_rw <- draw.bootstrap(eusilc, 
                              method = "Rao-Wu",
                              REP = 10, 
                              hid = "hid", 
                              weights = "pWeight", 
                              strata = "region", 
                              period = "year")

Calibrate bootstrap replicates

Calibrate each sample according to the distribution of gender (on a personal level) and region (on a household level).

dat_boot_calib <- recalib(dat_boot_rw, 
                          conP.var = "gender", 
                          conH.var = "region",
                          epsP = 1e-2, 
                          epsH = 2.5e-2, 
                          verbose = FALSE)
dat_boot_calib[1:5, .(year, povertyRisk, gender, pWeight, w1, w2, w3, w4)]

Any scripts or data that you put into this service are public.

surveysd documentation built on Aug. 26, 2025, 5:08 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

surveysd
Survey Standard Error Estimation for Cumulated Estimates and their Differences in Complex Panel Designs

Rao-Wu Bootstrap in surveysd
In surveysd: Survey Standard Error Estimation for Cumulated Estimates and their Differences in Complex Panel Designs

Mathematical Formulation

Use of Rao-Wu

Single PSUs

Example Implementation

Load Dataset

Draw bootstrap replicates

Calibrate bootstrap replicates

Try the surveysd package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

surveysd Survey Standard Error Estimation for Cumulated Estimates and their Differences in Complex Panel Designs

Rao-Wu Bootstrap in surveysd In surveysd: Survey Standard Error Estimation for Cumulated Estimates and their Differences in Complex Panel Designs

Mathematical Formulation

Use of Rao-Wu

Single PSUs

Example Implementation

Load Dataset

Draw bootstrap replicates

Calibrate bootstrap replicates

Try the surveysd package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

surveysd
Survey Standard Error Estimation for Cumulated Estimates and their Differences in Complex Panel Designs

Rao-Wu Bootstrap in surveysd
In surveysd: Survey Standard Error Estimation for Cumulated Estimates and their Differences in Complex Panel Designs