knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(dplyr) library(sastats)
Rake weighting in R is fairly straightforward using anesrake::anesrake()
, for which sastats::rake_weight()
is a thin wrapper. The key aspect of the process is getting the two input datasets into the right format:
The target population dataset needs to be a named list (one element per demographic variable) of target distributions. The weights::wpct()
function helps with this.
The survey dataset needs to be a data frame with a unique ID variable (e.g., Vrid), and one variable for each demographic measure.
Importantly, demographic variables must be stored either as factors or numerics and the category coding much match between the two datasets. You can also see production examples for B4W-19-10 and AS/HS: rakewt-ashs.
For demonstration, package sastats includes survey and population datasets which include several demographic variables:
library(dplyr) library(sastats) data(svy, pop) svy <- select(svy$person, -weight) # survey to be weighted glimpse(svy) # target population (outdoor recreationists) from a genpop survey glimpse(pop)
Importantly, the demographic variables have been encoded in the same way between the 2 datasets (as factors in this case):
wtvars <- setdiff(names(svy), "Vrid") sapply(wtvars, function(x) all(levels(svy[[x]]) == levels(pop[[x]])))
The easiest way to see the format needed for the target population is to run weights::wpct()
on a demographic variable:
weights::wpct(svy$age_weight)
For our example, we are defining the target population (outdoor recreationists) using another general population survey dataset. In this case, we need to ensure we use the stwt
variable from that survey since it was itself weighted:
weights::wpct(pop$age_weight, pop$stwt)
The weights::wpct()
function returns a vector, but we need a list of vectors (one element per demographic variables of interest). A straightforward method involves looping over the demographic variables with sapply()
:
pop_target <- sapply(wtvars, function(x) weights::wpct(pop[[x]], pop[["stwt"]])) pop_target
If we weren't using a reference population dataset, we would need to get the target distributions into a list in some other way. For example, by hand:
partial_target <- list( "sex" = c("Male" = 0.517, "Female" = 0.483), "age_weight" = c("18-34" = 0.351, "35-54" = 0.349, "55+" = 0.300) # etc. ) partial_target
Using weights::wpct()
also of course provides a convenient method for comparison with the survey dataset to be weighted:
sapply(wtvars, function(x) weights::wpct(svy[[x]]))
We can now run the rake weighting procedure. By default, rake_weight()
returns a list of 2 elements: (1) the survey dataset with the weight variable appended, and (2) the anesrake()
return object, which includes a bunch of useful summary statistics (including "design effect", which may be useful for estimating confidence intervals).
svy_wts <- rake_weight(svy, pop_target, "Vrid") svy <- svy_wts$svy summary(svy$weight) wt_summary <- summary(svy_wts$wts) deff <- wt_summary$general.design.effect deff wt_summary
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.