knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
multical
: Multilevel calibration weighting for surveysYou can install multical
from github using devtools
.
## Install devtools if noy already installed install.packages("devtools", repos='http://cran.us.r-project.org') ## Install augsynth from github devtools::install_github("ebenmichael/multical")
To show how to use multical
, we'll use two contrived data examples. One, data_individual
, has individual-level information on covariates, response, and outcome, while the other, data_cell
, has cell-level information.
library(multical) library(dplyr) data(data_individual) data(data_cell)
Each data set has the same 4 covariates, but data_individual
records whether each individual responded response
to the survey, and how they answered a binary question y
.
head(data_individual)
On the other hand, data_cell
records each individual combination of the covariates, the number of respondents in that cell, the number of individuals in the target population in that cell, and the mean outcome in the cell.
head(data_cell)
To compute the multilevel calibration weights for both cases, we use the multical
function. For individual-level data, we give a formula response ~ covariates
, and an indicator for whether an individual is in the target population (in this case everyone is). We also tell multical
what order of interactions to consider,.
For instance, if we only want to rake on the first order margins, we set order = 1
.
out <- multical(response ~ X1 + X2 + X3 + X4, intarget, data_individual, order = 1)
We can see the difference between the re-weighted sample and the population via the get_imbalance
function:
get_balance(out, 1)
To estimate the population average with individual level data, we can join the output with our data and take the weighted average
inner_join(data_individual, out) %>% filter(response == 1) %>% summarise(sum(weight * y) / sum(weight))
If we want to include higher order interaction terms, we can increase order
and give a hyper-parameter lambda
that controls the degree of approximate post-stratification.
out <- multical(response ~ X1 + X2 + X3 + X4, intarget, data_individual, order = 4, lambda = 1) rbind(head(get_balance(out, 4)), tail(get_balance(out, 4)))
By default, multical
uses all higher order interactions. For large datasets this may be prohibitively computational expensive! So consider starting with a low order and increasing for large datasets. If no value of lambda
is provided, multical
solves for a series of different hyper-parameter values. We can use the get_balance_v_sample_size
function to trace out the trade-off between better balance and lower effective sample sizes.
out <- multical(response ~ X1 + X2 + X3 + X4, intarget, data_individual) imbal <- get_balance_v_sample_size(out, 4) plot(imbal$n_eff, imbal$imbalance)
For cell-level data, almost everything is the same. However, now we use the sample and population counts rather than indicators for response and being in the target population.
out <- multical(sample_count ~ X1 + X2 + X3 + X4, target_count, data_cell, order = 4, lambda = 1)
Note: the output data frame out
may have the cells in a different order than in the input data frame. Be sure to join the output with the original data frame on the variables to map the weights to the data accurately. We can then estimate the population average with the weights and sample counts.
left_join(data_cell, out) %>% summarise(sum(weight * sample_count * y) / sum(target_count))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.