knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

multical: Multilevel calibration weighting for surveys

Installation

You can install multical from github using devtools.

## Install devtools if noy already installed
install.packages("devtools", repos='http://cran.us.r-project.org')
## Install augsynth from github
devtools::install_github("ebenmichael/multical")

Data

To show how to use multical, we'll use two contrived data examples. One, data_individual, has individual-level information on covariates, response, and outcome, while the other, data_cell, has cell-level information.

library(multical)
library(dplyr)
data(data_individual)
data(data_cell)

Each data set has the same 4 covariates, but data_individual records whether each individual responded response to the survey, and how they answered a binary question y.

head(data_individual)

On the other hand, data_cell records each individual combination of the covariates, the number of respondents in that cell, the number of individuals in the target population in that cell, and the mean outcome in the cell.

head(data_cell)

Getting multilevel calibration weights

To compute the multilevel calibration weights for both cases, we use the multical function. For individual-level data, we give a formula response ~ covariates, and an indicator for whether an individual is in the target population (in this case everyone is). We also tell multical what order of interactions to consider,. For instance, if we only want to rake on the first order margins, we set order = 1.

out <- multical(response ~ X1 + X2 + X3 + X4, intarget, data_individual, order = 1)

We can see the difference between the re-weighted sample and the population via the get_imbalance function:

get_balance(out, 1)

To estimate the population average with individual level data, we can join the output with our data and take the weighted average

inner_join(data_individual, out) %>%
  filter(response == 1) %>%
  summarise(sum(weight * y) / sum(weight))

If we want to include higher order interaction terms, we can increase order and give a hyper-parameter lambda that controls the degree of approximate post-stratification.

out <- multical(response ~ X1 + X2 + X3 + X4, intarget, data_individual, order = 4, lambda = 1)

rbind(head(get_balance(out, 4)), tail(get_balance(out, 4)))

By default, multical uses all higher order interactions. For large datasets this may be prohibitively computational expensive! So consider starting with a low order and increasing for large datasets. If no value of lambda is provided, multical solves for a series of different hyper-parameter values. We can use the get_balance_v_sample_size function to trace out the trade-off between better balance and lower effective sample sizes.

out <- multical(response ~ X1 + X2 + X3 + X4, intarget, data_individual)

imbal <- get_balance_v_sample_size(out, 4)
plot(imbal$n_eff, imbal$imbalance)

For cell-level data, almost everything is the same. However, now we use the sample and population counts rather than indicators for response and being in the target population.

out <- multical(sample_count ~ X1 + X2 + X3 + X4, target_count, data_cell, order = 4, lambda = 1)

Note: the output data frame out may have the cells in a different order than in the input data frame. Be sure to join the output with the original data frame on the variables to map the weights to the data accurately. We can then estimate the population average with the weights and sample counts.

left_join(data_cell, out) %>%
  summarise(sum(weight * sample_count * y) / sum(target_count))


ebenmichael/multical documentation built on Dec. 20, 2021, 3:12 a.m.