View source: R/rescale_weights.R
rescale_weights | R Documentation |
Most functions for fitting multilevel and mixed effects models only
allow the user to specify frequency weights, but not design (i.e., sampling
or probability) weights, which should be used when analyzing complex samples
(e.g., probability samples). rescale_weights() implements two algorithms:
one proposed by Asparouhov (2006) and Carle (2009), which rescales
design weights in survey data to account for the grouping structure of
multilevel models, and one based on the design effect proposed by
Kish (1965), which rescales weights to account for the additional
sampling error introduced by weighting.
rescale_weights(
  data,
  probability_weights = NULL,
  by = NULL,
  nest = FALSE,
  method = "carle"
)
data | A data frame. |
probability_weights | Variable indicating the probability (design or sampling) weights of the survey data (level-1 weight), provided as a character string or formula. |
by | Variable names (as character vector or formula) indicating the grouping structure (strata) of the survey data (level-2 cluster variable). It is also possible to create weights for multiple group variables; in such cases, each created weighting variable is suffixed with the name of the group variable. This argument is required for |
nest | Logical, if |
method | String indicating which method is used for rescaling weights. Can be either "carle" (the default) or "kish". |
method = "carle"
Rescaling is based on two methods. For rescaled_weights_a, the sample
weights probability_weights are multiplied by the group size divided by
the sum of sampling weights within each group. For rescaled_weights_b, the
adjustment factor is the sum of sample weights within each group divided by
the sum of squared sample weights within each group (see Carle (2009),
Appendix B). In other words, rescaled_weights_a "scales the weights so that
the new weights sum to the cluster sample size" while rescaled_weights_b
"scales the weights so that the new weights sum to the effective cluster size".
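The two adjustment factors described above can be sketched in base R. This is a minimal illustration on made-up data, not the package's internal code: the data frame toy and its columns g (cluster ID) and w (sampling weight) are hypothetical names chosen for this example.

```r
# Hypothetical toy data: cluster id `g` and sampling weight `w`
toy <- data.frame(
  g = c("a", "a", "a", "b", "b"),
  w = c(1, 2, 3, 4, 6)
)

# Per-cluster quantities, repeated for every row of the cluster
n_j      <- ave(toy$w, toy$g, FUN = length)                # cluster sample size
sum_w    <- ave(toy$w, toy$g, FUN = sum)                   # sum of weights
sum_w_sq <- ave(toy$w, toy$g, FUN = function(x) sum(x^2))  # sum of squared weights

# Method A: weights times (cluster size / sum of weights in the cluster)
toy$weights_a <- toy$w * n_j / sum_w

# Method B: weights times (sum of weights / sum of squared weights in the cluster)
toy$weights_b <- toy$w * sum_w / sum_w_sq

# Per-cluster sums of the rescaled weights
sums_a <- tapply(toy$weights_a, toy$g, sum)
sums_b <- tapply(toy$weights_b, toy$g, sum)

# Effective cluster size: (sum of weights)^2 / sum of squared weights
eff_n <- tapply(toy$w, toy$g, function(x) sum(x)^2 / sum(x^2))
```

In this sketch, sums_a equals the cluster sample sizes and sums_b equals eff_n, which matches the two quoted descriptions from Carle (2009).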
Regarding the choice between scaling methods A and B, Carle suggests that "analysts who wish to discuss point estimates should report results based on weighting method A. For analysts more interested in residual between-group variance, method B may generally provide the least biased estimates". In general, it is recommended to fit an unweighted model as well as weighted models with both scaling methods and, when comparing the models, to check whether the "inferential decisions converge" in order to gain confidence in the results.
Though the bias of scaled weights decreases with increasing group size, method A is preferred when insufficient or low group size is a concern.
The group ID and possibly the PSU may be used as random effects (e.g., nested design, or group and PSU as varying intercepts), depending on the survey design that should be mimicked.
method = "kish"
Rescaling is based on scaling the sample weights so the mean value is 1, which means the sum of all weights equals the sample size. Next, the design effect (Kish 1965) is calculated, which is the mean of the squared weights divided by the squared mean of the weights. The scaled sample weights are then divided by the design effect. This method is most appropriate when weights are based on additional variables beyond the grouping variables in the model (e.g., other demographic characteristics), but may also be useful in other contexts.
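The steps in this paragraph can be traced with a few lines of base R. This is a sketch on a made-up weight vector w (a hypothetical name), following the description above for the ungrouped case only; the package's own implementation may differ in detail.

```r
# Hypothetical sampling weights
w <- c(1, 2, 3, 4, 6)

# Step 1: scale the weights so their mean is 1 (their sum then equals n)
w_scaled <- w / mean(w)

# Step 2: design effect = mean of squared weights / squared mean of weights
deff <- mean(w_scaled^2) / mean(w_scaled)^2

# Step 3: divide the scaled weights by the design effect
w_kish <- w_scaled / deff

# The rescaled weights sum to n / deff, i.e. (sum of w)^2 / sum of w^2
n_eff <- sum(w)^2 / sum(w^2)
```

Because the design effect is at least 1, the rescaled weights sum to at most the sample size, which is how the additional sampling error from weighting enters the model.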
Some tests on real-world survey data suggest that, compared with the Carle method, the Kish method comes closer to estimates from a regular survey design fitted with the survey package. Note that these tests are not representative; it is recommended to check your results against a standard survey design.
data, including the new weighting variable(s). For method = "carle", new
columns rescaled_weights_a and rescaled_weights_b are returned, and for
method = "kish", the returned data contains a column rescaled_weights.
These represent the rescaled design weights to use in multilevel models (use
these variables for the weights argument).
Asparouhov T. (2006). General Multi-Level Modeling with Sampling Weights. Communications in Statistics - Theory and Methods 35: 439-460.
Carle A.C. (2009). Fitting multilevel models in complex survey data with design weights: Recommendations. BMC Medical Research Methodology 9(49): 1-13.
Kish L. (1965). Survey Sampling. London: Wiley.
data(nhanes_sample)
head(rescale_weights(nhanes_sample, "WTINT2YR", "SDMVSTRA"))

# also works with multiple group-variables
head(rescale_weights(nhanes_sample, "WTINT2YR", c("SDMVSTRA", "SDMVPSU")))

# or nested structures.
x <- rescale_weights(
  data = nhanes_sample,
  probability_weights = "WTINT2YR",
  by = c("SDMVSTRA", "SDMVPSU"),
  nest = TRUE
)
head(x)

# compare different methods, using multilevel-Poisson regression

# Carle, method A
d <- rescale_weights(nhanes_sample, "WTINT2YR", "SDMVSTRA")
result1 <- lme4::glmer(
  total ~ factor(RIAGENDR) + log(age) + factor(RIDRETH1) + (1 | SDMVPSU),
  family = poisson(),
  data = d,
  weights = rescaled_weights_a
)

# Carle, method B
result2 <- lme4::glmer(
  total ~ factor(RIAGENDR) + log(age) + factor(RIDRETH1) + (1 | SDMVPSU),
  family = poisson(),
  data = d,
  weights = rescaled_weights_b
)

# Kish, without grouping variable
d <- rescale_weights(
  nhanes_sample,
  "WTINT2YR",
  method = "kish"
)
result3 <- lme4::glmer(
  total ~ factor(RIAGENDR) + log(age) + factor(RIDRETH1) + (1 | SDMVPSU),
  family = poisson(),
  data = d,
  weights = rescaled_weights
)

# Kish, with grouping variable
d <- rescale_weights(
  nhanes_sample,
  "WTINT2YR",
  "SDMVSTRA",
  method = "kish"
)
result4 <- lme4::glmer(
  total ~ factor(RIAGENDR) + log(age) + factor(RIDRETH1) + (1 | SDMVPSU),
  family = poisson(),
  data = d,
  weights = rescaled_weights
)

parameters::compare_parameters(
  list(result1, result2, result3, result4),
  exponentiate = TRUE,
  column_names = c("Carle (A)", "Carle (B)", "Kish", "Kish (grouped)")
)