method_glm | R Documentation |
Model for the outcome for the mass imputation estimator using generalized linear
models via the stats::glm
function. Estimation of the mean is done using S_B
probability sample or known population totals.
method_glm(
y_nons,
X_nons,
X_rand,
svydesign,
weights = NULL,
family_outcome = "gaussian",
start_outcome = NULL,
vars_selection = FALSE,
pop_totals = NULL,
pop_size = NULL,
control_outcome = control_out(),
control_inference = control_inf(),
verbose = FALSE,
se = TRUE
)
y_nons |
target variable from non-probability sample |
X_nons |
a |
X_rand |
a |
svydesign |
a svydesign object |
weights |
case / frequency weights from non-probability sample |
family_outcome |
family for the glm model |
start_outcome |
start parameters (default |
vars_selection |
whether variable selection should be conducted |
pop_totals |
population totals from the |
pop_size |
population size from the |
control_outcome |
controls passed by the |
control_inference |
controls passed by the |
verbose |
parameter passed from the main |
se |
whether standard errors should be calculated |
Analytical variance
The variance of the mean is estimated based on the following approach
(a) non-probability part (S_A
with size n_A
; denoted as var_nonprob
in the result)
\hat{V}_1 = \frac{1}{n_A^2}\sum_{i=1}^{n_A} \hat{e}_i \left\lbrace \boldsymbol{h}(\boldsymbol{x}_i; \hat{\boldsymbol{\beta}})^\prime\hat{\boldsymbol{c}}\right\rbrace,
where \hat{e}_i = y_i - m(\boldsymbol{x}_i; \hat{\boldsymbol{\beta}})
and
\widehat{\boldsymbol{c}}=\left\lbrace n_B^{-1} \sum_{i \in B} \dot{\boldsymbol{m}}\left(\boldsymbol{x}_i ; \boldsymbol{\beta}^*\right) \boldsymbol{h}\left(\boldsymbol{x}_i ; \boldsymbol{\beta}^*\right)^{\prime}\right\rbrace^{-1} N^{-1} \sum_{i \in A} w_i \dot{\boldsymbol{m}}\left(\boldsymbol{x}_i ; \boldsymbol{\beta}^*\right).
Under the linear regression model \boldsymbol{h}\left(\boldsymbol{x}_i ; \widehat{\boldsymbol{\beta}}\right)=\boldsymbol{x}_i
and \widehat{\boldsymbol{c}}=\left(n_A^{-1} \sum_{i \in A} \boldsymbol{x}_i \boldsymbol{x}_i^{\prime}\right)^{-1} N^{-1} \sum_{i \in B} w_i \boldsymbol{x}_i .
(b) probability part (S_B
with size n_B
; denoted as var_prob
in the result)
This part uses functionalities of the {survey}
package and the variance is estimated using the following
equation:
\hat{V}_2=\frac{1}{N^2} \sum_{i=1}^{n_B} \sum_{j=1}^{n_B} \frac{\pi_{i j}-\pi_i \pi_j}{\pi_{i j}}
\frac{m(\boldsymbol{x}_i; \hat{\boldsymbol{\beta}})}{\pi_i} \frac{m(\boldsymbol{x}_i; \hat{\boldsymbol{\beta}})}{\pi_j}.
Note that \hat{V}_2
in principle can be estimated in various ways depending on the type of the design and whether population size is known or not.
Furthermore, if only population totals/means are known and assumed to be fixed we set \hat{V}_2=0
.
an nonprob_method
class which is a list
with the following entries
fitted model either an glm.fit
or cv.ncvreg
object
predicted values for the non-probablity sample
predicted values for the probability sample or population totals
coefficients for the model (if available)
an updated surveydesign2
object (new column y_hat_MI
is added)
estimated population mean for the target variable
whether variable selection was performed
variance for the probability sample component (if available)
variance for the non-probability sampl component
total variance, if possible it should be var_prob+var_nonprob
if not, just a scalar
model type (character "glm"
)
family type (character "glm"
)
Kim, J. K., Park, S., Chen, Y., & Wu, C. (2021). Combining non-probability and probability survey samples through mass imputation. Journal of the Royal Statistical Society Series A: Statistics in Society, 184(3), 941-963.
data(admin)
data(jvs)
jvs_svy <- svydesign(ids = ~ 1, weights = ~ weight, strata = ~ size + nace + region, data = jvs)
res_glm <- method_glm(y_nons = admin$single_shift,
X_nons = model.matrix(~ region + private + nace + size, admin),
X_rand = model.matrix(~ region + private + nace + size, jvs),
svydesign = jvs_svy)
res_glm
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.