method_npar | R Documentation |
Model for the outcome for the mass imputation estimator using loess via stats::loess.
Estimation of the mean is done using the S_B probability sample.
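As a rough illustration of the idea (not the package's internal code), the sketch below fits stats::loess on a simulated non-probability sample, predicts the outcome for a simulated probability sample, and takes the design-weighted mean of the predictions. The toy population, the names s_a, s_b, w and mu_mi, and the choice of surface = "direct" (so that predictions outside the range of the fitted data are not NA) are assumptions made for this example only.

```r
set.seed(1)

# Toy population (hypothetical): y depends non-linearly on x
N <- 5000
pop <- data.frame(x = rnorm(N, 2, 1))
pop$y <- 1 + pop$x^2 + rnorm(N)

# s_a stands in for the non-probability sample S_A (y observed),
# s_b for the probability sample S_B (only x and design weights used)
s_a <- pop[sample(N, 400), ]
s_b <- pop[sample(N, 200), ]
s_b$w <- N / nrow(s_b)  # simple random sampling weights (assumption)

# Outcome model m(x) fitted on S_A with loess
fit <- stats::loess(
  y ~ x,
  data = s_a,
  family = "gaussian",
  control = stats::loess.control(surface = "direct")
)

# Mass imputation: predict y for S_B, then take the weighted mean
s_b$y_hat <- predict(fit, newdata = s_b)
mu_mi <- sum(s_b$w * s_b$y_hat) / sum(s_b$w)
mu_mi
```

In this sketch both samples are drawn at random, so there is no selection bias to correct; it only shows the mechanics of fitting on S_A and averaging predictions over S_B.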
method_npar(
y_nons,
X_nons,
X_rand,
svydesign,
weights = NULL,
family_outcome = "gaussian",
start_outcome = NULL,
vars_selection = FALSE,
pop_totals = NULL,
pop_size = NULL,
control_outcome = control_out(),
control_inference = control_inf(),
verbose = FALSE,
se = TRUE
)
y_nons | target variable from non-probability sample |
X_nons | a |
X_rand | a |
svydesign | a svydesign object |
weights | case / frequency weights from non-probability sample (default NULL) |
family_outcome | family for the glm model |
start_outcome | a place holder (not used in |
vars_selection | whether variable selection should be conducted |
pop_totals | a place holder (not used in |
pop_size | population size from the |
control_outcome | controls passed by the |
control_inference | controls passed by the |
verbose | parameter passed from the main |
se | whether standard errors should be calculated |
Analytical variance
The variance of the mean is estimated using the following approach:
(a) non-probability part (S_A with size n_A; denoted as var_nonprob in the result)
\hat{V}_1 = \frac{1}{N^2} \sum_{i=1}^{n_A} \left\lbrace\hat{g}_B(\boldsymbol{x}_i)\right\rbrace^{2} \hat{e}_i^2,
where \hat{e}_i = y_i - \hat{m}(\boldsymbol{x}_i) is the residual and \hat{g}_B(\boldsymbol{x}_i) = \left\lbrace \pi_B(\boldsymbol{x}_i) \right\rbrace^{-1} can be estimated in various ways. In the package we estimate \hat{g}_B(\boldsymbol{x}_i) using \pi_B(\boldsymbol{x}_i)=E(R | \boldsymbol{x}_i) as suggested by Chen et al. (2022, p. 6). In particular, we currently support this using stats::loess with the "gaussian" family.
(b) probability part (S_B with size n_B; denoted as var_prob in the result)
This part uses functionalities of the {survey} package and the variance is estimated using the following equation:
\hat{V}_2=\frac{1}{N^2} \sum_{i=1}^{n_B} \sum_{j=1}^{n_B} \frac{\pi_{i j}-\pi_i \pi_j}{\pi_{i j}} \frac{\hat{m}(x_i)}{\pi_i} \frac{\hat{m}(x_j)}{\pi_j}.
Note that \hat{V}_2 in principle can be estimated in various ways depending on the type of the design and whether population size is known or not.
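The two variance components above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the probability sample is taken to be a simple random sample without replacement (so \pi_i = n_B/N and \pi_{ij} = n_B(n_B-1)/\{N(N-1)\}), \hat{g}_B is set to the constant N/n_B instead of being estimated with loess, and the predictions m_hat for S_B are simulated stand-ins. All object names (s_a, e_hat, g_hat, m_hat, v1, v2) are hypothetical.

```r
set.seed(2)
N   <- 1000  # population size (assumed known)
n_A <- 300   # size of the non-probability sample S_A
n_B <- 50    # size of the probability sample S_B

# --- (a) non-probability part: V_1 ---------------------------------------
s_a <- data.frame(x = rnorm(n_A, 2, 1))
s_a$y <- 1 + s_a$x^2 + rnorm(n_A)
fit <- stats::loess(y ~ x, data = s_a, family = "gaussian")

e_hat <- s_a$y - predict(fit)   # residuals e_i = y_i - m_hat(x_i)
g_hat <- rep(N / n_B, n_A)      # assumption: constant 1 / pi_B under SRS
v1 <- sum(g_hat^2 * e_hat^2) / N^2

# --- (b) probability part: V_2 -------------------------------------------
m_hat <- rnorm(n_B, 5, 1)       # stand-in for predictions on S_B
pi_i  <- rep(n_B / N, n_B)      # first-order inclusion probabilities
pi_ij <- matrix(n_B * (n_B - 1) / (N * (N - 1)), n_B, n_B)
diag(pi_ij) <- pi_i             # pi_ii equals pi_i

z     <- m_hat / pi_i
delta <- (pi_ij - outer(pi_i, pi_i)) / pi_ij
v2    <- sum(delta * outer(z, z)) / N^2

c(var_nonprob = v1, var_prob = v2)
```

Under simple random sampling without replacement the double sum for \hat{V}_2 reduces to the textbook expression (1 - n_B/N) s^2 / n_B, where s^2 is the sample variance of the predictions, which gives a quick check of the matrix computation. For other designs the inclusion probabilities differ, and the {survey} package would normally be used to obtain this component.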
a nonprob_method class object, which is a list with the following entries:
fitted model object returned by stats::loess
predicted values for the non-probability sample
predicted values for the probability sample or population totals
coefficients for the model (if available)
an updated surveydesign2 object (new column y_hat_MI is added)
estimated population mean for the target variable
whether variable selection was performed
variance for the probability sample component (if available)
variance for the non-probability sample component
model type (character "npar")
Chen, S., Yang, S., & Kim, J. K. (2022). Nonparametric mass imputation for data integration. Journal of Survey Statistics and Methodology, 10(1), 1-24.
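# nonprob() comes from {nonprobsvy}; svydesign() comes from {survey}
library(nonprobsvy)
library(survey)
set.seed(123123123)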
N <- 10000
n_a <- 500
n_b <- 1000
n_b1 <- 0.7*n_b
n_b2 <- 0.3*n_b
x1 <- rnorm(N, 2, 1)
x2 <- rnorm(N, 2, 1)
y1 <- rnorm(N, 0.3 + 2*x1 + 2*x2, 1)
y2 <- rnorm(N, 0.3 + 0.5*x1^2 + 0.5*x2^2, 1)
strata <- x1 <= 2
pop <- data.frame(x1, x2, y1, y2, strata)
sample_a <- pop[sample(1:N, n_a),]
sample_a$w_a <- N/n_a
sample_a_svy <- svydesign(ids=~1, weights=~w_a, data=sample_a)
pop1 <- subset(pop, strata == TRUE)
pop2 <- subset(pop, strata == FALSE)
sample_b <- rbind(pop1[sample(1:nrow(pop1), n_b1), ],
pop2[sample(1:nrow(pop2), n_b2), ])
res_y_npar <- nonprob(outcome = y1 + y2 ~ x1 + x2,
data = sample_b,
svydesign = sample_a_svy,
method_outcome = "npar")
res_y_npar