mice.impute.plausible.values | R Documentation |
This imputation function performs unidimensional plausible value imputation if (subject-wise) measurement errors or the reliability of the scale is known (Mislevy, 1991; see also Asparouhov & Muthen, 2010; Blackwell, Honaker & King, 2011, 2017a, 2017b). The function also allows the input of an individual likelihood obtained by fitting an item response model.
mice.impute.plausible.values(y, ry, x, type, alpha=NULL,
alpha.se=0, scale.values=NULL, sig.e.miss=1e+06,
like=NULL, theta=NULL, normal.approx=NULL,
pviter=15, imputationWeights=rep(1, length(y)), plausible.value.print=TRUE,
pls.facs=NULL, interactions=NULL, quadratics=NULL, extract_data=TRUE,
control_latreg=list( progress=FALSE, ridge=1e-5 ), ...)
y |
Incomplete data vector of length |
ry |
Vector of missing data pattern ( |
x |
Matrix ( |
type |
Type of predictor variables. |
alpha |
A known reliability estimate. An optional standard error of the estimate
can be provided in |
alpha.se |
Optional numeric value of the standard error of the |
scale.values |
A list consisting of scale values and their corresponding standard errors (see Example 1). |
sig.e.miss |
A standard error of measurement for cases with missing values on a scale |
like |
Individual likelihood evaluated at |
theta |
Grid of unidimensional latent variable |
normal.approx |
Logical indicating whether the individual posterior should be approximated by a normal distribution |
pviter |
Number of iterations in each imputation which should be run until the plausible values are drawn |
imputationWeights |
Optional vector of sample weights |
plausible.value.print |
An optional logical indicating whether some information about the plausible value imputation should be printed at the console |
pls.facs |
Number of PLS factors if PLS dimension reduction is used |
interactions |
Vector of variable names used for creating interactions |
quadratics |
Vector of variable names used for creating quadratic terms |
extract_data |
Logical indicating whether input data should be extracted
from parent environment within |
control_latreg |
Control arguments for |
... |
Further objects to be passed |
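The arguments alpha and scale.values describe measurement error in two different ways: as a reliability, or as scale scores with standard errors. The sketch below shows how a standard error of measurement can be derived from a reliability under classical test theory (error variance = observed variance * (1 - reliability)), which is the computation used for scale 2 in Example 1. The simulated scores and the name se_meas are illustrative assumptions, not part of the package.

```r
# Sketch: derive a standard error of measurement from a known reliability
# (classical test theory) and pack it into a scale.values-style list.
set.seed(42)
score <- stats::rnorm(200, mean=3, sd=0.8)   # hypothetical observed scale scores
alpha <- 0.8                                 # assumed known reliability

# error variance=observed variance * (1 - reliability)
se_meas <- sqrt( stats::var(score, na.rm=TRUE) * (1 - alpha) )

# same list layout as in Example 1: scores "M" and standard errors "SE"
scale.values <- list( "scale2"=list( "M"=score, "SE"=rep(se_meas, length(score)) ) )
str(scale.values)
```

Because SE is a vector, each case could be given its own standard error, which is how heterogeneous measurement errors would be supplied.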
A linear model is assumed for drawing plausible values of a variable $Y$ contaminated by measurement error. Assuming $Y = \theta + e$ and a linear regression model for $\theta$,
$\theta = \mathbf{X} \beta + \epsilon$,
plausible value imputations are drawn from the posterior distribution $P( \theta | Y, \mathbf{X} )$. See Mislevy (1991) for details.
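As a minimal sketch of this model, the code below draws plausible values from the normal posterior that results when both the measurement error e and the regression residual are normal with known variances. The regression coefficients and residual variance are treated as fixed and known here, which is a simplifying assumption for illustration; the helper names posterior_pv and draw_pv are hypothetical and are not functions of the package.

```r
# Sketch: plausible values for theta under
#   Y = theta + e,           e   ~ N(0, sig_e2)  (sig_e2 known)
#   theta = X beta + eps,    eps ~ N(0, sig2)
# beta and sig2 are assumed known (illustrative simplification).

# posterior mean and variance of theta given y and the prediction X beta
posterior_pv <- function(y, pred, sig2, sig_e2) {
  post_var <- 1 / ( 1/sig2 + 1/sig_e2 )
  post_mean <- post_var * ( pred/sig2 + y/sig_e2 )
  list( mean=post_mean, var=post_var )
}

# draw one plausible value per case from the posterior
draw_pv <- function(y, pred, sig2, sig_e2) {
  post <- posterior_pv(y, pred, sig2, sig_e2)
  stats::rnorm( length(y), mean=post$mean, sd=sqrt(post$var) )
}

set.seed(1)
N <- 500
x <- stats::rnorm(N)
beta <- c(0, 0.5)    # hypothetical regression coefficients
sig2 <- 1            # residual variance of theta given x
sig_e2 <- 0.4        # known measurement error variance
theta <- beta[1] + beta[2]*x + stats::rnorm(N, sd=sqrt(sig2))
y <- theta + stats::rnorm(N, sd=sqrt(sig_e2))

pv <- draw_pv( y, pred=beta[1] + beta[2]*x, sig2=sig2, sig_e2=sig_e2 )
# compare variances of the error-prone score, the plausible values and theta
c( var_y=stats::var(y), var_pv=stats::var(pv), var_theta=stats::var(theta) )
```

The posterior mean is a precision-weighted average of the observed score and the regression prediction. With a very large error variance, as would follow from a large standard error such as the default sig.e.miss=1e+06, the posterior mean is essentially the regression prediction alone.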
A vector of length nrow(x) containing imputed plausible values.
Plausible value imputation is also known as multiple overimputation (Blackwell, Honaker & King, 2017a, 2017b), which is implemented in the Amelia package; see Amelia::moPrep and Amelia::amelia.
Asparouhov, T., & Muthen, B. (2010). Plausible values for latent variables using Mplus. Technical Report. https://www.statmodel.com/papers.shtml
Blackwell, M., Honaker, J., & King, G. (2011). Multiple overimputation: A unified approach to measurement error and missing data. Technical Report.
Blackwell, M., Honaker, J., & King, G. (2017a). A unified approach to measurement error and missing data: Overview and applications. Sociological Methods & Research, 46(3), 303-341.
Blackwell, M., Honaker, J., & King, G. (2017b). A unified approach to measurement error and missing data: Details and extensions. Sociological Methods & Research, 46(3), 342-369.
Mislevy, R. J. (1991). Randomization-based inference about latent variables from complex samples. Psychometrika, 56, 177-196.
See TAM::tam.latreg for fitting latent regression models.
## Not run:
#############################################################################
# EXAMPLE 1: Plausible value imputation for data.ma04 | 2 scales
#############################################################################
data(data.ma04, package="miceadds")
dat <- data.ma04
# Scale 1 consists of items A1,...,A4
# Scale 2 consists of items B1,...,B5
dat$scale1 <- NA
dat$scale2 <- NA
#** inits imputation method and predictor matrix
res <- miceadds::mice_inits(dat, ignore=c("group") )
predM <- res$predictorMatrix
impMethod <- res$method
impMethod <- gsub("pmm", "norm", impMethod )
# look at missing proportions
colSums( is.na(dat) )
# redefine imputation methods for plausible value imputation
impMethod[ "scale1" ] <- "plausible.values"
predM[ "scale1", ] <- 1
predM[ "scale1", c("A1", "A2", "A3", "A4" ) ] <- 3
# items corresponding to a scale should be declared by a 3 in the predictor matrix
impMethod[ "scale2" ] <- "plausible.values"
predM[,"scale2" ] <- 0
predM[ "scale2", c("A2","A3","A4","V6","V7") ] <- 1
diag(predM) <- 0
# use imputed scale values as predictors for V5, V6 and V7
predM[ c("V5","V6","V7"), c("scale1","scale2" ) ] <- 1
# exclude for V5, V6 and V7 the items of scales A and B as predictors
predM[ c("V5","V6","V7"), c( paste0("A",2:4), paste0("B",1:5) ) ] <- 0
# exclude 'group' as a predictor
predM[,"group"] <- 0
# look at imputation method and predictor matrix
impMethod
predM
#-------------------------------
# Parameter for imputation
#***
# scale 1 (A1,...,A4)
# known Cronbach's Alpha
alpha <- NULL
alpha <- list( "scale1"=.8 )
alpha.se <- list( "scale1"=.05 ) # sample alpha with a standard deviation of .05
#***
# scale 2 (B1,...,B5)
# means and SE's of scale scores are assumed to be known
M.scale2 <- rowMeans( dat[, paste("B",1:5,sep="") ] )
# M.scale2[ is.na( M.scale2 ) ] <- mean( M.scale2, na.rm=TRUE )
SE.scale2 <- rep( sqrt( stats::var(M.scale2,na.rm=TRUE)*(1-.8) ), nrow(dat) )
#=> heterogeneous measurement errors are allowed
scale.values <- list( "scale2"=list( "M"=M.scale2, "SE"=SE.scale2 ) )
#*** Imputation Model 1: Imputation using four parallel chains
imp1 <- mice::mice( dat, predictorMatrix=predM, m=4, maxit=5,
alpha.se=alpha.se, method=impMethod, allow.na=TRUE, alpha=alpha,
scale.values=scale.values )
summary(imp1)
# extract first imputed dataset
dat11 <- mice::complete( imp1, 1 )
#*** Imputation Model 2: Imputation using one long chain
imp2 <- miceadds::mice.1chain( dat, predictorMatrix=predM, burnin=10, iter=20, Nimp=4,
alpha.se=alpha.se, method=impMethod, allow.na=TRUE, alpha=alpha,
scale.values=scale.values )
summary(imp2)
#-------------
#*** Imputation Model 3: Imputation including group level variables
# use group indicator for plausible value estimation
predM[ "scale1", "group" ] <- -2
# V7 and B1 should be aggregated at the group level
predM[ "scale1", c("V7","B1") ] <- 2
predM[ "scale2", "group" ] <- -2
predM[ "scale2", c("V7","A1") ] <- 2
# perform single imputation (m=1)
imp <- mice::mice( dat, predictorMatrix=predM, m=1, maxit=10,
method=impMethod, allow.na=TRUE, alpha=alpha,
scale.values=scale.values )
dat10 <- mice::complete(imp)
# multilevel model
library(lme4)
mod <- lme4::lmer( scale1 ~ ( 1 | group), data=dat11 )
summary(mod)
mod <- lme4::lmer( scale1 ~ ( 1 | group), data=dat10)
summary(mod)
#############################################################################
# EXAMPLE 2: Plausible value imputation with chained equations
#############################################################################
# - simulate a latent variable theta and dichotomous item responses
# - two covariates X in which the second covariate has measurement error
library(sirt)
library(TAM)
library(lavaan)
set.seed(7756)
N <- 2000 # number of persons
I <- 10 # number of items
# simulate covariates
X <- MASS::mvrnorm( N, mu=c(0,0), Sigma=matrix( c(1,.5,.5,1),2,2 ) )
colnames(X) <- paste0("X",1:2)
# second covariate with measurement error with variance var.err
var.err <- .3
X.err <- X
X.err[,2] <- X[,2] + stats::rnorm(N, sd=sqrt(var.err) )
# simulate theta
theta <- .5*X[,1] + .4*X[,2] + stats::rnorm( N, sd=.5 )
# simulate item responses
itemdiff <- seq( -2, 2, length=I) # item difficulties
dat <- sirt::sim.raschtype( theta, b=itemdiff )
#***********************
#*** Model 0: Regression model with true variables
mod0 <- stats::lm( theta ~ X )
summary(mod0)
#**********************
# plausible value imputation for abilities and error-prone
# covariates using the mice package
# creating the likelihood for plausible value for abilities
mod11 <- TAM::tam.mml( dat )
likePV <- IRT.likelihood(mod11)
# creating the likelihood for error-prone covariate X2
# The known measurement error variance is 0.3.
lavmodel <- "
X2true=~ 1*X2
X2 ~~ 0.3*X2
"
mod12 <- lavaan::cfa( lavmodel, data=as.data.frame(X.err) )
summary(mod12)
likeX2 <- IRTLikelihood.cfa( data=X.err, cfaobj=mod12)
str(likeX2)
#-- create data input for mice package
data <- data.frame( "PVA"=NA, "X1"=X[,1], "X2"=NA )
vars <- colnames(data)
V <- length(vars)
predictorMatrix <- 1 - diag(V)
rownames(predictorMatrix) <- colnames(predictorMatrix) <- vars
method <- rep("norm", V )
names(method) <- vars
method[c("PVA","X2")] <- "plausible.values"
#-- create argument lists for plausible value imputation
# likelihood and theta grid of plausible value derived from IRT model
like <- list( "PVA"=likePV, "X2"=likeX2 )
theta <- list( "PVA"=attr(likePV,"theta"),
"X2"=attr(likeX2, "theta") )
#-- initial imputations
data.init <- data
data.init$PVA <- mod11$person$EAP
data.init$X2 <- X.err[,"X2"]
#-- imputation using the mice and miceadds package
imp1 <- mice::mice( as.matrix(data), predictorMatrix=predictorMatrix, m=4,
maxit=6, method=method, allow.na=TRUE,
theta=theta, like=like, data.init=data.init )
summary(imp1)
# compute linear regression
mod4a <- with( imp1, stats::lm( PVA ~ X1 + X2 ) )
summary( mice::pool(mod4a) )
#############################################################################
# EXAMPLE 3: Plausible value imputation with known error variance
#############################################################################
#---- simulate data
set.seed(987)
N <- 1000 # number of persons
var_err <- .4 # error variance
dat <- data.frame( x1=stats::rnorm(N), x2=stats::rnorm(N) )
dat$theta <- .3 * dat$x1 - .5*dat$x2 + stats::rnorm(N)
dat$y <- dat$theta + stats::rnorm( N, sd=sqrt(var_err) )
#-- linear regression for measurement-error-free data
mod0a <- stats::lm( theta ~ x1 + x2, data=dat )
summary(mod0a)
#-- linear regression for data with measurement error
mod0b <- stats::lm( y ~ x1 + x2, data=dat )
summary(mod0b)
#-- process data for imputation
dat1 <- dat
dat1$theta <- NA
scale.values <- list( "theta"=list( "M"=dat$y, "SE"=rep(sqrt(var_err),N )))
dat1$y <- NULL
cn <- colnames(dat1)
V <- length(cn)
method <- rep("", length(cn) )
names(method) <- cn
method["theta"] <- "plausible.values"
#-- imputation in mice
imp <- mice::mice( dat1, maxit=1, m=5, allow.na=TRUE, method=method,
scale.values=scale.values )
summary(imp)
#-- inspect first dataset
summary( mice::complete(imp, action=1) )
#-- linear regression based on imputed datasets
mod1 <- with(imp, stats::lm( theta ~ x1 + x2 ) )
summary( mice::pool(mod1) )
## End(Not run)