mice.impute.plausible.values: Plausible Value Imputation using Classical Test Theory and...

Description

This imputation function performs unidimensional plausible value imputation when (subject-wise) measurement errors or the reliability of the scale are known (Mislevy, 1991; see also Asparouhov & Muthén, 2010; Blackwell, Honaker & King, 2011, 2016a, 2016b). The function also accepts an individual likelihood obtained by fitting an item response model.

Usage

mice.impute.plausible.values(y, ry, x, type, alpha = NULL,
    alpha.se = 0, scale.values = NULL, sig.e.miss = 1e+06,
    like = NULL, theta = NULL, normal.approx = NULL,
    pviter = 15, imputationWeights = rep(1, length(y)),
    plausible.value.print = TRUE, pls.facs = NULL, interactions = NULL,
    quadratics = NULL, extract_data = TRUE, ...)

Arguments

y

Incomplete data vector of length n

ry

Vector of missing data pattern (FALSE – missing, TRUE – observed)

x

Matrix (n × p) of complete covariates.

type

Type of predictor variables. type=3 refers to items belonging to a scale to be imputed. A cluster (grouping) variable is defined by type=-2. If for some predictors, the cluster means should also be included as predictors, then specify type=2 (see Imputation Model 3 of Example 1).

alpha

A known reliability estimate. An optional standard error of the estimate can be provided in alpha.se

alpha.se

Optional numeric value of the standard error of the alpha reliability estimate if in every iteration a new reliability should be sampled.

scale.values

A list consisting of scale values and their corresponding standard errors (see Example 1).

sig.e.miss

A standard error of measurement for cases with missing values on a scale

like

Individual likelihood evaluated at theta

theta

Grid of unidimensional latent variable

normal.approx

Logical indicating whether the individual posterior should be approximated by a normal distribution

pviter

Number of iterations in each imputation which should be run until the plausible values are drawn

imputationWeights

Optional vector of sample weights

plausible.value.print

An optional logical indicating whether some information about the plausible value imputation should be printed at the console

pls.facs

Number of PLS factors if PLS dimension reduction is used

interactions

Vector of variable names used for creating interactions

quadratics

Vector of variable names used for creating quadratic terms

extract_data

Logical indicating whether input data should be extracted from parent environment within mice::mice routine

...

Further objects to be passed

Details

A linear model is assumed for drawing plausible values of a variable Y contaminated by measurement error. Assuming Y = θ + e and a linear regression model for θ,

θ = X β + ε,

(plausible value) imputations are drawn from the posterior distribution P(θ | Y, X). See Mislevy (1991) for details.
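
To make the posterior concrete, the following is a sketch of the standard normal-normal result. It rests on assumptions that are not stated in the text above: the errors e and ε are independent and normally distributed with variances σ_e² and σ_ε², and β, σ_e² and σ_ε² are treated as known for a single draw. The symbols σ_e², σ_ε² and ρ are introduced here only for illustration; the sampling steps actually used inside the function may differ.

```latex
\begin{aligned}
Y \mid \theta &\sim N(\theta,\ \sigma_e^2), \qquad
\theta \mid \mathbf{X} \sim N(\mathbf{X}\beta,\ \sigma_\varepsilon^2), \\
\theta \mid Y, \mathbf{X} &\sim
N\!\left( \mathbf{X}\beta + \rho\,(Y - \mathbf{X}\beta),\ \rho\,\sigma_e^2 \right),
\qquad
\rho = \frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + \sigma_e^2}.
\end{aligned}
```

Under these assumptions, a very large σ_e (as would apply to cases assigned sig.e.miss) drives ρ toward 0, so the draw is governed by the regression prediction Xβ alone. A reliability estimate alpha corresponds, under classical test theory, to σ_e² = Var(Y)(1 - alpha), which is the conversion used for SE.scale2 in Example 1.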

Value

A vector of length nrow(x) containing imputed plausible values.

Note

Plausible value imputation is also known as multiple overimputation (Blackwell, Honaker & King, 2016a, 2016b), which is implemented in the Amelia package; see Amelia::moPrep and Amelia::amelia.

Author(s)

Alexander Robitzsch

References

Asparouhov, T., & Muthén, B. (2010). Plausible values for latent variables using Mplus. Technical Report. https://www.statmodel.com/papers.shtml

Blackwell, M., Honaker, J., & King, G. (2011). Multiple overimputation: A unified approach to measurement error and missing data. Technical Report.

Blackwell, M., Honaker, J., & King, G. (2016a). A unified approach to measurement error and missing data: Overview and applications. Sociological Methods & Research, xx, xxx-xxx.

Blackwell, M., Honaker, J., & King, G. (2016b). A unified approach to measurement error and missing data: Details and extensions. Sociological Methods & Research, xx, xxx-xxx.

Mislevy, R. J. (1991). Randomization-based inference about latent variables from complex samples. Psychometrika, 56, 177-196.

See Also

See TAM::tam.latreg for fitting latent regression models.

Examples

## Not run: 
#############################################################################
# EXAMPLE 1: Plausible value imputation for data.ma04 | 2 scales
#############################################################################
	
data(data.ma04, package = "miceadds")
dat <- data.ma04

# Scale 1 consists of items A1,...,A4
# Scale 2 consists of items B1,...,B5
dat$scale1 <- NA
dat$scale2 <- NA

# empty imputation
imp <- mice::mice( dat , m=0 , maxit=0 )
summary(imp)

# define predictors
predM <- imp$pred
# define imputation methods
impMethod <- imp$method
impMethod <- rep( "norm" , length(impMethod) )
names(impMethod) <- names( imp$method )

# look at missing proportions
colSums( is.na(dat) )

# redefine imputation methods for plausible value imputation
impMethod[ "scale1" ] <- "plausible.values"
predM[ "scale1" ,  ] <- 1
predM[ "scale1" , c("A1" , "A2" ,  "A3" , "A4" ) ] <- 3
    # items corresponding to a scale should be declared by a 3 in the predictor matrix
impMethod[ "scale2" ] <- "plausible.values"
predM[ ,"scale2"  ] <- 0
predM[ "scale2" ,  c("A2","A3","A4","V6","V7") ] <- 1
diag(predM) <- 0

# use imputed scale values as predictors for V5, V6 and V7
predM[ c("V5","V6","V7") , c("scale1","scale2" ) ] <- 1
# exclude for V5, V6 and V7 the items of scales A and B as predictors
predM[ c("V5","V6","V7") , c( paste0("A",2:4) , paste0("B",1:5) ) ] <- 0
# exclude 'group' as a predictor
predM[,"group"] <- 0

# look at imputation method and predictor matrix
impMethod
predM

#-------------------------------
# Parameter for imputation
#***
# scale 1 (A1,...,A4)
# known Cronbach's Alpha
alpha <- NULL
alpha <- list( "scale1" = .8 )
alpha.se <- list( "scale1" = .05 )  # sample alpha with a standard deviation of .05

#***
# scale 2 (B1,...,B5)
# means and SE's of scale scores are assumed to be known
M.scale2 <- rowMeans( dat[  , paste0("B",1:5)  ] )
# M.scale2[ is.na( M.scale2 ) ] <- mean( M.scale2 , na.rm=TRUE )
SE.scale2 <- rep( sqrt( stats::var(M.scale2,na.rm=TRUE)*(1-.8) ) , nrow(dat) )
# => heterogeneous measurement errors are allowed
scale.values <- list( "scale2" = list( "M" = M.scale2 , "SE" = SE.scale2 ) )

#*** Imputation Model 1: Imputation using four parallel chains
imp1 <- mice::mice( dat , predictorMatrix = predM , m = 4, maxit = 5 ,  
          alpha.se = alpha.se ,  imputationMethod = impMethod ,  allow.na = TRUE  , alpha = alpha,
          scale.values = scale.values  )
summary(imp1)

# extract first imputed dataset from Imputation Model 1
dat11 <- mice::complete( imp1 , 1 )

#*** Imputation Model 2: Imputation using one long chain
imp2 <- miceadds::mice.1chain( dat , predictorMatrix = predM , burnin=10 , iter=20 , Nimp=4 , 
          alpha.se = alpha.se ,  imputationMethod = impMethod ,  allow.na = TRUE  , alpha = alpha,
          scale.values = scale.values  )
summary(imp2)	

#-------------
#*** Imputation Model 3: Imputation including  group level variables

# use group indicator for plausible value estimation
predM[ "scale1" , "group" ] <- -2
# V7 and B1 should be aggregated at the group level
predM[ "scale1" , c("V7","B1") ] <- 2
predM[ "scale2" , "group" ] <- -2
predM[ "scale2" , c("V7","A1") ] <- 2

# perform single imputation (m=1)
imp <- mice::mice( dat , predictorMatrix = predM , m = 1 , maxit=10 , 
            imputationMethod = impMethod ,  allow.na = TRUE  , alpha = alpha,
            scale.values = scale.values )
dat10 <- mice::complete(imp)

# multilevel model
library(lme4)
mod <- lme4::lmer( scale1 ~ ( 1 | group) , data = dat11 )
summary(mod)

mod <- lme4::lmer( scale1 ~ ( 1 | group) , data = dat10)
summary(mod)

#############################################################################
# SIMULATED EXAMPLE 2: Plausible value imputation with chained equations
#############################################################################

# - simulate a latent variable theta and dichotomous item responses
# - two covariates X in which the second covariate has measurement error

library(sirt)
library(TAM)
library(lavaan)

set.seed(7756)
N <- 2000    # number of persons
I <- 10     # number of items

# simulate covariates
X <- MASS::mvrnorm( N , mu=c(0,0) , Sigma = matrix( c(1,.5,.5,1) ,2 ,2 ) )
colnames(X) <- paste0("X",1:2)
# second covariate with measurement error with variance var.err
var.err <- .3
X.err <- X
X.err[,2] <- X[,2] + stats::rnorm(N, sd = sqrt(var.err) )
# simulate theta
theta <- .5*X[,1] + .4*X[,2] + stats::rnorm( N , sd = .5 )
# simulate item responses
itemdiff <- seq( -2 , 2 , length=I)  # item difficulties
dat <- sirt::sim.raschtype( theta , b = itemdiff )

#***********************
#*** Model 0: Regression model with true variables
mod0 <- stats::lm( theta ~ X )
summary(mod0)

#**********************
# plausible value imputation for abilities and error-prone
# covariates using the mice package

# creating the likelihood for plausible value for abilities
mod11 <- TAM::tam.mml( dat )
likePV <- IRT.likelihood(mod11)
# creating the likelihood for error-prone covariate X2
# The known measurement error variance is 0.3.
lavmodel <- "
  X2true =~ 1*X2
  X2 ~~ 0.3*X2
    "
mod12 <- lavaan::cfa( lavmodel , data = as.data.frame(X.err) )
summary(mod12)
likeX2 <- IRTLikelihood.cfa( data= X.err , cfaobj=mod12)
str(likeX2)

#-- create data input for mice package
data <- data.frame( "PVA" = NA , "X1" = X[,1] , "X2" = NA  ) 
vars <- colnames(data)
V <- length(vars)
predictorMatrix <- 1 - diag(V)
rownames(predictorMatrix) <- colnames(predictorMatrix) <- vars
imputationMethod <- rep("norm" , V )
names(imputationMethod) <- vars
imputationMethod[c("PVA","X2")] <- "plausible.values"

#-- create argument lists for plausible value imputation
# likelihood and theta grid of plausible value derived from IRT model
like <- list( "PVA" = likePV  , "X2" = likeX2 )
theta <- list( "PVA" = attr(likePV,"theta") ,
                "X2" = attr(likeX2 , "theta") )                     
#-- initial imputations
data.init <- data
data.init$PVA <- mod11$person$EAP
data.init$X2 <- X.err[,"X2"]

#-- imputation using the mice and miceadds package
imp1 <- mice::mice( as.matrix(data) , predictorMatrix = predictorMatrix , m = 4, 
            maxit = 6 , imputationMethod = imputationMethod ,  allow.na = TRUE ,
            theta=theta , like=like , data.init=data.init )
summary(imp1)

# compute linear regression
mod4a <- with( imp1 , stats::lm( PVA ~ X1 + X2 ) )
summary( mice::pool(mod4a) )

## End(Not run)

