mice.impute.plausible.values: Plausible Value Imputation using Classical Test Theory and...

mice.impute.plausible.valuesR Documentation

Plausible Value Imputation using Classical Test Theory and Based on Individual Likelihood

Description

This imputation function performs unidimensional plausible value imputation if (subject-wise) measurement errors or the reliability of the scale is known (Mislevy, 1991; see also Asparouhov & Muthen, 2010; Blackwell, Honaker & King, 2011, 2017a, 2017b). The function also allows the input of an individual likelihood obtained by fitting an item response model.

Usage

mice.impute.plausible.values(y, ry, x, type, alpha=NULL,
    alpha.se=0, scale.values=NULL, sig.e.miss=1e+06,
    like=NULL, theta=NULL, normal.approx=NULL,
    pviter=15, imputationWeights=rep(1, length(y)), plausible.value.print=TRUE,
    pls.facs=NULL, interactions=NULL, quadratics=NULL, extract_data=TRUE,
    control_latreg=list( progress=FALSE, ridge=1e-5 ),  ...)

Arguments

y

Incomplete data vector of length n

ry

Vector of missing data pattern (FALSE – missing, TRUE – observed)

x

Matrix (n \times p) of complete covariates.

type

Type of predictor variables. type=3 refers to items belonging to a scale to be imputed. A cluster (grouping) variable is defined by type=-2. If for some predictors, the cluster means should also be included as predictors, then specify type=2 (see Imputation Model 3 of Example 1).

alpha

A known reliability estimate. An optional standard error of the estimate can be provided in alpha.se

alpha.se

Optional numeric value of the standard error of the alpha reliability estimate if in every iteration a new reliability should be sampled.

scale.values

A list consisting of scale values of scale values and its corresponding standard errors (see Example 1).

sig.e.miss

A standard error of measurement for cases with missing values on a scale

like

Individual likelihood evaluated at theta

theta

Grid of unidimensional latent variable

normal.approx

Logical indicating whether the individual posterior should be approximated by a normal distribution

pviter

Number of iterations in each imputation which should be run until the plausible values are drawn

imputationWeights

Optional vector of sample weights

plausible.value.print

An optional logical indicating whether some information about the plausible value imputation should be printed at the console

pls.facs

Number of PLS factors if PLS dimension reduction is used

interactions

Vector of variable names used for creating interactions

quadratics

Vector of variable names used for creating quadratic terms

extract_data

Logical indicating whether input data should be extracted from parent environment within mice::mice routine

control_latreg

Control arguments for TAM::tam.latreg

...

Further objects to be passed

Details

The linear model is assumed for drawing plausible values of a variable Y contaminated by measurement error. Assuming Y=\theta + e and a linear regression model for \theta

\theta=\bold{X} \beta + \epsilon

(plausible value) imputations from the posterior distribution P( \theta | Y, \bold{X} ) are drawn. See Mislevy (1991) for details.

Value

A vector of length nrow(x) containing imputed plausible values.

Note

Plausible value imputation is also known as multiple overimputation (Blackwell, Honaker & King, 2016a, 2016b) which is implemented in the Amelia package, see Amelia::moPrep and Amelia::amelia.

References

Asparouhov, T., & Muthen, B. (2010). Plausible values for latent variables using Mplus. Technical Report. https://www.statmodel.com/papers.shtml

Blackwell, M., Honaker, J., & King, G. (2011). Multiple overimputation: A unified approach to measurement error and missing data. Technical Report.

Blackwell, M., Honaker, J., & King, G. (2017a). A unified approach to measurement error and missing data: Overview and applications. Sociological Methods & Research, 46(3), 303-341.

Blackwell, M., Honaker, J., & King, G. (2017b). A unified approach to measurement error and missing data: Details and extensions. Sociological Methods & Research, 46(3), 342-369.

Mislevy, R. J. (1991). Randomization-based inference about latent variables from complex samples. Psychometrika, 56, 177-196.

See Also

See TAM::tam.latreg for fitting latent regression models.

Examples

## Not run: 
#############################################################################
# EXAMPLE 1: Plausible value imputation for data.ma04 | 2 scales
#############################################################################

data(data.ma04, package="miceadds")
dat <- data.ma04

# Scale 1 consists of items A1,...,A4
# Scale 2 consists of items B1,...,B5
dat$scale1 <- NA
dat$scale2 <- NA

#** inits imputation method and predictor matrix
res <- miceadds::mice_inits(dat, ignore=c("group") )
predM <- res$predictorMatrix
impMethod <- res$method
impMethod <- gsub("pmm", "norm", impMethod )

# look at missing proportions
colSums( is.na(dat) )

# redefine imputation methods for plausible value imputation
impMethod[ "scale1" ] <- "plausible.values"
predM[ "scale1",  ] <- 1
predM[ "scale1", c("A1", "A2",  "A3", "A4" ) ] <- 3
    # items corresponding to a scale should be declared by a 3 in the predictor matrix
impMethod[ "scale2" ] <- "plausible.values"
predM[,"scale2"  ] <- 0
predM[ "scale2",  c("A2","A3","A4","V6","V7") ] <- 1
diag(predM) <- 0

# use imputed scale values as predictors for V5, V6 and V7
predM[ c("V5","V6","V7"), c("scale1","scale2" ) ] <- 1
# exclude for V5, V6 and V7 the items of scales A and B as predictors
predM[ c("V5","V6","V7"), c( paste0("A",2:4), paste0("B",1:5) ) ] <- 0
# exclude 'group' as a predictor
predM[,"group"] <- 0

# look at imputation method and predictor matrix
impMethod
predM

#-------------------------------
# Parameter for imputation
#***
# scale 1 (A1,...,A4)
# known Cronbach's Alpha
alpha <- NULL
alpha <- list( "scale1"=.8 )
alpha.se <- list( "scale1"=.05 )  # sample alpha with a standard deviation of .05

#***
# scale 2 (B1,...,B5)
# means and SE's of scale scores are assumed to be known
M.scale2 <- rowMeans( dat[, paste("B",1:5,sep="")  ] )
# M.scale2[ is.na( m1) ] <- mean( M.scale2, na.rm=TRUE )
SE.scale2 <- rep( sqrt( stats::var(M.scale2,na.rm=T)*(1-.8) ), nrow(dat) )
#=> heterogeneous measurement errors are allowed
scale.values <- list( "scale2"=list( "M"=M.scale2, "SE"=SE.scale2 ) )

#*** Imputation Model 1: Imputation four using parallel chains
imp1 <- mice::mice( dat, predictorMatrix=predM, m=4, maxit=5,
          alpha.se=alpha.se, method=impMethod,  allow.na=TRUE, alpha=alpha,
          scale.values=scale.values  )
summary(imp1)

# extract first imputed dataset
dat11 <- mice::complete( imp, 1 )

#*** Imputation Model 2: Imputation using one long chain
imp2 <- miceadds::mice.1chain( dat, predictorMatrix=predM, burnin=10, iter=20, Nimp=4,
          alpha.se=alpha.se, method=impMethod,  allow.na=TRUE, alpha=alpha,
          scale.values=scale.values )
summary(imp2)

#-------------
#*** Imputation Model 3: Imputation including  group level variables

# use group indicator for plausible value estimation
predM[ "scale1", "group" ] <- -2
# V7 and B1 should be aggregated at the group level
predM[ "scale1", c("V7","B1") ] <- 2
predM[ "scale2", "group" ] <- -2
predM[ "scale2", c("V7","A1") ] <- 2

# perform single imputation (m=1)
imp <- mice::mice( dat, predictorMatrix=predM, m=1, maxit=10,
            method=impMethod,  allow.na=TRUE, alpha=alpha,
            scale.values=scale.values )
dat10 <- mice::complete(imp)

# multilevel model
library(lme4)
mod <- lme4::lmer( scale1 ~ ( 1 | group), data=dat11 )
summary(mod)

mod <- lme4::lmer( scale1 ~ ( 1 | group), data=dat10)
summary(mod)

#############################################################################
# EXAMPLE 2: Plausible value imputation with chained equations
#############################################################################

# - simulate a latent variable theta and dichotomous item responses
# - two covariates X in which the second covariate has measurement error

library(sirt)
library(TAM)
library(lavaan)

set.seed(7756)
N <- 2000    # number of persons
I <- 10     # number of items

# simulate covariates
X <- MASS::mvrnorm( N, mu=c(0,0), Sigma=matrix( c(1,.5,.5,1),2,2 ) )
colnames(X) <- paste0("X",1:2)
# second covariate with measurement error with variance var.err
var.err <- .3
X.err <- X
X.err[,2] <- X[,2] + stats::rnorm(N, sd=sqrt(var.err) )
# simulate theta
theta <- .5*X[,1] + .4*X[,2] + stats::rnorm( N, sd=.5 )
# simulate item responses
itemdiff <- seq( -2, 2, length=I)  # item difficulties
dat <- sirt::sim.raschtype( theta, b=itemdiff )

#***********************
#*** Model 0: Regression model with true variables
mod0 <- stats::lm( theta ~ X )
summary(mod0)

#**********************
# plausible value imputation for abilities and error-prone
# covariates using the mice package

# creating the likelihood for plausible value for abilities
mod11 <- TAM::tam.mml( dat )
likePV <- IRT.likelihood(mod11)
# creating the likelihood for error-prone covariate X2
# The known measurement error variance is 0.3.
lavmodel <- "
  X2true=~ 1*X2
  X2 ~~ 0.3*X2
    "
mod12 <- lavaan::cfa( lavmodel, data=as.data.frame(X.err) )
summary(mod12)
likeX2 <- IRTLikelihood.cfa( data=X.err, cfaobj=mod12)
str(likeX2)

#-- create data input for mice package
data <- data.frame( "PVA"=NA, "X1"=X[,1], "X2"=NA  )
vars <- colnames(data)
V <- length(vars)
predictorMatrix <- 1 - diag(V)
rownames(predictorMatrix) <- colnames(predictorMatrix) <- vars
method <- rep("norm", V )
names(method) <- vars
method[c("PVA","X2")] <- "plausible.values"

#-- create argument lists for plausible value imputation
# likelihood and theta grid of plausible value derived from IRT model
like <- list( "PVA"=likePV, "X2"=likeX2 )
theta <- list( "PVA"=attr(likePV,"theta"),
                "X2"=attr(likeX2, "theta") )
#-- initial imputations
data.init <- data
data.init$PVA <- mod11$person$EAP
data.init$X2 <- X.err[,"X2"]

#-- imputation using the mice and miceadds package
imp1 <- mice::mice( as.matrix(data), predictorMatrix=predictorMatrix, m=4,
            maxit=6, method=method,  allow.na=TRUE,
            theta=theta, like=like, data.init=data.init )
summary(imp1)

# compute linear regression
mod4a <- with( imp1, stats::lm( PVA ~ X1 + X2 ) )
summary( mice::pool(mod4a) )

#############################################################################
# EXAMPLE 3: Plausible value imputation with known error variance
#############################################################################

#---- simulate data
set.seed(987)
N <- 1000         # number of persons
var_err <- .4     # error variance
dat <- data.frame( x1=stats::rnorm(N), x2=stats::rnorm(N) )
dat$theta <- .3 * dat$x1 - .5*dat$x2 + stats::rnorm(N)
dat$y <- dat$theta + stats::rnorm( N, sd=sqrt(var_err) )

#-- linear regression for measurement-error-free data
mod0a <- stats::lm( theta ~ x1 + x2, data=dat )
summary(mod0a)
#-- linear regression for data with measurement error
mod0b <- stats::lm( y ~ x1 + x2, data=dat )
summary(mod0b)

#-- process data for imputation

dat1 <- dat
dat1$theta <- NA
scale.values <- list( "theta"=list( "M"=dat$y, "SE"=rep(sqrt(var_err),N )))
dat1$y <- NULL

cn <- colnames(dat1)
V <- length(cn)
method <- rep("", length(cn) )
names(method) <- cn
method["theta"] <- "plausible.values"

#-- imputation in mice
imp <- mice::mice( dat1, maxit=1, m=5, allow.na=TRUE, method=method,
            scale.values=scale.values )
summary(imp)

#-- inspect first dataset
summary( mice::complete(imp, action=1) )

#-- linear regression based on imputed datasets
mod1 <- with(imp, stats::lm( theta ~ x1 + x2 ) )
summary( mice::pool(mod1) )

## End(Not run)

miceadds documentation built on May 29, 2024, 11:05 a.m.