mice.impute.pmm3: Imputation by Predictive Mean Matching (in 'miceadds')


View source: R/mice.impute.pmm3.R

Description

This function imputes values by predictive mean matching like the mice::mice.impute.pmm method in the mice package.

Usage

mice.impute.pmm3(y, ry, x, donors = 3, noise = 10^5, ridge = 10^(-5), ...)
mice.impute.pmm4(y, ry, x, donors = 3, noise = 10^5, ridge = 10^(-5), ...)
mice.impute.pmm5(y, ry, x, donors = 3, noise = 10^5, ridge = 10^(-5), ...)
mice.impute.pmm6(y, ry, x, donors = 3, noise = 10^5, ridge = 10^(-5), ...)

Arguments

y

Incomplete data vector of length n

ry

Vector of missing data pattern (FALSE – missing, TRUE – observed)

x

Matrix (n x p) of complete covariates.

donors

Number of donors used for imputation

noise

Numerical value to break ties

ridge

Ridge parameter in the diagonal of X'X

...

Further arguments to be passed

Details

The imputation method pmm3 imitates the mice::mice.impute.pmm imputation method in the mice package.
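To make the idea of predictive mean matching concrete, here is a minimal base R sketch. It is not the miceadds or mice source code: the function name pmm_sketch is hypothetical, the random draw of regression parameters used by the real methods is omitted, the noise argument is ignored, and scaling the ridge by the diagonal of X'X is an assumption.

```r
# Simplified predictive mean matching sketch (hypothetical helper, base R only).
pmm_sketch <- function(y, ry, x, donors = 3, ridge = 1e-5) {
  X <- cbind(1, as.matrix(x))                 # add an intercept column
  Xobs <- X[ry, , drop = FALSE]
  yobs <- y[ry]
  # least squares fit with a small ridge on the diagonal of X'X
  # (assumption: the ridge is scaled by the diagonal elements)
  XtX <- crossprod(Xobs)
  pen <- diag(ridge * diag(XtX), nrow = ncol(X))
  beta <- solve(XtX + pen, crossprod(Xobs, yobs))
  yhat_obs <- as.vector(Xobs %*% beta)
  yhat_mis <- as.vector(X[!ry, , drop = FALSE] %*% beta)
  d <- min(donors, length(yobs))
  # for each missing case: take the d observed cases with the closest
  # predicted values and sample one of their observed y values
  vapply(yhat_mis, function(p) {
    idx <- order(abs(yhat_obs - p))[seq_len(d)]
    yobs[idx[sample.int(length(idx), 1)]]
  }, numeric(1))
}

# usage on simulated data
set.seed(1)
n <- 200
x <- stats::rnorm(n)
y <- 0.5 * x + stats::rnorm(n)
ry <- stats::runif(n) > 0.25
y[!ry] <- NA
imp <- pmm_sketch(y, ry, x)
length(imp) == sum(!ry)    # one imputed value per missing case
all(imp %in% y[ry])        # every imputed value is an observed y value
```

Because imputed values are copied from observed cases, they always stay within the range of the observed data.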

The imputation method pmm4 ignores ties in predicted y values. With many predictors, this probably does not pose a substantial problem.

The imputation method pmm5 also ignores ties in predicted y values. Unlike the other PMM methods, it searches for D donors (specified by donors) below the predicted value and D donors above the predicted value, and then randomly samples one value from this set of 2D donors.

The imputation method pmm6 is just the Rcpp implementation of pmm5.
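The donor rule described for pmm5 and pmm6 can be sketched in base R as follows. The helper pmm5_donor_draw is hypothetical and is not the miceadds implementation. In particular, how the sketch behaves near the smallest and largest predicted values (it simply uses fewer donors on the side that runs out) is an assumption.

```r
# Hypothetical helper illustrating the "D below, D above" donor rule.
pmm5_donor_draw <- function(pred, yhat_obs, yobs, donors = 3) {
  ord <- order(yhat_obs)
  yhat_sorted <- yhat_obs[ord]
  yobs_sorted <- yobs[ord]
  n <- length(yhat_sorted)
  # k = number of observed predictions that are <= pred
  k <- findInterval(pred, yhat_sorted)
  # up to `donors` neighbours on each side of pred
  lower <- if (k >= 1) seq(max(1, k - donors + 1), k) else integer(0)
  upper <- if (k < n) seq(k + 1, min(n, k + donors)) else integer(0)
  pool <- c(lower, upper)
  # sample one observed value from the pooled donors
  yobs_sorted[pool[sample.int(length(pool), 1)]]
}

# usage: with donors = 2 and a prediction of 4.5, the donor pool consists of
# the observed cases with predictions 3, 4 (below) and 5, 6 (above)
set.seed(2)
yhat_obs <- c(1, 2, 3, 4, 5, 6, 7, 8)
yobs <- 10 * yhat_obs
draws <- replicate(200, pmm5_donor_draw(4.5, yhat_obs, yobs, donors = 2))
all(draws %in% c(30, 40, 50, 60))
```

Sorting the predictions once and locating the insertion point avoids computing all pairwise distances, which is presumably why this variant is attractive for large data sets.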

Value

A vector of length nmis=sum(!ry) with imputed values.
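The imputation functions are normally called by mice::mice, but a direct call can illustrate the arguments and the returned value. The following is a sketch that assumes the miceadds package is installed and that the function can be called outside of mice::mice with the signature shown under Usage. The simulated data are purely illustrative.

```r
# Direct call of the imputation function (only run if miceadds is available).
if (requireNamespace("miceadds", quietly = TRUE)) {
  set.seed(3)
  n <- 100
  x <- matrix(stats::rnorm(2 * n), ncol = 2)        # complete covariates (n x p)
  y <- as.vector(x %*% c(0.5, -0.3)) + stats::rnorm(n)
  ry <- stats::runif(n) > 0.2                       # TRUE = observed
  y[!ry] <- NA                                      # FALSE = missing
  imp <- miceadds::mice.impute.pmm3(y, ry, x, donors = 3)
  length(imp) == sum(!ry)                           # one value per missing case
}
```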

Author(s)

Alexander Robitzsch

See Also

See data.largescale and data.smallscale for speed comparisons of different functions for predictive mean matching.

Examples

## Not run: 
#############################################################################
# SIMULATED EXAMPLE 1: Two variables x and y with missing y
#############################################################################
library(mice)
library(miceadds)   # must be attached so that mice can find mice.impute.pmm3 etc.
set.seed(1413)

rho <- .6   # correlation between x and y
N <- 6800    # number of cases
x <- stats::rnorm(N)
My <- .35   # mean of y
y.com <- y <- My + rho * x + stats::rnorm(N , sd = sqrt( 1 - rho^2 ) )

# create missingness on y depending on the rho.mar parameter
rho.mar <- .4    # correlation response tendency z and x
missrate <- .25  # missing response rate
# simulate response tendency z and missings on y
z <- rho.mar * x + stats::rnorm(N , sd = sqrt( 1 - rho.mar^2 ) )
y[ z < stats::qnorm( missrate ) ] <- NA
dat <- data.frame(x , y )

# mice imputation
impmethod <- rep("pmm" , 2 )
names(impmethod) <- colnames(dat)

# pmm (in mice)
imp1 <- mice::mice( as.matrix(dat) , m=1 , maxit=1 , method=impmethod)
# pmm3 (in miceadds)
imp3 <- mice::mice( as.matrix(dat) , m=1 , maxit=1 ,
           method=gsub("pmm","pmm3" ,impmethod)  )
# pmm4 (in miceadds)
imp4 <- mice::mice( as.matrix(dat) , m=1 , maxit=1 ,
           method=gsub("pmm","pmm4" ,impmethod)  )
# pmm5 (in miceadds)
imp5 <- mice::mice( as.matrix(dat) , m=1 , maxit=1 ,
           method=gsub("pmm","pmm5" ,impmethod)  )
# pmm6 (in miceadds)
imp6 <- mice::mice( as.matrix(dat) , m=1 , maxit=1 ,
           method=gsub("pmm","pmm6" ,impmethod)  )

dat.imp1 <- mice::complete( imp1 , 1 )
dat.imp3 <- mice::complete( imp3 , 1 )
dat.imp4 <- mice::complete( imp4 , 1 )
dat.imp5 <- mice::complete( imp5 , 1 )
dat.imp6 <- mice::complete( imp6 , 1 )

dfr <- NULL
# means
dfr <- rbind( dfr , c( mean( y.com ) , mean( y , na.rm=TRUE ) , mean( dat.imp1$y)  ,
    mean( dat.imp3$y) , mean( dat.imp4$y)  , mean( dat.imp5$y)  ,  mean( dat.imp6$y)  ) )
# SD
dfr <- rbind( dfr , c( stats::sd( y.com ) , stats::sd( y , na.rm=TRUE ) , 
      stats::sd( dat.imp1$y), stats::sd( dat.imp3$y) , stats::sd( dat.imp4$y), 
      stats::sd( dat.imp5$y)  , stats::sd( dat.imp6$y) ) )
# correlations
dfr <- rbind( dfr , c( stats::cor( x,y.com ), 
    stats::cor( x[ ! is.na(y) ] , y[ ! is.na(y) ] ) , 
    stats::cor( dat.imp1$x , dat.imp1$y) , stats::cor( dat.imp3$x , dat.imp3$y) , 
    stats::cor( dat.imp4$x , dat.imp4$y) , stats::cor( dat.imp5$x , dat.imp5$y)  , 
    stats::cor( dat.imp6$x , dat.imp6$y)
        ) )
rownames(dfr) <- c("M_y" , "SD_y" , "cor_xy" )
colnames(dfr) <- c("compl" , "ld" , "pmm" , "pmm3" , "pmm4" , "pmm5","pmm6")
round( dfr , 4 )
##           compl     ld    pmm   pmm3   pmm4   pmm5   pmm6
##   M_y    0.3306 0.4282 0.3314 0.3228 0.3223 0.3264 0.3310
##   SD_y   0.9910 0.9801 0.9873 0.9887 0.9891 0.9882 0.9877
##   cor_xy 0.6057 0.5950 0.6072 0.6021 0.6100 0.6057 0.6069

## End(Not run)

miceadds documentation built on Aug. 9, 2017, 5:04 p.m.