MIPCA: Multiple Imputation with PCA
In missMDA: Handling Missing Values with Multivariate Data Analysis

MIPCA

R Documentation

Multiple Imputation with PCA

Description

MIPCA performs Multiple Imputation with a PCA model. Can be used as a preliminary step to perform Multiple Imputation in PCA.

Usage

MIPCA(X, ncp = 2, scale = TRUE, method=c("Regularized","EM"), threshold = 1e-04, 
    nboot = 100,  method.mi="Boot", Lstart=1000, L=100, verbose=FALSE)

Arguments

`X`	a data.frame with continuous variables containing missing values
`ncp`	integer corresponding to the number of components used to reconstruct data with the PCA reconstruction formulae
`scale`	boolean. By default TRUE leading to a same weight for each variable
`method`	"Regularized" by default or "EM"
`threshold`	the threshold for the criterion convergence
`nboot`	the number of imputed datasets
`method.mi`	a string. If "Bayes", the uncertainty on the parameters of the imputation model is taken into account using a Bayesian treatment of PCA. By default "Boot" leading to a MI which reflect uncertainty a bootstrap procedure. See details.
`Lstart`	number of iterations for the burn-in period (only used if method.mi="Bayes")
`L`	number of skipped iterations to keep one imputed data set after the burn-in period (only used if method.mi="Bayes")
`verbose`	use verbose=TRUE for screen printing of iteration numbers

Details

MIPCA generates nboot imputed datasets from a PCA model. The observed values are the same from one dataset to the others whereas the imputed values change. The variation among the imputed values reflects the variability with which missing values can be predicted. The multiple imputation is proper in the sense of Little and Rubin (2002) since it takes into account the variability of the parameters. Two versions are available: multiple imputation using a parametric bootstrap (Josse, J., Husson, F. (2010)) and multiple imputation using a Bayesian treatment of the PCA model (Audigier et al 2015). The methods differ by the way in which the variability due to missing values is reflected. The method used is controlled by the method.mi argument. By default, MIPCA uses the parametric bootstrap method.mi="Boot". This bootstrap method is more recommended to evaluate uncertainty in PCA (through confidence ellipses). Otherwise, the Bayesian method can be used by specifying the argument method.mi="Bayes". It is based on an iterative algorithm which alternates imputation of the data set and draw of the PCA parameters in a posterior distribution. These steps are repeated Lstart times to reach a convergence. Then, one imputed data set is kept each L iterations to ensure independence between imputed values from a data set to another. The Bayesian method is more recommanded to apply a statistical method on an incomplete data set.

Value

`res.imputePCA`	A matrix corresponding to the imputed dataset obtained with the function imputePCA (the completed dataset)
`res.MI`	A list of data frames corresponding to the nboot imputed data sets
`call`	the matched call

Author(s)

Francois Husson francois.husson@institut-agro.fr, Julie Josse julie.josse@polytechnique.edu and Vincent Audigier

References

Josse, J., Husson, F. (2011). Multiple Imputation in PCA. Advances in Data Analysis and Classification.

Audigier, V. Josse, J., Husson, F. (2015). Multiple imputation for continuous variables using a Bayesian principal component analysis. Journal of Statistical Computation and Simulation.

Little R.J.A., Rubin D.B. (2002) Statistical Analysis with Missing Data. Wiley series in probability and statistics, New-York.

Examples

## Not run: 
#########################################################
## Multiple Imputation for visualization on the PCA map
#########################################################

data(orange)
## First the number of components has to be chosen 
##   (for the reconstruction step)
nb <- estim_ncpPCA(orange,ncp.max=4)

## Multiple Imputation
resMI <- MIPCA(orange,ncp=2)

## Visualization on the PCA map
plot(resMI)

#########################################################
## Multiple Imputation for applying statistical methods
(Bayesian method)
#########################################################
data(ozone)

## First the number of components has to be chosen 
nb <- estim_ncpPCA(ozone[,1:11])

## Multiple Imputation with Bayesian method
res.BayesMIPCA<-MIPCA(ozone[,1:11],ncp=2,method.mi="Bayes",verbose=TRUE)

## Regression on the multiply imputed data set and pooling with mice
require(mice)
imp<-prelim(res.mi=res.BayesMIPCA,X=ozone[,1:11])#creating a mids object
fit <- with(data=imp,exp=lm(maxO3~T9+T12+T15+Ne9+Ne12+Ne15+Vx9+Vx12+Vx15+maxO3v))#analysis
res.pool<-pool(fit);summary(res.pool)#pooling

## Diagnostics
res.over<-Overimpute(res.BayesMIPCA)

## End(Not run)

missMDA documentation built on Nov. 17, 2023, 5:07 p.m.