MIFAMD | R Documentation
MIFAMD performs multiple imputation for mixed data (continuous and categorical variables) using Factorial Analysis of Mixed Data (FAMD).
MIFAMD(X, ncp = 2, method = c("Regularized", "EM"), coeff.ridge = 1, threshold = 1e-06,
seed = NULL, maxiter = 1000, nboot = 20, verbose = T)
X | a data.frame with continuous AND categorical variables containing missing values
ncp | integer, the number of components used to reconstruct the data with the FAMD reconstruction formula
method | "Regularized" (default) or "EM"
coeff.ridge | 1 by default, which performs the regularized imputeFAMD algorithm. Other values change the amount of regularization: values less than 1 regularize less (closer to the results of the EM method), and values greater than 1 regularize more (closer to the results of imputation by proportions)
threshold | the threshold for the convergence criterion
seed | integer; by default (seed = NULL), missing values are initially imputed by the mean of each continuous variable and, for categorical variables coded as indicator matrices of dummy variables, by the proportion of each category. Other values lead to a random initialization
maxiter | integer, maximum number of iterations of the algorithm
nboot | the number of imputed data sets
verbose | use verbose = TRUE to print the iteration numbers to the screen
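The effect of coeff.ridge can be illustrated with a small base-R sketch on a plain numeric matrix. It assumes the regularization takes the form used in regularized iterative PCA, where each retained singular value d_s is shrunk to (d_s^2 - coeff.ridge * sigma2) / d_s. The noise-variance estimate (mean of the discarded squared singular values) and the helper name shrunk_reconstruction are simplifications chosen for illustration; this is not missMDA's internal code and does not handle FAMD coding of mixed data.

```r
# Hypothetical helper: regularized rank-ncp reconstruction of a numeric matrix.
# Assumed shrinkage form: d_s -> max(d_s^2 - coeff.ridge * sigma2, 0) / d_s.
shrunk_reconstruction <- function(Z, ncp, coeff.ridge = 1) {
  means <- colMeans(Z)
  Zc <- sweep(Z, 2, means)               # centre each column
  s <- svd(Zc)
  d <- s$d
  keep <- seq_len(ncp)
  # Simplified noise-variance estimate (assumption, not missMDA's estimator):
  # the mean of the discarded squared singular values
  sigma2 <- mean(d[-keep]^2)
  # Shrink the retained singular values, clamping at zero
  d_shrunk <- pmax(d[keep]^2 - coeff.ridge * sigma2, 0) / d[keep]
  U <- s$u[, keep, drop = FALSE]
  V <- s$v[, keep, drop = FALSE]
  fit <- U %*% (d_shrunk * t(V))         # row i of t(V) scaled by d_shrunk[i]
  sweep(fit, 2, means, "+")              # add the column means back
}

set.seed(1)
Z <- matrix(rnorm(60), nrow = 20, ncol = 3)
em_like   <- shrunk_reconstruction(Z, ncp = 2, coeff.ridge = 0)    # no shrinkage
reg       <- shrunk_reconstruction(Z, ncp = 2, coeff.ridge = 1)    # default amount
mean_like <- shrunk_reconstruction(Z, ncp = 2, coeff.ridge = 1e6)  # everything shrunk away
```

With coeff.ridge = 0 the retained components are kept as they are, as in an EM-type reconstruction; with a very large value every component is shrunk to zero and each cell falls back to its column mean, which for a dummy column is the proportion of that category.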
MIFAMD generates nboot imputed data sets using FAMD. The observed values are identical across data sets, whereas the imputed values vary. The algorithm is as follows. First, nboot weightings of the individuals are defined (equivalent to a non-parametric bootstrap). Then, the iterative regularized FAMD algorithm (Audigier et al., 2016) is applied with each weighting, yielding nboot imputed tables. In each imputed table, the dummy variables coding the categorical variables are scaled so that, for each individual, they sum to one within each variable. Lastly, missing categories are drawn from the probabilities given by the imputed tables, and Gaussian noise is added to the predictions of the continuous variables. This yields nboot imputed mixed data sets. The variation among the imputed values reflects the uncertainty with which the missing values can be predicted.
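The steps surrounding the FAMD fit can be sketched in base R. The regularized FAMD fit itself is not reproduced, so the "imputed" dummy values and continuous predictions below are supplied by hand. All helper names are hypothetical, and the way dummy values are turned into probabilities (clip negatives to zero, then divide by the row sum) and the residual variance sigma2 are assumptions made for illustration, not missMDA's exact procedure.

```r
# Step 1 (hypothetical helper): bootstrap weights for n individuals
bootstrap_weights <- function(n) {
  counts <- tabulate(sample.int(n, n, replace = TRUE), nbins = n)
  counts / n                               # weights sum to one
}

# Step 2 (hypothetical helper): scale the imputed dummy values of ONE
# categorical variable into probabilities summing to one per individual.
# Assumed scaling: clip negatives to 0, then divide by the row sum
# (assumes each row keeps at least one positive value).
dummies_to_probs <- function(D) {
  D[D < 0] <- 0
  D / rowSums(D)
}

# Step 3 (hypothetical helper): draw one category per individual
draw_categories <- function(P, levels) {
  idx <- apply(P, 1, function(p) sample.int(length(p), 1, prob = p))
  factor(levels[idx], levels = levels)
}

# Step 4 (hypothetical helper): add Gaussian noise to continuous predictions
# (sigma2 is an assumed residual-variance estimate)
add_noise <- function(pred, sigma2) {
  pred + rnorm(length(pred), mean = 0, sd = sqrt(sigma2))
}

set.seed(42)
W <- replicate(3, bootstrap_weights(10))   # one column of weights per imputation

# Imputed dummy values for 3 individuals with a missing 3-level variable;
# reconstructed values can fall slightly outside [0, 1]
D <- matrix(c( 0.7, 0.2, 0.1,
               0.4, 0.5, 0.1,
              -0.1, 0.6, 0.5), nrow = 3, byrow = TRUE)
P <- dummies_to_probs(D)
cats <- draw_categories(P, levels = c("low", "mid", "high"))

pred <- c(12.3, 15.1, 9.8)                 # predicted continuous values
imputed <- add_noise(pred, sigma2 = 0.5)
```

Repeating steps 2 to 4 for each of the nboot weighted fits is what makes the imputed values differ across data sets while the observed values stay fixed.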
res.MI | A list of data frames containing the nboot imputed mixed data sets
res.imputeFAMD | A list corresponding to the output of the function imputeFAMD (single imputation)
call | The matched call
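Because res.MI is a plain list of data frames, an estimate can be computed on each completed data set and combined with Rubin's rules (Little & Rubin, 2002), which is what pool does for fitted models in the example. The base-R sketch below pools the mean of one continuous variable. The list `completed` is simulated stand-in data, not MIFAMD output, and `pool_mean` is a hypothetical helper.

```r
# Hypothetical helper: Rubin's rules for the mean of one column
# over m completed data sets
pool_mean <- function(completed, column) {
  m <- length(completed)
  q <- vapply(completed, function(d) mean(d[[column]]), numeric(1))           # per-data-set estimates
  u <- vapply(completed, function(d) var(d[[column]]) / nrow(d), numeric(1))  # their sampling variances
  qbar <- mean(q)                  # pooled estimate
  ubar <- mean(u)                  # within-imputation variance
  b <- var(q)                      # between-imputation variance
  total <- ubar + (1 + 1 / m) * b  # total variance
  c(estimate = qbar, within = ubar, between = b, total = total)
}

# Simulated stand-in for res.mi$res.MI: the 17 observed values are identical
# across the 5 data sets, only the 3 "imputed" values differ
set.seed(7)
observed <- rnorm(17, mean = 90, sd = 10)
completed <- lapply(1:5, function(i) {
  data.frame(y = c(rnorm(3, mean = 90, sd = 10), observed))
})
res <- pool_mean(completed, "y")
```

The between-imputation component b is the part of the total variance that comes only from the spread of the imputed values.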
Vincent Audigier vincent.audigier@lecnam.net
Audigier, V., Husson, F. & Josse, J. (2016). A principal components method to impute mixed data. Advances in Data Analysis and Classification, 10(1), 5-26. <doi:10.1007/s11634-014-0195-1>
Audigier, V., Husson, F. & Josse, J. (2017). MIMCA: Multiple imputation for categorical variables with multiple correspondence analysis. Statistics and Computing. <doi:10.1007/s11222-016-9635-4>
Little, R. J. A. & Rubin, D. B. (2002). Statistical Analysis with Missing Data. Wiley Series in Probability and Statistics, New York.
imputeFAMD, MIPCA, MIMCA, estim_ncpFAMD, with.mids, pool, summary.mira
## Not run:
library(missMDA)
library(mice)

data(ozone)
## First choose the number of components
## (used in the reconstruction step)
nb <- estim_ncpFAMD(ozone)  ## time-consuming; selects 2 components here
## Multiple imputation: 50 imputed data sets
res.mi <- MIFAMD(ozone, ncp = 2, nboot = 50)
## First completed data set
head(res.mi$res.MI[[1]])
## Analysis and pooling with mice:
## convert the MIFAMD output into a mids object
imp <- prelim(res.mi, ozone)
fit <- with(data = imp, expr = lm(maxO3 ~ T9 + T12 + T15 + Ne9 + Ne12 + Ne15 +
                                    Vx9 + Vx12 + Vx15 + maxO3v + vent + pluie))
res.pool <- pool(fit)
summary(res.pool)
## End(Not run)