Impute missing values in categorical variables with Multiple Correspondence Analysis

Description

Impute the missing values of a categorical dataset, working on its indicator matrix, with Multiple Correspondence Analysis (MCA).

Usage

imputeMCA(don, ncp=2, row.w=NULL, coeff.ridge=1, threshold=1e-06, seed=NULL, maxiter=1000)

Arguments

don

a data.frame with categorical variables containing missing values

ncp

integer corresponding to the number of dimensions used to reconstruct data with the reconstruction formulae

row.w

an optional vector of row weights (by default, a vector of 1 over the number of rows, giving uniform row weights)

coeff.ridge

a positive coefficient that allows the eigenvalues to be shrunk by more than the mean of the last eigenvalues (the default is 1, in which case the eigenvalues are shrunk by the mean of the last eigenvalues; a coefficient between 1 and 2 is required)

threshold

the threshold for assessing convergence

seed

an integer specifying the seed used to initialize the regularized iterative MCA algorithm (if seed = NULL, the initialization step imputes the proportion of each category)

maxiter

integer, maximum number of iterations for the regularized iterative MCA algorithm

Details

Uses a Regularized Iterative Multiple Correspondence Analysis to impute missing values. The regularized iterative MCA algorithm first imputes the missing values in the indicator matrix with initial values (the proportion of each category). It then performs MCA on the completed dataset, imputes the missing values with the reconstruction formulae of order ncp, and repeats these two steps until convergence.

If ncp = 0, the Average method (imputation with the proportion of each category) is performed.
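
The initialization step (which is also the whole of the Average method) can be illustrated in base R. This is only a minimal sketch, not the missMDA implementation: the helper `average_indicator`, the toy data frame and the `variable_category` column naming are assumptions made for the illustration, and uniform row weights are assumed.

```r
# Hypothetical helper (not part of missMDA): build the indicator matrix of a
# data.frame of factors and fill missing cells with category proportions.
average_indicator <- function(don) {
  blocks <- lapply(names(don), function(v) {
    f <- factor(don[[v]])
    lev <- levels(f)
    # One 0/1 column per category; rows with a missing value are all NA
    ind <- sapply(lev, function(l) as.numeric(f == l))
    ind <- matrix(ind, nrow = length(f),
                  dimnames = list(NULL, paste(v, lev, sep = "_")))
    # Proportion of each category among the observed rows (uniform weights)
    prop <- colMeans(ind, na.rm = TRUE)
    miss <- is.na(f)
    if (any(miss)) {
      # Column-major fill: every missing row receives the same proportions
      ind[miss, ] <- rep(prop, each = sum(miss))
    }
    ind
  })
  do.call(cbind, blocks)
}

# Toy data (invented for the illustration)
toy <- data.frame(
  colour = factor(c("red", "blue", NA, "red")),
  size   = factor(c("S", NA, "L", "L"))
)
tab <- average_indicator(toy)
print(round(tab, 2))
```

Within each variable the imputed row still sums to 1, so every row of the completed matrix sums to the number of variables, as it does for a fully observed indicator matrix.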

Value

Returns the imputed indicator matrix. The imputed values are real numbers and may be seen as degrees of membership to the corresponding categories.
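
If a single category per cell is wanted, one possible reading of these degrees of membership is to keep, for each variable, the category with the largest value. The sketch below is an assumption-laden illustration in base R, not something the package is documented to do here: `most_plausible` is a hypothetical helper, the small matrix is invented, and the columns are assumed to be grouped by variable in the order given by `n.levels`.

```r
# Hypothetical helper (not part of missMDA): for each variable, pick the
# category whose column holds the largest degree of membership.
# n.levels: named integer vector, number of categories per variable,
#           in the same order as the column blocks of tab.disj.
most_plausible <- function(tab.disj, n.levels) {
  stopifnot(sum(n.levels) == ncol(tab.disj), !is.null(colnames(tab.disj)))
  ends <- cumsum(n.levels)
  starts <- ends - n.levels + 1
  out <- lapply(seq_along(n.levels), function(j) {
    block <- tab.disj[, starts[j]:ends[j], drop = FALSE]
    # max.col() returns, per row, the index of the largest entry
    colnames(block)[max.col(block, ties.method = "first")]
  })
  names(out) <- names(n.levels)
  as.data.frame(out, stringsAsFactors = TRUE)
}

# Invented fuzzy indicator matrix: two variables with two categories each
tab.disj <- cbind(
  colour_blue = c(0, 1, 0.3),
  colour_red  = c(1, 0, 0.7),
  size_L      = c(0.2, 1, 0),
  size_S      = c(0.8, 0, 1)
)
res <- most_plausible(tab.disj, c(colour = 2, size = 2))
print(res)
```

Taking the maximum discards the uncertainty carried by the membership values, so it is best treated as a convenience for inspection rather than as part of the imputation itself.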

Author(s)

Francois Husson <husson@agrocampus-ouest.fr> and Julie Josse <Julie.Josse@agrocampus-ouest.fr>

References

Josse, J., Chavent, M., Liquet, B. and Husson, F. (2010). Handling missing values with Regularized Iterative Multiple Correspondence Analysis.

See Also

estim_ncpMCA

Examples

## Not run: 
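library(missMDA)     # provides imputeMCA(), estim_ncpMCA() and the vnf data
library(FactoMineR)  # provides MCA()
data(vnf)

## First, the number of components used in the reconstruction step
## has to be chosen
## nb <- estim_ncpMCA(vnf, ncp.max = 5) ## Time-consuming, nb = 4

## Impute the indicator matrix
tab.disj <- imputeMCA(vnf, ncp = 4)

## An MCA can then be performed using the imputed indicator matrix
res.mca <- MCA(vnf, tab.disj = tab.disj)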

## End(Not run)