imputeCA | R Documentation |
Impute the missing entries of a contingency table using Correspondence Analysis (CA). Can be used as a preliminary step before performing CA on an incomplete dataset.
imputeCA(X, ncp = 2, threshold = 1e-08, maxiter = 1000, row.sup=NULL,
col.sup=NULL, quanti.sup=NULL, quali.sup=NULL)
X |
a data.frame that is a contingency table containing missing values |
ncp |
integer corresponding to the number of dimensions used to predict the missing entries |
threshold |
the threshold for assessing convergence |
maxiter |
integer, maximum number of iterations for the regularized iterative CA algorithm |
row.sup |
a vector indicating the indexes of the supplementary rows |
col.sup |
a vector indicating the indexes of the supplementary columns |
quanti.sup |
a vector indicating the indexes of the quantitative supplementary variables |
quali.sup |
a vector indicating the indexes of the categorical supplementary variables |
Impute the missing entries of a contingency table using a regularized CA algorithm.
The (regularized) iterative CA algorithm first consists in initializing missing values with random initial values.
The second step of the (regularized) iterative CA algorithm consists in performing CA on the completed dataset. Then, it imputes the missing values with the (regularized) reconstruction formulae of order ncp (the fitted matrix computed with ncp components for the (regularized) scores and loadings). These steps of estimation of the parameters via CA and imputation of the missing values using the (regularized) fitted matrix are iterate until convergence.
In this regularized algorithm, the singular values of the CA are shrinked.
The number of components ncp used in the algorithm should be small.
A small number of components can also be seen as a way to regularize more and consequently may be advices to get more stable predictions.
The output of the algorithm can be used as an input of the CA function of the FactoMineR package in order to perform CA on an incomplete dataset.
The imputed contingency table; the observed values are kept for the non-missing entries and the missing values are replaced by the predicted ones.
Francois Husson francois.husson@institut-agro.fr and Julie Josse julie.josse@polytechnique.edu
## Not run:
data(children)
## Impute the indicator matrix and perform a CA
res.impute <- imputeCA(children, ncp=2)
res.ca <- CA(res.impute)
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.