estim_ncpMCA | R Documentation |
Estimate the number of dimensions for the Multiple Correspondence Analysis by cross-validation
estim_ncpMCA(don, ncp.min = 0, ncp.max = 5, method = c("Regularized", "EM"),
    method.cv = c("Kfold", "loo"), nbsim = 100, pNA = 0.05, ind.sup = NULL,
    quanti.sup = NULL, quali.sup = NULL, threshold = 1e-4, verbose = TRUE)
don |
a data.frame with categorical variables; with missing entries or not |
ncp.min |
integer corresponding to the minimum number of components to test |
ncp.max |
integer corresponding to the maximum number of components to test |
method |
"Regularized" by default or "EM" |
method.cv |
"Kfold" for cross-validation or "loo" for leave-one-out |
nbsim |
number of simulations, useful only if method.cv="Kfold" |
pNA |
proportion of missing values added to the data set (the default 0.05 means 5% of the cells), useful only if method.cv="Kfold" |
ind.sup |
a vector indicating the indexes of the supplementary individuals |
quanti.sup |
a vector indicating the indexes of the quantitative supplementary variables |
quali.sup |
a vector indicating the indexes of the categorical supplementary variables |
threshold |
the threshold for assessing convergence |
verbose |
boolean. If TRUE, a progress bar is displayed |
For leave-one-out cross-validation (method.cv="loo"), each cell of the data matrix is removed in turn and predicted with an MCA model using ncp.min to ncp.max dimensions. The number of components that leads to the smallest mean square error of prediction (MSEP) is retained. For Kfold cross-validation (method.cv="Kfold"), a proportion pNA of missing values is inserted at random in the data matrix and predicted with an MCA model using ncp.min to ncp.max dimensions. This process is repeated nbsim times, and the number of components that leads to the smallest MSEP is retained. For both cross-validation methods, the missing entries are predicted using the imputeMCA function, that is, using either the regularized iterative MCA algorithm (method="Regularized") or the iterative MCA algorithm (method="EM"). The regularized version is more appropriate to avoid overfitting.
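The random-deletion step of the Kfold scheme can be sketched in base R. This is only an illustration under stated assumptions: the helper name mask_cells and the toy data are hypothetical, and the package's internal code may differ (for example, it may need to ensure that every category of every variable remains observed after deletion).

```r
# Hypothetical helper (not part of missMDA): set a proportion pNA of the
# observed cells of a categorical data.frame to NA, chosen at random.
mask_cells <- function(don, pNA = 0.05) {
  observed <- which(!is.na(as.matrix(don)))   # linear indices of observed cells
  n_hide <- round(pNA * length(observed))     # number of cells to hide
  hide <- observed[sample.int(length(observed), n_hide)]
  rows <- (hide - 1) %% nrow(don) + 1         # linear index -> row
  cols <- (hide - 1) %/% nrow(don) + 1        # linear index -> column
  for (k in seq_along(hide)) don[rows[k], cols[k]] <- NA
  don
}

# Toy categorical data, invented for this illustration
set.seed(42)
toy <- data.frame(
  colour = factor(sample(c("red", "blue"), 40, replace = TRUE)),
  size   = factor(sample(c("S", "M", "L"), 40, replace = TRUE))
)

masked <- mask_cells(toy, pNA = 0.05)
sum(is.na(masked))  # 4 of the 80 cells are now missing
```

In the procedure described above, the hidden cells would then be predicted with imputeMCA for each candidate number of components, and the whole step repeated nbsim times.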
ncp |
the number of components retained for the MCA |
criterion |
the criterion (the MSEP) calculated for each number of components |
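The link between the two returned values can be illustrated with a short base R sketch. The MSEP values below are made up for the illustration, and the indexing shown is one plausible way to map the position of the minimum back to a number of components; it is not taken from the package source.

```r
# Made-up MSEP values for ncp.min = 0 to ncp.max = 5 (illustration only)
ncp.min <- 0
ncp.max <- 5
criterion <- c(0.21, 0.18, 0.15, 0.16, 0.17, 0.19)
names(criterion) <- ncp.min:ncp.max

# The retained number of components is the one with the smallest MSEP;
# which.min() returns a position, so shift it by ncp.min - 1
ncp <- unname(ncp.min + which.min(criterion) - 1)
ncp  # 2
```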
Francois Husson francois.husson@institut-agro.fr and Julie Josse julie.josse@polytechnique.edu
Josse, J., Chavent, M., Liquet, B. and Husson, F. (2010). Handling missing values with Regularized Iterative Multiple Correspondence Analysis, Journal of Classification, 29 (1), pp. 91-116.
imputeMCA
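## Not run:
library(missMDA)  # provides estim_ncpMCA and the vnf data set
data(vnf)
# cross-validate MCA models with 0 to 5 components
result <- estim_ncpMCA(vnf, ncp.min = 0, ncp.max = 5)
result$ncp        # number of components retained
## End(Not run)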