Estimate the number of dimensions for the Multiple Correspondence Analysis by cross-validation

```
estim_ncpMCA(don, ncp.min=0, ncp.max=5, method = c("Regularized","EM"),
             method.cv = c("Kfold","loo"), nbsim=100, pNA=0.05, threshold=1e-4)
```

| Argument | Description |
|---|---|
| `don` | a data.frame of categorical variables, with or without missing entries |
| `ncp.min` | integer, the minimum number of components to test |
| `ncp.max` | integer, the maximum number of components to test |
| `method` | imputation method: "Regularized" (default) or "EM" |
| `method.cv` | "Kfold" for cross-validation or "loo" for leave-one-out |
| `nbsim` | number of simulations; used only if method.cv="Kfold" |
| `pNA` | proportion of missing values inserted into the data set (the default 0.05 corresponds to 5%); used only if method.cv="Kfold" |
| `threshold` | the threshold for assessing convergence |

For leave-one-out cross-validation (method.cv="loo"), each cell of the data matrix is removed in turn and predicted with an MCA model using ncp.min to ncp.max dimensions. The number of components that leads to the smallest mean square error of prediction (MSEP) is retained.

For Kfold cross-validation (method.cv="Kfold"), a proportion pNA of missing values is inserted at random in the data matrix and predicted with an MCA model using ncp.min to ncp.max dimensions. This process is repeated nbsim times. The number of components that leads to the smallest MSEP is retained.

For both cross-validation methods, the missing entries are predicted using the imputeMCA function, that is, with either the regularized iterative MCA algorithm (method="Regularized") or the iterative MCA algorithm (method="EM"). The regularized version is more appropriate to avoid overfitting issues.
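The masking step of the Kfold procedure can be sketched in base R. This is only an illustration under stated assumptions: `mask_cells` and the `toy` data frame are hypothetical and not part of missMDA, and the package's own implementation may add safeguards that this sketch omits (for example, making sure no category disappears entirely).

```r
# Hypothetical helper (not part of missMDA): set a proportion pNA of the
# cells of a categorical data frame to NA, chosen uniformly at random.
mask_cells <- function(don, pNA = 0.05) {
  n_cells <- nrow(don) * ncol(don)
  n_mask  <- max(1, round(pNA * n_cells))
  idx  <- sample(n_cells, n_mask)        # distinct linear indices, column-major
  rows <- (idx - 1) %% nrow(don) + 1     # row of each selected cell
  cols <- (idx - 1) %/% nrow(don) + 1    # column of each selected cell
  masked <- don
  for (k in seq_along(idx)) {
    masked[rows[k], cols[k]] <- NA
  }
  list(masked = masked, rows = rows, cols = cols)
}

# Toy categorical data, invented for illustration only
set.seed(1)
toy <- data.frame(
  colour = factor(sample(c("red", "green", "blue"), 40, replace = TRUE)),
  size   = factor(sample(c("S", "M", "L"), 40, replace = TRUE)),
  shape  = factor(sample(c("round", "square"), 40, replace = TRUE))
)

out <- mask_cells(toy, pNA = 0.05)
sum(is.na(out$masked))  # 6 cells masked: 5% of the 40 x 3 = 120 cells
```

In the actual procedure, the masked cells would then be predicted for each candidate number of components and compared with the values that were removed, and the whole step would be repeated nbsim times.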

| Value | Description |
|---|---|
| `ncp` | the number of components retained for the MCA |
| `criterion` | the criterion (the MSEP) calculated for each number of components |
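To show how the two returned values relate, here is a minimal sketch. It assumes that `criterion` is a numeric vector named by the number of components tested, which is an assumption about the returned object rather than something stated above. The MSEP values are invented for illustration and do not come from any real run.

```r
# Invented MSEP values, named by the number of components tested (0 to 5)
criterion <- c("0" = 0.52, "1" = 0.47, "2" = 0.44,
               "3" = 0.45, "4" = 0.48, "5" = 0.50)

# The retained number of components is the one with the smallest MSEP
ncp <- as.integer(names(which.min(criterion)))
ncp  # 2
```

Plotting `criterion` against the number of components can also help to see whether the minimum is sharp or whether several values give nearly the same error.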

Francois Husson husson@agrocampus-ouest.fr and Julie Josse Julie.Josse@agrocampus-ouest.fr

Josse, J., Chavent, M., Liquet, B. and Husson, F. (2010). Handling missing values with Regularized Iterative Multiple Correspondence Analysis, Journal of Classification, 29 (1), pp. 91-116.

`imputeMCA`

```
## Not run:
data(vnf)
result <- estim_ncpMCA(vnf, ncp.min=0, ncp.max=5)
## End(Not run)
```
