estim_ncpPCA: Estimate the number of dimensions for the Principal Component...

Description

Estimate the number of dimensions for the Principal Component Analysis by cross-validation

Usage

estim_ncpPCA(X, ncp.min = 0, ncp.max = 5, method = "Regularized", scale = TRUE, method.cv = "loo", nbsim = 100, pNA = 0.05, threshold = 1e-4)

Arguments

X

a data.frame of continuous variables, with or without missing entries

ncp.min

integer corresponding to the minimum number of components to test

ncp.max

integer corresponding to the maximum number of components to test

method

imputation method used to predict the removed values: "Regularized" (default) or "EM"

scale

boolean; TRUE by default, which gives the same weight to each variable

method.cv

string, either "loo" for leave-one-out cross-validation or "Kfold" for Kfold cross-validation

nbsim

number of simulations; used only if method.cv="Kfold"

pNA

proportion of missing values added to the data set (0.05, i.e. 5%, by default); used only if method.cv="Kfold"

threshold

the threshold for assessing convergence

Details

For leave-one-out (loo) cross-validation, each value is removed in turn and predicted with a PCA model using ncp.min to ncp.max dimensions. The number of components that gives the smallest mean squared error of prediction (MSEP) is retained. Each cell is predicted with the imputePCA function, that is, with either the regularized iterative PCA algorithm (method = "Regularized") or the iterative PCA algorithm (method = "EM"). Leave-one-out cross-validation (method.cv = "loo") is time-consuming when the data.frame contains many cells.
For Kfold cross-validation, a proportion pNA of the values is removed and predicted with a PCA model using ncp.min to ncp.max dimensions. This process is repeated nbsim times.
The regularized version is more appropriate when there are many missing values in the dataset (to avoid overfitting).
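
The Kfold procedure described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the code used by missMDA: the helpers impute_em_pca and kfold_msep are hypothetical names, the imputation is a simplified unregularized EM-style iterative PCA without scaling, and the input is assumed to be a complete numeric matrix so that every hidden value can be compared with its true value.

```r
# Hypothetical helper: simplified EM-style iterative PCA imputation (base R only).
# No scaling and no regularization, unlike imputePCA in missMDA.
impute_em_pca <- function(X, ncp, threshold = 1e-6, maxiter = 500) {
  X <- as.matrix(X)
  miss <- is.na(X)
  Xhat <- X
  # Initialize missing cells with the mean of their column
  Xhat[miss] <- colMeans(X, na.rm = TRUE)[col(X)[miss]]
  if (ncp == 0 || !any(miss)) return(Xhat)
  for (iter in seq_len(maxiter)) {
    mu <- colMeans(Xhat)
    s <- svd(sweep(Xhat, 2, mu), nu = ncp, nv = ncp)
    # Rank-ncp reconstruction, then add the column means back
    fit <- s$u %*% diag(s$d[seq_len(ncp)], ncp, ncp) %*% t(s$v)
    fit <- sweep(fit, 2, mu, "+")
    change <- sum((fit[miss] - Xhat[miss])^2)
    Xhat[miss] <- fit[miss]
    if (change < threshold) break
  }
  Xhat
}

# Hypothetical helper: hide a proportion pNA of the cells, predict them for each
# number of components, repeat nbsim times and average the squared errors.
kfold_msep <- function(X, ncp.min = 0, ncp.max = 3, nbsim = 20, pNA = 0.05) {
  X <- as.matrix(X)
  ncps <- ncp.min:ncp.max
  err <- matrix(NA_real_, nbsim, length(ncps))
  n_cells <- length(X)
  for (sim in seq_len(nbsim)) {
    hide <- sample(n_cells, round(pNA * n_cells))
    XNA <- X
    XNA[hide] <- NA
    for (j in seq_along(ncps)) {
      Xhat <- impute_em_pca(XNA, ncps[j])
      err[sim, j] <- mean((Xhat[hide] - X[hide])^2)
    }
  }
  criterion <- colMeans(err)
  names(criterion) <- ncps
  list(ncp = ncps[which.min(criterion)], criterion = criterion)
}

# Simulated data (an assumption for illustration): 2 underlying dimensions plus noise
set.seed(1)
n <- 50; p <- 6
X <- matrix(rnorm(n * 2), n, 2) %*% matrix(rnorm(2 * p), 2, p) +
  matrix(rnorm(n * p, sd = 0.1), n, p)
res <- kfold_msep(X, ncp.min = 0, ncp.max = 3, nbsim = 20, pNA = 0.05)
print(res)
```

The returned list mirrors the ncp and criterion elements documented under Value, so the number of components with the smallest average squared prediction error is the one retained.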

Value

ncp

the number of components retained for the PCA

criterion

the criterion (the MSEP) calculated for each number of components
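
As an illustration of how the returned values might be used, the sketch below runs a Kfold estimation on the orange data and passes the retained number of components to imputePCA. It assumes the missMDA package is installed (the block is skipped otherwise); the reduced nbsim value and the seed are arbitrary choices made to keep the run short and repeatable.

```r
# Assumes the missMDA package is installed; the block is skipped otherwise.
if (requireNamespace("missMDA", quietly = TRUE)) {
  library(missMDA)
  data(orange)
  set.seed(123)  # Kfold removes cells at random, so fix the seed
  nb <- estim_ncpPCA(orange, ncp.min = 0, ncp.max = 4,
                     method.cv = "Kfold", nbsim = 20, pNA = 0.05)
  print(nb$ncp)        # number of components retained
  print(nb$criterion)  # MSEP for each number of components tested
  # Impute the data set with the retained number of components
  res.comp <- imputePCA(orange, ncp = nb$ncp)
  print(dim(res.comp$completeObs))
}
```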

Author(s)

Francois Husson husson@agrocampus-ouest.fr and Julie Josse Julie.Josse@agrocampus-ouest.fr

References

Bro, R., Kjeldahl, K., Smilde, A. K. and Kiers, H. A. L. (2008) Cross-validation of component models: A critical look at current methods. Analytical and Bioanalytical Chemistry, 5, 1241-1251.
Josse, J., Husson, F. and Pagès, J. (2009) Gestion des données manquantes en Analyse en Composantes Principales. Journal de la SFdS, 150 (2), 28-51.

Examples

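## Not run: 
library(missMDA)
data(orange)
## Leave-one-out cross-validation over 0 to 4 components; time consuming
nb <- estim_ncpPCA(orange, ncp.min = 0, ncp.max = 4)
nb$ncp ## number of components retained, nb$ncp = 2

## End(Not run)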

missMDA documentation built on May 2, 2019, 5:46 p.m.