estim_ncpPCA: Estimate the number of dimensions for the Principal Component...
In missMDA: Handling Missing Values with Multivariate Data Analysis

estim_ncpPCA

R Documentation

Estimate the number of dimensions for the Principal Component Analysis by cross-validation

Description

Estimate the number of dimensions for the Principal Component Analysis by cross-validation

Usage

estim_ncpPCA(X, ncp.min = 0, ncp.max = 5, method = c("Regularized", "EM"),
       scale = TRUE, method.cv = c("gcv", "loo", "Kfold"), nbsim = 100,
       pNA = 0.05, ind.sup = NULL, quanti.sup = NULL, quali.sup = NULL,
       threshold = 1e-4, verbose = TRUE)

Arguments

`X`	a data.frame of continuous variables, with or without missing entries
`ncp.min`	integer corresponding to the minimum number of components to test
`ncp.max`	integer corresponding to the maximum number of components to test
`method`	imputation method: "Regularized" (default) or "EM"
`scale`	boolean. TRUE gives the same weight to each variable
`method.cv`	string taking the value "gcv" for generalised cross-validation, "loo" for leave-one-out cross-validation or "Kfold" for Kfold cross-validation
`nbsim`	number of simulations; used only if method.cv="Kfold"
`pNA`	proportion of missing values inserted into the data set (0.05, i.e. 5%, by default); used only if method.cv="Kfold"
`ind.sup`	a vector indicating the indexes of the supplementary individuals
`quanti.sup`	a vector indicating the indexes of the quantitative supplementary variables
`quali.sup`	a vector indicating the indexes of the categorical supplementary variables
`threshold`	the threshold for assessing convergence
`verbose`	boolean. TRUE means that a progress bar is displayed

Details

For leave-one-out (loo) cross-validation, each cell of the data matrix is removed in turn and predicted with a PCA model using ncp.min to ncp.max dimensions. The number of components that gives the smallest mean square error of prediction (MSEP) is retained. For Kfold cross-validation, a proportion pNA of missing values is inserted into the data and predicted with a PCA model using ncp.min to ncp.max dimensions. This process is repeated nbsim times, and the number of components that gives the smallest MSEP is retained.
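The Kfold procedure described above can be sketched in base R. This is only an illustration under stated assumptions, not missMDA's implementation: the helper names impute_iter_pca and kfold_msep are hypothetical, the data are simulated and complete, no scaling is applied, and a plain (non-regularized) iterative PCA stands in for the imputation step.

```r
# Hypothetical helper (not part of missMDA): plain iterative PCA imputation.
impute_iter_pca <- function(X, ncp, max_iter = 500, tol = 1e-6) {
  miss <- is.na(X)
  Xhat <- X
  # Initialise each missing cell with the mean of its column.
  Xhat[miss] <- colMeans(X, na.rm = TRUE)[col(X)[miss]]
  if (ncp == 0 || !any(miss)) return(Xhat)
  for (iter in seq_len(max_iter)) {
    mu <- colMeans(Xhat)
    s <- svd(sweep(Xhat, 2, mu), nu = ncp, nv = ncp)
    # Rank-ncp reconstruction of the centred data, then add the means back.
    fit <- s$u %*% diag(s$d[seq_len(ncp)], ncp, ncp) %*% t(s$v)
    fit <- sweep(fit, 2, mu, "+")
    change <- sum((fit[miss] - Xhat[miss])^2)
    Xhat[miss] <- fit[miss]          # only the missing cells are updated
    if (change < tol) break
  }
  Xhat
}

# Hypothetical sketch of the Kfold loop: hide a proportion pNA of cells,
# predict them for each number of components, repeat nbsim times.
kfold_msep <- function(X, ncp_values, nbsim = 20, pNA = 0.05) {
  X <- as.matrix(X)
  msep <- matrix(NA_real_, nbsim, length(ncp_values))
  for (sim in seq_len(nbsim)) {
    holdout <- sample(length(X), ceiling(pNA * length(X)))
    Xna <- X
    Xna[holdout] <- NA
    for (j in seq_along(ncp_values)) {
      Xhat <- impute_iter_pca(Xna, ncp_values[j])
      msep[sim, j] <- mean((Xhat[holdout] - X[holdout])^2)
    }
  }
  setNames(colMeans(msep), ncp_values)
}

# Simulated data: two underlying dimensions plus a small amount of noise.
set.seed(1)
n <- 100; p <- 6
X <- matrix(rnorm(n * 2), n, 2) %*% matrix(rnorm(2 * p), 2, p) +
  matrix(rnorm(n * p, sd = 0.1), n, p)

msep <- kfold_msep(X, ncp_values = 0:4, nbsim = 20, pNA = 0.05)
ncp_retained <- as.integer(names(which.min(msep)))
```

Here msep plays the role of the criterion and ncp_retained the role of the retained number of components.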
For both cross-validation methods, missing entries are predicted with the imputePCA function, that is, with the regularized iterative PCA algorithm (method="Regularized") or the iterative PCA algorithm (method="EM"). The regularized version is more appropriate when the dataset already contains many missing values, since it limits overfitting.
Cross-validation (especially method.cv="loo") is time-consuming. The generalised cross-validation criterion (method.cv="gcv") can be seen as an approximation of the loo cross-validation criterion which provides a straightforward way to estimate the number of dimensions without resorting to a computationally intensive method.
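To illustrate why the gcv criterion is cheap, the sketch below computes it for complete data from a single singular value decomposition. It assumes the criterion has the form given in Josse and Husson (2011), namely n*p*RSS(S) / (n*p - p - n*S - p*S + S^2 + S)^2, where RSS(S) is the residual sum of squares of the PCA fit with S components. The function name gcv_pca is hypothetical, the data are simulated, and the handling of missing entries and of scaling in missMDA may differ in detail.

```r
# Hypothetical sketch (not missMDA's code): GCV criterion for complete data.
gcv_pca <- function(X, ncp_values, scale = TRUE) {
  X <- scale(as.matrix(X), center = TRUE, scale = scale)
  n <- nrow(X); p <- ncol(X)
  s <- svd(X)                       # one decomposition serves every S
  crit <- sapply(ncp_values, function(S) {
    if (S == 0) {
      fit <- matrix(0, n, p)        # centred data: the 0-component fit is 0
    } else {
      k <- seq_len(S)
      fit <- s$u[, k, drop = FALSE] %*% diag(s$d[k], S, S) %*%
        t(s$v[, k, drop = FALSE])
    }
    rss <- sum((X - fit)^2)
    n * p * rss / (n * p - p - n * S - p * S + S^2 + S)^2
  })
  setNames(crit, ncp_values)
}

# Simulated data: two underlying dimensions plus a small amount of noise.
set.seed(1)
n <- 100; p <- 6
X <- matrix(rnorm(n * 2), n, 2) %*% matrix(rnorm(2 * p), 2, p) +
  matrix(rnorm(n * p, sd = 0.1), n, p)

gcv <- gcv_pca(X, ncp_values = 0:4)
ncp_gcv <- as.integer(names(which.min(gcv)))
```

No cell is removed and no model is refitted, which is what makes this approach much faster than loo cross-validation.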

The argument scale must be chosen in agreement with the PCA that will be performed afterwards. To perform a normed PCA (where the variables are centered and scaled, i.e. divided by their standard deviation), set scale to TRUE.
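A minimal sketch of a consistent workflow, assuming the missMDA and FactoMineR packages are installed: the same scaling choice is passed to estim_ncpPCA, to imputePCA and to FactoMineR's PCA (where the corresponding argument is named scale.unit). The object names nb, imp and res are illustrative.

```r
library(missMDA)
library(FactoMineR)

data(orange)

# Normed PCA is intended, so scaling is switched on at every step.
nb  <- estim_ncpPCA(orange, ncp.min = 0, ncp.max = 4, scale = TRUE)
imp <- imputePCA(orange, ncp = nb$ncp, scale = TRUE)
res <- PCA(imp$completeObs, scale.unit = TRUE, graph = FALSE)
```

For a non-normed PCA, scale = FALSE and scale.unit = FALSE would be used throughout instead.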

Value

`ncp`	the number of components retained for the PCA
`criterion`	the criterion (the MSEP) calculated for each number of components
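The returned list can be inspected as in the sketch below, which assumes missMDA is installed and uses the default method.cv="gcv" on the orange data set shipped with the package. Plotting the criterion against the number of components is an optional check, not something the function does itself.

```r
library(missMDA)

data(orange)
nb <- estim_ncpPCA(orange, ncp.min = 0, ncp.max = 4)

nb$ncp         # number of components retained
nb$criterion   # one criterion value per number of components tested (0 to 4)

# Visual check: the retained value is where the criterion is smallest.
plot(0:4, nb$criterion, type = "b",
     xlab = "Number of components", ylab = "Criterion (MSEP)")
abline(v = nb$ncp, lty = 2)
```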

Author(s)

Francois Husson francois.husson@institut-agro.fr and Julie Josse julie.josse@polytechnique.edu

References

Bro, R., Kjeldahl, K., Smilde, A. K. and Kiers, H. A. L. (2008). Cross-validation of component models: A critical look at current methods. Analytical and Bioanalytical Chemistry, 5, 1241-1251.

Josse, J. and Husson, F. (2011). Selecting the number of components in PCA using cross-validation approximations. Computational Statistics and Data Analysis. 56 (6), pp. 1869-1879.

Examples

## Not run: 
library(missMDA)
data(orange)
# Estimate the number of components, testing 0 to 4 dimensions
nb <- estim_ncpPCA(orange, ncp.min = 0, ncp.max = 4)

## End(Not run)

missMDA documentation built on Nov. 17, 2023, 5:07 p.m.
