Determination of the number of latent components to be used for classification with PLS and LDA
Description
The function pls.lda.cv determines the best number of latent components to be
used for classification with PLS dimension reduction and linear discriminant
analysis, as described in Boulesteix (2004).
Usage
pls.lda.cv(Xtrain, Ytrain, ncomp, nruncv=20, alpha=2/3, priors=NULL)

Arguments
Xtrain 
a (ntrain x p) data matrix containing the predictors for the training data set.

Ytrain 
a vector of length ntrain giving the classes of the ntrain observations. The classes must be coded as 1,...,K (K>=2). 
ncomp 
the vector of integers from which the best number of latent components is chosen by cross-validation. If ncomp is a single integer, the best number is chosen from 1,...,ncomp.
nruncv 
the number of cross-validation iterations to be performed for the choice of the number of latent components. 
alpha 
the proportion of observations to be included in the training set at each cross-validation iteration. 
priors 
The class priors to be used for linear discriminant analysis. If unspecified, the class proportions in the training set are used. 
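Since Ytrain must be coded as 1,...,K, class labels stored as strings or factors need to be recoded first. The following is a minimal base R sketch; the vector raw_labels and its values are hypothetical and only serve as illustration.

```r
# Hypothetical raw class labels (illustrative values, not from the package)
raw_labels <- c("ALL", "AML", "ALL", "ALL", "AML")

# factor() sorts the distinct labels; as.integer() maps them to 1,...,K
Ytrain <- as.integer(factor(raw_labels))

# Number of classes K (must be at least 2)
K <- length(unique(Ytrain))
```

The mapping between integer codes and original labels can be recovered from levels(factor(raw_labels)).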
Details
The cross-validation procedure described in Boulesteix (2004) is used to
determine the best number of latent components to be used for classification.
At each cross-validation run, Xtrain is split into a pseudo training set and a
pseudo test set, and the classification error rate is determined for each
candidate number of latent components. Finally, the function pls.lda.cv returns
the number of latent components for which the mean classification error rate
over the nruncv partitions is minimal.
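The procedure described above can be sketched in base R as follows. This is an illustrative sketch under stated assumptions, not the package's actual implementation: choose_ncomp_cv and its fit_predict argument are hypothetical names, fit_predict stands in for any classifier that takes a number of components, and rounding the pseudo training set size with floor() is an assumption.

```r
# Sketch of repeated random-split cross-validation for choosing the number
# of latent components. `fit_predict` is a hypothetical user-supplied function:
#   fit_predict(Xtr, Ytr, Xte, k) -> predicted classes for the rows of Xte
choose_ncomp_cv <- function(X, Y, ncomp, fit_predict, nruncv = 20, alpha = 2/3) {
  n <- nrow(X)
  ntrain <- floor(alpha * n)  # size of the pseudo training set (assumed rounding)

  # errors[r, j]: error rate of candidate ncomp[j] in cross-validation run r
  errors <- matrix(NA_real_, nrow = nruncv, ncol = length(ncomp))

  for (r in seq_len(nruncv)) {
    idx <- sample(n, ntrain)  # rows forming the pseudo training set
    for (j in seq_along(ncomp)) {
      pred <- fit_predict(X[idx, , drop = FALSE], Y[idx],
                          X[-idx, , drop = FALSE], ncomp[j])
      errors[r, j] <- mean(pred != Y[-idx])  # error on the pseudo test set
    }
  }

  # Candidate with the smallest mean error rate over all runs
  ncomp[which.min(colMeans(errors))]
}
```

Ties are resolved in favour of the first candidate in ncomp, because which.min() returns the first minimum.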
Value
The number of latent components to be used for classification.
Author(s)
Anne-Laure Boulesteix (http://www.ibe.med.uni-muenchen.de/organisation/mitarbeiter/020_professuren/boulesteix/index.html)
References
A. L. Boulesteix (2004). PLS dimension reduction for classification with microarray data, Statistical Applications in Genetics and Molecular Biology 3, Issue 1, Article 33.
A. L. Boulesteix, K. Strimmer (2007). Partial least squares: a versatile tool for the analysis of high-dimensional genomic data. Briefings in Bioinformatics 7:32-44.
S. de Jong (1993). SIMPLS: an alternative approach to partial least squares regression, Chemometrics Intell. Lab. Syst. 18, 251–263.
See Also
pls.lda, pls.regression.cv.
Examples
# load plsgenomics library
library(plsgenomics)
# load leukemia data
data(leukemia)
# Determine the best number of components to be used for classification using the
# cross-validation procedure
# choose the best number from 2,3,4
pls.lda.cv(Xtrain=leukemia$X,Ytrain=leukemia$Y,ncomp=2:4,nruncv=20)
# choose the best number from 1,2,3
pls.lda.cv(Xtrain=leukemia$X,Ytrain=leukemia$Y,ncomp=3,nruncv=20)
