The function pls.regression.cv determines the best number of latent components to be used for PLS regression using the cross-validation approach described in Boulesteix and Strimmer (2005).
pls.regression.cv(Xtrain, Ytrain, ncomp, nruncv=20, alpha=2/3)

Xtrain 
a (ntrain x p) data matrix containing the predictors for the training data set.

Ytrain 
a (ntrain x m) data matrix of responses. 

ncomp 
the vector of integers from which the best number of latent components is chosen by cross-validation. If ncomp is a single integer, the candidates are 1, ..., ncomp, as in the first example call (ncomp=4 chooses from 1,2,3,4).

nruncv 
the number of cross-validation iterations to be performed for the choice of the number of latent components. 

alpha 
the proportion of observations to be included in the training set at each cross-validation iteration. 
The cross-validation procedure described in Boulesteix and Strimmer (2005) is used to determine the best number of latent components to be used for PLS regression. At each cross-validation run, the training data (Xtrain and the corresponding rows of Ytrain) are split into a pseudo training set, containing a proportion alpha of the observations, and a pseudo test set, and the squared prediction error on the pseudo test set is determined for each candidate number of latent components. Finally, the function pls.regression.cv returns the number of latent components for which the mean squared error over the nruncv partitions is minimal.
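
The procedure above can be sketched in base R as follows. This is an illustrative sketch under stated assumptions, not the package's source code: choose_ncomp, fit_predict and first_k_ols are hypothetical names introduced here, and the stand-in model is ordinary least squares on the first k columns of the predictors rather than PLS, used only so the example runs without extra packages.

```r
# Repeated random-split selection of the number of components (sketch only).
# `fit_predict` is a hypothetical user-supplied function: it fits a model with
# k components on the learning rows and returns predictions for the test rows.
choose_ncomp <- function(X, Y, ncomp, nruncv = 20, alpha = 2/3, fit_predict) {
  X <- as.matrix(X)
  Y <- as.matrix(Y)
  n <- nrow(X)
  nlearn <- floor(alpha * n)               # size of the pseudo training set
  err <- matrix(NA_real_, nrow = nruncv, ncol = length(ncomp))
  for (r in seq_len(nruncv)) {
    learn <- sample(n, nlearn)             # random split for this run
    test <- setdiff(seq_len(n), learn)
    for (j in seq_along(ncomp)) {
      pred <- fit_predict(X[learn, , drop = FALSE], Y[learn, , drop = FALSE],
                          X[test, , drop = FALSE], ncomp[j])
      # mean squared prediction error on the pseudo test set
      err[r, j] <- mean((Y[test, , drop = FALSE] - pred)^2)
    }
  }
  # candidate whose error, averaged over all runs, is smallest
  ncomp[which.min(colMeans(err))]
}

# Stand-in model for the demo only: least squares on the first k columns of X.
# This is NOT PLS; it only makes the sketch runnable in base R.
first_k_ols <- function(Xlearn, Ylearn, Xtest, k) {
  coefs <- qr.solve(cbind(1, Xlearn[, 1:k, drop = FALSE]), Ylearn)
  cbind(1, Xtest[, 1:k, drop = FALSE]) %*% coefs
}

# Simulated data in which only the first two predictors matter
set.seed(1)
X <- matrix(rnorm(60 * 5), 60, 5)
Y <- X[, 1] - 2 * X[, 2] + rnorm(60, sd = 0.1)
best <- choose_ncomp(X, Y, ncomp = 1:4, nruncv = 10, fit_predict = first_k_ols)
```

Because the splits are random, the selected value can vary between calls unless a seed is set, and the same presumably holds for pls.regression.cv itself.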
The number of latent components to be used in PLS regression, as determined by crossvalidation.
Anne-Laure Boulesteix (http://www.ibe.med.uni-muenchen.de/organisation/mitarbeiter/020_professuren/boulesteix/index.html) and Korbinian Strimmer (http://strimmerlab.org/).
A. L. Boulesteix and K. Strimmer (2005). Predicting Transcription Factor Activities from Combined Analysis of Microarray and ChIP Data: A Partial Least Squares Approach.
A. L. Boulesteix and K. Strimmer (2007). Partial least squares: a versatile tool for the analysis of high-dimensional genomic data. Briefings in Bioinformatics 7:32-44.
S. de Jong (1993). SIMPLS: an alternative approach to partial least squares regression, Chemometrics Intell. Lab. Syst. 18, 251–263.
pls.regression, TFA.estimate, pls.lda.cv.
# load plsgenomics library
library(plsgenomics)
# load Ecoli data
data(Ecoli)
# determine the best number of components for PLS regression using the cross-validation approach
# choose the best number from 1,2,3,4
pls.regression.cv(Xtrain=Ecoli$CONNECdata, Ytrain=Ecoli$GEdata, ncomp=4, nruncv=20)
# choose the best number from 2,3
pls.regression.cv(Xtrain=Ecoli$CONNECdata, Ytrain=Ecoli$GEdata, ncomp=c(2,3), nruncv=20)
