Classification with PLS Dimension Reduction and Linear Discriminant Analysis
Description
The function pls.lda performs binary or multicategorical classification using the method
described in Boulesteix (2004), which consists of PLS dimension reduction followed by linear
discriminant analysis applied to the PLS components.
Usage
Arguments
Xtrain 
a (ntrain x p) data matrix containing the predictors for the training data set. Xtrain may be a matrix or a data frame. Each row is an observation and each column is a predictor variable. 
Ytrain 
a vector of length ntrain giving the classes of the ntrain observations. The classes must be coded as 1,...,K (K>=2). 
Xtest 
a (ntest x p) data matrix containing the predictors for the test data set. 
ncomp 
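if nruncv=0, the number of latent components to be used for PLS dimension reduction (e.g. ncomp=2). If nruncv>0, the candidate numbers of components from which the best one is chosen by cross-validation (e.g. ncomp=1:4). 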
nruncv 
the number of cross-validation iterations to be performed for the choice of
the number of latent components. If nruncv=0, no cross-validation is performed and ncomp latent components are used. 
alpha 
the proportion of observations to be included in the training set at each cross-validation iteration. 
priors 
The class priors to be used for linear discriminant analysis. If unspecified, the class proportions in the training set are used. 
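The interplay of ncomp, nruncv and alpha can be illustrated with a minimal sketch of repeated random-split validation. This is not the package's own code: the helper names choose_ncomp and pca_lda are hypothetical, the classifier is a stand-in (LDA on principal components rather than PLS components), the data are simulated, and retaining the candidate with the lowest mean misclassification rate is an assumption.

```r
library(MASS)  # provides lda()

# Stand-in classifier (assumption, NOT PLS): LDA on the first k principal components.
pca_lda <- function(Xtr, Ytr, Xte, k) {
  pc  <- prcomp(Xtr, center = TRUE, scale. = FALSE)
  Ztr <- pc$x[, seq_len(k), drop = FALSE]
  Zte <- predict(pc, Xte)[, seq_len(k), drop = FALSE]
  fit <- lda(Ztr, grouping = Ytr)
  predict(fit, Zte)$class
}

# Hypothetical helper: pick the number of components by nruncv random splits,
# each using a proportion alpha of the observations for training.
choose_ncomp <- function(X, Y, candidates, nruncv, alpha = 2/3,
                         classify = pca_lda) {
  n      <- nrow(X)
  ntrain <- round(alpha * n)
  err    <- matrix(NA_real_, nruncv, length(candidates))
  for (i in seq_len(nruncv)) {
    train <- sample(n, ntrain)               # random training subset
    for (j in seq_along(candidates)) {
      pred <- classify(X[train, , drop = FALSE], Y[train],
                       X[-train, , drop = FALSE], candidates[j])
      # misclassification rate on the held-out observations
      err[i, j] <- mean(as.character(pred) != as.character(Y[-train]))
    }
  }
  mean_err <- colMeans(err)
  list(ncomp = candidates[which.min(mean_err)], mean_error = mean_err)
}

# Simulated data: 3 classes coded 1,...,K, signal in the first 5 predictors.
set.seed(1)
n <- 60; p <- 50
Y <- rep(1:3, each = 20)
X <- matrix(rnorm(n * p), n, p)
X[, 1:5] <- X[, 1:5] + 2 * Y

cv <- choose_ncomp(X, Y, candidates = 1:4, nruncv = 20, alpha = 2/3)
cv$ncomp        # selected number of components
cv$mean_error   # mean held-out error per candidate
```

Passing a different function as classify (for instance one based on PLS components) leaves the splitting logic unchanged.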
Details
The function pls.lda proceeds as follows to predict the class of the
observations from the test data set.
First, the SIMPLS algorithm is run on Xtrain and Ytrain to
determine the new PLS components based on the training observations only.
The new PLS components are then computed for the test
data set. Classification is performed by applying classical linear
discriminant analysis (LDA) to the new components. The LDA
classifier is built using the training observations only.
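The steps above can be sketched in base R plus MASS. This is an illustrative sketch under stated assumptions, not the package's implementation: the helper names simpls_weights and pls_lda_sketch are hypothetical, the class labels are dummy-coded before running SIMPLS (an assumption about how Ytrain is handled), and the data are simulated.

```r
library(MASS)  # provides lda()

# Hypothetical helper: SIMPLS weight matrix R (p x ncomp) for centered X and Y,
# following the deflation scheme of de Jong (1993).
simpls_weights <- function(X, Y, ncomp) {
  S <- crossprod(X, Y)                      # p x q cross-product matrix
  R <- matrix(0, ncol(X), ncomp)
  V <- matrix(0, ncol(X), ncomp)
  for (a in seq_len(ncomp)) {
    r  <- svd(S, nu = 1, nv = 0)$u          # dominant left singular vector
    t  <- X %*% r                           # score vector
    nt <- sqrt(sum(t^2))
    t  <- t / nt                            # normalize the score
    r  <- r / nt
    v  <- crossprod(X, t)                   # X loading
    if (a > 1) {                            # orthogonalize against earlier loadings
      Vprev <- V[, seq_len(a - 1), drop = FALSE]
      v <- v - Vprev %*% crossprod(Vprev, v)
    }
    v <- v / sqrt(sum(v^2))
    S <- S - v %*% crossprod(v, S)          # deflate the cross-product matrix
    R[, a] <- r
    V[, a] <- v
  }
  R
}

# Hypothetical helper: PLS components from the training data only, then LDA.
pls_lda_sketch <- function(Xtrain, Ytrain, Xtest, ncomp, priors = NULL) {
  Xtrain <- as.matrix(Xtrain)
  Xtest  <- as.matrix(Xtest)
  classes <- sort(unique(Ytrain))
  Ydummy  <- outer(Ytrain, classes, "==") * 1       # ntrain x K indicator matrix
  mx <- colMeans(Xtrain)                            # training means only
  Xc <- sweep(Xtrain, 2, mx)
  Yc <- scale(Ydummy, center = TRUE, scale = FALSE)
  R  <- simpls_weights(Xc, Yc, ncomp)
  Ztrain <- Xc %*% R                                # training components
  Ztest  <- sweep(Xtest, 2, mx) %*% R               # test components
  fit <- if (is.null(priors)) {
    lda(Ztrain, grouping = Ytrain)                  # priors = class proportions
  } else {
    lda(Ztrain, grouping = Ytrain, prior = priors)
  }
  list(predclass = predict(fit, Ztest)$class, ncomp = ncomp)
}

# Simulated data: 3 classes coded 1,...,K, signal in the first 10 predictors.
set.seed(1)
n <- 60; p <- 200
Y <- rep(1:3, each = 20)
X <- matrix(rnorm(n * p), n, p)
X[, 1:10] <- X[, 1:10] + 2 * Y
test <- seq(1, n, by = 4)

out <- pls_lda_sketch(X[-test, ], Y[-test], X[test, ], ncomp = 2)
table(predicted = out$predclass, true = Y[test])
```

The test observations enter only through the final projection, so neither the PLS weights nor the LDA rule see them during fitting.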
Value
A list with the following components:
predclass 
the vector containing the predicted classes of the ntest observations from Xtest. 
ncomp 
the number of latent components used for classification. 
Author(s)
Anne-Laure Boulesteix (http://www.ibe.med.uni-muenchen.de/organisation/mitarbeiter/020_professuren/boulesteix/index.html)
References
A.-L. Boulesteix (2004). PLS dimension reduction for classification with microarray data, Statistical Applications in Genetics and Molecular Biology 3, Issue 1, Article 33.
A.-L. Boulesteix, K. Strimmer (2007). Partial least squares: a versatile tool for the analysis of high-dimensional genomic data. Briefings in Bioinformatics 7:32-44.
S. de Jong (1993). SIMPLS: an alternative approach to partial least squares regression, Chemometrics Intell. Lab. Syst. 18, 251–263.
See Also
pls.regression, variable.selection, pls.lda.cv.
Examples
# load plsgenomics library
library(plsgenomics)
# load leukemia data
data(leukemia)
# Classify observations 1,2,3 (test set) using observations 4 to 38 (training set),
# with 2 PLS components
pls.lda(Xtrain=leukemia$X[-(1:3),],Ytrain=leukemia$Y[-(1:3)],Xtest=leukemia$X[1:3,],
        ncomp=2,nruncv=0)
# Classify observations 1,2,3 (test set) using observations 4 to 38 (training set),
# with the best number of components as determined by cross-validation
pls.lda(Xtrain=leukemia$X[-(1:3),],Ytrain=leukemia$Y[-(1:3)],Xtest=leukemia$X[1:3,],
        ncomp=1:4,nruncv=20)
