Classification with PLS Dimension Reduction and Linear Discriminant Analysis

Description

The function pls.lda performs binary or multicategorical classification using the method described in Boulesteix (2004) which consists in PLS dimension reduction and linear discriminant analysis applied on the PLS components.

Usage

1
pls.lda(Xtrain, Ytrain, Xtest=NULL, ncomp, nruncv=0, alpha=2/3, priors=NULL)

Arguments

Xtrain

a (ntrain x p) data matrix containing the predictors for the training data set. Xtrain may be a matrix or a data frame. Each row is an observation and each column is a predictor variable.

Ytrain

a vector of length ntrain giving the classes of the ntrain observations. The classes must be coded as 1,...,K (K>=2).

Xtest

a (ntest x p) data matrix containing the predictors for the test data set. Xtest may also be a vector of length p (corresponding to only one test observation). If Xtest=NULL, the training data set is considered as test data set as well.

ncomp

if nruncv=0, ncomp is the number of latent components to be used for PLS dimension reduction. If nruncv>0, the cross-validation procedure described in Boulesteix (2004) is used to choose the best number of components from the vector of integers ncomp or from 1,...,ncomp if ncomp is of length 1.

nruncv

the number of cross-validation iterations to be performed for the choice of the number of latent components. If nruncv=0, cross-validation is not performed and ncomp latent components are used.

alpha

the proportion of observations to be included in the training set at each cross-validation iteration.

priors

The class priors to be used for linear discriminant analysis. If unspecified, the class proportions in the training set are used.

Details

The function pls.lda proceeds as follows to predict the class of the observations from the test data set. First, the SIMPLS algorithm is run on Xtrain and Ytrain to determine the new PLS components based on the training observations only. The new PLS components are then computed for the test data set. Classification is performed by applying classical linear discriminant analysis (LDA) to the new components. Of course, the LDA classifier is built using the training observations only.

Value

A list with the following components:

predclass

the vector containing the predicted classes of the ntest observations from Xtest.

ncomp

the number of latent components used for classification.

Author(s)

Anne-Laure Boulesteix (http://www.ibe.med.uni-muenchen.de/organisation/mitarbeiter/ 020_professuren/boulesteix/index.html)

References

A. L. Boulesteix (2004). PLS dimension reduction for classification with microarray data, Statistical Applications in Genetics and Molecular Biology 3, Issue 1, Article 33.

A. L. Boulesteix, K. Strimmer (2007). Partial least squares: a versatile tool for the analysis of high-dimensional genomic data. Briefings in Bioinformatics 7:32-44.

S. de Jong (1993). SIMPLS: an alternative approach to partial least squares regression, Chemometrics Intell. Lab. Syst. 18, 251–263.

See Also

pls.regression, variable.selection, pls.lda.cv.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
# load plsgenomics library
library(plsgenomics)

# load leukemia data
data(leukemia)

# Classify observations 1,2,3 (test set) using observations 4 to 38 (training set), 
# with 2 PLS components
pls.lda(Xtrain=leukemia$X[-(1:3),],Ytrain=leukemia$Y[-(1:3)],Xtest=leukemia$X[1:3,],
		ncomp=2,nruncv=0)

# Classify observations 1,2,3 (test set) using observations 4 to 38 (training set), 
# with the best number of components as determined by cross-validation
pls.lda(Xtrain=leukemia$X[-(1:3),],Ytrain=leukemia$Y[-(1:3)],Xtest=leukemia$X[1:3,],
		ncomp=1:4,nruncv=20)

Want to suggest features or report bugs for rdrr.io? Use the GitHub issue tracker.