plsc: Classification with PLSDA

View source: R/mt_plsc.R

Description

Classification with partial least squares (PLS) or PLS plus linear discriminant analysis (LDA).

Usage

plsc(x, ...)

plslda(x, ...)

## Default S3 method:
plsc(x, y, pls="simpls",ncomp=10, tune=FALSE,...)

## S3 method for class 'formula'
plsc(formula, data = NULL, ..., subset, na.action = na.omit)

## Default S3 method:
plslda(x, y, pls="simpls",ncomp=10, tune=FALSE,...)

## S3 method for class 'formula'
plslda(formula, data = NULL, ..., subset, na.action = na.omit)

Arguments

formula

A formula of the form groups ~ x1 + x2 + ... That is, the response is the grouping factor and the right hand side specifies the (non-factor) discriminators.

data

Data frame from which variables specified in formula are preferentially to be taken.

x

A matrix or data frame containing the explanatory variables if no formula is given as the principal argument.

y

A factor specifying the class for each observation if no formula principal argument is given.

pls

A method for calculating PLS scores and loadings. The following methods are supported:

  • simpls: SIMPLS algorithm.

  • kernelpls: kernel algorithm.

  • oscorespls: orthogonal scores algorithm.

For details, see simpls.fit, kernelpls.fit and oscorespls.fit in package pls.

ncomp

The number of components to be used in the classification.

tune

A logical value indicating whether the best number of components should be tuned.

...

Arguments passed to or from other methods.

subset

An index vector specifying the cases to be used in the training sample.

na.action

A function to specify the action to be taken if NAs are found. The default action is na.omit, which leads to rejection of cases with missing values on any required variable. An alternative is na.fail, which causes an error if NA cases are found.

Details

plsc implements PLS for classification. In detail, the categorical response vector y is converted into a numeric indicator matrix, PLS regression is performed on that matrix, and the PLS output is converted to posterior probabilities by the softmax method. The classification results are obtained from these posteriors. plslda combines PLS and LDA for classification: PLS performs the dimension reduction, and LDA classifies the data transformed by PLS.
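The softmax post-processing step described above can be sketched in a few lines of base R. Here `yhat` stands in for the numeric predictions from the PLS regression on a three-class problem; the variable and function names are illustrative, not mt's internal ones.

```r
## Illustrative PLS predictions for 3 samples over classes A, B, C;
## in plsc these values come from the fitted PLS regression.
yhat <- matrix(c( 2.1, -0.3, 0.4,
                 -0.5,  1.8, 0.2,
                  0.1,  0.3, 1.9),
               nrow = 3, byrow = TRUE,
               dimnames = list(NULL, c("A", "B", "C")))

softmax <- function(z) exp(z) / sum(exp(z))   # scores -> probabilities summing to 1

posterior <- t(apply(yhat, 1, softmax))       # one posterior row per sample
pred <- colnames(yhat)[max.col(posterior)]    # predicted class = largest posterior
pred                                          # "A" "B" "C"
```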

Three PLS fitting functions, simpls.fit, kernelpls.fit and oscorespls.fit, are implemented in package pls.

Value

An object of class plsc or plslda containing the following components:

x

A matrix of the latent components or scores from PLS.

cl

The observed class labels of training data.

pred

The predicted class labels of training data.

conf

The confusion matrix based on training data.

acc

The accuracy rate of training data.

posterior

The posterior probabilities for the predicted classes.

ncomp

The number of latent components used for classification.

pls.method

The PLS algorithm used.

pls.out

The output of PLS.

lda.out

The output of LDA used only by plslda.

call

The (matched) function call.

Note

Either function may be called by giving either a formula and an optional data frame, or a matrix and a grouping factor, as the first two arguments.
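As a minimal sketch of the two calling conventions (assuming the mt package is installed, and using R's built-in iris data rather than the abr1 data from the Examples):

```r
library(mt)

## formula interface: the response is the grouping factor
fit.f <- plsc(Species ~ ., data = iris, ncomp = 3)

## default interface: matrix of predictors plus grouping factor
fit.d <- plsc(iris[, 1:4], iris$Species, ncomp = 3)

## both return a "plsc" object; compare training accuracies
fit.f$acc
fit.d$acc
```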

Author(s)

Wanchang Lin

References

Martens, H. and Næs, T. (1989) Multivariate Calibration. John Wiley & Sons.

See Also

kernelpls.fit, simpls.fit, oscorespls.fit, predict.plsc, plot.plsc, tune.func

Examples

library(mt)   # provides abr1, preproc, plsc and plslda
library(pls)  # provides MSEP, RMSEP and R2
data(abr1)
cl  <- factor(abr1$fact$class)
dat <- preproc(abr1$pos, y = cl, method = "log10", add = 1)[, 110:500]

## divide data as training and test data
idx <- sample(1:nrow(dat), round((2/3)*nrow(dat)), replace=FALSE) 

## construct train and test data 
train.dat  <- dat[idx,]
train.t    <- cl[idx]
test.dat   <- dat[-idx,]        
test.t     <- cl[-idx] 

## apply plsc and plslda
(res   <- plsc(train.dat,train.t, ncomp = 20, tune = FALSE))
## Estimate the mean squared error of prediction (MSEP), root mean squared error
## of prediction (RMSEP) and R^2 (coefficient of multiple determination) for 
## fitted PLSR model 
MSEP(res$pls.out)
RMSEP(res$pls.out)
R2(res$pls.out)

(res.1  <- plslda(train.dat,train.t, ncomp = 20, tune = FALSE))
## Estimate the mean squared error of prediction (MSEP), root mean squared error
## of prediction (RMSEP) and R^2 (coefficient of multiple determination) for 
## fitted PLSR model 
MSEP(res.1$pls.out)
RMSEP(res.1$pls.out)
R2(res.1$pls.out)

## Not run: 
## with function of tuning component numbers
(z.plsc   <- plsc(train.dat,train.t, ncomp = 20, tune = TRUE))
(z.plslda <- plslda(train.dat,train.t, ncomp = 20, tune = TRUE))

## check ncomp tuning results
z.plsc$ncomp
plot(z.plsc$acc.tune)
z.plslda$ncomp
plot(z.plslda$acc.tune)

## plot
plot(z.plsc,dimen=c(1,2,3),main = "Training data",abbrev = TRUE)
plot(z.plslda,dimen=c(1,2,3),main = "Training data",abbrev = TRUE)

## predict test data
pred.plsc   <- predict(z.plsc, test.dat)$class
pred.plslda <- predict(z.plslda, test.dat)$class

## classification rate and confusion matrix
cl.rate(test.t, pred.plsc)
cl.rate(test.t, pred.plslda)


## End(Not run)

mt documentation built on Feb. 2, 2022, 1:07 a.m.