plsc: Classification with PLSDA

View source: R/mt_plsc.R

plsc    R Documentation

Classification with PLSDA

Description

Classification with partial least squares (PLS) or PLS plus linear discriminant analysis (LDA).

Usage

plsc(x, ...)

plslda(x, ...)

## Default S3 method:
plsc(x, y, pls = "simpls", ncomp = 10, tune = FALSE, ...)

## S3 method for class 'formula'
plsc(formula, data = NULL, ..., subset, na.action = na.omit)

## Default S3 method:
plslda(x, y, pls = "simpls", ncomp = 10, tune = FALSE, ...)

## S3 method for class 'formula'
plslda(formula, data = NULL, ..., subset, na.action = na.omit)

Arguments

formula

A formula of the form groups ~ x1 + x2 + ... That is, the response is the grouping factor and the right-hand side specifies the (non-factor) discriminators.

data

Data frame from which variables specified in formula are preferentially to be taken.

x

A matrix or data frame containing the explanatory variables if no formula is given as the principal argument.

y

A factor specifying the class of each observation if no formula is given as the principal argument.

pls

A method for calculating PLS scores and loadings. The following methods are supported:

  • simpls: SIMPLS algorithm.

  • kernelpls: kernel algorithm.

  • oscorespls: orthogonal scores algorithm.

For details, see simpls.fit, kernelpls.fit and oscorespls.fit in package pls.

ncomp

The number of components to be used in the classification.

tune

A logical value indicating whether the best number of components should be tuned.

...

Arguments passed to or from other methods.

subset

An index vector specifying the cases to be used in the training sample.

na.action

A function to specify the action to be taken if NAs are found. The default action is na.omit, which leads to rejection of cases with missing values on any required variable. An alternative is na.fail, which causes an error if NA cases are found.
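
As an illustration of the arguments above, the following sketch (using the iris data, not part of this package) overrides the default PLS algorithm and number of components; the argument names are those documented here.

## A sketch only: orthogonal scores algorithm, at most 4 components, no tuning.
fit <- plsc(iris[, 1:4], iris[, 5], pls = "oscorespls", ncomp = 4, tune = FALSE)
fit$acc   ## training accuracy of the fitted model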

Details

plsc implements PLS for classification. In detail, the categorical response vector y is converted into a numeric matrix for regression by PLS, and the PLS output is converted to posterior probabilities by the softmax method; the classification results are then obtained from these posteriors. plslda combines PLS and LDA for classification, in which PLS performs the dimension reduction and LDA performs the classification on the PLS-transformed data.

The three PLS fitting functions, simpls.fit, kernelpls.fit and oscorespls.fit, are provided by package pls.
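
The following is a minimal sketch of the idea described above, not the package code: the factor response is expanded to an indicator matrix, a PLS regression is fitted with package pls, and the fitted values are mapped to posterior probabilities with a softmax.

library(pls)
y  <- factor(rep(c("a", "b"), each = 10))   ## toy two-class response
X  <- matrix(rnorm(20 * 8), nrow = 20)      ## toy predictor matrix
Y  <- model.matrix(~ y - 1)                 ## indicator (dummy) matrix, one column per class
fit  <- plsr(Y ~ X, ncomp = 2, method = "simpls")
pred <- predict(fit, ncomp = 2)[, , 1]      ## fitted numeric responses
post <- exp(pred) / rowSums(exp(pred))      ## softmax -> posterior probabilities
cls  <- colnames(Y)[max.col(post)]          ## predicted class = largest posterior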

Value

An object of class plsc or plslda containing the following components:

x

A matrix of the latent components or scores from PLS.

cl

The observed class labels of training data.

pred

The predicted class labels of training data.

conf

The confusion matrix based on training data.

acc

The accuracy rate of training data.

posterior

The posterior probabilities for the predicted classes.

ncomp

The number of latent components used for classification.

pls.method

The PLS algorithm used.

pls.out

The output of PLS.

lda.out

The output of LDA used only by plslda.

call

The (matched) function call.
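
For instance, with the fitted object res from the Examples section below, these components can be inspected directly:

res$conf            ## confusion matrix on the training data
res$acc             ## training accuracy
res$ncomp           ## number of latent components used
head(res$posterior) ## posterior probabilities of the predicted classes
res$pls.method      ## PLS algorithm used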

Note

Both functions may be called by giving either a formula and an optional data frame, or a matrix and a grouping factor as the first two arguments.
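
For example, the two calls below are intended to fit the same model through the two interfaces (a sketch; dat and cl are as constructed in the Examples below):

m1 <- plsc(dat, cl, ncomp = 5)                                  ## matrix / grouping-factor interface
m2 <- plsc(cl ~ ., data = data.frame(cl = cl, dat), ncomp = 5)  ## formula interface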

Author(s)

Wanchang Lin

References

Martens, H. and Næs, T. (1989) Multivariate Calibration. John Wiley & Sons.

See Also

kernelpls.fit, simpls.fit, oscorespls.fit, predict.plsc, plot.plsc, tune.func

Examples

library(pls)  
data(abr1)
cl   <- factor(abr1$fact$class)
dat  <- preproc(abr1$pos, y = cl, method = "log10", add = 1)[, 110:500]

## divide data as training and test data
idx <- sample(1:nrow(dat), round((2/3)*nrow(dat)), replace=FALSE) 

## construct train and test data 
train.dat  <- dat[idx,]
train.t    <- cl[idx]
test.dat   <- dat[-idx,]        
test.t     <- cl[-idx] 

## apply plsc and plslda
(res   <- plsc(train.dat, train.t, ncomp = 20, tune = FALSE))
## Estimate the mean squared error of prediction (MSEP), root mean squared error
## of prediction (RMSEP) and R^2 (coefficient of multiple determination) for 
## fitted PLSR model 
MSEP(res$pls.out)
RMSEP(res$pls.out)
R2(res$pls.out)

(res.1  <- plslda(train.dat, train.t, ncomp = 20, tune = FALSE))
## Estimate the mean squared error of prediction (MSEP), root mean squared error
## of prediction (RMSEP) and R^2 (coefficient of multiple determination) for 
## fitted PLSR model 
MSEP(res.1$pls.out)
RMSEP(res.1$pls.out)
R2(res.1$pls.out)

## Not run: 
## with function of tuning component numbers
(z.plsc   <- plsc(train.dat, train.t, ncomp = 20, tune = TRUE))
(z.plslda <- plslda(train.dat, train.t, ncomp = 20, tune = TRUE))

## check ncomp tuning results
z.plsc$ncomp
plot(z.plsc$acc.tune)
z.plslda$ncomp
plot(z.plslda$acc.tune)

## plot
plot(z.plsc, dimen = c(1, 2, 3), main = "Training data", abbrev = TRUE)
plot(z.plslda, dimen = c(1, 2, 3), main = "Training data", abbrev = TRUE)

## predict test data
pred.plsc   <- predict(z.plsc, test.dat)$class
pred.plslda <- predict(z.plslda, test.dat)$class

## classification rate and confusion matrix
cl.rate(test.t, pred.plsc)
cl.rate(test.t, pred.plslda)


## End(Not run)
