pcalda: Classification with PCADA
In mt: Metabolomics Data Analysis Toolbox

pcalda

R Documentation

Description

Classification using a combination of principal component analysis (PCA) and linear discriminant analysis (LDA).

Usage

pcalda(x, ...)

## Default S3 method:
pcalda(x, y, center = TRUE, scale. = FALSE, ncomp = NULL,
       tune = FALSE, ...)

## S3 method for class 'formula'
pcalda(formula, data = NULL, ..., subset, na.action = na.omit)

Arguments

`formula`	A formula of the form `groups ~ x1 + x2 + ...`. That is, the response is the grouping factor and the right-hand side specifies the (non-factor) discriminators.
`data`	Data frame from which variables specified in `formula` are preferentially to be taken.
`x`	A matrix or data frame containing the explanatory variables if no formula is given as the principal argument.
`y`	A factor specifying the class for each observation if no formula principal argument is given.
`center`	A logical value indicating whether the columns of `x` should be shifted to be zero-centred.
`scale.`	A logical value indicating whether the columns of `x` should be scaled to unit variance before the analysis takes place.
`ncomp`	The number of principal components to be used in the classification. If `NULL` and `tune = TRUE`, it defaults to the number of rows of `x` minus the number of classes in `y`. If `NULL` and `tune = FALSE`, it defaults to half the number of rows of `x`.
`tune`	A logical value indicating whether the number of components should be tuned to find the best value.
`...`	Arguments passed to or from other methods.
`subset`	An index vector specifying the cases to be used in the training sample.
`na.action`	A function to specify the action to be taken if `NA`s are found. The default action is `na.omit`, which leads to rejection of cases with missing values on any required variable. An alternative is `na.fail`, which causes an error if `NA` cases are found.
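
The default rule for `ncomp` described above can be illustrated with a little arithmetic. The helper below is hypothetical and not part of mt; it only mirrors the documented rule. An even number of rows is used because the documentation does not say how an odd number of rows is rounded.

```r
# Hypothetical helper (an assumption, not an mt function): mirrors the
# documented default for `ncomp` when it is left as NULL.
default_ncomp <- function(n_rows, n_classes, tune = FALSE) {
  if (tune) {
    n_rows - n_classes   # rows of x minus number of classes in y
  } else {
    n_rows / 2           # half the number of rows of x
  }
}

# Example: 120 observations in 6 classes
ncomp_tuned   <- default_ncomp(120, 6, tune = TRUE)
ncomp_untuned <- default_ncomp(120, 6, tune = FALSE)
print(c(tuned = ncomp_tuned, untuned = ncomp_untuned))
```

With 120 rows and 6 classes, this gives 114 components when `tune = TRUE` and 60 when `tune = FALSE`.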

Details

A critical issue when applying linear discriminant analysis (LDA) is the singularity and instability of the within-class scatter matrix. In practice, a large number of features is often available, while the number of training samples is limited and commonly smaller than the dimension of the feature space. To tackle this issue, `pcalda` combines PCA and LDA for classification: PCA is used for dimension reduction, and the rotated data resulting from PCA are then used as the input variables to LDA.
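
The two-step idea can be sketched with `prcomp` from base R and `lda` from the MASS package. This is a minimal illustration under stated assumptions, not the implementation used by `pcalda`: the built-in `iris` data, the split size, the seed, and the choice of two components are all arbitrary.

```r
library(MASS)  # provides lda() and its predict method

# Illustrative data (assumption): the built-in iris data set
set.seed(1)
x   <- as.matrix(iris[, 1:4])
y   <- iris$Species
idx <- sample(seq_len(nrow(x)), 100)  # training rows
ncomp <- 2                            # number of components, chosen arbitrarily

# Step 1: PCA on the training data for dimension reduction
pca <- prcomp(x[idx, ], center = TRUE, scale. = FALSE)
train_scores <- pca$x[, seq_len(ncomp), drop = FALSE]

# Step 2: LDA on the retained principal component scores
fit <- lda(train_scores, grouping = y[idx])

# New data must be rotated with the training PCA before prediction
test_scores <- predict(pca, newdata = x[-idx, ])[, seq_len(ncomp), drop = FALSE]
pred <- predict(fit, test_scores)$class

# Confusion matrix and accuracy on the held-out data
conf <- table(observed = y[-idx], predicted = pred)
acc  <- sum(diag(conf)) / sum(conf)
print(conf)
print(acc)
```

Because LDA only ever sees `ncomp` columns, the within-class scatter matrix stays small and well conditioned even when the original feature space has more dimensions than there are training samples.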

Value

An object of class `pcalda` containing the following components:

`x`	The rotated data on discriminant variables.
`cl`	The observed class labels of the training data.
`pred`	The predicted class labels of the training data.
`posterior`	The posterior probabilities for the predicted classes.
`conf`	The confusion matrix based on the training data.
`acc`	The classification accuracy on the training data.
`ncomp`	The number of principal components used for classification.
`pca.out`	The output of PCA.
`lda.out`	The output of LDA.
`call`	The (matched) function call.

Note

This function may be called giving either a formula and optional data frame, or a matrix and grouping factor as the first two arguments.

Author(s)

Wanchang Lin

Examples

library(mt)

data(abr1)
cl   <- factor(abr1$fact$class)
dat  <- abr1$pos

## split the data into training and test sets (2/3 for training)
set.seed(123)  # arbitrary seed, added so the random split is reproducible
idx <- sample(seq_len(nrow(dat)), round((2/3) * nrow(dat)), replace = FALSE)

## construct training and test data
train.dat  <- dat[idx, ]
train.t    <- cl[idx]
test.dat   <- dat[-idx, ]
test.t     <- cl[-idx]

## apply pcalda
model <- pcalda(train.dat, train.t)
model
summary(model)

## plot the training data
plot(model, dimen = c(1, 2), main = "Training data", abbrev = TRUE)
plot(model, main = "Training data", abbrev = TRUE)

## confusion matrix on the test data
pred.te <- predict(model, test.dat)$class
table(test.t, pred.te)

mt documentation built on June 22, 2024, 12:24 p.m.