nlda: Linear Discriminant Analysis for High Dimensional Problems


Description

Linear discriminant analysis for high-dimensional problems. See Details for the implementation.

Usage

nlda(dat, ...)

## Default S3 method:
nlda(dat, cl, prior = NULL, scale = FALSE, comprank = FALSE, ...)

## S3 method for class 'formula'
nlda(formula, data = NULL, ..., subset, na.action = na.omit)

Arguments

formula

A formula of the form groups ~ x1 + x2 + ... That is, the response is the grouping factor and the right hand side specifies the (non-factor) discriminators.

data

Data frame from which variables specified in formula are preferentially to be taken.

dat

A matrix or data frame containing the explanatory variables if no formula is given as the principal argument.

cl

A factor specifying the class for each observation if no formula is given as the principal argument.

prior

The prior probabilities of class membership. If unspecified, the class proportions for the training set are used. If present, the probabilities should be specified in the order of the factor levels (see the short example at the end of this section).

scale

A logical value indicating whether the data should be scaled in the PCA step.

comprank

A logical value indicating whether the rank of the data matrix should be computed numerically; if FALSE, the rank is taken from the number of training samples (see Details).

...

Arguments passed to or from other methods.

subset

An index vector specifying the cases to be used in the training sample.

na.action

A function to specify the action to be taken if NAs are found. The default action is na.omit, which leads to rejection of cases with missing values on any required variable. An alternative is na.fail, which causes an error if NA cases are found.
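
As a short illustration of the prior argument described above, uniform priors would be supplied as one probability per factor level, in level order. This is a hypothetical call, with dat and cl as defined in the Examples:

## hypothetical call: uniform priors, one value per factor level,
## given in the order of levels(cl)
fit <- nlda(dat, cl, prior = rep(1/nlevels(cl), nlevels(cl)))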

Details

A critical issue in applying linear discriminant analysis (LDA) is the singularity and instability of the within-class scatter matrix. In practice, a large number of features is often available while the number of training patterns is limited and commonly smaller than the dimension of the feature space. To tackle this issue, nlda combines principal component analysis (PCA) with LDA. Because determining the optimal number of principal components to represent a dataset is not trivial, and because a number of dimensions that varies from one comparison to another introduces a bias into the estimation of the separability measure, nlda adopts the two-step procedure proposed by Thomaz and Gillies (2004): the number of principal components retained is set equal to the rank of the covariance matrix (usually the number of training samples minus one), and the within-class scatter matrix is replaced by a version in which the less reliable eigenvalues have been replaced. In addition to the proportion of explained variance in each projection, the eigenvalues are a useful diagnostic quantity (returned in the stats component).
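
The following sketch is purely illustrative (not the package implementation) and shows the two-step idea on toy objects X and y. The eigenvalue replacement follows the maximum uncertainty rule of Thomaz and Gillies (2004), in which eigenvalues of the within-class scatter matrix smaller than the mean eigenvalue are raised to the mean:

## toy data: 30 samples described by 100 features (n << p)
set.seed(1)
X <- matrix(rnorm(30 * 100), nrow = 30)
y <- factor(rep(c("A", "B"), each = 15))

## step 1: PCA, retaining as many components as the numerical
## rank of the data (at most n - 1 after centring)
pc <- prcomp(X, center = TRUE, scale. = FALSE)
r  <- sum(pc$sdev > 1e-8)
Z  <- pc$x[, 1:r, drop = FALSE]

## step 2: within-class scatter matrix in the PCA space
Sw <- Reduce(`+`, lapply(levels(y), function(g) {
  Zg <- scale(Z[y == g, , drop = FALSE], scale = FALSE)
  crossprod(Zg)
}))

## replace the less reliable (small) eigenvalues of Sw by the
## mean eigenvalue so that its inverse is stable
eg      <- eigen(Sw, symmetric = TRUE)
lam     <- pmax(eg$values, mean(eg$values))
Sw.star <- eg$vectors %*% diag(lam) %*% t(eg$vectors)

## between-class scatter and the first discriminant direction
mu <- colMeans(Z)
Sb <- Reduce(`+`, lapply(levels(y), function(g) {
  d <- colMeans(Z[y == g, , drop = FALSE]) - mu
  sum(y == g) * tcrossprod(d)
}))
W      <- Re(eigen(solve(Sw.star) %*% Sb)$vectors[, 1])
scores <- Z %*% W   # samples projected onto the first DF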

Value

An object of class nlda containing the following components:

stats

The statistics based on the training data.

Tw

The proportion of trace, i.e. the proportion of between-group variance explained by each discriminant function.

rankmat

The rank used for LDA.

means

The means of the training data.

loadings

A matrix of the coefficients of linear discriminants.

x

The training data projected onto the discriminant variables.

xmeans

The group means obtained from training.

pred

The predicted class labels of the training data.

cl

The observed class labels of the training data.

prior

The prior probabilities used.

conf

The confusion matrix based on the training data.

acc

The classification accuracy on the training data.

lev

The levels of the class factor.

call

The (matched) function call.
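
Components are accessed in the usual way for list-like objects; for instance, with a fitted model such as the one built in the Examples below:

## illustrative access to some of the fitted components
model <- nlda(train.dat, train.t)   # objects as defined in the Examples
model$conf     # confusion matrix on the training data
model$acc      # training accuracy
head(model$x)  # samples projected onto the discriminant functions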

Note

This function may be given either a formula and optional data frame, or a matrix and grouping factor as the first two arguments.
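
For instance, the following two calls are equivalent (illustrative only; assumes a data frame df whose first column, class, holds the grouping factor and whose remaining columns are numeric):

## equivalent calls through the two interfaces
fit1 <- nlda(df[, -1], df$class)    # default method: matrix + factor
fit2 <- nlda(class ~ ., data = df)  # formula method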

Author(s)

David Enot dle@aber.ac.uk and Wanchang Lin wll@aber.ac.uk.

References

Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth edition. Springer.

Ripley, B. D. (1996) Pattern Recognition and Neural Networks. Cambridge University Press.

Thomaz, C. E. and Gillies, D. F. (2004) A Maximum Uncertainty LDA-based approach for Limited Sample Size problems with application to Face Recognition. Technical Report. Department of Computing, Imperial College London.

Yang, J. and Yang, J.-Y. (2003) Why can LDA be performed in PCA transformed space? Pattern Recognition, 36, 563-566.

See Also

predict.nlda, plot.nlda, hca.nlda

Examples

## load the abr1 dataset
data(abr1)
cl  <- factor(abr1$fact$class)
dat <- preproc(abr1$pos, y = cl, method = c("log10", "TICnorm"), add = 1)[, 110:500]

## define random training and test sets
idx <- sample(1:nrow(dat), round((2/3) * nrow(dat)), replace = FALSE)
train.dat <- dat[idx, ]
train.t   <- cl[idx]
test.dat  <- dat[-idx, ]
test.t    <- cl[-idx]

## build nlda on the training data
model <- nlda(train.dat, train.t)
## print summary
summary(model)

## map samples onto the first two discriminant functions (DFs)
plot(model, dimen = c(1, 2), main = "Training data", abbrev = TRUE)
## map samples onto all the DFs
plot(model, main = "Training data", abbrev = TRUE)

## predict test sample membership
pred.te <- predict(model, test.dat)$class
## confusion matrix and test error rate
table(test.t, pred.te)
1 - sum(test.t == pred.te)/length(test.t)
