nlda: Linear Discriminant Analysis for High Dimensional Problems


Description

Linear discriminant analysis for high-dimensional problems. See Details for the implementation.

Usage

nlda(dat, ...)

## Default S3 method:
nlda(dat, cl, prior = NULL, scale = FALSE, comprank = FALSE, ...)

## S3 method for class 'formula'
nlda(formula, data = NULL, ..., subset, na.action = na.omit)

Arguments

formula

A formula of the form groups ~ x1 + x2 + ... That is, the response is the grouping factor and the right hand side specifies the (non-factor) discriminators.

data

Data frame from which variables specified in formula are preferentially to be taken.

dat

A matrix or data frame containing the explanatory variables if no formula is given as the principal argument.

cl

A factor specifying the class for each observation if no formula is given as the principal argument.

prior

The prior probabilities of class membership. If unspecified, the class proportions for the training set are used. If present, the probabilities should be specified in the order of the factor levels (see the short example at the end of this section).

scale

A logical value indicating whether the data should be scaled in the PCA step.

comprank

A logical value indicating whether the rank of the data matrix should be computed numerically; if FALSE, the rank is taken from the number of training samples (see Details).

...

Arguments passed to or from other methods.

subset

An index vector specifying the cases to be used in the training sample.

na.action

A function to specify the action to be taken if NAs are found. The default action is na.omit, which leads to rejection of cases with missing values on any required variable. An alternative is na.fail, which causes an error if NA cases are found.
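
As a short illustration of the prior argument described above, uniform priors would be supplied as one probability per factor level, in level order. This is a hypothetical call, with dat and cl as defined in the Examples:

## hypothetical call: uniform priors, one value per factor level,
## given in the order of levels(cl)
fit <- nlda(dat, cl, prior = rep(1/nlevels(cl), nlevels(cl)))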

Details

A critical issue in applying linear discriminant analysis (LDA) is the singularity and instability of the within-class scatter matrix. In practice, a large number of features is often available while the number of training patterns is limited and commonly smaller than the dimension of the feature space. To tackle this issue, nlda combines principal component analysis (PCA) with LDA. Because determining the optimal number of principal components to represent a dataset is not trivial, and because a number of dimensions that varies from one comparison to another introduces a bias into the estimation of the separability measure, nlda adopts the two-step procedure proposed by Thomaz and Gillies (2004): the number of principal components retained is set equal to the rank of the covariance matrix (usually the number of training samples minus one), and the within-class scatter matrix is replaced by a version in which the less reliable eigenvalues have been replaced. In addition to the proportion of explained variance in each projection, the eigenvalues are a useful diagnostic quantity (returned in the stats component).
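
The following sketch is purely illustrative (not the package implementation) and shows the two-step idea on toy objects X and y. The eigenvalue replacement follows the maximum uncertainty rule of Thomaz and Gillies (2004), in which eigenvalues of the within-class scatter matrix smaller than the mean eigenvalue are raised to the mean:

## toy data: 30 samples described by 100 features (n << p)
set.seed(1)
X <- matrix(rnorm(30 * 100), nrow = 30)
y <- factor(rep(c("A", "B"), each = 15))

## step 1: PCA, retaining as many components as the numerical
## rank of the data (at most n - 1 after centring)
pc <- prcomp(X, center = TRUE, scale. = FALSE)
r  <- sum(pc$sdev > 1e-8)
Z  <- pc$x[, 1:r, drop = FALSE]

## step 2: within-class scatter matrix in the PCA space
Sw <- Reduce(`+`, lapply(levels(y), function(g) {
  Zg <- scale(Z[y == g, , drop = FALSE], scale = FALSE)
  crossprod(Zg)
}))

## replace the less reliable (small) eigenvalues of Sw by the
## mean eigenvalue so that its inverse is stable
eg      <- eigen(Sw, symmetric = TRUE)
lam     <- pmax(eg$values, mean(eg$values))
Sw.star <- eg$vectors %*% diag(lam) %*% t(eg$vectors)

## between-class scatter and the first discriminant direction
mu <- colMeans(Z)
Sb <- Reduce(`+`, lapply(levels(y), function(g) {
  d <- colMeans(Z[y == g, , drop = FALSE]) - mu
  sum(y == g) * tcrossprod(d)
}))
W      <- Re(eigen(solve(Sw.star) %*% Sb)$vectors[, 1])
scores <- Z %*% W   # samples projected onto the first DF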

Value

An object of class nlda containing the following components:

stats

The statistics based on the training data.

Tw

The proportion of trace, i.e. the proportion of between-group variance explained by each discriminant function.

rankmat

The rank used for LDA.

means

The means of the training data.

loadings

A matrix of the coefficients of linear discriminants.

x

The training data projected onto the discriminant variables.

xmeans

The group means obtained from training.

pred

The predicted class labels of the training data.

cl

The observed class labels of the training data.

prior

The prior probabilities used.

conf

The confusion matrix based on the training data.

acc

The classification accuracy on the training data.

lev

The levels of the class factor.

call

The (matched) function call.
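
Components are accessed in the usual way for list-like objects; for instance, with a fitted model such as the one built in the Examples below:

## illustrative access to some of the fitted components
model <- nlda(train.dat, train.t)   # objects as defined in the Examples
model$conf     # confusion matrix on the training data
model$acc      # training accuracy
head(model$x)  # samples projected onto the discriminant functions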

Note

This function may be given either a formula and optional data frame, or a matrix and grouping factor as the first two arguments.
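
For instance, the following two calls are equivalent (illustrative only; assumes a data frame df whose first column, class, holds the grouping factor and whose remaining columns are numeric):

## equivalent calls through the two interfaces
fit1 <- nlda(df[, -1], df$class)    # default method: matrix + factor
fit2 <- nlda(class ~ ., data = df)  # formula method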

Author(s)

David Enot dle@aber.ac.uk and Wanchang Lin wll@aber.ac.uk.

References

Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth edition. Springer.

Ripley, B. D. (1996) Pattern Recognition and Neural Networks. Cambridge University Press.

Thomaz, C. E. and Gillies, D. F. (2004) A Maximum Uncertainty LDA-based approach for Limited Sample Size problems with application to Face Recognition. Technical Report. Department of Computing, Imperial College London.

Yang, J. and Yang, J.-Y. (2003) Why can LDA be performed in PCA transformed space? Pattern Recognition, 36, 563-566.

See Also

predict.nlda, plot.nlda, hca.nlda

Examples

## load the abr1 dataset
data(abr1)
cl  <- factor(abr1$fact$class)
dat <- preproc(abr1$pos, y = cl, method = c("log10", "TICnorm"), add = 1)[, 110:500]

## define random training and test sets
idx <- sample(1:nrow(dat), round((2/3) * nrow(dat)), replace = FALSE)
train.dat <- dat[idx, ]
train.t   <- cl[idx]
test.dat  <- dat[-idx, ]
test.t    <- cl[-idx]

## build nlda on the training data
model <- nlda(train.dat, train.t)
## print summary
summary(model)

## map samples onto the first two discriminant functions (DFs)
plot(model, dimen = c(1, 2), main = "Training data", abbrev = TRUE)
## map samples onto all the DFs
plot(model, main = "Training data", abbrev = TRUE)

## predict test sample membership
pred.te <- predict(model, test.dat)$class
## confusion matrix and test error rate
table(test.t, pred.te)
1 - sum(test.t == pred.te)/length(test.t)
