Description
Linear discriminant analysis for high dimensional problems. See details for implementation.
Arguments
formula: A formula of the form groups ~ x1 + x2 + ... That is, the response is the grouping factor and the right-hand side specifies the discriminating variables.

data: Data frame from which the variables specified in formula are to be taken.

dat: A matrix or data frame containing the explanatory variables if no formula is given as the principal argument.

cl: A factor specifying the class for each observation if no formula principal argument is given.

prior: The prior probabilities of class membership. If unspecified, the class proportions of the training set are used. If present, the probabilities should be specified in the order of the factor levels.

scale: A logical value indicating whether the data should be scaled before PCA.

comprank: The method used to compute the rank of the data matrix in the PCA step.

...: Arguments passed to or from other methods.

subset: An index vector specifying the cases to be used in the training sample.

na.action: A function specifying the action to be taken if NAs are found.
Details

A critical issue in applying linear discriminant analysis (LDA) is the singularity and instability of the within-class scatter matrix. In practice, a large number of features is often available, but the total number of training patterns is limited and commonly smaller than the dimension of the feature space. To tackle this issue, nlda combines principal component analysis (PCA) and linear discriminant analysis (LDA) for the classification problem.

Determining the optimal number of principal components to represent a dataset is not trivial, and letting the number of dimensions vary from one comparison to another would bias the estimation of the separability measure. We have therefore opted for the two-step procedure proposed by Thomaz and Gillies (2004): the number of principal components retained is equal to the rank of the covariance matrix (usually the number of training samples minus one), and the within-class scatter matrix is replaced by a version in which the less reliable eigenvalues have been replaced. In addition to the proportion of explained variance in each projection, the eigenvalue is a useful diagnostic quantity (output stats).
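The two-step procedure above can be sketched in a few lines of base R. This is an illustrative sketch on the iris data, not FIEmspro's actual implementation; the eigenvalue replacement follows the maximum-uncertainty idea of Thomaz and Gillies (2004), where eigenvalues of the within-class scatter below the average eigenvalue are raised to that average.

```r
X  <- as.matrix(iris[, 1:4])
cl <- iris$Species

## Step 1: project onto the first r principal components, where r is the
## numerical rank of the covariance matrix (at most n - 1).
pc <- prcomp(X, center = TRUE)
r  <- sum(pc$sdev > 1e-8)
Z  <- pc$x[, 1:r, drop = FALSE]

## Step 2: within-class (Sw) and between-class (Sb) scatter in PCA space.
grand <- colMeans(Z)
Sw <- matrix(0, r, r); Sb <- matrix(0, r, r)
for (g in levels(cl)) {
  Zg <- Z[cl == g, , drop = FALSE]
  mg <- colMeans(Zg)
  Sw <- Sw + crossprod(sweep(Zg, 2, mg))
  Sb <- Sb + nrow(Zg) * tcrossprod(mg - grand)
}

## Stabilise Sw: replace eigenvalues below the mean eigenvalue by the mean.
es  <- eigen(Sw, symmetric = TRUE)
lam <- pmax(es$values, mean(es$values))
Swr <- es$vectors %*% diag(lam) %*% t(es$vectors)

## Discriminant directions are eigenvectors of Sw^{-1} Sb; at most g - 1.
ev <- eigen(solve(Swr) %*% Sb)
W  <- Re(ev$vectors[, 1:(nlevels(cl) - 1)])
scores <- Z %*% W
```

The stabilised Swr is always invertible, so the eigen-decomposition succeeds even when the raw within-class scatter is singular.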
Value

An object of class nlda containing the following components:
stats: The statistics based on the training data.

Tw: The proportion of trace.

rankmat: The rank used for LDA.

means: The means of the training data.

loadings: A matrix of the coefficients of the linear discriminants.

x: The rotated data on the discriminant variables.

xmeans: The group means obtained from training.

pred: The predicted class labels of the training data.

cl: The observed class labels of the training data.

prior: The prior probabilities used.

conf: The confusion matrix based on the training data.

acc: The accuracy rate on the training data.

lev: The levels of the class factor.

call: The (matched) function call.
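As a small illustration of how the conf and acc components relate (a sketch with made-up labels, not FIEmspro code): the confusion matrix cross-tabulates observed against predicted labels, and the accuracy is the fraction of counts on its diagonal.

```r
## Made-up observed and predicted labels for illustration only.
obs  <- factor(c("a", "a", "b", "b", "b"))
pred <- factor(c("a", "b", "b", "b", "b"), levels = levels(obs))

conf <- table(obs, pred)              # rows: observed, columns: predicted
acc  <- sum(diag(conf)) / sum(conf)   # proportion of correct predictions
```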
Note

This function may be given either a formula and an optional data frame, or a matrix and a grouping factor, as the first two arguments.
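This dual interface follows the same convention as MASS::lda. Since FIEmspro may not be installed, the sketch below illustrates the two equivalent call forms with lda from the MASS package (an assumption about the analogous interface, not a call to nlda itself):

```r
library(MASS)

## Matrix plus grouping factor as the first two arguments:
m1 <- lda(iris[, 1:4], grouping = iris$Species)

## Equivalent formula plus data-frame interface:
m2 <- lda(Species ~ ., data = iris)
```

Both calls fit the same model; only the way the data are supplied differs.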
Author(s)

David Enot dle@aber.ac.uk and Wanchang Lin wll@aber.ac.uk.
References

Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth edition. Springer.

Ripley, B. D. (1996) Pattern Recognition and Neural Networks. Cambridge University Press.

Thomaz, C. E. and Gillies, D. F. (2004) A Maximum Uncertainty LDA-based Approach for Limited Sample Size Problems with Application to Face Recognition. Technical Report, Department of Computing, Imperial College London.

Yang, J. and Yang, J.-Y. (2003) Why can LDA be performed in PCA transformed space? Pattern Recognition, 36, 563-566.
See Also

predict.nlda, plot.nlda, hca.nlda
Examples

## load abr1
data(abr1)
cl <- factor(abr1$fact$class)
dat <- preproc(abr1$pos, y=cl, method=c("log10","TICnorm"), add=1)[,110:500]
## define random training and test datasets
idx <- sample(1:nrow(dat), round((2/3)*nrow(dat)), replace=FALSE)
train.dat <- dat[idx,]
train.t <- cl[idx]
test.dat <- dat[-idx,]
test.t <- cl[-idx]
## build nlda on the training data
model <- nlda(train.dat,train.t)
## print summary
summary(model)
## map samples on the first 2 DFs
plot(model,dimen=c(1,2),main = "Training data",abbrev = TRUE)
## map samples on all the DFs
plot(model,main = "Training data",abbrev = TRUE)
## predict test sample membership
pred.te <- predict(model, test.dat)$class
## confusion matrix and error rates
table(test.t,pred.te)