Mlda: Maximum uncertainty Linear Discriminant Analysis.
In HiDimDA: High Dimensional Discriminant Analysis

View source: R/Mlda.R

Mlda	R Documentation

Description

‘Mlda’ finds the coefficients of a linear discriminant rule based on the “Maximum uncertainty Linear Discriminant Analysis” approach of Thomaz, Kitani and Gillies (2006).
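
As a rough illustration of the idea behind this approach (not the code used by HiDimDA), the method of Thomaz, Kitani and Gillies (2006) stabilises the pooled within-group covariance estimate by replacing every eigenvalue that falls below the average eigenvalue with that average. The base R sketch below assumes that description; the helper name `mlda_cov` and the simulated data are hypothetical.

```r
# Hypothetical helper (not part of HiDimDA): maximum-uncertainty
# adjustment of the pooled within-group covariance matrix.
mlda_cov <- function(data, grouping) {
  data <- as.matrix(data)
  grouping <- as.factor(grouping)
  n <- nrow(data)
  k <- nlevels(grouping)
  # Centre each observation on the mean of its own group
  centred <- data - apply(data, 2, ave, grouping)
  # Pooled covariance; singular when ncol(data) > n - k
  Sp <- crossprod(centred) / (n - k)
  eig <- eigen(Sp, symmetric = TRUE)
  # Raise every eigenvalue below the average up to the average
  lambda_star <- pmax(eig$values, mean(eig$values))
  # Rebuild the matrix as V diag(lambda_star) V'
  eig$vectors %*% (lambda_star * t(eig$vectors))
}

# Simulated data: 10 observations, 20 variables, two groups
set.seed(1)
X <- matrix(rnorm(10 * 20), nrow = 10, ncol = 20)
g <- factor(rep(c("a", "b"), each = 5))
S <- mlda_cov(X, g)
dim(S)
```

Because the small and zero eigenvalues are lifted to the average, the adjusted matrix is invertible even though there are more variables than observations.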

Usage


## Default S3 method:
Mlda(data, grouping, prior = "proportions", StddzData=TRUE, 
VSelfunct = SelectV, ldafun=c("canonical","classification"), 
PCAstep=FALSE, ...)

## S3 method for class 'data.frame'
Mlda(data, ...)

Arguments

`data`	Matrix or data frame of observations.
`grouping`	Factor specifying the class for each observation.
`prior`	The prior probabilities of class membership. If unspecified, the class proportions for the training set are used. If present, the probabilities should be specified in the order of the factor levels.
`StddzData`	A boolean flag indicating whether the data should be standardized first (default) or used in their original scales.
`VSelfunct`	Variable selection function. Either the string “none” (no selection is performed) or a function that takes ‘data’ and ‘grouping’ as its first two arguments and returns a list with two components: (i) ‘nvkpt’ - the number of variables to be used in the discriminant rule; and (ii) ‘vkptInd’ - the indices of the variables to be used in the discriminant rule. The default is the ‘SelectV’ function, which by default selects variables by the Expanded HC scheme described in Duarte Silva (2011).
`ldafun`	Type of discriminant linear functions computed. The alternatives are “canonical” for maximum-discrimination canonical linear functions and “classification” for direct-classification linear functions.
`PCAstep`	A flag indicating whether the data should first be projected onto the space spanned by its first nrow(data)-1 principal components in problems where nrow(data)-1 is less than the number of selected variables. In applications with a very large number of useful variables, setting PCAstep to TRUE avoids many potential memory problems and tends to substantially increase the size of the data sets that Mlda can analyze.
`...`	Further arguments passed to or from other methods.
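
The contract described for ‘VSelfunct’ (a function of ‘data’ and ‘grouping’ that returns a list with ‘nvkpt’ and ‘vkptInd’) can be sketched in base R as follows. The selector `select_top_t`, its `maxp` argument and the simulated data are hypothetical and serve only to illustrate the expected return value; the ranking rule (absolute two-sample t-statistics) is an assumption chosen for simplicity, not the Expanded HC scheme used by ‘SelectV’.

```r
# Hypothetical selector (not part of HiDimDA): keep the 'maxp' variables
# with the largest absolute two-sample t-statistics. Two groups only.
select_top_t <- function(data, grouping, maxp = 50, ...) {
  data <- as.matrix(data)
  grouping <- as.factor(grouping)
  stopifnot(nlevels(grouping) == 2)
  in_first <- grouping == levels(grouping)[1]
  # One Welch t-statistic per column
  tstat <- apply(data, 2, function(x) {
    t.test(x[in_first], x[!in_first])$statistic
  })
  nkeep <- min(maxp, ncol(data))
  kept <- sort(order(abs(tstat), decreasing = TRUE)[seq_len(nkeep)])
  # Return value in the form described for 'VSelfunct'
  list(nvkpt = nkeep, vkptInd = kept)
}

# Simulated data: 20 observations, 100 variables; the first five
# variables are shifted in group "b"
set.seed(42)
X <- matrix(rnorm(20 * 100), nrow = 20, ncol = 100)
g <- factor(rep(c("a", "b"), each = 10))
X[g == "b", 1:5] <- X[g == "b", 1:5] + 10
sel <- select_top_t(X, g, maxp = 5)
str(sel)

# Intended use (untested here, requires HiDimDA):
# rule <- Mlda(X, g, VSelfunct = select_top_t)
```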

Value

If argument ‘ldafun’ is set to “canonical”, an object of class ‘canldaRes’ with the following components:

`prior`	The prior probabilities used.
`means`	The class means.
`scaling`	A matrix which transforms observations to discriminant functions, normalized so that the within groups covariance matrix is spherical.
`svd`	The singular values, which give the ratio of the between- and within-group standard deviations on the linear discriminant variables. Their squares are the canonical F-statistics.
`vkpt`	A vector with the indices of the variables kept in the discriminant rule if the number of variables kept is less than ‘ncol(data)’. NULL otherwise.
`nvkpt`	The number of variables kept in the discriminant rule if this number is less than ‘ncol(data)’. NULL otherwise.
`N`	The number of observations used.
`call`	The (matched) function call.

If argument ‘ldafun’ is set to “classification”, an object of class ‘clldaRes’ with the following components:

`prior`	The prior probabilities used.
`means`	The class means.
`coef`	A matrix with the coefficients of the k-1 classification functions.
`cnst`	A vector with the thresholds (2nd members of linear classification rules) used in classification rules that assume equal priors.
`vkpt`	A vector with the indices of the variables kept in the discriminant rule if the number of variables kept is less than ‘ncol(data)’. NULL otherwise.
`nvkpt`	The number of variables kept in the discriminant rule if this number is less than ‘ncol(data)’. NULL otherwise.
`N`	The number of observations used.
`call`	The (matched) function call.

Author(s)

A. Pedro Duarte Silva

References

Duarte Silva, A.P. (2011) “Two Group Classification with High-Dimensional Correlated Data: A Factor Model Approach”, Computational Statistics and Data Analysis, 55 (11), 2975-2990.

Thomaz, Kitani and Gillies (2006) “A maximum uncertainty LDA-based approach for limited sample size problems - with application to face recognition”, Journal of the Brazilian Computer Society, 12 (2), 7-18.

Examples


# train classifier on Alon's Colon Cancer Data Set 
# (after a logarithmic transformation). 

log10genes <- log10(AlonDS[,-1])

ldarule <- Mlda(log10genes,AlonDS$grouping)     

# show classification rule

print(ldarule)

# get in-sample classification results

predict(ldarule,log10genes,grpcodes=levels(AlonDS$grouping))$class           	       

# compare classifications with true assignments

cat("Original classes:\n")
print(AlonDS$grouping)             		 

# Estimate error rates by four-fold cross-validation.
# Note: In cross-validation analysis it is recommended to set 
# the argument 'ldafun' to "classification", in order to speed up 
# computations by avoiding unnecessary eigen-decompositions.

## Not run: 

CrosValRes <- DACrossVal(log10genes, AlonDS$grouping, TrainAlg=Mlda,
                         ldafun="classification", kfold=4, CVrep=1)
summary(CrosValRes[,,"Clerr"])
 

## End(Not run)
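
To make the four-fold scheme concrete, the base R sketch below shows one way to assign observations to folds so that each group is spread evenly across them. It illustrates k-fold splitting in general; how DACrossVal partitions the data internally is not described here, and the helper name `make_folds`, the group labels and the group sizes are hypothetical.

```r
# Hypothetical helper (not part of HiDimDA): assign each observation to
# one of 'kfold' folds, balancing the folds within each group.
make_folds <- function(grouping, kfold = 4) {
  grouping <- as.factor(grouping)
  folds <- integer(length(grouping))
  for (lev in levels(grouping)) {
    idx <- which(grouping == lev)
    # Repeat the labels 1..kfold over the group, then shuffle them
    folds[idx] <- sample(rep_len(seq_len(kfold), length(idx)))
  }
  folds
}

# Illustrative grouping factor with 40 and 22 observations
set.seed(123)
g <- factor(rep(c("grpA", "grpB"), times = c(40, 22)))
folds <- make_folds(g)
table(g, folds)
```

Each fold is then held out in turn while the rule is trained on the remaining three folds.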

HiDimDA documentation built on Oct. 6, 2024, 5:07 p.m.