Mixture Discriminant Analysis
Description
Mixture discriminant analysis.
Usage
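mda(formula, data, subclasses, sub.df, tot.df, dimension, eps,
    iter, weights, method, keep.fitted, trace, ...)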
Arguments
formula 
of the form 
data 
data frame containing the variables in the formula (optional). 
subclasses 
Number of subclasses per class, default is 3. Can be a vector with a number for each class. 
sub.df 
If subclass centroid shrinking is performed, the effective degrees of freedom of the centroids per class. Can be a scalar, in which case the same number is used for each class, or a vector with one entry per class. 
tot.df 
The total df for all the centroids can be specified rather than separately per class. 
dimension 
The dimension of the reduced model. If we know our final model will be confined to a discriminant subspace (of the subclass centroids), we can specify this in advance and have the EM algorithm operate in this subspace. 
eps 
A numerical threshold for automatically truncating the dimension. 
iter 
A limit on the total number of iterations, default is 5. 
weights 
NOT observation weights! This is a special
weight structure that, for each class, gives each observation in that
class a weight (prior probability) of belonging to each of the
class's subclasses. The default is provided by a call to
mda.start.
method 
regression method used in optimal scaling. Default is
linear regression via the function polyreg.
keep.fitted 
a logical variable, which determines whether the
(sometimes large) component 
trace 
if 
... 
additional arguments to mda.start and to method.
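For intuition about the weights argument: the structure can be pictured as one matrix per class, with a row for each training observation in that class and a column for each of its subclasses, every row being a probability vector. The base-R sketch below builds such a list from hard k-means assignments. The helper start_weights_sketch is hypothetical, the list-of-matrices layout is our assumption, and this is not a claim about how mda.start actually computes its starting weights.

```r
# Hypothetical helper (not part of mda): one membership matrix per class.
# Rows = observations in that class, columns = subclasses,
# entries = prior probability of belonging to each subclass.
start_weights_sketch <- function(x, g, subclasses = 3) {
  x <- as.matrix(x)
  g <- factor(g)
  classes <- levels(g)
  # Accept a scalar or one value per class, like the subclasses argument
  subclasses <- rep_len(subclasses, length(classes))
  weights <- vector("list", length(classes))
  names(weights) <- classes
  for (j in seq_along(classes)) {
    xj <- x[g == classes[j], , drop = FALSE]
    k <- subclasses[j]
    # Hard assignment of each observation to one of k clusters
    cl <- kmeans(xj, centers = k, nstart = 5)$cluster
    # Turn cluster labels into a 0/1 matrix whose rows sum to 1
    w <- matrix(0, nrow(xj), k)
    w[cbind(seq_len(nrow(xj)), cl)] <- 1
    weights[[j]] <- w
  }
  weights
}

set.seed(1)
w <- start_weights_sketch(iris[, 1:4], iris$Species, subclasses = 3)
sapply(w, dim)   # 50 rows and 3 columns for each iris species
```

Hard 0/1 memberships are the simplest valid choice; soft memberships (rows with several non-zero entries summing to 1) fit the same layout.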
Value
An object of class c("mda", "fda"). The most useful extractor
is predict, which can make many types of predictions from this
object. It can also be plotted, and any functions useful for fda
objects will work here too, such as confusion and coef.
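The confusion extractor cross-tabulates predicted against true classes. For intuition only, the same kind of table can be built in base R with table(). The labels below are made up rather than taken from an mda fit, and putting predicted classes in the rows is our choice here; mda's own layout may differ.

```r
# Made-up true and predicted labels (illustration only)
truth <- factor(c("a", "a", "b", "b", "c", "c"))
pred  <- factor(c("a", "b", "b", "b", "c", "a"), levels = levels(truth))

# Rows: predicted class, columns: true class
conf <- table(predicted = pred, true = truth)
conf

# Misclassification rate = off-diagonal count / total count
err <- 1 - sum(diag(conf)) / sum(conf)
err   # 2 of 6 labels are wrong
```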
The object has the following components:
percent.explained 
the percent between-group variance explained by each dimension (relative to the total explained). 
values 
optimal scaling regression sum-of-squares for each dimension (see References). 
means 
subclass means in the discriminant space. These are also
scaled versions of the final theta's or class scores, and can be
used in a subsequent call to 
theta.mod 
(internal) a class scoring matrix which allows

dimension 
dimension of discriminant space. 
sub.prior 
subclass membership priors, computed in the fit. No effort is currently spent in trying to keep these above a threshold. 
prior 
class proportions for the training data. 
fit 
fit object returned by 
call 
the call that created this object (allowing it to be

confusion 
confusion matrix when classifying the training data. 
weights 
These are the subclass membership probabilities for each member of the training set; see the weights argument. 
assign.theta 
a pointer list which identifies which elements of certain lists belong to individual classes. 
deviance 
The multinomial log-likelihood of the fit. Even though
the full log-likelihood drives the iterations, we cannot in general
compute it because of the flexibility of the 
The method functions are required to take arguments x and y,
where both can be matrices, and should produce a matrix of
fitted.values the same size as y. They can take an additional
argument weights and should all accept ... for safety's sake. Any
arguments to method() can be passed on via the ... argument of mda.
The default method polyreg has a degree argument which allows
polynomial regression of the required total degree. See the
documentation for predict.fda for further requirements of method.
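The method contract described above can be sketched in base R as a weighted linear regression that accepts matrix x and y, an optional weight vector and ..., and returns fitted.values with the same dimensions as y. The names wlinreg and predict.wlinreg are hypothetical, passing the weights as the third positional argument is our assumption, and we have not checked this against mda's internals; predict.fda documents further requirements.

```r
# Hypothetical regression method: x and y may be matrices,
# w holds the weights, and '...' is accepted for safety.
wlinreg <- function(x, y, w = rep(1, nrow(x)), ...) {
  x <- as.matrix(x)
  y <- as.matrix(y)
  X <- cbind(1, x)              # add an intercept column
  fit <- lm.wfit(X, y, w)       # weighted least squares, one fit per column of y
  structure(list(fitted.values = as.matrix(fit$fitted.values),
                 coefficients = as.matrix(fit$coefficients)),
            class = "wlinreg")
}

# Predictions on new data reuse the stored coefficients
predict.wlinreg <- function(object, x, ...) {
  cbind(1, as.matrix(x)) %*% object$coefficients
}

set.seed(1)
x <- as.matrix(iris[, 1:4])
y <- cbind(s1 = rnorm(150), s2 = rnorm(150))   # stand-in matrix of scores
fit <- wlinreg(x, y)
dim(fit$fitted.values)   # same as dim(y): 150 rows, 2 columns
```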
The package earth is suggested for this package as well; earth is a
more detailed implementation of the mars model, and works as a method
argument.
The function mda.start creates the starting weights; it takes
additional arguments, which can be passed in via the ... argument to
mda. See the documentation for mda.start.
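Two of the summary components listed under Value can be sketched in base R. The percentages printed for the iris fit (96.02, 98.55, 99.90, 100.00) are cumulative, so the first function turns per-dimension amounts into cumulative percentages. The second computes -2 times the multinomial log-likelihood of the true labels under a matrix of posterior class probabilities; that this matches mda's reported deviance exactly is our assumption. Function names and inputs are hypothetical.

```r
# Cumulative percent of between-group variance from per-dimension amounts v
percent_explained <- function(v) 100 * cumsum(v) / sum(v)

# -2 * multinomial log-likelihood of labels y under posterior matrix post
# (rows: observations, columns: classes, named by class)
multinomial_deviance <- function(post, y) {
  y <- factor(y, levels = colnames(post))
  p <- post[cbind(seq_len(nrow(post)), as.integer(y))]
  -2 * sum(log(p))
}

percent_explained(c(6, 3, 1))   # 60 90 100

post <- matrix(c(0.9, 0.1,
                 0.2, 0.8), nrow = 2, byrow = TRUE,
               dimnames = list(NULL, c("a", "b")))
multinomial_deviance(post, c("a", "b"))   # equals -2 * log(0.9 * 0.8)
```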
Author(s)
Trevor Hastie and Robert Tibshirani
References
“Flexible Discriminant Analysis by Optimal Scoring” by Hastie, Tibshirani and Buja, 1994, JASA, 1255-1270.
“Penalized Discriminant Analysis” by Hastie, Buja and Tibshirani, 1995, Annals of Statistics, 73-102.
“Discriminant Analysis by Gaussian Mixtures” by Hastie and Tibshirani, 1996, JRSSB, 155-176.
“Elements of Statistical Learning: Data Mining, Inference, and Prediction” (2nd edition, Chapter 12) by Hastie, Tibshirani and Friedman, 2009, Springer.
See Also
predict.mda, mars, bruto, polyreg, gen.ridge, softmax, confusion
Examples
library(mda)
data(iris)
irisfit <- mda(Species ~ ., data = iris)
irisfit
## Call:
## mda(formula = Species ~ ., data = iris)
##
## Dimension: 4
##
## Percent Between-Group Variance Explained:
## v1 v2 v3 v4
## 96.02 98.55 99.90 100.00
##
## Degrees of Freedom (per dimension): 5
##
## Training Misclassification Error: 0.02 ( N = 150 )
##
## Deviance: 15.102
data(glass)
# random sample of size 100
samp <- c(1, 3, 4, 11, 12, 13, 14, 16, 17, 18, 19, 20, 27, 28, 31,
38, 42, 46, 47, 48, 49, 52, 53, 54, 55, 57, 62, 63, 64, 65,
67, 68, 69, 70, 72, 73, 78, 79, 83, 84, 85, 87, 91, 92, 94,
99, 100, 106, 107, 108, 111, 112, 113, 115, 118, 121, 123,
124, 125, 126, 129, 131, 133, 136, 139, 142, 143, 145, 147,
152, 153, 156, 159, 160, 161, 164, 165, 166, 168, 169, 171,
172, 173, 174, 175, 177, 178, 181, 182, 185, 188, 189, 192,
195, 197, 203, 205, 211, 212, 214)
glass.train <- glass[samp, ]
glass.test <- glass[-samp, ]   # held-out rows not in the training sample
glass.mda < mda(Type ~ ., data = glass.train)
predict(glass.mda, glass.test, type="post") # abbreviations are allowed
confusion(glass.mda,glass.test)
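Under the Gaussian mixture model of Hastie and Tibshirani (1996), the posterior probability of a class at a point combines that class's subclass densities, weighted by the subclass priors (compare sub.prior) and the class prior (compare prior). The base-R sketch below assumes spherical unit-variance Gaussians around given subclass means (compare means); the helper mixture_posterior and all numbers are made up, and predict.mda's actual computation in the fitted discriminant space may differ in detail.

```r
# Posterior class probabilities for mixtures of spherical unit-variance Gaussians.
# means:     list, one matrix per class (rows = subclass means)
# sub.prior: list, one vector of subclass priors per class
# prior:     vector of class priors
mixture_posterior <- function(x, means, sub.prior, prior) {
  x <- as.matrix(x)
  dens <- sapply(seq_along(means), function(j) {
    mu <- means[[j]]
    # Squared Euclidean distance from each row of x to each subclass mean
    d2 <- sapply(seq_len(nrow(mu)), function(r) rowSums(sweep(x, 2, mu[r, ])^2))
    d2 <- matrix(d2, nrow = nrow(x))
    # Class prior times mixture density; the Gaussian normalising
    # constant is shared by all classes, so it is omitted
    prior[j] * as.vector(exp(-d2 / 2) %*% sub.prior[[j]])
  })
  dens <- matrix(dens, nrow = nrow(x))
  dens / rowSums(dens)   # normalise so each row sums to 1
}

means <- list(a = rbind(c(0, 0), c(4, 0)),
              b = rbind(c(2, 3), c(2, 6)))
sub.prior <- list(a = c(0.5, 0.5), b = c(0.7, 0.3))
prior <- c(a = 0.5, b = 0.5)

post <- mixture_posterior(rbind(c(0, 0), c(2, 5)), means, sub.prior, prior)
colnames(post) <- names(means)
round(post, 3)
```

The first point sits on a subclass mean of class a and the second lies between the two subclass means of class b, so each is assigned accordingly.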
