Mixture Discriminant Analysis

Description

Mixture discriminant analysis.

Usage

mda(formula, data, subclasses, sub.df, tot.df, dimension, eps,
    iter, weights, method, keep.fitted, trace, ...)

Arguments

formula

A formula of the form y ~ x, describing the response and the predictors. The formula can be more complicated, such as y ~ log(x) + z (see formula for more details). The response should be a factor, or any vector that can be coerced to one (such as a logical variable).

data

data frame containing the variables in the formula (optional).

subclasses

Number of subclasses per class, default is 3. Can be a vector with a number for each class.

sub.df

If subclass centroid shrinking is performed, the effective degrees of freedom of the centroids per class. Can be a scalar, in which case the same number is used for each class, or a vector with one value per class.

tot.df

The total df for all the centroids can be specified rather than separately per class.

dimension

The dimension of the reduced model. If we know our final model will be confined to a discriminant subspace (of the subclass centroids), we can specify this in advance and have the EM algorithm operate in this subspace.

eps

A numerical threshold for automatically truncating the dimension.

iter

A limit on the total number of iterations, default is 5.

weights

NOT observation weights! This is a special weight structure, which for each class assigns a weight (prior probability) to each of the observations in that class of belonging to one of the subclasses. The default is provided by a call to mda.start(x, g, subclasses, trace, ...) (by this time x and g are known). See the help for mda.start. Arguments for mda.start can be provided via the ... argument to mda, and the weights argument need never be accessed. A previously fit mda object can be supplied, in which case the final subclass responsibility weights are used for weights. This allows the iterations from a previous fit to be continued.

method

regression method used in optimal scaling. Default is linear regression via the function polyreg, resulting in the usual mixture model. Other possibilities are mars and bruto. For penalized mixture discriminant models gen.ridge is appropriate.

keep.fitted

a logical variable, which determines whether the (sometimes large) component "fitted.values" of the fit component of the returned mda object should be kept. The default is TRUE if n * dimension < 5000.

trace

if TRUE, iteration information is printed. Note that the deviance reported is for the posterior class likelihood, and not the full likelihood, which is used to drive the EM algorithm under mda. In general the latter is not available.

...

additional arguments to mda.start and to method.
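
As a rough illustration of the arguments described above, the following sketch fits a model with a different number of subclasses per class, then continues a short fit by passing it back in through weights. The object names (fit_vec, fit_short, fit_more) are made up for this example, and the per-class counts are assumed to follow the order of the factor levels.

```r
library(mda)
data(iris)

set.seed(1)  # the starting weights involve random starts

# One subclass count per class (assumed to follow levels(iris$Species))
fit_vec <- mda(Species ~ ., data = iris, subclasses = c(2, 3, 3))

# A deliberately short first run ...
fit_short <- mda(Species ~ ., data = iris, iter = 2)

# ... then continue from its final subclass responsibility weights
fit_more <- mda(Species ~ ., data = iris, weights = fit_short, iter = 5)

fit_more
```

Supplying a previous fit through weights avoids a fresh call to mda.start, so the second call picks up where the first one stopped rather than starting over.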

Value

An object of class c("mda", "fda"). The most useful extractor is predict, which can make many types of predictions from this object. It can also be plotted, and any functions useful for fda objects will work here too, such as confusion and coef.
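
A minimal sketch of the prediction types, assuming the type values "class", "posterior" and "variates" accepted by predict for mda objects; the variable names are illustrative only.

```r
library(mda)
data(iris)

set.seed(1)  # the starting weights involve random starts
fit <- mda(Species ~ ., data = iris)

# Predicted class labels (the default type)
cls <- predict(fit, iris)

# Posterior class probabilities, one column per class
post <- predict(fit, iris, type = "posterior")

# Coordinates of each observation in the discriminant space
vars <- predict(fit, iris, type = "variates")

# Cross-tabulate predictions against the known classes
table(predicted = cls, actual = iris$Species)
```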

The object has the following components:

percent.explained

the percentage of between-group variance explained by each dimension (relative to the total explained).

values

optimal scaling regression sum-of-squares for each dimension (see reference).

means

subclass means in the discriminant space. These are also scaled versions of the final thetas, or class scores, and can be used in a subsequent call to mda (this only makes sense if some columns of theta are omitted; see the references).

theta.mod

(internal) a class scoring matrix which allows predict to work properly.

dimension

dimension of discriminant space.

sub.prior

subclass membership priors, computed in the fit. No effort is currently spent in trying to keep these above a threshold.

prior

class proportions for the training data.

fit

fit object returned by method.

call

the call that created this object (allowing it to be update-able).

confusion

confusion matrix when classifying the training data.

weights

These are the subclass membership probabilities for each member of the training set; see the weights argument.

assign.theta

a pointer list which identifies which elements of certain lists belong to individual classes.

deviance

The multinomial log-likelihood of the fit. Even though the full log-likelihood drives the iterations, we cannot in general compute it because of the flexibility of the method used. The deviance can increase with the iterations, but generally does not.

The method functions are required to take arguments x and y, both of which can be matrices, and should produce a matrix of fitted.values the same size as y. They can take an additional argument weights, and should all have a ... argument for safety's sake. Any arguments to method() can be passed on via the ... argument of mda. The default method polyreg has a degree argument which allows polynomial regression of the required total degree. See the documentation for predict.fda for further requirements of method. The package earth is suggested for this package as well; earth is a more detailed implementation of the mars model, and works as a method argument.
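
The shape of such a method function can be sketched in base R. The function wls_method and its predict method below are hypothetical, written only to show the x, y, weights, ... signature and the fitted.values matrix; they are not part of the package, and whether they satisfy every requirement of predict.fda is an assumption that should be checked against its documentation.

```r
# Hypothetical method function: weighted least squares on a matrix response.
# Signature follows the contract above: x, y, optional weights, and "...".
wls_method <- function(x, y, weights = rep(1, nrow(x)), ...) {
  x1 <- cbind(Intercept = 1, as.matrix(x))  # add an intercept column
  y <- as.matrix(y)
  fit <- lm.wfit(x1, y, w = weights)
  structure(
    list(coefficients = fit$coefficients,
         fitted.values = as.matrix(fit$fitted.values)),
    class = "wls_method"
  )
}

# Matching predict method (assumed to be needed for prediction on new data)
predict.wls_method <- function(object, newdata, ...) {
  if (missing(newdata)) return(object$fitted.values)
  cbind(1, as.matrix(newdata)) %*% object$coefficients
}

# Check the contract on a matrix response: fitted.values has the size of y
x <- as.matrix(iris[, 1:4])
y <- model.matrix(~ Species - 1, data = iris)
fit <- wls_method(x, y)
dim(fit$fitted.values)
```

The point of the sketch is the interface rather than the estimator: any regression that accepts a matrix response and returns fitted values of the same dimensions follows the same pattern.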

The function mda.start creates the starting weights; it takes additional arguments which can be passed in via the ... argument to mda. See the documentation for mda.start.

Author(s)

Trevor Hastie and Robert Tibshirani

References

“Flexible Discriminant Analysis by Optimal Scoring” by Hastie, Tibshirani and Buja, 1994, JASA, 1255-1270.

“Penalized Discriminant Analysis” by Hastie, Buja and Tibshirani, 1995, Annals of Statistics, 73-102.

“Discriminant Analysis by Gaussian Mixtures” by Hastie and Tibshirani, 1996, JRSS-B, 155-176.

“Elements of Statistical Learning - Data Mining, Inference and Prediction” (2nd edition, Chapter 12) by Hastie, Tibshirani and Friedman, 2009, Springer.

See Also

predict.mda, mars, bruto, polyreg, gen.ridge, softmax, confusion

Examples

library(mda)  # provides mda(), confusion() and the glass data
data(iris)
irisfit <- mda(Species ~ ., data = iris)
irisfit
## Call:
## mda(formula = Species ~ ., data = iris)
##
## Dimension: 4
##
## Percent Between-Group Variance Explained:
##     v1     v2     v3     v4
##  96.02  98.55  99.90 100.00
##
## Degrees of Freedom (per dimension): 5
##
## Training Misclassification Error: 0.02 ( N = 150 )
##
## Deviance: 15.102

data(glass)
# random sample of size 100
samp <- c(1, 3, 4, 11, 12, 13, 14, 16, 17, 18, 19, 20, 27, 28, 31,
          38, 42, 46, 47, 48, 49, 52, 53, 54, 55, 57, 62, 63, 64, 65,
          67, 68, 69, 70, 72, 73, 78, 79, 83, 84, 85, 87, 91, 92, 94,
          99, 100, 106, 107, 108, 111, 112, 113, 115, 118, 121, 123,
          124, 125, 126, 129, 131, 133, 136, 139, 142, 143, 145, 147,
          152, 153, 156, 159, 160, 161, 164, 165, 166, 168, 169, 171,
          172, 173, 174, 175, 177, 178, 181, 182, 185, 188, 189, 192,
          195, 197, 203, 205, 211, 212, 214) 
glass.train <- glass[samp,]
glass.test <- glass[-samp,]
glass.mda <- mda(Type ~ ., data = glass.train)
predict(glass.mda, glass.test, type = "post") # abbreviations are allowed
confusion(glass.mda, glass.test)