MclustDA: MclustDA discriminant analysis

Description Usage Arguments Details Value Author(s) References See Also Examples

View source: R/mclustda.R

Description

Discriminant analysis based on Gaussian finite mixture modeling.

Usage

1
2
3
4
5
6
7
8
MclustDA(data, class, G = NULL, modelNames = NULL, 
         modelType = c("MclustDA", "EDDA"), 
         prior = NULL, 
         control = emControl(), 
         initialization = NULL, 
         warn = mclust.options("warn"), 
         verbose = interactive(),
         ...)

Arguments

data

A data frame or matrix giving the training data.

class

A vector giving the class labels for the observations in the training data.

G

An integer vector specifying the numbers of mixture components (clusters) for which the BIC is to be calculated within each class. The default is G = 1:5.
A different set of mixture components for each class can be specified by providing this argument with a list of integers for each class. See the examples below.

modelNames

A vector of character strings indicating the models to be fitted by EM within each class (see the description in mclustModelNames). A different set of mixture models for each class can be specified by providing this argument with a list of character strings. See the examples below.

modelType

A character string specifying whether the models given in modelNames should fit a different number of mixture components and covariance structures for each class ("MclustDA", the default) or should be constrained to have a single component for each class with the same covariance structure among classes ("EDDA"). See Details section and the examples below.

prior

The default assumes no prior, but this argument allows specification of a conjugate prior on the means and variances through the function priorControl.

control

A list of control parameters for EM. The defaults are set by the call emControl().

initialization

A list containing zero or more of the following components:

hcPairs

A matrix of merge pairs for hierarchical clustering such as produced by function hc. The default is to compute a hierarchical clustering tree by applying function hc with modelName = "E" to univariate data and modelName = "VVV" to multivariate data or a subset as indicated by the subset argument. The hierarchical clustering results are used as starting values for EM.

subset

A logical or numeric vector specifying a subset of the data to be used in the initial hierarchical clustering phase.

warn

A logical value indicating whether or not certain warnings (usually related to singularity) should be issued when estimation fails. The default is controlled by mclust.options.

verbose

A logical controlling if a text progress bar is displayed during the fitting procedure. By default is TRUE if the session is interactive, and FALSE otherwise..

...

Further arguments passed to or from other methods.

Details

The "EDDA" method for discriminant analysis is described in Bensmail and Celeux (1996), while "MclustDA" in Fraley and Raftery (2002).

Value

An object of class 'MclustDA' providing the optimal (according to BIC) mixture model.

The details of the output components are as follows:

call

The matched call.

data

The input data matrix.

class

The input class labels.

type

A character string specifying the modelType estimated.

models

A list of Mclust objects containing information on fitted model for each class.

n

The total number of observations in the data.

d

The dimension of the data.

bic

Optimal BIC value.

loglik

Log-likelihood for the selected model.

df

Number of estimated parameters.

Author(s)

Luca Scrucca

References

Scrucca L., Fop M., Murphy T. B. and Raftery A. E. (2016) mclust 5: clustering, classification and density estimation using Gaussian finite mixture models, The R Journal, 8/1, pp. 205-233.

Fraley C. and Raftery A. E. (2002) Model-based clustering, discriminant analysis and density estimation, Journal of the American Statistical Association, 97/458, pp. 611-631.

Fraley C., Raftery A. E., Murphy T. B. and Scrucca L. (2012) mclust Version 4 for R: Normal Mixture Modeling for Model-Based Clustering, Classification, and Density Estimation. Technical Report No. 597, Department of Statistics, University of Washington.

Bensmail, H., and Celeux, G. (1996) Regularized Gaussian Discriminant Analysis Through Eigenvalue Decomposition.Journal of the American Statistical Association, 91, 1743-1748.

See Also

summary.MclustDA, plot.MclustDA, predict.MclustDA, classError

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
odd <- seq(from = 1, to = nrow(iris), by = 2)
even <- odd + 1
X.train <- iris[odd,-5]
Class.train <- iris[odd,5]
X.test <- iris[even,-5]
Class.test <- iris[even,5]

# common EEE covariance structure (which is essentially equivalent to linear discriminant analysis)
irisMclustDA <- MclustDA(X.train, Class.train, modelType = "EDDA", modelNames = "EEE")
summary(irisMclustDA, parameters = TRUE)
summary(irisMclustDA, newdata = X.test, newclass = Class.test)

# common covariance structure selected by BIC
irisMclustDA <- MclustDA(X.train, Class.train, modelType = "EDDA")
summary(irisMclustDA, parameters = TRUE)
summary(irisMclustDA, newdata = X.test, newclass = Class.test)

# general covariance structure selected by BIC
irisMclustDA <- MclustDA(X.train, Class.train)
summary(irisMclustDA, parameters = TRUE)
summary(irisMclustDA, newdata = X.test, newclass = Class.test)

plot(irisMclustDA)
plot(irisMclustDA, dimens = 3:4)
plot(irisMclustDA, dimens = 4)

plot(irisMclustDA, what = "classification")
plot(irisMclustDA, what = "classification", newdata = X.test)
plot(irisMclustDA, what = "classification", dimens = 3:4)
plot(irisMclustDA, what = "classification", newdata = X.test, dimens = 3:4)
plot(irisMclustDA, what = "classification", dimens = 4)
plot(irisMclustDA, what = "classification", dimens = 4, newdata = X.test)

plot(irisMclustDA, what = "train&test", newdata = X.test)
plot(irisMclustDA, what = "train&test", newdata = X.test, dimens = 3:4)
plot(irisMclustDA, what = "train&test", newdata = X.test, dimens = 4)

plot(irisMclustDA, what = "error")
plot(irisMclustDA, what = "error", dimens = 3:4)
plot(irisMclustDA, what = "error", dimens = 4)
plot(irisMclustDA, what = "error", newdata = X.test, newclass = Class.test)
plot(irisMclustDA, what = "error", newdata = X.test, newclass = Class.test, dimens = 3:4)
plot(irisMclustDA, what = "error", newdata = X.test, newclass = Class.test, dimens = 4)

## Not run: 
# simulated 1D data
n <- 250 
set.seed(1)
triModal <- c(rnorm(n,-5), rnorm(n,0), rnorm(n,5))
triClass <- c(rep(1,n), rep(2,n), rep(3,n))
odd <- seq(from = 1, to = length(triModal), by = 2)
even <- odd + 1
triMclustDA <- MclustDA(triModal[odd], triClass[odd])
summary(triMclustDA, parameters = TRUE)
summary(triMclustDA, newdata = triModal[even], newclass = triClass[even])
plot(triMclustDA, what = "scatterplot")
plot(triMclustDA, what = "classification")
plot(triMclustDA, what = "classification", newdata = triModal[even])
plot(triMclustDA, what = "train&test", newdata = triModal[even])
plot(triMclustDA, what = "error")
plot(triMclustDA, what = "error", newdata = triModal[even], newclass = triClass[even])

# simulated 2D cross data
data(cross)
odd <- seq(from = 1, to = nrow(cross), by = 2)
even <- odd + 1
crossMclustDA <- MclustDA(cross[odd,-1], cross[odd,1])
summary(crossMclustDA, parameters = TRUE)
summary(crossMclustDA, newdata = cross[even,-1], newclass = cross[even,1])
plot(crossMclustDA, what = "scatterplot")
plot(crossMclustDA, what = "classification")
plot(crossMclustDA, what = "classification", newdata = cross[even,-1])
plot(crossMclustDA, what = "train&test", newdata = cross[even,-1])
plot(crossMclustDA, what = "error")
plot(crossMclustDA, what = "error", newdata =cross[even,-1], newclass = cross[even,1])

## End(Not run)

Example output

Package 'mclust' version 5.3
Type 'citation("mclust")' for citing this R package in publications.
------------------------------------------------
Gaussian finite mixture model for classification 
------------------------------------------------

EDDA model summary:

 log.likelihood  n df       BIC
       -125.443 75 22 -345.8707
            
Classes       n Model G
  setosa     25   EEE 1
  versicolor 25   EEE 1
  virginica  25   EEE 1

Estimated parameters:

Class = setosa

Mixing probabilities: 1 

Means:
              [,1]
Sepal.Length 5.024
Sepal.Width  3.480
Petal.Length 1.456
Petal.Width  0.228

Variances:
[,,1]
             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length   0.26418133  0.06244800   0.15935467  0.03141333
Sepal.Width    0.06244800  0.09630933   0.03326933  0.03222400
Petal.Length   0.15935467  0.03326933   0.18236800  0.04091733
Petal.Width    0.03141333  0.03222400   0.04091733  0.03891200

Class = versicolor

Mixing probabilities: 1 

Means:
              [,1]
Sepal.Length 5.992
Sepal.Width  2.776
Petal.Length 4.308
Petal.Width  1.352

Variances:
[,,1]
             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length   0.26418133  0.06244800   0.15935467  0.03141333
Sepal.Width    0.06244800  0.09630933   0.03326933  0.03222400
Petal.Length   0.15935467  0.03326933   0.18236800  0.04091733
Petal.Width    0.03141333  0.03222400   0.04091733  0.03891200

Class = virginica

Mixing probabilities: 1 

Means:
              [,1]
Sepal.Length 6.504
Sepal.Width  2.936
Petal.Length 5.564
Petal.Width  2.076

Variances:
[,,1]
             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length   0.26418133  0.06244800   0.15935467  0.03141333
Sepal.Width    0.06244800  0.09630933   0.03326933  0.03222400
Petal.Length