MclustDR: Dimension reduction for model-based clustering and...


View source: R/mclustdr.R

Description

A dimension reduction method for visualizing the clustering or classification structure obtained from a finite mixture of Gaussian densities.

Usage

MclustDR(object, normalized = TRUE, Sigma, lambda = 0.5, 
         tol = sqrt(.Machine$double.eps))

Arguments

object

An object of class 'Mclust' or 'MclustDA' resulting from a call to, respectively, Mclust or MclustDA.

normalized

Logical. If TRUE, directions are normalized to unit norm.

Sigma

Marginal covariance matrix of the data. If not provided, it is estimated by the MLE from the observed data.

lambda

A tuning parameter in the range [0,1] described in Scrucca (2014). The default, 0.5, gives equal importance to differences in means and covariances among clusters/classes. To recover the directions that best separate the estimated clusters or classes, set this parameter to 1.

tol

A tolerance value.
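
As a hedged illustration of the arguments above, the sketch below calls MclustDR with Sigma, normalized and lambda set explicitly. It assumes the mclust package is installed and uses its diabetes data; the names X, Sigma_mle, dr_default and dr_explicit are illustrative only. Sigma_mle is built as the maximum likelihood covariance estimate (divisor n rather than n - 1), following the description of the Sigma argument.

```r
library(mclust)

data(diabetes)
X <- diabetes[, -1]                 # drop the class column
mod <- Mclust(X)

n <- nrow(X)
Sigma_mle <- cov(X) * (n - 1) / n   # MLE of the marginal covariance (divisor n)

dr_default  <- MclustDR(mod)        # Sigma estimated internally
dr_explicit <- MclustDR(mod, normalized = TRUE, Sigma = Sigma_mle, lambda = 0.5)

# Compare the leading eigenvalues of the two fits
print(head(dr_default$evalues))
print(head(dr_explicit$evalues))
```

Note that the default value of lambda may differ between package versions, so the two fits are only expected to agree when the defaults match the values passed explicitly.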

Details

The method reduces dimensionality by identifying a set of linear combinations of the original features that capture most of the clustering or classification structure contained in the data. The linear combinations are ordered by importance, as quantified by the associated eigenvalues.

Information on the dimension reduction subspace is obtained from the variation in group means and, depending on the estimated mixture model, from the variation in group covariances (see Scrucca, 2010).

Observations may then be projected onto such a reduced subspace, thus providing summary plots which help to visualize the underlying structure.
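
The idea can be sketched in base R. The example below is a simplified illustration, not the code used by MclustDR: it assumes the groups are known (the Species column of the built-in iris data), builds a kernel matrix from the variation in group means only, and ignores the variation in group covariances. All object names (M, S_inv_sqrt, basis, and so on) are chosen for this sketch.

```r
# Base-R illustration only; this is not the code used by MclustDR.
X <- as.matrix(iris[, 1:4])
g <- iris$Species                        # groups assumed known, for simplicity
n <- nrow(X)

Xc    <- scale(X, scale = FALSE)         # centred data
Sigma <- crossprod(Xc) / n               # MLE of the marginal covariance

pro  <- as.numeric(table(g)) / n         # group proportions
mu_k <- rowsum(X, g) / as.numeric(table(g))   # G x p matrix of group means

D <- sweep(mu_k, 2, colMeans(X))         # group means minus overall mean
M <- crossprod(D * sqrt(pro))            # kernel: sum_k pro_k * d_k d_k'

# Generalized eigenproblem M v = l Sigma v, solved via Sigma^(-1/2)
es <- eigen(Sigma, symmetric = TRUE)
S_inv_sqrt <- es$vectors %*% diag(1 / sqrt(es$values)) %*% t(es$vectors)
ed <- eigen(S_inv_sqrt %*% M %*% S_inv_sqrt, symmetric = TRUE)

# Keep directions with non-negligible eigenvalues, ordered by importance
numdir  <- sum(ed$values > sqrt(.Machine$double.eps))
evalues <- ed$values[seq_len(numdir)]
basis   <- S_inv_sqrt %*% ed$vectors[, seq_len(numdir), drop = FALSE]
basis   <- sweep(basis, 2, sqrt(colSums(basis^2)), "/")   # unit-norm directions
rownames(basis) <- colnames(X)

# Project the observations onto the reduced subspace and plot them
dir <- Xc %*% basis
plot(dir, col = as.integer(g), xlab = "Dir1", ylab = "Dir2")
```

With three groups the means-only kernel has rank at most two, so this sketch retains at most two directions. A kernel that also accounts for differences in group covariances can yield more.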

The method has been extended to the supervised case, i.e. when the true classification is known (see Scrucca, 2014).

This implementation doesn't provide a formal procedure for the selection of dimensionality. A future release will include one or more methods.

Value

An object of class 'MclustDR' with the following components:

call

The matched call

type

A character string specifying the type of model for which the dimension reduction is computed. Currently, possible values are "Mclust" for clustering, and "MclustDA" or "EDDA" for classification.

x

The data matrix.

Sigma

The covariance matrix of the data.

mixcomp

A numeric vector specifying the mixture component of each data observation.

class

A factor specifying the classification of each data observation. For model-based clustering this is equivalent to the corresponding mixture component. For model-based classification this is the known classification.

G

The number of mixture components.

modelName

The name of the parameterization of the estimated mixture model(s). See mclustModelNames.

mu

A matrix of means for each mixture component.

sigma

An array of covariance matrices for each mixture component.

pro

The estimated prior probability (mixing proportion) of each mixture component.

M

The kernel matrix.

lambda

The tuning parameter.

evalues

The eigenvalues from the generalized eigen-decomposition of the kernel matrix.

raw.evectors

The raw eigenvectors from the generalized eigen-decomposition of the kernel matrix, ordered according to the eigenvalues.

basis

The basis of the estimated dimension reduction subspace.

std.basis

The basis of the estimated dimension reduction subspace standardized to variables having unit standard deviation.

numdir

The dimension of the projection subspace.

dir

The estimated directions, i.e. the data projected onto the estimated dimension reduction subspace.
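
A short sketch of how the components listed above might be inspected, assuming the mclust package is installed. The component names are those documented on this page; the helper variables k and cum_pct are illustrative. The cumulative percentage is computed here from the retained eigenvalues, which is consistent with the "Cum. %" row printed by summary in the example output, though that is an observation from the output rather than a documented formula.

```r
library(mclust)

data(diabetes)
mod <- Mclust(diabetes[, -1])
dr  <- MclustDR(mod)

k <- seq_len(dr$numdir)             # indices of the retained directions
dim(dr$basis)                       # features x directions
dim(dr$dir)                         # observations x directions

# Cumulative percentage of the retained eigenvalues
cum_pct <- cumsum(dr$evalues[k]) / sum(dr$evalues[k]) * 100
print(round(cum_pct, 2))

# Scatterplot of the first two directions, coloured by estimated cluster
plot(dr$dir[, 1:2], col = mod$classification)
```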

Author(s)

Luca Scrucca

References

Scrucca, L. (2010) Dimension reduction for model-based clustering. Statistics and Computing, 20(4), pp. 471-484.

Scrucca, L. (2014) Graphical tools for model-based mixture discriminant analysis. Advances in Data Analysis and Classification, 8(2), pp. 147-165.

See Also

summary.MclustDR, plot.MclustDR, Mclust, MclustDA.

Examples

library(mclust)

# clustering
data(diabetes)
mod <- Mclust(diabetes[,-1])
summary(mod)

dr <- MclustDR(mod)
summary(dr)
plot(dr, what = "scatterplot")
plot(dr, what = "evalues")

# adjust the tuning parameter to show the most separating directions
dr1 <- MclustDR(mod, lambda = 1) 
summary(dr1)
plot(dr1, what = "scatterplot")
plot(dr1, what = "evalues")

# classification
data(banknote)

da <- MclustDA(banknote[,2:7], banknote$Status, modelType = "EDDA")
dr <- MclustDR(da)
summary(dr)

da <- MclustDA(banknote[,2:7], banknote$Status)
dr <- MclustDR(da)
summary(dr)

Example output

Package 'mclust' version 5.3
Type 'citation("mclust")' for citing this R package in publications.
----------------------------------------------------
Gaussian finite mixture model fitted by EM algorithm 
----------------------------------------------------

Mclust VVV (ellipsoidal, varying volume, shape, and orientation) model with 3 components:

 log.likelihood   n df       BIC       ICL
      -2307.883 145 29 -4760.091 -4776.086

Clustering table:
 1  2  3 
82 33 30 
-----------------------------------------------------------------
Dimension reduction for model-based clustering and classification 
-----------------------------------------------------------------

Mixture model type: Mclust (VVV, 3)
        
Clusters  n
       1 82
       2 33
       3 30

Estimated basis vectors:
             Dir1     Dir2       Dir3
glucose -0.986054  0.24922  0.9588647
insulin  0.157645 -0.11513 -0.2837395
sspg    -0.053353 -0.96158 -0.0083946

               Dir1     Dir2      Dir3
Eigenvalues  1.3749  0.77725   0.65829
Cum. %      48.9207 76.57662 100.00000
-----------------------------------------------------------------
Dimension reduction for model-based clustering and classification 
-----------------------------------------------------------------

Mixture model type: Mclust (VVV, 3)
        
Clusters  n
       1 82
       2 33
       3 30

Estimated basis vectors:
            Dir1     Dir2
glucose  0.81116  0.92578
insulin -0.56210 -0.19371
sspg    -0.16147 -0.32467

               Dir1     Dir2
Eigenvalues  1.0574   0.3968
Cum. %      72.7144 100.0000
-----------------------------------------------------------------
Dimension reduction for model-based clustering and classification 
-----------------------------------------------------------------

Mixture model type: EDDA 
             
Classes         n Model G
  counterfeit 100   EVE 1
  genuine     100   EVE 1

Estimated basis vectors:
              Dir1     Dir2       Dir3       Dir4      Dir5      Dir6
Length   -0.020745 -0.36142  0.0011790  0.1379150 -0.280588  0.810278
Left     -0.242628 -0.20988 -0.8770222 -0.1895457  0.306472 -0.309510
Right     0.310751 -0.22412  0.4663900  0.2107356 -0.893539 -0.466528
Bottom    0.474088  0.13324 -0.0068407 -0.6435361  0.019346  0.105672
Top       0.572797  0.74715 -0.0716714 -0.0094044 -0.075745  0.130430
Diagonal -0.539703  0.44621 -0.0901519 -0.6974348 -0.151066 -0.042735

                Dir1     Dir2     Dir3      Dir4      Dir5       Dir6
Eigenvalues  0.86813  0.29011  0.12988  0.081802  0.027126 2.0193e-03
Cum. %      62.05056 82.78657 92.06996 97.916830 99.855667 1.0000e+02
-----------------------------------------------------------------
Dimension reduction for model-based clustering and classification 
-----------------------------------------------------------------

Mixture model type: MclustDA 
             
Classes         n Model G
  counterfeit 100   EVE 2
  genuine     100   XXX 1

Estimated basis vectors:
             Dir1      Dir2     Dir3      Dir4      Dir5      Dir6
Length   -0.10027 -0.327553  0.79718 -0.033721 -0.317043  0.084618
Left     -0.21760 -0.305350 -0.30266 -0.893676  0.371043 -0.565611
Right     0.29180 -0.018877 -0.49600  0.406605 -0.861020  0.481331
Bottom    0.57603  0.445501  0.12002 -0.034570  0.004359 -0.078688
Top       0.57555  0.385645  0.10093 -0.103629  0.136005  0.625416
Diagonal -0.44088  0.672251 -0.04781 -0.151473 -0.044035  0.209542

                Dir1     Dir2     Dir3     Dir4      Dir5       Dir6
Eigenvalues  0.87241  0.55372  0.48603  0.13301  0.053113   0.027239
Cum. %      41.04429 67.09530 89.96182 96.21965 98.718473 100.000000

mclust documentation built on Nov. 17, 2018, 5:04 p.m.