# em.cluster.R: Compute estimates of the parameters by Expectation and... In ClustMMDD: Variable Selection in Clustering by Mixture Models for Discrete Data

## Description

Compute an approximation of the maximum likelihood estimates of the parameters using the Expectation-Maximization (EM) algorithm. A maximum a posteriori (MAP) classification is then derived from the estimated parameters.
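To make the two steps concrete, here is a minimal base R sketch of EM for a mixture of independent categorical variables, followed by the MAP classification. It is an illustration only: the function `em_sketch` and its arguments are hypothetical, it assumes `ploidy = 1`, treats every variable as a clustering variable, and uses a single random start rather than the small-EM strategy controlled by `emOptions`.

```r
# Minimal EM for a mixture of independent categorical variables (ploidy = 1).
# em_sketch and its arguments are illustrative names, not ClustMMDD functions.
em_sketch <- function(x, K, n_iter = 50, seed = 1) {
  set.seed(seed)
  x <- as.data.frame(lapply(as.data.frame(x, stringsAsFactors = FALSE), factor))
  N <- nrow(x)
  # Random soft assignment to start: N x K matrix whose rows sum to 1
  tik <- matrix(runif(N * K), N, K)
  tik <- tik / rowSums(tik)
  for (it in seq_len(n_iter)) {
    # M-step: mixing proportions and per-cluster level probabilities
    pi_k <- colMeans(tik)
    prob <- lapply(x, function(v) {
      ind <- outer(as.integer(v), seq_len(nlevels(v)), "==") * 1  # N x levels
      p <- t(ind) %*% tik                                         # levels x K
      sweep(p, 2, colSums(p), "/")
    })
    # E-step: posterior membership probabilities, computed on the log scale
    log_dens <- matrix(log(pi_k), N, K, byrow = TRUE)
    for (j in seq_along(x)) {
      log_dens <- log_dens +
        log(prob[[j]][as.integer(x[[j]]), , drop = FALSE] + 1e-300)
    }
    m <- apply(log_dens, 1, max)
    lik <- exp(log_dens - m)
    tik <- lik / rowSums(lik)
  }
  list(pi_K = pi_k, prob = prob, Tik = tik,
       logLik = sum(m + log(rowSums(lik))),
       mapClassif = max.col(tik, ties.method = "first"))  # MAP classification
}

# Toy data, made up for the example
set.seed(42)
toy <- data.frame(v1 = sample(c("a", "b", "c"), 200, replace = TRUE),
                  v2 = sample(c("x", "y"), 200, replace = TRUE))
fit <- em_sketch(toy, K = 2)
fit$pi_K
table(fit$mapClassif)
```

The E-step works on the log scale and subtracts the row maximum before exponentiating, which avoids underflow when many variables are multiplied together.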

## Usage

    em.cluster.R(xdata, K, S, ploidy = 1,
                 emOptions = list(epsi = NULL, typeSmallEM = NULL, typeEM = NULL,
                                  nberSmallEM = NULL, nberIterations = NULL,
                                  nberMaxIterations = NULL, putThreshold = NULL),
                 cte = 1)

## Arguments

• xdata : A matrix of strings with the number of columns equal to ploidy * (number of variables).

• K : The number of clusters (or populations).

• S : The subset of clustering variables, given as a vector of logicals indicating the selected variables. S gathers the variables that are not identically distributed in at least two clusters.

• ploidy : The number of unordered observations represented by a string in xdata. For example, for genotypic data from diploid individuals, ploidy = 2.

• emOptions : A list of EM options (see EmOptions and setEmOptions).

• cte : A double used as the value of λ in the penalty function pen(K, S) = λ * dim(K, S), where dim(K, S) is the number of free parameters in the model defined by (K, S).
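As an illustration of dim(K, S) and the penalty, the sketch below counts free parameters under one plausible rule: K - 1 mixing proportions, one distribution per cluster for each selected variable, and one shared distribution for each unselected variable. The function name `dim_KS` and the counting rule are assumptions, not part of the package interface. The rule does reproduce the value `dim = 382` printed in the example output for K = 5, eight selected variables, and ten levels per variable.

```r
# Free-parameter count for a model (K, S). dim_KS and its counting rule are
# assumptions made for illustration, not ClustMMDD's exported API.
dim_KS <- function(K, S, n_levels) {
  stopifnot(length(S) == length(n_levels))
  mixing   <- K - 1                      # mixing proportions sum to 1
  selected <- K * sum(n_levels[S] - 1)   # one distribution per cluster
  others   <- sum(n_levels[!S] - 1)      # one distribution shared by all clusters
  mixing + selected + others
}

S <- c(rep(TRUE, 8), rep(FALSE, 2))
d <- dim_KS(K = 5, S = S, n_levels = rep(10, 10))
d  # 4 + 5 * 8 * 9 + 2 * 9 = 382

# Penalty pen(K, S) = lambda * dim(K, S), with lambda supplied as cte
pen <- function(dim, cte = 1) cte * dim
pen(d, cte = 1)
```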

## Value

A list of

• N : The size (number of lines) of the dataset.

• K : The number of clusters (populations).

• S : A vector of logicals indicating the selected variables for clustering.

• dim : The number of free parameters.

• pi_K : The vector of mixing proportions.

• prob : A list of matrices, each matrix being the probabilities of a variable in different clusters.

• logLik : The log-likelihood.

• entropy : The entropy.

• criteria : Criteria values c(BIC, AIC, ICL, CteDim).

• Tik : A stochastic matrix giving the a posteriori membership probabilities.

• mapClassif : Maximum a posteriori classification.

• NbersLevels : The numbers of observed levels of the considered categorical variables.

• levels : The observed levels.
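The criteria can be related to logLik, dim, N, and entropy as in the sketch below. The conventions used here are assumptions: each criterion is written as a penalized negative log-likelihood, -logLik + pen, with pen equal to 0.5 * log(N) * dim for BIC, dim for AIC, the BIC penalty plus the entropy for ICL, and cte * dim for CteDim. The package may use a different sign or scaling, so treat the numbers as illustrative. The inputs are the values printed in the example output.

```r
# Penalized-likelihood criteria under assumed conventions (see the text above).
# criteria_sketch is a hypothetical helper, not a ClustMMDD function.
criteria_sketch <- function(logLik, dim, N, entropy, cte = 1) {
  pen <- c(BIC    = 0.5 * log(N) * dim,
           AIC    = dim,
           ICL    = 0.5 * log(N) * dim + entropy,
           CteDim = cte * dim)
  -logLik + pen
}

# Values taken from the example output on this page
crit <- criteria_sketch(logLik = -37843.62, dim = 382, N = 1000,
                        entropy = 177.7498)
crit
names(which.min(crit))  # criterion with the smallest value for this one model
```

In model selection, each criterion is compared across candidate models (K, S), not across criteria for a single model; the last line only shows how to query the vector.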

## Author(s)

Wilson Toussile

## See Also

dataR2C for transforming a classic data frame; backward.explorer, selectK.R, dimJump.R, and model.selection.R for both model selection and classification.

## Examples

    data(genotype1)
    head(genotype1)
    genotype2 = cutEachCol(genotype1[, -11], ploidy = 2)
    head(genotype2)

    # See the EM options
    EmOptions()
    # Options can be set with setEmOptions()

    par5 = em.cluster.R(genotype2, K = 5, S = c(rep(TRUE, 8), rep(FALSE, 2)), ploidy = 2)
    slotNames(par5)
    head(par5["membershipProba"])
    par5["mixingProportions"]
    par5
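Judging from the example output, the `cutEachCol` step turns each genotype string such as `109103` into `ploidy` allele strings (`"109"`, `"103"`), so that xdata has ploidy * (number of variables) columns. The base R sketch below mimics that layout. The helper `split_alleles` is hypothetical and assumes every string has the same width and splits evenly; it is not the package's implementation.

```r
# Split fixed-width genotype strings into one column per allele.
# split_alleles is a hypothetical helper, not ClustMMDD's cutEachCol.
split_alleles <- function(x, ploidy = 2) {
  x <- as.matrix(x)
  storage.mode(x) <- "character"
  width <- nchar(x[1, 1]) / ploidy      # assumes equal-width allele codes
  out <- matrix(NA_character_, nrow(x), ncol(x) * ploidy)
  for (j in seq_len(ncol(x))) {
    for (a in seq_len(ploidy)) {
      out[, (j - 1) * ploidy + a] <-
        substr(x[, j], (a - 1) * width + 1, a * width)
    }
  }
  out
}

# First two loci of the first two rows shown in the example output
g <- data.frame(L1 = c("109103", "107106"), L2 = c("108105", "105105"),
                stringsAsFactors = FALSE)
alleles <- split_alleles(g, ploidy = 2)
alleles
```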

### Example output

Loading required package: Rcpp

ClustMMDD = Clustering by Mixture Models for Discrete Data.

Version 1.0.4

ClustMMDD is the R version of the stand alone c++ package named 'MixMoGenD'

that is available on www.u-psud.fr/math/~toussile.

      L1     L2     L3     L4     L5     L6     L7     L8     L9    L10 X1
1 109103 108105 107107 109110 107101 110105 101101 102110 109102 105105  3
2 107106 105105 103104 108108 104105 104104 104105 107105 109107 101108  2
3 105103 101108 110108 106103 101106 107103 105106 108103 103109 105109  1
4 101107 107107 108101 102105 107110 110101 107109 103110 108105 108105  2
5 106107 110105 103102 109101 103103 109101 110110 101109 102104 103103  2
6 106109 108103 102106 105109 104107 103105 109101 110107 105104 103110  3
     [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9]  [,10] [,11] [,12]
[1,] "109" "103" "108" "105" "107" "107" "109" "110" "107" "101" "110" "105"
[2,] "107" "106" "105" "105" "103" "104" "108" "108" "104" "105" "104" "104"
[3,] "105" "103" "101" "108" "110" "108" "106" "103" "101" "106" "107" "103"
[4,] "101" "107" "107" "107" "108" "101" "102" "105" "107" "110" "110" "101"
[5,] "106" "107" "110" "105" "103" "102" "109" "101" "103" "103" "109" "101"
[6,] "106" "109" "108" "103" "102" "106" "105" "109" "104" "107" "103" "105"
     [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
[1,] "101" "101" "102" "110" "109" "102" "105" "105"
[2,] "104" "105" "107" "105" "109" "107" "101" "108"
[3,] "105" "106" "108" "103" "103" "109" "105" "109"
[4,] "107" "109" "103" "110" "108" "105" "108" "105"
[5,] "110" "110" "101" "109" "102" "104" "103" "103"
[6,] "109" "101" "110" "107" "105" "104" "103" "110"
$epsi
[1] 1e-08

$nberSmallEM
[1] 20

$nberIterations
[1] 15

$typeSmallEM
[1] 0

$typeEM
[1] 0

$nberMaxIterations
[1] 5000

$putThreshold
[1] FALSE

... Running 20 small EM with 15 iterations each...
... Runing a maximum of 5000 long run of EM...
> Number of iterations = 16
***  End modelKS:validity method
[1] "N"                 "P"                 "N_levels"
[4] "levels"            "K"                 "S"
[7] "dim"               "mixingProportions" "count"
[10] "frequencies"       "proba"             "logLik"
[13] "entropy"           "membershipProba"   "mapClassification"
             [,1]         [,2]         [,3]         [,4]         [,5]
[1,] 4.265713e-34 0.9822721138 5.158822e-06 5.495124e-04 1.717321e-02
[2,] 1.337790e-01 0.0002482544 1.074987e-03 5.735069e-10 8.648977e-01
[3,] 1.607593e-02 0.0002432274 9.836766e-01 2.487085e-07 4.021143e-06
[4,] 2.305048e-24 0.0002662826 1.758410e-04 1.244906e-13 9.995579e-01
[5,] 1.903202e-04 0.0678531613 1.952116e-08 2.495877e-19 9.319565e-01
[6,] 3.744519e-20 0.9741822221 2.887787e-16 1.055541e-03 2.476224e-02
[1] 0.1956215 0.2059082 0.2094024 0.1876291 0.2014388
** Print a set of paramters of modelKS class **

Size of the dataset N =  1000
Number of variables P =  10
The numbers of observed levels N_levels =  10 10 10 10 10 10 10 10 10 10

** Model (K, S) :

Number of clusters K =  5
Clustering variables S =  TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE

** Mixing proportions :
Mixing proportions mixingProportions =  0.1956215 0.2059082 0.2094024 0.1876291 0.2014388

Probabilities in clusters :
[[1]]
Cluster1   Cluster2     Cluster3     Cluster4    Cluster5
101 1.118993e-01 0.11819258 3.113672e-01 5.950570e-02 0.143503876
102 2.035490e-01 0.09901159 1.271712e-01 5.560355e-02 0.008594933
103 2.396567e-01 0.13456966 1.659102e-01 1.103218e-01 0.079266856
104 4.877754e-02 0.03571355 1.171100e-01 6.377054e-02 0.117236790
105 1.578926e-01 0.01507121 2.861614e-02 2.637530e-02 0.074804126
106 4.436663e-02 0.10301908 5.437418e-02 2.076202e-01 0.078270844
107 1.276362e-01 0.10120333 7.390013e-02 1.123897e-10 0.212065123
108 6.622201e-02 0.04376357 1.509668e-16 3.570493e-01 0.255866428
109 1.511519e-18 0.34945542 9.784385e-02 6.937672e-02 0.015083718
110 3.176776e-16 0.00000000 2.370717e-02 5.037692e-02 0.015307306

[[2]]
Cluster1     Cluster2     Cluster3   Cluster4     Cluster5
101 0.136743419 2.142604e-03 4.076637e-01 0.13338408 9.887069e-02
102 0.151036915 9.334898e-02 3.288912e-02 0.06980242 2.146984e-01
103 0.077973892 8.950867e-02 1.494533e-02 0.13656339 5.492092e-02
104 0.325055903 1.008432e-10 1.429468e-01 0.12072796 1.133177e-01
105 0.092750522 9.447719e-02 9.649595e-02 0.09311839 1.698986e-01
106 0.062414279 1.223891e-01 5.378868e-23 0.06443292 3.777350e-08
107 0.051030251 4.774826e-02 7.243411e-02 0.08445300 8.524630e-02
108 0.054735689 2.040962e-01 1.457243e-01 0.04619272 1.592808e-01
109 0.006396985 9.254936e-02 6.913505e-02 0.19136503 3.628520e-02
110 0.041862144 2.537396e-01 1.776570e-02 0.05996008 6.748136e-02

[[3]]
Cluster1   Cluster2     Cluster3    Cluster4   Cluster5
101 0.04931904 0.08967035 2.276466e-01 0.057366646 0.16856180
102 0.03019754 0.20285507 1.043959e-01 0.111716579 0.11914862
103 0.06963899 0.09533378 5.002053e-02 0.147389724 0.05519338
104 0.04097297 0.15254335 9.395891e-02 0.149071667 0.05922136
105 0.23769488 0.03736204 9.084036e-02 0.076081690 0.02487748
106 0.15835554 0.02201261 3.289370e-11 0.003668069 0.05610355
107 0.15109185 0.08577592 3.923053e-02 0.054267037 0.10863932
108 0.09635051 0.08415454 1.837191e-01 0.018412806 0.31224200
109 0.11582790 0.09454037 3.651869e-02 0.109038414 0.08572853
110 0.05055077 0.13575197 1.736694e-01 0.272987369 0.01028397

[[4]]
Cluster1   Cluster2     Cluster3     Cluster4   Cluster5
101 0.08697599 0.04904831 1.346658e-01 2.053683e-08 0.12503465
102 0.14274392 0.09150015 1.770798e-02 1.079444e-01 0.03611017
103 0.04673491 0.14623664 3.766946e-02 2.595526e-01 0.02312675
104 0.11819643 0.09238972 4.338140e-14 4.751440e-02 0.05182393
105 0.12221024 0.10859380 1.514524e-02 1.397048e-01 0.29462327
106 0.07832502 0.04817275 2.960961e-01 8.912853e-06 0.01863534
107 0.10442206 0.06287882 1.123669e-01 1.287472e-01 0.09898253
108 0.19901902 0.03612580 1.480237e-01 6.188216e-02 0.20364288
109 0.07943731 0.13631921 9.106178e-06 2.263359e-01 0.12123784
110 0.02193511 0.22873481 2.383157e-01 2.830967e-02 0.02678265

[[5]]
Cluster1   Cluster2   Cluster3   Cluster4    Cluster5
101 9.595310e-02 0.14360245 0.18476488 0.04522558 0.057013591
102 7.018035e-02 0.04271829 0.05255368 0.06579984 0.075081543
103 1.339888e-01 0.04373855 0.07954699 0.06418894 0.211387956
104 1.490179e-01 0.11106193 0.15724800 0.07327882 0.090861291
105 2.204577e-01 0.02338597 0.03236026 0.10651135 0.080905072
106 1.069730e-01 0.12380590 0.10014010 0.17890102 0.007667402
107 6.595882e-02 0.27198930 0.07443125 0.05775581 0.229198424
108 1.372767e-01 0.14298111 0.20025950 0.21394114 0.092476432
109 2.019358e-02 0.07206992 0.04259808 0.08359567 0.067538099
110 3.443309e-08 0.02464657 0.07609725 0.11080184 0.087870189

[[6]]
Cluster1    Cluster2   Cluster3   Cluster4     Cluster5
101 8.178921e-02 0.126337228 0.03254637 0.03299787 1.115959e-01
102 1.788117e-01 0.004415949 0.16761533 0.20792214 1.534604e-08
103 1.961208e-01 0.192017393 0.21144974 0.10906584 8.863533e-02
104 8.876196e-02 0.017201149 0.01527874 0.14046550 1.416786e-01
105 7.379692e-02 0.179427719 0.18716408 0.09742538 9.752670e-02
106 1.582874e-01 0.115744251 0.03977677 0.05773491 1.615418e-01
107 7.743167e-02 0.200964027 0.16425775 0.08825492 5.469358e-02
108 1.150048e-01 0.032881160 0.09069674 0.14484582 5.243618e-02
109 2.999552e-02 0.098237104 0.06191584 0.05699568 1.774486e-01
110 3.087808e-16 0.032774021 0.02929866 0.06429194 1.144432e-01

[[7]]
Cluster1   Cluster2   Cluster3   Cluster4   Cluster5
101 0.09554544 0.08269062 0.14128342 0.12787743 0.17724473
102 0.11502397 0.11997959 0.08588237 0.11572217 0.05012559
103 0.14078655 0.11120320 0.04167430 0.07318986 0.09731107
104 0.07071791 0.09121617 0.07316286 0.08941852 0.07340147
105 0.07468915 0.05926619 0.09579568 0.07936268 0.08307838
106 0.11121807 0.11538072 0.11743463 0.15293522 0.07297167
107 0.06231957 0.05342109 0.09872543 0.14923916 0.11484460
108 0.10582302 0.10848786 0.05669696 0.06939014 0.05494529
109 0.14053292 0.12735693 0.16851021 0.08273637 0.11901735
110 0.08334340 0.13099763 0.12083414 0.06012846 0.15705984

[[8]]
Cluster1   Cluster2   Cluster3   Cluster4   Cluster5
101 0.06742645 0.15817520 0.05320853 0.05531489 0.14009064
102 0.12874656 0.11386075 0.08098710 0.08476688 0.04719011
103 0.16839270 0.10164922 0.10134770 0.09753639 0.13952246
104 0.10075154 0.09048680 0.06390967 0.07474381 0.10550056
105 0.09751164 0.09767745 0.08567328 0.15653544 0.08191692
106 0.04165078 0.07942041 0.08377406 0.16923476 0.05561512
107 0.06391834 0.11539593 0.12813849 0.07199876 0.06152569
108 0.13804903 0.06418548 0.11813929 0.12755343 0.15938804
109 0.11098163 0.11893656 0.16036825 0.06470954 0.05995225
110 0.08257134 0.06021220 0.12445363 0.09760610 0.14929821

[[9]]
Cluster1 Cluster2 Cluster3 Cluster4 Cluster5
101   0.0995   0.0995   0.0995   0.0995   0.0995
102   0.1190   0.1190   0.1190   0.1190   0.1190
103   0.1140   0.1140   0.1140   0.1140   0.1140
104   0.0875   0.0875   0.0875   0.0875   0.0875
105   0.1235   0.1235   0.1235   0.1235   0.1235
106   0.1255   0.1255   0.1255   0.1255   0.1255
107   0.0645   0.0645   0.0645   0.0645   0.0645
108   0.0905   0.0905   0.0905   0.0905   0.0905
109   0.0710   0.0710   0.0710   0.0710   0.0710
110   0.1050   0.1050   0.1050   0.1050   0.1050

[[10]]
Cluster1 Cluster2 Cluster3 Cluster4 Cluster5
101   0.0470   0.0470   0.0470   0.0470   0.0470
102   0.0930   0.0930   0.0930   0.0930   0.0930
103   0.0915   0.0915   0.0915   0.0915   0.0915
104   0.0990   0.0990   0.0990   0.0990   0.0990
105   0.1165   0.1165   0.1165   0.1165   0.1165
106   0.1105   0.1105   0.1105   0.1105   0.1105
107   0.0580   0.0580   0.0580   0.0580   0.0580
108   0.1330   0.1330   0.1330   0.1330   0.1330
109   0.1395   0.1395   0.1395   0.1395   0.1395
110   0.1120   0.1120   0.1120   0.1120   0.1120

Number of free parameters dim =  382
Log likelihood =  -37843.62
Entropy =  177.7498

***  End modelKS:show method ***
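
The MAP classification follows directly from the membership probabilities: each observation is assigned to the cluster with the largest posterior probability. The sketch below applies that rule to the six rows of `membershipProba` printed above, using base R only; the object names are illustrative.

```r
# Membership probabilities for the first six observations, copied from the
# example output (rows = observations, columns = clusters).
tik <- rbind(
  c(4.265713e-34, 0.9822721138, 5.158822e-06, 5.495124e-04, 1.717321e-02),
  c(1.337790e-01, 0.0002482544, 1.074987e-03, 5.735069e-10, 8.648977e-01),
  c(1.607593e-02, 0.0002432274, 9.836766e-01, 2.487085e-07, 4.021143e-06),
  c(2.305048e-24, 0.0002662826, 1.758410e-04, 1.244906e-13, 9.995579e-01),
  c(1.903202e-04, 0.0678531613, 1.952116e-08, 2.495877e-19, 9.319565e-01),
  c(3.744519e-20, 0.9741822221, 2.887787e-16, 1.055541e-03, 2.476224e-02)
)

# MAP rule: pick the column with the largest probability in each row
map_classif <- max.col(tik, ties.method = "first")
map_classif  # 2 5 3 5 5 2

# Entropy contribution of these six rows: -sum(t_ik * log(t_ik))
entropy_part <- -sum(tik * log(tik))
entropy_part
```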


ClustMMDD documentation built on May 2, 2019, 2:44 p.m.