Cluster analysis based on copula functions
m: a data matrix.
dimset: the set of dimensions for which the function tries the clustering.
noc: sample size of the set for selecting the number of clusters.
copula: a copula model. This should be one of "normal", "t", "frank", "clayton" or "gumbel". See the Details section.
fun: combination function of the pairwise Spearman's rho used to select the k-plets. The default is median.
method.ma: estimation method for margins. See the Details section.
method.c: estimation method for copula. See
dfree: degrees of freedom for the t copula.
writeout: writes a message on the number of allocated observations every writeout observations.
penalty: specifies the likelihood criterion used for selecting the number of clusters.
...: further parameters for
CoClust(m, dimset = 2:5, noc = 4, copula = "frank", fun = median, method.ma = c("gaussian", "empirical"), method.c = "mpl", penalty = "BICk", ...)
CoClust is a clustering algorithm based on copula functions. It groups observations according to the multivariate dependence structure of the generating process, without any assumptions on the margins.
For each k in dimset, the algorithm builds a sample of noc observations (rows of the data matrix m) by using the matrix of Spearman's rho correlation coefficients, which are combined by means of the function fun (median by default).
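As a rough illustration of how pairwise Spearman's rho values can be combined by a function such as median, the sketch below uses base R only. The toy matrix x and the helper combine_rho are made up for this example and are not part of the CoClust API; the selection procedure inside the package may differ in its details.

```r
# Toy data matrix: 4 rows (the observations to be clustered), 6 columns.
x <- rbind(
  c(1, 2, 3, 4, 5, 6),
  c(2, 1, 4, 3, 6, 5),
  c(6, 5, 4, 3, 2, 1),
  c(1, 3, 2, 5, 4, 6)
)

# Spearman's rho between every pair of rows
# (cor() correlates columns, hence the transpose).
rho <- cor(t(x), method = "spearman")

# Hypothetical helper: combine the pairwise rho values of a candidate
# k-plet of rows with a user-supplied function (median by default).
combine_rho <- function(rho, rows, fun = median) {
  sub <- rho[rows, rows]
  fun(sub[upper.tri(sub)])
}

combine_rho(rho, c(1, 2, 4))  # rows that move together
combine_rho(rho, c(1, 2, 3))  # row 3 is row 1 reversed
```

A k-plet whose rows are strongly concordant gets a combined value close to 1, whereas including a discordant row pulls the combined value down.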
The number of clusters K is selected by means of a criterion based on the likelihood of the copula fit. The switch penalty selects one of three criteria; the choice LL corresponds to using the likelihood without penalty terms. Then, the remaining observations are allocated to the clusters as follows:
1. select a K-plet of observations on the basis of fun applied to the pairwise Spearman's rho;
2. allocate or discard the K-plet on the basis of the likelihood of the copula fit.
The estimation approach for the copula fit is semiparametric: a range of nonparametric margins and parametric copula models can be selected by the user. The CoClust algorithm requires neither the number of clusters to be set a priori nor a starting classification.
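To illustrate why a rank-based treatment of the margins needs no distributional assumption, the sketch below computes pseudo-observations in base R. The helper pseudo_obs is a name chosen for this example, and the scaling by n + 1 is a common convention assumed here rather than taken from the CoClust source.

```r
# Hypothetical helper: map each column of a matrix to (0, 1) through its
# ranks, scaled by n + 1 so that values stay strictly inside the interval.
pseudo_obs <- function(x) {
  x <- as.matrix(x)
  apply(x, 2, rank) / (nrow(x) + 1)
}

set.seed(1)
z <- matrix(rnorm(60), ncol = 3)        # 20 observations, 3 margins

u1 <- pseudo_obs(z)                     # margins on the original scale
u2 <- pseudo_obs(cbind(exp(z[, 1]),     # strictly increasing transform
                       z[, 2]^3,        # of each margin
                       5 * z[, 3] + 2))

range(u1)                               # strictly between 0 and 1
isTRUE(all.equal(u1, u2, check.attributes = FALSE))
```

Because ranks are unchanged by strictly increasing transformations, the two sets of pseudo-observations coincide, so a parametric copula fitted to them would not depend on the marginal distributions.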
Note that the dependence structure for the Gaussian and the t copula is set to exchangeable. Unstructured dependence structures will be allowed in a future version.
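An exchangeable structure means that a single parameter governs every pair of margins. For the Gaussian or the t copula this corresponds to a correlation matrix with one common off-diagonal value, which the base R sketch below builds. The helper exch_corr is illustrative only and is not a CoClust or copula function.

```r
# Hypothetical helper: d x d correlation matrix with a common
# off-diagonal correlation rho (exchangeable structure).
exch_corr <- function(rho, d) {
  # Positive definiteness requires -1/(d - 1) < rho < 1.
  stopifnot(d >= 2, rho > -1 / (d - 1), rho < 1)
  R <- matrix(rho, nrow = d, ncol = d)
  diag(R) <- 1
  R
}

R <- exch_corr(0.6, d = 3)
R

# All eigenvalues are positive, so R is a valid correlation matrix.
eigen(R, symmetric = TRUE, only.values = TRUE)$values
```

With d = 3 the exchangeable matrix has one free parameter, whereas an unstructured correlation matrix would have d * (d - 1) / 2 = 3.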
An object of S4 class "CoClust" with the following slots:
Number.of.Clusters: the number K of identified clusters.
Index.Matrix: an n.obs by (K+1) matrix where n.obs is the number of observations put in each cluster. The matrix contains the row indexes of the observations of the data matrix
Data.Clusters: the matrix of the final clustering.
Dependence: a list containing: Copula, Param, Std.Err, P.value.
LogLik: the maximized log-likelihood of the copula fit.
Est.Method: the estimation method used for the copula fit.
Opt.Method: the optimization method used for the copula fit.
LLC: the value of the log-likelihood criterion for each k in dimset.
Index.dimset: a list that, for each k in
The final clustering is composed of K groups in which observations of the same group are independent whereas the observations that belong to different groups and that form a K-plet are dependent.
Francesca Marta Lilja Di Lascio <marta.dilascio@unibz.it>,
Simone Giannerini <simone.giannerini@unibo.it>
Di Lascio, F.M.L. (201x). "CoClust: An R Package for Copula-based Cluster Analysis". To be submitted.
Di Lascio, F.M.L., Durante, F. and Pappada', R. (2017). "Copula-based clustering methods", Copulas and Dependence Models with Applications, p.49-67. Eds Ubeda-Flores, M., de Amo, E., Durante, F. and Fernandez Sanchez, J., Springer International Publishing. ISBN: 978-3-319-64220-8.
Di Lascio, F.M.L. and Disegna, M. (2017). "A copula-based clustering algorithm to analyse EU country diets". Knowledge-Based Systems, 132, p.72-84. DOI: 10.1016/j.knosys.2017.06.004.
Di Lascio, F.M.L. and Giannerini, S. (2016). "Clustering dependent observations with copula functions". Statistical Papers, p.1-17. DOI 10.1007/s00362-016-0822-3.
Di Lascio, F.M.L. and Giannerini, S. (2012). "A Copula-Based Algorithm for Discovering Patterns of Dependent Observations", Journal of Classification, 29(1), p.50-75.
Di Lascio, F.M.L. (2008). "Analyzing the dependence structure of microarray data: a copula-based approach". PhD thesis, Dipartimento di Scienze Statistiche, Universita' di Bologna, Italy.
library(CoClust)  # also loads the copula package (frankCopula, mvdc, rMvdc)
## ******************************************************************
## 1. builds a 3-variate copula with different margins
## (Gaussian, Gamma, Beta)
##
## 2. generates a data matrix xm with 15 rows and 21 columns and
## builds the matrix of the true cluster indexes
##
## 3. applies the CoClust to the rows of xm and recovers the
## multivariate dependence structure of the data
## ******************************************************************
## Step 1. **********************************************************
n <- 105 # total number of observations
n.col <- 21 # number of columns of the data matrix m
n.marg <- 3 # dimension of the copula
n.row <- n*n.marg/n.col # number of rows of the data matrix m
theta <- 10
copula <- frankCopula(theta, dim = n.marg)
mymvdc <- mvdc(copula, c("norm", "gamma", "beta"),
               list(list(mean = 7, sd = 2),
                    list(shape = 3, rate = 4),
                    list(shape1 = 2, shape2 = 1)))
## Step 2. **********************************************************
set.seed(11)
x.samp <- rMvdc(n, mymvdc)
xm <- matrix(x.samp, nrow = n.row, ncol = n.col, byrow=TRUE)
index.true <- matrix(1:15,5,3)
colnames(index.true) <- c("Cluster 1","Cluster 2", "Cluster 3")
## Step 3. **********************************************************
clust <- CoClust(xm, dimset = 2:4, noc = 2, copula = "frank",
                 method.ma = "empirical", method.c = "ml", writeout = 1)
clust
clust@"Number.of.Clusters"
clust@"Dependence"$Param
clust@"Data.Clusters"
index.clust <- clust@"Index.Matrix"
## compare with index.true
index.clust
index.true
##
Loading required package: copula
Number of clusters selected: 3
Allocated observations: 3
Allocated observations: 4
Allocated observations: 5
An object of class "CoClust"
Slot "Number.of.Clusters":
[1] 3
Slot "Index.Matrix":
Cluster 1 Cluster 2 Cluster 3 LogLik
[1,] 11 1 6 29.59298
[2,] 13 3 8 59.11185
[3,] 12 2 7 87.55890
[4,] 14 4 9 118.68674
[5,] 15 5 10 148.43749
Slot "Data.Clusters":
Cluster 1 Cluster 2 Cluster 3
[1,] 0.3821719 4.724571 0.1722283
[2,] 0.3911910 5.738707 0.1548273
[3,] 0.9923630 11.130341 1.3684076
[4,] 0.6133405 6.711067 0.5572253
[5,] 0.1489788 2.763910 0.3727072
[6,] 0.8220294 9.946755 0.7025062
[7,] 0.7748810 8.849824 0.7472348
[8,] 0.7060588 6.977426 0.6037493
[9,] 0.7420107 6.979874 1.2564989
[10,] 0.8868274 8.121758 0.8366584
[11,] 0.8408346 9.083846 0.8449545
[12,] 0.9265845 7.497380 0.5735631
[13,] 0.7366919 9.083588 0.8049378
[14,] 0.1854497 4.737831 0.1091059
[15,] 0.7962249 7.314255 0.5804155
[16,] 0.5226436 6.609135 0.4938606
[17,] 0.3812399 4.108089 0.3293736
[18,] 0.6755020 7.028995 0.5640552
[19,] 0.3773079 4.926126 0.4780587
[20,] 0.2413054 4.050170 0.2756642
[21,] 0.5509937 5.554924 0.5825189
[22,] 0.5547837 6.950117 0.5624433
[23,] 0.8341096 7.723815 1.2358954
[24,] 0.1245806 5.237230 0.2698407
[25,] 0.8439165 8.633602 1.0305195
[26,] 0.8290546 8.875230 0.7430665
[27,] 0.2653793 3.818821 0.3310576
[28,] 0.1377339 3.847222 0.1881491
[29,] 0.4238974 5.987543 0.4669754
[30,] 0.7442955 7.278040 0.6419422
[31,] 0.7690821 6.851152 0.5348509
[32,] 0.5766055 5.569177 0.5740944
[33,] 0.9077906 7.246912 1.6961565
[34,] 0.5948870 5.276076 0.3273010
[35,] 0.3188433 2.466374 0.2048978
[36,] 0.8563399 8.215923 1.0216487
[37,] 0.4859458 5.125675 0.4037320
[38,] 0.5840352 5.207848 0.4522179
[39,] 0.4421058 5.275799 0.4154543
[40,] 0.7105057 6.313148 0.5755700
[41,] 0.0623775 3.980188 0.1791108
[42,] 0.4427724 4.347047 0.1617050
[43,] 0.8102775 7.328941 0.8441203
[44,] 0.9520337 9.014100 1.0392218
[45,] 0.2989031 4.227565 0.2305394
[46,] 0.9201125 8.991290 1.3522166
[47,] 0.7760083 7.578709 0.8481472
[48,] 0.5374009 6.116144 0.5733275
[49,] 0.7421856 7.733539 0.7321689
[50,] 0.8619819 8.923534 1.2041696
[51,] 0.7675674 7.438398 1.0839021
[52,] 0.9134812 12.299546 1.2930689
[53,] 0.7558059 7.901530 0.9108837
[54,] 0.3538221 4.070995 0.3053536
[55,] 0.7324060 9.486247 0.7039008
[56,] 0.8761552 9.117614 0.7884164
[57,] 0.7343466 6.498987 0.6840582
[58,] 0.3253134 6.100498 0.5853185
[59,] 0.5710626 6.738952 0.3936210
[60,] 0.5760979 6.354835 0.6985677
[61,] 0.6924759 4.877017 0.5914819
[62,] 0.8750170 8.938650 0.6987183
[63,] 0.4963767 5.154385 0.5267843
[64,] 0.9845955 10.830024 1.6332640
[65,] 0.9130914 6.378303 0.7150011
[66,] 0.6379774 7.991038 0.5805825
[67,] 0.8735105 7.361490 0.7832649
[68,] 0.6784702 7.413119 0.7021784
[69,] 0.9622892 8.632043 1.1318317
[70,] 0.9638345 9.265671 1.4903069
[71,] 0.6172756 6.299267 0.8306458
[72,] 0.8349335 8.792616 0.9115916
[73,] 0.7755230 7.888000 0.7354790
[74,] 0.6527278 6.332678 0.5341079
[75,] 0.4630175 3.717388 0.2649272
[76,] 0.8449912 8.392278 2.3192944
[77,] 0.8263662 8.620581 1.3452385
[78,] 0.8194348 7.847350 1.1544766
[79,] 0.4565947 5.809143 0.5332319
[80,] 0.8729451 7.225545 0.5724901
[81,] 0.5329342 6.176113 0.4141617
[82,] 0.4477027 5.114751 0.4234471
[83,] 0.9191833 9.377256 1.5187152
[84,] 0.2842830 3.502181 0.4284995
[85,] 0.4114047 6.398566 0.1623977
[86,] 0.9633245 9.471324 1.3984225
[87,] 0.9856805 8.968840 1.3766028
[88,] 0.6509489 7.088675 1.1073758
[89,] 0.9299693 9.482104 1.1620751
[90,] 0.6456944 6.448404 0.5497347
[91,] 0.1856461 1.937598 0.2026946
[92,] 0.9422877 6.825404 0.7958724
[93,] 0.5364393 5.751602 0.2944656
[94,] 0.6440514 6.226674 0.6087897
[95,] 0.8683983 8.995176 0.9055552
[96,] 0.6618030 5.422601 0.4316463
[97,] 0.8657631 9.153244 0.8223764
[98,] 0.8635867 8.491585 1.0403472
[99,] 0.5914044 5.790188 0.4643897
[100,] 0.9022617 8.710650 0.9941284
[101,] 0.6248820 6.794992 0.4528421
[102,] 0.3686887 3.487050 0.2717204
[103,] 0.8063894 10.786327 1.3713073
[104,] 0.4984841 5.251239 0.2757156
[105,] 0.9103522 10.797712 1.0076279
Slot "Dependence":
$Copula
[1] "frank"
$Param
[1] 10.30767
$Std.Err
[1] 0.7387905
$P.value
[1] 0
Slot "LogLik":
[1] 148.4375
Slot "Est.Method":
[1] "maximum likelihood"
Slot "Opt.Method":
[1] "ml"
Slot "LLC":
2 3 4
-63.58120 -114.48603 -40.71821
Slot "Index.dimset":
$`2`
1 2 LogLik
[1,] 11 1 18.59320
[2,] 8 3 33.65943
$`3`
1 2 3 LogLik
[1,] 11 1 6 29.59298
[2,] 13 3 8 59.11185
$`4`
1 2 3 4 LogLik
[1,] 11 1 6 12 3.370454
[2,] 7 3 13 8 22.227938
[1] 3
[1] 10.30767
Cluster 1 Cluster 2 Cluster 3
[1,] 0.3821719 4.724571 0.1722283
[2,] 0.3911910 5.738707 0.1548273
[3,] 0.9923630 11.130341 1.3684076
[4,] 0.6133405 6.711067 0.5572253
[5,] 0.1489788 2.763910 0.3727072
[6,] 0.8220294 9.946755 0.7025062
[7,] 0.7748810 8.849824 0.7472348
[8,] 0.7060588 6.977426 0.6037493
[9,] 0.7420107 6.979874 1.2564989
[10,] 0.8868274 8.121758 0.8366584
[11,] 0.8408346 9.083846 0.8449545
[12,] 0.9265845 7.497380 0.5735631
[13,] 0.7366919 9.083588 0.8049378
[14,] 0.1854497 4.737831 0.1091059
[15,] 0.7962249 7.314255 0.5804155
[16,] 0.5226436 6.609135 0.4938606
[17,] 0.3812399 4.108089 0.3293736
[18,] 0.6755020 7.028995 0.5640552
[19,] 0.3773079 4.926126 0.4780587
[20,] 0.2413054 4.050170 0.2756642
[21,] 0.5509937 5.554924 0.5825189
[22,] 0.5547837 6.950117 0.5624433
[23,] 0.8341096 7.723815 1.2358954
[24,] 0.1245806 5.237230 0.2698407
[25,] 0.8439165 8.633602 1.0305195
[26,] 0.8290546 8.875230 0.7430665
[27,] 0.2653793 3.818821 0.3310576
[28,] 0.1377339 3.847222 0.1881491
[29,] 0.4238974 5.987543 0.4669754
[30,] 0.7442955 7.278040 0.6419422
[31,] 0.7690821 6.851152 0.5348509
[32,] 0.5766055 5.569177 0.5740944
[33,] 0.9077906 7.246912 1.6961565
[34,] 0.5948870 5.276076 0.3273010
[35,] 0.3188433 2.466374 0.2048978
[36,] 0.8563399 8.215923 1.0216487
[37,] 0.4859458 5.125675 0.4037320
[38,] 0.5840352 5.207848 0.4522179
[39,] 0.4421058 5.275799 0.4154543
[40,] 0.7105057 6.313148 0.5755700
[41,] 0.0623775 3.980188 0.1791108
[42,] 0.4427724 4.347047 0.1617050
[43,] 0.8102775 7.328941 0.8441203
[44,] 0.9520337 9.014100 1.0392218
[45,] 0.2989031 4.227565 0.2305394
[46,] 0.9201125 8.991290 1.3522166
[47,] 0.7760083 7.578709 0.8481472
[48,] 0.5374009 6.116144 0.5733275
[49,] 0.7421856 7.733539 0.7321689
[50,] 0.8619819 8.923534 1.2041696
[51,] 0.7675674 7.438398 1.0839021
[52,] 0.9134812 12.299546 1.2930689
[53,] 0.7558059 7.901530 0.9108837
[54,] 0.3538221 4.070995 0.3053536
[55,] 0.7324060 9.486247 0.7039008
[56,] 0.8761552 9.117614 0.7884164
[57,] 0.7343466 6.498987 0.6840582
[58,] 0.3253134 6.100498 0.5853185
[59,] 0.5710626 6.738952 0.3936210
[60,] 0.5760979 6.354835 0.6985677
[61,] 0.6924759 4.877017 0.5914819
[62,] 0.8750170 8.938650 0.6987183
[63,] 0.4963767 5.154385 0.5267843
[64,] 0.9845955 10.830024 1.6332640
[65,] 0.9130914 6.378303 0.7150011
[66,] 0.6379774 7.991038 0.5805825
[67,] 0.8735105 7.361490 0.7832649
[68,] 0.6784702 7.413119 0.7021784
[69,] 0.9622892 8.632043 1.1318317
[70,] 0.9638345 9.265671 1.4903069
[71,] 0.6172756 6.299267 0.8306458
[72,] 0.8349335 8.792616 0.9115916
[73,] 0.7755230 7.888000 0.7354790
[74,] 0.6527278 6.332678 0.5341079
[75,] 0.4630175 3.717388 0.2649272
[76,] 0.8449912 8.392278 2.3192944
[77,] 0.8263662 8.620581 1.3452385
[78,] 0.8194348 7.847350 1.1544766
[79,] 0.4565947 5.809143 0.5332319
[80,] 0.8729451 7.225545 0.5724901
[81,] 0.5329342 6.176113 0.4141617
[82,] 0.4477027 5.114751 0.4234471
[83,] 0.9191833 9.377256 1.5187152
[84,] 0.2842830 3.502181 0.4284995
[85,] 0.4114047 6.398566 0.1623977
[86,] 0.9633245 9.471324 1.3984225
[87,] 0.9856805 8.968840 1.3766028
[88,] 0.6509489 7.088675 1.1073758
[89,] 0.9299693 9.482104 1.1620751
[90,] 0.6456944 6.448404 0.5497347
[91,] 0.1856461 1.937598 0.2026946
[92,] 0.9422877 6.825404 0.7958724
[93,] 0.5364393 5.751602 0.2944656
[94,] 0.6440514 6.226674 0.6087897
[95,] 0.8683983 8.995176 0.9055552
[96,] 0.6618030 5.422601 0.4316463
[97,] 0.8657631 9.153244 0.8223764
[98,] 0.8635867 8.491585 1.0403472
[99,] 0.5914044 5.790188 0.4643897
[100,] 0.9022617 8.710650 0.9941284
[101,] 0.6248820 6.794992 0.4528421
[102,] 0.3686887 3.487050 0.2717204
[103,] 0.8063894 10.786327 1.3713073
[104,] 0.4984841 5.251239 0.2757156
[105,] 0.9103522 10.797712 1.0076279
Cluster 1 Cluster 2 Cluster 3 LogLik
[1,] 11 1 6 29.59298
[2,] 13 3 8 59.11185
[3,] 12 2 7 87.55890
[4,] 14 4 9 118.68674
[5,] 15 5 10 148.43749
Cluster 1 Cluster 2 Cluster 3
[1,] 1 6 11
[2,] 2 7 12
[3,] 3 8 13
[4,] 4 9 14
[5,] 5 10 15
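The two index matrices printed above list the same groups of rows, only with the cluster labels permuted and the rows within each cluster in allocation order. A small base R check of this could look as follows; the values are copied from the output above, and the helper same_partition is a name invented for the example.

```r
# Row indexes copied from the printed Index.Matrix (LogLik column omitted).
index.clust <- cbind("Cluster 1" = c(11, 13, 12, 14, 15),
                     "Cluster 2" = c(1, 3, 2, 4, 5),
                     "Cluster 3" = c(6, 8, 7, 9, 10))
index.true <- matrix(1:15, 5, 3)

# Hypothetical helper: TRUE if two index matrices define the same groups,
# ignoring cluster labels and the order of rows within each cluster.
same_partition <- function(a, b) {
  key <- function(m) {
    # One string per cluster, built from its sorted row indexes.
    sort(unname(apply(m, 2, function(col) paste(sort(col), collapse = "-"))))
  }
  identical(key(a), key(b))
}

same_partition(index.clust, index.true)
```

Comparing sorted index sets rather than the matrices themselves avoids a spurious mismatch caused by the arbitrary numbering of the clusters.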