rockCluster: Rock Clustering

Description

Cluster a data matrix using the Rock algorithm.

Usage

rockCluster(x, n, beta = 1-theta, theta = 0.5, fun = "dist",
            funArgs = list(method="binary"), debug = FALSE)

rockLink(x, beta = 0.5)

Arguments

x

a data matrix; for rockLink an object of class dist.

n

the number of desired clusters.

beta

optional distance threshold.

theta

neighborhood parameter in the range [0,1).

fun

distance function to use.

funArgs

a list of named parameter arguments to fun.

debug

turn on/off debugging output.

Details

The intended area of application is the clustering of binary (logical) data, for instance as a preprocessing step in data mining. However, arbitrary distance metrics can be used (see dist).

According to the reference (see below), the distance threshold and the neighborhood parameter are coupled; by default beta = 1 - theta. Thus, higher values of the neighborhood parameter theta pose a tighter constraint on the neighborhood. For any two data points, the neighborhood is defined as the number of other data points that are neighbors to both. Further, two points are neighbors (or linked) only if their distance is less than or equal to beta.
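The neighbor and link definitions above can be illustrated with a small base-R sketch. It is not the package's implementation: it assumes the binary distance from stats::dist, the default coupling beta = 1 - theta shown under Usage, and that a point is not counted as its own neighbor. The toy matrix and all variable names are made up for illustration.

```r
# Toy binary data: 4 observations, 3 attributes (made up for illustration).
x <- matrix(c(1, 1, 0,
              1, 1, 1,
              0, 1, 1,
              0, 0, 1), ncol = 3, byrow = TRUE)

theta <- 0.5
beta  <- 1 - theta                 # default coupling from the usage section

# Pairwise binary (Jaccard) distances as a full symmetric matrix.
d <- as.matrix(stats::dist(x, method = "binary"))

# Neighbor indicator: distance <= beta. A point is not treated as its own
# neighbor here (an assumption of this sketch).
nb <- (d <= beta) * 1
diag(nb) <- 0

# Link count of (i, j): number of other points that are neighbors of both.
links <- nb %*% nb
diag(links) <- 0

print(links)
```

With these data, points 1 and 3 share neighbor 2, and points 2 and 4 share neighbor 3, so those two pairs have one link each and all other pairs have none.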

Note that for a tight neighborhood specification the algorithm may run out of clusters to merge, i.e., it may terminate with more than the desired number of clusters.
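One way to detect this situation is to compare the number of returned clusters with the number requested. The sketch below uses a plain list as a stand-in for a fitted object; only the size component described under Value is mimicked, with the sizes taken from the real-valued example output further down. The name stopped_early is hypothetical.

```r
# Stand-in for a fitted object: only the 'size' component is mimicked.
rc <- list(size = c(88, 1, 1, 7, 1, 1, 1))
n  <- 2                            # number of clusters that was requested

# TRUE if merging stopped before the requested number was reached.
stopped_early <- length(rc$size) > n
stopped_early
```

If the check is TRUE, a lower theta (and hence a larger default beta) loosens the neighborhood constraint and may allow further merges.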

The debug option can help in determining the proper settings: lines suffixed with a plus sign indicate that non-singleton clusters were merged.
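As a rough sketch of how such lines could be picked out, the following filters a character vector of trace lines for a trailing plus sign. The four lines are copied from the example output of this page; how the debugging output is captured into a character vector is left open here, and the variable names are hypothetical.

```r
# A few trace lines copied from the example output of this page.
trace <- c(
  "  126  131  137 [   1,   1]     4.139983",
  "  125   80  131 [   2,   2]     4.168170+",
  "  124  307  402 [   1,   1]     4.139983",
  "  112   80  307 [   5,   2]     2.852733+"
)

# Lines ending in "+" mark merges of two non-singleton clusters.
is_plus <- grepl("\\+$", trace)
trace[is_plus]
```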

Note that tie-breaking is not implemented, i.e., the first maximum encountered is used. However, permuting the order of the data can help in determining how strongly a solution depends on ties.
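The effect of taking the first maximum can be seen in isolation with base R's which.max, which also returns the first of several tied maxima. The goodness values and candidate names below are hypothetical and serve only to show that reordering tied candidates changes which one is selected; this is not the package's merge code.

```r
# Hypothetical goodness values for three candidate merges; "b" and "c" tie.
goodness <- c(a = 2.07, b = 2.69, c = 2.69)

# which.max returns the first maximum encountered.
first_pick <- names(which.max(goodness))

# The same candidates in a different order yield a different pick.
permuted      <- goodness[c("c", "a", "b")]
permuted_pick <- names(which.max(permuted))

c(original = first_pick, permuted = permuted_pick)
```

Comparing cluster assignments across several random row permutations of the data is one way to judge how much a solution depends on such ties.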

Function rockLink is provided for applications that need to compute link count distances efficiently. Note that NA and NaN distances are ignored, but supplying such values for the threshold beta results in an error.

Value

rockCluster returns an object of class rock, a list with the following components:

x

the data matrix or a subset of it.

cl

a factor of cluster labels.

size

a vector of cluster sizes.

beta

see above.

theta

see above.

rockLink returns an object of class dist.

Author(s)

Christian Buchta

References

S. Guha, R. Rastogi, and K. Shim. ROCK: A Robust Clustering Algorithm for Categorical Attributes. Information Systems, Vol. 25, No. 5, 2000.

See Also

dist for common distance functions, predict for classifying new data samples, and fitted for classifying the clustered data samples.

Examples

### example from paper
data(Votes)
x <- as.dummy(Votes[-17])
rc <- rockCluster(x, n=2, theta=0.73, debug=TRUE)
print(rc)
rf <- fitted(rc)
table(Votes$Class, rf$cl)
## Not run: 
### large example from paper
data("Mushroom")
x <- as.dummy(Mushroom[-1])
rc <- rockCluster(x[sample(dim(x)[1],1000),], n=10, theta=0.8)
print(rc)
rp <- predict(rc, x)
table(Mushroom$class, rp$cl)

## End(Not run)
### real valued example
gdist <- function(x, y=NULL) 1-exp(-dist(x, y)^2)
xr <- matrix(rnorm(200, sd=0.6)+rep(rep(c(1,-1),each=50),2), ncol=2)
rcr <- rockCluster(xr, n=2, theta=0.75, fun=gdist, funArgs=NULL)
print(rcr)

Example output

Loading required package: grid
Loading required package: proxy

Attaching package: 'proxy'

The following objects are masked from 'package:stats':

    as.dist, dist

The following object is masked from 'package:base':

    as.matrix

Clustering:
computing distances ...
computing links ...
computing clusters ...
 #cls     clids       sizes     goodness
  435   82  133 [   1,   1]   169.739313
  434   82  346 [   2,   1]   220.408778
  433   82  412 [   3,   1]   262.095606
  432   82  111 [   4,   1]   229.634725
  431   82  121 [   5,   1]   264.331626
  430   82  125 [   6,   1]   295.630470
  429   82  171 [   7,   1]   324.511457
  428   57   82 [   1,   8]   320.762914
  427   57   58 [   9,   1]   351.069791
  426   57  211 [  10,   1]   379.609009
  425   57  233 [  11,   1]   406.703940
  424   57   59 [  12,   1]   375.335972
  423   57  304 [  13,   1]   399.180143
  422   38   57 [   1,  14]   396.398573
  421   38   67 [  15,   1]   420.734574
  420   38  158 [  16,   1]   444.202059
  419   38  217 [  17,   1]   466.902396
  418   38  330 [  18,   1]   488.918911
  417   15   38 [   1,  19]   451.172128
  416   15  191 [  20,   1]   463.610147
  415   15  190 [  21,   1]   464.688247
  414    8   15 [   1,  22]   473.316624
  413    8   33 [  23,   1]   496.205470
  412    8  106 [  24,   1]   518.524462
  411    8  146 [  25,   1]   540.319662
  410    8  230 [  26,   1]   561.631413
  409    8  253 [  27,   1]   582.495262
  408    8  327 [  28,   1]   602.942707
  407    8  374 [  29,   1]   623.001792
  406    1    8 [   1,  30]   554.895449
  405    1   30 [  31,   1]   554.048350
  404    1   35 [  32,   1]   572.071914
  403    1   61 [  33,   1]   589.789874
  402    1  154 [  34,   1]   607.219773
  401    1  250 [  35,   1]   624.377572
  400    1  251 [  36,   1]   641.277841
  399    1   86 [  37,   1]   607.072821
  398    1  135 [  38,   1]   624.047254
  397    1  279 [  39,   1]   640.779785
  396    1  305 [  40,   1]   657.282160
  395    1  308 [  41,   1]   673.565228
  394    1   49 [  42,   1]   650.620383
  393    1  126 [  43,   1]   664.715817
  392    1   84 [  44,   1]   659.196298
  391    1  113 [  45,   1]   673.144972
  390    1  132 [  46,   1]   668.726005
  389    1  278 [  47,   1]   659.141344
  388    1  283 [  48,   1]   673.865850
  387    1  378 [  49,   1]   688.426113
  386    1  379 [  50,   1]   702.828450
  385    1  405 [  51,   1]   717.078791
  384    1  134 [  52,   1]   702.469862
  383    1  195 [  53,   1]   715.238074
  382    1  225 [  54,   1]   719.202984
  381    1   83 [  55,   1]   693.982286
  380    1   87 [  56,   1]   706.451863
  379    1  119 [  57,   1]   718.807057
  378    1  303 [  58,   1]   731.051571
  377    1  306 [  59,   1]   743.188917
  376    1  356 [  60,   1]   755.222425
  375    1  357 [  61,   1]   767.155260
  374    1  409 [  62,   1]   778.990431
  373    1   89 [  63,   1]   685.827180
  372    1   18 [  64,   1]   621.103766
  371    0    1 [   1,  65]   622.560960
  370    0   99 [  66,   1]   596.344953
  369    0   51 [  67,   1]   577.935410
  368    0  228 [  68,   1]   585.356535
  367    0  224 [  69,   1]   572.121133
  366    0  434 [  70,   1]   569.184600
  365    0  403 [  71,   1]   571.845918
  364    0  399 [  72,   1]   572.506438
  363    0   56 [  73,   1]   531.873569
  362    0  148 [  74,   1]   540.716760
  361    0  324 [  75,   1]   534.715976
  360    0  359 [  76,   1]   529.046737
  359    0  150 [  77,   1]   513.218862
  358    0  214 [  78,   1]   521.103643
  357    0  404 [  79,   1]   528.933340
  356    0  313 [  80,   1]   519.171328
  355    0  231 [  81,   1]   513.349613
  354    0   55 [  82,   1]   509.983125
  353    0  401 [  83,   1]   517.563937
  352    0  247 [  84,   1]   505.524383
  351    0   53 [  85,   1]   498.542045
  350    0  410 [  86,   1]   482.527270
  349    0  407 [  87,   1]   473.417962
  348    0  432 [  88,   1]   449.117575
  347    0  369 [  89,   1]   444.876282
  346    0  163 [  90,   1]   447.345515
  345    0  347 [  91,   1]   437.231001
  344    0  223 [  92,   1]   437.006880
  343    0  276 [  93,   1]   429.313939
  342    0  310 [  94,   1]   420.567297
  341    0   79 [  95,   1]   416.627732
  340    0   14 [  96,   1]   414.300025
  339    0  388 [  97,   1]   403.744425
  338    0  335 [  98,   1]   400.841133
  337    0  302 [  99,   1]   397.306656
  336    0  122 [ 100,   1]   394.471513
  335    0  206 [ 101,   1]   399.800260
  334    0  235 [ 102,   1]   405.099303
  333    0  266 [ 103,   1]   410.369204
  332    0  120 [ 104,   1]   321.272140
  331    0   36 [ 105,   1]   264.233128
  330    0  392 [ 106,   1]   206.746215
  329    0    7 [ 107,   1]   197.627649
  328    0   66 [ 108,   1]   186.434718
  327    0  325 [ 109,   1]   181.303210
  326    0   65 [ 110,   1]   178.764792
  325    0   37 [ 111,   1]   179.428872
  324    0  282 [ 112,   1]   165.930022
  323   43  189 [   1,   1]   165.599329
  322   43  337 [   2,   1]   215.032954
  321   43  182 [   3,   1]   204.562425
  320   43  108 [   4,   1]   226.900978
  319   43  109 [   5,   1]   259.466627
  318   43  114 [   6,   1]   288.978784
  317   43  174 [   7,   1]   316.295977
  316   43  331 [   8,   1]   341.933266
  315   23   43 [   1,   9]   315.295840
  314   23   34 [  10,   1]   338.071245
  313   23   69 [  11,   1]   359.797787
  312   23  201 [  12,   1]   380.637328
  311   23  270 [  13,   1]   400.713486
  310   22   23 [   1,  14]   370.696920
  309   22  227 [  15,   1]   375.690098
  308   22   24 [  16,   1]   386.465104
  307   22  187 [  17,   1]   403.439935
  306   22  263 [  18,   1]   419.957473
  305   22  264 [  19,   1]   436.061099
  304   22  268 [  20,   1]   451.787666
  303   22  319 [  21,   1]   467.168789
  302   22   41 [  22,   1]   406.857783
  301   22   26 [  23,   1]   402.769344
  300   22  116 [  24,   1]   419.739305
  299   22  218 [  25,   1]   409.077244
  298   22   25 [  26,   1]   399.709453
  297   22   90 [  27,   1]   418.296980
  296   22  179 [  28,   1]   436.474908
  295   22  203 [  29,   1]   454.272140
  294   22  426 [  30,   1]   471.714453
  293   22   42 [  31,   1]   429.212139
  292   22  259 [  32,   1]   440.215048
  291   22  269 [  33,   1]   451.055987
  290    9   22 [   1,  34]   434.741004
  289    9   52 [  35,   1]   446.603402
  288    9  193 [  36,   1]   437.189932
  287    9  236 [  37,   1]   448.946979
  286    9  333 [  38,   1]   460.537199
  285    9  139 [  39,   1]   433.994277
  284    9  376 [  40,   1]   426.396404
  283    9  312 [  41,   1]   404.326759
  282    9  272 [  42,   1]   396.689534
  281    9   19 [  43,   1]   396.314183
  280    9   27 [  44,   1]   408.160732
  279    9   29 [  45,   1]   419.849705
  278    9   50 [  46,   1]   431.387981
  277    9  389 [  47,   1]   403.712974
  276    9   31 [  48,   1]   405.317395
  275    9   72 [  49,   1]   414.511114
  274    9  124 [  50,   1]   406.855089
  273    9  222 [  51,   1]   407.958529
  272    9  332 [  52,   1]   416.478489
  271    9   46 [  53,   1]   406.289084
  270    9  414 [  54,   1]   415.894249
  269    9  419 [  55,   1]   425.397795
  268    9   62 [  56,   1]   402.503640
  267    9   17 [  57,   1]   379.964371
  266    9   68 [  58,   1]   385.810084
  265    9  130 [  59,   1]   381.601005
  264    9  170 [  60,   1]   378.014213
  263    9  252 [  61,   1]   367.561863
  262    9  220 [  62,   1]   366.552528
  261    9  318 [  63,   1]   373.224938
  260    9   21 [  64,   1]   361.502824
  259    9   45 [  65,   1]   360.882263
  258    9   40 [  66,   1]   361.844724
  257    9   47 [  67,   1]   337.858056
  256    9   70 [  68,   1]   338.984882
  255    9  249 [  69,   1]   321.802238
  254    9  262 [  70,   1]   315.707725
  253    9  185 [  71,   1]   312.716443
  252    9   91 [  72,   1]   304.519377
  251    9  245 [  73,   1]   315.092157
  250    9   60 [  74,   1]   300.563254
  249    9  238 [  75,   1]   298.268562
  248    9  371 [  76,   1]   301.892349
  247    9   63 [  77,   1]   289.401800
  246    9  328 [  78,   1]   295.243545
  245    9  110 [  79,   1]   268.208553
  244    9  254 [  80,   1]   269.075376
  243    9  260 [  81,   1]   276.878832
  242    9  265 [  82,   1]   284.619607
  241    9  212 [  83,   1]   259.374147
  240    9  258 [  84,   1]   266.437758
  239    9  208 [  85,   1]   262.415257
  238    9  175 [  86,   1]   228.294984
  237    9  329 [  87,   1]   207.280297
  236    9  255 [  88,   1]   206.607982
  235    9  338 [  89,   1]   212.636564
  234    9  244 [  90,   1]   211.501653
  233    9  317 [  91,   1]   209.239731
  232    9   32 [  92,   1]   209.052796
  231    9  181 [  93,   1]   200.482579
  230    9  184 [  94,   1]   207.799094
  229    9  344 [  95,   1]   215.062695
  228    9   44 [  96,   1]   197.627162
  227    9  105 [  97,   1]   188.146241
  226    9  348 [  98,   1]   183.190845
  225    9  431 [  99,   1]   179.164484
  224    9  202 [ 100,   1]   162.156914
  223    9   39 [ 101,   1]   163.085098
  222    9  241 [ 102,   1]   160.287941
  221    9  209 [ 103,   1]   155.115632
  220    9  149 [ 104,   1]   154.114984
  219    9  210 [ 105,   1]   154.208186
  218    9   93 [ 106,   1]   151.714603
  217    9  200 [ 107,   1]   146.446604
  216    0  433 [ 113,   1]   143.044865
  215    0  339 [ 114,   1]   140.681848
  214    0  160 [ 115,   1]   136.453586
  213    0  340 [ 116,   1]   134.758777
  212    9   64 [ 108,   1]   130.290010
  211    9  186 [ 109,   1]   127.915928
  210    0  136 [ 117,   1]   124.344268
  209    0  375 [ 118,   1]   122.082259
  208    0  256 [ 119,   1]   116.944225
  207    0   10 [ 120,   1]   115.753982
  206    0  156 [ 121,   1]   115.189196
  205    0  416 [ 122,   1]   113.197037
  204    0  173 [ 123,   1]   110.810873
  203    9  422 [ 110,   1]   110.663919
  202    0  207 [ 124,   1]   100.506349
  201    9  429 [ 111,   1]    90.986981
  200    9  178 [ 112,   1]    88.566471
  199    9  198 [ 113,   1]    86.796001
  198    9  177 [ 114,   1]    87.348730
  197    9  285 [ 115,   1]    86.643841
  196    0  296 [ 125,   1]    84.585866
  195    9   98 [ 116,   1]    84.484992
  194    0  204 [ 126,   1]    83.727532
  193    0  161 [ 127,   1]    84.690417
  192    9   20 [ 117,   1]    82.341689
  191    0  430 [ 128,   1]    82.030022
  190    0  427 [ 129,   1]    74.976029
  189    9  301 [ 118,   1]    74.202799
  188    9  321 [ 119,   1]    72.315333
  187    9  418 [ 120,   1]    73.736523
  186    9  199 [ 121,   1]    66.321052
  185    9  169 [ 122,   1]    64.479325
  184    9  411 [ 123,   1]    61.221477
  183    9  172 [ 124,   1]    54.525712
  182    9  280 [ 125,   1]    52.536545
  181    9   74 [ 126,   1]    47.526498
  180    9  297 [ 127,   1]    45.974798
  179    0   11 [ 130,   1]    43.776740
  178    9  267 [ 128,   1]    42.422389
  177    9  284 [ 129,   1]    35.884249
  176    9  194 [ 130,   1]    34.981414
  175    0  400 [ 131,   1]    33.884950
  174    9  118 [ 131,   1]    33.486304
  173    0  345 [ 132,   1]    31.602662
  172    9  127 [ 132,   1]    31.602662
  171    9  298 [ 133,   1]    31.910298
  170    9  425 [ 134,   1]    31.030784
  169    9  428 [ 135,   1]    30.550790
  168    9  360 [ 136,   1]    30.270424
  167    9  226 [ 137,   1]    29.795977
  166    9   48 [ 138,   1]    28.542517
  165    9  417 [ 139,   1]    27.491147
  164    9  232 [ 140,   1]    26.445746
  163    9  176 [ 141,   1]    25.406222
  162    0  123 [ 133,   1]    22.396669
  161    9   12 [ 142,   1]    19.730105
  160    9  309 [ 143,   1]    18.907068
  159    0   96 [ 134,   1]    17.788348
  158    9  243 [ 144,   1]    17.318802
  157    0  364 [ 135,   1]    16.753659
  156    9  355 [ 145,   1]    16.698598
  155    0   75 [ 136,   1]    16.511140
  154    0  365 [ 137,   1]    16.858250
  153    0  155 [ 138,   1]    16.421722
  152    0  372 [ 139,   1]    16.572678
  151    0  229 [ 140,   1]    16.334138
  150    0  142 [ 141,   1]    10.666734
  149    0  382 [ 142,   1]    10.058485
  148    0    6 [ 143,   1]     9.839393
  147    9  180 [ 146,   1]     9.763925
  146    9  311 [ 147,   1]     9.739259
  145    0  353 [ 144,   1]     9.044263
  144    9  354 [ 148,   1]     8.381423
  143    9  138 [ 149,   1]     8.360551
  142    0  257 [ 145,   1]     8.061392
  141    9   71 [ 150,   1]     7.202621
  140    9  368 [ 151,   1]     6.806781
  139  141  300 [   1,   1]     6.209975
  138    9  424 [ 152,   1]     6.035718
  137    9  423 [ 153,   1]     5.832943
  136    9  145 [ 154,   1]     5.631209
  135    9  205 [ 155,   1]     5.805017
  134    9  192 [ 156,   1]     5.604431
  133    9  219 [ 157,   1]     5.404851
  132  141  273 [   2,   1]     5.375824
  131    0  384 [ 146,   1]     5.360586
  130    9  391 [ 158,   1]     4.834387
  129    9  112 [ 159,   1]     4.266630
  128  141  351 [   3,   1]     4.261717
  127   80  291 [   1,   1]     4.139983
  126  131  137 [   1,   1]     4.139983
  125   80  131 [   2,   2]     4.168170+
  124  307  402 [   1,   1]     4.139983
  123    9  341 [ 160,   1]     4.071698
  122    9  415 [ 161,   1]     4.062355
  121    9  115 [ 162,   1]     4.053095
  120    9  395 [ 163,   1]     3.676287
  119   80  234 [   4,   1]     3.644996
  118    9  246 [ 164,   1]     3.484613
  117    0  117 [ 147,   1]     3.246420
  116    9  293 [ 165,   1]     3.110841
  115    0   85 [ 148,   1]     3.047790
  114    0  162 [ 149,   1]     3.040200
  113    0  314 [ 150,   1]     3.222225
  112   80  307 [   5,   2]     2.852733+
  111    9   81 [ 166,   1]     2.556178
  110    9  292 [ 167,   1]     2.550535
  109    9  143 [ 168,   1]     2.363158
  108    0  141 [ 151,   4]     2.255211+
  107    9  299 [ 169,   1]     2.176622
  106    9   80 [ 170,   7]     2.116407+
  105    9  153 [ 177,   1]     2.318296
  104    9  196 [ 178,   1]     2.313512
  103   13  367 [   1,   1]     2.069992
  102   13  370 [   2,   1]     2.687912
  101   28  140 [   1,   1]     2.069992
  100   78  288 [   1,   1]     2.069992
   99   78  289 [   2,   1]     2.687912
   98   78  322 [   3,   1]     2.130859
   97   78  323 [   4,   1]     2.733747
   96   78  381 [   5,   1]     3.243333
   95  152  157 [   1,   1]     2.069992
   94  152  380 [   2,   1]     2.687912
   93  164  215 [   1,   1]     2.069992
   92  165  213 [   1,   1]     2.069992
   91  290  361 [   1,   1]     2.069992
   90    0  397 [ 155,   1]     2.059845
   89    9   92 [ 179,   1]     1.775973
   88    9  336 [ 180,   1]     1.772351
   87    0    5 [ 156,   1]     1.681329
   86    9  144 [ 181,   1]     1.591881
   85   78  396 [   6,   1]     1.478152
   84  165  287 [   2,   1]     1.343956
   83    9  366 [ 182,   1]     1.235633
   82    9  286 [ 183,   1]     1.056991
   81    9  320 [ 184,   1]     1.054885
   80    9  362 [ 185,   1]     1.052794
   79    0  239 [ 157,   1]     0.931871
   78    0   28 [ 158,   2]     0.777778+
   77   78  358 [   7,   1]     0.684623
   76   78  421 [   8,   1]     0.641526
   75    9  290 [ 186,   2]     0.639336+
   74    0    2 [ 160,   1]     0.555232
   73    0   76 [ 161,   1]     0.553958
   72    0  349 [ 162,   1]     0.552695
   71    9  271 [ 188,   1]     0.523309
   70    9  152 [ 189,   3]     0.498825+
   69    9  326 [ 192,   1]     0.346199
   68    0  164 [ 163,   2]     0.192128+
   67    0  274 [ 165,   1]     0.182991
   66    9  406 [ 193,   1]     0.172771
   65    9  165 [ 194,   3]     0.123440+
   64    9   78 [ 197,   9]     0.045455+
rockMerge: terminated with 63 clusters
 data: x 
 beta: 0.27 
theta: 0.73 
  fun: dist 
 args: list(method = "binary") 
  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20 
166   1   1 206   3   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1 
 21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40 
  1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1 
 41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60 
  1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1 
 61  62  63 
  1   1   1 
dropping 60 clusters
computing distances ...
computing classes ...
            
               1   4   5 <NA>
  democrat    22 201   3   41
  republican 144   5   0   19
Clustering:
computing distances ...
computing links ...
computing clusters ...
rockMerge: terminated with 31 clusters
 data: x 
 beta: 0.2 
theta: 0.8 
  fun: dist 
 args: list(method = "binary") 
  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20 
 47 210 224  96  28  66  34 136  23  20   6  27  17   1   9  17   5   3   9   7 
 21  22  23  24  25  26  27  28  29  30  31 
  1   1   4   2   1   1   1   1   1   1   1 
dropping 10 clusters
computing distances ...
computing classes ...
           
               1    2    3    4    5    6    7    8    9   10   11   12   13
  edible     288    0 1728  768  192  512    0    0  187    0    0    0  188
  poisonous    0 1728    0    0    0    0  288 1295    0  190   32  256    0
           
              15   16   17   18   19   20   23   24 <NA>
  edible      90   96   16   19    0   48    0   12   64
  poisonous    0    0    0    0   72    0   34    0   21
Clustering:
computing distances ...
computing links ...
computing clusters ...
rockMerge: terminated with 7 clusters
 data: x 
 beta: 0.25 
theta: 0.75 
  fun: gdist 
 args: NULL 
 1  2  3  4  5  6  7 
88  1  1  7  1  1  1 

cba documentation built on May 29, 2017, 10:32 p.m.
