The similarity of two data sets is measured by clustering them and comparing the resulting clusterings with any of the following clustering comparison metrics: Adjusted Rand Index (ARI), Fowlkes-Mallows index (FM), Jaccard Index (J), or Variation of Information index (VI).
dsClustCompare(data1, data2)
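data1: A data set (e.g. a data.frame) to be compared with data2.
data2: A data set (e.g. a data.frame) to be compared with data1.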
The function compares the data stored in data1 with data2 in two steps. First, it performs partitioning around medoids (PAM) clustering on data1, and the instances from data2 are then assigned to the cluster with the closest medoid. In the second step, PAM clustering is performed on data2, and the instances from data1 are assigned to the clusters with the closest medoids. The procedure yields two clusterings of the same instances, which can be compared using any of ARI, FM, J, or VI.
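The two steps above can be sketched in R as follows. This is a minimal sketch under stated assumptions, not the package's implementation: it uses pam() from the recommended cluster package, clusters only the numeric columns with Euclidean distance, and takes the number of clusters k as a hypothetical parameter (the text does not say how k is chosen). It reads "two clusterings of the same instances" as applying each PAM model to the pooled instances of both data sets; the helpers nearestMedoid and twoStepClusterings are hypothetical names.

```r
# Minimal sketch of the two-step PAM procedure.
# Assumptions (not from the package): data1 and data2 are data frames with
# the same numeric columns, Euclidean distance, hypothetical fixed k.
library(cluster)  # recommended package that provides pam()

# Hypothetical helper: index of the closest medoid (row of `medoids`)
# for every row of `x`, by squared Euclidean distance.
nearestMedoid <- function(x, medoids) {
  apply(x, 1, function(row) {
    d <- colSums((t(medoids) - row)^2)
    unname(which.min(d))
  })
}

# Hypothetical helper: two clusterings of the pooled instances
# (rows of data1 followed by rows of data2).
twoStepClusterings <- function(data1, data2, k = 3) {
  x1 <- as.matrix(data1[vapply(data1, is.numeric, logical(1))])
  x2 <- as.matrix(data2[vapply(data2, is.numeric, logical(1))])
  pooled <- rbind(x1, x2)

  # Step 1: PAM on data1; every instance goes to its closest data1 medoid
  # (for rows of data1 this reproduces the PAM clustering, up to ties).
  pam1 <- pam(x1, k)
  a <- nearestMedoid(pooled, pam1$medoids)

  # Step 2: PAM on data2; every instance goes to its closest data2 medoid.
  pam2 <- pam(x2, k)
  b <- nearestMedoid(pooled, pam2$medoids)

  list(a = unname(a), b = unname(b))
}
```

For example, `twoStepClusterings(iris, iris[1:100, ], k = 3)` returns two label vectors of length 250, one per model, which can then be scored with any pair of clustering comparison indices.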
The higher the value of ARI, FM, or J, the more similar the two data sets are; the reverse holds for VI, where two perfectly matching partitions produce a score of 0.
For random clusterings, ARI returns a value around zero (negative values are possible), and for perfectly matching clusterings ARI is 1.
FM and J values always lie in [0, 1].
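To make the ranges above concrete, the indices can be computed from two label vectors over the same instances. The R sketch below is an illustration, not the package's code: it uses the standard pair-counting definitions of ARI, FM, and J built from the contingency table of the two clusterings, and computes VI as H(A) + H(B) - 2 I(A; B) with natural logarithms (an assumption; another base only rescales VI and still gives 0 for matching partitions). The function name clusteringIndices is hypothetical.

```r
# Compare two clusterings `a` and `b` of the same instances.
# Sketch only; natural logarithms are assumed for VI.
clusteringIndices <- function(a, b) {
  stopifnot(length(a) == length(b))
  tab <- table(a, b)                        # contingency table n_ij
  n <- length(a)
  pairsBoth <- sum(choose(as.vector(tab), 2))  # pairs together in both
  pairsA <- sum(choose(rowSums(tab), 2))    # pairs together in a
  pairsB <- sum(choose(colSums(tab), 2))    # pairs together in b
  expected <- pairsA * pairsB / choose(n, 2)  # chance-expected pairsBoth

  ari <- (pairsBoth - expected) / ((pairsA + pairsB) / 2 - expected)
  fm  <- pairsBoth / sqrt(pairsA * pairsB)
  jac <- pairsBoth / (pairsA + pairsB - pairsBoth)

  # Variation of information: H(A) + H(B) - 2 * I(A; B)
  p <- tab / n
  pa <- rowSums(p)
  pb <- colSums(p)
  entropy <- function(q) { q <- q[q > 0]; -sum(q * log(q)) }
  nz <- p > 0
  mi <- sum(p[nz] * log(p[nz] / outer(pa, pb)[nz]))
  vi <- entropy(pa) + entropy(pb) - 2 * mi

  list(ARI = ari, FM = fm, J = jac, VI = vi)
}

# Same partition with swapped labels: ARI = FM = J = 1, VI = 0 (up to rounding)
clusteringIndices(c(1, 1, 2, 2), c(2, 2, 1, 1))
# Crossed partitions: ARI = -0.5, FM = J = 0, VI = 2 * log(2)
clusteringIndices(c(1, 1, 2, 2), c(1, 2, 1, 2))
```

The second call shows a negative ARI. Degenerate inputs (for example, both clusterings placing every instance in a single cluster) make some denominators zero and return NaN.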
The function returns a list containing ARI and/or FM, depending on the parameters.
Marko Robnik-Sikonja
library(semiArtificial)
# use iris data set
# create RBF generator
irisGenerator <- rbfDataGen(Species ~ ., iris)
# use the generator to create new data
irisNew <- newdata(irisGenerator, size = 200)
# compare ARI computed on clustering with original and new data
dsClustCompare(iris, irisNew)