dissimilarity | R Documentation |
Provides the generic function dissimilarity() and the methods to compute
and return distances for binary data in a matrix, transactions or
associations, which can be used for grouping and clustering. See Hahsler
(2016) for an introduction to distance-based clustering of association rules.
dissimilarity(x, y = NULL, method = NULL, args = NULL, ...)
## S4 method for signature 'matrix'
dissimilarity(x, y = NULL, method = NULL, args = NULL)
## S4 method for signature 'itemMatrix'
dissimilarity(x, y = NULL, method = NULL, args = NULL, which = "transactions")
## S4 method for signature 'associations'
dissimilarity(x, y = NULL, method = NULL, args = NULL, which = "associations")
x |
the set of elements (e.g., |
y |
|
method |
the distance measure to be used. Implemented measures are
(defaults to
|
args |
a list of additional arguments for the methods. |
... |
further arguments. |
which |
a character string indicating if the dissimilarity should be
calculated between transactions/associations (default) or items (use |
Returns an object of class dist.
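To illustrate what such a dist object holds, the following is a minimal base R sketch that computes Jaccard distances between the rows of a small binary matrix by hand. The helper jaccard_dist() and the toy data are hypothetical and written only for this illustration; they are not part of the package and do not claim to reproduce its internals. The sketch assumes that every row contains at least one item.

```r
## Toy binary data (made-up values): rows are transactions, columns are items.
m <- matrix(
  c(1, 1, 0, 0,
    1, 0, 1, 0,
    0, 0, 1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("t", 1:3), c("a", "b", "c", "d"))
)

## Hypothetical helper, not an arules function:
## Jaccard distance = 1 - |intersection| / |union| for each pair of rows.
## Assumes no row is completely empty (the union would then be 0).
jaccard_dist <- function(x) {
  x <- (x > 0) + 0                    # coerce to a 0/1 numeric matrix
  both <- tcrossprod(x)               # items shared by each pair of rows
  n <- rowSums(x)                     # number of items in each row
  either <- outer(n, n, "+") - both   # size of the union for each pair
  as.dist(1 - both / either)
}

d <- jaccard_dist(m)
class(d)        # "dist"
as.matrix(d)    # expand to the full symmetric distance matrix

## A dist object can be passed directly to hclust() and cutree().
hc <- hclust(d, method = "average")
cutree(hc, k = 2)
```

Because the result is an ordinary dist object, it works with the same clustering tools that the examples on this page apply to the output of dissimilarity().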
Michael Hahsler
Aggarwal, C. C., Procopiuc, C., and Yu, P. S. (2002) Finding localized associations in market basket data. IEEE Trans. on Knowledge and Data Engineering 14(1), pages 51–62.
Dice, L. R. (1945) Measures of the amount of ecologic association between species. Ecology 26, pages 297–302.
Gupta, G., Strehl, A., and Ghosh, J. (1999) Distance based clustering of association rules. In Intelligent Engineering Systems Through Artificial Neural Networks (Proceedings of ANNIE 1999), pages 759–764. ASME Press.
Hahsler, M. (2016) Grouping association rules using lift. In C. Iyigun, R. Moghaddess, and A. Oztekin, editors, 11th INFORMS Workshop on Data Mining and Decision Analytics (DM-DA 2016).
Sneath, P. H. A. (1957) Some thoughts on bacterial classification. Journal of General Microbiology 17, pages 184–200.
Sokal, R. R. and Michener, C. D. (1958) A statistical method for evaluating systematic relationships. University of Kansas Science Bulletin 38, pages 1409–1438.
Toivonen, H., Klemettinen, M., Ronkainen, P., Hatonen, K., and Mannila, H. (1995) Pruning and grouping discovered association rules. In Proceedings of KDD'95.
Other proximity classes and functions: affinity(), predict(), proximity-classes
library("arules")

## cluster items in Groceries with support > 5%
data("Groceries")
s <- Groceries[, itemFrequency(Groceries) > 0.05]
d_jaccard <- dissimilarity(s, which = "items")
plot(hclust(d_jaccard, method = "ward.D2"), main = "Dendrogram for items")
## cluster transactions for a sample of Adult
data("Adult")
s <- sample(Adult, 500)
## calculate Jaccard distances and do hclust
d_jaccard <- dissimilarity(s)
hc <- hclust(d_jaccard, method = "ward.D2")
plot(hc, labels = FALSE, main = "Dendrogram for Transactions (Jaccard)")
## get 20 clusters and look at the difference of the item frequencies
## (bars, for the top 20 items) in cluster 1 compared to the data (line)
assign <- cutree(hc, 20)
itemFrequencyPlot(s[assign == 1], population = s, topN = 20)
## calculate affinity-based distances between transactions and do hclust
d_affinity <- dissimilarity(s, method = "affinity")
hc <- hclust(d_affinity, method = "ward.D2")
plot(hc, labels = FALSE, main = "Dendrogram for Transactions (Affinity)")
## cluster association rules
rules <- apriori(Adult, parameter = list(support = 0.3))
rules <- subset(rules, subset = lift > 2)
## use affinity to cluster rules
## Note: we need to supply the transactions (or affinities) from the
## dataset (sample).
d_affinity <- dissimilarity(rules, method = "affinity",
  args = list(transactions = s))
hc <- hclust(d_affinity, method = "ward.D2")
plot(hc, main = "Dendrogram for Rules (Affinity)")
## create 3 groups and inspect the rules in the first group.
assign <- cutree(hc, k = 3)
inspect(rules[assign == 1])