fuzzydocs | R Documentation |
Occurence of three terms (neural networks, fuzzy, and image) in 30 documents retrieved from a Japanese article data base on fuzzy theory and systems.
data("fuzzy_docs")
fuzzy_docs
is a list of 30 fuzzy multisets, representing the
occurrence of the terms “neural networks”, “fuzzy”, and
“image” in each document. Each term appears with up to
three membership values representing weights,
depending on whether the term occurred
in the abstract (0.2), the keywords section (0.6), and/or the title
(1). The first 12 documents concern neural networks, the remaining 18
image processing. In the reference, various clustering methods have
been employed to recover the two groups in the data set.
K. Mizutani, R. Inokuchi, and S. Miyamoto (2008), Algorithms of Nonlinear Document Clustering Based on Fuzzy Multiset Model, International Journal of Intelligent Systems, 23, 176–198.
data(fuzzy_docs)
## compute distance matrix using Jaccard dissimilarity
d <- as.dist(set_outer(fuzzy_docs, gset_dissimilarity))
## apply hierarchical clustering (Ward method)
cl <- hclust(d, "ward")
## retrieve two clusters
cutree(cl, 2)
## -> clearly, the clusters are formed by docs 1--12 and 13--30,
## respectively.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.