hierarchical_cluster: Fit a Hierarchical Cluster

Description

Fits a hierarchical cluster model to text data. Before distances are calculated, tf-idf weighting (see weightTfIdf) is applied to the DocumentTermMatrix. By default, cosine dissimilarity is used to generate the distance matrix supplied to hclust, and method defaults to "ward.D2". A faster cosine dissimilarity calculation is used under the hood (see cosine_distance), and hclust is used to calculate the fit quickly. In essence, this is a wrapper function optimized for clustering text data.
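
The steps described above (tf-idf weighting, cosine dissimilarity, then hclust with "ward.D2") can be sketched in base R. This is only an illustration of the idea, not the package's implementation: the toy corpus, the simplified tf-idf formula, and all object names are assumptions made for this example.

```r
# Toy corpus (hypothetical data, already lower-cased and stripped of punctuation).
docs <- c(
    a = "best way to waste money",
    b = "amazing stuff lets you stay connected all the time",
    c = "instrument to waste money and time"
)

# Document-term count matrix: one row per document, one column per term.
tokens <- strsplit(docs, " ", fixed = TRUE)
vocab <- sort(unique(unlist(tokens)))
dtm <- t(vapply(
    tokens,
    function(tk) as.numeric(table(factor(tk, levels = vocab))),
    numeric(length(vocab))
))
colnames(dtm) <- vocab

# Simplified tf-idf (an assumption; weightTfIdf may differ in detail):
# term frequency within each document times log2 inverse document frequency.
tf <- dtm / rowSums(dtm)
idf <- log2(nrow(dtm) / colSums(dtm > 0))
tfidf <- sweep(tf, 2, idf, `*`)

# Cosine dissimilarity: 1 minus the cosine of the angle between row vectors.
norms <- sqrt(rowSums(tfidf^2))
cos_sim <- tcrossprod(tfidf) / outer(norms, norms)
d <- as.dist(1 - cos_sim)

# Agglomerative clustering on the dissimilarities.
fit <- hclust(d, method = "ward.D2")
clusters <- cutree(fit, k = 2)
```

In this toy corpus documents a and c share several terms, so they are merged first and end up in the same cluster when the tree is cut into two groups.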

Usage

hierarchical_cluster(x, distance = "cosine", method = "ward.D2", ...)

## S3 method for class 'data_store'
hierarchical_cluster(x, distance = "cosine",
  method = "ward.D2", ...)

Arguments

x

A data store object (see data_store).

distance

A distance measure ("cosine" or "jaccard").

method

The agglomeration method to be used. This must be (an unambiguous abbreviation of) one of "single", "complete", "average", "mcquitty", "ward.D", "ward.D2", "centroid", or "median".

...

ignored.
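
For readers unfamiliar with the "jaccard" option, the following base R sketch shows one common definition of Jaccard dissimilarity on term presence/absence: one minus the number of shared terms divided by the number of distinct terms across both documents. Whether the package computes it exactly this way is an assumption; the toy documents and object names are hypothetical.

```r
# Hypothetical tokenized documents.
docs <- list(
    a = c("waste", "money"),
    b = c("stay", "connected", "time"),
    c = c("waste", "money", "time")
)

# Binary incidence matrix: 1 if the term occurs in the document, else 0.
vocab <- sort(unique(unlist(docs)))
inc <- t(vapply(
    docs,
    function(tk) as.numeric(vocab %in% tk),
    numeric(length(vocab))
))
colnames(inc) <- vocab

# Shared term counts (intersection) and distinct term counts (union).
shared <- tcrossprod(inc)
sizes <- rowSums(inc)
union_n <- outer(sizes, sizes, "+") - shared

# Jaccard dissimilarity = 1 - |intersection| / |union|.
jaccard <- as.dist(1 - shared / union_n)

# Base R computes the same quantity with dist(..., method = "binary").
jaccard_base <- dist(inc, method = "binary")
```

Either distance object can be passed to hclust in the same way as the cosine dissimilarities.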

Value

Returns an object of class "hclust".
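
Because the return value has class "hclust", the usual base R tools for such objects should apply to it. The sketch below uses a small numeric matrix as a hypothetical stand-in for a fitted text model, simply to show which components an "hclust" object carries and how it can be cut.

```r
# Hypothetical stand-in data: four points forming two tight pairs.
m <- matrix(
    c(0, 0,
      0, 1,
      5, 5,
      5, 6),
    ncol = 2, byrow = TRUE,
    dimnames = list(c("d1", "d2", "d3", "d4"), NULL)
)
fit <- hclust(dist(m), method = "ward.D2")

# Components of the fitted object.
fit$merge    # which observations or clusters were joined at each step
fit$height   # dissimilarity at which each merge happened
fit$labels   # observation labels
fit$order    # left-to-right ordering used when plotting

# Cut the tree by number of clusters or by height.
by_k <- cutree(fit, k = 2)
by_h <- cutree(fit, h = 2)

# Convert to a dendrogram for further manipulation or plotting.
dend <- as.dendrogram(fit)
```

Cutting by k fixes the number of groups, while cutting by h fixes the dissimilarity threshold; the k and h arguments used in the examples on this page appear to follow the same distinction.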

Examples

library(dplyr)

x <- with(
    presidential_debates_2012,
    data_store(dialogue, paste(person, time, sep = "_"))
)

hierarchical_cluster(x) %>%
    plot(k=4)

hierarchical_cluster(x) %>%
    plot(h=.7, lwd=2)

hierarchical_cluster(x) %>%
    assign_cluster(h=.7)

## Not run: 
## interactive cutting
hierarchical_cluster(x) %>%
    plot(h=TRUE)

## End(Not run)

hierarchical_cluster(x, method="complete") %>%
    plot(k=6)

hierarchical_cluster(x) %>%
    assign_cluster(k=6)

x2 <- presidential_debates_2012 %>%
    with(data_store(dialogue))

myfit2 <- hierarchical_cluster(x2)

plot(myfit2)
plot(myfit2, 55)

assign_cluster(myfit2, k = 55)

## Example from a Stack Overflow question response
## asking for a way to group similar texts together
## http://stackoverflow.com/q/22936951/1000343
dat <- data.frame(
    person = LETTERS[1:3],
    text = c("Best way to waste money",
    "Amazing stuff. lets you stay connected all the time",
    "Instrument to waste money and time"),
    stringsAsFactors = FALSE
)


x <- with(
    dat,
    data_store(text, person)
)


hierarchical_cluster(x) %>%
    plot(h=.9, lwd=2)

hierarchical_cluster(x) %>%
    assign_cluster(h=.9)


hierarchical_cluster(x) %>%
    assign_cluster(h=.9) %>%
    get_terms()

hierarchical_cluster(x) %>%
    assign_cluster(h=.9) %>%
    get_terms() %>%
    as_topic()

hierarchical_cluster(x) %>%
    assign_cluster(h=.9) %>%
    get_documents()

trinker/clustext documentation built on May 31, 2019, 8:41 p.m.