hierarchical_cluster: Fit a Hierarchical Cluster


Description

Fit a hierarchical cluster model to text data. Before distances are calculated, tf-idf weighting (see weightTfIdf) is applied to the DocumentTermMatrix. Cosine dissimilarity is then used to generate the distance matrix supplied to hclust. method defaults to "ward.D2". A faster cosine dissimilarity calculation is used under the hood (see cosine_distance), and hclust is used to quickly calculate the fit. In essence, this is a wrapper function optimized for clustering text data.
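The pipeline described above (tf-idf weighting, cosine dissimilarity, then hclust) can be sketched in base R. This is only an illustration under stated assumptions: the toy count matrix and all object names are made up, the tf-idf formula is one common variant, and the package's own weightTfIdf and cosine_distance steps may differ in detail.

```r
# Toy document-term counts: rows are documents, columns are terms.
# All names and values here are hypothetical, for illustration only.
counts <- matrix(
    c(2, 1, 0, 0,
      1, 2, 0, 1,
      0, 0, 3, 1,
      0, 1, 2, 2),
    nrow = 4, byrow = TRUE,
    dimnames = list(paste0("doc", 1:4), c("tax", "jobs", "health", "care"))
)

# tf-idf (one common variant): relative term frequency times
# log2 of the inverse document frequency.
tf    <- counts / rowSums(counts)
idf   <- log2(nrow(counts) / colSums(counts > 0))
tfidf <- sweep(tf, 2, idf, `*`)

# Cosine dissimilarity between documents: 1 - cosine similarity.
norms    <- sqrt(rowSums(tfidf^2))
cos_sim  <- tcrossprod(tfidf) / outer(norms, norms)
cos_dist <- as.dist(1 - cos_sim)

# Agglomerative clustering on the dissimilarities.
fit <- hclust(cos_dist, method = "ward.D2")

# Cut the tree into two groups of documents.
groups <- cutree(fit, k = 2)
groups
```

Working from a plain matrix keeps the sketch free of package dependencies; with real text the counts would come from a DocumentTermMatrix.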

Usage

hierarchical_cluster(x, method = "ward.D2", ...)

## S3 method for class 'data_store'
hierarchical_cluster(x, method = "ward.D", ...)

Arguments

x

A data type (e.g., DocumentTermMatrix or TermDocumentMatrix).

method

The agglomeration method to be used. This must be (an unambiguous abbreviation of) one of "single", "complete", "average", "mcquitty", "ward.D", "ward.D2", "centroid", or "median".

...

Ignored.
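The "unambiguous abbreviation" rule for method is the same partial matching that stats::hclust applies. The sketch below uses stats::hclust directly on a built-in data set, on the assumption that hierarchical_cluster passes method through unchanged; the object names are illustrative.

```r
# Small numeric example from a built-in data set (not text data).
d <- dist(USArrests[1:10, ])

# Full method name and an unambiguous abbreviation of it.
fit_full <- hclust(d, method = "complete")
fit_abbr <- hclust(d, method = "comp")

# Both calls should produce the same merge sequence and heights.
same_merge  <- identical(fit_full$merge, fit_abbr$merge)
same_height <- isTRUE(all.equal(fit_full$height, fit_abbr$height))
c(same_merge, same_height)
```

Spelling the method out in full is usually clearer, particularly for "ward.D" versus "ward.D2", which differ by a single character.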

Value

Returns an object of class "hclust".
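Because the return value is an "hclust" object, the usual base R tools for such objects should apply to it. A minimal sketch, using stats::hclust on a built-in data set as a stand-in for a fitted text model (the stand-in and the object names are assumptions, not package output):

```r
# Stand-in fit; a real call would be hierarchical_cluster(x).
fit <- hclust(dist(USArrests[1:10, ]), method = "ward.D2")

# Labels in dendrogram (left-to-right) order.
ordered_labels <- fit$labels[fit$order]

# Cluster membership for a fixed number of clusters...
groups_k <- cutree(fit, k = 3)

# ...or for a fixed cut height (here the median merge height).
groups_h <- cutree(fit, h = median(fit$height))

table(groups_k)
```

cutree returns a named integer vector, so membership can be joined back to document identifiers by name.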

Examples

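library(hclustext)
library(dplyr)

## Build a data store with one document per person and debate time
x <- with(
    presidential_debates_2012,
    data_store(dialogue, paste(person, time, sep = "_"))
)

## Plot the dendrogram with a cut at k = 4 clusters
hierarchical_cluster(x) %>%
    plot(k=4)

## Plot with a cut at height h = .7
hierarchical_cluster(x) %>%
    plot(h=.7, lwd=2)

## Assign documents to clusters by cut height
hierarchical_cluster(x) %>%
    assign_cluster(h=.7)

## Use a different agglomeration method
hierarchical_cluster(x, method="complete") %>%
    plot(k=6)

## Assign documents to k = 6 clusters
hierarchical_cluster(x) %>%
    assign_cluster(k=6)

## Data store without a grouping variable
x2 <- presidential_debates_2012 %>%
    with(data_store(dialogue))

myfit2 <- hierarchical_cluster(x2)

plot(myfit2)
plot(myfit2, 55)

assign_cluster(myfit2, k = 55)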

trinker/hclustext documentation built on May 31, 2019, 8:50 p.m.