hierarchical_cluster: Fit a Hierarchical Cluster
In trinker/clustext: Consistent Clustering for Text Data


Fit a hierarchical cluster to text data. Before distances are calculated, tf-idf weighting (see `weightTfIdf`) is applied to the `DocumentTermMatrix`. By default, cosine dissimilarity is used to generate the distance matrix supplied to `hclust`, and `method` defaults to `"ward.D2"`. A faster cosine dissimilarity calculation is used under the hood (see `cosine_distance`), and `hclust` is used to quickly calculate the fit. Essentially, this is a wrapper function optimized for clustering text data.
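The pipeline described above can be sketched in base R as follows. This is a minimal sketch on a made-up four-document count matrix, not clustext's internal code. The hand-written tf-idf step stands in for `weightTfIdf` (assuming its default normalized, log2 form), and the matrix-product cosine step stands in for `cosine_distance`.

```r
# Minimal base-R sketch of: tf-idf weighting -> cosine dissimilarity ->
# hclust(method = "ward.D2"). Not clustext's internals; toy data only.

# Toy document-term count matrix (rows = documents, columns = terms).
counts <- matrix(
  c(2, 1, 0, 0,
    1, 2, 0, 0,
    0, 0, 3, 1,
    0, 1, 2, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("doc", 1:4), c("money", "waste", "time", "stay"))
)

# tf-idf: term frequency normalized by document length, times
# log2(number of documents / number of documents containing the term).
tf    <- counts / rowSums(counts)
idf   <- log2(nrow(counts) / colSums(counts > 0))
tfidf <- sweep(tf, 2, idf, `*`)

# Cosine dissimilarity for every pair of rows: 1 - (a . b) / (|a| |b|).
norms  <- sqrt(rowSums(tfidf^2))
cosine <- tcrossprod(tfidf) / outer(norms, norms)
d      <- as.dist(1 - cosine)

# Agglomerative clustering on the dissimilarities.
fit <- hclust(d, method = "ward.D2")
cutree(fit, k = 2)  # doc1 doc2 doc3 doc4: 1 1 2 2
```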

hierarchical_cluster(x, distance = "cosine", method = "ward.D2", ...)

## S3 method for class 'data_store'
hierarchical_cluster(x, distance = "cosine",
  method = "ward.D", ...)

`x`	A data store object (see `data_store`).
`distance`	A distance measure ("cosine" or "jaccard").
`method`	The agglomeration method to be used. This must be (an unambiguous abbreviation of) one of `"single"`, `"complete"`, `"average"`, `"mcquitty"`, `"ward.D"`, `"ward.D2"`, `"centroid"`, or `"median"`.
`...`	Ignored.
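To illustrate how the two `distance` options differ, here is a base-R sketch on a made-up three-document count matrix. It assumes that `"jaccard"` treats each term as simply present or absent; `stats::dist(method = "binary")` computes the Jaccard distance on such data. How clustext computes either measure internally may differ.

```r
# Hedged sketch of the two distance options on toy counts (base R only).
# Assumption: "jaccard" uses term presence/absence, not term counts.
counts <- matrix(
  c(1, 1, 0,
    1, 0, 1,
    3, 1, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A", "B", "C"), c("waste", "money", "time"))
)

# Jaccard distance: among terms used by either document, the share used by
# only one of them. A and C use the same terms, so their distance is 0.
jac <- dist(counts, method = "binary")

# Cosine distance: 1 - cosine similarity. Counts matter here, so A and C
# (same terms, different frequencies) end up at a small positive distance.
norms <- sqrt(rowSums(counts^2))
cos_d <- as.dist(1 - tcrossprod(counts) / outer(norms, norms))
```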

Returns an object of class "hclust".
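Because the result is an `"hclust"` object, the standard tools in R's stats package also apply to it. The sketch below uses a toy one-dimensional distance matrix in place of a text one, and uses `stats::cutree()` as a base-R counterpart to cutting by `k` or `h`; it is not a description of how `assign_cluster` is implemented.

```r
# Toy data standing in for text: two well-separated groups of points.
pts <- c(a = 0, b = 0.1, c = 0.2, d = 5, e = 5.1)
fit <- hclust(dist(pts), method = "ward.D2")

class(fit)          # "hclust"
fit$height          # merge heights (non-decreasing for ward.D2)
cutree(fit, k = 2)  # cut into a fixed number of clusters
cutree(fit, h = 1)  # cut the tree at height 1 instead
```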

library(clustext)
library(dplyr)

x <- with(
    presidential_debates_2012,
    data_store(dialogue, paste(person, time, sep = "_"))
)

hierarchical_cluster(x) %>%
    plot(k=4)

hierarchical_cluster(x) %>%
    plot(h=.7, lwd=2)
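For readers more familiar with base R, a rough counterpart to cutting the tree in a plot by `k` or `h` is to draw the dendrogram and outline the clusters with `stats::rect.hclust()`. This is an analogy on a toy tree, not clustext's plot method.

```r
# Toy tree (not text data) to show base-R dendrogram cutting.
pts <- c(a = 0, b = 0.1, c = 0.2, d = 5, e = 5.1)
fit <- hclust(dist(pts), method = "ward.D2")

plot(fit, hang = -1)  # hang = -1 lines the leaf labels up at height 0
# Outline the two clusters; rect.hclust(fit, h = 1) would give the same cut.
# The return value (invisible) lists the members of each outlined cluster.
boxes <- rect.hclust(fit, k = 2, border = "red")
```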

hierarchical_cluster(x) %>%
    assign_cluster(h=.7)

## Not run: 
## interactive cutting
hierarchical_cluster(x) %>%
    plot(h=TRUE)

## End(Not run)

hierarchical_cluster(x, method="complete") %>%
    plot(k=6)

hierarchical_cluster(x) %>%
    assign_cluster(k=6)

x2 <- presidential_debates_2012 %>%
    with(data_store(dialogue))

myfit2 <- hierarchical_cluster(x2)

plot(myfit2)
plot(myfit2, k = 55)

assign_cluster(myfit2, k = 55)

## Example from StackOverflow Question Response
## Asking for grouping similar texts together
## http://stackoverflow.com/q/22936951/1000343
dat <- data.frame(
    person = LETTERS[1:3],
    text = c(
        "Best way to waste money",
        "Amazing stuff. lets you stay connected all the time",
        "Instrument to waste money and time"
    ),
    stringsAsFactors = FALSE
)


x <- with(
    dat,
    data_store(text, person)
)
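As a rough, hypothetical illustration of what the three StackOverflow texts look like once reduced to a document-term matrix, the sketch below tokenizes them in base R. `data_store` applies its own text cleaning, which is not reproduced here; the small `stop_words` list is an illustrative assumption. Counting the terms each pair of documents shares shows which texts overlap.

```r
text <- c(
  A = "Best way to waste money",
  B = "Amazing stuff. lets you stay connected all the time",
  C = "Instrument to waste money and time"
)
stop_words <- c("to", "the", "and", "all", "you")  # illustrative assumption

# Lower-case, split on anything that is not a letter, drop stopwords.
tokens <- setNames(
  lapply(strsplit(tolower(text), "[^a-z]+"), function(w) {
    w[nzchar(w) & !w %in% stop_words]
  }),
  names(text)
)

# Document-term count matrix: one row per text, one column per term.
vocab  <- sort(unique(unlist(tokens)))
counts <- t(vapply(tokens,
                   function(w) as.numeric(table(factor(w, levels = vocab))),
                   numeric(length(vocab))))
colnames(counts) <- vocab

# Number of distinct terms shared by each pair of documents.
shared <- tcrossprod((counts > 0) * 1)
shared  # A and C share "money" and "waste"; B and C share "time"
```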


hierarchical_cluster(x) %>%
    plot(h=.9, lwd=2)

hierarchical_cluster(x) %>%
    assign_cluster(h=.9)


hierarchical_cluster(x) %>%
    assign_cluster(h=.9) %>%
    get_terms()

hierarchical_cluster(x) %>%
    assign_cluster(h=.9) %>%
    get_terms() %>%
    as_topic()
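In the examples above, `get_terms` is used to describe each cluster by its terms. The helper below, `top_terms_by_cluster`, is hypothetical and is not clustext's implementation: it ranks terms within each cluster by summed tf-idf weight on a made-up matrix, as one plausible way to produce such a summary.

```r
# Hypothetical helper: for each cluster, sum the tf-idf weight of every term
# over the cluster's documents and keep the n highest-weighted terms.
top_terms_by_cluster <- function(tfidf, clusters, n = 3) {
  lapply(split(seq_len(nrow(tfidf)), clusters), function(rows) {
    w <- colSums(tfidf[rows, , drop = FALSE])
    w <- sort(w[w > 0], decreasing = TRUE)
    names(head(w, n))
  })
}

# Made-up tf-idf matrix: documents 1-2 and 3-4 form two clusters.
tfidf <- matrix(
  c(0.6, 0.3, 0.0, 0.0,
    0.5, 0.1, 0.0, 0.1,
    0.0, 0.0, 0.7, 0.2,
    0.0, 0.0, 0.4, 0.5),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("doc", 1:4), c("money", "waste", "time", "stay"))
)

top_terms_by_cluster(tfidf, clusters = c(1, 1, 2, 2), n = 2)
```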

hierarchical_cluster(x) %>%
    assign_cluster(h=.9) %>%
    get_documents()