mergeTermClusters: Merge terms with a high mutual direction conditional...

Description

Merge terms that are likely to occur together. Specifically, terms are merged when their conditional probability is at least min.similarity in both directions, that is: P(A|B) >= min.similarity & P(B|A) >= min.similarity. Note that this is not always a good thing: unrelated terms that always occur together will also be merged. Whether this makes sense depends on the type of analysis.
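The mutual condition can be illustrated with a few lines of base R. This is only a sketch of the criterion on a toy matrix, not the package's implementation; the matrix, the term names and the intermediate variables (occ, co, cond, mutual) are made up for the example, and it assumes that any nonzero count means the term occurs in the document.

```r
# Toy document-term matrix: rows are documents, columns are terms.
m <- matrix(c(1, 1, 0,
              1, 1, 0,
              1, 1, 1,
              0, 0, 1),
            ncol = 3, byrow = TRUE,
            dimnames = list(NULL, c("new", "york", "city")))

# Assumption: any nonzero count means "term occurs in document".
occ <- (m > 0) * 1

# co[i, j] = number of documents that contain both term i and term j.
co <- crossprod(occ)

# cond[i, j] = P(term j | term i): co-occurrences divided by the
# document frequency of term i (the diagonal of co, recycled by row).
cond <- co / diag(co)

min.similarity <- 0.95

# A pair qualifies only if the threshold is met in both directions.
mutual <- cond >= min.similarity & t(cond) >= min.similarity
diag(mutual) <- FALSE

# Row/column indices of the qualifying term pairs.
which(mutual, arr.ind = TRUE)
```

Here "new" and "york" occur in exactly the same documents, so both conditional probabilities are 1 and the pair qualifies. "city" co-occurs with them only once, so it stays separate.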

Usage

mergeTermClusters(m, min.similarity = 0.95, max.label_length = 3)

Arguments

m

A sparse matrix in which columns are terms. Can also be a DocumentTermMatrix object from the tm package.

min.similarity

The minimum conditional probability. For two terms to be merged, their conditional probability must reach min.similarity in both directions.

max.label_length

Terms that are merged together will be collapsed into a single label. To prevent very long labels, the label is cut off from the [max.label_length] term.
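As a rough illustration of the label truncation, the sketch below joins the terms of one cluster into a single label. The helper collapse_label and the "|" separator are hypothetical and not taken from the package, and the sketch assumes that at most max.label_length terms are kept; the actual label format may differ.

```r
# Hypothetical helper (not part of semnet): build one label for a cluster
# of merged terms, keeping at most max.label_length of them.
collapse_label <- function(terms, max.label_length = 3, sep = "|") {
  # head() keeps the first max.label_length terms; the rest are dropped.
  paste(head(terms, max.label_length), collapse = sep)
}

label <- collapse_label(c("new", "york", "city", "hall"))
print(label)
```

With the default of 3, a cluster of four terms yields a label built from the first three only.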

Value

A matrix (or document term matrix).


kasperwelbers/semnet documentation built on May 20, 2019, 7:38 a.m.