term_per_cluster | R Documentation |
This function processes the results of a document clustering algorithm based on the Reinert method. It computes the terms and their significance for each cluster, as well as the associated document segments.
term_per_cluster(res, cutree = NULL, k = 1, negative = TRUE)
res |
A list containing the results of the Reinert clustering algorithm. Must include at least |
cutree |
A custom cutree structure. If |
k |
A vector of integers specifying the clusters to analyze. Default is |
negative |
Logical. If |
The function integrates document-term matrix rows for missing segments, calculates term statistics for each cluster,
and filters terms based on their significance. Terms can be excluded based on their significance (signExcluded).
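The per-cluster term statistics described above can be illustrated with a minimal base R sketch. It is not the package's implementation: it assumes the statistic is a signed chi-square test of each term against each cluster, and the toy matrix `dtm`, the vector `cluster`, the helper `signed_chi2()`, and the threshold `alpha` are all hypothetical names introduced for illustration. The local `negative` flag only mimics the idea of keeping or dropping negatively associated terms.

```r
# Toy document-term matrix: 6 segments (rows) x 3 terms (columns), binary presence.
dtm <- matrix(
  c(1, 1, 1, 0, 0, 0,   # "whale"
    0, 0, 0, 1, 1, 1,   # "ship"
    1, 0, 1, 1, 0, 1),  # "sea"
  nrow = 6,
  dimnames = list(paste0("seg", 1:6), c("whale", "ship", "sea"))
)
cluster <- c(1, 1, 1, 2, 2, 2)  # hypothetical cluster assignment per segment

# Signed chi-square of one term against one cluster (assumed statistic).
# The sign is positive when the term occurs in the cluster at least as often
# as expected under independence, negative otherwise.
signed_chi2 <- function(present, in_cluster) {
  tab <- table(
    factor(present, levels = c(0, 1)),
    factor(in_cluster, levels = c(FALSE, TRUE))
  )
  # Small toy counts trigger an approximation warning, silenced here.
  test <- suppressWarnings(chisq.test(tab, correct = FALSE))
  direction <- if (test$observed[2, 2] >= test$expected[2, 2]) 1 else -1
  c(chi2 = direction * unname(test$statistic), p_value = test$p.value)
}

# One row per (cluster, term) pair.
results <- do.call(rbind, lapply(sort(unique(cluster)), function(cl) {
  stats <- t(apply(dtm, 2, signed_chi2, in_cluster = cluster == cl))
  data.frame(cluster = cl, term = rownames(stats), stats, row.names = NULL)
}))

# Keep significant terms; drop negatively associated ones unless requested.
alpha <- 0.05      # hypothetical significance threshold
negative <- FALSE  # local flag, analogous in spirit to the `negative` argument
keep <- results$p_value < alpha & (negative | results$chi2 > 0)
significant <- results[keep, ]
print(significant)
```

Setting `negative <- TRUE` in this sketch would also retain terms that are significantly under-represented in a cluster, which appear with a negative `chi2` value.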
A list with the following components:
terms |
A data frame of significant terms for each cluster. Columns include:
|
segments |
A data frame of document segments associated with each cluster. Columns include:
|
data(mobydick)
res <- reinert(
  x = mobydick,
  k = 10,
  term = "token",
  segment_size = 40,
  min_segment_size = 5,
  min_split_members = 10,
  cc_test = 0.3,
  tsj = 3
)
tc <- term_per_cluster(res, cutree = NULL, k = 1:10, negative = FALSE)
head(tc$segments, 10)
head(tc$terms, 10)