reinSummary: Summarize Reinert Clustering Results
In tall: Text Analysis for All

Description

This function summarizes the results of the Reinert clustering algorithm, reporting for each cluster its most frequent document, its number of documents, and its most significant positive and negative terms. The input is the list returned by the term_per_cluster function.

Usage

reinSummary(tc, n = 10)

Arguments

tc

A list returned by the term_per_cluster function. The list includes:

segments: A data frame with segment information, including cluster and doc_id.
terms: A data frame with term information, including cluster, sign, chi_square, and term.
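
The expected input structure can be illustrated with a small hand-built object. The helper check_tc below is hypothetical (it is not part of tall) and only checks the components and columns named on this page. Objects returned by term_per_cluster may carry additional columns, and the toy values are invented for illustration.

```r
# Hypothetical helper (not part of tall): checks that an object has the
# components and columns that this help page lists for the tc argument.
check_tc <- function(tc) {
  required <- list(
    segments = c("cluster", "doc_id"),
    terms = c("cluster", "sign", "chi_square", "term")
  )
  for (part in names(required)) {
    if (!is.data.frame(tc[[part]])) {
      stop(sprintf("tc$%s must be a data frame", part))
    }
    absent <- setdiff(required[[part]], names(tc[[part]]))
    if (length(absent) > 0) {
      stop(sprintf("tc$%s is missing column(s): %s",
                   part, paste(absent, collapse = ", ")))
    }
  }
  invisible(TRUE)
}

# Minimal hand-built input with the documented components (toy values)
tc_toy <- list(
  segments = data.frame(cluster = c(1, 1, 2),
                        doc_id = c("doc1", "doc2", "doc2"),
                        stringsAsFactors = FALSE),
  terms = data.frame(cluster = c(1, 2),
                     sign = c("positive", "positive"),
                     chi_square = c(15.2, 9.8),
                     term = c("whale", "ship"),
                     stringsAsFactors = FALSE)
)
ok <- check_tc(tc_toy)
```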

n

Integer. The number of top terms (ranked by chi-squared value) to include in the summary for each cluster and sign. Default is 10.

Details

This function performs the following steps:

Extracts the most frequent document for each cluster, i.e. the doc_id that appears most often among the cluster's segments.
Counts the number of documents per cluster.
Selects the top n terms for each cluster by chi-squared value, separately for positive and negative signs.
Combines the terms and segment information into a final summary table.
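
To make the four steps concrete, here is a minimal base-R sketch run on a toy input. It is not tall's implementation: the function name rein_summary_sketch is hypothetical, and it assumes that sign holds the strings "positive" and "negative", that terms are joined with ", ", and that the document count is the number of distinct doc_id values in a cluster.

```r
# Hypothetical sketch of the reinSummary() steps (not tall's source code).
# Assumptions: terms$sign is "positive"/"negative", terms are joined with
# ", ", and documents per cluster = number of distinct doc_id values.
rein_summary_sketch <- function(tc, n = 10) {
  seg <- tc$segments
  terms <- tc$terms
  clusters <- sort(unique(seg$cluster))

  # Step 1: doc_id occurring in the most segments of each cluster
  # (ties go to the first doc_id in sorted order)
  most_freq_doc <- vapply(clusters, function(k) {
    counts <- table(seg$doc_id[seg$cluster == k])
    names(counts)[which.max(counts)]
  }, character(1))

  # Step 2: number of distinct documents in each cluster
  n_docs <- vapply(clusters, function(k) {
    length(unique(seg$doc_id[seg$cluster == k]))
  }, integer(1))

  # Step 3: top n terms by chi-squared value, separately for each sign
  top_terms <- function(k, s) {
    sub <- terms[terms$cluster == k & terms$sign == s, ]
    sub <- sub[order(sub$chi_square, decreasing = TRUE), ]
    paste(head(sub$term, n), collapse = ", ")
  }
  pos <- vapply(clusters, top_terms, character(1), s = "positive")
  neg <- vapply(clusters, top_terms, character(1), s = "negative")

  # Step 4: combine terms and segment information into one table
  data.frame(
    cluster = clusters,
    `Positive terms` = pos,
    `Negative terms` = neg,
    `Most frequent document` = most_freq_doc,
    `N. of Documents per Cluster` = n_docs,
    check.names = FALSE,
    stringsAsFactors = FALSE
  )
}

# Toy input with two clusters (invented values)
tc <- list(
  segments = data.frame(
    cluster = c(1, 1, 1, 2, 2),
    doc_id = c("ch1", "ch1", "ch2", "ch3", "ch3"),
    stringsAsFactors = FALSE
  ),
  terms = data.frame(
    cluster = c(1, 1, 1, 2, 2),
    sign = c("positive", "positive", "negative", "positive", "negative"),
    chi_square = c(12.5, 30.1, 8.0, 20.4, 5.2),
    term = c("whale", "sea", "land", "captain", "ocean"),
    stringsAsFactors = FALSE
  )
)
S <- rein_summary_sketch(tc, n = 2)
S
```

With n = 2, cluster 1 gets "sea, whale" as its positive terms (sorted by decreasing chi-squared) and ch1 as its most frequent document, because ch1 contributes two of the cluster's three segments.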

Value

A data frame summarizing the clustering results. The table includes:

cluster: The cluster ID.
Positive terms: The top n positive terms for each cluster, concatenated into a single string.
Negative terms: The top n negative terms for each cluster, concatenated into a single string.
Most frequent document: The document ID that appears most frequently in each cluster.
N. of Documents per Cluster: The number of documents in each cluster.

Examples


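library(tall)

# Load the example corpus used throughout the tall examples
data(mobydick)

# Run the Reinert clustering algorithm
res <- reinert(
  x = mobydick,
  k = 10,
  term = "token",
  segment_size = 40,
  min_segment_size = 5,
  min_split_members = 10,
  cc_test = 0.3,
  tsj = 3
)

# Extract segments and characteristic terms per cluster
tc <- term_per_cluster(res, cutree = NULL, k = 1:10, negative = FALSE)

# Summarize each cluster using its top 10 terms
S <- reinSummary(tc, n = 10)

# Show the first 10 rows of the summary table
head(S, 10)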

tall documentation built on June 8, 2025, 11:08 a.m.