View source: R/clustering_similarity.R
cluster_docs | R Documentation |
This function clusters documents using K-means based on their TF-IDF vectors.
cluster_docs(
  text_data,
  text_column = "abstract",
  n_clusters = 5,
  min_term_freq = 2,
  max_doc_freq = 0.9,
  random_seed = 42
)
text_data | A data frame containing text data. |
text_column | Name of the column containing text to analyze. |
n_clusters | Number of clusters to create. |
min_term_freq | Minimum frequency for a term to be included. |
max_doc_freq | Maximum document frequency (as a proportion) for a term to be included. |
random_seed | Seed for random number generation (for reproducibility). |
A data frame with the original data and cluster assignments.
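The behaviour described above can be illustrated with a minimal base-R sketch. This is not the package's implementation: the toy data frame `docs`, the tokenizer, the TF-IDF weighting, and the name of the output column (`cluster`) are all assumptions made for illustration. In particular, the sketch assumes that `min_term_freq` is a corpus-wide term count and that `max_doc_freq` is the share of documents containing the term; the actual function may define them differently.

```r
# Hypothetical toy data; the column name matches the default text_column.
docs <- data.frame(
  id = 1:6,
  abstract = c(
    "gene expression in cancer cells",
    "cancer cells and tumor gene mutation",
    "tumor gene expression profiling",
    "neural network image classification",
    "deep neural network training",
    "image classification with deep network"
  ),
  stringsAsFactors = FALSE
)

# Values mirroring the documented arguments (n_clusters reduced for 6 documents).
n_clusters    <- 2
min_term_freq <- 2
max_doc_freq  <- 0.9
random_seed   <- 42

# Simple tokenizer (assumption): lowercase, split on anything that is not a letter.
tokens <- strsplit(tolower(docs$abstract), "[^a-z]+")
tokens <- lapply(tokens, function(tk) tk[nzchar(tk)])
vocab  <- sort(unique(unlist(tokens)))

# Document-by-term count matrix.
tf <- t(sapply(tokens, function(tk) table(factor(tk, levels = vocab))))

# Drop terms that are too rare overall or that appear in too many documents.
term_freq <- colSums(tf)
doc_freq  <- colSums(tf > 0) / nrow(tf)
keep      <- term_freq >= min_term_freq & doc_freq <= max_doc_freq
tf        <- tf[, keep, drop = FALSE]

# TF-IDF: raw counts weighted by log inverse document frequency.
idf   <- log(nrow(tf) / colSums(tf > 0))
tfidf <- sweep(tf, 2, idf, `*`)

# K-means with a fixed seed so the assignments are reproducible.
set.seed(random_seed)
km <- kmeans(tfidf, centers = n_clusters, nstart = 10)

# Original data plus one cluster label per row.
docs$cluster <- km$cluster
print(docs)
```

Fixing the seed matters because K-means starts from randomly chosen centers, so repeated runs without it can label or split the clusters differently.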