cluster_docs: Cluster documents using K-means

View source: R/clustering_similarity.R

cluster_docs    R Documentation

Cluster documents using K-means

Description

This function clusters documents using K-means based on their TF-IDF vectors.
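The general technique named here can be sketched in base R. This is an illustrative sketch only, not the package's implementation: the toy corpus, the whitespace tokenizer, the particular TF-IDF weighting, and the `nstart` value are all assumptions made for the example.

```r
# Toy corpus (hypothetical): three documents about genes, three about weather.
docs <- c(
  "gene expression protein cell",
  "protein cell gene pathway",
  "cell gene expression pathway",
  "rain wind storm cloud",
  "storm cloud rain forecast",
  "wind rain forecast cloud"
)

# Tokenize on whitespace and build a document-by-term count matrix.
tokens <- strsplit(tolower(docs), "\\s+")
vocab <- sort(unique(unlist(tokens)))
counts <- t(vapply(
  tokens,
  function(tok) as.numeric(table(factor(tok, levels = vocab))),
  numeric(length(vocab))
))
colnames(counts) <- vocab

# TF-IDF: within-document term frequency times log inverse document frequency.
tf <- counts / rowSums(counts)
idf <- log(nrow(counts) / colSums(counts > 0))
tfidf <- sweep(tf, 2, idf, `*`)

# K-means on the TF-IDF rows; fixing the seed makes the random starts repeatable.
set.seed(42)
fit <- kmeans(tfidf, centers = 2, nstart = 10)
clusters <- fit$cluster
```

On this corpus the two topics share no vocabulary, so the gene documents and the weather documents land in separate clusters. The cluster labels themselves (1 or 2) are arbitrary.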

Usage

cluster_docs(
  text_data,
  text_column = "abstract",
  n_clusters = 5,
  min_term_freq = 2,
  max_doc_freq = 0.9,
  random_seed = 42
)

Arguments

text_data

A data frame containing text data.

text_column

Name of the column containing text to analyze.

n_clusters

Number of clusters to create.

min_term_freq

Minimum frequency for a term to be included.

max_doc_freq

Maximum document frequency (as a proportion) for a term to be included.

random_seed

Seed for random number generation (for reproducibility).
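The effect of the two term filters can be illustrated with a small base R sketch. The argument descriptions do not say how frequency is counted, so this example assumes that `min_term_freq` is compared against a term's total count across the corpus and that `max_doc_freq` is compared against the share of documents containing the term. The corpus and the helper variables are hypothetical.

```r
# Hypothetical four-document corpus.
docs <- c(
  "the gene regulates the protein",
  "the protein binds the receptor",
  "the receptor signals the cell",
  "the enzyme modifies the protein"
)
tokens <- strsplit(docs, " ", fixed = TRUE)
vocab <- sort(unique(unlist(tokens)))

# Total occurrences of each term across the corpus (assumed meaning of min_term_freq).
term_freq <- vapply(vocab, function(w) sum(unlist(tokens) == w), numeric(1))

# Share of documents containing each term (assumed meaning of max_doc_freq).
doc_freq <- vapply(
  vocab,
  function(w) mean(vapply(tokens, function(tok) w %in% tok, logical(1))),
  numeric(1)
)

min_term_freq <- 2
max_doc_freq <- 0.9

# Keep terms that are frequent enough but not present in nearly every document.
keep <- vocab[term_freq >= min_term_freq & doc_freq <= max_doc_freq]
keep
```

Under these assumptions only "protein" and "receptor" survive: "the" appears in every document and exceeds the 0.9 ceiling, while the remaining terms occur only once and fall below the minimum.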

Value

A data frame with the original data and cluster assignments.
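A call might look like the following minimal sketch. The `papers` data frame and its contents are made up for illustration, and the example has not been checked against a particular version of LBDiscover. Because this page does not name the column that holds the cluster assignments, the sketch lists whatever columns the function added instead of assuming a name.

```r
# Hypothetical input: one row per document, text in the default "abstract" column.
papers <- data.frame(
  id = 1:6,
  abstract = c(
    "gene expression regulates protein levels in the cell",
    "protein levels in the cell depend on gene expression",
    "gene pathway analysis of protein expression",
    "storm forecast predicts heavy rain and wind",
    "heavy rain and wind follow the storm forecast",
    "wind and rain forecast for the coming storm"
  ),
  stringsAsFactors = FALSE
)

# Run only if the package is installed.
if (requireNamespace("LBDiscover", quietly = TRUE)) {
  result <- LBDiscover::cluster_docs(
    papers,
    text_column = "abstract",
    n_clusters = 2,
    random_seed = 42
  )

  # Columns present in the result but not in the input.
  print(setdiff(names(result), names(papers)))
}
```

Passing the same `random_seed` on repeated calls is what the argument description suggests will keep the assignments reproducible.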


LBDiscover documentation built on June 16, 2025, 5:09 p.m.