calc_doc_sim: Calculate document similarity using TF-IDF and cosine...

View source: R/clustering_similarity.R

calc_doc_sim    R Documentation

Calculate document similarity using TF-IDF and cosine similarity

Description

This function calculates pairwise similarity between documents by weighting terms with TF-IDF and comparing the resulting document vectors with cosine similarity.

Usage

calc_doc_sim(
  text_data,
  text_column = "abstract",
  min_term_freq = 2,
  max_doc_freq = 0.9
)

Arguments

text_data

A data frame containing text data.

text_column

Name of the column containing text to analyze.

min_term_freq

Minimum frequency for a term to be included.

max_doc_freq

Maximum document frequency (as a proportion) for a term to be included.
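
The two thresholds can be pictured as a vocabulary filter applied before weighting. The base R sketch below is an illustration only, not the package's code. It assumes that min_term_freq is compared against a term's total count across the corpus and that max_doc_freq is compared against the share of documents containing the term. The toy corpus and the whitespace tokeniser are likewise assumptions.

```r
# Hypothetical toy corpus; not taken from the package.
docs <- c(
  "migraine and magnesium deficiency",
  "magnesium and calcium channel blockers",
  "migraine and serotonin and stress"
)

# Tokenise on whitespace (a simplification; the package's tokeniser is not documented here).
tokens <- strsplit(tolower(docs), "\\s+")
vocab <- sort(unique(unlist(tokens)))

# Document-term count matrix: rows are documents, columns are terms.
dtm <- t(vapply(
  tokens,
  function(tk) as.numeric(table(factor(tk, levels = vocab))),
  numeric(length(vocab))
))
colnames(dtm) <- vocab

min_term_freq <- 2    # assumed meaning: minimum total count across the corpus
max_doc_freq  <- 0.9  # assumed meaning: maximum share of documents containing the term

term_freq <- colSums(dtm)                  # total occurrences of each term
doc_freq  <- colSums(dtm > 0) / nrow(dtm)  # proportion of documents containing each term

# Keep terms that are frequent enough overall but not present in nearly every document.
keep <- term_freq >= min_term_freq & doc_freq <= max_doc_freq
kept_terms <- vocab[keep]
print(kept_terms)
```

Under these assumptions, "and" is dropped because it occurs in every document, and terms that occur only once are dropped as too rare.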

Value

A similarity matrix for the documents.
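
To show what a TF-IDF cosine similarity matrix looks like, here is a minimal base R sketch. It is not the implementation in R/clustering_similarity.R. The helper tfidf_cosine, the toy data frame, the whitespace tokeniser, and the log(N / df) weighting are all assumptions chosen for illustration, and the sketch applies no term filtering.

```r
# Hypothetical toy input; the column name mirrors the documented default text_column.
text_data <- data.frame(
  abstract = c(
    "magnesium deficiency migraine",
    "magnesium migraine treatment",
    "protein folding kinetics"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TF-IDF weighting followed by cosine similarity.
tfidf_cosine <- function(texts) {
  tokens <- strsplit(tolower(texts), "\\s+")
  vocab <- sort(unique(unlist(tokens)))

  # Term counts: rows are documents, columns are terms.
  counts <- t(vapply(
    tokens,
    function(tk) as.numeric(table(factor(tk, levels = vocab))),
    numeric(length(vocab))
  ))

  # Term frequency: counts normalised by document length.
  tf <- counts / rowSums(counts)

  # Inverse document frequency, log(N / df); one common variant among several.
  idf <- log(nrow(counts) / colSums(counts > 0))
  tfidf <- sweep(tf, 2, idf, `*`)

  # Cosine similarity: dot products divided by the product of the vector norms.
  norms <- sqrt(rowSums(tfidf^2))
  tcrossprod(tfidf) / outer(norms, norms)
}

sim <- tfidf_cosine(text_data$abstract)
print(round(sim, 3))
```

In this sketch the result is a square, symmetric matrix with one row and one column per document and ones on the diagonal. Documents that share no terms score 0, so the third abstract has zero similarity to the other two. A document whose terms all receive zero weight would have a zero norm and produce NaN, which this sketch does not guard against. With LBDiscover installed, the documented call on the same data would be calc_doc_sim(text_data, text_column = "abstract"); its exact output may differ from this sketch.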


LBDiscover documentation built on June 16, 2025, 5:09 p.m.