vec_preprocess: Vectorized preprocessing of text
In LBDiscover: Literature-Based Discovery Tools for Biomedical Research



Description

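Preprocesses the text in one column of a data frame using vectorized operations for better performance. Documents are processed in chunks of `chunk_size`; words can be filtered by length, and stopwords can optionally be removed.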

Usage

vec_preprocess(
  text_data,
  text_column = "abstract",
  remove_stopwords = TRUE,
  custom_stopwords = NULL,
  min_word_length = 3,
  max_word_length = 50,
  chunk_size = 100
)

Arguments

`text_data`	A data frame containing text data.
`text_column`	Name of the column containing the text to process. Default: "abstract".
`remove_stopwords`	Logical. If TRUE (the default), removes stopwords.
`custom_stopwords`	Character vector of additional stopwords to remove. Default: NULL.
`min_word_length`	Minimum word length, in characters, to keep. Default: 3.
`max_word_length`	Maximum word length, in characters, to keep. Default: 50.
`chunk_size`	Number of documents to process in each chunk. Default: 100.
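
Examples

To illustrate how the arguments above could interact, here is a minimal base-R sketch of chunked, vectorized text preprocessing. It is not the LBDiscover implementation: `sketch_preprocess`, its tiny built-in stopword list, the tokenization rule (lowercase, keep only the letters a-z) and the returned list of token vectors are all assumptions made for illustration. The page above does not document what `vec_preprocess` itself returns.

```r
# Hypothetical helper (NOT part of LBDiscover): mirrors the documented
# arguments of vec_preprocess() to show one way they might be applied.
sketch_preprocess <- function(text_data,
                              text_column = "abstract",
                              remove_stopwords = TRUE,
                              custom_stopwords = NULL,
                              min_word_length = 3,
                              max_word_length = 50,
                              chunk_size = 100) {
  stopifnot(is.data.frame(text_data), text_column %in% names(text_data))

  # Assumed, deliberately tiny stopword list; real lists are much longer.
  base_stopwords <- c("the", "and", "for", "with", "that", "this", "are", "was")
  stopwords <- if (remove_stopwords) {
    c(base_stopwords, tolower(custom_stopwords))
  } else {
    character(0)
  }

  n <- nrow(text_data)
  tokens <- vector("list", n)
  if (n == 0) return(tokens)

  # Process the documents in chunks of `chunk_size` rows.
  for (start in seq(1, n, by = chunk_size)) {
    idx <- start:min(start + chunk_size - 1, n)
    txt <- as.character(text_data[[text_column]][idx])
    txt[is.na(txt)] <- ""

    # Vectorized cleaning: each call operates on the whole chunk at once.
    txt <- tolower(txt)
    txt <- gsub("[^a-z]+", " ", txt)
    words <- strsplit(trimws(txt), " +")

    # Keep words within the length bounds that are not stopwords.
    tokens[idx] <- lapply(words, function(w) {
      len <- nchar(w)
      w[len >= min_word_length & len <= max_word_length & !(w %in% stopwords)]
    })
  }
  tokens
}

docs <- data.frame(
  abstract = c("The migraine was treated with magnesium.",
               "Fish oil and Raynaud syndrome"),
  stringsAsFactors = FALSE
)
result <- sketch_preprocess(docs, chunk_size = 1)
```

With `chunk_size = 1` each document forms its own chunk, which exercises the chunking loop on small input. A larger chunk size means fewer loop iterations at the cost of holding more text in memory at once.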