vec_preprocess: Vectorized preprocessing of text

View source: R/clustering_similarity.R View source: R/performance_optimalization.R

vec_preprocessR Documentation

Vectorized preprocessing of text

Description

This function preprocesses text data using vectorized operations for better performance.

This function preprocesses text data using vectorized operations for better performance.

Usage

vec_preprocess(
  text_data,
  text_column = "abstract",
  remove_stopwords = TRUE,
  custom_stopwords = NULL,
  min_word_length = 3,
  max_word_length = 50,
  chunk_size = 100
)

vec_preprocess(
  text_data,
  text_column = "abstract",
  remove_stopwords = TRUE,
  custom_stopwords = NULL,
  min_word_length = 3,
  max_word_length = 50,
  chunk_size = 100
)

Arguments

text_data

A data frame containing text data.

text_column

Name of the column containing text to process.

remove_stopwords

Logical. If TRUE, removes stopwords.

custom_stopwords

Character vector of additional stopwords to remove.

min_word_length

Minimum word length to keep.

max_word_length

Maximum word length to keep.

chunk_size

Number of documents to process in each chunk.

Value

A data frame with processed text.

A data frame with processed text.


LBDiscover documentation built on June 16, 2025, 5:09 p.m.