preprocess_text: Preprocess Text with Slang Handling

View source: R/data_preprocessing.R

preprocess_text    R Documentation

Preprocess Text with Slang Handling

Description

This function performs multi-stage text preprocessing, including lowercasing, HTML cleaning, punctuation normalization, contraction expansion, internet slang replacement, emoticon replacement, and final standardization.

Usage

preprocess_text(text, use_textclean = TRUE, custom_slang = NULL)

Arguments

text

A character vector of input texts.

use_textclean

Logical. Whether to use the textclean package for internet slang and emoticon replacement. Default is TRUE.

custom_slang

An optional named character vector of user-defined slang mappings. Default is NULL.

Details

The preprocessing pipeline includes:

  • Lowercasing the text.

  • Replacing HTML entities and non-ASCII characters.

  • Expanding common English contractions (e.g., "I'm" -> "I am").

  • Replacing internet slang and emoticons if use_textclean is TRUE.

  • Handling additional slang defined by the user.

  • Normalizing repeated punctuation and whitespace.
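The stages listed above can be approximated in base R. The sketch below is illustrative only and is not the package's implementation: the helper name sketch_preprocess, the small contraction table, and the assumption that names of custom_slang are the slang terms and values are their replacements are all hypothetical. It omits the HTML cleaning and textclean stages.

```r
# Illustrative sketch only; NOT the text2emotion source code.
# sketch_preprocess and its contraction table are hypothetical.
sketch_preprocess <- function(text, custom_slang = NULL) {
  # Stage 1: lowercase the input.
  out <- tolower(text)

  # Stage 2: expand a few contractions (literal matching; straight
  # apostrophes only).
  contractions <- c("i'm" = "i am", "can't" = "cannot", "don't" = "do not")
  for (pat in names(contractions)) {
    out <- gsub(pat, contractions[[pat]], out, fixed = TRUE)
  }

  # Stage 3: apply user-defined slang on word boundaries.
  # Assumes names are slang terms and values are replacements, and that
  # terms contain no regex metacharacters.
  if (!is.null(custom_slang)) {
    for (term in names(custom_slang)) {
      pattern <- paste0("\\b", term, "\\b")
      out <- gsub(pattern, custom_slang[[term]], out, perl = TRUE)
    }
  }

  # Stage 4: collapse runs of the same punctuation mark, then whitespace.
  out <- gsub("([[:punct:]])\\1+", "\\1", out, perl = TRUE)
  out <- gsub("\\s+", " ", out, perl = TRUE)
  trimws(out)
}

sketch_preprocess("I'm feeling lit rn!!!", custom_slang = c(rn = "right now"))
```

In this sketch, custom slang is applied after lowercasing, so the names in custom_slang should be lowercase.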

Value

A character vector of cleaned and normalized text.

Examples

preprocess_text("I'm feeling lit rn!!!")
preprocess_text("I can't believe it... lol :)", use_textclean = TRUE)
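The examples above do not exercise custom_slang. A possible call is sketched below, assuming that the names of the vector are the slang terms and the values are their replacements; the documentation only states that the argument is a named character vector, so this direction is an assumption. The my_slang mappings are made up for illustration, and the call is guarded so that it runs only if the text2emotion package is installed and exports preprocess_text.

```r
# Hypothetical slang mappings: names = slang terms, values = replacements
# (the name -> value direction is an assumption, not confirmed by the docs).
my_slang <- c(rn = "right now", lit = "exciting")

# Run only when the package is available.
if (requireNamespace("text2emotion", quietly = TRUE)) {
  text2emotion::preprocess_text(
    "I'm feeling lit rn!!!",
    use_textclean = FALSE,   # skip textclean; rely on custom mappings only
    custom_slang = my_slang
  )
}
```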


text2emotion documentation built on June 8, 2025, 1:04 p.m.