View source: R/data_preprocessing.R
preprocess_text | R Documentation |
This function performs multi-stage text preprocessing, including lowercasing, HTML cleaning, punctuation normalization, contraction expansion, internet slang replacement, emoticon replacement, and final standardization.
preprocess_text(text, use_textclean = TRUE, custom_slang = NULL)
text |
A character vector of input texts. |
use_textclean |
Logical. Whether to apply the internet slang and emoticon replacement step. Defaults to TRUE. |
custom_slang |
A named character vector providing user-defined slang mappings. Optional. |
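To make the shape of custom_slang concrete, here is a minimal base-R sketch of how a named character vector (names are slang terms, values are replacements) could be applied to text. The helper apply_custom_slang is hypothetical and is not part of the documented package; the real function may match terms differently. It assumes the slang terms are plain alphanumeric strings with no regex metacharacters.

```r
# Hypothetical helper for illustration; not the package's implementation.
apply_custom_slang <- function(text, custom_slang = NULL) {
  if (is.null(custom_slang)) {
    return(text)  # nothing to replace
  }
  for (term in names(custom_slang)) {
    # \\b restricts the match to whole words, so "rn" does not match inside "burn"
    pattern <- paste0("\\b", term, "\\b")
    text <- gsub(pattern, custom_slang[[term]], text, perl = TRUE)
  }
  text
}

slang <- c(rn = "right now", lit = "amazing")
apply_custom_slang("feeling lit rn, don't burn out", slang)
# "feeling amazing right now, don't burn out"
```

With the documented function, the same vector would be passed as preprocess_text(text, custom_slang = slang).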
The preprocessing pipeline includes:
Lowercasing the text.
Replacing HTML entities and non-ASCII characters.
Expanding common English contractions (e.g., "I'm" -> "I am").
Replacing internet slang and emoticons if use_textclean is TRUE.
Handling additional slang defined by the user.
Normalizing repeated punctuation and whitespace.
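The steps above can be sketched in base R as follows. This is an illustrative stand-in under stated assumptions, not the source of preprocess_text: the function name sketch_preprocess, the small entity and contraction tables, and the choice to collapse any repeated punctuation mark to a single one are all assumptions. The slang and emoticon steps are omitted, and only straight apostrophes are handled.

```r
# Illustrative sketch only; the real preprocess_text() may differ in every step.
sketch_preprocess <- function(text) {
  # 1. Lowercase
  text <- tolower(text)

  # 2. Replace a few HTML entities (assumed subset; "&amp;" goes last to avoid
  #    double-decoding), then drop any remaining non-ASCII characters
  entities <- c("&lt;" = "<", "&gt;" = ">", "&nbsp;" = " ", "&amp;" = "&")
  for (e in names(entities)) {
    text <- gsub(e, entities[[e]], text, fixed = TRUE)
  }
  text <- iconv(text, from = "UTF-8", to = "ASCII", sub = "")

  # 3. Expand a few common contractions (assumed subset, already lowercased)
  contractions <- c("i'm" = "i am", "can't" = "cannot",
                    "don't" = "do not", "it's" = "it is")
  for (k in names(contractions)) {
    text <- gsub(paste0("\\b", k, "\\b"), contractions[[k]], text, perl = TRUE)
  }

  # 4. Collapse runs of the same punctuation mark, then normalize whitespace
  text <- gsub("([[:punct:]])\\1+", "\\1", text, perl = TRUE)
  trimws(gsub("\\s+", " ", text))
}

sketch_preprocess("I'm SO happy!!!   Tom &amp; Jerry...")
# "i am so happy! tom & jerry."
```

Note that this sketch reduces an ellipsis to a single period; the documented function may treat ellipses differently.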
A character vector of cleaned and normalized text.
preprocess_text("I'm feeling lit rn!!!")
preprocess_text("I can't believe it... lol :)", use_textclean = TRUE)