clean_text: Clean Input Text


View source: R/clean_text.R

Description

This function cleans text by:
- Setting all text to lowercase
- Removing non-ASCII characters
- Expanding contractions ("don't" becomes "do not")
- Removing punctuation
- Removing symbols (if replaceSymbol is FALSE)
- Removing numbers (if replaceNumber is FALSE)
- Removing stopwords (if removeStopwords is TRUE)
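The default steps above can be approximated in base R. This is a minimal sketch, not the languagePredictR implementation: the name `sketch_clean_text()` is hypothetical, the contraction table is a tiny illustrative subset, and it covers only the default behaviour (symbols and numbers dropped, stopwords kept).

```r
# Hypothetical base-R approximation of clean_text()'s default pipeline.
# Not the languagePredictR source; the contraction table is illustrative only.
sketch_clean_text <- function(inputText) {
  # Drop non-ASCII characters (each non-convertible byte becomes "")
  x <- iconv(inputText, from = "UTF-8", to = "ASCII", sub = "")
  # Lowercase everything
  x <- tolower(x)
  # Expand a few contractions (assumed subset, not the package's full list)
  contractions <- c("don't" = "do not", "can't" = "cannot", "she's" = "she is")
  for (k in names(contractions)) {
    x <- gsub(paste0("\\b", k, "\\b"), contractions[[k]], x, perl = TRUE)
  }
  # Remove punctuation; the POSIX [:punct:] class also covers symbols like @ and $
  x <- gsub("[[:punct:]]", "", x)
  # Remove digits
  x <- gsub("[[:digit:]]+", "", x)
  # Collapse the whitespace left behind and trim the ends
  trimws(gsub("\\s+", " ", x))
}

sketch_clean_text("Don't SHOUT!! 42 times @ home")
# "do not shout times home"
```

Because every step uses vectorized base functions (`iconv()`, `tolower()`, `gsub()`), the sketch accepts a character vector as well as a single string, mirroring the `inputText` argument.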

Usage

clean_text(
  inputText,
  replaceSymbol = FALSE,
  replaceNumber = FALSE,
  removeStopwords = FALSE
)

Arguments

inputText

A character string or vector of character strings

replaceSymbol

If TRUE, symbols are replaced with equivalent words (e.g., "@" becomes "at"). Defaults to FALSE.

replaceNumber

If TRUE, numbers are replaced with equivalent words (e.g., "20" becomes "twenty", "3rd" becomes "third"). Defaults to FALSE.

removeStopwords

If TRUE, stopwords are removed (see stopwords()). Defaults to FALSE.
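The two replacement options can be pictured with a small base-R sketch. Everything here is hypothetical: the names `replace_symbols()`, `number_to_words()` and `replace_numbers()`, the four-entry symbol table, and the restriction to whole numbers from 0 to 99 are assumptions for illustration. Ordinals such as "3rd" are not handled, and the package's own mappings may differ.

```r
# Hypothetical sketch of replaceSymbol / replaceNumber behaviour; not the package source.
symbol_words <- c("@" = "at", "$" = "dollar", "%" = "percent", "&" = "and")

replace_symbols <- function(x) {
  for (s in names(symbol_words)) {
    # Pad with spaces so "$10" becomes "dollar 10" rather than "dollar10"
    x <- gsub(s, paste0(" ", symbol_words[[s]], " "), x, fixed = TRUE)
  }
  trimws(gsub("\\s+", " ", x))
}

ones <- c("zero", "one", "two", "three", "four", "five", "six", "seven",
          "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
          "fifteen", "sixteen", "seventeen", "eighteen", "nineteen")
tens <- c("", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
          "eighty", "ninety")

# Spell out a single integer in 0..99
number_to_words <- function(n) {
  if (n < 20) return(ones[n + 1])
  word <- tens[n %/% 10 + 1]
  if (n %% 10 == 0) word else paste(word, ones[n %% 10 + 1], sep = "-")
}

# Replace standalone one- or two-digit numbers; "3rd" and "100" are left alone
replace_numbers <- function(x) {
  m <- gregexpr("\\b[0-9]{1,2}\\b", x, perl = TRUE)
  regmatches(x, m) <- lapply(regmatches(x, m), function(d)
    vapply(as.integer(d), number_to_words, character(1)))
  x
}

replace_symbols("He gave his last $10")
# "He gave his last dollar 10"
replace_numbers("last 10 of 42")
# "last ten of forty-two"
```

The lookup-table approach keeps each option independent, so either replacement can run before the punctuation and digit removal steps without affecting the other.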

Value

A character string (or vector of character strings) with cleaned text.

Examples

library(languagePredictR)

myString <- "He gave his last $10 to Sally's sister because she's nice."

cleanText <- clean_text(myString)
# "he gave his last to sally sister because she is nice"

cleanText <- clean_text(myString, replaceNumber = TRUE)
# "he gave his last ten to sally sister because she is nice"

cleanText <- clean_text(myString, replaceSymbol = TRUE)
# "he gave his last dollar to sally sister because she is nice"

nlanderson9/languagePredictR documentation built on June 10, 2021, 11 a.m.