PreprocessText | R Documentation |
Provides elementary preprocessing for a character vector that has been read in, such as lowercasing and bag-of-words (bow) normalization. The bow normalization step replaces each element of the vector with a numeric value (its ID). This is useful for non-ASCII texts, or for texts whose words contain boundary symbols, where regular-expression matching can fail.
PreprocessText(text, lower = FALSE, bow = TRUE)
text | A character vector. This contains the text as returned by |
lower | Boolean. Whether or not to lowercase all words. |
bow | Boolean. Whether or not to substitute each word with an ID tag (useful for non-ASCII texts). |
A character vector.
tolower
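# A tokenized sentence with mixed case and a word containing boundary symbols
txt <- c("This", "is", "a", "Sentence", "containing", "UPPERCASE", "lowercase", "and", "sy.mb'ols")
# Lowercase every word, then substitute each word with its ID
txt.norm <- PreprocessText(txt, lower = TRUE, bow = TRUE)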
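To make the bow normalization step more concrete, the following base-R sketch mimics the documented behaviour: optional lowercasing, then replacing each word with an ID. The helper `sketch_preprocess` is hypothetical and is not the package's implementation. In particular, numbering the IDs in order of first appearance and returning them as character strings are assumptions made for illustration only.

```r
# Hypothetical helper: a sketch of the documented behaviour only,
# not the actual implementation of PreprocessText.
sketch_preprocess <- function(text, lower = FALSE, bow = TRUE) {
  # Optional lowercasing, so "The" and "the" become the same word
  if (lower) text <- tolower(text)
  if (bow) {
    # Assumption: IDs are numbered in order of first appearance.
    ids <- match(text, unique(text))
    # The documented return value is a character vector
    text <- as.character(ids)
  }
  text
}

txt <- c("The", "cat", "saw", "the", "cat")
sketch_preprocess(txt, lower = TRUE, bow = TRUE)
# returns "1" "2" "3" "1" "2"
```

Because identical words map to identical IDs, later steps can compare the IDs instead of the raw strings. This is presumably why the substitution helps with non-ASCII text and words containing boundary symbols.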