View source: R/preprocess_pipeline.R
preprocess.removeNonWordChars | R Documentation
This function preprocesses a character vector by removing non-word characters and reports the mean number of characters before and after preprocessing.
preprocess.removeNonWordChars(
  text,
  rm.hashtags = FALSE,
  rm.mentions = FALSE,
  rm.emoji = FALSE,
  rm.digitwords = FALSE,
  join.hyphenation = FALSE
)
text | A character vector to preprocess.
rm.hashtags | A logical defining whether #hashtags should be removed.
rm.mentions | A logical defining whether @mentions should be removed.
rm.emoji | A logical defining whether emoji should be removed.
rm.digitwords | A logical defining whether all digits should be removed, including digitwords (e.g. 5G, T3).
join.hyphenation | A logical defining whether hyphenated words should be joined.
By default, URLs, HTML entities, digits-words, apostrophized words, and all punctuation are removed.
Other preprocessing steps can be controlled via the arguments of the function.
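To make the default behaviour more concrete, the following is a minimal base-R sketch of comparable cleaning steps. It is not the package's implementation: the helper name `sketch_remove_non_word_chars`, the regular expressions, the choice to keep `#` and `@`, and the reading of "digits-words" as standalone numbers are all assumptions made for illustration. Apostrophized words, emoji, and hyphenation are left out, and the patterns are intended for ASCII input.

```r
# Hypothetical helper for illustration only; not part of the documented package.
sketch_remove_non_word_chars <- function(text) {
  mean_before <- mean(nchar(text))

  out <- gsub("https?://\\S+", " ", text, perl = TRUE)   # URLs
  out <- gsub("&[A-Za-z0-9#]+;", " ", out, perl = TRUE)  # HTML entities such as &amp;
  out <- gsub("\\b\\d+\\b", " ", out, perl = TRUE)       # standalone numbers (assumed meaning of "digits-words")
  # Punctuation; '#' and '@' are kept (an assumption) so hashtags and mentions survive.
  out <- gsub("[^\\w\\s#@]", " ", out, perl = TRUE)
  out <- trimws(gsub("\\s+", " ", out, perl = TRUE))     # collapse leftover whitespace

  # Report the mean number of characters before and after, as the description mentions.
  message(sprintf("Mean characters before: %.1f, after: %.1f",
                  mean_before, mean(nchar(out))))
  out
}

sketch_remove_non_word_chars(c(
  "Check https://example.com now &amp; win 100 prizes!",
  "5G is fast"
))
```

Because the number pattern only matches tokens made up entirely of digits, a digitword such as 5G is left intact here, which mirrors the documented default of `rm.digitwords = FALSE`.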
A preprocessed character vector.
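## Not run:
# Illustrative input; any character vector can be used.
text <- c("An example with a #hashtag, an @mention, and a URL: https://example.com")
preprocess.removeNonWordChars(
  text,
  rm.hashtags = FALSE,
  rm.mentions = FALSE,
  rm.emoji = FALSE,
  rm.digitwords = FALSE,
  join.hyphenation = FALSE
)
## End(Not run)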