View source: R/text_remove_words.R
After unnesting the words from a free-text column, you may want to filter out certain groups of words: stopwords such as 'and' or 'the', profanity, number-only words, or a custom list of words. This function removes any of these groups in a single call, with each group switched on or off by its own argument.
text_remove_words(
  unnest_data,
  word_col = "word",
  stopwords = TRUE,
  profanity = TRUE,
  number_only = TRUE,
  custom_words = c("")
)
unnest_data | Data frame with unnested free-text data (one row per word, e.g. as prepared by tidytext::unnest_tokens).
word_col | Name of the column containing the word tokens.
stopwords | Remove stopwords? TRUE/FALSE.
profanity | Remove profanity? TRUE/FALSE. The default profanity list is available at https://www.cs.cmu.edu/~biglou/resources/bad-words.txt
number_only | Remove "words" that are only numbers? These are usually years or phone numbers. TRUE/FALSE.
custom_words | Custom list of words to remove. Must be supplied as a character vector.
Returns the input data frame with the rows for the selected word groups filtered out.
text_remove_words(
  data.frame(doc_id = c(1, 2, 3, 4), word = c('1', 'test', 'the', 'function')),
  word_col = 'word'
)
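To make the filtering concrete, here is a minimal base R sketch of the kind of row filter described above. It is not the package's source: `remove_words_sketch` is a hypothetical helper, its five-word stopword list is an illustrative assumption, and the profanity step is left out because it depends on the external word list.

```r
# Hypothetical stand-in for the filtering step; NOT the package's implementation.
remove_words_sketch <- function(unnest_data,
                                word_col = "word",
                                stopwords = TRUE,
                                number_only = TRUE,
                                custom_words = character(0)) {
  # Compare on lower-cased tokens so "The" and "the" are treated alike.
  words <- tolower(as.character(unnest_data[[word_col]]))
  drop <- rep(FALSE, length(words))

  if (stopwords) {
    # Tiny illustrative sample (assumption), not a real stopword lexicon.
    sample_stopwords <- c("a", "an", "and", "of", "the")
    drop <- drop | words %in% sample_stopwords
  }
  if (number_only) {
    # Tokens made up entirely of digits, e.g. years or phone numbers.
    drop <- drop | grepl("^[0-9]+$", words)
  }
  drop <- drop | words %in% tolower(custom_words)

  # Keep the rows that were not flagged; preserve the data frame structure.
  unnest_data[!drop, , drop = FALSE]
}

tokens <- data.frame(
  doc_id = c(1, 2, 3, 4),
  word = c("1", "test", "the", "function"),
  stringsAsFactors = FALSE
)
filtered <- remove_words_sketch(tokens, word_col = "word")
print(filtered$word)  # the sketch keeps "test" and "function"
```

In this sketch each word group contributes to a single logical mask, so switching a group off simply skips that test, and the remaining columns (here doc_id) are carried through unchanged.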