text_remove_words: Remove words before analysis


View source: R/text_remove_words.R

Description

After unnesting the words from the full free-text column, you may want to filter out certain groups of words: stopwords like 'and' or 'the', profanity, number-only words, or a list of custom words. This function removes any combination of these groups in a single call.

Usage

text_remove_words(
  unnest_data,
  word_col = "word",
  stopwords = TRUE,
  profanity = TRUE,
  number_only = TRUE,
  custom_words = c("")
)

Arguments

unnest_data

data frame of unnested free-text data (one row per word, e.g. as prepared by tidytext::unnest_tokens)

word_col

column name containing word tokens

stopwords

do you want to remove stopwords? TRUE/FALSE

profanity

do you want to remove profanity? TRUE/FALSE. The default profanity list is available at https://www.cs.cmu.edu/~biglou/resources/bad-words.txt

number_only

do you want to remove "words" that are only numbers? These are usually years or phone numbers. TRUE/FALSE

custom_words

do you have a custom list of words to remove? Must be supplied as a character vector.

Value

data frame with the rows for the removed words filtered out

Examples

text_remove_words(
  data.frame(doc_id = c(1, 2, 3, 4), word = c('1', 'test', 'the', 'function')),
  word_col = 'word'
)
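
To make the effect of number_only and custom_words more concrete, the following is a minimal base R sketch of that kind of row filtering. It is not the package's implementation: the regular expression, the variable names, and the custom word "test" are assumptions chosen for illustration. Stopword and profanity removal are left out because they depend on word lists that are not reproduced here.

```r
# Illustrative sketch only; text_remove_words() may work differently internally.
words <- data.frame(
  doc_id = c(1, 2, 3, 4),
  word = c("1", "test", "the", "function"),
  stringsAsFactors = FALSE
)

# Hypothetical custom list, supplied as a character vector
custom_words <- c("test")

# Assumed definition of a number-only "word": digits and nothing else
is_number_only <- grepl("^[0-9]+$", words$word)

# Rows whose token appears in the custom list
is_custom <- words$word %in% custom_words

# Keep every row that is in neither group
filtered <- words[!(is_number_only | is_custom), , drop = FALSE]
filtered
```

In this sketch the rows for '1' and 'test' are dropped and the rows for 'the' and 'function' are kept. With stopwords = TRUE, text_remove_words would presumably also remove 'the'.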

DataS-DHSC/consultations documentation built on Jan. 28, 2022, 1:56 a.m.