text_unnest_remove_stem_words: Pre-clean consultation responses


View source: R/text_unnest_remove_stem_words.R

Description

Takes a dataframe with a free-text column as input. Unnests the free text into words, removes stopwords, profanity, number-only words and an optional custom list of words, then stems the remaining words (using hunspell) so that variants of the same word can be grouped together.
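
The steps described above can be sketched with dplyr, tidytext and hunspell. This is a minimal illustration under stated assumptions, not the package's actual source: the example data, the profanity_words and custom_words vectors, the use of tidytext's stop_words list, and the rule for choosing a stem are all hypothetical.

```r
library(dplyr)
library(tidytext)
library(hunspell)

# Hypothetical example data and word lists (not from the package)
responses <- tibble(
  id = 1:2,
  answer = c("The NHS clinics were closing in 2020",
             "Waiting times are increasing, darn it")
)
profanity_words <- c("darn")
custom_words <- c("nhs")

words <- responses %>%
  unnest_tokens(word, answer, token = "words") %>%        # one lower-cased word per row
  anti_join(stop_words, by = "word") %>%                  # drop stopwords
  filter(!grepl("^[0-9]+$", word)) %>%                    # drop number-only words
  filter(!word %in% c(profanity_words, custom_words))     # drop profanity and custom words

# hunspell_stem() returns a list with zero or more candidate stems per word.
# Assumption: take the last candidate, falling back to the original word.
candidates <- hunspell_stem(words$word)
words$stem <- vapply(
  seq_along(candidates),
  function(i) {
    stems <- candidates[[i]]
    if (length(stems) > 0) stems[length(stems)] else words$word[i]
  },
  character(1)
)
```

Grouping on the stem column (for example with dplyr::count(words, stem)) then combines inflected forms that hunspell maps to the same stem.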

Usage

text_unnest_remove_stem_words(
  data,
  text_col,
  token = "words",
  custom_words = c(""),
  ...
)

Arguments

data

dataframe of responses

text_col

name of column containing free text to be analysed

token

unit of text to split the responses into. Defaults to "words"; the other options are those accepted by tidytext::unnest_tokens

custom_words

optional character vector of additional words to remove; defaults to c("")

Details

Note that, although you may choose a token other than "words" (such as "sentences"), the word removal and stemming steps expect word tokens only.
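
The following sketch illustrates why this matters, using tidytext directly rather than this function. The example data and the use of tidytext's stop_words list are assumptions made for illustration.

```r
library(dplyr)
library(tidytext)

# Hypothetical example response
responses <- tibble(
  id = 1,
  answer = "The clinic is closing. Waiting times are long."
)

# With token = "sentences", each row holds a whole sentence
sentence_tokens <- responses %>%
  unnest_tokens(sentence, answer, token = "sentences")

# A word-level stopword list never matches a whole sentence,
# so this anti_join removes nothing
kept <- sentence_tokens %>%
  anti_join(stop_words, by = c("sentence" = "word"))
```

The same mismatch applies to word-level stemming and to any custom word list, which is why word tokens are expected.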

Value

dataframe of unnested, stemmed words, with stopwords, profanity, number-only words and any custom words removed

Examples


DataS-DHSC/consultations documentation built on Jan. 28, 2022, 1:56 a.m.