token_eyes: Tokenization and sentiment analysis.

View source: R/wordly_functions.R

Description

Creates one-word tokens, with options for applying stop-word lists and sentiment lexicons. A wrapper around tokenizers::tokenize_words().
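Because token_eyes() wraps tokenizers::tokenize_words(), its case, punctuation, and numeric options correspond to that function's arguments. A minimal sketch of the underlying call (argument names are from the tokenizers package; the mapping to token_eyes() options is an assumption based on the descriptions below):

```r
library(tokenizers)

# The underlying tokenizer. token_eyes()'s to_lower, to_strip_punct, and
# to_strip_numeric options presumably map onto these arguments.
tokenize_words(
  "The 3 quick brown foxes!",
  lowercase     = TRUE,   # cf. to_lower
  strip_punct   = TRUE,   # cf. to_strip_punct
  strip_numeric = FALSE   # cf. to_strip_numeric
)
```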

Usage

token_eyes(df, text_col_name = NULL, stop_word_src = NULL,
  sentiment_src = NULL, to_lower = TRUE, to_strip_punct = TRUE,
  to_strip_numeric = FALSE, remove_empty_tokens = TRUE,
  show_verbose = FALSE)

Arguments

df

An input data.frame or tibble.

text_col_name

The name of the column containing text to be tokenized.

stop_word_src

The stop-word source: either a character vector of custom stop words, or one of the pre-built lists "snowball", "stopwords-iso", "misc", or "smart".
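A custom stop-word list is passed as a character vector. A short sketch, assuming a data frame shaped like ori_dat from the Examples section:

```r
# Custom stop words supplied as a character vector rather than a
# pre-built list name (ori_dat as constructed in the Examples).
custom_stops <- c("the", "a", "and", "of")
ret_custom <- ori_dat %>%
  token_eyes(text_col_name = "doc_text", stop_word_src = custom_stops)
```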

sentiment_src

The sentiment source, e.g. "nrc".
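If the sentiment lexicons follow tidytext conventions (an assumption; the documentation only names "nrc"), a lexicon maps words to sentiment labels, which are then joined onto the tokens:

```r
# Hypothetical sketch: inspect the "nrc" lexicon via tidytext. Requires
# the tidytext and textdata packages; the lexicon is a word/sentiment tibble.
library(tidytext)
nrc <- get_sentiments("nrc")
head(nrc)
```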

to_lower

Should tokens be forced to lowercase? Defaults to TRUE.

to_strip_punct

Should punctuation be removed prior to tokenization? Defaults to TRUE.

to_strip_numeric

Should numeric values be removed prior to tokenization? Defaults to FALSE.

remove_empty_tokens

Should empty tokens (empty strings) be removed after tokenization? Defaults to TRUE.

show_verbose

Should verbose output be printed during processing? Defaults to FALSE.

Examples

ori_dat <- data.frame(
  doc_main = rep(c("Book_A", "Book_B", "Book_C"), each = 10),
  doc_sub  = rep(c("Chp_1", "Chp_2"), each = 5),
  doc_line = rep(1:10, 3),
  doc_text = stringr::sentences[1:30],
  stringsAsFactors = FALSE
)

# No stop words, no sentiment
ret_nsns <- ori_dat %>% token_eyes("doc_text")

# Stop words applied, no sentiment
ret_ysns <- ori_dat %>%
  token_eyes(text_col_name = "doc_text", stop_word_src = "smart")

# Stop words and sentiment applied
ret_ysys <- ori_dat %>%
  token_eyes(text_col_name = "doc_text", stop_word_src = "smart",
             sentiment_src = "nrc")

tomathon-io/wordly documentation built on June 15, 2020, 12:41 a.m.