Description
Tokenises, lemmatises, tags, and dependency-parses raw text using udpipe as a backend, then counts word frequencies within each level of a categorical variable. Either plots the top words per category or returns the tokenised data frame.
Usage

word_freqs_by_category(df, text_col, categories_col,
  number_of_words_to_plot = 10, plot = TRUE, grammer_phrase = "NOUN",
  word_type = lemma)
Arguments

df
a data.frame or a tibble

text_col
the name of the text column within df

categories_col
the name of the factor/categorical column whose levels the word counts are calculated within

number_of_words_to_plot
how many words/terms to plot within each level of categories_col; plots the top 10 words in each category by default

plot
if TRUE (the default), return a ggplot2 plot; if FALSE, return the tokenised/lemmatised data frame created by the udpipe model

grammer_phrase
the universal part-of-speech tag to filter on, such as "NOUN", "VERB", "ADJ", "PRON", "AUX" or "NUM"; see https://polyglot.readthedocs.io/en/latest/POS.html for the full list

word_type
whether to plot tokens or lemmas
Examples

## Not run:
data("text_data")
word_freqs_by_category(verbatim, text_col = text, categories_col = NPS_RATING)
word_freqs_by_category(verbatim, text_col = text, categories_col = NPS_RATING, word_type = token)
word_freqs_by_category(verbatim, text, Qtr, number_of_words_to_plot = 20)
## End(Not run)
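With plot = FALSE the function returns the udpipe-annotated data frame rather than a plot, which you can then filter yourself. A minimal sketch follows; the column names upos and lemma are assumed to follow udpipe's standard annotation output and are not confirmed by this help page:

```r
## Not run:
# Get the tokenised/lemmatised data frame instead of a plot,
# filtering on verbs rather than the default nouns.
toks <- word_freqs_by_category(verbatim,
                               text_col = text,
                               categories_col = NPS_RATING,
                               plot = FALSE,
                               grammer_phrase = "VERB")

# Inspect the verb lemmas (upos/lemma columns assumed from
# udpipe's annotation schema).
head(toks[toks$upos == "VERB", c("lemma", "upos")])
## End(Not run)
```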