word_frequencies_by_category: Return top n word frequencies in a collection of text...


Description

Splits the verbatim text into words and counts the words within each level of a categorical variable. Either plots or returns the top n words in each level of the categorical variable across a whole dataset or collection of text documents.
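As a rough illustration of the idea described above, the following base-R sketch counts the most frequent words within each category. It is not the package's implementation: `top_words_by_category`, the toy data frame `toy`, and the simple letters-only tokenizer are all assumptions made for this example.

```r
# Hypothetical helper (not part of the package): top n words per category.
top_words_by_category <- function(df, text_col, categories_col, n = 10) {
  # Group the text by the levels of the categorical column
  groups <- split(as.character(df[[text_col]]), df[[categories_col]])
  lapply(groups, function(texts) {
    # Lowercase, then split on anything that is not a letter or apostrophe
    words <- unlist(strsplit(tolower(texts), "[^a-z']+"))
    words <- words[nzchar(words)]
    # Count each word and convert the table to a named integer vector
    tab <- table(words)
    counts <- setNames(as.integer(tab), names(tab))
    counts <- sort(counts, decreasing = TRUE)
    # Keep at most the n most frequent words
    counts[seq_len(min(n, length(counts)))]
  })
}

# Toy data, invented for illustration only
toy <- data.frame(
  text = c("good service good price", "slow service",
           "good app", "app crashed app slow"),
  Qtr = c("Q1", "Q1", "Q2", "Q2"),
  stringsAsFactors = FALSE
)

result <- top_words_by_category(toy, "text", "Qtr", n = 2)
print(result)
```

The result is a list with one named count vector per category, which is roughly the kind of table a plot of top words per level would be drawn from.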

Usage

word_frequencies_by_category(df, text_col, categories_col,
  number_of_words = 1, number_of_words_to_plot = 10, plot = TRUE,
  clean_text = FALSE)

Arguments

df

a tidy data frame/tibble.

text_col

the name of the text column within df

categories_col

the name of the factor/categorical column within df; words are counted within each of its categories (levels)

number_of_words

whether to return a plot/df of single words (1), bigrams (2) or trigrams (3) within each category. Returns single words in each category by default

number_of_words_to_plot

how many words/terms to plot within each level of categories_col. Plots the top 10 words in each category by default

plot

whether to return a ggplot2 plot. TRUE by default

clean_text

whether to pre-process the text. FALSE by default. If TRUE, lemmatizes the words and removes extra spaces before counting
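To make the number_of_words argument more concrete, here is a minimal base-R sketch of what single words, bigrams and trigrams look like for one piece of text. The helper `make_ngrams` and its tokenizer are hypothetical and only illustrate the concept; the package may tokenize differently.

```r
# Hypothetical helper (not part of the package): build n-grams from a string.
make_ngrams <- function(text, n = 1) {
  # Lowercase, then split on anything that is not a letter or apostrophe
  words <- unlist(strsplit(tolower(text), "[^a-z']+"))
  words <- words[nzchar(words)]
  # Not enough words to form a single n-gram
  if (length(words) < n) return(character(0))
  # Slide a window of n consecutive words over the text
  starts <- seq_len(length(words) - n + 1)
  vapply(starts,
         function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

unigrams <- make_ngrams("the app keeps crashing", n = 1)
bigrams  <- make_ngrams("the app keeps crashing", n = 2)
trigrams <- make_ngrams("the app keeps crashing", n = 3)
print(bigrams)
```

Under this reading, number_of_words = 2 would count terms such as "app keeps" rather than individual words.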

See Also

word_frequencies

Examples

## Not run: 
data("text_data")
word_frequencies_by_category(verbatim, text_col = text, categories_col = Qtr)
word_frequencies_by_category(verbatim, text_col = text, categories_col = Qtr, clean_text = TRUE)
word_frequencies_by_category(verbatim, text, Qtr, number_of_words = 2, number_of_words_to_plot = 20)
word_frequencies_by_category(verbatim, text, Qtr, clean_text = TRUE, number_of_words = 3)

## End(Not run)

fahadshery/textsummary documentation built on May 6, 2019, 7:02 p.m.