JSTOR_freqwords: Plot the most frequent words by time intervals


Description

Generates a plot of the topn most frequent words across all documents for each range of years. For use with JSTOR's Data for Research datasets (http://dfr.jstor.org/). For best results, run the function several times: after each run, add any uninformative common words to the stopword list and exclude them by re-running the JSTOR_dtmofnouns function. The location of the English stopwords list can be found by entering this at the R prompt: paste0(.libPaths()[1], "/tm/stopwords/english.dat")
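The stopword workflow above amounts to dropping every term that appears in the combined default and custom stopword lists. The base R sketch below illustrates that filtering step with small hypothetical vectors; it is not the package's internal code, and in practice the default list would come from tm (for example tm::stopwords("english")) rather than being typed in by hand.

```r
# Hypothetical stand-in for tm's default English stopwords (a tiny subset)
default_stopwords <- c("the", "and", "of")
# Hypothetical words found to be uninformative after a first run
custom_stopwords <- c("press", "university")

all_stopwords <- c(default_stopwords, custom_stopwords)

# Hypothetical terms extracted from the documents
terms <- c("archaeology", "the", "press", "site", "university", "pottery")

# Keep only the terms that are not in the combined stopword list
kept <- terms[!terms %in% all_stopwords]
print(kept)
```

Re-running with a longer custom_stopwords vector removes more terms, which is why repeating the process tends to sharpen the plot.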

Usage

JSTOR_freqwords(unpack1grams, nouns, custom_stopwords = NULL, n = 5,
  lowfreq = 300, topn = 20, biggest = 10)

Arguments

unpack1grams

The object returned by the function JSTOR_unpack1grams.

nouns

The object returned by the function JSTOR_dtmofnouns: a document-term matrix of nouns.

custom_stopwords

A character vector of stopwords to use in addition to the default set supplied by the tm package.

n

The number of years to aggregate documents by. For example, n = 5 (the default value) will group all documents into non-overlapping five-year ranges.

lowfreq

An integer for the minimum frequency of a word to be included in the plot. Default is 300.

topn

An integer for the number of top-ranking words to plot. For example, topn = 20 (the default value) will plot the top 20 words for each range of years.

biggest

An integer controlling the maximum size of the text in the plot.
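To make the interplay of n, lowfreq and topn concrete, the base R sketch below groups hypothetical documents into non-overlapping n-year ranges anchored at the earliest year, sums word counts per range, drops words below lowfreq, and keeps the topn words in each range. It is an illustration under stated assumptions, not the package's implementation: the data are invented, and JSTOR_freqwords may place range boundaries differently or apply lowfreq across the whole corpus rather than within each range.

```r
# Hypothetical input: one row per (document year, word, count)
docs <- data.frame(
  year  = c(1971, 1973, 1974, 1978, 1979, 1982),
  word  = c("site", "site", "pottery", "kiln", "site", "kiln"),
  count = c(200, 250, 320, 150, 400, 90)
)

n       <- 5    # years per range
lowfreq <- 300  # minimum total count for a word (applied per range here)
topn    <- 2    # words kept per range

# Assign each document to a non-overlapping n-year range starting at the earliest year
start <- min(docs$year) + ((docs$year - min(docs$year)) %/% n) * n
docs$range <- paste0(start, "-", start + n - 1)

# Sum word counts within each range
totals <- aggregate(count ~ range + word, data = docs, FUN = sum)

# Drop rare words, then keep the topn most frequent words in each range
totals <- totals[totals$count >= lowfreq, ]
top <- do.call(rbind, lapply(split(totals, totals$range), function(d) {
  head(d[order(-d$count), ], topn)
}))
rownames(top) <- NULL
print(top)
```

With these inputs, the 1981-1985 range disappears entirely because its only word falls below lowfreq, which is one reason lowering lowfreq (as in the first example below) can be useful for sparse datasets.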

Value

Returns a plot of the most frequent words in each range of years, with word size scaled to frequency, and a data frame of words and their counts for each range of years. If the result is assigned to freqwords, the plot is accessed via freqwords$plot$plot (note that plot appears twice) and the data frame via freqwords$freqterms.

Examples

## freqwords <- JSTOR_freqwords(unpack1grams, nouns, n = 2, biggest = 5, lowfreq = 100, topn = 5)
## freqwords <- JSTOR_freqwords(unpack1grams, nouns)

benmarwick/JSTORr documentation built on May 12, 2019, 12:59 p.m.