JSTOR_findassocs: Plot the words with the strongest correlation with a given...


Description

Generates a plot of the words across all documents that correlate most positively with a given word, for each range of years. The number of words plotted is set by topn and the width of each year range by n. For use with JSTOR's Data for Research datasets (http://dfr.jstor.org/). For best results, run the function again after adding common words to the stopword list. To learn more about editing the stopword list, see the help for the JSTOR_dtmofnouns function.
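Conceptually, each association is a Pearson correlation between the per-document counts of word and the per-document counts of every other word. The minimal sketch below shows that single step on a made-up document-term matrix stored as a base R matrix. word_assocs() is a hypothetical helper, not part of JSTORr, and whether JSTORr computes its correlations exactly this way is an assumption; the tm package's findAssocs() offers a similar correlation-based lookup on its own DocumentTermMatrix class.

```r
# Toy document-term matrix (made-up counts): rows = documents, columns = words
dtm <- matrix(
  c(3, 0, 2, 1,
    4, 1, 3, 0,
    0, 2, 0, 3,
    1, 3, 1, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("doc", 1:4), c("pirates", "trade", "ships", "law"))
)

# Hypothetical helper (not part of JSTORr): Pearson correlation between the
# counts of `word` and the counts of every other word, keeping strong positives
word_assocs <- function(dtm, word, corlimit = 0.4) {
  others <- dtm[, colnames(dtm) != word, drop = FALSE]
  r <- cor(dtm[, word], others)[1, ]   # 1 x k matrix -> named vector
  r <- r[!is.na(r) & r >= corlimit]    # drop NA (zero variance) and weak/negative
  sort(r, decreasing = TRUE)
}

word_assocs(dtm, "pirates")  # ships: about 0.99
```

Here trade and law are negatively correlated with pirates, so only ships clears the threshold.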

Usage

JSTOR_findassocs(unpack1grams, nouns, word, n = 5, corlimit = 0.4,
  plimit = 0.05, topn = 20, biggest = 5, parallel = FALSE)

Arguments

unpack1grams

The object returned by the function JSTOR_unpack1grams.

nouns

The object returned by the function JSTOR_dtmofnouns: a document-term matrix containing the documents.

word

The word with which to calculate correlations.

n

The number of years by which to aggregate documents. For example, n = 5 (the default value) groups all documents published in non-overlapping five-year ranges. Note that high n values combined with strict corlimit and plimit thresholds will severely filter the output. For exploratory data analysis, it is recommended to start with low n values and work up.
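One way to form non-overlapping n-year ranges is to count them from the earliest publication year in the data. The helper year_range_labels() below is hypothetical, and anchoring the ranges at the earliest year (rather than, say, at a multiple of n) is an assumption, not documented JSTORr behavior.

```r
# Hypothetical helper (an assumption, not JSTORr's code): label each
# publication year with the non-overlapping n-year range it falls in,
# counting ranges from the earliest year in the data
year_range_labels <- function(years, n = 5) {
  start <- min(years)
  lo <- start + ((years - start) %/% n) * n   # first year of each range
  paste0(lo, "-", lo + n - 1)
}

years <- c(1990, 1992, 1994, 1995, 2001)
year_range_labels(years, n = 5)
# returns "1990-1994" "1990-1994" "1990-1994" "1995-1999" "2000-2004"
table(year_range_labels(years, n = 10))
# 1990-1999: 4 documents, 2000-2009: 1 document
```

With n = 10 the same five years collapse into two ranges: each range holds more documents, but there are fewer ranges to compare.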

corlimit

The lower threshold for the Pearson correlation coefficient between word and other words (default is 0.4).

plimit

The upper threshold for the p-value of the Pearson correlation test (default is 0.05).
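The two thresholds filter independently: a word must be correlated strongly enough (corlimit) and significantly enough (plimit). The sketch below illustrates that combination with base R's cor.test() on made-up counts. filter_assocs() is hypothetical, and treating plimit as a cutoff on the two-sided test p-value is an assumption based on the argument's name and default.

```r
# Made-up counts of the target word and three candidate words in six documents
target <- c(3, 4, 0, 1, 5, 0)
candidates <- list(
  ships = c(2, 3, 0, 1, 4, 0),
  trade = c(0, 1, 2, 3, 0, 4),
  law   = c(1, 2, 1, 0, 2, 1)
)

# Hypothetical filter (an assumption, not JSTORr's code): keep words whose
# Pearson correlation with the target is at least corlimit AND whose
# two-sided test p-value is below plimit
filter_assocs <- function(target, candidates, corlimit = 0.4, plimit = 0.05) {
  stats <- vapply(candidates, function(x) {
    ct <- cor.test(target, x, method = "pearson")
    c(r = unname(ct$estimate), p = ct$p.value)
  }, FUN.VALUE = c(r = 0, p = 0))
  stats <- t(stats)                      # one row per candidate word
  keep <- stats[, "r"] >= corlimit & stats[, "p"] < plimit
  stats[keep, , drop = FALSE]
}

filter_assocs(target, candidates)                # only "ships" survives
filter_assocs(target, candidates, plimit = 0.2)  # "law" (r about 0.73) is added
```

Here trade fails corlimit because it is negatively correlated, while law passes corlimit but, with only six documents, its p-value (about 0.1) fails plimit = 0.05.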

topn

An integer giving the number of top-ranking words to plot. For example, topn = 20 (the default value) plots the top 20 words for each range of years.
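Selecting the top words per year range amounts to sorting each range's correlations and keeping the first topn. The sketch below, using made-up correlations and the hypothetical helpers top_words() and to_long(), also flattens the result into a long data frame, a shape convenient for plotting; ranking by correlation and this column layout are assumptions, not JSTORr's documented internals.

```r
# Made-up correlations with the target word, already filtered, per year range
assocs <- list(
  "1990-1994" = c(ships = 0.91, navy = 0.55, trade = 0.47, port = 0.42),
  "1995-1999" = c(law = 0.61, navy = 0.83)
)

# Hypothetical helper: keep at most topn words per range, strongest first
top_words <- function(assocs, topn = 20) {
  lapply(assocs, function(r) head(sort(r, decreasing = TRUE), topn))
}

# Hypothetical helper: one row per (range, word), convenient for plotting
to_long <- function(assocs) {
  rows <- lapply(names(assocs), function(rng) {
    r <- assocs[[rng]]
    data.frame(range = rng, word = names(r), cor = unname(r),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

to_long(top_words(assocs, topn = 2))
```

A range with fewer qualifying words than topn simply contributes fewer rows.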

biggest

An integer controlling the maximum size of the text in the plot.

parallel

Logical. If TRUE, attempts to run the function on multiple cores. Note that this may actually be slower if you have only one core or limited memory, or if the dataset is small, because of the overhead of communicating data between cores.
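Because each year range can be processed independently, the per-range work is a natural unit to spread across cores. The sketch below compares a serial lapply() with mclapply() from R's bundled parallel package on made-up data; splitting the work one task per range is an assumption about how such parallelism could be organized, not a description of JSTORr's implementation. For tiny inputs like these, process startup overhead usually outweighs any speedup, which is the caveat noted above.

```r
library(parallel)

# Made-up per-range data: each matrix stands for the documents (rows) and
# words (columns) of one year range; column 1 plays the role of `word`
set.seed(42)
ranges <- list(
  "1990-1994" = matrix(rpois(60, 3), nrow = 15),
  "1995-1999" = matrix(rpois(60, 3), nrow = 15),
  "2000-2004" = matrix(rpois(60, 3), nrow = 15)
)

# Per-range work: correlate column 1 with every other column
range_cor <- function(m) cor(m[, 1], m[, -1])[1, ]

# Serial baseline
serial <- lapply(ranges, range_cor)

# Parallel sketch (assumption: one task per year range). mclapply() relies on
# forking, which Windows does not support, so use a single core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L
in_parallel <- mclapply(ranges, range_cor, mc.cores = cores)

all.equal(serial, in_parallel)  # TRUE: same results, only the scheduling differs
```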

Value

Returns a plot of the most frequent words per year range, with word size scaled to frequency, and a data frame with the words and counts for each year range.

Examples

## findassocs <- JSTOR_findassocs(unpack1grams, nouns, "rogues")
## findassocs <- JSTOR_findassocs(unpack1grams, nouns, "pirates", n = 10, topn = 100)
## findassocs <- JSTOR_findassocs(unpack1grams, nouns, "marines", n = 5, corlimit = 0.6, plimit = 0.001)

benmarwick/JSTORr documentation built on May 12, 2019, 12:59 p.m.