Generates plots visualizing the results of different clustering methods applied to the documents. For use with JSTOR's Data for Research datasets (http://dfr.jstor.org/). For best results, repeat the function several times after adding common words to the stopword list and excluding them using the JSTOR_removestopwords function.
JSTOR_clusterbywords(nouns, word, custom_stopwords = NULL, f = 0.01)
nouns
The object returned by the function JSTOR_dtmofnouns. A corpus containing the documents with stopwords removed.
word
The word or vector of words to subset the documents by, i.e. only documents containing this word (or words) are used in the cluster analysis.
custom_stopwords
Character vector of stop words to use in addition to the default set supplied by the tm package.
f
A scalar value to filter the total number of words used in the cluster analyses. For each document, the count of each word is divided by the total number of words in that document, expressing a word's frequency as a proportion of all words in that document. This parameter corresponds to the summed proportions of a word in all documents (i.e. the column sum for the document term matrix). If f = 0.01 then only words that constitute at least 1.0 percent of all words in all documents will be used for the cluster analyses.
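The filtering that f describes can be sketched in base R on a toy document term matrix. This is only an illustration of the description above, not the package's actual code: the matrix, its word counts, and the threshold of 0.3 are invented for the example, and whether JSTOR_clusterbywords compares the column sums with f in exactly this way is an assumption.

```r
# Toy document term matrix of raw counts: rows are documents, columns are words.
# All counts are made up for illustration.
dtm <- matrix(
  c(5, 1, 0, 4,
    3, 0, 2, 5,
    4, 0, 0, 6),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("doc1", "doc2", "doc3"),
                  c("ship", "cutlass", "parrot", "crew"))
)

# Divide each count by the total number of words in its document,
# so every row holds proportions that sum to 1.
props <- dtm / rowSums(dtm)

# Sum each word's proportions over all documents (the column sums).
word_weight <- colSums(props)

# Keep only the words whose summed proportion reaches the threshold.
f <- 0.3  # hypothetical value, chosen so the toy data shows an effect
kept <- dtm[, word_weight >= f, drop = FALSE]
colnames(kept)
```

With these toy counts the rare words "cutlass" and "parrot" fall below the threshold and are dropped, so a larger f leaves fewer words for the cluster analyses.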
Returns plots of clusters of documents, and dataframes of affinity propagation clustering, k-means and PCA outputs. The plots can be accessed and displayed using the $ operator, for example with cl1$p or plot(cl1$cl_plot).
## cl1 <- JSTOR_clusterbywords(nouns, "pirates")
## cl2 <- JSTOR_clusterbywords(nouns, c("pirates", "privateers"))