JSTOR_clusterbywords: Cluster documents by similarities in word frequencies


Description

Generates plots visualizing the results of different clustering methods applied to the documents. For use with JSTOR's Data for Research datasets (http://dfr.jstor.org/). For best results, run the function several times, each time adding frequently occurring but uninformative words to the stopword list and excluding them using the JSTOR_removestopwords function.
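As an illustrative sketch of that iterative workflow (a hypothetical session; the documented custom_stopwords argument is used here to exclude the extra words, since the signature of JSTOR_removestopwords is not shown on this page):

# First pass: inspect the clusters and note common but uninformative words
cl <- JSTOR_clusterbywords(nouns, "pirates")
cl$p
# Second pass: exclude those words and re-run
cl <- JSTOR_clusterbywords(nouns, "pirates",
                           custom_stopwords = c("also", "new", "may"))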

Usage

JSTOR_clusterbywords(nouns, word, custom_stopwords = NULL, f = 0.01)

Arguments

nouns

The object returned by the function JSTOR_dtmofnouns: a corpus containing the documents, with stopwords removed.

word

The word, or character vector of words, to subset the documents by; i.e., only documents containing this word (or these words) are used in the cluster analysis.

custom_stopwords

A character vector of stop words to use in addition to the default set supplied by the tm package.

f

A scalar value to filter the total number of words used in the cluster analyses. For each document, the count of each word is divided by the total number of words in that document, expressing the word's frequency as a proportion of all words in that document. This parameter is a threshold on the sum of these proportions over all documents (i.e., the column sums of the document term matrix). If f = 0.01, only words that constitute at least 1.0 percent of all words in all documents are used in the cluster analyses.
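As a minimal sketch of the filtering arithmetic described above (illustrative only, not the package's internal code; it assumes a tm DocumentTermMatrix of raw counts):

library(tm)
docs <- VCorpus(VectorSource(c("pirates sail the seas",
                               "privateers and pirates carry letters")))
dtm <- as.matrix(DocumentTermMatrix(docs))
# Convert raw counts to within-document proportions (each row sums to 1)
props <- dtm / rowSums(dtm)
# Sum each word's proportions over all documents (the column sums)
# and keep only words whose summed proportion reaches the threshold f
f <- 0.01
keep <- colSums(props) >= f
dtm_filtered <- dtm[, keep, drop = FALSE]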

Value

Returns plots of clusters of documents, and data frames of the affinity propagation clustering, k-means, and PCA outputs. The plots can be accessed and displayed using the $ operator, for example cl1$p or plot(cl1$cl_plot).

Examples

## cl1 <- JSTOR_clusterbywords(nouns, "pirates")
## cl2 <- JSTOR_clusterbywords(nouns, c("pirates", "privateers"))
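A further hypothetical example, showing the optional arguments and accessing the returned plots (element names as described under Value):

## cl3 <- JSTOR_clusterbywords(nouns, "pirates",
##                             custom_stopwords = c("also", "may"),
##                             f = 0.005)
## cl3$p               # display one of the cluster plots
## plot(cl3$cl_plot)   # display another, via plot()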
