plotRemoved | R Documentation |
A plot function that shows how different lower thresholds in prepDocuments affect the size of the corpus.
plotRemoved(documents, lower.thresh)
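documents | The documents to be used for the stm model: a list in which each document is a 2-row matrix of vocabulary indices (first row) and word counts (second row). |
lower.thresh | A vector of integers, each of which will be tested as a lower threshold for the prepDocuments function. |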
For a lower threshold, prepDocuments will drop words that appear in fewer than that number of documents and then remove any documents that have no words left. This function lets the user pass a vector of lower thresholds and observe how prepDocuments would handle each one.
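The rule described above can be illustrated on a toy corpus. The R sketch below is not stm's code: `drop_below_threshold` is a hypothetical helper that assumes the stm document format (2-row matrices of vocabulary index and count) and the rule as stated here, namely drop words found in fewer than `t` documents, then drop documents left empty. Unlike prepDocuments, it does not re-index the vocabulary.

```r
# Hypothetical helper, not part of stm: applies the lower-threshold rule
# described above to stm-style documents (2-row matrices: vocab index, count).
drop_below_threshold <- function(documents, t) {
  idx <- unlist(lapply(documents, function(d) d[1, ]))
  # Document frequency: each document lists a vocabulary index at most once
  doc_freq <- tabulate(idx, nbins = max(idx))
  keep_word <- doc_freq >= t  # drop words that appear in fewer than t documents
  # Strip dropped words from every document, keeping the 2-row shape
  filtered <- lapply(documents, function(d) d[, keep_word[d[1, ]], drop = FALSE])
  # Remove documents that have no words left
  keep_doc <- vapply(filtered, ncol, integer(1)) > 0
  list(documents = filtered[keep_doc],
       dropped_words = which(!keep_word),
       dropped_docs = which(!keep_doc))
}

# Toy corpus: 3 documents over a 4-word vocabulary
docs <- list(
  matrix(c(1L, 2L, 3L,
           2L, 1L, 1L), nrow = 2, byrow = TRUE),
  matrix(c(1L, 3L,
           1L, 4L), nrow = 2, byrow = TRUE),
  matrix(c(4L,
           3L), nrow = 2, byrow = TRUE)
)
res <- drop_below_threshold(docs, 2)
res$dropped_words  # 2 4: each appears in only one document
res$dropped_docs   # 3: its only word (4) was dropped
```

Document 3 disappears only because every word it contained fell below the threshold, which is why the number of documents removed typically grows more slowly than the number of words removed.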
This function produces three plots showing the number of words, the number of documents, and the total number of tokens removed as a function of the threshold value. In each plot, a dashed red line marks the corresponding total for the full corpus (words, documents, or tokens).
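The three-panel layout with a dashed red reference line can be sketched with base R graphics. This is an assumption about how such plots might be drawn, not stm's own plotting code; `plot_removed_panels` is a hypothetical name, and the counts passed in below are toy numbers for a three-document corpus, not output from plotRemoved.

```r
# Hypothetical sketch (not stm's own plotting code): one panel per quantity,
# each with a dashed red reference line at the corpus total.
plot_removed_panels <- function(lower.thresh, removed, totals) {
  labels <- c(nwords = "Words removed", ndocs = "Documents removed",
              ntokens = "Tokens removed")
  old <- par(mfrow = c(1, 3))  # three side-by-side panels
  on.exit(par(old))            # restore the previous layout afterwards
  for (what in names(labels)) {
    plot(lower.thresh, removed[[what]], type = "l",
         ylim = c(0, totals[[what]]),
         xlab = "Lower threshold", ylab = labels[[what]])
    abline(h = totals[[what]], lty = 2, col = "red")  # corpus total
  }
  invisible(NULL)
}

# Toy inputs for a three-document, four-word, 12-token corpus
thresh  <- c(1, 2, 3)
removed <- list(nwords = c(0, 2, 4), ndocs = c(0, 1, 3), ntokens = c(0, 4, 12))
totals  <- list(nwords = 4, ndocs = 3, ntokens = 12)
plot_removed_panels(thresh, removed, totals)
```

Fixing each y-axis to run from zero up to the corpus total makes it easy to read how close a threshold comes to wiping out the corpus.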
Invisibly returns a list with the following elements:
lower.thresh | The sorted threshold values. |
ndocs | The number of documents dropped for each value of the lower threshold. |
nwords | The number of vocabulary entries dropped for each value of the lower threshold. |
ntokens | The number of tokens dropped for each value of the lower threshold. |
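To make the returned list concrete, the R sketch below builds a list with the same element names from a toy corpus. It is a hypothetical re-creation (`removed_counts` is not an stm function) that assumes the rule stated above: a word is dropped if it appears in fewer than `t` documents, and a document is dropped once all of its words are. Because the result is returned with `invisible()`, it has to be assigned to be inspected, which is also the case for plotRemoved.

```r
# Hypothetical re-creation of the returned list (not stm's implementation).
removed_counts <- function(documents, lower.thresh) {
  lower.thresh <- sort(lower.thresh)
  idx <- unlist(lapply(documents, function(d) d[1, ]))  # vocabulary indices
  cnt <- unlist(lapply(documents, function(d) d[2, ]))  # matching counts
  doc_freq <- tabulate(idx, nbins = max(idx))            # documents per word
  tokens <- tabulate(rep(idx, cnt), nbins = max(idx))    # tokens per word
  ndocs <- nwords <- ntokens <- numeric(length(lower.thresh))
  for (i in seq_along(lower.thresh)) {
    drop_word <- doc_freq < lower.thresh[i]
    nwords[i] <- sum(drop_word)
    ntokens[i] <- sum(tokens[drop_word])
    # A document is dropped only if every word in it is dropped
    ndocs[i] <- sum(vapply(documents,
                           function(d) all(drop_word[d[1, ]]), logical(1)))
  }
  invisible(list(lower.thresh = lower.thresh, ndocs = ndocs,
                 nwords = nwords, ntokens = ntokens))
}

# Toy corpus: 3 documents over a 4-word vocabulary
docs <- list(
  matrix(c(1L, 2L, 3L, 2L, 1L, 1L), nrow = 2, byrow = TRUE),
  matrix(c(1L, 3L, 1L, 4L), nrow = 2, byrow = TRUE),
  matrix(c(4L, 3L), nrow = 2, byrow = TRUE)
)
res <- removed_counts(docs, c(3, 1, 2))  # returned invisibly, so assign it
str(res)
```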
See also: prepDocuments
library(stm)  # provides plotRemoved and the poliblog5k.docs example data
plotRemoved(poliblog5k.docs, lower.thresh = seq(from = 10, to = 1000, by = 10))