View source: R/topic_modeling_utilities.R
Extracts probable terms from a set of documents. "Probable" here means more probable in that set of documents than in the corpus overall.
GetProbableTerms(docnames, dtm, p_terms = NULL)
docnames | A character vector of row names of dtm identifying the set of documents
dtm | A document term matrix of class
p_terms | A numeric vector giving the probability of each term in the corpus, with names corresponding to colnames(dtm). Defaults to NULL.
Returns a numeric vector in the same format as p_terms. Each entry is the difference between the probability of drawing a term from the set of documents given by docnames and the probability of drawing that term from the corpus overall (p_terms). Positive entries mark terms that are more probable in the selected documents than in the corpus.
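The quantity described above can be sketched in base R on a toy matrix. This is only an illustration of the stated definition, not the package's actual implementation; the matrix, the document names, and the variable names p_docs and diff_probs are all made up for the example.

```r
# Toy document-term matrix (hypothetical data): 3 documents, 3 terms
dtm <- matrix(c(2, 0, 1,
                0, 3, 1,
                1, 1, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("d1", "d2", "d3"),
                              c("apple", "bank", "cell")))

# The set of documents of interest, given by row name
docnames <- c("d1", "d3")

# Probability of drawing each term from the corpus overall
p_terms <- colSums(dtm) / sum(dtm)

# Probability of drawing each term from the selected documents only
sub_dtm <- dtm[docnames, , drop = FALSE]
p_docs <- colSums(sub_dtm) / sum(sub_dtm)

# Difference in probability: positive means over-represented in the set
diff_probs <- p_docs - p_terms
sort(diff_probs, decreasing = TRUE)
```

Because both vectors are probability distributions over the same terms, the differences sum to zero: any term that gains probability in the selected documents does so at the expense of the others.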
# Load a pre-formatted dtm and topic model
# (assumes the textmineR package, which ships these sample objects, is installed)
library(textmineR)
data(nih_sample_topic_model)
data(nih_sample_dtm)
# Documents with a topic proportion of 0.25 or higher for topic 2
mydocs <- rownames(nih_sample_topic_model$theta)[ nih_sample_topic_model$theta[ , 2 ] >= 0.25 ]
# Corpus-wide probability of each term
term_probs <- Matrix::colSums(nih_sample_dtm) / sum(Matrix::colSums(nih_sample_dtm))
# Difference between in-set and corpus-wide term probabilities
GetProbableTerms(docnames = mydocs, dtm = nih_sample_dtm, p_terms = term_probs)
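Since the return value is a named numeric vector of probability differences, a common follow-up is to rank it and keep the most over-represented terms. The sketch below uses a small hand-written vector as a stand-in for real output; the term names and values are hypothetical, as are the variable names.

```r
# Hypothetical stand-in for the returned vector (names are terms,
# values are probability differences; these numbers are invented)
probable_terms <- c(cancer = 0.012, cell = 0.004, the = -0.020,
                    study = -0.001, tumor = 0.009)

# Keep terms that are more probable in the document set than in the corpus
over_represented <- probable_terms[probable_terms > 0]

# Rank from most to least over-represented and take the top two
top_terms <- names(head(sort(over_represented, decreasing = TRUE), 2))
top_terms
```

Filtering on positive values before sorting keeps common corpus-wide words, which tend to have negative differences, out of the ranking.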