GetProbableTerms: Get cluster labels using a "more probable" method of terms


View source: R/topic_modeling_utilities.R

Description

Extracts probable terms from a set of documents. "Probable" here means more probable in the selected documents than in the corpus overall.

Usage

GetProbableTerms(docnames, dtm, p_terms = NULL)

Arguments

docnames

A character vector of row names of dtm identifying the set of documents.

dtm

A document term matrix of class matrix or dgCMatrix.

p_terms

Optional; defaults to NULL. If supplied, a numeric vector giving the probability of each term in the corpus, with names corresponding to colnames(dtm).

Value

Returns a numeric vector in the same format as p_terms. Each entry is the difference between the probability of drawing a term from the set of documents given by docnames and the probability of drawing that term from the corpus overall (p_terms).
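To make the returned quantity concrete, the following is a minimal base R sketch of the difference described above, computed by hand on a toy matrix. It is an illustration under stated assumptions, not the package's source: the matrix, the document names, and the variable names p_subset and diff_probs are all hypothetical.

```r
# Toy document term matrix: 3 documents, 3 terms (hypothetical data)
dtm <- matrix(
  c(2, 0, 2,
    0, 4, 0,
    2, 0, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("doc1", "doc2", "doc3"),
                  c("apple", "banana", "cherry"))
)

# The set of documents to label
docnames <- c("doc1", "doc3")

# Probability of drawing each term from the corpus overall
p_terms <- colSums(dtm) / sum(dtm)

# Probability of drawing each term from the selected documents only
subset_counts <- colSums(dtm[docnames, , drop = FALSE])
p_subset <- subset_counts / sum(subset_counts)

# Difference: positive values mark terms that are more probable
# in the selected documents than in the corpus overall
diff_probs <- p_subset - p_terms

# Terms ordered from most to least over-represented in the subset
sort(diff_probs, decreasing = TRUE)
```

In this toy case "apple" and "cherry" get positive values and "banana" a negative one. Because both probability vectors sum to 1, the differences sum to 0.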

Examples

# Load a pre-formatted dtm and topic model
data(nih_sample_topic_model)
data(nih_sample_dtm) 

# documents with a topic proportion of .25 or higher for topic 2
mydocs <- rownames(nih_sample_topic_model$theta)[ nih_sample_topic_model$theta[ , 2 ] >= 0.25 ] 

term_probs <- Matrix::colSums(nih_sample_dtm) / sum(Matrix::colSums(nih_sample_dtm))

GetProbableTerms(docnames = mydocs, dtm = nih_sample_dtm, p_terms = term_probs)

textmineR documentation built on June 28, 2021, 9:08 a.m.