R/topwords.R

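#' Calculate the top words for each estimated topic
#'
#' @param logbeta A K x V matrix of estimates returned from readSits(), with
#'   topics in rows and vocabulary words in columns. Column names must be the
#'   vocabulary words.
#' @param n Number of top words to return per topic.
#' @param w A weight between 0 and 1 used in STM's FREX scoring algorithm;
#'   passed to stm::calcfrex().
#' @param wordcounts A vector of word counts, one for each unique word in the
#'   vocabulary; passed to stm::calcfrex().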
#' 
#' @return A list with two elements, \code{highest_prob} and \code{frex}: the
#'   highest-probability words and the FREX words, with one column per topic.
#' 
#' @details ...
#'
#' @seealso \link[stm]{labelTopics}, \link[stm]{calcfrex}
#'
#' @export
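topwords <- function(logbeta, n = 10, w = 0.5, wordcounts = NULL){
    ## logbeta is a K x V matrix (topics x vocabulary); column names are the words
    vocab <- colnames(logbeta)
    if (is.null(vocab)) stop("logbeta must have column names (the vocabulary)")
    n <- min(n, ncol(logbeta))  # cannot return more words than the vocabulary holds

    ## Highest-probability words: sort each topic's row and keep the first n names
    highest_prob <- apply(logbeta, 1, function(r) names(sort(r, decreasing = TRUE))[seq_len(n)])

    ## FREX words: calcfrex() returns word indices ranked by FREX, one column per topic
    frex <- stm::calcfrex(logbeta = logbeta, w = w, wordcounts = wordcounts)
    frex <- apply(frex, 2, function(idx) vocab[idx[seq_len(n)]])

    return(list("highest_prob" = highest_prob,
                "frex" = frex))
}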
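
The two rankings can disagree, which is the reason for returning both. The sketch below illustrates this on a hypothetical two-topic, five-word matrix. It does not call stm: the FREX-style score is a simplified base-R stand-in (a weighted harmonic mean of within-topic frequency and exclusivity ranks) and is only assumed to resemble what stm::calcfrex() computes. The toy probabilities, the word list, and the names beta, excl, freq_rank, excl_rank and frex_score are all made up for illustration.

```r
# Hypothetical word probabilities: K = 2 topics (rows), V = 5 words (columns).
beta <- rbind(topic1 = c(0.40, 0.30, 0.20, 0.06, 0.04),
              topic2 = c(0.40, 0.04, 0.06, 0.20, 0.30))
colnames(beta) <- c("the", "tax", "vote", "goal", "team")
logbeta <- log(beta)

n <- 2    # number of top words per topic
w <- 0.5  # weight on the frequency rank (an assumption of this sketch)

# Highest-probability words: sort each topic's row, keep the first n names.
highest_prob <- apply(logbeta, 1, function(r) names(sort(r, decreasing = TRUE))[seq_len(n)])

# Exclusivity: each word's probability normalised across topics.
excl <- t(t(beta) / colSums(beta))

# Within-topic ranks scaled to (0, 1]; a larger value means more frequent or more exclusive.
freq_rank <- t(apply(beta, 1, rank)) / ncol(beta)
excl_rank <- t(apply(excl, 1, rank)) / ncol(beta)

# Simplified FREX-style score: weighted harmonic mean of the two ranks.
frex_score <- 1 / (w / freq_rank + (1 - w) / excl_rank)
frex <- apply(frex_score, 1, function(r) names(sort(r, decreasing = TRUE))[seq_len(n)])

print(highest_prob)  # first row is "the" for both topics
print(frex)          # first row is "tax" for topic1 and "team" for topic2
```

The shared word "the" tops both topics by probability, but it is not exclusive to either, so the FREX-style score ranks a topic-specific word first instead.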
erossiter/sitsr documentation built on May 23, 2019, 7:34 a.m.