R/getWordFrequency.R

#' getWordFrequency
#'
#' Calculate the frequency of each given word in a text (e.g. an email),
#' expressed as a percentage of the total number of words in that text.
#' @param text Text in which the word frequencies are calculated
#' @param words Character vector of words to include in the result
#' @return Named numeric vector of word frequencies, the same length as 'words'
#' @import tm
#' @examples
#' getWordFrequency("This is my text", c("this", "your"))
#'

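getWordFrequency <- function(text, words) {
  corpus <- Corpus(VectorSource(text))
  control <- list(wordLengths = c(1, Inf))
  # Take the total word count from the unrestricted matrix; restricting it to
  # 'words' would make the percentages relative to the dictionary only.
  totalWords <- sum(as.matrix(DocumentTermMatrix(corpus, control))[1, ])
  # Text without any words: return zeros instead of dividing by zero (NaN).
  if (totalWords == 0) {
    return(vapply(words, function(word) 0, double(1)))
  }
  # Count only the requested words; only the first document (row 1) is used.
  dtm <- as.matrix(DocumentTermMatrix(corpus, c(control, list(dictionary = words))))
  wordFreqVector <- vapply(words, function(word) dtm[1, word] * 100 / totalWords, double(1))
  return(wordFreqVector)
}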
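
# A minimal base-R sketch of the percentage-of-total-words calculation
# described in the documentation above, without tm, for checking results by
# hand. The helper name 'wordFrequencySketch' and the whitespace tokenisation
# are assumptions for illustration, not part of the package; tm may tokenise
# differently (e.g. around punctuation).
wordFrequencySketch <- function(text, words) {
  # Lowercase the first element of 'text' and split it on runs of whitespace.
  tokens <- strsplit(tolower(text[1]), "[[:space:]]+")[[1]]
  tokens <- tokens[nzchar(tokens)]
  # No tokens at all: report zero for every requested word.
  if (length(tokens) == 0) {
    return(vapply(words, function(word) 0, double(1)))
  }
  # Percentage of all tokens that equal each requested word.
  vapply(words, function(word) 100 * sum(tokens == word) / length(tokens), double(1))
}

# "this" is 1 of 4 tokens and "your" does not occur: this = 25, your = 0.
wordFrequencySketch("This is my text", c("this", "your"))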
megahf/spamfilter documentation built on May 29, 2019, 4:42 a.m.