TermDocFreq: Get term frequencies and document frequencies from a document...


View source: R/corpus_functions.R

Description

This function takes a document term matrix as input and returns a data frame with one row per term and columns for the term label, term frequency, document frequency, and inverse document frequency.

Usage

TermDocFreq(dtm)

Arguments

dtm

A document term matrix of class dgCMatrix.

Value

Returns a data.frame or tibble with 4 columns. The first column, term, is a vector of token labels. The second column, term_freq, is the number of times term appears in the entire corpus. The third column, doc_freq, is the number of documents in which term appears. The fourth column, idf, is the log-weighted inverse document frequency of term.
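To make the four columns concrete, the sketch below rebuilds them by hand for a toy matrix using only the Matrix package. It is not the textmineR source: the toy data and the names `dtm`, `term_freq`, `doc_freq`, `idf`, and `result` are made up for illustration, and the idf formula (natural log of the number of documents divided by doc_freq) is an assumption about what "log-weighted" means here.

```r
library(Matrix)

# Toy document term matrix (hypothetical data): 3 documents x 4 terms.
# sparseMatrix() with numeric x returns a dgCMatrix by default.
dtm <- sparseMatrix(
  i = c(1, 1, 2, 2, 3, 3),          # document (row) index of each nonzero count
  j = c(1, 2, 2, 3, 2, 4),          # term (column) index of each nonzero count
  x = c(2, 1, 3, 1, 1, 5),          # the counts themselves
  dims = c(3, 4),
  dimnames = list(paste0("doc", 1:3), c("apple", "bank", "cell", "data"))
)

# term_freq: total count of each term across the whole corpus.
term_freq <- Matrix::colSums(dtm)

# doc_freq: number of documents in which each term appears at least once.
doc_freq <- Matrix::colSums(dtm > 0)

# idf: ASSUMED here to be log(N / doc_freq) with the natural log.
idf <- log(nrow(dtm) / doc_freq)

result <- data.frame(
  term = colnames(dtm),
  term_freq = term_freq,
  doc_freq = doc_freq,
  idf = idf,
  stringsAsFactors = FALSE
)

print(result)
```

In this toy corpus "bank" appears in every document, so under the assumed formula its idf is log(3 / 3) = 0, while terms that appear in a single document get log(3).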

Examples

# Load the package, then a pre-formatted dtm and topic model
library(textmineR)
data(nih_sample_dtm)
data(nih_sample_topic_model)

# Get the term frequencies
term_freq_mat <- TermDocFreq(nih_sample_dtm)

# Inspect the structure of the result
str(term_freq_mat)

Example output

Loading required package: Matrix
'data.frame':	5120 obs. of  4 variables:
 $ term     : chr  "aaas" "abating" "abilities" "ability" ...
 $ term_freq: num  1 1 3 20 1 2 5 7 2 1 ...
 $ doc_freq : int  1 1 3 16 1 2 1 2 2 1 ...
 $ idf      : num  4.61 4.61 3.51 1.83 4.61 ...
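
As a sanity check on the idf column, the printed values are consistent with a natural-log inverse document frequency. The formula and the corpus size N = 100 below are inferred from the numbers in this output and are not stated on this page, so treat them as assumptions.

```latex
\mathrm{idf}(t) = \ln\!\left(\frac{N}{\mathrm{doc\_freq}(t)}\right), \qquad N = 100 \ \text{(assumed)}

\ln\!\left(\tfrac{100}{1}\right) \approx 4.61, \qquad
\ln\!\left(\tfrac{100}{3}\right) \approx 3.51, \qquad
\ln\!\left(\tfrac{100}{16}\right) \approx 1.83
```

These match the idf values shown for terms with doc_freq of 1, 3, and 16.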

textmineR documentation built on June 28, 2021, 9:08 a.m.