tfidf | R Documentation |
Calculate TF-IDF using an input matrix with terms in rows and documents in columns.
tfidf( tdMat, tfVariant = c("raw", "binary", "frequency", "log", "doubleNorm0.5"), idfVariant = c("raw", "smooth", "probabilistic"), idfAddOne = TRUE )
tdMat |
A term-document matrix, terms in rows, documents in columns, and counts as integers (or logical values) in cells |
tfVariant |
Variant of term frequency. See details below. |
idfVariant |
Variant of inverse document frequency. See details below. |
idfAddOne |
Logical, whether one should be added to both numerator and denominator to calculate IDF. See details below. |
tfVariant accepts the following options:
raw: The input matrix is used as it is.
binary: The input matrix is transformed into logical values.
frequency: Term frequency per document is calculated from the input matrix.
log: The transformation log(1+tfMat) is applied.
doubleNorm0.5: Double normalisation 0.5.
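The term-frequency variants can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's implementation: `tfSketch` is a hypothetical helper, "frequency" is assumed to mean counts divided by the column (document) total, and "doubleNorm0.5" is assumed to follow the common definition 0.5 + 0.5 * f / max(f) within each document.

```r
# Hypothetical helper illustrating the tfVariant options; not part of the package.
tfSketch <- function(tdMat,
                     variant = c("raw", "binary", "frequency", "log", "doubleNorm0.5")) {
  variant <- match.arg(variant)
  switch(variant,
    # Counts are used unchanged
    raw = tdMat,
    # TRUE where a term occurs in a document
    binary = tdMat > 0,
    # Assumption: counts divided by the total count of each document (column)
    frequency = sweep(tdMat, 2, colSums(tdMat), "/"),
    # log(1 + count)
    log = log(1 + tdMat),
    # Assumption: 0.5 + 0.5 * count / (largest count in the same document)
    doubleNorm0.5 = 0.5 + 0.5 * sweep(tdMat, 2, apply(tdMat, 2, max), "/")
  )
}

# Two terms (rows) in two documents (columns)
m <- matrix(c(2, 0, 1, 3), nrow = 2)
tfSketch(m, "frequency")
tfSketch(m, "doubleNorm0.5")
```

In this sketch, a document without any term (an all-zero column) leads to division by zero in the "frequency" and "doubleNorm0.5" variants; how the package handles that case is not covered here.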
idfVariant accepts the following options:
raw: log(N/nt)
smooth: log(1+N/nt)
probabilistic: log((N-nt)/nt)
where N represents the total number of documents in the corpus, and nt is the number of documents in which the term t appears. If idfAddOne is set to TRUE, one is added to both the numerator and the denominator to prevent division by zero.
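The inverse-document-frequency variants can be sketched in base R along these lines. `idfSketch` is a hypothetical helper, not the package's code, and the way one is added here (to the numerator and the denominator of each ratio) is one reading of the description of idfAddOne; the package may apply the offset differently.

```r
# Hypothetical helper illustrating the idfVariant options; not part of the package.
idfSketch <- function(tdMat,
                      variant = c("raw", "smooth", "probabilistic"),
                      addOne = TRUE) {
  variant <- match.arg(variant)
  N <- ncol(tdMat)            # total number of documents
  nt <- rowSums(tdMat > 0)    # number of documents containing each term
  # Assumption: the offset is added to both numerator and denominator
  off <- if (addOne) 1 else 0
  switch(variant,
    raw = log((N + off) / (nt + off)),
    smooth = log(1 + (N + off) / (nt + off)),
    probabilistic = log((N - nt + off) / (nt + off))
  )
}

# Three terms occurring in 5, 2 and 1 of 5 documents
m <- matrix(c(1, 1, 1, 1, 1,
              1, 1, 0, 0, 0,
              1, 0, 0, 0, 0), ncol = 5, byrow = TRUE)
idfSketch(m)
idfSketch(m, "probabilistic", addOne = FALSE)
```

Without the offset, a term that appears in every document gives log(0) = -Inf under the probabilistic variant, and a term that appears in no document gives a division by zero, which is the situation the added one is meant to avoid.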
The Wikipedia item on TF-IDF: https://en.wikipedia.org/wiki/Tf%E2%80%93idf.
tiExample <- matrix(c(1,1,1,1,1,
                      1,1,0,0,0,
                      1,0,0,0,0,
                      0,1,0,0,0,
                      0,0,0,1,0,
                      1,0,1,0,1,
                      0,0,0,0,1), ncol=5, byrow=TRUE)
colnames(tiExample) <- sprintf("D%d", 1:ncol(tiExample))
rownames(tiExample) <- sprintf("t%d", 1:nrow(tiExample))
tiRes <- tfidf(tiExample)
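To make the arithmetic behind the example concrete, the following base R sketch computes a TF-IDF matrix by hand for the same data. It assumes raw term frequency, raw IDF, and one added to both numerator and denominator of N/nt; whether this matches the output of `tfidf(tiExample)` exactly depends on the package's implementation and is not verified here. The names `idfHand` and `tfidfHand` are hypothetical.

```r
# Same term-document matrix as in the example: 7 terms (rows), 5 documents (columns)
tiExample <- matrix(c(1,1,1,1,1,
                      1,1,0,0,0,
                      1,0,0,0,0,
                      0,1,0,0,0,
                      0,0,0,1,0,
                      1,0,1,0,1,
                      0,0,0,0,1), ncol = 5, byrow = TRUE)
colnames(tiExample) <- sprintf("D%d", seq_len(ncol(tiExample)))
rownames(tiExample) <- sprintf("t%d", seq_len(nrow(tiExample)))

N <- ncol(tiExample)            # number of documents
nt <- rowSums(tiExample > 0)    # documents containing each term

# Assumption: raw IDF with one added to numerator and denominator
idfHand <- log((N + 1) / (nt + 1))

# Raw TF times IDF; the vector has one value per row, so R's column-wise
# recycling multiplies every row by the IDF of its term
tfidfHand <- tiExample * idfHand
tfidfHand
```

Under these assumptions, t1 occurs in every document and therefore gets a weight of zero everywhere, while terms found in a single document, such as t3, receive the largest weight.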