weightTfIdf: R Documentation
Weight a term-document matrix by term frequency - inverse document frequency.
weightTfIdf(m, normalize = TRUE)
m: A TermDocumentMatrix in term frequency format.
normalize: A Boolean value indicating whether the term frequencies should be normalized.
Formally, this function is of class WeightingFunction with the additional attributes name and acronym.
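Term frequency $\mathit{tf}_{i,j}$ counts the number of occurrences $n_{i,j}$ of a term $t_i$ in a document $d_j$. In the case of normalization, the term frequency $\mathit{tf}_{i,j}$ is divided by $\sum_k n_{k,j}$, the total number of term occurrences in $d_j$, so that $\mathit{tf}_{i,j} = n_{i,j} / \sum_k n_{k,j}$.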
Inverse document frequency for a term $t_i$ is defined as

$$\mathit{idf}_i = \log_2 \frac{|D|}{|\{d \mid t_i \in d\}|}$$

where $|D|$ denotes the total number of documents and $|\{d \mid t_i \in d\}|$ is the number of documents in which the term $t_i$ appears.
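A minimal base-R sketch of the idf formula, again on a hypothetical toy count matrix (rows are terms, columns are documents). It assumes every term occurs in at least one document, as is the case for a term-document matrix built from the corpus itself; this is an illustration of the definition, not tm's code.

```r
# Hypothetical toy data: rows = terms t_i, columns = documents d_j.
counts <- matrix(
  c(1, 1, 2, 1,   # "the": in all four documents
    2, 0, 1, 0,   # "cat": in two documents
    0, 3, 0, 0),  # "dog": in one document
  nrow = 3, byrow = TRUE,
  dimnames = list(Terms = c("the", "cat", "dog"),
                  Docs  = c("d1", "d2", "d3", "d4"))
)

# Document frequency |{d | t_i in d}|: number of columns in which
# term t_i has a positive count.
doc_freq <- rowSums(counts > 0)

# |D| is the number of columns. Assumes doc_freq >= 1 for every term;
# a term with doc_freq == 0 would give log2(|D| / 0) = Inf.
idf <- log2(ncol(counts) / doc_freq)
print(idf)
```

Here "the" occurs in all four documents, so its idf is log2(4/4) = 0, whereas "dog" occurs in only one and gets log2(4/1) = 2.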
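Term frequency - inverse document frequency is then defined as the product

$$\mathit{tf}_{i,j} \cdot \mathit{idf}_i.$$

Consequently, a term that occurs in every document has $\mathit{idf}_i = \log_2 1 = 0$ and receives weight zero in all documents.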
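Putting both steps together, the following base-R sketch weights a dense count matrix by tf-idf. The helper tf_idf and the toy matrix are hypothetical; its normalize argument only mirrors the argument of the same name in weightTfIdf(m, normalize = TRUE). The package function itself operates on tm's sparse TermDocumentMatrix, and this sketch is not its source code.

```r
# Hypothetical toy data: rows = terms t_i, columns = documents d_j.
counts <- matrix(
  c(1, 1, 2, 1,
    2, 0, 1, 0,
    0, 3, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(Terms = c("the", "cat", "dog"),
                  Docs  = c("d1", "d2", "d3", "d4"))
)

# Hypothetical helper: tf-idf weighting of a dense term-document count matrix.
tf_idf <- function(counts, normalize = TRUE) {
  # tf_{i,j}: raw counts, or counts divided by each document's total
  tf <- if (normalize) sweep(counts, 2, colSums(counts), "/") else counts
  # idf_i = log2(|D| / number of documents containing t_i)
  idf <- log2(ncol(counts) / rowSums(counts > 0))
  # idf has one entry per row; R recycles it down each column,
  # so entry [i, j] is multiplied by idf_i.
  tf * idf
}

w <- tf_idf(counts)
print(w)
```

The row for "the" is zero throughout because that term appears in every document. With normalize = FALSE the raw counts are used, so the weight of "dog" in d2 becomes 3 × 2 = 6 instead of 0.75 × 2 = 1.5.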
The function returns the weighted matrix.
Gerard Salton and Christopher Buckley (1988). Term-weighting approaches in automatic text retrieval. Information Processing and Management, 24(5), 513–523.