
Weight a dfm by term frequency-inverse document frequency (*tf-idf*),
with full control over options. Uses fully sparse methods for efficiency.


| Argument | Description |
|---|---|
| `x` | object for which idf or tf-idf will be computed (a document-feature matrix) |
| `scheme_tf` | scheme for |
| `scheme_df` | scheme for |
| `base` | the base for the logarithms in the |
| `force` | logical; if |
| `...` | additional arguments passed to |

`dfm_tfidf` computes term frequency-inverse document frequency
weighting. The default is to use counts instead of normalized term
frequency (the relative term frequency within document), but this
can be overridden using `scheme_tf = "prop"`.
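To make the difference between count-based and proportional term frequency concrete, here is a minimal base R sketch that does not call quanteda at all. It assumes that tf-idf is the term frequency multiplied by log(N / df), with a base-10 logarithm and no smoothing; whether that matches the package defaults exactly is an assumption, so treat the object names (`counts`, `idf`, `tfidf_count`, `tfidf_prop`) as illustrative only. The toy matrix is the two-document Wikipedia example that also appears in the Examples section.

```r
# Toy count matrix (documents in rows, features in columns)
counts <- matrix(c(1, 1, 2, 1, 0, 0,
                   1, 1, 0, 0, 2, 3),
                 byrow = TRUE, nrow = 2,
                 dimnames = list(docs = c("document1", "document2"),
                                 features = c("this", "is", "a", "sample",
                                              "another", "example")))

# Document frequency: number of documents in which each feature occurs
df <- colSums(counts > 0)

# Inverse document frequency; base 10 is an assumption of this sketch
idf <- log(nrow(counts) / df, base = 10)

# Count-based tf-idf: multiply each feature column by its idf
tfidf_count <- sweep(counts, 2, idf, "*")

# Proportional tf-idf: divide each row by its document length first
tf_prop <- counts / rowSums(counts)
tfidf_prop <- sweep(tf_prop, 2, idf, "*")

round(tfidf_prop, 2)
```

Features that occur in every document ("this" and "is") receive an idf of zero under this definition, so their weights vanish regardless of which term-frequency scheme is used.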

Manning, C. D., Raghavan, P., & Schütze, H. (2008).
*Introduction to Information Retrieval*. Cambridge: Cambridge University Press.
https://nlp.stanford.edu/IR-book/pdf/irbookonlinereading.pdf

```
library("quanteda")
library("magrittr")  # provides the %>% pipe used below

# weight the built-in example dfm and compare against the raw counts
dfmat1 <- as.dfm(data_dfm_lbgexample)
head(dfmat1[, 5:10])
head(dfm_tfidf(dfmat1)[, 5:10])
docfreq(dfmat1)[5:15]
head(dfm_weight(dfmat1)[, 5:10])

# replication of worked example from
# https://en.wikipedia.org/wiki/Tf-idf#Example_of_tf.E2.80.93idf
dfmat2 <-
    matrix(c(1,1,2,1,0,0, 1,1,0,0,2,3),
           byrow = TRUE, nrow = 2,
           dimnames = list(docs = c("document1", "document2"),
                           features = c("this", "is", "a", "sample",
                                        "another", "example"))) %>%
    as.dfm()
dfmat2
docfreq(dfmat2)
dfm_tfidf(dfmat2, scheme_tf = "prop") %>% round(digits = 2)

## Not run:
# comparison with tm
if (requireNamespace("tm")) {
    convert(dfmat2, to = "tm") %>% tm::weightTfIdf() %>% as.matrix()
    # same as:
    dfm_tfidf(dfmat2, base = 2, scheme_tf = "prop")
}
## End(Not run)
```
