document_term_frequencies_statistics | R Documentation |
Term Frequency Inverse Document Frequency (tfidf) is calculated as the product of:
Term Frequency (tf): how many times the word occurs in the document / how many words are in the document
Inverse Document Frequency (idf): log(number of documents / number of documents where the term appears)
The Okapi BM25 statistic is calculated as the product of the inverse document frequency and the weighted term frequency, as defined at https://en.wikipedia.org/wiki/Okapi_BM25.
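The definitions above can be worked through by hand on a toy table. The following is a minimal base R sketch, not the package's implementation: the data frame and its values are made up for illustration, the weighted term frequency is assumed to use raw counts in the form given on the Wikipedia page, and the idf defined above is assumed to be reused for BM25.

```r
# Toy term counts (hypothetical data): one row per (doc_id, term) pair.
x <- data.frame(
  doc_id = c("d1", "d1", "d2", "d2", "d3"),
  term   = c("beer", "room", "beer", "metro", "room"),
  freq   = c(2, 1, 1, 3, 4),
  stringsAsFactors = FALSE
)
k <- 1.2
b <- 0.75

# Document length: total number of counted words in each document.
doc_len <- tapply(x$freq, x$doc_id, sum)
n_docs  <- length(doc_len)
avg_len <- mean(doc_len)

# tf: occurrences of the term in the document / words in the document.
x$tf <- x$freq / as.numeric(doc_len[x$doc_id])

# idf: log(number of documents / number of documents where the term appears).
docs_with_term <- tapply(x$doc_id, x$term, function(d) length(unique(d)))
x$idf <- log(n_docs / as.numeric(docs_with_term[x$term]))

# tfidf: product of tf and idf.
x$tfidf <- x$tf * x$idf

# Weighted term frequency, Wikipedia form (assumption: raw counts are used).
len_ratio <- as.numeric(doc_len[x$doc_id]) / avg_len
x$tf_bm25 <- x$freq * (k + 1) / (x$freq + k * (1 - b + b * len_ratio))

# bm25: product of idf and the weighted term frequency.
x$bm25 <- x$idf * x$tf_bm25

print(x)
```

In this sketch k caps how much repeated occurrences of a term can add (tf_bm25 stays below k + 1), while b controls how strongly longer-than-average documents are penalised.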
document_term_frequencies_statistics(x, k = 1.2, b = 0.75)
x |
a data.table as returned by |
k |
parameter k1 of the Okapi BM25 ranking function as defined at https://en.wikipedia.org/wiki/Okapi_BM25. Defaults to 1.2. |
b |
parameter b of the Okapi BM25 ranking function as defined at https://en.wikipedia.org/wiki/Okapi_BM25. Defaults to 0.75. |
a data.table with columns doc_id, term and freq, plus the computed statistics tf, idf, tfidf, tf_bm25 and bm25.
data(brussels_reviews_anno)
x <- document_term_frequencies(brussels_reviews_anno[, c("doc_id", "token")])
x <- document_term_frequencies_statistics(x)
head(x)