Description
Measures the weighted amount of information about how specific each term is to the documents of a corpus. Term frequency–inverse document frequency (tf–idf) is one of the most widely used weighting schemes in information retrieval systems. The tf–idf value of a word increases in proportion to the number of times the word appears in a document, but is offset by the number of documents in the corpus that contain the word. Variations of tf–idf are often used to estimate a document's relevance to a free-text query.
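The basic computation can be illustrated with a minimal base-R sketch. It assumes raw term counts for tf and the plain idf = log(N / df), where N is the number of documents and df is the number of documents containing the term. The helper name `toy_tfidf` is hypothetical, and this is an illustration of the general measure, not this function's implementation.

```r
# Hypothetical helper (not part of this package): plain tf-idf with
# raw counts for tf and idf = log(N / df), natural logarithm.
toy_tfidf <- function(docs) {
  # Lowercase and split each document on runs of non-letter characters
  tokens <- lapply(tolower(docs), function(d) {
    w <- unlist(strsplit(d, "[^a-z]+"))
    w[nzchar(w)]
  })
  vocab <- sort(unique(unlist(tokens)))
  # Term-frequency matrix: rows = documents, columns = terms (raw counts)
  tf <- t(vapply(tokens,
                 function(w) as.numeric(table(factor(w, levels = vocab))),
                 numeric(length(vocab))))
  colnames(tf) <- vocab
  # Document frequency: number of documents containing each term
  df <- colSums(tf > 0)
  idf <- log(length(docs) / df)
  # Multiply each term column by its idf
  sweep(tf, 2, idf, `*`)
}

docs <- c("the cat sat", "the dog sat", "the cat ran")
m <- toy_tfidf(docs)
round(m, 3)
```

Because "the" occurs in every document, its idf is log(3 / 3) = 0 and its tf–idf is 0 everywhere. This is the offset described above. "dog" appears in only one document and receives the largest weight there.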
Arguments
corpus: Input data with an id column and a text column, either a data.frame or a data.table.
stopwords: A character vector of stopwords. Stopwords are filtered out before the numerical statistics are calculated.
id_col: Name of the input data column that holds the document ids.
text_col: Name of the input data column that holds the document texts.
tf_weight: Weighting scheme for term frequency. Choices are
idf_weight: Weighting scheme for inverse document frequency. Choices are
min_chars: Words with fewer characters than
norm: Boolean value for document normalization.
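The preprocessing and weighting options can be sketched as follows. This sketch makes several assumptions. It assumes that stopwords and words shorter than `min_chars` characters are dropped before counting. The scheme names "raw", "binary", "log", "plain", and "smooth" are hypothetical stand-ins taken from common IR variants, not the choices this function accepts. It also assumes that `norm = TRUE` means scaling each document vector to unit Euclidean length. The helper `weighted_tfidf` is hypothetical.

```r
# Hypothetical sketch: preprocessing plus selectable weighting schemes.
# ASSUMPTION: scheme names are illustrative, NOT this function's documented choices.
weighted_tfidf <- function(docs, stopwords = character(0), min_chars = 1,
                           tf_weight = c("raw", "binary", "log"),
                           idf_weight = c("plain", "smooth"),
                           norm = FALSE) {
  tf_weight <- match.arg(tf_weight)
  idf_weight <- match.arg(idf_weight)
  tokens <- lapply(tolower(docs), function(d) {
    w <- unlist(strsplit(d, "[^a-z]+"))
    # ASSUMPTION: drop empty strings, stopwords, and words shorter than min_chars
    w[nzchar(w) & !(w %in% stopwords) & nchar(w) >= min_chars]
  })
  vocab <- sort(unique(unlist(tokens)))
  counts <- t(vapply(tokens,
                     function(w) as.numeric(table(factor(w, levels = vocab))),
                     numeric(length(vocab))))
  colnames(counts) <- vocab
  # Term-frequency variants
  tf <- switch(tf_weight,
    raw    = counts,
    binary = (counts > 0) * 1,
    log    = log1p(counts))               # log(1 + count)
  # Inverse-document-frequency variants
  n_docs <- length(docs)
  df <- colSums(counts > 0)
  idf <- switch(idf_weight,
    plain  = log(n_docs / df),
    smooth = log(n_docs / (1 + df)) + 1)
  w <- sweep(tf, 2, idf, `*`)
  if (norm) {
    # ASSUMPTION: normalize each document (row) to unit Euclidean length
    len <- sqrt(rowSums(w^2))
    w <- w / ifelse(len > 0, len, 1)
  }
  w
}

docs <- c("The cat sat on the mat", "A dog sat", "The cat ran")
w <- weighted_tfidf(docs, stopwords = c("the", "a", "on"), min_chars = 3,
                    tf_weight = "log", idf_weight = "smooth", norm = TRUE)
round(w, 3)
```

The smoothed idf variant keeps a term that occurs in every document from being zeroed out entirely. Row normalization makes long and short documents comparable when their vectors are later compared with cosine similarity.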
Value
A data.table with three columns: class (derived from the given document ids), term, and tfIdf.
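The long (one row per document and term) layout of the result can be reproduced with a hypothetical base-R helper, `tfidf_long`. It assumes raw counts for tf and idf = log(N / df), takes `id_col`/`text_col` to mirror the arguments above, and returns a plain data.frame rather than a data.table so that it has no package dependencies. It is a sketch of the output shape, not this function's code.

```r
# Hypothetical sketch: build a (class, term, tfIdf) table from a data.frame
# with an id column and a text column. ASSUMPTION: raw tf, idf = log(N / df).
tfidf_long <- function(corpus, id_col = "id", text_col = "text") {
  docs <- as.character(corpus[[text_col]])
  tokens <- lapply(tolower(docs), function(d) {
    w <- unlist(strsplit(d, "[^a-z]+"))
    w[nzchar(w)]
  })
  n_docs <- length(docs)
  # Document frequency: count each term at most once per document
  df <- table(unlist(lapply(tokens, unique)))
  rows <- lapply(seq_len(n_docs), function(i) {
    if (length(tokens[[i]]) == 0) return(NULL)  # skip empty documents
    counts <- table(tokens[[i]])                # raw term frequency in document i
    terms <- names(counts)
    data.frame(class = corpus[[id_col]][i],
               term = terms,
               tfIdf = as.numeric(counts) * log(n_docs / as.numeric(df[terms])),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

corpus <- data.frame(id = c("d1", "d2"),
                     text = c("apple banana apple", "banana cherry"),
                     stringsAsFactors = FALSE)
res <- tfidf_long(corpus)
res
```

Each row pairs a document id (class) with one of its terms. "banana" occurs in both documents, so its tfIdf is 0 in each.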