tfidf: Tfidf re-weighting of 'dtm' and 'tdm' matrices


View source: R/tfidf.R

Description

Re-weights document-term (dtm) and term-document (tdm) matrices by TF-IDF (term frequency-inverse document frequency).

Usage

tfidf(mat, vocab, norm = c("l1", "l2", "none"), sublinear_tf = FALSE,
  extra_df_count = 1)

Arguments

mat

Output of the dtm() or tdm() function.

vocab

Output of vocab() or update_vocab().

norm

Normalization to apply to each document. One of "l1", "l2" or "none".

sublinear_tf

When TRUE, use 1 + log(tf) in place of the raw term frequency tf.

extra_df_count

Number added to the document count, as if every term in the vocabulary had been seen in at least this many documents.
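
To make the roles of norm, sublinear_tf and extra_df_count concrete, here is a minimal base-R sketch on a dense documents-by-terms count matrix. It is not the package's implementation: tfidf_sketch is a hypothetical helper, the smoothed idf formula log((N + k) / (df + k)) + 1 is one common variant chosen for illustration, and normalizing the term frequencies before applying the idf weights is likewise an assumption. The actual tfidf() may differ on each of these points.

```r
# Hypothetical helper, NOT part of mlvocab: illustrates the arguments only.
tfidf_sketch <- function(counts, norm = c("l1", "l2", "none"),
                         sublinear_tf = FALSE, extra_df_count = 1) {
  norm <- match.arg(norm)
  n_docs <- nrow(counts)

  # Document frequency: number of documents in which each term occurs.
  df <- colSums(counts > 0)

  # Assumed smoothed idf; extra_df_count acts as if every term had been
  # seen in that many documents, which avoids division by zero.
  idf <- log((n_docs + extra_df_count) / (df + extra_df_count)) + 1

  tf <- counts
  if (sublinear_tf) {
    pos <- tf > 0
    tf[pos] <- 1 + log(tf[pos])   # dampen large counts; zeros stay zero
  }

  # Per-document (row) normalization of the term frequencies.
  scale <- switch(norm,
                  l1   = rowSums(abs(tf)),
                  l2   = sqrt(rowSums(tf^2)),
                  none = rep(1, n_docs))
  scale[scale == 0] <- 1          # leave empty documents untouched
  tf <- tf / scale                # recycles down columns: row i / scale[i]

  # Multiply each column (term) by its idf weight.
  sweep(tf, 2, idf, `*`)
}

# Toy counts: 2 documents (rows) by 3 terms (columns).
counts <- matrix(c(2, 1, 0,
                   4, 2, 1),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("a", "b"), c("the", "fox", "dog")))
w <- tfidf_sketch(counts, norm = "none")
```

In this sketch, terms that occur in every document keep an idf of 1, while rarer terms (here "dog") receive a larger weight.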

Examples

library(mlvocab)

# Two pre-tokenized documents; "b" repeats the sentence twice.
corpus <- list(a = c("The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"),
               b = c("the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog",
                     "the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"))
v <- vocab(corpus, c(1, 2), " ")

# Re-weight a document-term matrix.
dtm <- dtm(corpus, v)
tfidf(dtm, v)

# Re-weight a term-document matrix.
tdm <- tdm(corpus, v)
tfidf(tdm, v)

vspinu/mlvocab documentation built on June 11, 2021, 7:37 a.m.