
Creates a TfIdf (Term Frequency Inverse Document Frequency) model.
"smooth" IDF (default) is defined as follows: `idf = log(1 + (# documents in the corpus) / (# documents where the term appears) )`

"non-smooth" IDF is defined as follows: `idf = log((# documents in the corpus) / (# documents where the term appears) )`

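The two definitions above can be compared on a toy example. The following is a minimal base R sketch, not the package's internal code; the matrix `dtm` and the variable names are made up for illustration.

```r
# Toy document-term count matrix: 3 documents (rows) x 3 terms (columns).
# The data and names are hypothetical, chosen only to illustrate the formulas.
dtm <- matrix(
  c(1, 0, 2,
    1, 1, 0,
    1, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("doc", 1:3), c("the", "cat", "sat"))
)

n_docs <- nrow(dtm)           # number of documents in the corpus
doc_freq <- colSums(dtm > 0)  # number of documents where each term appears

idf_smooth <- log(1 + n_docs / doc_freq)  # "smooth" IDF as defined above
idf_plain <- log(n_docs / doc_freq)       # "non-smooth" IDF as defined above

print(rbind(idf_smooth, idf_plain))
```

In this sketch the term `the` appears in every document, so its non-smooth IDF is `log(1) = 0`, while the smooth variant still gives it a positive weight of `log(2)`.
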
`R6Class` object.

Term Frequency Inverse Document Frequency

For usage details see **Methods, Arguments and Examples** sections.


`$new(smooth_idf = TRUE, norm = c("l1", "l2", "none"), sublinear_tf = FALSE)`

Creates a tf-idf model.

`$fit_transform(x)`

Fits the model to an input sparse matrix (preferably in "dgCMatrix" format) and then transforms it.

`$transform(x)`

Transforms new data `x` using the tf-idf weights learned from the training data.
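As an illustration of how the methods fit together, here is a minimal sketch that fits the model on one set of documents and reuses it on another. The train/new split of `movie_review` and the variable names are assumptions made for this example only.

```r
library(text2vec)
data("movie_review")

# Hypothetical split: first 100 reviews as training data, next 50 as new data.
tokens_train <- word_tokenizer(tolower(movie_review$review[1:100]))
tokens_new <- word_tokenizer(tolower(movie_review$review[101:150]))

# A hash vectorizer maps both sets of documents onto the same column space,
# so the matrix passed to $transform() lines up with the fitted model.
vectorizer <- hash_vectorizer()
dtm_train <- create_dtm(itoken(tokens_train), vectorizer)
dtm_new <- create_dtm(itoken(tokens_new), vectorizer)

model_tfidf <- TfIdf$new()
# Learn IDF weights from the training matrix and transform it.
dtm_train_tfidf <- model_tfidf$fit_transform(dtm_train)
# Apply the already-fitted weights to the new matrix.
dtm_new_tfidf <- model_tfidf$transform(dtm_new)
```

Calling `$transform()` rather than `$fit_transform()` on the new data keeps the weights tied to the training corpus, which is usually what is wanted when the new documents are held-out or arrive later.
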

- tfidf
A `TfIdf` object.
- x
An input term-co-occurrence matrix, preferably in `dgCMatrix` format.
- smooth_idf
`TRUE`. Smooth IDF weights by adding one to document frequencies, as if an extra document was seen containing every term in the collection exactly once.
- norm
`c("l1", "l2", "none")`. Type of normalization to apply to term vectors. `"l1"` by default, i.e., scale by the number of words in the document.
- sublinear_tf
`FALSE`. Apply sublinear term-frequency scaling, i.e., replace the term frequency with `1 + log(TF)`.
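To make the `norm` and `sublinear_tf` options more concrete, the sketch below applies each transformation to a single made-up vector of term counts in base R. It illustrates the arithmetic described above under stated assumptions and is not the package's implementation; in particular, it assumes only non-zero counts are rescaled, as would be natural for a sparse matrix.

```r
# Hypothetical term counts for one document (all non-zero).
tf <- c(the = 4, cat = 1, sat = 3)

# "l1" normalization: scale by the number of words in the document.
tf_l1 <- tf / sum(abs(tf))

# "l2" normalization: scale to unit Euclidean length.
tf_l2 <- tf / sqrt(sum(tf^2))

# Sublinear scaling: replace each term frequency with 1 + log(TF).
tf_sublinear <- 1 + log(tf)

print(rbind(tf_l1, tf_l2, tf_sublinear))
```

With `"l1"` the entries sum to one, so long and short documents become comparable; sublinear scaling dampens the effect of a term that is repeated many times within the same document.
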

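```
library(text2vec)

data("movie_review")
N = 100
# Tokenize the first N reviews and build a hashed document-term matrix.
tokens = word_tokenizer(tolower(movie_review$review[1:N]))
dtm = create_dtm(itoken(tokens), hash_vectorizer())
# Fit the tf-idf model and transform the matrix in one step.
model_tfidf = TfIdf$new()
dtm_tfidf = model_tfidf$fit_transform(dtm)
```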
