TfIdf | R Documentation
Creates a TfIdf (term frequency-inverse document frequency) model.
"smooth" IDF (default) is defined as follows: idf = log(1 + (# documents in the corpus) / (# documents where the term appears) )
"non-smooth" IDF is defined as follows: idf = log((# documents in the corpus) / (# documents where the term appears) )
TfIdf is an R6Class object.
Term Frequency Inverse Document Frequency
For usage details see Methods, Arguments and Examples sections.
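As a rough illustration of the two IDF definitions given above, the following base R sketch computes both variants on a tiny, made-up count matrix. It does not call text2vec; the matrix and the variable names are assumptions chosen for the example.

```r
# Toy document-term counts: 4 documents (rows) x 3 terms (columns).
# The data are invented purely for illustration.
counts = matrix(c(1, 0, 2,
                  1, 1, 0,
                  1, 0, 0,
                  1, 3, 0),
                nrow = 4, byrow = TRUE,
                dimnames = list(paste0("doc", 1:4), c("the", "cat", "sat")))

n_docs = nrow(counts)
# number of documents in which each term appears at least once
doc_freq = colSums(counts > 0)

# "smooth" IDF: log(1 + N / df)
idf_smooth = log(1 + n_docs / doc_freq)
# "non-smooth" IDF: log(N / df)
idf_plain = log(n_docs / doc_freq)

print(rbind(idf_smooth, idf_plain))
```

In this toy corpus the term "the" appears in every document, so its non-smooth IDF is log(1) = 0, while the smooth variant still assigns it a positive weight of log(2).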
tfidf = TfIdf$new(smooth_idf = TRUE, norm = c('l1', 'l2', 'none'), sublinear_tf = FALSE)
tfidf$fit_transform(x)
tfidf$transform(x)
$new(smooth_idf = TRUE, norm = c("l1", "l2", "none"), sublinear_tf = FALSE)
Creates a tf-idf model.
$fit_transform(x)
Fits the model to an input sparse matrix (preferably in "dgCMatrix" format) and then transforms it.
$transform(x)
Transforms new data x using the tf-idf weights learned from the training data.
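The difference between fitting and transforming can be sketched in base R: IDF weights are estimated once from the training matrix and then reused unchanged on new documents. The helpers fit_idf and apply_tfidf below are hypothetical names written for this illustration, not text2vec functions, and the sketch assumes smooth IDF with "l1" normalization.

```r
# Hypothetical helper (not part of text2vec): learn smooth IDF weights
# from a training count matrix with documents in rows, terms in columns.
fit_idf = function(train_counts) {
  log(1 + nrow(train_counts) / colSums(train_counts > 0))
}

# Hypothetical helper (not part of text2vec): apply previously learned
# IDF weights to any count matrix with the same term columns.
apply_tfidf = function(counts, idf) {
  tf = counts / rowSums(counts)   # l1-normalize each document (row)
  sweep(tf, 2, idf, `*`)          # scale each term column by its idf
}

terms = c("a", "b", "c")
train = matrix(c(1, 1, 0,
                 1, 0, 2),
               nrow = 2, byrow = TRUE, dimnames = list(NULL, terms))
new_docs = matrix(c(0, 2, 2), nrow = 1, dimnames = list(NULL, terms))

idf = fit_idf(train)                     # "fit": uses training data only
train_tfidf = apply_tfidf(train, idf)    # "fit_transform" result
new_tfidf = apply_tfidf(new_docs, idf)   # "transform": reuses training idf
print(new_tfidf)
```

The new document never influences the IDF weights, so features computed for training and new data stay on the same scale.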
tfidf
A TfIdf object.
x
An input document-term matrix, preferably in dgCMatrix format.
smooth_idf
TRUE. Smooth IDF weights by adding one to document frequencies, as if an extra document was seen containing every term in the collection exactly once.
norm
c("l1", "l2", "none"). Type of normalization to apply to term vectors. "l1" by default, i.e., scale by the number of words in the document.
sublinear_tf
FALSE. Apply sublinear term-frequency scaling, i.e., replace the term frequency with 1 + log(TF).
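The norm and sublinear_tf options can be illustrated on a small dense matrix. The sketch below uses base R only and made-up counts; it is not the package's implementation, and it assumes that sublinear scaling applies only to non-zero counts so that zeros stay zero.

```r
# Toy term counts: 2 documents (rows) x 3 terms (columns), invented data.
tf = matrix(c(2, 0, 2,
              1, 3, 0),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("doc1", "doc2"), c("t1", "t2", "t3")))

# "l1": divide each row by the sum of its counts,
# i.e. by the number of words in the document
tf_l1 = tf / rowSums(tf)

# "l2": divide each row by its Euclidean length
tf_l2 = tf / sqrt(rowSums(tf^2))

# sublinear tf: replace each non-zero count TF with 1 + log(TF)
# (assumption: zero counts are left at zero)
tf_sub = tf
tf_sub[tf > 0] = 1 + log(tf[tf > 0])

print(tf_l1)
print(tf_sub)
```

With "l1" every row sums to one, with "l2" every row has unit length, and sublinear scaling dampens the influence of terms that repeat many times within one document.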
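library(text2vec)
data("movie_review")
N = 100
# tokenize the first N reviews after lowercasing
tokens = word_tokenizer(tolower(movie_review$review[1:N]))
# build a document-term matrix using feature hashing
dtm = create_dtm(itoken(tokens), hash_vectorizer())
model_tfidf = TfIdf$new()
# fit idf weights on dtm and return the tf-idf weighted matrix
dtm_tfidf = model_tfidf$fit_transform(dtm)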