Description

Given annotations, this function returns the term frequency-inverse document frequency (tf-idf) matrix computed from the extracted lemmas, or from whichever token column is named by token_var.
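This page does not say which tf-idf variant is used. For reference, one common definition is shown below, where $n_{t,d}$ is the number of times token $t$ occurs in document $d$, $N$ is the number of documents and $\mathrm{df}(t)$ is the number of documents containing $t$. Treat it as an assumption: sm_text_tfidf may normalise term frequencies or smooth the idf differently.

$$
\mathrm{tfidf}(t, d) = \frac{n_{t,d}}{\sum_{t'} n_{t',d}} \cdot \log\frac{N}{\mathrm{df}(t)}
$$

Under this definition, with natural logarithms, a lemma that occurs twice in a three-token document and appears in two of three documents scores $(2/3)\log(3/2) \approx 0.27$. A lemma found in every document scores 0.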
Usage

sm_text_tfidf(
  object,
  min_df = 0.1,
  max_df = 0.9,
  max_features = 10000,
  doc_var = "doc_id",
  token_var = "lemma",
  vocabulary = NULL
)
Arguments

object: a data frame containing an identifier for the document (set with doc_var) and the tokens (set with token_var).
min_df: the minimum proportion of documents a token must appear in to be included in the vocabulary.
max_df: the maximum proportion of documents a token may appear in to be included in the vocabulary.
max_features: the maximum number of tokens in the vocabulary.
doc_var: character vector. The name of the column in object that contains the document identifiers.
token_var: character vector. The name of the column in object that contains the tokens.
vocabulary: character vector. The vocabulary to use in constructing the matrices. Will be computed within the function if set to NULL.
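To make the interaction of min_df, max_df and max_features concrete, here is a minimal base-R sketch of how a vocabulary could be selected from a long-format token table. The helper select_vocab is hypothetical and is not part of the package. Two further assumptions apply: document proportions are measured as the share of documents containing a token at least once, and max_features keeps the tokens found in the most documents. The package's own rule may differ.

```r
# Hypothetical helper (not part of the package): one plausible reading of
# min_df, max_df and max_features. The real selection rule may differ.
select_vocab <- function(object, min_df = 0.1, max_df = 0.9,
                         max_features = 10000,
                         doc_var = "doc_id", token_var = "lemma") {
  # Document-by-token count matrix (rows: documents, columns: tokens)
  counts <- unclass(table(object[[doc_var]], object[[token_var]]))
  # Proportion of documents that contain each token at least once
  doc_prop <- colSums(counts > 0) / nrow(counts)
  keep <- doc_prop[doc_prop >= min_df & doc_prop <= max_df]
  # Assumption: when capping the size, keep the most widespread tokens
  keep <- sort(keep, decreasing = TRUE)
  names(keep)[seq_len(min(length(keep), max_features))]
}

# Toy annotation table: one row per token
tokens <- data.frame(
  doc_id = c("d1", "d1", "d1", "d2", "d2", "d3", "d3", "d4", "d4"),
  lemma  = c("cat", "sit", "mat", "cat", "sit", "cat", "dog", "cat", "sit"),
  stringsAsFactors = FALSE
)

# "cat" occurs in every document (proportion 1), so max_df = 0.9 drops it
vocab_default <- select_vocab(tokens)
# Only the most widespread remaining token survives the cap
vocab_small <- select_vocab(tokens, max_features = 1)
```

Here vocab_default holds "sit", "mat" and "dog", and vocab_small is "sit". Raising min_df to 0.3 would also leave only "sit", because "mat" and "dog" each appear in just one of the four documents.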
Value

A tibble in wide format with term frequencies and tf-idf values.
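As a rough picture of the wide layout, the following base-R sketch builds one row per document and one column per vocabulary term from a fixed vocabulary, much as the vocabulary argument allows. It is a simplification under stated assumptions: tfidf_wide is a hypothetical name, it returns a plain data frame of tf-idf values only rather than a tibble that also carries term frequencies, tf is the count divided by the total number of tokens in the document, and idf is log(N / df), set to 0 for terms that never occur.

```r
# Hypothetical sketch (not the package's code): wide tf-idf table over a
# fixed vocabulary, assuming tf = count / document length, idf = log(N / df).
tfidf_wide <- function(object, vocabulary,
                       doc_var = "doc_id", token_var = "lemma") {
  # Tokens outside the vocabulary become NA and are dropped by table()
  terms <- factor(object[[token_var]], levels = vocabulary)
  docs <- factor(object[[doc_var]])
  counts <- unclass(table(docs, terms))
  # Document lengths count every token, including out-of-vocabulary ones
  doc_len <- as.vector(table(docs))
  tf <- counts / doc_len
  df <- colSums(counts > 0)
  # Terms that never occur get idf 0 instead of log(N / 0) = Inf
  idf <- ifelse(df > 0, log(nrow(counts) / df), 0)
  tfidf <- sweep(tf, 2, idf, `*`)
  # One row per document, one column per vocabulary term
  data.frame(doc_id = rownames(counts), tfidf,
             check.names = FALSE, row.names = NULL,
             stringsAsFactors = FALSE)
}

# Same toy annotation table as a long-format input: one row per token
tokens <- data.frame(
  doc_id = c("d1", "d1", "d1", "d2", "d2", "d3", "d3", "d4", "d4"),
  lemma  = c("cat", "sit", "mat", "cat", "sit", "cat", "dog", "cat", "sit"),
  stringsAsFactors = FALSE
)

wide <- tfidf_wide(tokens, vocabulary = c("sit", "mat", "zebra"))
```

Every document gets a row, even d3, which contains no vocabulary terms, and every vocabulary term gets a column, even "zebra", which never occurs. This fixed shape is what makes a supplied vocabulary useful when scoring new documents against a vocabulary learned earlier.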