bind_tf_idf2 | R Documentation |
Calculates and binds the term frequency, inverse document frequency, and TF-IDF of the dataset. This function experimentally supports 4 types of term frequencies and 5 types of inverse document frequencies.
bind_tf_idf2(
  tbl,
  term = "token",
  document = "doc_id",
  n = "n",
  tf = c("tf", "tf2", "tf3", "itf"),
  idf = c("idf", "idf2", "idf3", "idf4", "df"),
  norm = FALSE,
  rmecab_compat = TRUE
)
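The usage shows that tf and idf default to vectors of choices. Under the usual R convention for such arguments, the first choice is used when the argument is not supplied, so the defaults should amount to tf = "tf" and idf = "idf". This page does not say so explicitly, so treat it as an assumption. The sketch below mimics that resolution with base R's match.arg(). resolve_methods() is a hypothetical helper and is not part of the package.

```r
# Hypothetical helper: mimics how vector-valued defaults are usually resolved.
resolve_methods <- function(tf = c("tf", "tf2", "tf3", "itf"),
                            idf = c("idf", "idf2", "idf3", "idf4", "df")) {
  # match.arg() returns the first choice when the argument is missing,
  # and signals an error if a supplied value is not one of the choices.
  list(tf = match.arg(tf), idf = match.arg(idf))
}

resolve_methods()                        # tf = "tf", idf = "idf"
resolve_methods(tf = "itf", idf = "df")  # the pairing recommended for itf
```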
tbl | A tidy text dataset. |
term | Column containing terms. |
document | Column containing document IDs. |
n | Column containing document-term counts. |
tf | Method for computing term frequency. |
idf | Method for computing inverse document frequency. |
norm | Logical; if passed as TRUE, the TF-IDF values are normalized. |
rmecab_compat | Logical; if passed as TRUE, values are computed for compatibility with 'RMeCab'. |
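The term, document, and n arguments name columns in a count table such as the one dplyr::count() returns. The base-R sketch below builds such a table with the default column names and then shows one plausible reading of norm = TRUE. It assumes that each document's TF-IDF values are divided by their L2 norm. The toy data, the made-up scores, and the l2_normalize() helper are all hypothetical.

```r
# Toy tidy count table using the default column names: doc_id, token, n.
counts <- data.frame(
  doc_id = c("d1", "d1", "d2", "d2"),
  token  = c("cat", "dog", "cat", "fish"),
  n      = c(2, 1, 1, 3)
)

# Hypothetical helper (assumption): divide each value by the L2 norm
# of all values that belong to the same document.
l2_normalize <- function(x, document) {
  norms <- ave(x, document, FUN = function(v) sqrt(sum(v^2)))
  x / norms  # a document whose values are all zero would yield NaN
}

# Made-up TF-IDF scores for the four rows, for illustration only.
counts$tf_idf <- c(3, 4, 0.6, 0.8)
counts$tf_idf_norm <- l2_normalize(counts$tf_idf, counts$doc_id)
counts
```

After this normalization, the squared values within each document sum to 1.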
Types of term frequency can be switched with the tf argument:
- tf is term frequency (not the raw count of terms).
- tf2 is logarithmic term frequency with base exp(1), i.e. the natural logarithm.
- tf3 is binary-weighted term frequency.
- itf is inverse term frequency. Use with idf = "df".
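For a single document's raw counts, the three fully described options can be sketched in base R. The exact formulas are assumptions based on the conventional definitions: tf as the count divided by the document's total, tf2 as the natural logarithm of the count plus one (the "+ 1" is a guess), and tf3 as a 0/1 indicator. itf is left out because this page does not define it.

```r
# Raw counts of four terms in one document (toy data); "bird" is absent.
n <- c(cat = 2, dog = 1, fish = 5, bird = 0)

# tf: relative frequency, i.e. each count divided by the document's total.
tf <- n / sum(n)
# tf2 (assumed form): natural logarithm (base exp(1)) of the count plus one.
tf2 <- log1p(n)
# tf3: binary weighting, 1 if the term occurs in the document, else 0.
tf3 <- (n > 0) * 1
```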
Types of inverse document frequency can be switched with the idf argument:
- idf is smoothed inverse document frequency with base 2. 'Smoothed' here simply means that 1 is added to the raw values after taking the logarithm.
- idf2 is global frequency IDF.
- idf3 is probabilistic IDF with base 2.
- idf4 is global entropy, which is not actually an IDF.
- df is document frequency. Use with tf = "itf".
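The inverse-document-frequency options can be sketched on a small document-term matrix. This page only pins down idf (a base-2 logarithm with 1 added afterwards), and even there the usual N / df ratio inside the logarithm is an assumption. The other formulas are assumptions taken from conventional definitions. Here N is the number of documents, df is the number of documents containing a term, gf is the term's total count, and p is each document's share of that total. Under those assumptions, idf2 = gf / df, idf3 = log2((N - df) / df), and idf4 = 1 + sum(p * log(p)) / log(N). The package's own implementation may differ.

```r
# Toy document-term count matrix: rows are documents, columns are terms.
m <- matrix(c(2, 0, 1,
              1, 1, 0,
              0, 3, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("d1", "d2", "d3"), c("cat", "dog", "fish")))

n_docs    <- nrow(m)         # N: number of documents
doc_freq  <- colSums(m > 0)  # df: number of documents containing each term
glob_freq <- colSums(m)      # gf: total count of each term

idf  <- log2(n_docs / doc_freq) + 1           # smoothed base-2 IDF
idf2 <- glob_freq / doc_freq                  # global frequency IDF (assumed form)
idf3 <- log2((n_docs - doc_freq) / doc_freq)  # probabilistic IDF (assumed form);
                                              # -Inf for a term found in every document
# Global entropy (assumed form): p is each document's share of a term's total.
p     <- sweep(m, 2, glob_freq, "/")
plogp <- ifelse(p > 0, p * log(p), 0)         # treat 0 * log(0) as 0
idf4  <- 1 + colSums(plogp) / log(n_docs)
```

A term concentrated in a single document (fish) gets the maximum entropy weight of 1.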
Returns a data.frame with the term frequency, inverse document frequency, and TF-IDF bound to it.
df <- dplyr::count(hiroba, doc_id, token)
bind_tf_idf2(df) |>
  head()
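With tf = "tf" and idf = "idf", the computation on a tidy count table can be approximated in base R as below, taking TF-IDF as the product tf * idf. This is a rough sketch under those assumptions and is not the package's implementation. It ignores norm, and it does not try to reproduce whatever adjustments rmecab_compat = TRUE (the default) makes. It also avoids dplyr and the hiroba dataset so that it runs on its own.

```r
# Toy tidy count table in the shape of dplyr::count(data, doc_id, token).
counts <- data.frame(
  doc_id = c("d1", "d1", "d2", "d2", "d3"),
  token  = c("cat", "dog", "cat", "fish", "dog"),
  n      = c(2, 1, 1, 3, 4)
)

n_docs <- length(unique(counts$doc_id))
# Relative term frequency within each document.
counts$tf <- counts$n / ave(counts$n, counts$doc_id, FUN = sum)
# Number of documents each token appears in (one row per doc-token pair).
doc_freq <- ave(counts$n, counts$token, FUN = length)
# Smoothed base-2 IDF: log2(N / df) + 1.
counts$idf <- log2(n_docs / doc_freq) + 1
counts$tf_idf <- counts$tf * counts$idf
counts
```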