spam_grams | R Documentation |
Identifies posts that contain suspicious-looking n-gram patterns. The function returns the flagged n-grams, the posts that were removed, the cleaned data, and the regex pattern, so each can be inspected. To continue with the cleaned data, re-assign your data frame to the third element of the returned list.
spam_grams(
data,
text_var,
n_gram = 8,
top_n = 1000,
min_freq = 5,
in_parallel = TRUE
)
data | Data frame or tibble object |
text_var | Name of the text variable |
n_gram | Number of words in the n-gram, e.g. n_gram = 2 gives bigrams |
top_n | Number of n-grams to keep |
min_freq | Minimum number of |
in_parallel | Whether to run the function with parallel processing |
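As a rough illustration of how n_gram, top_n and min_freq might interact, the base R sketch below counts word n-grams across a few made-up posts, keeps the most frequent ones, and drops posts that match any of them. It is not the implementation behind spam_grams: the helper count_ngrams, the example posts, and the reading of min_freq as a minimum occurrence count are all assumptions made for illustration.

```r
# Hypothetical helper, not part of the documented package:
# count every run of `n` consecutive words across all posts.
count_ngrams <- function(text, n) {
  words <- strsplit(tolower(text), "\\s+")
  grams <- lapply(words, function(w) {
    w <- w[nzchar(w)]
    if (length(w) < n) return(character(0))
    starts <- seq_len(length(w) - n + 1)
    vapply(starts,
           function(i) paste(w[i:(i + n - 1)], collapse = " "),
           character(1))
  })
  sort(table(unlist(grams)), decreasing = TRUE)
}

# Made-up example posts: three share a long repeated phrase.
posts <- c(
  "click here to win a free phone today",
  "you can click here to win a free phone now",
  "click here to win a free phone before midnight",
  "the weather was lovely this morning"
)

counts <- count_ngrams(posts, n = 4)

# Assumed roles: top_n caps how many n-grams are considered,
# min_freq is the minimum count for an n-gram to be flagged.
top_n <- 10
min_freq <- 3
kept <- head(counts, top_n)
suspicious <- names(kept)[as.vector(kept) >= min_freq]

# Build one alternation pattern; real text would need regex escaping.
pattern <- paste(suspicious, collapse = "|")
flagged <- if (length(suspicious) > 0) {
  grepl(pattern, tolower(posts))
} else {
  rep(FALSE, length(posts))
}

removed <- posts[flagged]
clean <- posts[!flagged]
```

A larger n_gram makes a repeated phrase less likely to occur by chance, which is presumably why the default is as high as 8; the sketch uses 4 only so that the toy posts can be short.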
A list containing the suspicious-looking n-grams, the removed posts, the cleaned data, and the regex pattern.
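A call might look like the sketch below. The data frame posts_df and its message column are made up, the argument values are chosen only to suit a tiny example, and passing text_var as a bare column name is an assumption, since this page does not say whether a bare or quoted name is expected. The call is guarded so the snippet does nothing if the package providing spam_grams is not attached.

```r
# Made-up example data; `posts_df` and `message` are hypothetical names.
posts_df <- data.frame(
  message = c(
    "click here to win a free phone today",
    "you can click here to win a free phone now",
    "click here to win a free phone before midnight",
    "the weather was lovely this morning"
  ),
  stringsAsFactors = FALSE
)

# Only run if spam_grams() is available in the current session.
if (exists("spam_grams", mode = "function")) {
  result <- spam_grams(
    data = posts_df,
    text_var = message,   # assumption: bare column name
    n_gram = 4,
    top_n = 100,
    min_freq = 3,
    in_parallel = FALSE   # parallel processing is unnecessary for four rows
  )

  # Inspect the top-level structure of the returned list.
  str(result, max.level = 1)

  # Per the description, the third element is the cleaned data frame.
  posts_df <- result[[3]]
}
```

Inspecting the removed posts and the pattern before overwriting the original data frame is a sensible precaution, since a low min_freq or small n_gram could flag legitimate repeated phrases.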