spam_grams: Remove posts with suspicious n-grams

View source: R/spam_grams.R

spam_grams    R Documentation

Remove posts with suspicious n-grams

Description

Identifies posts that contain suspicious-looking n-gram patterns and removes them. The function returns a list so that the suspicious n-grams, the removed posts, the cleaned data and the regex pattern can each be inspected. To keep the cleaned result, re-assign your current data frame to the 'clean' data frame, which is the third element of the list.

Usage

spam_grams(
  data,
  text_var,
  n_gram = 8,
  top_n = 1000,
  min_freq = 5,
  in_parallel = TRUE
)

Arguments

data

Data frame or tibble object

text_var

Name of the text variable

n_gram

Number of words in each n-gram, e.g. n_gram = 2 gives bigrams

top_n

Number of n-grams to keep

min_freq

Minimum number of times an n-gram must occur to be considered

in_parallel

Whether to run the function with parallel processing
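
To make the roles of n_gram, top_n and min_freq more concrete, here is a minimal base R sketch of one way such arguments could interact. It is an illustration only, not the JPackage implementation: the helper count_ngrams and the example posts are hypothetical, and the real function may tokenise, count and rank differently.

```r
# Hypothetical helper, not part of JPackage: count word n-grams across posts.
count_ngrams <- function(posts, n_gram = 2, top_n = 10, min_freq = 2) {
  # Lower-case each post and split it on runs of whitespace
  tokens <- strsplit(tolower(trimws(posts)), "\\s+")

  # Slide a window of n_gram words over each post
  grams <- unlist(lapply(tokens, function(words) {
    if (length(words) < n_gram) return(character(0))
    starts <- seq_len(length(words) - n_gram + 1)
    vapply(
      starts,
      function(i) paste(words[i:(i + n_gram - 1)], collapse = " "),
      character(1)
    )
  }))

  # Tabulate the n-grams as a named integer vector
  counts <- table(grams)
  counts <- setNames(as.integer(counts), names(counts))

  # Keep n-grams seen at least min_freq times, most frequent first,
  # then cap the result at top_n entries
  counts <- sort(counts[counts >= min_freq], decreasing = TRUE)
  head(counts, top_n)
}

# Made-up example posts
posts <- c(
  "click here to win a free prize today",
  "click here to win a free holiday",
  "lovely weather in leeds this morning",
  "click here to win big"
)

top_grams <- count_ngrams(posts, n_gram = 4, top_n = 1000, min_freq = 2)
print(top_grams)
```

In this sketch a larger n_gram makes matches rarer and more specific, min_freq discards n-grams that only appear a handful of times, and top_n caps how many of the remaining n-grams are kept.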

Value

A list with four elements: the suspicious-looking n-grams, the removed posts, the cleaned data, and the regex pattern used to match the posts
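
The shape of such a return value, and the re-assignment of the third element, can be sketched in base R as follows. This is a hedged illustration rather than the package's code: the helper remove_by_ngrams, the element names and the example data are all hypothetical, and the sketch assumes the n-grams are plain words with no regex metacharacters.

```r
# Hypothetical helper, not part of JPackage: drop posts matching any n-gram
# and return a four-element list in the order described above.
remove_by_ngrams <- function(data, text_col, ngrams) {
  # Assumption: n-grams contain no regex metacharacters, so they can be
  # joined directly into one alternation pattern
  pattern <- paste(ngrams, collapse = "|")

  if (length(ngrams) == 0) {
    # Nothing to match, so no post is flagged
    is_spam <- rep(FALSE, nrow(data))
  } else {
    is_spam <- grepl(pattern, data[[text_col]], ignore.case = TRUE)
  }

  list(
    ngrams  = ngrams,                          # suspicious n-grams
    removed = data[is_spam, , drop = FALSE],   # posts that were removed
    data    = data[!is_spam, , drop = FALSE],  # cleaned data
    pattern = pattern                          # regex pattern used
  )
}

# Made-up example data
posts_df <- data.frame(
  id = 1:4,
  text = c(
    "click here to win a free prize today",
    "click here to win a free holiday",
    "lovely weather in leeds this morning",
    "click here to win big"
  ),
  stringsAsFactors = FALSE
)

result <- remove_by_ngrams(posts_df, "text", c("click here to win"))

# Inspect what was removed before committing to the cleaned data
print(result$removed)

# Keep the cleaned data frame, i.e. the third element of the list
clean <- result[[3]]
```

Inspecting the removed posts and the pattern before re-assigning gives a chance to spot legitimate posts that were caught by an overly broad n-gram.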


jpcompartir/JPackage documentation built on March 20, 2023, 4 a.m.