limpiar_spam_grams: Remove posts containing spam-like n-grams

View source: R/limpiar_spam_grams.R

limpiar_spam_grams R Documentation

Remove posts containing spam-like n-grams

Description

Identifies posts that contain suspicious-looking n-gram patterns. The function returns the flagged n-grams for inspection, the data with the matching posts removed, and the removed posts themselves. To keep the cleaned data, re-assign your data frame to the second element of the returned list.

Usage

limpiar_spam_grams(data, text_var, n_gram, min_freq)

Arguments

data

Data frame or tibble object

text_var

Name of the text variable

n_gram

Number of words in each n-gram, e.g. n_gram = 2 for bigrams

min_freq

Minimum number of times an n-gram must occur for posts containing it to be removed
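
To make the interaction between n_gram and min_freq concrete, the following is a minimal base R sketch of the general idea, not the package's implementation. The helper make_ngrams(), the toy posts, the choice to count each n-gram once per post, and the ">=" comparison against min_freq are all assumptions made for illustration.

```r
# Hypothetical helper (not part of LimpiaR): split one post into
# lowercase word n-grams of length n.
make_ngrams <- function(text, n) {
  words <- strsplit(tolower(text), "[^[:alnum:]']+")[[1]]
  words <- words[nzchar(words)]
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(starts,
         function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Toy data, invented for this sketch
posts <- data.frame(
  id = 1:5,
  text = c(
    "Click here to win a free phone today",
    "click here to win a free holiday now",
    "Lovely weather in Leeds this morning",
    "Click here to win a free voucher",
    "My phone battery died again"
  ),
  stringsAsFactors = FALSE
)

n_gram <- 4
min_freq <- 3

grams_per_post <- lapply(posts$text, make_ngrams, n = n_gram)

# Assumption: each n-gram is counted at most once per post
gram_counts <- table(unlist(lapply(grams_per_post, unique)))
spam_grams <- names(gram_counts)[gram_counts >= min_freq]

# A post is flagged if it contains any of the frequent n-grams
is_spam <- vapply(grams_per_post,
                  function(g) any(g %in% spam_grams),
                  logical(1))

# Mirror the three-part return value described under Value
result <- list(
  spam_grams = data.frame(n_gram = spam_grams, stringsAsFactors = FALSE),
  data = posts[!is_spam, ],
  deleted = posts[is_spam, ]
)
```

With these toy posts, a lower min_freq or a smaller n_gram flags more posts, so it is worth inspecting the flagged n-grams and the removed rows before discarding anything.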

Value

A list of three data frames:

1. The suspicious-looking n-grams
2. The data with the posts containing them removed
3. The rows of the data frame that were removed
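
A possible way to call the function and unpack its result is sketched below. It assumes LimpiaR is installed, that text_var accepts an unquoted column name, and that the list elements come back in the order given above. The toy data and the object names out, flagged_grams, clean and removed are invented for illustration, and elements are accessed by position because their names are not stated here.

```r
# Sketch only: runs the call when LimpiaR is available.
if (requireNamespace("LimpiaR", quietly = TRUE)) {
  # Toy data, invented for this example
  posts <- data.frame(
    text = c(
      "Click here to win a free phone today",
      "click here to win a free holiday now",
      "Lovely weather in Leeds this morning",
      "Click here to win a free voucher",
      "My phone battery died again"
    ),
    stringsAsFactors = FALSE
  )

  out <- LimpiaR::limpiar_spam_grams(
    data = posts,
    text_var = text,  # assumed to accept an unquoted column name
    n_gram = 4,
    min_freq = 3
  )

  flagged_grams <- out[[1]]  # suspicious-looking n-grams
  clean <- out[[2]]          # data with flagged posts removed
  removed <- out[[3]]        # rows that were removed
}
```

Checking flagged_grams and removed before overwriting the original data frame with clean helps catch legitimate posts that happen to share a common phrase.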


jpcompartir/LimpiaR documentation built on April 6, 2024, 5:22 a.m.