ngrams_filter: Filter and generate N-Grams from text data

View source: R/ngrams_filter.R

Description

Filter a dataset to the rows where a specified column matches a given group value, generate n-grams from a specified text column, and then remove standard and user-defined stopwords from the n-grams.
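To make the three steps concrete, here is a minimal base-R sketch. It is not the package's implementation. The function name `ngrams_filter_sketch`, the lower-case regex tokenizer, the tiny `sketch_stopwords` list standing in for the standard stopwords, and the rule that an n-gram is dropped when any of its words is a stopword are all assumptions made for illustration. The sketch also returns only an `ngram` column, not the filtered data described under Value.

```r
# Hypothetical base-R sketch of the three steps (filter, n-grams, stopwords).
# NOT the package's implementation: the tokenizer and the small
# sketch_stopwords list are stand-ins chosen for illustration.
sketch_stopwords <- c("a", "an", "and", "is", "it", "of", "the", "to")

ngrams_filter_sketch <- function(data, group_column, group_name, text_column,
                                 ngrams, user_defined_stopwords = NULL) {
  # 1. Keep only rows whose group column equals group_name (NA rows dropped)
  rows <- data[which(data[[group_column]] == group_name), , drop = FALSE]

  # 2. Split each line into lower-case words, then slide a window of
  #    width `ngrams` across them and join each window with spaces
  make_ngrams <- function(text) {
    words <- strsplit(tolower(text), "[^a-z0-9']+")[[1]]
    words <- words[words != ""]
    if (length(words) < ngrams) return(character(0))
    vapply(seq_len(length(words) - ngrams + 1),
           function(i) paste(words[i:(i + ngrams - 1)], collapse = " "),
           character(1))
  }
  # as.character() turns an empty result (NULL) into character(0)
  grams <- as.character(unlist(lapply(rows[[text_column]], make_ngrams),
                               use.names = FALSE))

  # 3. Drop any n-gram with a stopword in any position (assumed rule)
  stops <- as.character(c(sketch_stopwords, user_defined_stopwords))
  has_stop <- vapply(strsplit(grams, " ", fixed = TRUE),
                     function(w) any(w %in% stops), logical(1))
  data.frame(ngram = grams[!has_stop], stringsAsFactors = FALSE)
}

# Toy data with the same column names as janeaustenr::austen_books()
books <- data.frame(
  book = c("Emma", "Pride & Prejudice", "Pride & Prejudice"),
  text = c("Emma Woodhouse, handsome", "Chapter 1",
           "It is a truth universally acknowledged"),
  stringsAsFactors = FALSE
)

ngrams_filter_sketch(books, "book", "Pride & Prejudice", "text", 2,
                     user_defined_stopwords = c("chapter", 1:50))
```

With these toy rows, the call keeps "truth universally" and "universally acknowledged"; omitting `user_defined_stopwords` would also keep "chapter 1". Sliding a window over a word vector is the simplest way to form n-grams. A dedicated tokenizer handles punctuation and non-ASCII letters more robustly than the `[^a-z0-9']+` split used here.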

Usage

ngrams_filter(
  data,
  group_column,
  group_name,
  text_column,
  ngrams,
  user_defined_stopwords = NULL
)

Arguments

data

A data frame containing the dataset to be processed.

group_column

A character string specifying the name of the column used to filter the data.

group_name

A character string specifying the value within the group column to filter the data by.

text_column

A character string specifying the name of the column containing text data to be tokenized into n-grams.

ngrams

An integer specifying the number of words in each n-gram (for example, 2 for bigrams).

user_defined_stopwords

A character vector of additional stopwords to remove from the n-grams. Default is NULL (no additional stopwords).
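The second example passes `c("chapter", 1:50)`. In base R, `c()` coerces mixed input to a common type, so this is a character vector in which the numbers become the strings "1" to "50". How `ngrams_filter()` matches these against tokens is not documented here; the snippet below only shows what the argument contains.

```r
# c() coerces the integers 1:50 to character, giving one character vector
stops <- c("chapter", 1:50)

class(stops)    # "character"
length(stops)   # 51: "chapter" plus the 50 numbers
head(stops, 3)  # "chapter" "1" "2"
```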

Value

A data frame with the filtered data and generated n-grams, excluding the specified stopwords.

Examples


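library(janeaustenr)
library(magrittr)  # provides the %>% pipe

# Bigrams from "Pride & Prejudice" with standard stopwords removed
austen_books() %>%
  ngrams_filter(group_column = "book",
                group_name = "Pride & Prejudice",
                text_column = "text",
                ngrams = 2)

# Same, but also removing "chapter" and the numbers 1 to 50
austen_books() %>%
  ngrams_filter(group_column = "book",
                group_name = "Pride & Prejudice",
                text_column = "text",
                ngrams = 2,
                user_defined_stopwords = c("chapter", 1:50))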

le-huynh/LeRpackage documentation built on June 16, 2024, 4:46 a.m.