ngrams_filter: Filter and generate N-Grams from text data
In le-huynh/LeRpackage: Le-Huynh Truc-Ly's R Code and Templates

View source: R/ngrams_filter.R

Description

Filters a data frame to the rows where a specified column equals a specified group value, tokenizes a specified text column into n-grams, and then removes standard and user-defined stopwords from those n-grams.
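The three steps in the description (filter, tokenize, remove stopwords) can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's actual implementation: `ngrams_sketch` and the `toy` data frame are hypothetical. The sketch treats the filtered text as one continuous stream of words, drops an n-gram when any of its words is a stopword, and uses only the stopwords passed in rather than a standard stopword list. The documentation does not confirm any of these choices for `ngrams_filter` itself.

```r
# Hypothetical helper, NOT part of LeRpackage: a base-R sketch of the
# filter -> tokenize -> remove-stopwords steps described above.
ngrams_sketch <- function(data, group_column, group_name, text_column,
                          ngrams, stopwords = character()) {
  # 1. Keep only the rows whose group column equals the requested value.
  keep <- !is.na(data[[group_column]]) & data[[group_column]] == group_name
  text <- tolower(as.character(data[[text_column]][keep]))
  text <- text[!is.na(text)]

  # 2. Split into words (runs of letters, digits, or apostrophes).
  #    Assumption: words from consecutive rows form one continuous stream.
  words <- unlist(regmatches(text, gregexpr("[a-z0-9']+", text)))
  if (length(words) < ngrams) {
    return(data.frame(ngram = character(), stringsAsFactors = FALSE))
  }

  # 3. Build one n-gram per row from shifted copies of the word vector.
  n_out <- length(words) - ngrams + 1
  parts <- sapply(seq_len(ngrams), function(i) words[i:(i + n_out - 1)])
  parts <- matrix(parts, nrow = n_out)  # stay a matrix even for edge cases

  # 4. Drop every n-gram that contains at least one stopword.
  has_stop <- apply(parts, 1, function(w) any(w %in% stopwords))
  ngram <- apply(parts, 1, paste, collapse = " ")
  data.frame(ngram = ngram[!has_stop], stringsAsFactors = FALSE)
}

# Toy data (made up for illustration).
toy <- data.frame(
  book = c("A", "A", "B"),
  text = c("The cat sat", "on the mat", "Ignored row"),
  stringsAsFactors = FALSE
)

result <- ngrams_sketch(toy, "book", "A", "text",
                        ngrams = 2, stopwords = c("the", "on"))
print(result)  # only the bigram "cat sat" survives
```

Of the five bigrams in group "A" ("the cat", "cat sat", "sat on", "on the", "the mat"), four contain a stopword, so only "cat sat" remains.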

Usage

ngrams_filter(
  data,
  group_column,
  group_name,
  text_column,
  ngrams,
  user_defined_stopwords = NULL
)

Arguments

`data`	A data frame containing the dataset to be processed.
`group_column`	A character string specifying the name of the column used to filter the data.
`group_name`	A character string specifying the value within the group column to filter the data by.
`text_column`	A character string specifying the name of the column containing text data to be tokenized into n-grams.
`ngrams`	An integer specifying the number of words in each n-gram to be generated (for example, 2 for bigrams).
`user_defined_stopwords`	A character vector of additional stopwords to be removed from the n-grams. Default is `NULL`.
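The second call under Examples passes `c("chapter", 1:50)` as `user_defined_stopwords`. Because `c()` coerces mixed inputs to a common type, this yields a character vector, which matches the documented argument type. The variable name `extra_stopwords` below is only illustrative.

```r
# Mixing a string with integers coerces every element to character.
extra_stopwords <- c("chapter", 1:50)

class(extra_stopwords)    # "character"
length(extra_stopwords)   # 51
head(extra_stopwords, 3)  # "chapter" "1" "2"
```

Presumably the numbers are included so that chapter headings such as "Chapter 1" are dropped along with the word "chapter". That is an inference from the example, not something the documentation states.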

Value

A data frame with the filtered data and generated n-grams, excluding the specified stopwords.

Examples

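library(janeaustenr)
library(magrittr)  # provides the %>% pipe used below

# Bigrams from "Pride & Prejudice"
austen_books() %>%
  ngrams_filter(group_column = "book",
                group_name = "Pride & Prejudice",
                text_column = "text",
                ngrams = 2)

# Same, also removing "chapter" and the numbers 1 to 50
# (c() coerces the integers to character strings)
austen_books() %>%
  ngrams_filter(group_column = "book",
                group_name = "Pride & Prejudice",
                text_column = "text",
                ngrams = 2,
                user_defined_stopwords = c("chapter", 1:50))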

le-huynh/LeRpackage documentation built on June 16, 2024, 4:46 a.m.