View source: R/ngrams_filter.R
ngrams_filter | R Documentation |
Filter a dataset to the rows where a specified column equals a given group value, generate n-grams from a specified text column, then remove standard and user-defined stopwords from the n-grams.
ngrams_filter(
  data,
  group_column,
  group_name,
  text_column,
  ngrams,
  user_defined_stopwords = NULL
)
data | A data frame containing the dataset to be processed. |
group_column | A character string specifying the name of the column used to filter the data. |
group_name | A character string specifying the value within the group column to filter the data by. |
text_column | A character string specifying the name of the column containing text data to be tokenized into n-grams. |
ngrams | An integer specifying the number of words in the n-grams to be generated. |
user_defined_stopwords | A character vector of additional stopwords to be removed from the n-grams. Default is NULL. |
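Because user_defined_stopwords is a character vector, numbers combined with words through c() are coerced to strings, so chapter numbers can be passed alongside ordinary words. A quick base R check (the variable name extra_stopwords is illustrative only):

```r
# c() coerces mixed character and integer input to a single character vector
extra_stopwords <- c("chapter", 1:50)

is.character(extra_stopwords)  # TRUE
length(extra_stopwords)        # 51
head(extra_stopwords, 3)       # "chapter" "1" "2"
```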
A data frame with the filtered data and generated n-grams, excluding the specified stopwords.
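library(janeaustenr)
library(magrittr)  # provides %>%, in case the pipe is not already attached

# Bigrams from one book, with standard stopwords removed
austen_books() %>%
  ngrams_filter(group_column = "book",
                group_name = "Pride & Prejudice",
                text_column = "text",
                ngrams = 2)

# Also drop "chapter" and the numbers 1 to 50 (coerced to character by c())
austen_books() %>%
  ngrams_filter(group_column = "book",
                group_name = "Pride & Prejudice",
                text_column = "text",
                ngrams = 2,
                user_defined_stopwords = c("chapter", 1:50))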
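The filter, tokenize, and remove-stopwords steps described above can be sketched in base R. This is a minimal illustration, not the package's source: the name ngrams_filter_sketch, the standard_stopwords argument with its tiny default list, the output column name ngram, and the rule that an n-gram is dropped when any of its words is a stopword are all assumptions made for the example.

```r
# Hypothetical stand-in for ngrams_filter(); NOT the package implementation.
ngrams_filter_sketch <- function(data, group_column, group_name, text_column,
                                 ngrams, user_defined_stopwords = NULL,
                                 standard_stopwords = c("the", "of", "and",
                                                        "a", "to", "in")) {
  # 1. Keep only the rows whose group column equals the requested value.
  rows <- data[which(data[[group_column]] == group_name), , drop = FALSE]

  # 2. Lower-case the text and split each row into word tokens.
  tokens <- strsplit(tolower(rows[[text_column]]), "[^a-z0-9']+")

  # Combine the (assumed) standard list with the user-supplied stopwords.
  stopwords <- c(standard_stopwords, as.character(user_defined_stopwords))

  # 3. Slide a window of `ngrams` words over each row's tokens.
  grams <- lapply(tokens, function(words) {
    words <- words[nzchar(words)]
    if (length(words) < ngrams) return(NULL)
    starts <- seq_len(length(words) - ngrams + 1)
    windows <- lapply(starts, function(i) words[i:(i + ngrams - 1)])
    # 4. Assumption: drop an n-gram if any of its words is a stopword.
    keep <- !vapply(windows, function(w) any(w %in% stopwords), logical(1))
    vapply(windows[keep], paste, character(1), collapse = " ")
  })

  # Return one row per surviving n-gram, labelled with its group value.
  out <- data.frame(
    group = rep(as.character(rows[[group_column]]), lengths(grams)),
    ngram = as.character(unlist(grams)),
    stringsAsFactors = FALSE
  )
  names(out)[1] <- group_column
  out
}

# Toy data, invented for this example
toy <- data.frame(
  book = c("A", "A", "B"),
  text = c("Chapter 1 the quick fox", "quick fox jumps high", "other book text"),
  stringsAsFactors = FALSE
)

result <- ngrams_filter_sketch(toy, "book", "A", "text", ngrams = 2,
                               user_defined_stopwords = c("chapter", 1:50))
result$ngram
# "quick fox" "quick fox" "fox jumps" "jumps high"
```

On the toy data, "chapter 1", "1 the", and "the quick" are discarded because each contains a stopword, and the row for book "B" never reaches the tokenizer. The real function may handle tokenization and stopword matching differently.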