ngrams: Build N-grams and keep most frequent

View source: R/text_mining.R

ngrams    R Documentation

Build N-grams and keep most frequent

Description

Build n-grams from one or more text inputs and keep only the `top` most frequent combinations.
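The idea can be sketched in base R. Each text is split into words, n-grams of every requested length are built within each text, the n-grams are counted, and the `top` rows are kept. The helper name ngram_counts() and its simplified tokenization are assumptions for illustration only. The tokenizer lowercases the text and splits on anything that is not an ASCII letter or apostrophe. This is not the lares implementation, which requires tidytext.

```r
# Hypothetical helper, NOT lares::ngrams(): a base-R sketch of n-gram counting.
# Assumption: words are lowercase runs of ASCII letters/apostrophes.
ngram_counts <- function(text, n = c(2, 3), top = 10) {
  # Tokenize each input separately and drop empty strings from leading delimiters
  words <- strsplit(tolower(text), "[^a-z']+")
  words <- lapply(words, function(w) w[nzchar(w)])

  # Build all k-grams of one document by sliding a window of width k
  make_grams <- function(w, k) {
    if (length(w) < k) return(character(0))
    vapply(seq_len(length(w) - k + 1),
           function(i) paste(w[i:(i + k - 1)], collapse = " "),
           character(1))
  }

  # Grams are built per document, so none spans two inputs
  grams <- unlist(lapply(n, function(k) lapply(words, make_grams, k = k)))

  # Count occurrences, sort by decreasing frequency, keep the first `top` rows
  counts <- sort(table(grams), decreasing = TRUE)
  out <- data.frame(ngram = names(counts), freq = as.integer(counts),
                    stringsAsFactors = FALSE)
  head(out, top)
}
```

For example, `ngram_counts(c("the cat sat on the mat", "the cat ran"), n = 2, top = 3)` returns "the cat" in the first row with a count of 2. No "mat the" bigram is produced, because the two inputs are never joined.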

Usage

ngrams(text, ngram = c(2, 3), top = 10, stop_words = NULL, ...)

Arguments

text

Character vector. Texts to build n-grams from.

ngram

Integer vector. Number of consecutive words in each n-gram. Pass several values, e.g. c(2, 3), to build n-grams of each length.

top

Integer. Keep only the `top` most frequent n-grams.

stop_words

Character vector. Words to exclude from text. Only whole words are removed. For example, with stop_words = "a", the word "a" is dropped wherever it appears, but the letter "a" inside other words (e.g. "cat") remains.
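The difference between whole-word removal and plain substring replacement can be illustrated with a small base-R sketch. drop_stop_words() is a hypothetical helper that assumes whitespace tokenization and case-insensitive matching. lares delegates this step to remove_stopwords(), whose exact rules may differ.

```r
# Hypothetical helper, NOT lares::remove_stopwords(): removes stop words as
# whole tokens after splitting on whitespace (assumed case-insensitive match).
drop_stop_words <- function(text, stop_words) {
  tokens <- strsplit(text, "\\s+")
  vapply(tokens, function(w) {
    keep <- !(tolower(w) %in% tolower(stop_words))
    paste(w[keep], collapse = " ")
  }, character(1))
}

# Whole-word removal: "a" disappears, but "cat" and "mat" keep their letter "a"
drop_stop_words("a cat sat on a mat", stop_words = "a")
# → "cat sat on mat"

# Contrast: naive substring replacement also strips the letter inside words
gsub("a", "", "a cat sat on a mat")
# → " ct st on  mt"
```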

...

Additional arguments passed to remove_stopwords().

Value

A data.frame of n-grams and their counts, sorted by decreasing frequency.

See Also

Other Text Mining: cleanText(), remove_stopwords(), replaceall(), sentimentBreakdown(), textCloud(), textFeats(), textTokenizer(), topics_rake()

Examples

# The "tidytext" package must be installed to use this auxiliary function:
## Not run: 
women <- read.csv("https://bit.ly/3mXJOOi")
x <- women$description
ngrams(x, ngram = c(2, 3), top = 3)
ngrams(x, ngram = 2, top = 6, stop_words = c("a", "is", "of", "the"))

## End(Not run)

laresbernardo/lares documentation built on Oct. 23, 2024, 12:05 p.m.