ptstem: Stem Words
In ptstem: Stemming Algorithms for the Portuguese Language


Stem a character vector of words using the selected algorithm.

ptstem_words(words, algorithm = "rslp", complete = T, ...)

ptstem(texts, algorithm = "rslp", n_char = 3, complete = T,
  ignore = NULL, ...)

`words, texts`	character vector of words (for `ptstem_words`) or of texts (for `ptstem`).
`algorithm`	string with the name of the algorithm to be used. One of `"hunspell"`, `"rslp"`, `"porter"` or `"modified-hunspell"`.
`complete`	whether to complete words or not, i.e. replace all words that share a stem with the word that appears most often with that stem.
`...`	other arguments passed to the algorithms.
`n_char`	minimum number of characters of words to be stemmed. Not used by `ptstem_words`.
`ignore`	vector of words and regexes to ignore. Plain words are wrapped in `stringr::fixed()` so that words like 'banana' don't get excluded when you ignore 'ana'. Elements are treated as a regex when they contain at least one punctuation symbol.
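
The rule for `ignore` can be illustrated with a short base R sketch. It is only one reading of the description above, not the package's implementation: the whole-word comparison for plain entries and the use of `grepl()` for regex entries are assumptions, and all variable names are hypothetical.

```r
ignore <- c("ana", "av.*")

# Entries with at least one punctuation symbol are treated as regexes.
is_regex <- grepl("[[:punct:]]", ignore)
literals <- ignore[!is_regex]  # "ana"
patterns <- ignore[is_regex]   # "av.*"

words <- c("banana", "ana", "aviao", "gosto")

# Assumption: plain entries only match identical words, so "banana"
# is not skipped when "ana" is ignored.
skip_literal <- words %in% literals

# Assumption: regex entries are matched against each word with grepl().
skip_regex <- if (length(patterns) > 0) {
  grepl(paste(patterns, collapse = "|"), words)
} else {
  rep(FALSE, length(words))
}

skip <- skip_literal | skip_regex
words[skip]  # words that would be left unstemmed under these assumptions
```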

You can choose whether to complete words using the `complete` argument. By default, all algorithms complete stems. For hunspell, it is better to always use `complete = TRUE`, since it completes words even when `complete = FALSE`.

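Completion picks, for each stem, the word that appears most often in the full corpus. For that reason it should not be used when you are stemming in parallel: each worker sees only part of the corpus, so the word chosen for a stem can differ between chunks.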
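
To make the parallel caveat concrete, here is a minimal base R sketch of the completion idea. `complete_stems_sketch` is a hypothetical helper, not a ptstem function; the stems are written by hand rather than produced by a stemmer, and breaking ties by first occurrence is an assumption.

```r
# Hypothetical helper: replace every word with the most frequent word
# that shares its stem (ties go to the word seen first).
complete_stems_sketch <- function(words, stems) {
  groups <- split(words, stems)
  representative <- vapply(groups, function(w) {
    counts <- table(factor(w, levels = unique(w)))
    names(counts)[which.max(counts)]
  }, character(1))
  unname(representative[stems])
}

words <- c("gostou", "gosto", "gostaram", "gosto", "fazem", "fazer", "fazer")
# Hand-written stems for illustration only.
stems <- c("gost", "gost", "gost", "gost", "faz", "faz", "faz")

# Completion over the full corpus.
full <- complete_stems_sketch(words, stems)

# Completion done separately on two chunks, as parallel workers would.
chunked <- c(
  complete_stems_sketch(words[1:3], stems[1:3]),
  complete_stems_sketch(words[4:7], stems[4:7])
)

identical(full, chunked)  # FALSE: the first chunk never sees the second "gosto"
```

In the full corpus "gosto" occurs twice and wins for its stem, while the first chunk sees each word once and falls back to the first one. That is the kind of inconsistency the note above warns about.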

words <- c("balões", "aviões", "avião", "gostou", "gosto", "gostaram")
ptstem_words(words, "hunspell")
ptstem_words(words)
ptstem_words(words, algorithm = "porter", complete = FALSE)

texts <- c("coma frutas pois elas fazem bem para a saúde.",
"não coma doces, eles fazem mal para os dentes.")
ptstem(texts, "hunspell")
ptstem(texts, n_char = 5)
ptstem(texts, "porter", n_char = 4, complete = FALSE)
ptstem(words, ignore = "av.*") # words starting with "av" are not stemmed