Stem a character vector of words using the selected algorithm.
ptstem_words(words, algorithm = "rslp", complete = T, ...)
ptstem(texts, algorithm = "rslp", n_char = 3, complete = T,
  ignore = NULL, ...)
words, texts | character vector of words.
algorithm | string with the name of the algorithm to be used. One of
complete | whether to complete words, i.e. replace every word that shares a stem with the word that appears most often with that stem.
... | other arguments passed to the algorithms.
n_char | minimum number of characters a word must have to be stemmed. Not used by
ignore | vector of words and regular expressions to ignore. Words are wrapped around
You can choose whether to complete words using the complete argument. By default, all algorithms complete stems. For hunspell, it is better to always use complete = TRUE, since it completes words even when complete = FALSE.
Completion picks, for each stem, the word that appears most often in the full corpus. Because the result depends on the whole corpus, it should not be used when you are stemming in parallel, where each worker sees only part of the corpus.
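The completion step described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: `complete_stems_sketch` is a hypothetical helper, and the stems are written by hand rather than produced by any of the algorithms.

```r
# Hypothetical helper (not part of ptstem): replace each word with the
# most frequent word that shares its stem.
complete_stems_sketch <- function(words, stems) {
  stopifnot(length(words) == length(stems))
  # For each stem, count the words mapped to it and keep the most common one.
  # Ties go to the alphabetically first word, because table() sorts its names.
  most_common <- tapply(words, stems, function(w) {
    counts <- table(w)
    names(counts)[which.max(counts)]
  })
  # Look up the chosen word for every input position.
  unname(as.character(most_common[stems]))
}

words <- c("gostou", "gosto", "gostou", "gostaram", "dentes", "dente", "dentes")
# Assumed stems, written by hand for illustration; a real stemmer may differ.
stems <- c("gost", "gost", "gost", "gost", "dent", "dent", "dent")

# Full corpus: "gostou" and "dentes" are the most frequent words per stem.
completed <- complete_stems_sketch(words, stems)
print(completed)

# Only the first two words, as one parallel worker might see them:
# the counts tie, so a different word is chosen than on the full corpus.
chunk <- 1:2
partial <- complete_stems_sketch(words[chunk], stems[chunk])
print(partial)
```

In this sketch the full corpus completes the first two words to "gostou", while the two-word chunk completes them to "gosto". That mismatch is the reason completion and parallel stemming do not combine well.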
words <- c("balões", "aviões", "avião", "gostou", "gosto", "gostaram")
ptstem_words(words, "hunspell")
ptstem_words(words)
ptstem_words(words, algorithm = "porter", complete = FALSE)
texts <- c("coma frutas pois elas fazem bem para a saúde.",
           "não coma doces, eles fazem mal para os dentes.")
ptstem(texts, "hunspell")
ptstem(texts, n_char = 5)
ptstem(texts, "porter", n_char = 4, complete = FALSE)
ptstem(words, ignore = "av.*") # words starting with "av" are not stemmed