Make a stemmer from a set of (term, stem) pairs.
new_stemmer(term, stem, default = NULL, duplicates = "first",
            vectorize = TRUE)
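term: character vector of terms to stem.
stem: character vector the same length as term, giving the corresponding stems.
default: if non-NULL, the value to use for terms absent from the term argument; if NULL, such terms are left as-is.
duplicates: action to take for duplicates in the term argument.
vectorize: whether to produce a vectorized stemmer that accepts and returns vector arguments.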
Given a list of terms and a corresponding list of stems, this produces a
function that maps each term to its corresponding stem. If default = NULL,
then values absent from the term argument are left as-is; otherwise, they
are replaced by the default value.
The duplicates argument indicates the action to take if there are duplicate
entries in the term argument:
  duplicates = "first": take the first matching entry in the stem list.
  duplicates = "last": take the last matching entry in the stem list.
  duplicates = "omit": use the default value for duplicated terms.
  duplicates = "fail": raise an error if there are duplicated terms.
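The four duplicate policies can be illustrated with a small base-R sketch. The helper make_lookup_stemmer below is hypothetical and is not the package's implementation; it only mirrors the behavior described above, assuming a simple match()-based lookup.

```r
# Hypothetical helper, NOT the package's implementation: builds a lookup
# stemmer from (term, stem) pairs using only base R.
make_lookup_stemmer <- function(term, stem, default = NULL,
                                duplicates = c("first", "last", "omit", "fail")) {
  duplicates <- match.arg(duplicates)
  stopifnot(length(term) == length(stem))

  if (duplicates == "last") {
    # reversing makes the last entry the first match
    term <- rev(term)
    stem <- rev(stem)
  } else if (duplicates == "omit") {
    # drop every term that occurs more than once
    keep <- !(term %in% term[duplicated(term)])
    term <- term[keep]
    stem <- stem[keep]
  } else if (duplicates == "fail" && anyDuplicated(term) > 0) {
    stop("'term' entries must be unique")
  }

  function(x) {
    i <- match(x, term)   # match() returns the position of the first hit
    out <- stem[i]
    miss <- is.na(i)
    # unmatched inputs: keep as-is when default is NULL, else use default
    out[miss] <- if (is.null(default)) x[miss] else default
    out
  }
}

term <- c("ran", "ran", "went")
stem <- c("run", "race", "go")

first <- make_lookup_stemmer(term, stem, duplicates = "first")
last  <- make_lookup_stemmer(term, stem, duplicates = "last")
omit  <- make_lookup_stemmer(term, stem, default = NA, duplicates = "omit")

first(c("ran", "went", "sat"))  # "run"  "go" "sat"
last(c("ran", "went", "sat"))   # "race" "go" "sat"
omit(c("ran", "went", "sat"))   # NA     "go" NA
```

With duplicates = "omit", the duplicated term "ran" is treated like any other unknown term, so it receives the default value.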
By default, with vectorize = TRUE, the resulting stemmer accepts a
character vector as input and returns a character vector of the same length
with entries giving the stems of the corresponding input entries.
Setting vectorize = FALSE gives a function that accepts a single input
and returns a single output. This can be more efficient when used as part of
a text_filter.
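As a rough sketch of the scalar form, the snippet below builds a stemmer with vectorize = FALSE and passes it as the stemmer property of the token filter through text_tokens. It assumes the corpus package is installed (the guard makes it a no-op otherwise); the term and stem values are made up for illustration.

```r
# Guarded so the snippet does nothing when 'corpus' is not installed.
if (requireNamespace("corpus", quietly = TRUE)) {
  # scalar stemmer: one term in, one stem out
  stem_one <- corpus::new_stemmer(c("ran", "running"), c("run", "run"),
                                  vectorize = FALSE)
  print(stem_one("running"))

  # use the scalar stemmer as the filter's stemmer when tokenizing
  print(corpus::text_tokens("She ran while running", stemmer = stem_one))
}
```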
See also: stem_snowball, text_filter, text_tokens.
# map uppercase to lowercase, leave others unchanged
stemmer <- new_stemmer(LETTERS, letters)
stemmer(c("A", "E", "I", "O", "U", "1", "2", "3"))
# map uppercase to lowercase, drop others
stemmer <- new_stemmer(LETTERS, letters, default = NA)
stemmer(c("A", "E", "I", "O", "U", "1", "2", "3"))