View source: R/tokens_replace.R
tokens_replace | R Documentation |
Substitute token types based on vectorized one-to-one matching. Since this
function is created for lemmatization or user-defined stemming. It supports
substitution of multi-word features by multi-word features, but substitution
is fastest when pattern
and replacement
are character vectors
and valuetype = "fixed"
as the function only substitute types of
tokens. Please use tokens_lookup()
with exclusive = FALSE
to replace dictionary values.
tokens_replace(
x,
pattern,
replacement,
valuetype = "glob",
case_insensitive = TRUE,
apply_if = NULL,
verbose = quanteda_options("verbose")
)
x |
tokens object whose token elements will be replaced |
pattern |
a character vector or list of character vectors. See pattern for more details. |
replacement |
a character vector or (if |
valuetype |
the type of pattern matching: |
case_insensitive |
logical; if |
apply_if |
logical vector of length |
verbose |
print status messages if |
tokens_lookup
toks1 <- tokens(data_corpus_inaugural, remove_punct = TRUE)
# lemmatization
taxwords <- c("tax", "taxing", "taxed", "taxed", "taxation")
lemma <- rep("TAX", length(taxwords))
toks2 <- tokens_replace(toks1, taxwords, lemma, valuetype = "fixed")
kwic(toks2, "TAX") |>
tail(10)
# stemming
type <- types(toks1)
stem <- char_wordstem(type, "porter")
toks3 <- tokens_replace(toks1, type, stem, valuetype = "fixed", case_insensitive = FALSE)
identical(toks3, tokens_wordstem(toks1, "porter"))
# multi-multi substitution
toks4 <- tokens_replace(toks1, phrase(c("Supreme Court")),
phrase(c("Supreme Court of the United States")))
kwic(toks4, phrase(c("Supreme Court of the United States")))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.