View source: R/replace_tokens.R
Replace tokens with a single substring. This is much faster than mgsub
when one wants to replace fixed tokens with a single value or remove them
altogether. This can be useful for quickly replacing tokens, such as names
in a string, with a single value in order to reduce noise.
replace_tokens(x, tokens, replacement = NULL, ignore.case = FALSE, ...)
x | A character vector.
tokens | A vector of tokens to be replaced.
replacement | A single character string to replace the tokens with. The default, …
ignore.case | logical. If …
... | ignored.
Returns a vector of strings with tokens replaced.
The function splits each string apart into tokens for speed. After the replacement occurs, the strings are pasted back together. The strings are not guaranteed to retain the exact spacing of the original.
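The split, replace, and paste approach described in the note can be illustrated with a minimal base R sketch. This is not the package's implementation: `replace_tokens_sketch` is a hypothetical helper, and splitting on whitespace is an assumed tokenization rule (so punctuation attached to a word would stay part of the token here). It is only meant to show why spacing may change.

```r
## Hypothetical helper, NOT the package's implementation: a base R sketch of
## the split / replace / paste idea described in the note above.
replace_tokens_sketch <- function(x, tokens, replacement = "") {
    ## Split each string on runs of whitespace (an assumed tokenization rule)
    pieces <- strsplit(x, "\\s+")
    vapply(pieces, function(words) {
        ## Exact, fixed matching against the token set; no regex involved
        hit <- words %in% tokens
        words[hit] <- replacement
        ## Drop empty pieces so removed tokens leave no doubled spaces
        words <- words[nzchar(words)]
        ## Paste back with single spaces; original spacing is not retained
        paste(words, collapse = " ")
    }, character(1))
}

## Note the double space after "No" in the first string
x <- c("No  it's not what you think", "tell me what it is")

replace_tokens_sketch(x, c("No", "what", "it's"))
## [1] "not you think" "tell me it is"

replace_tokens_sketch(x, c("No", "what", "it's"), "<<TOKEN>>")
## [1] "<<TOKEN>> <<TOKEN>> not <<TOKEN>> you think"
## [2] "tell me <<TOKEN>> it is"
```

Because each string is rebuilt with single spaces, the double space in the first input collapses to one, which is the kind of spacing change the note warns about. Matching whole tokens with `%in%` rather than running one regular expression per token is presumably where the speed advantage over a multi-pattern substitution comes from.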
replace_tokens(DATA$state, c('No', 'what', "it's"))
replace_tokens(DATA$state, c('No', 'what', "it's"), "<<TOKEN>>")
replace_tokens(
    DATA$state,
    c('No', 'what', "it's"),
    "<<TOKEN>>",
    ignore.case = TRUE
)
## Not run:
## Now let's see the speed
## Set up data
library(textshape)
data(hamlet)
set.seed(11)
tokens <- sample(unique(unlist(split_token(hamlet$dialogue))), 2000)
tic <- Sys.time()
head(replace_tokens(hamlet$dialogue, tokens))
(toc <- Sys.time() - tic)
tic <- Sys.time()
head(mgsub(hamlet$dialogue, tokens, ""))
(toc <- Sys.time() - tic)
## Amp it up 20x more data
tic <- Sys.time()
head(replace_tokens(rep(hamlet$dialogue, 20), tokens))
(toc <- Sys.time() - tic)
## Replace names example
library(lexicon)
library(textshape)
nms <- gsub("(^.)(.*)", "\\U\\1\\L\\2", common_names, perl = TRUE)
x <- split_portion(
    sample(c(sample(grady_augmented, 5000), sample(nms, 10000, TRUE))),
    n.words = 12
)
x$text.var <- paste0(
    x$text.var,
    sample(c('.', '!', '?'), length(x$text.var), TRUE)
)
replace_tokens(x$text.var, nms, 'NAME')
## End(Not run)