View source: R/unnest_ngrams.R
These functions are wrappers around unnest_tokens(token = "ngrams") and unnest_tokens(token = "skip_ngrams").
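As a minimal sketch of what "wrapper" means here, the snippet below tokenizes a toy sentence both ways and compares the results. The data frame `d` and the output column name `bigram` are illustrative assumptions, not part of the package documentation.

```r
library(dplyr)
library(tidytext)

# Toy input (hypothetical): a single row of text
d <- tibble(txt = "the quick brown fox")

# Wrapper form
via_wrapper <- d %>%
  unnest_ngrams(bigram, txt, n = 2)

# Same request expressed directly through unnest_tokens()
via_tokens <- d %>%
  unnest_tokens(bigram, txt, token = "ngrams", n = 2)

# Both calls are expected to yield the same bigram column
identical(via_wrapper$bigram, via_tokens$bigram)
```

The wrapper mainly saves typing `token = "ngrams"` and makes the n-gram arguments visible in the function signature.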
unnest_ngrams(
tbl,
output,
input,
n = 3L,
n_min = n,
ngram_delim = " ",
format = c("text", "man", "latex", "html", "xml"),
to_lower = TRUE,
drop = TRUE,
collapse = NULL,
...
)
unnest_skip_ngrams(
tbl,
output,
input,
n_min = 1,
n = 3,
k = 1,
format = c("text", "man", "latex", "html", "xml"),
to_lower = TRUE,
drop = TRUE,
collapse = NULL,
...
)
tbl |
A data frame |
output |
Output column to be created as string or symbol. |
input |
Input column that gets split as string or symbol. The output/input arguments are passed by expression and support quasiquotation; you can unquote strings and symbols. |
n |
The number of words in the n-gram. This must be an integer greater than or equal to 1. |
n_min |
This must be an integer greater than or equal to 1, and less than or equal to n. |
ngram_delim |
The separator between words in an n-gram. |
format |
Either "text", "man", "latex", "html", or "xml". If not "text", this uses the hunspell tokenizer and can tokenize only by "word". |
to_lower |
Whether to convert tokens to lowercase. If tokens include
URLS (such as with |
drop |
Whether original input column should get dropped. Ignored if the original input and new output column have the same name. |
collapse |
Whether to combine text with newlines first in case tokens (such as sentences or paragraphs) span multiple lines. If NULL, collapses when token method is "ngrams", "skip_ngrams", "sentences", "lines", "paragraphs", or "regex". |
... |
Extra arguments passed on to tokenizers |
k |
For the skip n-gram tokenizer, the maximum skip distance between
words. The function will compute all skip n-grams between |
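To illustrate how n, n_min, and ngram_delim interact, here is a small sketch on a toy sentence. The data frame `d` and the column name `ngram` are assumptions made for the example only.

```r
library(dplyr)
library(tidytext)

# Toy input (hypothetical): a single row of text
d <- tibble(txt = "the quick brown fox")

# Request every n-gram with 2 to 3 words, joined by "_" instead of a space
out <- d %>%
  unnest_ngrams(ngram, txt, n = 3, n_min = 2, ngram_delim = "_")

out$ngram
```

With four input words, this should give the three bigrams and the two trigrams, five tokens in total. The order in which they appear depends on the underlying tokenizer, so it is safer not to rely on it.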
See also: unnest_tokens()
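library(dplyr)
library(tidytext)  # provides unnest_ngrams() and unnest_skip_ngrams()
library(janeaustenr)

d <- tibble(txt = prideprejudice)

# Bigrams: two consecutive words per token
d %>%
  unnest_ngrams(word, txt, n = 2)

# Skip n-grams of up to 3 words, skipping at most 1 word between them
d %>%
  unnest_skip_ngrams(word, txt, n = 3, k = 1)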
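The novel is too large to inspect skip n-grams by eye, so a smaller sketch may help show the effect of k. The toy sentence, the data frame `toy`, and the column name `skipgram` are assumptions for illustration; n_min is set to 2 here so that single words are left out.

```r
library(dplyr)
library(tidytext)

# Toy input (hypothetical): a single row of text
toy <- tibble(txt = "the quick brown fox")

# Two-word skip n-grams, allowing up to one skipped word between them
skips <- toy %>%
  unnest_skip_ngrams(skipgram, txt, n = 2, n_min = 2, k = 1)

skips$skipgram
```

The result should contain the ordinary bigrams (skip distance 0) plus "the brown" and "quick fox" (skip distance 1). Raising k widens the gap allowed between the paired words.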