ngram | R Documentation |
Creates the n-gram tokens for each document in a corpus.
bigram: a shortcut for ngram using n_min = n_max = 2.
trigram: a shortcut for ngram using n_min = n_max = 3.
ngram(corpus, n_min = 1, n_max = 2, ..., parallel = FALSE,
  ncores = parallel::detectCores() - 1)

## S3 method for class 'list'
ngram(corpus, n_min = 1, n_max = 2, ..., parallel = FALSE,
  ncores = parallel::detectCores() - 1)

## S3 method for class 'VCorpus'
ngram(corpus, n_min = 1, n_max = 2, ..., parallel = FALSE,
  ncores = parallel::detectCores() - 1)

## S3 method for class 'character'
ngram(corpus, n_min = 1, n_max = 2, ..., parallel = FALSE,
  ncores = parallel::detectCores() - 1,
  docs_or_tokens = c("docs", "tokens"))

## Default S3 method:
ngram(corpus, n_min = 1, n_max = 2, ..., parallel = FALSE,
  ncores = parallel::detectCores() - 1)

bigram(corpus, ..., parallel = FALSE,
  ncores = parallel::detectCores() - 1)

trigram(corpus, ..., parallel = FALSE,
  ncores = parallel::detectCores() - 1)
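As a rough illustration of what n_min and n_max control in the usage above, the following base R sketch builds every n-gram of size n_min to n_max from a single vector of tokens. The helper name sketch_ngrams and the single-space separator are assumptions made for this example; it is not the package's implementation, and the actual output format of ngram may differ.

```r
# Hypothetical helper (not part of the documented package):
# builds all n-grams of sizes n_min..n_max from one vector of tokens.
sketch_ngrams <- function(tokens, n_min = 1, n_max = 2) {
  stopifnot(is.character(tokens), n_min >= 1, n_max >= n_min)
  grams <- lapply(seq(n_min, n_max), function(n) {
    # A document shorter than n tokens has no n-grams of that size.
    if (length(tokens) < n) return(character(0))
    starts <- seq_len(length(tokens) - n + 1)
    vapply(
      starts,
      # Assumption: words in a gram are joined with a single space.
      function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
      character(1)
    )
  })
  unlist(grams)
}

tokens <- c("the", "quick", "brown", "fox")

# Defaults mirror the documented ones: unigrams and bigrams together.
unigrams_and_bigrams <- sketch_ngrams(tokens, n_min = 1, n_max = 2)

# n_min = n_max = 2 and n_min = n_max = 3 correspond to the
# bigram and trigram shortcuts.
only_bigrams <- sketch_ngrams(tokens, n_min = 2, n_max = 2)
only_trigrams <- sketch_ngrams(tokens, n_min = 3, n_max = 3)
```

Setting n_min equal to n_max restricts the result to grams of a single size, which is what the bigram and trigram shortcuts are documented to do.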
corpus |
a compatible object storing documents (currently: a list or corpus-list of tokenized documents, character vectors, and VCorpus objects, as per the S3 methods listed in the usage) |
n_min |
(num) minimum number of words to include in the n-grams |
n_max |
(num) maximum number of words to include in the n-grams |
... |
further options passed to the function |
parallel |
(lgl) if TRUE, the computation is run in parallel (default is FALSE) |
ncores |
(int) number of cores to use in the parallel computation (default is the number of machine cores minus one) |
docs_or_tokens |
(chr) whether the input character vector is a vector of documents (to be tokenized) or is already a vector of tokens (of a single document); one of "docs" (default) or "tokens" |
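The parallel and ncores arguments can be pictured with base R's parallel package. The sketch below is an illustration under stated assumptions, not the package's code: the docs object and the bigrams_of helper are hypothetical, and the guard around detectCores() is added here because that function can return NA and "cores minus one" is 0 on a single-core machine.

```r
# Illustration only: uses base R's 'parallel' package directly,
# not the documented ngram() function.

# Guard the "cores minus one" default so it is never NA or below 1.
detected <- parallel::detectCores()
ncores <- if (is.na(detected)) 1L else max(1L, detected - 1L)

# Hypothetical input: a list of already tokenized documents.
docs <- list(
  c("to", "be", "or", "not"),
  c("hello", "world"),
  "alone"
)

# Bigrams of one document: pair each token with the next one.
# Kept self-contained so it can be sent to worker processes.
bigrams_of <- function(tokens) {
  paste(utils::head(tokens, -1), utils::tail(tokens, -1))
}

# Sequential version, analogous to parallel = FALSE.
sequential <- lapply(docs, bigrams_of)

# Parallel version, analogous to parallel = TRUE.
# The worker count is capped at 2 because the example is tiny.
cl <- parallel::makeCluster(min(ncores, 2L))
in_parallel <- tryCatch(
  parallel::parLapply(cl, docs, bigrams_of),
  finally = parallel::stopCluster(cl)
)
```

Both versions return one character vector per document, so for small corpora the sequential path is usually the simpler choice; starting worker processes only pays off when there are many or long documents.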
An object of the same class as the input, with documents tokenized into n-grams (except for character vector input, for which the output is a list).
For bigram and trigram: (list) of character vectors containing the n-grammed documents.
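To make the difference between a character vector of documents and a character vector of tokens concrete, here is a minimal base R sketch. It assumes whitespace tokenization and a single-space separator, and the helper bigrams_of is hypothetical; the package's own tokenizer and the exact shape of its return value may differ.

```r
# Hypothetical helper: bigrams of one token vector, joined with a space.
bigrams_of <- function(tokens) {
  paste(utils::head(tokens, -1), utils::tail(tokens, -1))
}

# Case "docs": each element is a whole document, so it is tokenized
# first (naively, on whitespace) and the result is a list with one
# character vector per document.
docs <- c("the quick brown fox", "hello world")
from_docs <- lapply(strsplit(docs, "\\s+"), bigrams_of)

# Case "tokens": the vector already holds the tokens of one document.
# Wrapping the result in a list is an assumption made here to match
# the documented "list" output for character input.
tokens <- c("the", "quick", "brown", "fox")
from_tokens <- list(bigrams_of(tokens))
```

In both cases the result is a list of character vectors, which matches the documented return type for character input.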