ngram: n-Gram creators

ngram R Documentation

n-Gram creators

Description

Creates the n-gram tokens for each document in a corpus.

bigram is a shortcut for ngram using n_min = n_max = 2.

trigram is a shortcut for ngram using n_min = n_max = 3.
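
To make the roles of n_min and n_max concrete, the following base R sketch builds every n-gram with n_min <= n <= n_max from one tokenized document. It is only an illustration: make_ngrams is a hypothetical helper, not the package's implementation, and the space separator and the ordering of the output (shorter grams first) are assumptions.

```r
# Hypothetical helper, not part of the documented package:
# build all n-grams with n_min <= n <= n_max from one tokenized document.
make_ngrams <- function(tokens, n_min = 1, n_max = 2, sep = " ") {
  stopifnot(is.character(tokens), n_min >= 1, n_max >= n_min)
  grams <- lapply(seq(n_min, n_max), function(n) {
    # A document shorter than n words has no n-grams of that size.
    if (length(tokens) < n) return(character(0))
    starts <- seq_len(length(tokens) - n + 1)
    vapply(
      starts,
      function(i) paste(tokens[i:(i + n - 1)], collapse = sep),
      character(1)
    )
  })
  # as.character() keeps the result a character vector even when empty.
  as.character(unlist(grams, use.names = FALSE))
}

doc <- c("the", "cat", "sat", "down")

# Same idea as the defaults n_min = 1, n_max = 2: unigrams plus bigrams.
unigrams_and_bigrams <- make_ngrams(doc, n_min = 1, n_max = 2)

# Same idea as the bigram and trigram shortcuts: n_min = n_max.
bigrams  <- make_ngrams(doc, n_min = 2, n_max = 2)
trigrams <- make_ngrams(doc, n_min = 3, n_max = 3)
```

In this sketch, bigrams holds "the cat", "cat sat" and "sat down"; the actual token format returned by ngram may differ.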

Usage

ngram(corpus, n_min = 1, n_max = 2, ..., parallel = FALSE,
  ncores = parallel::detectCores() - 1)

## S3 method for class 'list'
ngram(corpus, n_min = 1, n_max = 2, ..., parallel = FALSE,
  ncores = parallel::detectCores() - 1)

## S3 method for class 'VCorpus'
ngram(corpus, n_min = 1, n_max = 2, ...,
  parallel = FALSE, ncores = parallel::detectCores() - 1)

## S3 method for class 'character'
ngram(corpus, n_min = 1, n_max = 2, ...,
  parallel = FALSE, ncores = parallel::detectCores() - 1,
  docs_or_tokens = c("docs", "tokens"))

## Default S3 method:
ngram(corpus, n_min = 1, n_max = 2, ...,
  parallel = FALSE, ncores = parallel::detectCores() - 1)

bigram(corpus, ..., parallel = FALSE, ncores = parallel::detectCores() - 1)

trigram(corpus, ..., parallel = FALSE, ncores = parallel::detectCores() - 1)

Arguments

corpus

a compatible object storing documents (currently: a list, including a corpus-list, of tokenized documents; a character vector; or a VCorpus)

n_min

(num) minimum number of words to include in the grams

n_max

(num) maximum number of words to include in the grams

...

further options passed to the function

parallel

(lgl) if TRUE, the computation is performed in parallel using the parallel package. Default is FALSE.

ncores

(int) number of cores to use in the parallel computation (default is the number of machine cores minus one)
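
The parallel and ncores arguments can be pictured with a small sketch based on the parallel package. This is an assumption about the general pattern, not the package's actual code: bigrams_of is a hypothetical stand-in for the per-document work, and the guard on the core count is added here because parallel::detectCores() can return 1 or NA, which would make "cores minus one" unusable.

```r
library(parallel)

# Hypothetical stand-in for the per-document work. It uses only base
# functions, so it can be sent to worker processes without exporting
# anything else.
bigrams_of <- function(tokens) {
  if (length(tokens) < 2) return(character(0))
  paste(tokens[-length(tokens)], tokens[-1])
}

docs <- list(
  c("the", "cat", "sat"),
  c("a", "dog", "barked", "loudly")
)

# Guard against zero or NA workers, and never start more workers than
# there are documents.
available <- detectCores()
ncores <- if (is.na(available)) 1L else max(1L, available - 1L)
ncores <- min(ncores, length(docs))

cl <- makeCluster(ncores)
res_parallel <- tryCatch(
  parLapply(cl, docs, bigrams_of),
  finally = stopCluster(cl)  # always release the workers
)

# The serial computation should give the same result.
res_serial <- lapply(docs, bigrams_of)
same <- identical(res_parallel, res_serial)
```

For a handful of short documents the cost of starting workers usually outweighs the gain, which is consistent with parallel = FALSE being the default.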

docs_or_tokens

(chr) whether the input character vector is a vector of documents ("docs", still to be tokenized) or already a vector of tokens of a single document ("tokens")
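
The two readings of a character vector selected by docs_or_tokens can be sketched as follows. Everything here is illustrative: splitting on whitespace is an assumed tokenization rule, choose_mode is a hypothetical helper, and the page does not state whether the package resolves the argument with match.arg().

```r
# Reading 1 ("docs"): each element is a whole document, to be tokenized.
# Splitting on whitespace is an assumption made for this sketch.
x <- c("the cat sat", "a dog barked")
as_docs <- strsplit(x, "[[:space:]]+")

# Reading 2 ("tokens"): the whole vector is one already-tokenized document,
# so it becomes a single list element.
y <- c("the", "cat", "sat")
as_tokens <- list(y)

# match.arg() is a common base R idiom for a choice argument declared as
# c("docs", "tokens"); with no explicit value it selects the first option.
choose_mode <- function(docs_or_tokens = c("docs", "tokens")) {
  match.arg(docs_or_tokens)
}

default_mode <- choose_mode()
tokens_mode  <- choose_mode("tokens")
```

Either way the result is a list of token vectors, one per document, which matches the statement that character input produces list output.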

Value

an object of the same class as the input (except for character vector input, for which the output is a list), with each document tokenized into n-grams.

(list) of character vectors containing the n-grammed documents


UBESP-DCTV/costumer documentation built on Feb. 1, 2023, 4:52 a.m.