ngramTokens: Ngram Tagger
In myeomans/DTMtools: Adjustable Feature Counting for Text Analysis


View source: R/ngramTokens.R

Tally bag-of-words ngram features

ngramTokens(
  texts,
  wstem = "all",
  ngrams = 1,
  language = "english",
  punct = TRUE,
  stop.words = TRUE,
  overlap = 1,
  sparse = 0.99,
  verbose = FALSE,
  mc.cores = 1
)
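
As a rough illustration of the signature above, the call below counts unigrams and bigrams on three made-up texts. It is a sketch rather than an example from the package: it assumes DTMtools is installed (for instance with `remotes::install_github("myeomans/DTMtools")`) and that `ngramTokens` is exported. Only arguments listed in the usage are passed. The return value is not described in this section, so the result is inspected with `str()` instead of being assumed to have a particular shape.

```r
# Made-up example texts (not from the package).
texts <- c(
  "I love this product! Would you buy it again?",
  "The delivery was late and the box was damaged.",
  "Great service, great price, would buy again."
)

# Only run the call when the package is available on this machine.
if (requireNamespace("DTMtools", quietly = TRUE)) {
  features <- DTMtools::ngramTokens(
    texts,
    ngrams = 1:2,       # unigrams and bigrams
    stop.words = TRUE,  # include stop words (the documented default)
    sparse = 1          # 1 = include all features
  )
  # The return type is not documented here, so just inspect the object.
  str(features)
}
```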

`texts`	character vector of texts.
`wstem`	character. Which words should be stemmed? Default is "all".
`ngrams`	numeric vector of ngram sizes to count (at most 1:3). Default is 1.
`language`	character. Which language is being parsed? Default is "english".
`punct`	logical. Should exclamation points and question marks be included as features? Default is TRUE.
`stop.words`	logical. Should stop words be included? Default is TRUE.
`overlap`	numeric. How dissimilar (in cosine distance) must an ngram be from all (n-1)grams to be added to the feature set? Default is 1.
`sparse`	numeric. Maximum feature sparsity for inclusion (1 = include all features). Default is 0.99.
`verbose`	logical. Should interim steps be reported during processing? Default is FALSE.
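
The `sparse` argument may be easier to read with a small worked example. The base-R sketch below assumes that a feature's sparsity is the share of documents in which it never appears, and that features are kept when that share does not exceed `sparse`. Both are assumptions about the intended meaning, not a description of how the package implements the filter. The matrix `counts` and the helper `keep_dense` are hypothetical.

```r
# Toy document-by-feature count matrix (rows = documents, columns = features).
counts <- matrix(
  c(1, 0, 2,
    1, 0, 1,
    3, 0, 0,
    1, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(NULL, c("good", "rare", "buy"))
)

# Sparsity of each feature: fraction of documents with a zero count.
sparsity <- colMeans(counts == 0)

# Hypothetical helper: keep features whose sparsity is at most `sparse`.
keep_dense <- function(m, sparse) {
  m[, colMeans(m == 0) <= sparse, drop = FALSE]
}

kept <- keep_dense(counts, sparse = 0.5)
colnames(kept)
```

Under these assumptions, `rare` is absent from three of the four documents (sparsity 0.75) and is dropped at `sparse = 0.5`, while `sparse = 1` keeps every column.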