ngramTokens: Ngram Tagger

View source: R/ngramTokens.R

Description

Tally bag-of-words ngram features

Usage

ngramTokens(
  texts,
  wstem = "all",
  ngrams = 1,
  language = "english",
  punct = TRUE,
  stop.words = TRUE,
  overlap = 1,
  sparse = 0.99,
  verbose = FALSE,
  mc.cores = 1
)

Arguments

texts

a character vector of texts.

wstem

character. Which words should be stemmed?

ngrams

numeric vector of ngram sizes (max = 1:3)

language

character. What language are you parsing?

punct

logical. Should exclamation points and question marks be included as features?

stop.words

logical. Should stop words be included? Default is TRUE.

overlap

numeric. How dissimilar (in cosine distance) must an ngram be from all (n-1)grams to be added to the feature set?

sparse

numeric. Maximum feature sparsity for inclusion (1 = include all features).

verbose

logical. Report interim steps during processing.
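
The two quantities behind sparse and overlap can be illustrated in base R. This is a minimal sketch and not the package's implementation: the toy counts, the helper names keep_by_sparsity and cosine_distance, and the reading of sparsity as "share of documents where the feature count is zero" are all assumptions made for illustration. How ngramTokens compares the cosine distance against overlap is not stated on this page, so the sketch only computes the distance.

```r
# Toy document-by-feature count matrix (rows = documents, columns = features).
# The values are made up for illustration.
counts <- matrix(
  c(1, 1, 0,
    1, 1, 0,
    0, 0, 0,
    2, 1, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("doc", 1:4), c("thank", "thank_you", "sorry"))
)

# Assumed definition of sparsity: share of documents with a zero count.
feature_sparsity <- colMeans(counts == 0)

# Hypothetical helper: keep features whose sparsity does not exceed `sparse`.
# With sparse = 1 every feature passes, matching "1 = include all features".
keep_by_sparsity <- function(m, sparse) {
  m[, colMeans(m == 0) <= sparse, drop = FALSE]
}

# Hypothetical helper: cosine distance = 1 - cosine similarity.
cosine_distance <- function(a, b) {
  1 - sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Distance between a bigram and one of the unigrams it contains.
d <- cosine_distance(counts[, "thank_you"], counts[, "thank"])

print(feature_sparsity)
print(colnames(keep_by_sparsity(counts, sparse = 0.5)))
print(d)
```

In this toy matrix "thank_you" occurs in almost exactly the documents where "thank" occurs, so the distance between them is small; a bigram like this carries little information beyond its unigram, which is the kind of redundancy the overlap argument is described as addressing.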

Value

a matrix of feature counts
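
A call might look like the following sketch. It assumes the DTMtools package is installed (for example from the myeomans/DTMtools GitHub repository) and that ngramTokens is exported; the example texts are made up, and the call is skipped when the package is not available. Only arguments listed under Usage are passed.

```r
# Made-up example texts.
texts <- c(
  "Thank you so much for your help!",
  "Could you send the report again?",
  "The report was late and I am not happy."
)

# Run only if the package is available; otherwise do nothing.
if (requireNamespace("DTMtools", quietly = TRUE)) {
  features <- DTMtools::ngramTokens(
    texts,
    ngrams = 1:2,       # unigrams and bigrams
    punct = TRUE,       # include "!" and "?" as features
    stop.words = TRUE,  # include stop words
    sparse = 1          # 1 = include all features
  )
  # Inspect the dimensions of the returned matrix of feature counts.
  print(dim(features))
}
```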


myeomans/DTMtools documentation built on March 2, 2020, 8:57 p.m.