Tally bag-of-words ngram features
ngramTokens(
texts,
wstem = "all",
ngrams = 1,
language = "english",
punct = TRUE,
stop.words = TRUE,
overlap = 1,
sparse = 0.99,
verbose = FALSE,
mc.cores = 1
)
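A hedged usage sketch: the call below passes only arguments that appear in the signature above. The sample texts and the variable name `feats` are made up for illustration. The call is guarded with `exists()` so the snippet does nothing if the package that provides `ngramTokens` is not attached.

```r
# Hypothetical sample texts (not taken from the package documentation).
texts <- c("Could you please send the report?",
           "Send the report now!",
           "Thanks so much for the report.")

# Guard the call so the snippet runs cleanly even when the package
# providing ngramTokens() has not been attached.
if (exists("ngramTokens")) {
  feats <- ngramTokens(
    texts,
    ngrams = 1:2,       # unigrams and bigrams
    stop.words = TRUE,  # include stop words (the documented default)
    sparse = 1          # 1 = include all features
  )
  print(dim(feats))     # dimensions of the returned count matrix
}
```

Setting `sparse = 1` keeps every feature, which is convenient for inspecting a small sample like this one before choosing a stricter threshold.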
texts | a character vector of texts.
wstem | character. Which words should be stemmed?
ngrams | numeric vector of ngram sizes (max = 1:3).
language | character. Which language is being parsed?
punct | logical. Should exclamation points and question marks be included as features?
stop.words | logical. Should stop words be included? Default is TRUE.
overlap | numeric. How dissimilar (in cosine distance) must an ngram be from all (n-1)grams to be added to the feature set?
sparse | maximum feature sparsity for inclusion (1 = include all features).
verbose | logical. Report interim steps during processing?
a matrix of feature counts
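To make the idea of a feature count matrix and a sparsity cutoff concrete, here is a minimal base R sketch for unigrams only. It is not the package's implementation: the helper name `tally_unigrams`, the tokenization rule, and the reading of sparsity as "share of texts in which a word never occurs" (kept when less than or equal to `sparse`) are all assumptions made for illustration. Stemming, punctuation features, stop-word handling, and the `overlap` filter are left out.

```r
# Hypothetical helper: an illustration only, NOT how ngramTokens() works internally.
tally_unigrams <- function(texts, sparse = 0.99) {
  # Lowercase, then split on anything that is not a letter or an apostrophe.
  tokens <- strsplit(tolower(texts), "[^a-z']+")
  tokens <- lapply(tokens, function(tok) tok[nzchar(tok)])
  vocab <- sort(unique(unlist(tokens)))

  # One row per text, one column per word, filled with zero counts.
  counts <- matrix(0L, nrow = length(texts), ncol = length(vocab),
                   dimnames = list(NULL, vocab))
  for (i in seq_along(tokens)) {
    tab <- table(tokens[[i]])
    if (length(tab) > 0) {
      counts[i, names(tab)] <- as.integer(tab)
    }
  }

  # Assumed definition: sparsity = share of texts where the word is absent.
  sparsity <- colMeans(counts == 0)
  counts[, sparsity <= sparse, drop = FALSE]
}

texts <- c("send the report", "send it now", "the report is late")
m_all  <- tally_unigrams(texts, sparse = 1)    # keep every word
m_half <- tally_unigrams(texts, sparse = 0.5)  # keep words found in at least half the texts
print(m_half)
```

With these three texts, only words that occur in two of the three texts survive the 0.5 cutoff, while `sparse = 1` keeps the full vocabulary.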